Hands-on Exercise 4a: Visualising Distribution
1 Getting Started
In this exercise, we will use the following R packages:
tidyverse, a family of R packages for the data science process,
ggridges, a ggplot2 extension specially designed for plotting ridgeline plots, and
ggdist, for visualising distributions and uncertainty.
The code chunk below uses p_load() of the pacman package to check whether these packages are installed on the computer, install any that are missing, and load them into the working R environment. It also loads ggthemes and colorspace, which provide theme_economist_white() and lighten(), both used later in this exercise.
pacman::p_load(ggdist, ggridges, ggthemes,
               colorspace, tidyverse)
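As a rough illustration of the "check first, then load" idea behind p_load(), the base R sketch below reports which packages cannot be found. The helper name missing_packages() is hypothetical and is not part of pacman; p_load() goes further by installing and attaching the packages.

```r
# Hypothetical helper (not part of pacman): return the names in `pkgs`
# whose namespaces cannot be loaded from the current library paths.
# Note that requireNamespace() loads a namespace without attaching it.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Base packages ship with R, so none of them should be reported.
missing_packages(c("stats", "utils"))
```

Running missing_packages() on the package names used in this exercise before calling p_load() shows which ones would need to be installed.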
The code chunk below imports Exam_data.csv into the R environment by using the read_csv() function of the readr package.
exam <- read_csv("data/Exam_data.csv")
The code chunk below uses summary() to summarise the data.
summary(exam)
      ID               CLASS              GENDER              RACE
 Length:322         Length:322         Length:322         Length:322
 Class :character   Class :character   Class :character   Class :character
 Mode  :character   Mode  :character   Mode  :character   Mode  :character
    ENGLISH          MATHS          SCIENCE
 Min.   :21.00   Min.   : 9.00   Min.   :15.00
 1st Qu.:59.00   1st Qu.:58.00   1st Qu.:49.25
 Median :70.00   Median :74.00   Median :65.00
 Mean   :67.18   Mean   :69.33   Mean   :61.16
 3rd Qu.:78.00   3rd Qu.:85.00   3rd Qu.:74.75
 Max.   :96.00   Max.   :99.00   Max.   :96.00
2 Visualizing Distribution with Ridgeline Plot
A ridgeline plot (sometimes called a joyplot) is a data visualisation technique for revealing the distribution of a numeric value across several groups. The distributions are shown as histograms or density plots, all aligned to the same horizontal scale and presented with a slight overlap.
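To make the idea concrete before turning to ggridges, here is a minimal base R sketch of a ridgeline layout. It uses simulated scores for three hypothetical groups (not the exam data), and the overlap factor of 0.7 is an arbitrary choice for illustration.

```r
# Simulated scores for three hypothetical groups (not the exam data).
set.seed(1)
scores <- data.frame(
  group = rep(c("A", "B", "C"), each = 100),
  value = c(rnorm(100, 55, 10), rnorm(100, 65, 12), rnorm(100, 75, 8))
)

# Estimate one density per group on the same x range, so that all
# curves share a single horizontal scale.
x_range <- range(scores$value)
dens <- lapply(split(scores$value, scores$group), density,
               from = x_range[1], to = x_range[2], n = 256)

# Give each curve its own baseline. A vertical step smaller than the
# tallest curve makes neighbouring ridges overlap slightly.
peak <- max(vapply(dens, function(d) max(d$y), numeric(1)))
step <- 0.7 * peak

plot(0, 0, type = "n", xlim = x_range,
     ylim = c(0, step * (length(dens) - 1) + peak),
     xlab = "value", ylab = "", yaxt = "n")

# Draw from the top ridge down so that lower ridges sit in front.
for (i in rev(seq_along(dens))) {
  baseline <- (i - 1) * step
  polygon(c(dens[[i]]$x, rev(dens[[i]]$x)),
          c(dens[[i]]$y + baseline, rep(baseline, length(dens[[i]]$x))),
          col = "grey85", border = "black")
}
axis(2, at = (seq_along(dens) - 1) * step, labels = names(dens), las = 1)
```

The ggridges geoms used in the rest of this section handle the density estimation, stacking and overlap for us.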
2.1 Plotting ridgeline graph: ggridges method
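In this section, we will plot ridgeline plots by using the ggridges package.
There are two main geoms for plotting ridgeline plots: geom_ridgeline(), which takes height values directly, and geom_density_ridges(), which first estimates densities from the data and then draws them as ridgelines.
The plot below uses geom_density_ridges().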
ggplot(exam,
       aes(x = ENGLISH,
           y = CLASS)) +
  geom_density_ridges(
    scale = 3,
    rel_min_height = 0.01,
    bandwidth = 3.4,
    fill = lighten("pink", .5),
    color = "black"
  ) +
  scale_x_continuous(
    name = "English grades",
    expand = c(0, 0)
  ) +
  scale_y_discrete(name = NULL, expand = expansion(add = c(0.2, 2.6))) +
  theme_ridges()
2.2 Varying fill colors along the x axis
Sometimes we would like to have the area under a ridgeline filled with colours that vary in some form along the x axis. This can be done using either geom_ridgeline_gradient() or geom_density_ridges_gradient(). These geoms work like geom_ridgeline() and geom_density_ridges(), respectively, except that they allow for varying fill colours. However, they do not allow for alpha transparency in the fill.
ggplot(exam,
       aes(x = ENGLISH,
           y = CLASS,
           fill = after_stat(x))) +
  geom_density_ridges_gradient(
    scale = 3,
    rel_min_height = 0.01) +
  scale_fill_viridis_c(name = "English grades",
                       option = "F") +
  scale_x_continuous(
    name = "English grades",
    expand = c(0, 0)
  ) +
  scale_y_discrete(name = NULL, expand = expansion(add = c(0.2, 2.6))) +
  theme_ridges()
2.3 Mapping the probabilities directly onto color
The ggridges package also provides a stat function called stat_density_ridges() that replaces stat_density() of ggplot2.
The figure below is plotted by mapping the fill colour to probabilities calculated from after_stat(ecdf), which represents the empirical cumulative distribution function of the English scores within each class.
ggplot(exam,
       aes(x = ENGLISH,
           y = CLASS,
           fill = 0.5 - abs(0.5 - after_stat(ecdf)))) +
  stat_density_ridges(geom = "density_ridges_gradient",
                      calc_ecdf = TRUE) +
  scale_fill_viridis_c(name = "Tail probability",
                       direction = 1) +
  theme_ridges()
It is important to include the argument calc_ecdf = TRUE in stat_density_ridges(); without it, the ecdf variable is not computed.
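The fill expression above folds the cumulative probability at 0.5, so that both tails receive small values and scores near the median receive the largest ones. The sketch below works through that arithmetic on five illustrative scores. It uses base R's ecdf() on raw values as a stand-in; ggridges computes its own ecdf variable per group inside the stat.

```r
# Five illustrative scores (hypothetical values, not the full exam data).
scores <- c(21, 59, 70, 78, 96)

# Empirical CDF at each score: the share of values <= that score.
p <- ecdf(scores)(scores)

# Fold the ECDF at 0.5: values in either tail map close to 0,
# values near the median map close to 0.5.
tail_prob <- 0.5 - abs(0.5 - p)

data.frame(score = scores, ecdf = p, tail_prob = tail_prob)
```

The lowest and highest scores end up with the smallest tail probabilities, which is why both ends of each ridge take colours from the low end of the scale in the figure.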
2.4 Ridgeline plots with quantile lines
By using geom_density_ridges_gradient(), we can colour the ridgeline plot by quantile, via the computed after_stat(quantile) variable, as shown in the figure below.
ggplot(exam,
       aes(x = ENGLISH,
           y = CLASS,
           fill = factor(after_stat(quantile))
       )) +
  stat_density_ridges(
    geom = "density_ridges_gradient",
    calc_ecdf = TRUE,
    quantiles = 4,
    quantile_lines = TRUE) +
  scale_fill_viridis_d(name = "Quartiles") +
  theme_ridges()
Instead of using a number to define the quantiles, we can also specify them by cut points, such as the 2.5% and 97.5% tails, to colour the ridgeline plot as shown in the figure below.
ggplot(exam,
       aes(x = ENGLISH,
           y = CLASS,
           fill = factor(after_stat(quantile))
       )) +
  stat_density_ridges(
    geom = "density_ridges_gradient",
    calc_ecdf = TRUE,
    quantiles = c(0.025, 0.975)
  ) +
  scale_fill_manual(
    name = "Probability",
    values = c("red", "#A0A0A0A0", "purple"),
    labels = c("(0, 0.025]", "(0.025, 0.975]", "(0.975, 1]")
  ) +
  theme_ridges()
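To see what the three bins correspond to, the base R sketch below cuts a simulated sample at its 2.5% and 97.5% quantiles. The data are simulated (not the exam scores), and quantile() with cut() is used here only to illustrate the idea; it is not how ggridges computes the bins internally.

```r
set.seed(42)
x <- rnorm(1000, mean = 67, sd = 14)  # simulated scores, not the exam data

# Cut points at the 2.5% and 97.5% quantiles of the sample.
cuts <- quantile(x, probs = c(0.025, 0.975))

# Assign each value to the lower tail, the central 95%, or the upper tail.
bins <- cut(x, breaks = c(-Inf, cuts, Inf),
            labels = c("(0, 0.025]", "(0.025, 0.975]", "(0.975, 1]"))

table(bins)
```

Passing probs = c(0.25, 0.5, 0.75) instead would give the quartile bins that quantiles = 4 produces in the earlier plot.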
3 Visualizing Distribution with Raincloud Plot
A raincloud plot is a data visualisation technique that adds a half-density to a distribution plot. The raincloud (half-density) plot enhances the traditional boxplot by highlighting multiple modalities (an indicator that groups may exist). The boxplot does not show where densities are clustered, but the raincloud plot does!
In this section, we will create a raincloud plot by using functions provided by the ggdist and ggplot2 packages.
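The point about hidden modalities can be checked with a small base R sketch. It uses simulated bimodal scores (two hypothetical subgroups, not the exam data) and compares the five-number summary that a boxplot is built from with a kernel density estimate. Counting local maxima of the density is a simple heuristic chosen for illustration, and its result depends on the bandwidth.

```r
set.seed(123)
# Simulated bimodal scores: two hypothetical subgroups centred at 40 and 80.
x <- c(rnorm(200, mean = 40, sd = 5), rnorm(200, mean = 80, sd = 5))

# The five-number summary behind a boxplot gives no hint of two clusters.
fivenum(x)

# A kernel density estimate does. Count its local maxima, i.e. the points
# where the slope of the curve changes from rising to falling.
d <- density(x)
n_modes <- sum(diff(sign(diff(d$y))) == -2)
n_modes
```

The half-density in a raincloud plot shows this shape directly, alongside the boxplot summary.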
3.1 Plotting a Half Eye graph
We will plot a Half-Eye graph by using stat_halfeye() of ggdist package.
ggplot(exam,
       aes(x = RACE,
           y = ENGLISH)) +
  stat_halfeye(adjust = 0.5,
               justification = -0.2,
               .width = 0,
               point_colour = NA,
               fill = "lightblue")
We remove the point interval (the median point and the interval line) by setting .width = 0 and point_colour = NA.
3.2 Adding the boxplot with geom_boxplot()
To add the second geometry layer, we will use geom_boxplot() of ggplot2. This produces a narrow boxplot. We reduce the width and hide the outlier points by setting outlier.shape = NA.
ggplot(exam,
       aes(x = RACE,
           y = ENGLISH)) +
  stat_halfeye(adjust = 0.5,
               justification = -0.2,
               .width = 0,
               point_colour = NA,
               fill = "lightblue") +
  geom_boxplot(width = .20,
               outlier.shape = NA)
3.3 Adding the Dot Plots with stat_dots()
To add the third geometry layer, we will use stat_dots() of the ggdist package. This produces a half-dotplot, which is similar to a histogram in that it indicates the number of samples (number of dots) in each bin. We set side = "left" to place it on the left-hand side.
ggplot(exam,
       aes(x = RACE,
           y = ENGLISH)) +
  stat_halfeye(adjust = 0.5,
               justification = -0.2,
               .width = 0,
               point_colour = NA,
               fill = "lightblue") +
  geom_boxplot(width = .20,
               outlier.shape = NA) +
  stat_dots(side = "left",
            justification = 1.2,
            binwidth = .5,
            dotsize = 2)
3.4 Finishing touch
coord_flip() of the ggplot2 package will be used to flip the raincloud chart horizontally to give it the raincloud appearance. At the same time, theme_economist_white() of the ggthemes package is used to give the raincloud chart a professional, publication-standard look.
ggplot(exam,
       aes(x = RACE,
           y = ENGLISH)) +
  stat_halfeye(adjust = 0.5,
               justification = -0.2,
               .width = 0,
               point_colour = NA,
               fill = "lightblue") +
  geom_boxplot(width = .20,
               outlier.shape = NA) +
  stat_dots(side = "left",
            justification = 1.2,
            binwidth = .5,
            dotsize = 2) +
  coord_flip() +
  theme_economist_white()