Adjust multi-column plots (hadley#1354)
Co-authored-by: mine-cetinkaya-rundel <[email protected]>
hadley and mine-cetinkaya-rundel authored Mar 10, 2023
1 parent 86efe55 commit ac74f98
Showing 7 changed files with 172 additions and 133 deletions.
3 changes: 1 addition & 2 deletions EDA.qmd
Expand Up @@ -382,7 +382,6 @@ But maybe that's because frequency polygons are a little hard to interpret - the
A visually simpler plot for exploring this relationship is using side-by-side boxplots.

```{r}
#| fig-height: 3
#| fig-alt: >
#| Side-by-side boxplots of prices of diamonds by cut. The distribution of
#| prices is right skewed for each cut (Fair, Good, Very Good, Premium, and
Expand Down Expand Up @@ -417,7 +416,6 @@ ggplot(mpg, aes(x = class, y = hwy)) +
To make the trend easier to see, we can reorder `class` based on the median value of `hwy`:

```{r}
#| fig-height: 3
#| fig-alt: >
#| Side-by-side boxplots of highway mileages of cars by class. Classes are
#| on the x-axis and ordered by increasing median highway mileage (pickup,
Expand Down Expand Up @@ -567,6 +565,7 @@ You will need to install the hexbin package to use `geom_hex()`.

```{r}
#| layout-ncol: 2
#| fig-width: 3
#| fig-alt: >
#| Plot 1: A binned density plot of price vs. carat. Plot 2: A hexagonal bin
#| plot of price vs. carat. Both plots show that the highest density of
5 changes: 5 additions & 0 deletions base-R.qmd
Expand Up @@ -518,9 +518,14 @@ Here's a quick example from the diamonds dataset:

```{r}
#| dev: png
#| fig-width: 3
#| fig-asp: 1
#| layout-ncol: 2
# Left
hist(diamonds$carat)
# Right
plot(diamonds$carat, diamonds$price)
```

116 changes: 71 additions & 45 deletions communication.qmd
Expand Up @@ -383,22 +383,23 @@ Note that `breaks` is in the original scale of the data.

```{r}
#| layout-ncol: 2
#| fig-width: 4
#| fig-alt: >
#| Two side-by-side box plots of price versus cut of diamonds. The outliers
#| are transparent. On both plots the y-axis labels are formatted as dollars.
#| The y-axis labels on the plot start at $0 and go to $15,000, increasing
#| by $5,000. The y-axis labels on the right plot start at $1K and go to
#| are transparent. On both plots the x-axis labels are formatted as dollars.
#| The x-axis labels on the plot start at $0 and go to $15,000, increasing
#| by $5,000. The x-axis labels on the right plot start at $1K and go to
#| $19K, increasing by $6K.
# Left
ggplot(diamonds, aes(x = cut, y = price)) +
ggplot(diamonds, aes(x = price, y = cut)) +
geom_boxplot(alpha = 0.05) +
scale_y_continuous(labels = scales::label_dollar())
scale_x_continuous(labels = scales::label_dollar())
# Right
ggplot(diamonds, aes(x = cut, y = price)) +
ggplot(diamonds, aes(x = price, y = cut)) +
geom_boxplot(alpha = 0.05) +
scale_y_continuous(
scale_x_continuous(
labels = scales::label_dollar(scale = 1/1000, suffix = "K"),
breaks = seq(1000, 19000, by = 6000)
)
Expand Down Expand Up @@ -454,19 +455,22 @@ The theme setting `legend.position` controls where the legend is drawn:
```{r}
#| layout-ncol: 2
#| fig-width: 4
#| fig-height: 2
#| fig-alt: >
#| Four scatterplots of highway fuel efficiency versus engine size of cars
#| where points are colored based on class of car. Clockwise, the legend
#| is placed on the left, top, bottom, and right of the plot.
#| is placed on the right, left, top, and bottom of the plot.
base <- ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = class))
base + theme(legend.position = "left")
base + theme(legend.position = "top")
base + theme(legend.position = "bottom")
base + theme(legend.position = "right") # the default
base + theme(legend.position = "left")
base +
theme(legend.position = "top") +
guides(col = guide_legend(nrow = 3))
base +
theme(legend.position = "bottom") +
guides(col = guide_legend(nrow = 3))
```

If your plot is short and wide, place the legend at the top or bottom, and if it's tall and narrow, place the legend at the left or right.
Expand Down Expand Up @@ -505,8 +509,7 @@ For example, it's easier to see the precise relationship between `carat` and `pr
```{r}
#| fig-align: default
#| layout-ncol: 2
#| fig-width: 4
#| fig-height: 3
#| fig-width: 3
#| fig-alt: >
#| Two plots of price versus carat of diamonds. Data binned and the color of
#| the rectangles representing each bin based on the number of points that
Expand Down Expand Up @@ -548,8 +551,7 @@ The two plots below look similar, but there is enough difference in the shades o
```{r}
#| fig-align: default
#| layout-ncol: 2
#| fig-width: 4
#| fig-height: 3
#| fig-width: 3
#| fig-alt: >
#| Two scatterplots of highway mileage versus engine size where points are
#| colored by drive type. The plot on the left uses the default
Expand Down Expand Up @@ -630,8 +632,8 @@ These scales are available as continuous (`c`), discrete (`d`), and binned (`b`)
```{r}
#| fig-align: default
#| layout-ncol: 2
#| fig-width: 4
#| fig-asp: 1
#| fig-width: 3
#| fig-asp: 0.75
#| fig-alt: >
#| Three hex plots where the color of the hexes shows the number of observations
#| that fall into that hex bin. The first plot uses the default, continuous
Expand All @@ -646,19 +648,19 @@ df <- tibble(
ggplot(df, aes(x, y)) +
geom_hex() +
coord_fixed() +
labs(title = "Default, continuous")
labs(title = "Default, continuous", x = NULL, y = NULL)
ggplot(df, aes(x, y)) +
geom_hex() +
coord_fixed() +
scale_fill_viridis_c() +
labs(title = "Viridis, continuous")
labs(title = "Viridis, continuous", x = NULL, y = NULL)
ggplot(df, aes(x, y)) +
geom_hex() +
coord_fixed() +
scale_fill_viridis_b() +
labs(title = "Viridis, binned")
labs(title = "Viridis, binned", x = NULL, y = NULL)
```

Note that all color scales come in two varieties: `scale_color_*()` and `scale_fill_*()` for the `color` and `fill` aesthetics respectively (the color scales are available in both UK and US spellings).
Expand All @@ -671,38 +673,59 @@ There are three ways to control the plot limits:
2. Setting the limits in each scale.
3. Setting `xlim` and `ylim` in `coord_cartesian()`.

To zoom in on a region of the plot, it's generally best to use `coord_cartesian()`.
Compare the following two plots:
We'll demonstrate these options in a series of plots.
The plot on the left shows the relationship between engine size and fuel efficiency, colored by type of drive train.
The plot on the right shows the same variables, but subsets the data that are plotted.
Subsetting the data has affected the x and y scales as well as the smooth curve.

```{r}
#| layout-ncol: 2
#| fig-width: 4
#| fig-height: 3
#| message: false
# Left
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = class)) +
geom_smooth() +
coord_cartesian(xlim = c(5, 6), ylim = c(10, 30))
geom_point(aes(color = drv)) +
geom_smooth()
# Right
mpg |>
filter(displ >= 5, displ <= 6, hwy >= 10, hwy <= 30) |>
filter(displ >= 5 & displ <= 6 & hwy >= 10 & hwy <= 25) |>
ggplot(aes(x = displ, y = hwy)) +
geom_point(aes(color = class)) +
geom_point(aes(color = drv)) +
geom_smooth()
```

You can also set the `limits` on individual scales.
Reducing the limits is basically equivalent to subsetting the data.
It is generally more useful if you want to *expand* the limits, for example, to match scales across different plots.
Let's compare these to the two plots below where the plot on the left sets the `limits` on individual scales and the plot on the right sets them in `coord_cartesian()`.
We can see that reducing the limits is equivalent to subsetting the data.
Therefore, to zoom in on a region of the plot, it's generally best to use `coord_cartesian()`.

```{r}
#| layout-ncol: 2
#| fig-width: 4
#| message: false
#| warning: false
# Left
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = drv)) +
geom_smooth() +
scale_x_continuous(limits = c(5, 6)) +
scale_y_continuous(limits = c(10, 25))
# Right
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = drv)) +
geom_smooth() +
coord_cartesian(xlim = c(5, 6), ylim = c(10, 25))
```
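
The difference between the two approaches can also be checked numerically. The following is a rough sketch, not part of the original chapter: it counts how many points survive in each version of the plot using ggplot2's `layer_data()` helper. The names `p_scale`, `p_coord`, `count_complete()`, `n_scale`, `n_coord`, and `in_range` are made up for illustration, and the smooth layer is left out to keep the comparison simple.

```r
library(ggplot2)

# Limits set on the scales: out-of-range values are replaced by NA
# before the layer is drawn, which is why this acts like subsetting.
p_scale <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  scale_x_continuous(limits = c(5, 6)) +
  scale_y_continuous(limits = c(10, 25))

# Limits set in the coordinate system: the data are left untouched
# and only the visible window changes.
p_coord <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  coord_cartesian(xlim = c(5, 6), ylim = c(10, 25))

# Hypothetical helper: number of points with both coordinates present.
count_complete <- function(p) {
  d <- layer_data(p)
  sum(stats::complete.cases(d[, c("x", "y")]))
}

n_scale <- count_complete(p_scale)
n_coord <- count_complete(p_coord)

# The same subset computed directly from the data, for comparison.
in_range <- sum(
  mpg$displ >= 5 & mpg$displ <= 6 & mpg$hwy >= 10 & mpg$hwy <= 25
)
```

Under these assumptions, `n_scale` should match `in_range`, while `n_coord` should equal the full number of rows in `mpg`. Any statistic computed from the data, such as a smooth curve, would therefore see different inputs in the two plots.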

On the other hand, setting the `limits` on individual scales is generally more useful if you want to *expand* the limits, e.g., to match scales across different plots.
For example, if we extract two classes of cars and plot them separately, it's difficult to compare the plots because all three scales (the x-axis, the y-axis, and the color aesthetic) have different ranges.

```{r}
#| layout-ncol: 2
#| fig-width: 4
#| fig-height: 3
suv <- mpg |> filter(class == "suv")
compact <- mpg |> filter(class == "compact")
Expand All @@ -721,7 +744,6 @@ One way to overcome this problem is to share scales across multiple plots, train
```{r}
#| layout-ncol: 2
#| fig-width: 4
#| fig-height: 3
x_scale <- scale_x_continuous(limits = range(mpg$displ))
y_scale <- scale_y_continuous(limits = range(mpg$hwy))
Expand Down Expand Up @@ -773,14 +795,11 @@ In this particular case, you could have simply used faceting, but this technique
d. Adding informative plot labels.
e. Placing breaks every 4 years (this is trickier than it seems!).
4. Use `override.aes` to make the legend on the following plot easier to see.
4. First, create the following plot.
Then, modify the code using `override.aes` to make the legend easier to see.
```{r}
#| fig-format: "png"
#| out-width: "50%"
#| fig-alt: >
#| Scatterplot of price versus carat of diamonds. The points are colored
#| by cut of the diamonds and they're very transparent.
#| fig-show: hide
ggplot(diamonds, aes(x = carat, y = price)) +
geom_point(aes(color = cut), alpha = 1/20)
Expand Down Expand Up @@ -845,13 +864,13 @@ A few other helpful `theme()` components are used to change the placement for fo
#| economy' with the caption pointing to the source of the data, fueleconomy.gov.
#| The caption and title are left justified, the legend is inside of the plot
#| with a black border.
#|
ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
geom_point() +
labs(
title = "Larger engine sizes tend to have lower fuel economy",
caption = "Source: https://fueleconomy.gov."
) +
) +
theme(
legend.position = c(0.6, 0.7),
legend.direction = "horizontal",
Expand All @@ -860,7 +879,7 @@ ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
plot.title.position = "plot",
plot.caption.position = "plot",
plot.caption = element_text(hjust = 0)
)
)
```

For an overview of all `theme()` components, see help with `?theme`.
Expand All @@ -883,6 +902,8 @@ Note that you first need to create the plots and save them as objects (in the fo
Then, you place them next to each other with `+`.

```{r}
#| fig-width: 6
#| fig-asp: 0.5
#| fig-alt: >
#| Two plots (a scatterplot of highway mileage versus engine size and a
#| side-by-side boxplots of highway mileage versus drive train) placed next
Expand All @@ -904,6 +925,8 @@ You can also create complex plot layouts with patchwork.
In the following, `|` places the `p1` and `p3` next to each other and `/` moves `p2` to the next line.

```{r}
#| fig-width: 6
#| fig-asp: 0.8
#| fig-alt: >
#| Three plots laid out such that first and third plot are next to each other
#| and the second plot stretched beneath them. The first plot is a
Expand All @@ -928,7 +951,8 @@ Finally, we have also customized the heights of the various components of our pa
Patchwork divides up the area you have allotted for your plot using this scale and places the components accordingly.

```{r}
#| fig-width: 10
#| fig-width: 8
#| fig-asp: 1
#| fig-alt: >
#| Five plots laid out such that first two plots are next to each other. Plots
#| three and four are underneath them. And the fifth plot stretches under them.
Expand Down Expand Up @@ -980,7 +1004,7 @@ If you'd like to learn more about combining and laying out multiple plots with p
Can you explain why this happens?

```{r}
#| results: hide
#| fig-show: hide
p1 <- ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
Expand All @@ -998,6 +1022,8 @@ If you'd like to learn more about combining and layout out multiple plots with p
2. Using the three plots from the previous exercise, recreate the following patchwork.
```{r}
#| fig-width: 7
#| fig-asp: 0.8
#| echo: false
#| fig-alt: >
#| Three plots: Plot 1 is a scatterplot of highway mileage versus engine size.
39 changes: 20 additions & 19 deletions data-visualize.qmd
Expand Up @@ -537,7 +537,6 @@ One commonly used visualization for distributions of continuous variables is a h

```{r}
#| warning: false
#| layout-ncol: 2
#| fig-alt: >
#| A histogram of body masses of penguins. The distribution is unimodal
#| and right skewed, ranging between approximately 2500 to 6500 grams.
Expand All @@ -557,18 +556,16 @@ A binwidth of 200 provides a sensible balance.

```{r}
#| warning: false
#| layout-ncol: 3
#| layout-ncol: 2
#| fig-width: 3
#| fig-alt: >
#| Three histograms of body masses of penguins, one with binwidth of 20
#| (right), one with binwidth of 200 (center), and one with binwidth of
#| 2000 (left). The histogram with binwidth of 20 shows lots of ups and
#| downs in the heights of the bins, creating a jagged outline. The histogram
#| with binwidth of 2000 shows only three bins.
#| Two histograms of body masses of penguins, one with binwidth of 20
#| (left) and one with binwidth of 2000 (right). The histogram with binwidth
#| of 20 shows lots of ups and downs in the heights of the bins, creating a
#| jagged outline. The histogram with binwidth of 2000 shows only three bins.
ggplot(penguins, aes(x = body_mass_g)) +
geom_histogram(binwidth = 20)
ggplot(penguins, aes(x = body_mass_g)) +
geom_histogram(binwidth = 200)
ggplot(penguins, aes(x = body_mass_g)) +
geom_histogram(binwidth = 2000)
```
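
To make the effect of `binwidth` concrete, here is a small sketch (not part of the original commit) that counts how many bins each setting produces. It assumes `penguins` comes from the palmerpenguins package and relies on ggplot2's `layer_data()`; `count_bins()` and `bin_counts` are hypothetical names used only for this illustration.

```r
library(ggplot2)
library(palmerpenguins)

# Hypothetical helper: build the histogram and count the rows of the
# computed layer data, which has one row per bin.
count_bins <- function(binwidth) {
  p <- ggplot(penguins, aes(x = body_mass_g)) +
    geom_histogram(binwidth = binwidth, na.rm = TRUE)
  nrow(layer_data(p))
}

# Number of bins for each binwidth discussed in the text.
bin_counts <- sapply(c(20, 200, 2000), count_bins)
bin_counts
```

A very small binwidth yields many narrow bins (the jagged outline), while a very large one collapses the distribution into a handful of bars.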
Expand Down Expand Up @@ -702,25 +699,29 @@ Note the terminology we have used here:
### Two categorical variables

We can use stacked bar plots to visualize the relationship between two categorical variables.
For example, the following two stacked bar plots both display the relationship between `island` and `species`, or specifically, visualizing the distribution of `species` within each island.

The two stacked bar plots below both display the relationship between `island` and `species`, or specifically, visualizing the distribution of `species` within each island.
The plot on the left shows the frequencies of each species of penguins on each island and the plot on the right shows the relative frequencies (proportions) of each species within each island (despite the incorrectly labeled y-axis that says "count").
The first plot shows the frequencies of each species of penguins on each island and the plot on the right shows the relative frequencies (proportions) of each species within each island (despite the incorrectly labeled y-axis that says "count").
The plot of frequencies shows that there are roughly equal numbers of Adelies on each island.
But we don't have a good sense of the percentage balance within each island.
In the proportions plot, we've lost our notion of total penguins, but we've gained the advantage of "breakdown by island".

The relative frequency plot, created by setting `position = "fill"` in the geom, is more useful for comparing species distributions across islands since it's not affected by the unequal numbers of penguins across the islands.
Based on the plot on the left, we can see that Gentoo penguins all live on Biscoe island and make up roughly 75% of the penguins on that island, Chinstrap all live on Dream island and make up roughly 50% of the penguins on that island, and Adelie live on all three islands and make up all of the penguins on Torgersen.

```{r}
#| layout-ncol: 2
#| fig-alt: >
#| Bar plots of penguin species by island (Biscoe, Dream, and Torgersen).
#| On the right, frequencies of species are shown. On the left, relative
#| frequencies of species are shown.
#| Bar plots of penguin species by island (Biscoe, Dream, and Torgersen)
ggplot(penguins, aes(x = island, fill = species)) +
geom_bar()
```

The second plot, a relative frequency plot created by setting `position = "fill"` in the geom, is more useful for comparing species distributions across islands since it's not affected by the unequal numbers of penguins across the islands.
Using this plot we can see that Gentoo penguins all live on Biscoe island and make up roughly 75% of the penguins on that island, Chinstrap all live on Dream island and make up roughly 50% of the penguins on that island, and Adelie live on all three islands and make up all of the penguins on Torgersen.

```{r}
#| fig-alt: >
#| Bar plots of penguin species by island (Biscoe, Dream, and Torgersen)
#| the bars are scaled to the same height, making it a relative frequencies
#| plot
ggplot(penguins, aes(x = island, fill = species)) +
geom_bar(position = "fill")
```
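
The proportions that `position = "fill"` displays can also be computed directly, which is a handy way to confirm what the bars are showing. This is only a sketch using base R's `table()` and `prop.table()`, again assuming `penguins` comes from the palmerpenguins package; `counts` and `props` are illustrative names.

```r
library(palmerpenguins)

# Frequencies of each species on each island (rows are islands).
counts <- table(penguins$island, penguins$species)

# Convert to within-island proportions: each row sums to 1, which is
# what the bars in the position = "fill" plot represent.
props <- prop.table(counts, margin = 1)
round(props, 2)
```

Each row of `props` corresponds to one bar in the relative frequency plot, and each column to one fill color.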