Code
%>%
penguins group_by(island, species) %>%
count() %>%
ggplot(aes(x = species, y = n)) +
geom_col() +
labs(title = "penguins") +
facet_wrap(~island)
Data visualization is a huge part of data storytelling, one of the core parts of being a data scientist. This is especially relevant to environmental science: you’re responsible for communicating not just about the environment, but what evidence (i.e. data) supports your claim. Therefore, it is crucial that environmental scientists communicate about their data clearly and effectively.
In this class, your plots will be assessed using three (very broad) criteria:
1. accuracy: is your plot accurately and truthfully representing the data?
2. clarity: is your plot clearly representing a pattern, relationship, message?
3. aesthetics: does your plot look good?
(adapted from Allison Horst)
ggplot
, but you should double check)In general, the simpler you can make a plot, the better.
Some examples of bad and better plots follow using data from the {palmerpenguins}
package.
Why is this bad?
- gap between bottom of bars and x-axis
- meaningless y axis
- gray background against gray bars and black text is hard to see
- gridlines don’t do much
penguins %>%
group_by(island, species) %>%
count() %>%
ggplot(aes(x = island, y = n)) +
# fill = fills in the shape, color = controls the outline
geom_col(fill = "darkgrey", color = "#000000") +
# expand takes away the gap at the bottom and at the top of the plot
# limits sets the limits of the axis
scale_y_continuous(expand = c(0, 0), limits = c(0, 130)) +
# change titles to be meaningful
labs(title = "Penguin counts differ across species and islands",
x = "Island",
y = "Penguin count") +
# one of the built-in themes in ggplot
theme_bw() +
theme(# changing text sizes
axis.text = element_text(size = 12),
axis.title = element_text(size = 14),
strip.text = element_text(size = 14),
# getting rid of gridlines
panel.grid = element_blank(),
# making the subplot titles (aka strips) have a transparent background
strip.background = element_blank(),
# making the plot title bigger and centering it
plot.title = element_text(size = 20, hjust = 0.5),
plot.title.position = "plot",
text = element_text(family = "Times")
) +
facet_wrap(~species)
Why is this better?
- text is bigger
- gridlines are gone
- easier to see columns agains background
- complete axes
- grayscale color scheme (good for printing out and paper reports in black and white)
One way to represent discrete (i.e. count) variables is to make a lollipop plot. This cuts down on visual clutter, as the bars in a bar chart are thick and take up a lot of space, but lollipop plots show the same thing but with smaller shapes.
penguins %>%
group_by(island, species) %>%
count() %>%
ggplot(aes(x = island,
y = n)) +
# fill = fills in the shape, color = controls the outline
geom_point(size = 3) +
geom_segment(aes(y = 0,
yend = n)) +
# expand takes away the gap at the bottom and at the top of the plot
# limits sets the limits of the axis
scale_y_continuous(expand = c(0, 0), limits = c(0, 130)) +
# change titles to be meaningful
labs(title = "Penguin counts differ across species and islands",
x = "Island",
y = "Penguin count") +
# one of the built-in themes in ggplot
theme_bw() +
theme(# changing text sizes
axis.text = element_text(size = 12),
axis.title = element_text(size = 14),
strip.text = element_text(size = 14),
# getting rid of gridlines
panel.grid = element_blank(),
# making the subplot titles (aka strips) have a transparent background
strip.background = element_blank(),
# making the plot title bigger and centering it
plot.title = element_text(size = 20, hjust = 0.5),
plot.title.position = "plot",
text = element_text(family = "Times")
) +
facet_wrap(~species)
Why is this bad?
- grey background, black dots
- hides some meaningful variation across species (for example, we know that Gentoo penguins tend to be bigger than Adelie and Chinstrap)
- axes are meaningless
- small text size
- points likely overlap, so some parts of the data are hidden
ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm, color = species, shape = species)) +
geom_point(size = 3, alpha = 0.7) +
# specify color scheme
scale_color_manual(values = c("darkorange", "cornflowerblue", "darkgreen")) +
# meaningful titles
labs(title = "Larger penguins tend to have longer flippers",
x = "Body mass (g)",
y = "Flipper length (mm)",
# have to specify color and shape separately (based on color and shape in aes() call)
color = "Penguin species", shape = "Penguin species") +
# another ggplot built-in theme
theme_classic() +
theme(# putting legend in plot area
legend.position = c(0.85, 0.2),
# legend text sizes
legend.text = element_text(size = 14),
legend.title = element_text(size = 14),
# text size, position, and font adjustment
axis.text = element_text(size = 14),
axis.title = element_text(size = 16),
plot.title = element_text(size = 18, hjust = 0.5),
plot.title.position = "plot",
text = element_text(family = "Garamond")
)
Why is this better?
- white background, no grid lines
- points are shaped and colored by species, so you can easily see the differences between groups
- transparency shows overlapping points
- complete axis labels
- text is larger and font is changed
- legend is in plot area (if there’s space to do this, generally good)
ggplot(data = penguins, aes(x = species, y = body_mass_g)) +
# fill the boxplot shape using the species column
# make the boxplots narrower
geom_boxplot(aes(fill = species), width = 0.2) +
# fill the violin shape using the species column: every species has a different color
# alpha argument: makes the violin shape more transparent (scale of 0 to 1)
geom_jitter(aes(color = species), alpha = 0.5) +
# specify the colors you want to use for each species
scale_color_manual(values = c("#F56A56", "#3D83F5", "#A9A20B")) +
scale_fill_manual(values = c("#F56A56", "#3D83F5", "#A9A20B")) +
# relabel the axis titles, plot title, and caption
labs(x = "Penguin species", y = "Body mass (g)",
title = "Gentoo penguins tend to be heavier than Adelie or Chinstrap",
caption = "Data source: {palmerpenguins}, \n Horst AM, Hill AP, Gorman KB.") +
# themes built in to ggplot
theme_bw() +
# other theme adjustments
theme(legend.position = "none",
axis.title = element_text(size = 13),
axis.text = element_text(size = 12),
plot.title = element_text(size = 14),
plot.caption = element_text(face = "italic"),
text = element_text(family = "Times New Roman"),
panel.grid = element_blank())
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_boxplot()`).
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
@online{bui,
author = {Bui, An},
title = {Finalizing Plots},
url = {https://an-bui.github.io/ES-193DS-W23/resources/finalizing-plots.html},
langid = {en}
}
---
title: "Finalizing plots"
description: "tips for making your plots readable and professional"
author:
- name: An Bui
url: https://an-bui.com/
affiliation: UC Santa Barbara, Ecology, Evolution, and Marine Biology
affiliation-url: https://www.eemb.ucsb.edu/
format:
html:
code-fold: true
freeze: auto
categories: [data visualization]
citation:
url: https://an-bui.github.io/ES-193DS-W23/resources/finalizing-plots.html
---
# Why do we care about how plots, figures, graphs, etc. look?
Data visualization is a _huge_ part of data storytelling, one of the core parts of being a data scientist. This is _especially_ relevant to environmental science: you're responsible for communicating not just about the environment, but what evidence (i.e. data) supports your claim. Therefore, it is crucial that environmental scientists communicate about their data clearly and effectively.
In this class, your plots will be assessed using three (very broad) criteria:
1. **accuracy**: is your plot accurately and _truthfully_ representing the data?
2. **clarity**: is your plot clearly representing a pattern, relationship, message?
3. **aesthetics**: does your plot look good?
# General rules for good-looking plots
_(adapted from Allison Horst)_
## Non-negotiable (if you are missing these, you will not get full credit for your plot)
- Axes must have **complete** labels with units (very few exceptions to this)
- for example: body_mass_g should be "Body mass (g)"
- If plotting regression or correlation lines, underlying data must be displayed on plot in addition to predicted lines
- In all other cases (for example, comparing means between groups), underlying data must be displayed on plot when possible
- Concise, descriptive title (if presented alone and not in a report)
## Additional points
- logical start and end values of x or y axes (these are usually by default in `ggplot`, but you should double check)
- if gridlines aren't useful to understand the data, remove them
- figure background should be white (easier to see points)
- text labels should be large enough to see/read clearly
- use color sparingly and be aware of color-blind friendly palettes
- use one font throughout a plot
- figure fonts should match text font (for example, don't use Arial in a figure when the rest of your text is in Times New Roman)
- make sure your plot size and aspect ratio renders correctly
In general, **the simpler you can make a plot, the better.**
# Bad/better examples
Some examples of bad and better plots follow using data from the `{palmerpenguins}` package.
```{r message = FALSE, echo = FALSE}
library(palmerpenguins)
library(tidyverse)
```
## Bar chart
```{r fig.width = 8, fig.height = 6}
penguins %>%
group_by(island, species) %>%
count() %>%
ggplot(aes(x = species, y = n)) +
geom_col() +
labs(title = "penguins") +
facet_wrap(~island)
```
**Why is this bad?**
- gap between bottom of bars and x-axis
- meaningless y axis
- gray background against gray bars and black text is hard to see
- gridlines don't do much
```{r fig.width = 8, fig.height = 6, warning = FALSE}
penguins %>%
group_by(island, species) %>%
count() %>%
ggplot(aes(x = island, y = n)) +
# fill = fills in the shape, color = controls the outline
geom_col(fill = "darkgrey", color = "#000000") +
# expand takes away the gap at the bottom and at the top of the plot
# limits sets the limits of the axis
scale_y_continuous(expand = c(0, 0), limits = c(0, 130)) +
# change titles to be meaningful
labs(title = "Penguin counts differ across species and islands",
x = "Island",
y = "Penguin count") +
# one of the built-in themes in ggplot
theme_bw() +
theme(# changing text sizes
axis.text = element_text(size = 12),
axis.title = element_text(size = 14),
strip.text = element_text(size = 14),
# getting rid of gridlines
panel.grid = element_blank(),
# making the subplot titles (aka strips) have a transparent background
strip.background = element_blank(),
# making the plot title bigger and centering it
plot.title = element_text(size = 20, hjust = 0.5),
plot.title.position = "plot",
text = element_text(family = "Times")
) +
facet_wrap(~species)
```
**Why is this better?**
- text is bigger
- gridlines are gone
- easier to see columns agains background
- complete axes
- grayscale color scheme (good for printing out and paper reports in black and white)
## Bonus plot: lollipop plot
One way to represent discrete (i.e. count) variables is to make a lollipop plot. This cuts down on visual clutter, as the bars in a bar chart are thick and take up a lot of space, but lollipop plots show the same thing but with smaller shapes.
```{r}
penguins %>%
group_by(island, species) %>%
count() %>%
ggplot(aes(x = island,
y = n)) +
# fill = fills in the shape, color = controls the outline
geom_point(size = 3) +
geom_segment(aes(y = 0,
yend = n)) +
# expand takes away the gap at the bottom and at the top of the plot
# limits sets the limits of the axis
scale_y_continuous(expand = c(0, 0), limits = c(0, 130)) +
# change titles to be meaningful
labs(title = "Penguin counts differ across species and islands",
x = "Island",
y = "Penguin count") +
# one of the built-in themes in ggplot
theme_bw() +
theme(# changing text sizes
axis.text = element_text(size = 12),
axis.title = element_text(size = 14),
strip.text = element_text(size = 14),
# getting rid of gridlines
panel.grid = element_blank(),
# making the subplot titles (aka strips) have a transparent background
strip.background = element_blank(),
# making the plot title bigger and centering it
plot.title = element_text(size = 20, hjust = 0.5),
plot.title.position = "plot",
text = element_text(family = "Times")
) +
facet_wrap(~species)
```
## Scatterplot
```{r fig.width = 8, fig.height = 6, warning = FALSE}
ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm)) +
geom_point()
```
**Why is this bad?**
- grey background, black dots
- hides some meaningful variation across species (for example, we know that Gentoo penguins tend to be bigger than Adelie and Chinstrap)
- axes are meaningless
- small text size
- points likely overlap, so some parts of the data are hidden
```{r fig.width = 8, fig.height = 6, warning = FALSE}
ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm, color = species, shape = species)) +
geom_point(size = 3, alpha = 0.7) +
# specify color scheme
scale_color_manual(values = c("darkorange", "cornflowerblue", "darkgreen")) +
# meaningful titles
labs(title = "Larger penguins tend to have longer flippers",
x = "Body mass (g)",
y = "Flipper length (mm)",
# have to specify color and shape separately (based on color and shape in aes() call)
color = "Penguin species", shape = "Penguin species") +
# another ggplot built-in theme
theme_classic() +
theme(# putting legend in plot area
legend.position = c(0.85, 0.2),
# legend text sizes
legend.text = element_text(size = 14),
legend.title = element_text(size = 14),
# text size, position, and font adjustment
axis.text = element_text(size = 14),
axis.title = element_text(size = 16),
plot.title = element_text(size = 18, hjust = 0.5),
plot.title.position = "plot",
text = element_text(family = "Garamond")
)
```
**Why is this better?**
- white background, no grid lines
- points are shaped and colored by species, so you can easily see the differences between groups
- transparency shows overlapping points
- complete axis labels
- text is larger and font is changed
- legend is in plot area (if there's space to do this, generally good)
# Other plots
## boxplot and jitter
```{r}
ggplot(data = penguins, aes(x = species, y = body_mass_g)) +
# fill the boxplot shape using the species column
# make the boxplots narrower
geom_boxplot(aes(fill = species), width = 0.2) +
# fill the violin shape using the species column: every species has a different color
# alpha argument: makes the violin shape more transparent (scale of 0 to 1)
geom_jitter(aes(color = species), alpha = 0.5) +
# specify the colors you want to use for each species
scale_color_manual(values = c("#F56A56", "#3D83F5", "#A9A20B")) +
scale_fill_manual(values = c("#F56A56", "#3D83F5", "#A9A20B")) +
# relabel the axis titles, plot title, and caption
labs(x = "Penguin species", y = "Body mass (g)",
title = "Gentoo penguins tend to be heavier than Adelie or Chinstrap",
caption = "Data source: {palmerpenguins}, \n Horst AM, Hill AP, Gorman KB.") +
# themes built in to ggplot
theme_bw() +
# other theme adjustments
theme(legend.position = "none",
axis.title = element_text(size = 13),
axis.text = element_text(size = 12),
plot.title = element_text(size = 14),
plot.caption = element_text(face = "italic"),
text = element_text(family = "Times New Roman"),
panel.grid = element_blank())
```