tips for making your plots readable and professional
data visualization

Why do we care about how plots, figures, graphs, etc. look?

Data visualization is a huge part of data storytelling, which is one of the core parts of being a data scientist. This is especially relevant to environmental science: you’re responsible for communicating not just about the environment, but also about the evidence (i.e., data) that supports your claims. It is therefore crucial that environmental scientists communicate about their data clearly and effectively.

In this class, your plots will be assessed using three (very broad) criteria:
1. accuracy: is your plot accurately and truthfully representing the data?
2. clarity: is your plot clearly representing a pattern, relationship, or message?
3. aesthetics: does your plot look good?

General rules for good-looking plots

(adapted from Allison Horst)

Non-negotiable (if you are missing these, you will not get full credit for your plot)

  • Axes must have complete labels with units (very few exceptions to this)
    • for example: body_mass_g should be “Body mass (g)”
  • If plotting regression or correlation lines, underlying data must be displayed on plot in addition to predicted lines
  • In all other cases (for example, comparing means between groups), underlying data must be displayed on plot when possible
  • Concise, descriptive title (if presented alone and not in a report)
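
As a rough illustration of the rules above, the sketch below draws a regression line on top of the underlying observations and labels both axes with units. It assumes the {ggplot2} and {palmerpenguins} packages are installed; the object name `regression_plot` and the choice of a simple linear model are only for illustration.

```r
library(ggplot2)
library(palmerpenguins)

regression_plot <- ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm)) +
  # underlying data: one point per penguin (rows with missing values are dropped quietly)
  geom_point(alpha = 0.5, na.rm = TRUE) +
  # predicted line from a simple linear regression, with its confidence band
  geom_smooth(method = "lm", formula = y ~ x, na.rm = TRUE) +
  # complete axis labels with units, plus a concise title
  labs(title = "Larger penguins tend to have longer flippers",
       x = "Body mass (g)",
       y = "Flipper length (mm)")

regression_plot
```

The points go first so the fitted line is drawn on top of them and stays visible.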

Additional points

  • logical start and end values of x or y axes (ggplot usually chooses sensible values by default, but you should double check)
  • if gridlines aren’t useful to understand the data, remove them
  • figure background should be white (easier to see points)
  • text labels should be large enough to see/read clearly
  • use color sparingly and be aware of color-blind friendly palettes
  • use one font throughout a plot
  • figure fonts should match text font (for example, don’t use Arial in a figure when the rest of your text is in Times New Roman)
  • make sure your plot size and aspect ratio render correctly
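
Two of the points above, color-blind friendly palettes and plot size, can be sketched in a few lines. This is one possible approach, not the only one: it assumes R 4.0 or later (for the built-in Okabe-Ito palette), and the object names, file name, and dimensions are placeholders chosen for illustration.

```r
library(ggplot2)
library(palmerpenguins)

# Okabe-Ito is a color-blind friendly palette that ships with R (>= 4.0).
# Skip the first color (black) and drop the names so that ggplot
# assigns the colors to species in order.
okabe_ito <- unname(palette.colors(palette = "Okabe-Ito")[2:4])

palette_plot <- ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm, color = species)) +
  geom_point(size = 2, na.rm = TRUE) +
  scale_color_manual(values = okabe_ito) +
  labs(x = "Body mass (g)", y = "Flipper length (mm)", color = "Penguin species") +
  # white background, no gridlines, and a larger base text size
  theme_classic(base_size = 14)

# save with explicit dimensions so the size and aspect ratio are reproducible
# (hypothetical file name, written to a temporary folder)
out_file <- file.path(tempdir(), "penguin-scatter.png")
ggsave(out_file, plot = palette_plot, width = 6, height = 4, units = "in", dpi = 300)
```

Setting `width` and `height` in `ggsave()` means the saved figure does not depend on the size of the plot pane at the time of saving.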

In general, the simpler you can make a plot, the better.

Bad/better examples

Some examples of bad and better plots follow using data from the {palmerpenguins} package.
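
The code in the examples uses `%>%`, `group_by()`, `ggplot()`, and the `penguins` data frame without loading anything first. A minimal setup, assuming the {tidyverse} and {palmerpenguins} packages are already installed, might look like this:

```r
# packages assumed by the examples on this page
library(tidyverse)       # attaches ggplot2, dplyr, and the %>% pipe
library(palmerpenguins)  # provides the `penguins` data frame

# quick look at the columns used in the plots
glimpse(penguins)
```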

Bar chart

penguins %>% 
  group_by(island, species) %>% 
  count() %>% 
  ggplot(aes(x = species, y = n)) +
  geom_col() +
  labs(title = "penguins") +
  facet_wrap(~island)

Why is this bad?
- gap between bottom of bars and x-axis
- meaningless y-axis label (`n`) and uninformative title
- gray background against gray bars and black text is hard to see
- gridlines don’t do much

penguins %>% 
  group_by(island, species) %>% 
  count() %>% 
  ggplot(aes(x = island, y = n)) +
  # fill = fills in the shape, color = controls the outline
  geom_col(fill = "darkgrey", color = "#000000") +
  # expand takes away the gap at the bottom and at the top of the plot
  # limits sets the limits of the axis
  scale_y_continuous(expand = c(0, 0), limits = c(0, 130)) +
  # change titles to be meaningful
  labs(title = "Penguin counts differ across species and islands",
       x = "Island", 
       y = "Penguin count") +
  # one of the built-in themes in ggplot
  theme_bw() +
  theme(# changing text sizes
        axis.text = element_text(size = 12),
        axis.title = element_text(size = 14),
        strip.text = element_text(size = 14),
        # getting rid of gridlines
        panel.grid = element_blank(),
        # making the subplot titles (aka strips) have a transparent background
        strip.background = element_blank(),
        # making the plot title bigger and centering it
        plot.title = element_text(size = 20, hjust = 0.5),
        plot.title.position = "plot",
        text = element_text(family = "Times")
        ) +
  facet_wrap(~species)

Why is this better?
- text is bigger
- gridlines are gone
- easier to see columns against background
- complete axis labels
- grayscale color scheme (good for printing out and paper reports in black and white)

Bonus plot: lollipop plot

One way to represent counts (a discrete variable) is a lollipop plot. Bars in a bar chart are thick and take up a lot of space; a lollipop plot shows the same values with a thin line and a point, which cuts down on visual clutter.

penguins %>% 
  group_by(island, species) %>% 
  count() %>% 
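  ggplot(aes(x = island, 
             y = n)) +
  # the "stick": a line segment from 0 up to the count
  # (xend is given explicitly because some ggplot2 versions require it)
  geom_segment(aes(xend = island,
                   y = 0,
                   yend = n)) +
  # the "candy": a point at the count, drawn after the segment so it sits on top
  geom_point(size = 3) +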
  # expand takes away the gap at the bottom and at the top of the plot
  # limits sets the limits of the axis
  scale_y_continuous(expand = c(0, 0), limits = c(0, 130)) +
  # change titles to be meaningful
  labs(title = "Penguin counts differ across species and islands",
       x = "Island", 
       y = "Penguin count") +
  # one of the built-in themes in ggplot
  theme_bw() +
  theme(# changing text sizes
        axis.text = element_text(size = 12),
        axis.title = element_text(size = 14),
        strip.text = element_text(size = 14),
        # getting rid of gridlines
        panel.grid = element_blank(),
        # making the subplot titles (aka strips) have a transparent background
        strip.background = element_blank(),
        # making the plot title bigger and centering it
        plot.title = element_text(size = 20, hjust = 0.5),
        plot.title.position = "plot",
        text = element_text(family = "Times")
        ) +
  facet_wrap(~species)
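
The bar chart and the lollipop plot repeat the same `theme()` adjustments. One way to avoid the duplication, sketched here under the assumption that {tidyverse} and {palmerpenguins} are installed, is to store the theme in an object and add it to each plot. The names `theme_penguin_counts`, `penguin_counts`, and `lollipop` are made up for this example.

```r
library(tidyverse)
library(palmerpenguins)

# hypothetical name: a theme object collecting the adjustments used above
theme_penguin_counts <- theme_bw() +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14),
        strip.text = element_text(size = 14),
        panel.grid = element_blank(),
        strip.background = element_blank(),
        plot.title = element_text(size = 20, hjust = 0.5),
        plot.title.position = "plot",
        text = element_text(family = "Times"))

# count() with column names is shorthand for group_by() followed by count()
penguin_counts <- penguins %>% 
  count(island, species)

lollipop <- ggplot(penguin_counts, aes(x = island, y = n)) +
  geom_segment(aes(xend = island, y = 0, yend = n)) +
  geom_point(size = 3) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 130)) +
  labs(title = "Penguin counts differ across species and islands",
       x = "Island", 
       y = "Penguin count") +
  facet_wrap(~species) +
  # the stored theme is added like any other plot component
  theme_penguin_counts

lollipop
```

Any later change to text sizes or fonts then only needs to be made in one place.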

Scatterplot

ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm)) +
  geom_point()

Why is this bad?
- grey background, black dots
- hides some meaningful variation across species (for example, we know that Gentoo penguins tend to be bigger than Adelie and Chinstrap)
- axis labels are raw column names (`body_mass_g`, `flipper_length_mm`) instead of complete labels with units
- small text size
- points likely overlap, so some parts of the data are hidden

ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm, color = species, shape = species)) +
  geom_point(size = 3, alpha = 0.7) +
  # specify color scheme
  scale_color_manual(values = c("darkorange", "cornflowerblue", "darkgreen")) +
  # meaningful titles
  labs(title = "Larger penguins tend to have longer flippers",
       x = "Body mass (g)", 
       y = "Flipper length (mm)",
       # have to specify color and shape separately (based on color and shape in aes() call)
       color = "Penguin species", shape = "Penguin species") +
  # another ggplot built-in theme
  theme_classic() +
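  theme(# putting legend in plot area
        # (ggplot2 >= 3.5.0; earlier versions use legend.position = c(0.85, 0.2))
        legend.position = "inside",
        legend.position.inside = c(0.85, 0.2),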
        # legend text sizes
        legend.text = element_text(size = 14),
        legend.title = element_text(size = 14),
        # text size, position, and font adjustment
        axis.text = element_text(size = 14), 
        axis.title = element_text(size = 16),
        plot.title = element_text(size = 18, hjust = 0.5),
        plot.title.position = "plot",
        text = element_text(family = "Garamond")
    )

Why is this better?
- white background, no grid lines
- points are shaped and colored by species, so you can easily see the differences between groups
- transparency shows overlapping points
- complete axis labels
- text is larger and font is changed
- legend is in plot area (if there’s space to do this, generally good)

Other plots

boxplot and jitter

ggplot(data = penguins, aes(x = species, y = body_mass_g)) +
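  # fill the boxplot shape using the species column
  # make the boxplots narrower and hide the outlier points,
  # since geom_jitter() below already draws every observation
  geom_boxplot(aes(fill = species), width = 0.2, outlier.shape = NA) +
  # color the points using the species column: every species has a different color
  # alpha argument: makes the points more transparent (scale of 0 to 1)
  # height = 0: jitter horizontally only, so body mass values are not shifted
  geom_jitter(aes(color = species), alpha = 0.5, height = 0) +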
  # specify the colors you want to use for each species
  scale_color_manual(values = c("#F56A56", "#3D83F5", "#A9A20B")) +
  scale_fill_manual(values = c("#F56A56", "#3D83F5", "#A9A20B")) +
  # relabel the axis titles, plot title, and caption
  labs(x = "Penguin species", y = "Body mass (g)",
       title = "Gentoo penguins tend to be heavier than Adelie or Chinstrap",
       caption = "Data source: {palmerpenguins}, \n Horst AM, Hill AP, Gorman KB.") +
  # themes built in to ggplot
  theme_bw() +
  # other theme adjustments
  theme(legend.position = "none", 
        axis.title = element_text(size = 13),
        axis.text = element_text(size = 12),
        plot.title = element_text(size = 14),
        plot.caption = element_text(face = "italic"),
        text = element_text(family = "Times New Roman"),
        panel.grid = element_blank())
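
Running this code produces two warnings, because 2 rows have no recorded body mass and ggplot drops them from both layers:

Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_boxplot()`).
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).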
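
If you would rather handle the missing values yourself than rely on ggplot dropping them, one option is to remove those rows before plotting. The sketch below assumes {tidyverse} and {palmerpenguins} are installed and uses a pared-down version of the boxplot; `penguins_complete`, `n_dropped`, and `mass_plot` are names invented for this example.

```r
library(tidyverse)
library(palmerpenguins)

# keep only penguins with a recorded body mass
penguins_complete <- penguins %>% 
  drop_na(body_mass_g)

# how many rows were dropped? (matches the 2 rows named in the warnings)
n_dropped <- nrow(penguins) - nrow(penguins_complete)
n_dropped

# pared-down boxplot and jitter, now built from complete rows only
mass_plot <- ggplot(data = penguins_complete, aes(x = species, y = body_mass_g)) +
  geom_boxplot(aes(fill = species), width = 0.2) +
  geom_jitter(aes(color = species), alpha = 0.5) +
  labs(x = "Penguin species", y = "Body mass (g)") +
  theme_bw() +
  theme(legend.position = "none")

mass_plot
```

Dropping the rows explicitly makes it clear in your code (and to your reader) how many observations were excluded and why.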

Citation

BibTeX citation:
@online{bui,
  author = {Bui, An},
  title = {Finalizing Plots},
  url = {https://an-bui.github.io/ES-193DS-W23/resources/finalizing-plots.html},
  langid = {en}
}
For attribution, please cite this work as:
Bui, An. n.d. “Finalizing Plots.” https://an-bui.github.io/ES-193DS-W23/resources/finalizing-plots.html.