Choose your own assignment - Advanced data visualization

Due date

June 5, 2024

Modified

June 13, 2024

Due on Wednesday June 5 (week 10) at 11:59 PM

Description

Data visualization is a huge part of data storytelling. It also happens to be one of the major topics within the R user community. In this assignment, you’ll dissect the making of a data visualization and create one of your own using data from your personal data project. You’ll learn about how people share their code (namely using GitHub) and create data art. You’ll then create your own piece, starting from the beginning (a sketch on a piece of paper) to a finished product (a visualization made in R). In the process, you’ll learn how to use ggplot and related packages to create visualizations. All components of this assignment should be written and rendered/knitted using Quarto or RMarkdown.

What is #tidytuesday? #tidytuesday is a hashtag/movement for R user community members to come together and learn more about how to use R tools to visualize and tell stories about data. Each week, organizers post a data set that has been cleaned up and is ready for visualization. We’ve worked with Tidy Tuesday data in class before: fisheries, ramen, and UFOs.

Why care about visual storytelling? Asking an audience to read a paragraph of text is hard, but asking people to look at an infographic is way easier. You can draw people in with a well-presented, aesthetically pleasing figure, and then present some information that they might be interested in.

Components

Part 1. Understand the context

a. Learn about tidy tuesday.

  • Read about tidytuesday from the rfordatascience organizers’ repo (link)
  • Scroll through the #tidytuesday hashtag on Twitter to see what visualizations people are making with the weekly data sets. (link)

b. Dissect someone else’s visualization

Choose three visualizations you really like. In 8-10 sentences total, summarize

  • The packages used and what they do
  • Any cleaning/summarizing steps
  • The structure of the final data frame
  • The geoms used and how they show up in the final visualization (for example, geom_line() in the script creates which lines in the output?)
    Finally, include
  • A screenshot of the visualization and a link to the creator’s GitHub code
Tip

The best way to understand someone’s code, especially if they’ve put it on GitHub, is to run it yourself. You can do this by forking and cloning the repository to your machine (as we did in workshop), or by copy/pasting the script into your computer. If you choose the second option, you may have to do some adjustment to file paths to get the script to read in the data correctly.

Part 2. Make your own visualization using your data.

a. Update the observations in your data entry.

b. Decide on a type of visualization that best fits the data set.

One good option is From Data to Viz by Yan Holtz, which lists types of visualizations based on data types.

c. Sketch your visualization on paper.

Use colored pencils, highlighters, markers, etc. to represent the colors, axes, and other components of your visualization.

Incorporate elements you liked from the visualizations you dissected in part 1.

d. Code up your sketch!

Annotate your code!

e. Write about your process

In 8-10 sentences total, describe

  • Your coding process (for example (not the only options for this prompt): did you have to clean up/summarize the data before you started? Which geoms did you start with?)
  • What elements from the part 1 visualizations you incorporated into your visualization and why
  • How you designed the visualization (for example (again, not the only options): how did you choose the colors, shapes, fonts, etc., and why?)
  • What message does your visualization convey - basically, what is the main take away of the visualization you created?

Checklist

Your submission should include

  • A link to the GitHub repository where your materials for this specific assignment are (make sure it is public!)
  • A link to a rendered HTML document with a table of contents showing where each part is
Note

Note: you can do this by making your GitHub repository connect to GitHub pages. The guide to doing this is here.

Your rendered HTML document should include

  • your name, the title, and the date
  • written responses to part 1 for all three visualizations you select
  • a photo of your sketch in part 2c
  • annotated code and output for part 2d
  • written response to part 2e

Your GitHub repository should include

  • an informative README
  • code and data organized into their own folders (thus, your code has to incorporate the here package in some capacity - see workshop 7)