Due on Wednesday June 5 (week 10) at 11:59 PM
Description
Data visualization is a huge part of data storytelling. It also happens to be one of the major topics within the R user community. In this assignment, you’ll dissect the making of a data visualization and create one of your own using data from your personal data project. You’ll learn about how people share their code (namely using GitHub) and create data art. You’ll then create your own piece, starting from the beginning (a sketch on a piece of paper) to a finished product (a visualization made in R). In the process, you’ll learn how to use ggplot and related packages to create visualizations. All components of this assignment should be written and rendered/knitted using Quarto or RMarkdown.
What is #tidytuesday? #tidytuesday is a hashtag/movement for R user community members to come together and learn more about how to use R tools to visualize and tell stories about data. Each week, organizers post a data set that has been cleaned up and is ready for visualization. We’ve worked with Tidy Tuesday data in class before: fisheries, ramen, and UFOs.
Why care about visual storytelling? Asking an audience to read a paragraph of text is hard, but asking people to look at an infographic is way easier. You can draw people in with a well-presented, aesthetically pleasing figure, and then present some information that they might be interested in.
Components
Part 1. Understand the context
a. Learn about tidy tuesday.
b. Dissect someone else’s visualization
- For context, read Sam Csik’s “One workflow for building effective (and pretty) {ggplot2} data visualizations.”
- Look for a Tidy Tuesday visualization you like with public code in the #tidytuesday hashtag. This usually comes in the form of a link to GitHub, where you’ll find a script (with a .R suffix) with all the code to create the visualization.
- alternatively, you can look at people’s GitHub repositories for tidy tuesday. Some examples include:
- Ijeamaka Anyene
- Georgios Karamanis
- Nicola Rennie (who also has an amazing app where she has displayed all her visualizations - see this here)
- Ijeamaka Anyene
- alternatively, you can look at people’s GitHub repositories for tidy tuesday. Some examples include:
Choose three visualizations you really like. In 8-10 sentences total, summarize
- The packages used and what they do
- Any cleaning/summarizing steps
- The structure of the final data frame
- The geoms used and how they show up in the final visualization (for example,
geom_line()
in the script creates which lines in the output?)
Finally, include
- A screenshot of the visualization and a link to the creator’s GitHub code
The best way to understand someone’s code, especially if they’ve put it on GitHub, is to run it yourself. You can do this by forking and cloning the repository to your machine (as we did in workshop), or by copy/pasting the script into your computer. If you choose the second option, you may have to do some adjustment to file paths to get the script to read in the data correctly.
Part 2. Make your own visualization using your data.
a. Update the observations in your data entry.
b. Decide on a type of visualization that best fits the data set.
One good option is From Data to Viz by Yan Holtz, which lists types of visualizations based on data types.
c. Sketch your visualization on paper.
Use colored pencils, highlighters, markers, etc. to represent the colors, axes, and other components of your visualization.
Incorporate elements you liked from the visualizations you dissected in part 1.
d. Code up your sketch!
Annotate your code!
e. Write about your process
In 8-10 sentences total, describe
- Your coding process (for example (not the only options for this prompt): did you have to clean up/summarize the data before you started? Which geoms did you start with?)
- What elements from the part 1 visualizations you incorporated into your visualization and why
- How you designed the visualization (for example (again, not the only options): how did you choose the colors, shapes, fonts, etc., and why?)
- What message does your visualization convey - basically, what is the main take away of the visualization you created?
Checklist
Your submission should include
- A link to the GitHub repository where your materials for this specific assignment are (make sure it is public!)
- A link to a rendered HTML document with a table of contents showing where each part is
Note: you can do this by making your GitHub repository connect to GitHub pages. The guide to doing this is here.
Your rendered HTML document should include
- your name, the title, and the date
- written responses to part 1 for all three visualizations you select
- a photo of your sketch in part 2c
- annotated code and output for part 2d
- written response to part 2e
Your GitHub repository should include
- an informative README
- code and data organized into their own folders (thus, your code has to incorporate the
here
package in some capacity - see workshop 7)