Spring 2024 - Homework 2

Due on Wednesday April 24 (Week 4) at 11:59 PM

Read the instructions carefully and double check that you have everything on the checklist.

Part 1. Tasks

You do not need to submit anything for these tasks, but are expected to have completed and know the material.

Task 1. Beginning steps to configure Git/GitHub: Part 2

Configure git and store your personal access token.

Note

Note: make sure git is installed. You did this last week for one of your homework tasks, but if it wasn’t installed and you didn’t install it, do it now!

Task 2. Read Lowndes and Horst 2020, “Tidy data for efficiency, reproducibility, and collaboration.”

Task 3. Read Horst and Lowndes 2022, “GitHub for supporting, reusing, contributing, and failing safely.”

Task 4. Install R packages

`lterdatasampler`

Read about the package here.

Part 2. Problems

Tip

Remember to set up a directory specifically for this homework assignment!

Problem 1. Raptor abundance between restoration plots (10 points)

Managers at Coal Oil Point Reserve (COPR) are interested in the relationship between restored areas and raptor abundance: as different parts of the reserve are restored to native grassland, small mammal (i.e. mice, gopher, vole) abundances should increase, which should attract more raptors (i.e. hawks, falcons) to the reserve. You conduct weekly surveys for raptors at COPR from April to June in one of the restoration areas and collect the following data on the number of raptors you see for each survey:

\[ 0, 2, 4, 6, 1, 2, 3, 5, 1, 0 \]

What kind of data did you collect, and why? Explain in 1-2 sentences. (2 points)
What is a better description of the variability in raptor count, standard deviation or standard error? Explain why in 1-2 sentences and calculate the metric of your choice, showing your work. (4 points)
What is a better description of the uncertainty in raptor count, standard deviation or standard error? Explain why in 1-2 sentences and calculate the metric of your choice, showing your work. (4 points)

Problem 2. Sugar maple stem masses (38 points)

Using the hbr_maples data set from lterdatasampler, answer the question: Does mean sugar maple stem mass in 2003 differ between reference and calcium-treated watersheds?

Working with data in a package

If you load in the package using library(lterdatasampler), the hbr_maples data set will be ready for you to use, but you just won’t see it in your environment. To have the data set pop up in your environment, use data(hbr_maples).

For any calculations for which you need a confidence level, use 95% with the corresponding significance level.

In one sentence each, write your null and alternative hypotheses to address this question. (2 points)
Make a QQ plot. In one sentence, describe whether the variable is normally distributed or not. (4 points)
Check variances. In one sentence, describe whether the groups have equal variances or not. (4 points)
Calculate the critical value for your test. (4 points)
Do a t-test. (4 points)

t-test arguments

Double check your arguments to make sure you’re running the right test.

Calculate an effect size and show the output. (4 points)

Calculating an effect size

You can either calculate Cohen’s d by using the formula presented in lecture:

\[ Cohen's \; d = \frac{\bar{y_A} - \bar{y_B}}{\sqrt{\frac{(n_A - 1)\times s^2_A + (n_B - 1)\times s^2_B}{n_A + n_B + 2}}} \]

or you could use a function in the package effectsize called cohens_d().

Make a figure with mean and confidence intervals showing the raw data underneath the summary information. Make sure that your plot is finalized. For full credit on your plot, adjust the ggplot()/geom_() defaults such that your plot has:
1. Different colors for each watershed
2. Different shapes for each watershed
3. A different font than the default
4. No gridlines
  (6 points)
Write a “methods” section. In one sentence each, describe:
1. Why the test you chose may have been appropriate for testing your hypotheses in part a
2. How you evaluated normality and homogeneity of variance
  (4 points)
Write a “results” section. Describe the results in full sentences in your own words, making sure to include the:
1. Test you ran
2. Number of observations for each watershed
3. Significance level
4. Degrees of freedom
5. Test statistic
6. p-value
7. Confidence interval in the difference in mean stem masses between watershed
8. Effect size
  (6 points)

Problem 3. Personal data (12 points)

By now, you have some observations on your data sheet for your personal data. Even though it’s early on in your data collection, it’s a good idea to practice good data management. For this problem, you’ll enter your data, read it into R, and create a visualization. If you get stuck at any step, you’ll know there’s something you need to fix.

Create a spreadsheet to enter your data. Two good options include Google Sheets or Microsoft Excel.
Create the columns of your spreadsheet.

Spreadsheet columns

If you organized your data sheet in “tidy format” (i.e. each row is an observation), then the columns of your spreadsheet would be exactly the same as the columns in your data sheet.

Enter your data.
Save your spreadsheet as a .csv file in your homework repository.
Read your data into R.
Create a visualization to explore your data. Finalize the plot. (4 points)
In 1-2 sentences, describe what the “main message” of your visualization is. (4 points)
In 2-4 sentences, describe the process of getting your data from your spreadsheet into R. Did you encounter any challenges? If so, why do you think those challenges arose, and how did you fix them? If not, why do you think your system for collecting your data worked? (4 points)

Changing your data collection scheme

If you found that entering your data from your spreadsheet and getting it into the right format to be used in R was challenging, that’s ok! This happens a lot with data collection. Feel free to change your data sheet so that you’re collecting data in a way that makes your life easier as you’re reading your data into R and using it.

Problem 4. Statistical critique (10 points)

Check the Google sheet and choose a paper to use for your critique based on An’s/Caitlin’s recommendations. Answer the following questions about the paper in 1-2 sentences each:

Why were you interested in this paper? (2 points)
What questions/hypotheses are the authors addressing? (2 points)
What statistical tests are the authors doing? Is there anything confusing about what they did? (2 points)
Find the figures and/or tables in the paper that is associated with the statistical tests from question (c). If you have a figure, what are the x- and y-axes, and what does the figure show (i.e. what is the main message of the figure)? If you have a table, what are the rows and columns, and what is it supposed to show? (2 points)
Take a screenshot of the figures and/or tables and insert the screenshot into your homework document. (2 points)

Checklist

Your homework should

Include your name, the title (“Homework 2”), and the date you turned in the assignment (3 points)
Include for Problem 1:
- responses for a-c
- full work (hand written or R code) for parts b-c
Include for Problem 2:
- written response for part a
- full work (R code), output, and written response for parts b-c
- full work (R code) and output for parts d-f
- full work (R code) and output for part g
- written response for parts h-i
Include for Problem 3:
- full work (R code) and output for part f
- written responses for parts g-h
Include for Problem 4:
- written responses for parts a-d
- figure for part e
be uploaded to Canvas as a single PDF (1 point)
be organized and readable (10 points)

84 points total

--- title: "Homework 2" editor: source freeze: auto published-title: "Due date" date: 2024-04-24 date-modified: last-modified --- [Due on Wednesday April 24 (Week 4) at 11:59 PM]{style="color: #79ACBD; font-size: 24px;"} Read the instructions carefully and double check that you have everything on the checklist. ## Part 1. Tasks You do not need to submit anything for these tasks, but are expected to have completed and know the material. ### Task 1. Beginning steps to configure Git/GitHub: Part 2 a. Configure git and store your personal access token. - [Instructions for MacOS](https://ucsb-meds.github.io/meds-install-mac.html#configure-git) - [Instructions for Windows](https://ucsb-meds.github.io/meds-install-windows.html#configure-git) :::{.callout-note} Note: make sure git is installed. You did this last week for one of your homework tasks, but if it wasn’t installed and you didn’t install it, do it now! ::: ### Task 2. Read Lowndes and Horst 2020, [“Tidy data for efficiency, reproducibility, and collaboration.”](https://www.openscapes.org/blog/2020/10/12/tidy-data/) ### Task 3. Read Horst and Lowndes 2022, [“GitHub for supporting, reusing, contributing, and failing safely.”](https://www.openscapes.org/blog/2022/05/27/github-illustrated-series/) ### Task 4. Install R packages #### `lterdatasampler` Read about the package [here](https://lter.github.io/lterdatasampler/index.html). ## Part 2. Problems :::{.callout-tip} **Remember to set up a directory specifically for this homework assignment!** ::: ### Problem 1. Raptor abundance between restoration plots (10 points) Managers at Coal Oil Point Reserve (COPR) are interested in the relationship between restored areas and raptor abundance: as different parts of the reserve are restored to native grassland, small mammal (i.e. mice, gopher, vole) abundances should increase, which should attract more raptors (i.e. hawks, falcons) to the reserve. You conduct weekly surveys for raptors at COPR from April to June in one of the restoration areas and collect the following data on the number of raptors you see for each survey: $$ 0, 2, 4, 6, 1, 2, 3, 5, 1, 0 $$ a. What kind of data did you collect, and why? Explain in 1-2 sentences. **(2 points)** b. What is a better description of the variability in raptor count, standard deviation or standard error? Explain why in 1-2 sentences and calculate the metric of your choice, showing your work. **(4 points)** c. What is a better description of the uncertainty in raptor count, standard deviation or standard error? Explain why in 1-2 sentences and calculate the metric of your choice, showing your work. **(4 points)** ### Problem 2. Sugar maple stem masses (38 points) Using the `hbr_maples` data set from `lterdatasampler`, answer the question: **Does mean sugar maple stem mass in 2003 differ between reference and calcium-treated watersheds?** :::{.callout-note title="Working with data in a package"} If you load in the package using `library(lterdatasampler)`, the `hbr_maples` data set will be ready for you to use, but you just won't see it in your environment. To have the data set pop up in your environment, use `data(hbr_maples)`. ::: For any calculations for which you need a confidence level, use 95% with the corresponding significance level. a. In one sentence each, write your null and alternative hypotheses to address this question. **(2 points)** b. Make a QQ plot. In one sentence, describe whether the variable is normally distributed or not. **(4 points)** c. Check variances. In one sentence, describe whether the groups have equal variances or not. **(4 points)** d. Calculate the critical value for your test. **(4 points)** e. Do a t-test. **(4 points)** :::{.callout-tip collapse=true} # t-test arguments Double check your arguments to make sure you’re running the right test. ::: f. Calculate an effect size and show the output. **(4 points)** :::{.callout-tip collapse=true} #### Calculating an effect size You can either calculate Cohen's d by using the formula presented in lecture: $$ Cohen's \; d = \frac{\bar{y_A} - \bar{y_B}}{\sqrt{\frac{(n_A - 1)\times s^2_A + (n_B - 1)\times s^2_B}{n_A + n_B + 2}}} $$ _or_ you could use a function in the package `effectsize` called `cohens_d()`. ::: g. Make a figure with mean and confidence intervals showing the raw data underneath the summary information. Make sure that your plot is [finalized](https://spring-2024.envs-193ds.com/resources/finalizing-plots). For full credit on your plot, adjust the `ggplot()`/`geom_()` defaults such that your plot has: i. Different colors for each watershed ii. Different shapes for each watershed iii. A different font than the default iv. No gridlines **(6 points)** h. Write a “methods” section. In one sentence each, describe: i. Why the test you chose may have been appropriate for testing your hypotheses in part a ii. How you evaluated normality and homogeneity of variance **(4 points)** i. Write a “results” section. Describe the results in full sentences in your own words, making sure to include the: i. Test you ran ii. Number of observations for each watershed iii. Significance level iv. Degrees of freedom v. Test statistic vi. p-value vii. Confidence interval in the difference in mean stem masses between watershed viii. Effect size **(6 points)** ### Problem 3. Personal data (12 points) By now, you have some observations on your data sheet for your personal data. Even though it's early on in your data collection, it's a good idea to practice good **data management**. For this problem, you'll enter your data, read it into R, and create a visualization. If you get stuck at any step, you'll know there's something you need to fix. a. Create a spreadsheet to enter your data. Two good options include Google Sheets or Microsoft Excel. b. Create the columns of your spreadsheet. :::{.callout-tip collapse=true} #### Spreadsheet columns If you organized your data sheet in "tidy format" (i.e. each row is an observation), then the columns of your spreadsheet would be exactly the same as the columns in your data sheet. ::: c. Enter your data. d. Save your spreadsheet as a .csv file in your homework repository. e. Read your data into R. f. Create a visualization to explore your data. Finalize the plot. **(4 points)** g. In 1-2 sentences, describe what the "main message" of your visualization is. **(4 points)** h. In 2-4 sentences, describe the process of getting your data from your spreadsheet into R. Did you encounter any challenges? If so, why do you think those challenges arose, and how did you fix them? If not, why do you think your system for collecting your data worked? **(4 points)** :::{.callout-tip collapse=true} #### Changing your data collection scheme If you found that entering your data from your spreadsheet and getting it into the right format to be used in R was challenging, that's ok! This happens a lot with data collection. Feel free to change your data sheet so that you're collecting data in a way that makes your life easier as you're reading your data into R and using it. ::: ### Problem 4. Statistical critique (10 points) Check the [Google sheet](https://docs.google.com/spreadsheets/d/1iGh39-AZ0RoEuY3VqjUQ_JORli-2wQ_O6EPRl20iy6k/edit?usp=sharing) and choose a paper to use for your critique based on An’s/Caitlin’s recommendations. Answer the following questions about the paper in 1-2 sentences each: a. Why were you interested in this paper? **(2 points)** b. What questions/hypotheses are the authors addressing? **(2 points)** c. What statistical tests are the authors doing? Is there anything confusing about what they did? **(2 points)** d. Find the figures and/or tables in the paper that is associated with the statistical tests from question (c). If you have a figure, what are the x- and y-axes, and what does the figure show (i.e. what is the main message of the figure)? If you have a table, what are the rows and columns, and what is it supposed to show? **(2 points)** e. Take a screenshot of the figures and/or tables and insert the screenshot into your homework document. **(2 points)** ## Checklist Your homework should - Include your name, the title (“Homework 2”), and the date you turned in the assignment **(3 points)** - Include for Problem 1: - responses for a-c - full work (hand written or R code) for parts b-c - Include for Problem 2: - written response for part a - full work (R code), output, and written response for parts b-c - full work (R code) and output for parts d-f - full work (R code) and output for part g - written response for parts h-i - Include for Problem 3: - full work (R code) and output for part f - written responses for parts g-h - Include for Problem 4: - written responses for parts a-d - figure for part e - be uploaded to Canvas as a single PDF **(1 point)** - be organized and readable **(10 points)** **84 points total**