Spring 2024 - Final part 2: personal data and statistical critique

Due on Tuesday June 11 at 11:59 PM

Description

This is part 2 of your final. You must complete all components individually (i.e. not in a group).

If you worked in a group for part 1: create a new Quarto document titled lastname-firstname_final-part-02 in the same GitHub repo that you used for part 1. Do problems 5-7.

If you worked alone for part 1: continue in your Quarto document from part 1. Do problems 5-7.

This part of the final is open communication, open note, open internet, etc.

Problem 5. Affective and exploratory visualizations

Skills you will demonstrate

In this problem, you will demonstrate your ability to communicate about your visualization and give feedback to others. You will also demonstrate your ability to design and execute an appropriate statistical analysis for your data.

Problem

a. Sharing your affective visualization

This is a component you will complete in workshop on Thursday 6 June. We will be taking attendance that day. If you attended class, you will receive full credit for this section.

You will present your affective visualization and give feedback to 2-3 other classmates in class via the Google form linked on Canvas.

b. Comparing visualizations

In 8-12 sentences, compare and contrast your affective visualization from Homework 3/workshop, the exploratory visualization you made for Homework 2, and the communicative/exploratory visualization you made for the midterm. Some prompts:

- How are the visualizations different from each other?
- What similarities do you see between all your visualizations?
- What patterns (e.g. differences in means/counts/proportions/medians, trends through time, relationships between variables) do you see in each visualization? Are these different between visualizations? If so, why? If not, why not?

Problem 6. Data analysis

Skills you will demonstrate

In this part of the final, you will analyze your data using the response and predictor variable(s) you have measured. You will demonstrate your ability to design and execute the appropriate analysis for your own data set.

Problem

a. Response and predictor variables

List your response and predictor variable(s) and describe what kind of data (for example: continuous, categorical, ordinal, binary, etc.) each variable is.

b. Articulate a question

Write your “research question”. For example, “what is the effect of ____ on ____?” or “what are the differences in ____ between ____?”

c. Describe what kind of test or model you would use

Describe why that test or model would be appropriate given your response and predictor variables.

Note

You have to use one of the tests/models we learned about in this class. Make sure your response variables are appropriate for those tests!
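
As a rough illustration of the reasoning part c asks for, the sketch below maps the types of the response and predictor variables to candidate tests from the list in this class. The function name `suggest_test` and its arguments are hypothetical, and the mapping is a simplification: it ignores assumption checks, paired designs, and random effects, so treat it as a starting point and not as a rule.

```python
def suggest_test(response, predictor, n_groups=None):
    """Return candidate tests for a response/predictor pairing.

    Hypothetical helper: a simplified lookup that does not replace
    checking assumptions or thinking about the study design.
    """
    # A binary response points to logistic regression.
    if response == "binary":
        return ["logistic regression"]

    # Comparing groups: the choice depends on the number of groups.
    if predictor == "categorical" and response in ("continuous", "ordinal"):
        if n_groups is None or n_groups < 2:
            return []
        parametric = "t-test" if n_groups == 2 else "ANOVA"
        rank_based = "Mann-Whitney U" if n_groups == 2 else "Kruskal-Wallis"
        # Ordinal responses are only matched to the rank-based option here.
        if response == "continuous":
            return [parametric, rank_based]
        return [rank_based]

    # Two continuous variables: regression or correlation.
    if predictor == "continuous" and response == "continuous":
        return ["linear regression", "Pearson correlation", "Spearman correlation"]

    # Anything else is outside this simplified lookup.
    return []


print(suggest_test("continuous", "categorical", n_groups=3))
```

The lookup only narrows the options. Whether the parametric or the rank-based test is appropriate still depends on the assumption checks for your own data.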

d. Follow-ups

Describe what kind of effect size, post-hoc, correlation coefficient, prediction calculation, or any other follow-up you would do and why.
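
One possible follow-up for a two-group comparison is a standardized effect size. The sketch below computes Cohen's d with a pooled standard deviation using only the Python standard library. The function name `cohens_d` and the example values are assumptions made for illustration, not data or code from this class.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Made-up values, for illustration only.
print(cohens_d([2, 4, 6], [1, 3, 5]))
```

The sign of d follows the order of the arguments, so state which group was subtracted from which when you report it.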

e. Analyze your data

Run the analysis you outlined in parts c and d. Include the code and output for all parts. Annotate your code.

f. Visualize your results

Create a visualization that reflects the analysis you ran. For example, if you ran a t-test, you should show means (and some metric of spread or uncertainty). If you ran a Kruskal-Wallis test, you should show medians (and some metric of spread or uncertainty). If you ran a linear model, you should show model predictions with 95% confidence intervals.

For your visualization, be sure to show the underlying data.

Finalize your figure, get rid of visual clutter, and use colors/shapes/other aesthetics as appropriate.
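
Before plotting, it can help to compute the summary values the figure will display on top of the underlying data. The sketch below computes a mean with its standard error and a median for each group. The helper name `summarize_group`, the group labels, and the values are all hypothetical. A 95% confidence interval would additionally need a t critical value, which the standard library does not provide, so it is left out here.

```python
import math
from statistics import mean, median, stdev


def summarize_group(values):
    """Summaries a figure might overlay on the raw observations."""
    n = len(values)
    return {
        "n": n,
        "mean": mean(values),
        # Standard error of the mean: sample SD divided by sqrt(n).
        "se": stdev(values) / math.sqrt(n),
        "median": median(values),
    }


# Made-up observations for two hypothetical groups.
made_up = {"weekday": [1, 2, 3, 4, 5], "weekend": [2, 4, 6, 8, 10]}
summaries = {group: summarize_group(vals) for group, vals in made_up.items()}
for group, summary in summaries.items():
    print(group, summary)
```

Which summary to plot depends on the test: means for a t-test or ANOVA, medians for a rank-based test such as Kruskal-Wallis.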

g. Write a caption for your visualization.

h. Write a methods section.

Describe
- what variables you collected
- if you assigned ratings or categories to variables (for example: productivity on a scale of 1-5, tiredness on a scale of 1-10), describe how you determined the rating or category
- how/when you collected your data

i. Write a results section.

Describe
- what you found (with the appropriate summary of the statistical tests and sample size)
- your interpretation of your results (with reference to your figure)
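
A results sentence usually includes the test statistic, degrees of freedom, p-value, and sample size together. The sketch below shows one way to assemble that summary consistently. The function `format_result` is hypothetical, and the numbers passed to it are placeholders, not real results.

```python
def format_result(symbol, statistic, df, p_value, n):
    """Build a compact statistical summary string for a results section."""
    # Very small p-values are conventionally reported as an inequality.
    if p_value < 0.001:
        p_text = "p < 0.001"
    else:
        p_text = f"p = {p_value:.3f}"
    return f"{symbol}({df}) = {statistic:.2f}, {p_text}, n = {n}"


# Placeholder values, for illustration only.
print(format_result("t", 2.31, 18, 0.0329, 20))
```

Tests with two degrees of freedom values (for example, an ANOVA F statistic) could pass `df` as a string such as `"2, 27"`.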

Problem 7. Statistical critique

Skills you will demonstrate

By now, you have a sense of what tests are appropriate given a response and predictor variable(s). In this problem, you will critique the choice of statistical analysis that other people (the authors of the paper) have made.

You will be evaluated on the correctness of your critique.

a. Find the methods

Revisit the methods of the paper you chose. If there are multiple methods, choose one that is in the list of methods we cover in this class:

- t-test
- Analysis of variance (ANOVA)
- Mann-Whitney U
- Kruskal-Wallis
- Wilcoxon rank sum
- Linear model or linear regression
- Spearman correlation
- Pearson correlation
- Logistic regression
- Generalized linear mixed effect model

b. Method description

What are the response variable and the predictor variable(s)? If the method is a mixed effect model, what are the random effects?

c. Assumption check

What assumptions have to be met to use this method? If the method is a generalized linear model, what type of error distribution is appropriate for the response variable?
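
For the generalized linear model question, the usual starting point is the type of the response variable. The sketch below encodes three common pairings. The names `FAMILY_BY_RESPONSE` and `suggest_family` are hypothetical, and the pairings are defaults only: count data that are overdispersed, for instance, may call for a negative binomial distribution in place of a Poisson.

```python
# Common default pairings of response type and error distribution.
FAMILY_BY_RESPONSE = {
    "continuous": "Gaussian",
    "binary": "binomial",
    "count": "Poisson",
}


def suggest_family(response_kind):
    """Look up a default error distribution for a response type."""
    # Unlisted response types return None so they are handled deliberately.
    return FAMILY_BY_RESPONSE.get(response_kind)


print(suggest_family("count"))
```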

d. Critique

- Was this an appropriate method for the authors to use? Why or why not? Justify your critique using the response and predictor variable(s) and the experimental design.
- If not, what would be a more appropriate method to use and why? Justify your suggestion using the response and predictor variable(s) and the experimental design.

e. Communication

- How transparent were the authors about their statistical method? Did they report the components that you would expect for a thorough report (not just the p-value, but test statistic, sample size, degrees of freedom, etc.)? If not, what is missing?
- What follow-up tests/statistics (for example, an effect size, correlation, post-hoc, etc.) did the authors calculate or communicate about (e.g. model predictions/coefficients)? Why were these tests/statistics appropriate for the main statistical analysis? If they did not do any follow-up, what would you suggest given their original test?
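
One way to make the transparency check systematic is to compare what a paper reports against a checklist. The sketch below uses the four components named in this problem. The names `EXPECTED_COMPONENTS` and `missing_components` are hypothetical, and the checklist is not exhaustive.

```python
# Components named in this problem; a thorough report may include more.
EXPECTED_COMPONENTS = [
    "test statistic",
    "degrees of freedom",
    "p-value",
    "sample size",
]


def missing_components(reported):
    """Return the expected components a paper did not report."""
    reported_lower = {item.lower() for item in reported}
    return [c for c in EXPECTED_COMPONENTS if c not in reported_lower]


# Hypothetical paper that reports only a p-value and a sample size.
print(missing_components(["p-value", "Sample size"]))
```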
