7 Midterm: Exercise & Sleep
Data import • Cleaning • Merging • Visualization • t-tests • ANOVA • Post-hoc tests • Recommendation
8 Overview
You are a researcher in sleep expert Matthew Walker’s lab. You have just run an experiment measuring the effect of different exercise methods on sleep habits.
You measured:
- Average hours of sleep before the experiment
- Average hours of sleep after the experiment
- Sleep efficiency at the end of the experiment
Dr. Walker needs you to identify which (if any) exercise method best improves sleep, and to make a clear, evidence-based recommendation.
This midterm assesses your ability to build a fully reproducible analysis from raw data through statistical inference and final reporting.
9 Learning Objectives
By completing this midterm, you will be able to:
- Import data from a multi-sheet Excel workbook
- Clean messy labels and standardize categorical variables
- Merge datasets into a single analysis-ready table
- Engineer derived variables and handle missing data responsibly
- Produce descriptive statistics and professional tables
- Create clear, interpretable visualizations
- Conduct and interpret t-tests and ANOVAs
- Run and interpret Tukey post-hoc comparisons
- Synthesize results into an actionable recommendation
10 Textbook Connection
This assignment builds directly from Chapters 1-5 (and 10) in Reproducible Research Using R.
Students are encouraged to review these chapters before beginning this assignment, as they provide the conceptual foundation and reproducible workflow demonstrated here.
11 Submission Instructions
Submit two files:
- Your .qmd file
- Your knitted and published .html file
Your report must:
- Knit/render without errors
- Include short narrative text between code chunks
- Use clear section headers matching the numbered tasks below
- Suppress unnecessary warnings/messages in the final HTML
- Include readable plots and (recommended) formatted tables (e.g., kable())
12 Midterm Tasks
12.1 1. Setup & Data Import
12.1.1 Tasks
- Load all packages you use
- Import both sheets from midterm_sleep_exercise.xlsx using readxl:
  - participant_info_midterm
  - sleep_data_midterm
- Preview both datasets using appropriate functions (e.g., glimpse() or head())
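As a starting point, the import step might look like the sketch below. The file and sheet names come from the task list; the object names `participant_info` and `sleep_data` and the assumption that the workbook sits in the project root are illustrative only. The `file.exists()` guard simply keeps the chunk from failing if the path is wrong.

```r
library(readxl)

# File name from the task list; the location (project root) is an assumption.
path <- "midterm_sleep_exercise.xlsx"

if (file.exists(path)) {
  # Confirm the sheet names before importing them by name.
  print(excel_sheets(path))

  participant_info <- read_excel(path, sheet = "participant_info_midterm")
  sleep_data <- read_excel(path, sheet = "sleep_data_midterm")

  # Preview both datasets (dplyr::glimpse() is an alternative to str()).
  str(participant_info)
  print(head(sleep_data))
} else {
  message("Workbook not found at: ", path)
}
```

Importing by sheet name rather than by position keeps the code readable and protects against the sheets being reordered.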
12.2 2. Merge & Base Cleaning
12.2.1 Tasks
- Clean column names so they are consistent and easy to work with
- Standardize labels for:
  - Exercise_Group (e.g., “cardio”, “CARDIO” → “Cardio”)
  - Sex (e.g., “m”, “MALE” → “Male”)
- Merge the two datasets into a single dataset
12.2.2 Requirements
- Final dataset should be one row per participant
- Dataset should include demographics + sleep variables
- Include a checkpoint verifying row count and that the merge behaved as expected
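One possible shape for the cleaning, merge, and checkpoint is sketched here on toy data. Everything about the data is hypothetical: the column names, the `id` join key, and the messy label variants are stand-ins, and the real workbook may need different handling. The `to_title()` helper is also an illustration, not a function from any package.

```r
library(dplyr)

# Hypothetical toy data standing in for the two imported sheets.
participant_info <- data.frame(
  ID = 1:4,
  Exercise_Group = c("cardio", "CARDIO", " Weights", "yoga"),
  Sex = c("m", "MALE", "Female", "f")
)
sleep_data <- data.frame(
  ID = 1:4,
  Pre_Sleep = c("6.5", "7", "5.8", "6.1"),
  Post_Sleep = c(7.2, 7.1, 6.9, NA)
)

# Illustrative helper: trim whitespace, lower-case, then capitalize
# the first letter.
to_title <- function(x) {
  x <- tolower(trimws(x))
  paste0(toupper(substr(x, 1, 1)), substr(x, 2, nchar(x)))
}

participant_clean <- participant_info |>
  rename_with(tolower) |>
  mutate(
    exercise_group = to_title(exercise_group),
    sex = case_when(
      tolower(trimws(sex)) %in% c("m", "male") ~ "Male",
      tolower(trimws(sex)) %in% c("f", "female") ~ "Female",
      TRUE ~ NA_character_  # unexpected labels become NA so they get noticed
    )
  )

sleep_clean <- sleep_data |> rename_with(tolower)

# Join on the (assumed) shared participant identifier.
merged <- left_join(participant_clean, sleep_clean, by = "id")

# Checkpoint: same number of rows as the participant table, no duplicate ids.
cat("Participants:", nrow(participant_clean), "| merged rows:", nrow(merged), "\n")
stopifnot(
  nrow(merged) == nrow(participant_clean),
  anyDuplicated(merged$id) == 0
)
```

A row count that grows after the join usually signals duplicate keys in one of the tables, which is why the checkpoint compares counts instead of just printing the result.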
12.3 3. Create Derived Variables
12.3.1 Tasks
- Create a numeric sleep-change variable:
  Sleep_Difference = post_sleep - pre_sleep_num
- Create AgeGroup2 using case_when() with exactly two groups (example: < 40 vs >= 40)
- Identify how many rows are missing Sleep_Difference
- Drop rows with missing Sleep_Difference
12.3.2 Requirements
- Include a checkpoint showing how many rows were removed and why
- Your narrative should briefly justify your missing-data decision
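The sketch below shows one way these steps could fit together. It uses toy data and assumes that `pre_sleep` arrives as text and only needs `as.numeric()` to become `pre_sleep_num`; the real column may need more careful parsing. The `age` column name and the toy values are hypothetical.

```r
library(dplyr)

# Hypothetical merged data; pre_sleep is stored as text in this toy example.
merged <- data.frame(
  id = 1:5,
  age = c(25, 39, 40, 58, 33),
  pre_sleep = c("6.5", "7", "5.8", "6.1", NA),
  post_sleep = c(7.2, 7.1, 6.9, NA, 7.0)
)

derived <- merged |>
  mutate(
    pre_sleep_num = as.numeric(pre_sleep),
    Sleep_Difference = post_sleep - pre_sleep_num,
    AgeGroup2 = case_when(
      age < 40 ~ "<40",
      age >= 40 ~ ">=40"
    )
  )

# Count missing values before removing anything.
n_before <- nrow(derived)
n_missing <- sum(is.na(derived$Sleep_Difference))

analysis <- derived |> filter(!is.na(Sleep_Difference))
n_removed <- n_before - nrow(analysis)

# Checkpoint: report what was dropped and confirm it matches the NA count.
cat(
  "Rows before:", n_before,
  "| missing Sleep_Difference:", n_missing,
  "| rows after:", nrow(analysis), "\n"
)
stopifnot(n_removed == n_missing)
```

Storing the counts in objects such as `n_removed` means the narrative can report them from the data instead of typing the numbers by hand.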
12.4 4. Descriptive Statistics
12.4.1 Tasks
Report descriptive statistics for:
- Sleep_Difference (overall)
- Sleep_Efficiency (overall)
For each variable, include:
- Mean
- SD
- Min
- Max
Then report group-wise summaries by Exercise_Group for:
- Sleep_Difference
- Sleep_Efficiency
12.4.2 Requirements
- Group summaries must be clearly readable
- Tables should be formatted professionally using kable() and appropriate captions.
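A minimal sketch of the overall and group-wise tables follows, again on hypothetical toy values. The `describe()` helper, the output column names, and the captions are illustrative choices, not required names.

```r
library(dplyr)
library(knitr)

# Hypothetical toy values; the real analysis dataset replaces this.
analysis <- data.frame(
  Exercise_Group = rep(c("Cardio", "Weights", "Yoga"), each = 4),
  Sleep_Difference = c(1.2, 0.8, 1.5, 1.0, 0.4, 0.6, 0.2, 0.5, 0.9, 1.1, 0.7, 1.0),
  Sleep_Efficiency = c(88, 85, 90, 87, 80, 82, 79, 81, 86, 84, 88, 85)
)

# Illustrative helper returning the four required statistics as one row.
describe <- function(x) {
  data.frame(Mean = mean(x), SD = sd(x), Min = min(x), Max = max(x))
}

# Overall summary: one row per outcome variable.
overall <- bind_rows(
  list(
    Sleep_Difference = describe(analysis$Sleep_Difference),
    Sleep_Efficiency = describe(analysis$Sleep_Efficiency)
  ),
  .id = "Variable"
)

# Group-wise summary by exercise group.
by_group <- analysis |>
  group_by(Exercise_Group) |>
  summarise(
    n = n(),
    Mean_Diff = mean(Sleep_Difference),
    SD_Diff = sd(Sleep_Difference),
    Mean_Eff = mean(Sleep_Efficiency),
    SD_Eff = sd(Sleep_Efficiency),
    .groups = "drop"
  )

kable(overall, digits = 2, caption = "Overall descriptive statistics")
kable(by_group, digits = 2, caption = "Descriptive statistics by exercise group")
```

Rounding inside `kable()` with `digits` keeps the stored values at full precision for later calculations.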
12.5 5. Visualizations (3 plots)
Create three plots:
- Boxplot: Sleep_Difference by Exercise_Group
- Boxplot: Sleep_Efficiency by Exercise_Group
- Scatterplot: Sleep_Difference vs Sleep_Efficiency, with a trend line
12.5.1 Requirements
Each plot must include:
- Descriptive title
- Clear x- and y-axis labels
- A clean theme (e.g., theme_minimal())
- Appropriate captions
After each plot, write 1–3 sentences describing what the plot suggests.
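The sketch below shows how one boxplot and the scatterplot might satisfy these requirements with ggplot2. The data are hypothetical toy values, the axis units (hours, percent) are assumptions about how the variables are measured, and the titles and captions are placeholders to be rewritten.

```r
library(ggplot2)

# Hypothetical toy values; the real analysis dataset replaces this.
analysis <- data.frame(
  Exercise_Group = rep(c("Cardio", "Weights", "Yoga"), each = 4),
  Sleep_Difference = c(1.2, 0.8, 1.5, 1.0, 0.4, 0.6, 0.2, 0.5, 0.9, 1.1, 0.7, 1.0),
  Sleep_Efficiency = c(88, 85, 90, 87, 80, 82, 79, 81, 86, 84, 88, 85)
)

# Boxplot of sleep change by exercise group.
p_box <- ggplot(analysis, aes(x = Exercise_Group, y = Sleep_Difference)) +
  geom_boxplot() +
  labs(
    title = "Change in sleep by exercise group",
    x = "Exercise group",
    y = "Sleep difference (hours, post minus pre)",  # unit is an assumption
    caption = "Toy data for illustration only"
  ) +
  theme_minimal()

# Scatterplot with a linear trend line.
p_scatter <- ggplot(analysis, aes(x = Sleep_Efficiency, y = Sleep_Difference)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(
    title = "Sleep change versus sleep efficiency",
    x = "Sleep efficiency (%)",  # unit is an assumption
    y = "Sleep difference (hours, post minus pre)",
    caption = "Line shows a linear fit; toy data for illustration only"
  ) +
  theme_minimal()

print(p_box)
print(p_scatter)
```

The second boxplot follows the same pattern with `Sleep_Efficiency` on the y-axis.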
12.6 6. t-tests (Two)
Run two t-tests:
Sleep_Difference ~ Sex
Sleep_Difference ~ AgeGroup2
12.6.1 Requirements
For each t-test, report:
- Group means
- t statistic
- df
- p-value
Then interpret in plain language:
- Is the difference statistically significant?
- Is it practically meaningful?
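As a hedged illustration, the values listed above can be pulled straight from the object that `t.test()` returns, which also avoids hard-coding numbers in the narrative. The data are hypothetical toy values. Note that `t.test()` runs Welch's unequal-variance test by default; whether to set `var.equal = TRUE` is a decision this sketch leaves open.

```r
# Hypothetical toy values; the real analysis dataset replaces this.
analysis <- data.frame(
  Sex = rep(c("Female", "Male"), each = 5),
  Sleep_Difference = c(1.1, 0.9, 1.3, 0.8, 1.0, 0.7, 1.2, 0.6, 0.9, 1.0)
)

# Welch two-sample t-test (the default for the formula interface).
tt <- t.test(Sleep_Difference ~ Sex, data = analysis)

# Collect the required values from the test object.
results <- data.frame(
  mean_female = unname(tt$estimate[1]),
  mean_male = unname(tt$estimate[2]),
  t = unname(tt$statistic),
  df = unname(tt$parameter),
  p = tt$p.value
)
print(round(results, 3))

# Size of the difference in the outcome's own units, for the
# "practically meaningful" question.
diff_hours <- results$mean_female - results$mean_male

# A reporting string built from the stored values, not typed by hand.
report <- sprintf(
  "t(%.1f) = %.2f, p = %.3f, mean difference = %.2f",
  results$df, results$t, results$p, diff_hours
)
print(report)
```

The same pattern applies to `Sleep_Difference ~ AgeGroup2`. In a Quarto document, values such as `results$p` can also be referenced with inline R code in the text.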
12.7 7. ANOVAs (Two) + Post-hoc Tests
Run two ANOVAs:
- ANOVA A: Sleep_Difference ~ Exercise_Group
- ANOVA B: Sleep_Efficiency ~ Exercise_Group
12.7.1 Requirements
For each ANOVA, include:
- ANOVA table output
- F statistic
- df
- p-value
- A brief effect-size comment
- PRE from supernova() (required)
Then run Tukey post-hoc tests:
TukeyHSD() for each ANOVA
Interpret results:
- Which groups differ significantly?
- Which exercise group appears to “win” for each outcome?
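The following sketch runs ANOVA A on hypothetical toy data using base R only. It computes PRE by hand as the model sum of squares divided by the total sum of squares, to show what the number represents. The assignment still requires the value reported by `supernova()`, which (assuming the supernova package is installed) can be obtained by passing an `lm()` fit of the same formula to `supernova()`.

```r
# Hypothetical toy values; the grouping variable is stored as a factor,
# which TukeyHSD() needs.
analysis <- data.frame(
  Exercise_Group = factor(rep(c("Cardio", "Weights", "Yoga"), each = 4)),
  Sleep_Difference = c(1.2, 0.8, 1.5, 1.0, 0.4, 0.6, 0.2, 0.5, 0.9, 1.1, 0.7, 1.0)
)

fit <- aov(Sleep_Difference ~ Exercise_Group, data = analysis)

# ANOVA table: row 1 is the model term, row 2 is the residuals.
tab <- summary(fit)[[1]]
print(tab)

f_value <- tab[["F value"]][1]
df_model <- tab[["Df"]][1]
df_error <- tab[["Df"]][2]
p_value <- tab[["Pr(>F)"]][1]

# PRE (proportional reduction in error) computed by hand:
# SS explained by the model divided by total SS.
ss <- tab[["Sum Sq"]]
pre <- ss[1] / sum(ss)

cat(sprintf(
  "F(%d, %d) = %.2f, p = %.4f, PRE = %.2f\n",
  df_model, df_error, f_value, p_value, pre
))

# Tukey post-hoc comparisons: one row per pair of groups, with the mean
# difference, confidence bounds, and adjusted p-value.
tukey <- TukeyHSD(fit)
print(tukey)
```

ANOVA B follows the same steps with `Sleep_Efficiency` as the outcome. In the Tukey table, a positive `diff` for a row such as `Yoga-Weights` means the first-named group has the higher mean.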
12.8 8. Synthesis & Recommendation
Write 4–6 sentences answering:
- Considering both outcomes (Sleep_Difference and Sleep_Efficiency) and your post-hoc results, which single exercise regimen would you recommend to improve overall sleep?
12.8.1 Requirements
You may only pick one regimen
Support your recommendation with specific statistical evidence:
- F values and p-values
- Key Tukey contrasts
- Any meaningful patterns from your plots/tables
Your recommendation must be actionable and executive-friendly.
12.9 9. Reflection
Write 3–5 sentences addressing:
- What was most challenging?
- What did you feel confident about?
- What would you do differently next time to improve your analysis or report?
13 Reproducibility Practice
This midterm evaluates not only statistical accuracy, but workflow transparency. Your report must:
- Load all packages explicitly
- Import both sheets programmatically
- Confirm merge success with row counts
- Explicitly handle missing values (no silent removal)
- Use clearly labeled code chunks
- Avoid hard-coding statistical values in written interpretation
- Suppress unnecessary warnings/messages
- Render cleanly to HTML
14 Publishing Instructions
- Render your .qmd file to HTML.
- Confirm warnings and messages do not appear in the final document.
- Submit both your .qmd file and your published .html report.