7 Midterm: Exercise & Sleep

Data import • Cleaning • Merging • Visualization • t-tests • ANOVA • Post-hoc tests • Recommendation

8 Overview

You are a researcher in sleep expert Matthew Walker’s lab. You have just run an experiment measuring the effect of different exercise methods on sleep habits.

You measured:

  • Average hours of sleep before the experiment
  • Average hours of sleep after the experiment
  • Sleep efficiency at the end of the experiment

Dr. Walker needs you to identify which (if any) exercise method best improves sleep, and to make a clear, evidence-based recommendation.

This midterm assesses your ability to build a fully reproducible analysis from raw data through statistical inference and final reporting.


9 Learning Objectives

By completing this midterm, you will be able to:

  • Import data from a multi-sheet Excel workbook
  • Clean messy labels and standardize categorical variables
  • Merge datasets into a single analysis-ready table
  • Engineer derived variables and handle missing data responsibly
  • Produce descriptive statistics and professional tables
  • Create clear, interpretable visualizations
  • Conduct and interpret t-tests and ANOVAs
  • Run and interpret Tukey post-hoc comparisons
  • Synthesize results into an actionable recommendation

10 Textbook Connection

This assignment builds directly on Chapters 1-5 and Chapter 10 of Reproducible Research Using R.

Students are encouraged to review these chapters before beginning this assignment, as they provide the conceptual foundation and reproducible workflow demonstrated here.


11 Submission Instructions

Submit two files:

  1. Your .qmd file
  2. Your knitted and published .html file

Your report must:

  • Knit/render without errors
  • Include short narrative text between code chunks
  • Use clear section headers matching the numbered tasks below
  • Suppress unnecessary warnings/messages in the final HTML
  • Include readable plots and (recommended) formatted tables (e.g., kable())

12 Midterm Tasks


12.1 1. Setup & Data Import

12.1.1 Tasks

  • Load all packages you use

  • Import both sheets from midterm_sleep_exercise.xlsx using readxl:

    • participant_info_midterm
    • sleep_data_midterm
  • Preview both datasets using appropriate functions (e.g., glimpse() or head())
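
The import pattern above can be sketched as follows. To keep the example runnable without the midterm file, it reads two sheets from the example workbook bundled with readxl; for the midterm, the path and sheet names would be the ones listed in the tasks. The object names are illustrative only.

```r
library(readxl)
library(dplyr)

# Stand-in workbook that ships with readxl. For the midterm, the path would be
# "midterm_sleep_exercise.xlsx" and the sheets "participant_info_midterm" and
# "sleep_data_midterm".
path <- readxl_example("datasets.xlsx")

# List the sheet names first so a typo in a sheet name is caught early.
sheets <- excel_sheets(path)
print(sheets)

# Read two sheets by name into separate data frames.
first_sheet  <- read_excel(path, sheet = "iris")
second_sheet <- read_excel(path, sheet = "mtcars")

# Preview the structure and the first few rows.
glimpse(first_sheet)
head(second_sheet)
```

Reading sheets by name rather than by position keeps the import stable if the workbook's sheet order changes.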


12.2 2. Merge & Base Cleaning

12.2.1 Tasks

  • Clean column names so they are consistent and easy to work with

  • Standardize labels for:

    • Exercise_Group (e.g., “cardio”, “CARDIO” → “Cardio”)
    • Sex (e.g., “m”, “MALE” → “Male”)
  • Merge the two datasets into a single dataset
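
One possible way to standardize labels is sketched below on toy data. The messy values are invented for illustration (only "cardio", "CARDIO", "m", and "MALE" come from the task description), the "Weights" group and the ID column are hypothetical, and the real workbook may contain other variants that need their own rules.

```r
library(dplyr)
library(stringr)

# Toy data with invented messy labels (assumption: not the midterm data).
toy <- tibble(
  ID = 1:5,
  Exercise_Group = c("cardio", "CARDIO", " Weights", "weights ", "Cardio"),
  Sex = c("m", "MALE", "f", "Female", "male")
)

toy_clean <- toy %>%
  mutate(
    # Trim stray whitespace, then convert to title case: "CARDIO" -> "Cardio".
    Exercise_Group = str_to_title(str_trim(Exercise_Group)),
    # Compare on a trimmed, lowercase copy so every spelling maps to one label.
    Sex = case_when(
      str_to_lower(str_trim(Sex)) %in% c("m", "male")   ~ "Male",
      str_to_lower(str_trim(Sex)) %in% c("f", "female") ~ "Female",
      TRUE ~ NA_character_  # anything unexpected becomes NA so it is visible
    )
  )

# Checkpoint: each variable should now contain only the expected labels.
count(toy_clean, Exercise_Group)
count(toy_clean, Sex)
```

Sending unexpected values to NA, rather than guessing, makes any label the rules missed show up in the count tables.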

12.2.2 Requirements

  • Final dataset should be one row per participant
  • Dataset should include demographics + sleep variables
  • Include a checkpoint verifying row count and that the merge behaved as expected
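
A merge checkpoint might look like the sketch below. The toy tables and the shared key name ID are assumptions; substitute whatever key the two sheets actually share.

```r
library(dplyr)

# Toy tables (assumption: both share a participant key called ID).
participants <- tibble(ID = 1:4, Sex = c("Male", "Female", "Female", "Male"))
sleep        <- tibble(ID = 1:4, Sleep_Efficiency = c(82, 90, 77, 85))

merged <- left_join(participants, sleep, by = "ID")

# Checkpoint 1: the join must not add or drop participants.
stopifnot(nrow(merged) == nrow(participants))

# Checkpoint 2: exactly one row per participant.
stopifnot(!any(duplicated(merged$ID)))

# Checkpoint 3: list IDs that appear in one table but not the other.
# Both results should have zero rows if the merge behaved as expected.
anti_join(participants, sleep, by = "ID")
anti_join(sleep, participants, by = "ID")
```

Using stopifnot() means the document fails to render if the merge silently duplicates or loses rows, which is usually preferable to discovering the problem later in the analysis.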

12.3 3. Create Derived Variables

12.3.1 Tasks

  • Create a numeric sleep-change variable:

    • Sleep_Difference = post_sleep - pre_sleep_num
  • Create AgeGroup2 using case_when() with exactly two groups
    (example: < 40 vs >= 40)

  • Identify how many rows are missing Sleep_Difference

  • Drop rows with missing Sleep_Difference

12.3.2 Requirements

  • Include a checkpoint showing how many rows were removed and why
  • Your narrative should briefly justify your missing-data decision
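
The derived-variable and missing-data steps can be sketched on toy data as shown below. The sketch assumes the pre-experiment column arrives as text and must be converted to numeric first (the name pre_sleep_num in the task hints at this, but check your own data); the Age column name and the 40-year cut point follow the example above and are otherwise assumptions.

```r
library(dplyr)

# Toy data (assumption: pre_sleep is stored as text, Age is numeric).
toy <- tibble(
  ID = 1:6,
  Age = c(25, 39, 40, 52, 33, 61),
  pre_sleep = c("6.5", "7", "5.8", NA, "6.2", "7.1"),
  post_sleep = c(7.2, 7.1, NA, 6.9, 6.8, 7.4)
)

toy <- toy %>%
  mutate(
    pre_sleep_num = as.numeric(pre_sleep),
    Sleep_Difference = post_sleep - pre_sleep_num,
    # Exactly two groups; every non-missing age falls into one of them.
    AgeGroup2 = case_when(
      Age < 40  ~ "Under 40",
      Age >= 40 ~ "40 and over"
    )
  )

# Count missing values before removing anything.
n_before  <- nrow(toy)
n_missing <- sum(is.na(toy$Sleep_Difference))

toy_complete <- filter(toy, !is.na(Sleep_Difference))
n_removed <- n_before - nrow(toy_complete)

# Checkpoint: report what was removed and confirm it matches the missing count.
cat("Rows before:", n_before,
    "| missing Sleep_Difference:", n_missing,
    "| rows removed:", n_removed, "\n")
stopifnot(n_removed == n_missing)
```

Sleep_Difference is missing whenever either the pre or the post value is missing, so the checkpoint also documents why each row was dropped.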

12.4 4. Descriptive Statistics

12.4.1 Tasks

Report descriptive statistics for:

  • Sleep_Difference (overall)
  • Sleep_Efficiency (overall)

For each variable, include:

  • Mean
  • SD
  • Min
  • Max

Then report group-wise summaries by Exercise_Group for:

  • Sleep_Difference
  • Sleep_Efficiency

12.4.2 Requirements

  • Group summaries must be clearly readable
  • Tables should be formatted professionally using kable() and appropriate captions.
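
A minimal sketch of overall and group-wise summaries with kable() follows. The data are invented, and the "Weights" and "Yoga" group names are placeholders rather than the groups in the midterm workbook.

```r
library(dplyr)
library(knitr)

# Toy data (assumption: group names and values are invented).
toy <- tibble(
  Exercise_Group = rep(c("Cardio", "Weights", "Yoga"), each = 4),
  Sleep_Difference = c(0.8, 1.1, 0.5, 0.9,
                       0.2, 0.4, 0.1, 0.3,
                       0.6, 0.7, 0.9, 0.5)
)

# Overall descriptive statistics.
overall <- toy %>%
  summarise(
    Mean = mean(Sleep_Difference),
    SD   = sd(Sleep_Difference),
    Min  = min(Sleep_Difference),
    Max  = max(Sleep_Difference)
  )

# The same statistics within each exercise group, plus the group size.
by_group <- toy %>%
  group_by(Exercise_Group) %>%
  summarise(
    n    = n(),
    Mean = mean(Sleep_Difference),
    SD   = sd(Sleep_Difference),
    Min  = min(Sleep_Difference),
    Max  = max(Sleep_Difference),
    .groups = "drop"
  )

kable(overall, digits = 2,
      caption = "Overall descriptive statistics for Sleep_Difference (toy data)")
kable(by_group, digits = 2,
      caption = "Sleep_Difference by Exercise_Group (toy data)")
```

The same pattern applies to Sleep_Efficiency by swapping the column name.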

12.5 5. Visualizations (3 plots)

Create three plots:

  1. Boxplot: Sleep_Difference by Exercise_Group
  2. Boxplot: Sleep_Efficiency by Exercise_Group
  3. Scatterplot: Sleep_Difference vs Sleep_Efficiency, with a trend line

12.5.1 Requirements

Each plot must include:

  • Descriptive title
  • Clear x- and y-axis labels
  • A clean theme (e.g., theme_minimal())
  • Appropriate captions

After each plot, write 1–3 sentences describing what the plot suggests.
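
The plot requirements can be illustrated with the sketch below, which builds one boxplot and one scatterplot from simulated data. The values, group names, axis wording, and the choice of a linear trend line are assumptions made for the example.

```r
library(ggplot2)

# Simulated toy data (assumption: not the midterm data).
set.seed(1)
toy <- data.frame(
  Exercise_Group   = rep(c("Cardio", "Weights", "Yoga"), each = 10),
  Sleep_Difference = rnorm(30, mean = 0.5, sd = 0.4),
  Sleep_Efficiency = rnorm(30, mean = 85, sd = 5)
)

# Boxplot with a title, axis labels, a caption, and a clean theme.
box_plot <- ggplot(toy, aes(x = Exercise_Group, y = Sleep_Difference)) +
  geom_boxplot() +
  labs(
    title   = "Change in sleep by exercise group",
    x       = "Exercise group",
    y       = "Sleep difference (hours, post minus pre)",
    caption = "Simulated data for illustration"
  ) +
  theme_minimal()

# Scatterplot with a linear trend line and its confidence band.
scatter_plot <- ggplot(toy, aes(x = Sleep_Efficiency, y = Sleep_Difference)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(
    title   = "Sleep difference versus sleep efficiency",
    x       = "Sleep efficiency",
    y       = "Sleep difference (hours)",
    caption = "Simulated data for illustration; line is a linear fit"
  ) +
  theme_minimal()

print(box_plot)
print(scatter_plot)
```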


12.6 6. t-tests (Two)

Run two t-tests:

  1. Sleep_Difference ~ Sex
  2. Sleep_Difference ~ AgeGroup2

12.6.1 Requirements

For each t-test, report:

  • Group means
  • t statistic
  • df
  • p-value

Then interpret in plain language:

  • Is the difference statistically significant?
  • Is it practically meaningful?
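
One way to pull the required values out of a t-test object is sketched here with simulated data. Note that t.test() runs Welch's unequal-variance test by default, so the df is usually not a whole number; whether that default is the right choice for your data is a judgment you should state in your narrative.

```r
# Simulated toy data (assumption: not the midterm data).
set.seed(42)
toy <- data.frame(
  Sex = rep(c("Female", "Male"), each = 12),
  Sleep_Difference = c(rnorm(12, mean = 0.7, sd = 0.4),
                       rnorm(12, mean = 0.5, sd = 0.4))
)

# Welch two-sample t-test (R's default, var.equal = FALSE).
tt <- t.test(Sleep_Difference ~ Sex, data = toy)

# Collect the values to report; groups are ordered alphabetically.
results <- data.frame(
  mean_female = unname(tt$estimate[1]),
  mean_male   = unname(tt$estimate[2]),
  t           = unname(tt$statistic),
  df          = unname(tt$parameter),
  p_value     = tt$p.value
)
print(round(results, 3))

# The raw mean difference (in hours) helps judge practical importance.
mean_diff <- results$mean_female - results$mean_male
cat("Mean difference (Female - Male):", round(mean_diff, 2), "hours\n")
```

Statistical significance comes from the p-value, while practical meaning depends on whether the mean difference in hours is large enough to matter for sleep.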

12.7 7. ANOVAs (Two) + Post-hoc Tests

Run two ANOVAs:

  • ANOVA A: Sleep_Difference ~ Exercise_Group
  • ANOVA B: Sleep_Efficiency ~ Exercise_Group

12.7.1 Requirements

For each ANOVA, include:

  • ANOVA table output
  • F statistic
  • df
  • p-value
  • A brief effect-size comment
  • PRE from supernova() (required)

Then run Tukey post-hoc tests:

  • TukeyHSD() for each ANOVA

Interpret results:

  • Which groups differ significantly?
  • Which exercise group appears to “win” for each outcome?
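
The ANOVA workflow can be sketched as follows on simulated data. The group names and effect sizes are invented. The sketch computes PRE by hand as the model sum of squares divided by the total sum of squares (for a one-way ANOVA this equals eta-squared) and, if the supernova package is installed, also prints the supernova() table so the two can be compared.

```r
# Simulated toy data (assumption: not the midterm data).
set.seed(7)
toy <- data.frame(
  Exercise_Group = factor(rep(c("Cardio", "Weights", "Yoga"), each = 10)),
  Sleep_Difference = c(rnorm(10, mean = 0.9, sd = 0.3),
                       rnorm(10, mean = 0.4, sd = 0.3),
                       rnorm(10, mean = 0.6, sd = 0.3))
)

# One-way ANOVA and its table.
fit <- aov(Sleep_Difference ~ Exercise_Group, data = toy)
anova_table <- summary(fit)[[1]]
print(anova_table)

# Extract the values to report.
f_value  <- anova_table[["F value"]][1]
df_model <- anova_table[["Df"]][1]
df_error <- anova_table[["Df"]][2]
p_value  <- anova_table[["Pr(>F)"]][1]

# PRE by hand: SS_model / SS_total.
ss <- anova_table[["Sum Sq"]]
pre_manual <- ss[1] / sum(ss)
cat("PRE (manual):", round(pre_manual, 3), "\n")

# supernova() reports PRE directly; it takes a fitted lm model.
if (requireNamespace("supernova", quietly = TRUE)) {
  print(supernova::supernova(lm(Sleep_Difference ~ Exercise_Group, data = toy)))
}

# Tukey post-hoc comparisons between every pair of groups.
tukey <- TukeyHSD(fit)
print(tukey)
```

In the Tukey output, a pair differs significantly when its adjusted p-value is below your chosen threshold, and the sign of the difference shows which group has the higher mean.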

12.8 8. Synthesis & Recommendation

Write 4–6 sentences answering:

  • Considering both outcomes (Sleep_Difference and Sleep_Efficiency) and your post-hoc results, which single exercise regimen would you recommend to improve overall sleep?

12.8.1 Requirements

  • You may pick only one regimen

  • Support your recommendation with specific statistical evidence:

    • F values and p-values
    • Key Tukey contrasts
    • Any meaningful patterns from your plots/tables

Your recommendation must be actionable and executive-friendly.


12.9 9. Reflection

Write 3–5 sentences addressing:

  • What was most challenging?
  • What did you feel confident about?
  • What would you do differently next time to improve your analysis or report?

13 Reproducibility Practice

This midterm evaluates not only statistical accuracy, but workflow transparency. Your report must:

  • Load all packages explicitly
  • Import both sheets programmatically
  • Confirm merge success with row counts
  • Explicitly handle missing values (no silent removal)
  • Use clearly labeled code chunks
  • Avoid hard-coding statistical values in written interpretation
  • Suppress unnecessary warnings/messages
  • Render cleanly to HTML
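
Several of these practices can be combined in a single chunk, as in the sketch below. The chunk label and the toy data are illustrative. The lines starting with #| are Quarto chunk options: they label the chunk and suppress messages and warnings in the rendered HTML. The reported sentence is assembled from the stored test object, so the numbers in the text update automatically if the data change.

```r
#| label: ttest-sex
#| message: false
#| warning: false

# Simulated toy data (assumption: not the midterm data).
set.seed(3)
toy <- data.frame(
  Sex = rep(c("Female", "Male"), each = 10),
  Sleep_Difference = rnorm(20, mean = 0.6, sd = 0.4)
)

tt <- t.test(Sleep_Difference ~ Sex, data = toy)

# Build the sentence from the stored result instead of typing numbers by hand.
sentence <- sprintf(
  "Welch's t-test: t(%.1f) = %.2f, p = %.3f.",
  tt$parameter, tt$statistic, tt$p.value
)
cat(sentence, "\n")
```

The same stored objects can also be referenced from inline R expressions in the narrative text, which avoids hard-coding values there as well.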

14 Publishing Instructions

  1. Render your .qmd file to HTML.
  2. Confirm warnings and messages do not appear in the final document.
  3. Submit both your .qmd file and your published .html report.