2 Assignment 2: mtcars Wrangling and Feature Engineering

3 Overview

This assignment continues our introduction to the Tidyverse by walking through a structured data-cleaning and feature-engineering workflow using the built-in mtcars dataset.

You will complete a partially written (“skeletal”) script by filling in missing code so that each step runs successfully and produces the expected outputs.

4 Learning Objectives

By completing this assignment, you will be able to:

Use dplyr verbs (rename(), mutate(), group_by(), summarize(), arrange()) to clean and transform data
Engineer new features from existing numeric variables
Use case_when() to create interpretable categories
Convert numeric codes into readable factors
Produce grouped summaries and identify “top N” rows within groups
Create a pivot-style summary table using grouped aggregation

5 Textbook Connection

This assignment builds directly from Chapter 2: Introduction to tidyverse in Reproducible Research Using R.

Students are encouraged to review the chapter before beginning this assignment, as it provides the conceptual foundation and reproducible workflow demonstrated here.

6 Submission Instructions

Submit one .R script file.

Your script must:

Run from top to bottom without errors
Include clear section comments (e.g., # Step 1: Rename columns)
Contain your reflection responses as comments at the end of the file

7 Assignment Tasks

7.1 mtcars Data Wrangling and Feature Engineering

Goal: Your lab has test results for a fleet of cars (mtcars). Clean up the columns, create useful features, and answer questions your team cares about.

Complete the following steps using the Tidyverse:

1. Make columns human-readable
   Rename and store in a new dataset called cars:
   - mpg → miles_per_gallon
   - hp → horsepower
   - wt → weight_1000lb
2. Engineer a performance feature
   Create hp_per_ton (where one ton ≈ 2,000 lb).
3. Create a power class
   Using case_when(), create power_class:
   - < 150 horsepower → "Low"
   - 150–250 horsepower → "Medium"
   - > 250 horsepower → "High"
4. Make cylinder groups readable
   Convert cyl to a factor with labels:
   - "4 cyl", "6 cyl", "8 cyl"
5. Mileage by cylinders
   Group by cyl and compute for miles_per_gallon:
   - n
   - mean
   - standard deviation
   Sort results from best to worst average mileage.
6. Top cars per cylinder
   For each cylinder group, identify the top 3 most fuel-efficient cars (highest miles_per_gallon) and keep:
   - model (from row names)
   - cyl
   - miles_per_gallon
   - horsepower
7. Transmission mix and share
   Recode am into trans:
   - 0 → "Automatic"
   - 1 → "Manual"
   Compute counts and percent share of transmission type within each cylinder group.
8. Pivot-style summary
   Create a summary table for each cyl × gear combination that includes:
   - minimum miles_per_gallon
   - maximum miles_per_gallon
   - average miles_per_gallon
   - average hp_per_ton
   Save as pivot_cars.
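
To see how the unit conversion in Step 2 and the class boundaries in Step 3 behave, the sketch below runs them on a small made-up data frame rather than on mtcars. The four rows and the names toy, toy_features, and weight_tons are illustrative assumptions, chosen so that the horsepower values sit exactly on and around the 150 and 250 cut-offs.

```r
library(dplyr)

# Hypothetical toy data (not from mtcars): horsepower values placed on the class boundaries.
toy <- data.frame(
  horsepower    = c(149, 150, 250, 251),
  weight_1000lb = c(2, 3, 4, 5)
)

toy_features <- toy %>%
  mutate(
    # weight_1000lb is in units of 1,000 lb; one ton is taken as 2,000 lb,
    # so weight in tons = weight_1000lb * 1000 / 2000 = weight_1000lb / 2
    weight_tons = weight_1000lb * 1000 / 2000,
    hp_per_ton  = horsepower / weight_tons,
    # Conditions are checked in order; 150 and 250 both fall in "Medium"
    power_class = case_when(
      horsepower < 150                      ~ "Low",
      horsepower >= 150 & horsepower <= 250 ~ "Medium",
      horsepower > 250                      ~ "High"
    )
  )

toy_features
```

With these inputs the four rows come out as "Low", "Medium", "Medium", "High", and hp_per_ton is 149, 100, 125, and 100.4. Testing boundary values on a tiny data frame like this is a quick way to confirm that no horsepower value falls between conditions and ends up as NA.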

7.2 Starter Code (Skeletal Script)

Copy this into your .R script and complete the missing pieces (Wickham, 2023).

library(tidyverse)

# 1. Make columns human-readable
cars <- mtcars %>%
  rownames_to_column("model") %>%
  rename(
    miles_per_gallon = mpg,
    horsepower       = hp,
    weight_1000lb    = wt
  )

# 2. Engineer a performance feature
cars <- cars %>%
  mutate(
    hp_per_ton = horsepower / (weight_1000lb * 0.5)
  )

# 3. Create a power class
cars <- cars %>%
  mutate(
    power_class = case_when(
      horsepower < 150 ~ "Low",
      horsepower >= 150 & horsepower <= 250 ~ "Medium",
      horsepower > 250 ~ "High"
    )
  )

# 4. Make cylinder groups readable
cars <- cars %>%
  mutate(
    cyl = factor(
      cyl,
      levels = c(4, 6, 8),
      labels = c("4 cyl", "6 cyl", "8 cyl")
    )
  )

# 5. Mileage by cylinders
cars %>%
  group_by(cyl) %>%
  summarize(
    n = n(),
    mean_mpg = mean(miles_per_gallon, na.rm = TRUE),
    sd_mpg   = sd(miles_per_gallon, na.rm = TRUE),
    .groups  = "drop"
  ) %>%
  arrange(desc(mean_mpg))

# 6. Top cars per cylinder
cars %>%
  group_by(cyl) %>%
  slice_max(miles_per_gallon, n = 3, with_ties = FALSE) %>%
  select(model, cyl, miles_per_gallon, horsepower)

# 7. Transmission mix & share
cars %>%
  mutate(trans = recode(am, `0` = "Automatic", `1` = "Manual")) %>%
  count(cyl, trans) %>%
  group_by(cyl) %>%
  mutate(pct = 100 * n / sum(n)) %>%  # percent share within each cylinder group
  ungroup()

# 8. Pivot-style summary
pivot_cars <- cars %>%
  group_by(cyl, gear) %>%
  summarize(
    min_mpg    = min(miles_per_gallon, na.rm = TRUE),
    max_mpg    = max(miles_per_gallon, na.rm = TRUE),
    avg_mpg    = mean(miles_per_gallon, na.rm = TRUE),
    avg_hp_ton = mean(hp_per_ton, na.rm = TRUE),
    .groups    = "drop"
  )
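
One way to sanity-check the Step 7 result is to confirm that the shares within each cylinder group add up to a whole. The sketch below rebuilds the share table directly from mtcars so it runs on its own. It assumes pct is expressed on a 0–100 scale (if you keep shares as proportions, the expected total is 1 rather than 100), uses if_else() in place of recode() purely for illustration, and the names trans_share and share_check are hypothetical.

```r
library(dplyr)

# Rebuild only what this check needs, straight from mtcars.
trans_share <- mtcars %>%
  mutate(trans = if_else(am == 0, "Automatic", "Manual")) %>%
  count(cyl, trans) %>%
  group_by(cyl) %>%
  mutate(pct = 100 * n / sum(n)) %>%  # assumption: share on a 0-100 scale
  ungroup()

# Within each cylinder group the shares should total 100,
# and the counts across all groups should cover every row of mtcars.
share_check <- trans_share %>%
  group_by(cyl) %>%
  summarize(
    total_pct = sum(pct),
    total_n   = sum(n),
    .groups   = "drop"
  )

share_check
stopifnot(isTRUE(all.equal(share_check$total_pct, rep(100, nrow(share_check)))))
stopifnot(sum(share_check$total_n) == nrow(mtcars))
```

If either stopifnot() call fails, the most likely causes are computing the share before grouping by cyl or dropping rows somewhere earlier in the pipeline.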

7.3 Reflection

At the end of your script, answer the following as comments:

Which piece of code felt the hardest for you? Why?
Which piece of code felt the easiest for you? Why?

8 Reproducibility Practice

This assignment focuses on readability and “checkpoints” in a multi-step wrangling workflow.

In your script:

Use section headers that match the steps (1–8) so the workflow is easy to follow.
Do not overwrite objects unintentionally (you should end with objects named cars and pivot_cars).
After Step 4, include a quick “checkpoint” line such as glimpse(cars) or count(cars, cyl) to verify the transformation worked.

The goal is that another reader can follow your transformation pipeline step-by-step and verify the results at key moments.
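
A checkpoint after Step 4 might look like the following minimal sketch. It builds a separate object, cars_check (a hypothetical name used here so the example does not overwrite cars), and adds an explicit NA test, since factor() silently returns NA for any value that is missing from levels.

```r
library(dplyr)

# Apply the Step 4 transformation to a throwaway copy of mtcars.
cars_check <- mtcars %>%
  mutate(
    cyl = factor(
      cyl,
      levels = c(4, 6, 8),
      labels = c("4 cyl", "6 cyl", "8 cyl")
    )
  )

# Checkpoint 1: confirm cyl is now a factor and the other columns are intact.
glimpse(cars_check)

# Checkpoint 2: confirm every row landed in one of the three labelled groups.
cyl_counts <- count(cars_check, cyl)
cyl_counts

# Checkpoint 3: fail loudly if any cyl value did not match a level.
stopifnot(!anyNA(cars_check$cyl))
```

For mtcars, the count should show 11 cars with 4 cylinders, 7 with 6, and 14 with 8, for 32 rows in total.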