#############################################
# Assignment 3: NYPD Shootings
# Cleaning, Insights, and Visualization
#############################################
# ---- Setup ----
library(tidyverse)
library(lubridate)
library(nycOpenData)
# ---- Data Ingest ----
shooting_data <- nyc_shooting_incidents()
# ==============================
# 1. CLEANING & FEATURE ENGINEERING
# ==============================
# 1A. Missingness (choose one column to drop NAs)
# missing_n <- sum(is.na(shooting_data$YOUR_COLUMN))
# Write a comment: how many were missing + why this column is safe
# cleaned <- shooting_data %>% drop_na(YOUR_COLUMN)
# 1B. Lowercase one categorical column
# cleaned <- cleaned %>% mutate(YOUR_COLUMN = str_to_lower(YOUR_COLUMN))
# 1C. Create time_of_day (Morning / Afternoon / Night)
# cleaned <- cleaned %>%
#   mutate(
#     occur_date = as.Date(occur_date),
#     occur_time = hm(occur_time),
#     hour = hour(occur_time),
#     time_of_day = case_when(
#       hour >= ___ & hour < ___ ~ "Morning",
#       hour >= ___ & hour < ___ ~ "Afternoon",
#       TRUE ~ "Night"
#     )
#   )
# 1D. Create days_since
# cleaned <- cleaned %>%
#   mutate(days_since = as.integer(Sys.Date() - as.Date(occur_date)))
# ==============================
# 2. INSIGHTS (write as comments)
# ==============================
# ==============================
# 3. VISUALIZATIONS
# ==============================
# Plot A: time_of_day plot (include facet_wrap on at least one plot)
# Plot B: personal insight plot
# ==============================
# 4. WRITTEN SUMMARY (comments)
# ==============================
3 Assignment 3: NYPD Shooting Incidents — Cleaning, Insights, and Visualization
4 Overview
In this assignment, you will work with NYPD Shooting Incident Data using the nycOpenData package to practice data cleaning, feature engineering, exploratory insight generation, and data visualization with ggplot2.
Compared to previous assignments, this one is more open-ended: you will make a few analytic decisions (e.g., how to define time-of-day categories and which variables to explore for your second insight). This marks your transition from structured exercises to defensible analytic decision-making.
5 Learning Objectives
By completing this assignment, you will be able to:
- Retrieve real-world NYC civic data using nycOpenData
- Identify and handle missing data in a defensible way
- Engineer new features from dates/times and categorical columns
- Generate exploratory insights using grouping and summaries
- Communicate findings using well-designed ggplot2 visualizations
- Write a short interpretation that connects summaries and plots to your insights
6 Textbook Connection
This assignment builds directly from Chapter 3: Visualizations in Reproducible Research Using R.
Students are encouraged to review the chapter before beginning this assignment, as it provides the conceptual foundation and reproducible workflow demonstrated here.
7 Submission Instructions
Submit one .R script file.
Your script must:
- Run from top to bottom without errors
- Include your insight statements and written summary as comments where requested
- Contain two ggplot2 plots with clear titles and labeled axes
- Include at least one faceted plot using facet_wrap()
8 Assignment Tasks
8.1 0. Data Source
You must use the nycOpenData package and the function:
nyc_shooting_incidents()
8.2 1. Cleaning and Feature Engineering
Complete all of the following in your script:
8.2.1 1A. Missing data removal (defensible choice)
- Choose one column where it is reasonable to remove missing values.
- Compute and report (in a comment) how many values are missing in that column before removal.
- Remove rows with missing values in that column.
- In a comment, explain why it was “safe” to drop missing values from that column.
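The steps in 1A can be sketched on a small made-up table. This is only an illustration: the toy tibble and its values are invented, and the column names borough and perp_race are borrowed from the examples in this assignment rather than checked against the real data. Counting missing values in every column first is one way to see which column can be dropped on with the least data loss.

```r
library(tidyverse)

# Toy stand-in for the shooting data; values are invented for illustration
toy <- tibble(
  incident_key = 1:5,
  borough      = c("BRONX", "QUEENS", NA, "BROOKLYN", "BRONX"),
  perp_race    = c(NA, "BLACK", NA, NA, "WHITE")
)

# Count missing values in every column before choosing one
missing_by_col <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
print(missing_by_col)

# borough is missing in only 1 of 5 rows, so dropping on it loses little data;
# perp_race is missing in 3 of 5 rows, so dropping on it would remove most rows
missing_n <- sum(is.na(toy$borough))
cleaned <- toy %>% drop_na(borough)

# Number of rows removed by the drop
rows_removed <- nrow(toy) - nrow(cleaned)
print(rows_removed)
```

In your own script, the numbers printed here are the ones to report in your comment, together with your reasoning.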
8.2.2 1B. Lowercase a column
- Choose one column (e.g., borough, perp_race, vic_race) and convert values to lowercase.
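A minimal sketch of the lowercase step, using an invented borough column with inconsistent capitalization (the values are made up, not taken from the real data):

```r
library(tidyverse)

# Toy column with mixed capitalization; values are invented
toy <- tibble(borough = c("BRONX", "Queens", "BROOKLYN", "bronx"))

cleaned <- toy %>%
  mutate(borough = str_to_lower(borough))

# After lowercasing, "BRONX" and "bronx" are counted as the same category
count(cleaned, borough)
```

Lowercasing matters mainly for grouping: categories that differ only by case would otherwise be counted separately.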
8.2.3 1C. Create time_of_day
- Create a new column called time_of_day with three categories:
  - Morning
  - Afternoon
  - Night
- You define the time boundaries.
- Briefly note your boundaries in a comment.
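One possible way to build time_of_day is sketched below on invented times. Two things here are assumptions, not requirements: the times are stored as "HH:MM:SS" text (so lubridate's hms() is used; hm() is the counterpart for "HH:MM" text), and the boundaries 6 to 12 and 12 to 18 are purely illustrative. Check the format of the real column and choose your own boundaries.

```r
library(tidyverse)
library(lubridate)

# Toy times as "HH:MM:SS" text; the format of the real column is an assumption
toy <- tibble(occur_time = c("02:15:00", "06:00:00", "11:59:00",
                             "12:00:00", "17:30:00", "23:45:00"))

cleaned <- toy %>%
  mutate(
    # Parse the text into a period, then pull out the hour (0 to 23)
    hour = hour(hms(occur_time)),
    # Illustrative boundaries only:
    # Morning = 06:00-11:59, Afternoon = 12:00-17:59, Night = everything else
    time_of_day = case_when(
      hour >= 6  & hour < 12 ~ "Morning",
      hour >= 12 & hour < 18 ~ "Afternoon",
      TRUE                   ~ "Night"
    )
  )

print(cleaned)
```

Using `>=` on the lower bound and `<` on the upper bound keeps the categories from overlapping, so each hour falls into exactly one group.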
8.2.4 1D. Create days_since
- Create a column called days_since showing the number of days between today’s date and the date of the shooting.
- Use as.Date() and/or lubridate as needed.
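The date arithmetic behind days_since can be sketched as follows. The dates are invented, and a fixed reference_date (a hypothetical name) stands in for Sys.Date() so that the example gives the same result every time it is run; in your script you would use Sys.Date() as the task asks.

```r
library(tidyverse)

# Toy dates as ISO text; values are invented
toy <- tibble(occur_date = c("2024-01-01", "2024-01-31", "2023-12-25"))

# Fixed stand-in for Sys.Date() so the example is reproducible
reference_date <- as.Date("2024-02-01")

cleaned <- toy %>%
  mutate(
    occur_date = as.Date(occur_date),
    # Subtracting two Dates gives a difftime in days; as.integer() keeps the number
    days_since = as.integer(reference_date - occur_date)
  )

print(cleaned)
```

If the real dates are not in "YYYY-MM-DD" form, as.Date() needs a format argument, or a lubridate parser such as mdy() may be a better fit.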
8.3 2. Insights
Provide the following insights as comments in your script:
- One insight about time_of_day
  Example format: “Shootings are more frequent during ___ than ___, suggesting…”
- One additional insight of your choice
  Explore anything you find interesting in the dataset (be creative).
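An insight statement is easier to defend when it rests on a small summary table. The sketch below counts incidents per time_of_day and converts the counts to shares, using an invented cleaned table; the numbers say nothing about the real data.

```r
library(tidyverse)

# Toy version of the cleaned data; values are invented
cleaned <- tibble(
  time_of_day = c("Night", "Night", "Night", "Morning", "Afternoon", "Afternoon"),
  borough     = c("bronx", "brooklyn", "bronx", "queens", "bronx", "brooklyn")
)

# Count incidents per category, largest first, and add each category's share
time_summary <- cleaned %>%
  count(time_of_day, sort = TRUE) %>%
  mutate(share = n / sum(n))

print(time_summary)

# A second grouping variable can feed the additional insight
count(cleaned, borough, time_of_day)
```

The counts and shares from a table like this can be quoted directly in your insight comments.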
8.4 3. Visualizations (ggplot2)
Create two plots using ggplot2:
8.4.1 Plot 3A. time_of_day plot
- Must directly relate to your time_of_day variable
8.4.2 Plot 3B. personal insight plot
- Must support your second insight
Both plots must include:
- Color using an aesthetic mapping (aes(color = ...) or aes(fill = ...))
- Informative axis labels
- A clear title
- Custom font/size settings via theme()
Additional requirement:
- At least one plot must use facet_wrap()
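The plot requirements above (a mapped color or fill, labels, a title, theme() settings, and a facet) can all be met in one plot. The sketch below is one possible layout on invented data; the choice of a bar chart, the facet variable borough, and the specific font sizes are assumptions made for illustration.

```r
library(tidyverse)

# Toy version of the cleaned data; values are invented
cleaned <- tibble(
  time_of_day = c("Night", "Night", "Night", "Morning",
                  "Afternoon", "Afternoon", "Night", "Morning"),
  borough     = c("bronx", "brooklyn", "bronx", "queens",
                  "bronx", "brooklyn", "queens", "brooklyn")
) %>%
  # Order the categories chronologically instead of alphabetically
  mutate(time_of_day = factor(time_of_day,
                              levels = c("Morning", "Afternoon", "Night")))

plot_a <- ggplot(cleaned, aes(x = time_of_day, fill = time_of_day)) +
  geom_bar() +                       # bar height = number of rows per category
  facet_wrap(~ borough) +            # one panel per borough
  labs(
    title = "Shooting incidents by time of day and borough (toy data)",
    x     = "Time of day",
    y     = "Number of incidents",
    fill  = "Time of day"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    axis.title = element_text(size = 11),
    strip.text = element_text(size = 10)
  )

plot_a
```

Calling theme() after theme_minimal() matters: a complete theme added later would overwrite the custom font settings.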
8.5 4. Written Summary
At the end of your script, write a short paragraph (as comments) describing:
- What you discovered from your insights
- What the graphs revealed that was not obvious from the raw data
8.6 Starter Code Template
Copy this template into your script and complete the tasks above (Wickham 2023; Spinu, Grolemund, and Wickham 2024; Martinez 2026).
9 Reproducibility Practice
This assignment focuses on reproducibility when working with real civic data and date/time feature engineering.
In your script:
- Clearly document your analytic choices (especially your time_of_day boundaries) in comments.
- Keep your pipeline modular by creating a cleaned object (e.g., cleaned), and build from it.
- After creating your new variables, include a quick checkpoint such as count(cleaned, time_of_day) or summary(cleaned$days_since) to verify your features were created correctly.
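A checkpoint of this kind might look like the sketch below, shown on an invented cleaned table. The stopifnot() block is an optional addition, not something the assignment requires: it stops the script with an error if a feature was built incorrectly, rather than relying on you to spot the problem in printed output.

```r
library(tidyverse)

# Toy version of the cleaned data; values are invented
cleaned <- tibble(
  time_of_day = c("Morning", "Night", "Afternoon", "Night"),
  days_since  = c(31L, 1L, 38L, 400L)
)

# Quick visual checks: category counts and the range of days_since
checkpoint <- count(cleaned, time_of_day)
print(checkpoint)
summary(cleaned$days_since)

# Optional hard checks: the script stops here if any condition is FALSE
stopifnot(
  !anyNA(cleaned$time_of_day),
  all(cleaned$time_of_day %in% c("Morning", "Afternoon", "Night")),
  all(cleaned$days_since >= 0)
)
```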
The goal is that another reader can see exactly what choices you made, reproduce your engineered variables, and understand how you arrived at your two insights.