9 Assignment 8: Florida Crime Analytics — Uncovering the Root of Florida’s Crime Surge
Correlation • Multiple Regression • Model Comparison • Executive Communication
10 Overview
The Florida Police Department (FPD) has hired you as their new Data Analyst. Your mission is to uncover which socioeconomic factors are most strongly associated with rising crime rates across Florida counties.
Using county-level data, you will conduct a fully reproducible Quarto analysis to determine whether income, education, or urbanization best explains differences in crime rates.
Your findings will help inform statewide prevention strategies, resource allocation, and community outreach efforts.
This assignment assesses your ability to:
- Clean and prepare structured data
- Compute and interpret correlations
- Build and compare regression models
- Evaluate model fit using R² and AIC
- Communicate statistical findings to a non-technical executive audience
11 Learning Objectives
By completing this assignment, you will be able to:
- Clean and standardize real-world datasets
- Perform exploratory data analysis (EDA)
- Interpret correlation matrices
- Build simple and multiple regression models
- Compare models using R², Adjusted R², and AIC
- Explain regression results in plain language
- Produce a reproducible Quarto report suitable for executive review
12 Textbook Connection
This assignment builds directly from Chapter 8: Linear Regression in Reproducible Research Using R.
Students are encouraged to review the chapter before beginning this assignment, as it provides the conceptual foundation and reproducible workflow demonstrated here.
14 Submission Instructions
Submit two files:
- Your .qmd file
- Your knitted and published .html file
Your document must:
- Knit without errors
- Suppress warnings and messages in the final HTML
- Include clearly labeled sections
- Include all required visualizations
- Include regression tables and model comparisons
- Include a professional memo to the Chief of the FPD
- Use at least one inline R expression
15 Assignment Tasks
15.1 1. Data Loading and Preparation
You have been provided with an Excel file containing the following columns:
| Column | Description |
|---|---|
| County | County name |
| C | Crime rate (per 1,000 residents) |
| I | Median income (in thousands) |
| HS | High-school graduation rate (%) |
| U | Urban population (%) |
15.1.1 Required Tasks
You must:
- Load the dataset reproducibly
- Rename the columns to:
  - Crime
  - Income
  - HighSchoolGrad
  - UrbanPop
- Format all county names so that only the first letter is capitalized
- Inspect the dataset structure
- Provide summary statistics
Include a brief explanation of why consistent naming and formatting are essential for reproducible modeling.
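The preparation steps above might be sketched as follows. This is a minimal base R sketch under stated assumptions, not the required solution: the file name florida_crime.xlsx is hypothetical, and the three stand-in rows hold made-up values so the code runs without the real file.

```r
# Hypothetical file name; in the real report, load the provided Excel file, e.g.:
# crime_raw <- readxl::read_excel("florida_crime.xlsx")

# Stand-in rows with made-up values (NOT real Florida data) so the sketch runs.
crime_raw <- data.frame(
  County = c("ALACHUA", "baker", "bAY"),
  C  = c(40.1, 25.3, 33.8),
  I  = c(52.0, 61.5, 58.2),
  HS = c(88.0, 84.5, 90.1),
  U  = c(78.0, 35.0, 89.0),
  stringsAsFactors = FALSE
)

crime <- crime_raw

# Rename the coded columns to the descriptive names the assignment asks for.
new_names <- c(C = "Crime", I = "Income", HS = "HighSchoolGrad", U = "UrbanPop")
idx <- match(names(new_names), names(crime))
names(crime)[idx] <- unname(new_names)

# Keep only the first letter of each county name capitalized.
crime$County <- paste0(
  toupper(substring(crime$County, 1, 1)),
  tolower(substring(crime$County, 2))
)

str(crime)      # inspect the structure
summary(crime)  # summary statistics
```

Note that this lowercases every letter after the first, so a multi-word name such as "PALM BEACH" would become "Palm beach". If each word should be capitalized instead, stringr::str_to_title() is one alternative.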
15.2 2. Exploratory Data Analysis
Conduct exploratory analysis to understand the structure and distribution of the data.
15.2.1 Required Components
- Compute descriptive statistics (mean, median, range) for all numeric variables
- Create at least two visualizations (e.g., histograms, boxplots, scatterplots)
- Provide written interpretation of what you observe
Your interpretation should address:
- Distribution shapes
- Potential outliers
- Initial patterns between predictors and crime
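One possible way to produce the descriptive statistics and plots is sketched below with base R graphics. The data frame is simulated purely so the code runs on its own; it is not Florida data, and the coefficients used to generate it are arbitrary assumptions.

```r
# Simulated stand-in data (NOT real Florida data); use your prepared data instead.
set.seed(1)
n <- 30
crime <- data.frame(
  Income = rnorm(n, mean = 55, sd = 8),
  HighSchoolGrad = rnorm(n, mean = 87, sd = 4),
  UrbanPop = runif(n, min = 20, max = 95)
)
crime$Crime <- 60 - 0.5 * crime$Income + 0.1 * crime$UrbanPop + rnorm(n, sd = 3)

num_vars <- c("Crime", "Income", "HighSchoolGrad", "UrbanPop")

# Mean, median, and range (min and max) for each numeric variable.
desc <- t(sapply(crime[num_vars], function(x) {
  c(mean = mean(x), median = median(x), min = min(x), max = max(x))
}))
round(desc, 2)

# Distribution shape and potential outliers for the outcome.
hist(crime$Crime, main = "Crime rate", xlab = "Crime per 1,000 residents")
boxplot(crime$Crime, horizontal = TRUE, main = "Crime rate")

# Initial pattern between one predictor and crime.
plot(crime$Income, crime$Crime,
     xlab = "Median income (thousands)",
     ylab = "Crime per 1,000 residents")
```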
15.2.2 Bonus (Optional)
Using ggplot2 and mapping tools, create a heatmap of Florida counties based on crime rates.
15.3 3. Correlation Analysis
Investigate which factors are most strongly associated with crime.
15.3.1 Required Tasks
- Compute a correlation matrix including:
  - Crime
  - Income
  - HighSchoolGrad
  - UrbanPop
- Identify which predictor shows the strongest relationship with Crime
- Interpret the direction (positive or negative)
- Interpret the strength (weak, moderate, strong)
- (Optional) Visualize the correlation matrix using a correlation plotting tool.
In your written interpretation, explain:
- Which factor appears most important
- Whether any predictors appear highly correlated with one another
- Why correlation does not imply causation
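The correlation tasks could be approached along these lines. The data are simulated stand-ins, and the cutoffs used to label strength (0.3 and 0.7) are an assumed convention; thresholds vary by field, so state whichever ones you adopt.

```r
# Simulated stand-in data (NOT real Florida data); use your prepared data instead.
set.seed(1)
n <- 30
crime <- data.frame(
  Income = rnorm(n, mean = 55, sd = 8),
  HighSchoolGrad = rnorm(n, mean = 87, sd = 4),
  UrbanPop = runif(n, min = 20, max = 95)
)
crime$Crime <- 60 - 0.5 * crime$Income + 0.1 * crime$UrbanPop + rnorm(n, sd = 3)

num_vars <- c("Crime", "Income", "HighSchoolGrad", "UrbanPop")
predictors <- setdiff(num_vars, "Crime")

# Pearson correlation matrix for all four variables.
cor_mat <- cor(crime[num_vars])
round(cor_mat, 2)

# Predictor with the largest absolute correlation with Crime.
crime_cors <- cor_mat["Crime", predictors]
strongest <- names(which.max(abs(crime_cors)))
direction <- if (crime_cors[[strongest]] > 0) "positive" else "negative"

# Assumed cutoffs for labeling strength; adjust to your course's convention.
strength <- as.character(cut(
  abs(crime_cors[[strongest]]),
  breaks = c(0, 0.3, 0.7, 1),
  labels = c("weak", "moderate", "strong"),
  include.lowest = TRUE
))

# Correlations among predictors flag possible multicollinearity.
round(cor_mat[predictors, predictors], 2)
```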
15.4 4. Building Regression Models
You will now build models to predict county-level crime rates.
15.4.1 Required Models
- A simple regression model using the predictor you believe fits best
- At least two multiple regression models, such as:
  - Income + HighSchoolGrad
  - Income + UrbanPop
  - A full model with all predictors
- (Optional) Include an interaction term if theoretically justified
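The required models might be fitted with lm() roughly as follows. This is a sketch on simulated stand-in data; the choice of Income as the simple-model predictor and of Income by UrbanPop as the interaction are illustrative assumptions, not recommendations.

```r
# Simulated stand-in data (NOT real Florida data); use your prepared data instead.
set.seed(1)
n <- 30
crime <- data.frame(
  Income = rnorm(n, mean = 55, sd = 8),
  HighSchoolGrad = rnorm(n, mean = 87, sd = 4),
  UrbanPop = runif(n, min = 20, max = 95)
)
crime$Crime <- 60 - 0.5 * crime$Income + 0.1 * crime$UrbanPop + rnorm(n, sd = 3)

# Simple regression with one predictor (illustrative choice).
m_simple <- lm(Crime ~ Income, data = crime)

# Multiple regression models.
m_inc_hs  <- lm(Crime ~ Income + HighSchoolGrad, data = crime)
m_inc_urb <- lm(Crime ~ Income + UrbanPop, data = crime)
m_full    <- lm(Crime ~ Income + HighSchoolGrad + UrbanPop, data = crime)

# Optional interaction: Income * UrbanPop expands to both main effects
# plus the Income:UrbanPop product term.
m_inter <- lm(Crime ~ Income * UrbanPop, data = crime)

summary(m_full)
```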
15.4.2 Model Comparison
Compare models using:
- R²
- Adjusted R²
- AIC
Interpret:
- Which predictors are statistically significant
- Which model explains the most variance
- Which model best balances simplicity and explanatory power
Explain why Adjusted R² and AIC are useful when comparing multiple models.
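A comparison table can make the trade-off concrete. R² never decreases when a predictor is added to a nested least-squares model, whereas Adjusted R² and AIC both penalize extra parameters, so they can favor a smaller model. The sketch below assumes simulated stand-in data and hypothetical model labels.

```r
# Simulated stand-in data (NOT real Florida data); use your prepared data instead.
set.seed(1)
n <- 30
crime <- data.frame(
  Income = rnorm(n, mean = 55, sd = 8),
  HighSchoolGrad = rnorm(n, mean = 87, sd = 4),
  UrbanPop = runif(n, min = 20, max = 95)
)
crime$Crime <- 60 - 0.5 * crime$Income + 0.1 * crime$UrbanPop + rnorm(n, sd = 3)

# Candidate models (labels are illustrative).
models <- list(
  simple       = lm(Crime ~ Income, data = crime),
  income_hs    = lm(Crime ~ Income + HighSchoolGrad, data = crime),
  income_urban = lm(Crime ~ Income + UrbanPop, data = crime),
  full         = lm(Crime ~ Income + HighSchoolGrad + UrbanPop, data = crime)
)

# One row per model: higher R-squared / adjusted R-squared is better, lower AIC is better.
comparison <- data.frame(
  model  = names(models),
  r2     = sapply(models, function(m) summary(m)$r.squared),
  adj_r2 = sapply(models, function(m) summary(m)$adj.r.squared),
  aic    = sapply(models, AIC),
  row.names = NULL
)

# Sort so the model with the lowest AIC appears first.
comparison[order(comparison$aic), ]
```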
15.5 5. Executive Memo to the Chief of the Florida Police Department
Write a short professional memo addressed to:
Chief
Florida Police Department
Your memo must include:
- The best-performing model
- The most influential predictor(s)
- The proportion of variance explained (R²)
- Clear, plain-language interpretation
- One limitation of your analysis
- One actionable recommendation for resource allocation or prevention strategy
Your tone should be executive-level and professional.
16 Guidelines for Interpretation
When discussing results, always include:
- Direction — Positive or negative relationship
- Strength — Weak, moderate, or strong
- Significance — Statistically significant or not
- Plain language meaning
Example:
Counties with lower median income tend to have higher crime rates. For every $1,000 increase in median income, the predicted crime rate decreases by the estimated Income coefficient (in crimes per 1,000 residents), holding other factors constant.
Avoid technical jargon without explanation.
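Direction, size, and significance can be pulled from the fitted model rather than typed by hand. The sketch below is one way to do that on simulated stand-in data; the 0.05 significance threshold is an assumed convention, and the wording of the sentence is only an example.

```r
# Simulated stand-in data (NOT real Florida data); use your prepared data instead.
set.seed(1)
n <- 30
crime <- data.frame(
  Income = rnorm(n, mean = 55, sd = 8),
  UrbanPop = runif(n, min = 20, max = 95)
)
crime$Crime <- 60 - 0.5 * crime$Income + 0.1 * crime$UrbanPop + rnorm(n, sd = 3)

fit <- lm(Crime ~ Income + UrbanPop, data = crime)
coefs <- summary(fit)$coefficients

# Estimate and p-value for the Income coefficient.
est   <- coefs["Income", "Estimate"]
p_val <- coefs["Income", "Pr(>|t|)"]

direction <- if (est < 0) "decreases" else "increases"
significance <- if (p_val < 0.05) {
  "statistically significant"
} else {
  "not statistically significant"
}

# Income is measured in thousands, so one unit equals $1,000.
sentence <- sprintf(
  "For every $1,000 increase in median income, predicted crime %s by %.2f per 1,000 residents, holding urban population constant (%s).",
  direction, abs(est), significance
)
sentence
```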
17 Final Question
At the end of your report, include a short written response to:
Based on your analysis, which model best predicts Florida’s county-level crime rates, and why?
Be specific and reference your model comparison metrics.
18 Reflection
In 4–6 sentences, answer:
- Why is reproducibility especially important when working with public policy data?
- What did you learn about the difference between correlation and regression?
- How does model comparison improve decision-making?
19 Reproducibility Practice
This assignment emphasizes reproducible modeling and executive communication.
Your document must:
- Load data programmatically
- Avoid manual data manipulation outside the script
- Use clearly labeled code chunks
- Avoid hard-coding results in narrative
- Use inline R expressions where appropriate
- Suppress warnings/messages in the final published HTML
The goal is that another analyst could rerun your analysis and verify every claim.
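As an illustration of several of these requirements together, the body of a labeled code chunk might look like the sketch below. The chunk label and object names are hypothetical, and the data are simulated stand-ins. The #| lines are Quarto chunk options; the same warning and message settings can also be applied to the whole document through the execute options in the YAML header.

```r
#| label: full-model-fit
#| warning: false
#| message: false

# Simulated stand-in data (NOT real Florida data); use your prepared data instead.
set.seed(1)
n <- 30
crime <- data.frame(
  Income = rnorm(n, mean = 55, sd = 8),
  HighSchoolGrad = rnorm(n, mean = 87, sd = 4),
  UrbanPop = runif(n, min = 20, max = 95)
)
crime$Crime <- 60 - 0.5 * crime$Income + 0.1 * crime$UrbanPop + rnorm(n, sd = 3)

fit <- lm(Crime ~ Income + HighSchoolGrad + UrbanPop, data = crime)

# Store the value the narrative will cite, so it updates if the data change.
r2_full <- round(summary(fit)$r.squared, 2)

# In the narrative text, cite the stored value with an inline expression
# instead of hard-coding the number, for example:
#   The full model explains `r r2_full * 100`% of the variance in crime rates.
```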
20 Publishing Instructions
- Render your .qmd file to HTML.
- Confirm that warnings and messages do not appear in the final document.
- Submit both your .qmd file and your published .html report.