8 Assignment 7: NBA Analytics — Exploring Team Performance Through Reproducible Analysis
Functions • Correlation • Partial Correlation • Executive Communication
9 Overview
The NBA season is about to begin, and you have just been hired as a Data Analyst for the National Basketball Association. Commissioner Adam Silver wants insights into last season’s team performances — specifically:
- How offensive and defensive metrics relate to one another
- Whether teams from the Eastern and Western Conferences differ
- Whether team-level characteristics help explain performance patterns
You have been given an Excel workbook containing all 30 NBA teams, with each team stored on a separate sheet. Your job is to build a fully reproducible Quarto analysis that loads, cleans, visualizes, and analyzes the data.
Your final product should read like a professional analytics report prepared for an executive audience.
10 Learning Objectives
By completing this assignment, you will be able to:
- Write reusable functions in R for structured data ingestion
- Load multi-sheet Excel workbooks programmatically
- Engineer new performance metrics
- Merge datasets using lookup tables
- Conduct correlation and partial correlation analyses
- Interpret statistical relationships in plain language
- Communicate findings clearly to a non-technical executive audience
- Produce a fully reproducible Quarto document
11 Textbook Connection
This assignment builds directly on Chapter 7: Correlation in Reproducible Research Using R.
Review the chapter before you begin; it provides the conceptual foundation and the reproducible workflow demonstrated here.
12 Submission Instructions
Submit two files:
- Your .qmd file
- Your knitted .html file
Your document must:
- Knit without errors
- Contain clearly labeled sections
- Include all required visualizations
- Include all correlation analyses
- Include a professional memo to Adam Silver
- Use at least one inline R expression
13 Assignment Tasks
13.1 1. Data Ingest and Preparation
You must programmatically load all 30 teams from the Excel workbook.
13.1.1 Required Tasks
Create a function that:
- Loads a single sheet from the workbook
- Adds a new column identifying the Team (using the sheet name)
- Creates a binary column called Won_award (0 = No, 1 = Yes)
- Calculates the following metrics:
  - PRA = Points + Rebounds + Assists (offensive metric)
  - STOCKS = Steals + Blocks (defensive metric)
Then:
- Use excel_sheets() to list all sheet names
- Apply your function to every sheet using lapply()
- Combine all teams into one complete dataset using a row-binding method
13.1.2 Requirements
- Your code must not manually load sheets one-by-one
- The final dataset must include all 30 teams
- Include a checkpoint confirming the total number of rows
- Report the total number of observations using an inline R expression
Briefly explain why writing a function improves reproducibility.
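The ingest steps above can be sketched as follows. This is a minimal illustration, not the required solution: it builds a two-sheet toy workbook with writexl so the code runs on its own, and every column name (Points, Rebounds, Assists, Steals, Blocks, and especially the hypothetical Awards column used to derive Won_award) is an assumption about the real file. It also assumes every sheet in the workbook is a team sheet.

```r
library(readxl)   # excel_sheets(), read_excel()
library(writexl)  # used here only to build a toy workbook

# Toy two-sheet workbook standing in for the real 30-sheet file.
# Every column name and value below is an assumption for illustration.
toy_sheets <- list(
  TeamA = data.frame(
    Player = c("Player 1", "Player 2"),
    Points = c(20, 10), Rebounds = c(5, 8), Assists = c(7, 2),
    Steals = c(1.0, 0.5), Blocks = c(0.3, 1.2),
    Awards = c("All-Star", NA)
  ),
  TeamB = data.frame(
    Player = c("Player 3", "Player 4"),
    Points = c(15, 25), Rebounds = c(4, 9), Assists = c(3, 6),
    Steals = c(0.8, 1.4), Blocks = c(0.2, 0.9),
    Awards = c(NA, "MVP")
  )
)
workbook_path <- tempfile(fileext = ".xlsx")
write_xlsx(toy_sheets, workbook_path)

# Load one sheet and add the derived columns.
load_team <- function(path, sheet) {
  team <- as.data.frame(read_excel(path, sheet = sheet))
  team$Team <- sheet                                 # sheet name identifies the team
  team$Won_award <- as.integer(!is.na(team$Awards))  # 1 = has an award entry, 0 = none
  team$PRA <- team$Points + team$Rebounds + team$Assists
  team$STOCKS <- team$Steals + team$Blocks
  team
}

# Apply the function to every sheet, then row-bind the results.
sheet_names <- excel_sheets(workbook_path)
nba <- do.call(rbind, lapply(sheet_names, function(s) load_team(workbook_path, s)))

# Checkpoint: one team per sheet, and the total row count.
stopifnot(length(unique(nba$Team)) == length(sheet_names))
nrow(nba)
```

In the written report, the row count can then be reported with an inline expression such as `r nrow(nba)` rather than a typed number. dplyr::bind_rows() is a common alternative to do.call(rbind, ...) for the row-binding step.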
13.2 2. Adding Conference Information
You are provided with a Conference lookup sheet.
You must:
- Merge Conference information into your combined dataset
- Recode Conference as a binary variable for analysis:
  - East = 1
  - West = 0
Include a small checkpoint confirming that both conferences are represented correctly.
Briefly explain why binary coding is useful for correlation analysis.
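A minimal sketch of the merge and recode, using toy data frames in place of the combined dataset and the lookup sheet. The column names Team and Conference, the labels "East" and "West", and the new column name Conference_bin are assumptions; check them against the actual lookup sheet.

```r
# Toy stand-ins for the combined dataset and the Conference lookup sheet.
# Column names and labels are assumptions for illustration.
nba <- data.frame(
  Team = c("TeamA", "TeamA", "TeamB", "TeamB"),
  PRA = c(32, 20, 22, 40)
)
conference_lookup <- data.frame(
  Team = c("TeamA", "TeamB"),
  Conference = c("East", "West")
)

# Left join: keep every player row, attach the matching conference.
nba <- merge(nba, conference_lookup, by = "Team", all.x = TRUE)

# Binary recode: East = 1, West = 0.
nba$Conference_bin <- ifelse(nba$Conference == "East", 1L, 0L)

# Checkpoint: no unmatched teams, and each label maps to exactly one code.
stopifnot(!anyNA(nba$Conference))
table(nba$Conference, nba$Conference_bin)
```

A cross-tabulation like the one on the last line makes a miscoded or unmatched team visible at a glance, which is harder to spot by scrolling through rows.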
13.3 3. Visual Exploration
Create at least two visualizations using ggplot2.
13.3.1 Required Plot 1
- A scatterplot showing the relationship between PRA and STOCKS
- Points must be colored by Conference
13.3.2 Required Plot 2
- A second visualization of your choice
(Examples: distribution of PRA by Conference, Age vs. PRA, STOCKS vs. Age, etc.)
For each plot, write 3–4 sentences describing:
- What patterns you observe
- Whether offensive and defensive metrics appear related
- Whether conferences appear visually different
All plots must include:
- Clear titles
- Axis labels
- A caption
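As a rough illustration of Required Plot 1 with all three labeling elements, the sketch below uses simulated toy data. The data frame name players and the simulated values are assumptions; only the variable names PRA, STOCKS, and Conference come from the assignment.

```r
library(ggplot2)

set.seed(1)  # toy data only, for a reproducible illustration
players <- data.frame(
  PRA = runif(40, min = 5, max = 45),
  Conference = rep(c("East", "West"), each = 20)
)
players$STOCKS <- 0.5 + 0.04 * players$PRA + rnorm(40, sd = 0.4)

pra_stocks_plot <- ggplot(players, aes(x = PRA, y = STOCKS, color = Conference)) +
  geom_point(alpha = 0.8) +
  labs(
    title = "Offensive vs. defensive production",
    x = "PRA (points + rebounds + assists)",
    y = "STOCKS (steals + blocks)",
    color = "Conference",
    caption = "Toy data for illustration; replace with the combined dataset."
  ) +
  theme_minimal()

pra_stocks_plot
```

Setting the title, axis labels, and caption in a single labs() call keeps the labeling requirements in one place, so they are easy to check before submitting.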
13.4 4. Correlation Analysis
13.4.1 A. Point-Biserial Correlation
Test whether Conference (East = 1, West = 0) is related to:
- PRA
- STOCKS
Report:
- Correlation coefficient (r)
- Direction (positive or negative)
- Strength (weak, moderate, strong)
- Statistical significance
- Plain-language interpretation
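With a 0/1 variable, the point-biserial correlation is simply Pearson's r, so cor.test() is one way to obtain every item in the list above. The sketch below uses simulated toy data; the vector names and the simulated effect are assumptions. As a sanity check, it also shows that the p-value matches a pooled-variance two-sample t-test on the same data.

```r
set.seed(42)  # toy data only
conference_bin <- rep(c(1L, 0L), each = 30)  # East = 1, West = 0
PRA <- rnorm(60, mean = 20 + 2 * conference_bin, sd = 6)

# Pearson's r between a 0/1 variable and a numeric variable
# is the point-biserial correlation.
pb <- cor.test(conference_bin, PRA)

r_value <- unname(pb$estimate)
p_value <- pb$p.value
direction <- if (r_value >= 0) "positive" else "negative"

# Equivalent test: pooled-variance t-test comparing the two conferences.
tt <- t.test(PRA ~ factor(conference_bin), var.equal = TRUE)
stopifnot(isTRUE(all.equal(p_value, tt$p.value)))

round(c(r = r_value, p = p_value), 3)
```

Because East is coded 1, a positive r means Eastern Conference values tend to be higher, and a negative r means Western Conference values tend to be higher. The same call with STOCKS in place of PRA covers the second comparison.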
13.4.2 B. Correlation Matrix
Create a small correlation matrix including at least:
- Age
- PRA
- STOCKS
Visualize the matrix using ggcorrplot.
Then answer:
- Which relationships are strongest?
- Are offensive and defensive metrics strongly related?
- Does Age appear meaningfully associated with performance?
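A minimal sketch of the matrix and its visualization, again on simulated toy data (the data frame players and its values are assumptions). It uses cor() to build the matrix and ggcorrplot() with lab = TRUE to print the coefficients in each cell.

```r
library(ggcorrplot)

set.seed(7)  # toy data only
players <- data.frame(
  Age = round(runif(50, min = 19, max = 38)),
  PRA = runif(50, min = 5, max = 45)
)
players$STOCKS <- 0.5 + 0.04 * players$PRA + rnorm(50, sd = 0.4)

# Pairwise Pearson correlations, dropping rows with missing values.
corr_mat <- cor(players[, c("Age", "PRA", "STOCKS")], use = "complete.obs")
round(corr_mat, 2)

# Lower triangle only, with coefficients printed in the cells.
corr_plot <- ggcorrplot(
  corr_mat,
  type = "lower",
  lab = TRUE,
  title = "Correlations among Age, PRA, and STOCKS"
)
corr_plot
```

Showing only the lower triangle avoids repeating each pair twice, which keeps a three-variable matrix easy to read for a non-technical audience.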
13.4.3 C. Partial Correlation
Run a partial correlation examining the relationship between:
- PRA and STOCKS
- While controlling for either Age or Minutes Played
Interpret:
- How the relationship changes when controlling for the third variable
- Whether the association appears independent of that control variable
Explain in plain language what “controlling for” means.
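One way to see what "controlling for" does is to compute the partial correlation by hand: remove the part of PRA and the part of STOCKS that the control variable predicts, then correlate what is left. The base R sketch below does this on simulated toy data (the variable Minutes and all values are assumptions) and checks the result against the standard first-order formula. It is an illustration of the idea, not necessarily the method the textbook uses; packages such as ppcor provide pcor.test(), which also reports a p-value.

```r
set.seed(11)  # toy data only
n <- 80
Minutes <- runif(n, min = 8, max = 38)
PRA <- 0.9 * Minutes + rnorm(n, sd = 5)
STOCKS <- 0.05 * Minutes + rnorm(n, sd = 0.4)

# Zero-order correlation: ignores Minutes entirely.
r_zero <- cor(PRA, STOCKS)

# Residuals = the part of each variable NOT explained by Minutes.
pra_resid <- resid(lm(PRA ~ Minutes))
stocks_resid <- resid(lm(STOCKS ~ Minutes))

# Partial correlation: correlation between the two sets of residuals.
r_partial <- cor(pra_resid, stocks_resid)

# Cross-check with the first-order partial correlation formula.
r_xz <- cor(PRA, Minutes)
r_yz <- cor(STOCKS, Minutes)
r_formula <- (r_zero - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
stopifnot(isTRUE(all.equal(r_partial, r_formula)))

round(c(zero_order = r_zero, partial = r_partial), 3)
```

Comparing the two numbers side by side answers the first interpretation question directly: a large drop from the zero-order to the partial coefficient suggests the control variable accounts for much of the original association.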
13.5 5. Executive Memo to Adam Silver
Write a short professional memo addressed to:
Adam Silver
Commissioner, National Basketball Association
Your memo must include:
- A summary of your key findings
- Whether Eastern and Western Conference teams appear statistically different
- Whether offensive and defensive performances tend to move together
- One limitation of your analysis
- One recommended next step
Your tone should be professional and executive-level, not academic.
14 Guidelines for Interpretation
When interpreting correlations, always include:
- The direction (positive or negative)
- The strength (weak, moderate, strong)
- Whether it is statistically significant
- A plain-language explanation
Example of plain-language style:
As PRA increases, STOCKS also tends to increase, suggesting that teams with stronger offensive production may also contribute more defensive playmaking.
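A small helper can keep these four elements consistent across every correlation in the report. The function below is a hypothetical sketch: the name describe_correlation() is invented here, and the strength cutoffs (0.3 and 0.5) follow one common convention that may differ from the thresholds used in the textbook, so adjust them to match the course.

```r
# Hypothetical helper; the cutoffs 0.3 and 0.5 are an assumed convention.
describe_correlation <- function(r, p, alpha = 0.05) {
  direction <- if (r >= 0) "positive" else "negative"
  size <- abs(r)
  strength <- if (size >= 0.5) {
    "strong"
  } else if (size >= 0.3) {
    "moderate"
  } else {
    "weak"
  }
  significance <- if (p < alpha) {
    "statistically significant"
  } else {
    "not statistically significant"
  }
  sprintf(
    "a %s, %s relationship (r = %.2f) that is %s",
    strength, direction, r, significance
  )
}

describe_correlation(0.42, 0.003)
# "a moderate, positive relationship (r = 0.42) that is statistically significant"
```

Calling a helper like this from an inline R expression means the wording updates automatically if the data change, which avoids hard-coded values in the written interpretation.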
15 Reflection
In 4–6 sentences, answer:
- Why is a reproducible workflow especially important in sports analytics?
- How does programmatically loading data (rather than manual steps) improve reliability?
- What did you learn about interpreting correlations versus visual patterns?
16 Reproducibility Practice
This assignment emphasizes reproducible data pipelines.
Your document must:
- Load all sheets programmatically
- Avoid manual data entry
- Use clearly labeled code chunks
- Avoid hard-coding values in written interpretation
- Use inline R expressions where appropriate
- Render cleanly to HTML
The goal is that another analyst could rerun your entire workflow and verify your findings without guessing what steps were taken.
17 Publishing Instructions
- Render your .qmd file to HTML.
- Confirm that warnings and messages do not appear in the final document.
- Submit both your .qmd file and your published .html report.
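One way to keep warnings and messages out of the rendered report, assuming the document is rendered with the knitr engine, is to set chunk defaults once in a setup chunk near the top of the .qmd file. This is a sketch of one option; Quarto's document-level execute options (warning: false, message: false) in the YAML header are an alternative.

```r
# Place in a setup chunk near the top of the document.
# Applies to every later chunk unless a chunk overrides it.
knitr::opts_chunk$set(
  warning = FALSE,  # hide warnings in the rendered output
  message = FALSE   # hide package startup and other messages
)
```

Suppressing output only hides it from the reader, so it is still worth running the chunks interactively first and resolving any warnings that point to a real data problem.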