8  Assignment 7: NBA Analytics — Exploring Team Performance Through Reproducible Analysis

Functions • Correlation • Partial Correlation • Executive Communication

9 Overview

The NBA season is about to begin, and you have just been hired as a Data Analyst for the National Basketball Association. Commissioner Adam Silver wants insights into last season’s team performances — specifically:

  • How offensive and defensive metrics relate to one another
  • Whether teams from the Eastern and Western Conferences differ
  • Whether team-level characteristics help explain performance patterns

You have been given an Excel workbook containing all 30 NBA teams, with each team stored on a separate sheet. Your job is to build a fully reproducible Quarto analysis that loads, cleans, visualizes, and analyzes the data.

Your final product should read like a professional analytics report prepared for an executive audience.


10 Learning Objectives

By completing this assignment, you will be able to:

  • Write reusable functions in R for structured data ingestion
  • Load multi-sheet Excel workbooks programmatically
  • Engineer new performance metrics
  • Merge datasets using lookup tables
  • Conduct correlation and partial correlation analyses
  • Interpret statistical relationships in plain language
  • Communicate findings clearly to a non-technical executive audience
  • Produce a fully reproducible Quarto document

11 Textbook Connection

This assignment builds directly on Chapter 7: Correlation in Reproducible Research Using R.

Review the chapter before you begin; it provides the conceptual foundation and the reproducible workflow this assignment follows.


12 Submission Instructions

Submit two files:

  1. Your .qmd file
  2. Your rendered .html file

Your document must:

  • Render without errors
  • Contain clearly labeled sections
  • Include all required visualizations
  • Include all correlation analyses
  • Include a professional memo to Adam Silver
  • Use at least one inline R expression

13 Assignment Tasks


13.1 1. Data Ingest and Preparation

You must programmatically load all 30 teams from the Excel workbook.

13.1.1 Required Tasks

Create a function that:

  • Loads a single sheet from the workbook

  • Adds a new column identifying the Team (using the sheet name)

  • Creates a binary column called Won_award (0 = No, 1 = Yes)

  • Calculates the following metrics:

    • PRA = Points + Rebounds + Assists (offensive metric)
    • STOCKS = Steals + Blocks (defensive metric)

Then:

  • Use excel_sheets() to list all sheet names
  • Apply your function to every sheet using lapply()
  • Combine all teams into one complete dataset using a row-binding method
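
The steps above can be sketched roughly as follows. This is a minimal sketch rather than a required solution. The file name `nba_teams.xlsx`, the raw column names (`Points`, `Rebounds`, `Assists`, `Steals`, `Blocks`), the `Awards` column used to derive Won_award, and both helper names are assumptions; replace them with whatever your workbook actually contains.

```r
library(dplyr)

# Hypothetical helper: add the required columns to one team's data frame.
# Assumes columns Points, Rebounds, Assists, Steals, Blocks, and Awards exist.
add_team_metrics <- function(df, team) {
  df |>
    mutate(
      Team      = team,
      # Assumption: Awards is text, and NA or "" means no award
      Won_award = as.integer(!is.na(Awards) & Awards != ""),
      PRA       = Points + Rebounds + Assists,
      STOCKS    = Steals + Blocks
    )
}

# Load a single sheet and label it with the sheet name
load_team <- function(path, sheet) {
  readxl::read_excel(path, sheet = sheet) |>
    add_team_metrics(team = sheet)
}

path <- "nba_teams.xlsx"  # hypothetical file name

if (file.exists(path)) {
  team_sheets <- readxl::excel_sheets(path)
  # If the Conference lookup lives in the same workbook, drop that sheet
  # from team_sheets before this step.
  nba <- lapply(team_sheets, load_team, path = path) |>
    bind_rows()
}
```

Separating the metric calculation from the file reading keeps each piece small enough to check on a toy data frame before running it across all sheets.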

13.1.2 Requirements

  • Your code must not manually load sheets one-by-one
  • The final dataset must include all 30 teams
  • Include a checkpoint confirming the total number of rows
  • Report the total number of observations using an inline R expression
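
One possible shape for the checkpoint is sketched below. It uses a toy stand-in for the combined dataset (30 made-up teams with two rows each), so the object names and values are illustrative only. In your report text, an inline expression such as `r n_obs` would then insert the row count without typing the number by hand.

```r
# Toy stand-in for the combined dataset: 30 hypothetical teams, 2 rows each
nba <- data.frame(
  Team = rep(paste("Team", 1:30), each = 2),
  PRA  = seq_len(60)
)

n_teams <- length(unique(nba$Team))
n_obs   <- nrow(nba)

# Checkpoint: halt rendering if any team failed to load
stopifnot(n_teams == 30)

cat("Teams loaded:", n_teams, "| Total rows:", n_obs, "\n")
```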

Briefly explain why writing a function improves reproducibility.


13.2 2. Adding Conference Information

You are provided with a Conference lookup sheet.

You must:

  • Merge Conference information into your combined dataset

  • Recode Conference as a binary variable for analysis:

    • East = 1
    • West = 0

Include a small checkpoint confirming that both conferences are represented correctly.
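
A minimal sketch of the merge, recode, and checkpoint is shown below on toy data. The lookup column names (`Team`, `Conference`) and the new column name `Conference_bin` are assumptions; keeping the text label alongside the binary version is a design choice that makes plots easier to read later.

```r
library(dplyr)

# Toy stand-ins for the combined dataset and the lookup sheet
nba <- data.frame(
  Team = c("Team A", "Team A", "Team B", "Team C"),
  PRA  = c(25, 18, 30, 22)
)
conference_lookup <- data.frame(
  Team       = c("Team A", "Team B", "Team C"),
  Conference = c("East", "West", "East")
)

nba <- nba |>
  left_join(conference_lookup, by = "Team") |>
  mutate(Conference_bin = if_else(Conference == "East", 1L, 0L))

# Checkpoint: every row matched a conference, and the coding lines up
stopifnot(!anyNA(nba$Conference))
table(nba$Conference, nba$Conference_bin)
```

A left join keeps every row of the combined dataset, so an unmatched team name shows up as a missing Conference instead of silently disappearing.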

Briefly explain why binary coding is useful for correlation analysis.


13.3 3. Visual Exploration

Create at least two visualizations using ggplot2.

13.3.1 Required Plot 1

  • A scatterplot showing the relationship between PRA and STOCKS
  • Points must be colored by Conference

13.3.2 Required Plot 2

  • A second visualization of your choice
    (Examples: distribution of PRA by Conference, Age vs. PRA, STOCKS vs. Age)

For each plot, write 3–4 sentences describing:

  • What patterns you observe
  • Whether offensive and defensive metrics appear related
  • Whether conferences appear visually different

All plots must include:

  • Clear titles
  • Axis labels
  • A caption
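
As an illustration of Required Plot 1 with a title, axis labels, and a caption, here is a sketch on simulated data. The data frame, its values, and the wording of the labels are placeholders, not results.

```r
library(ggplot2)

set.seed(7)
# Simulated stand-in data; replace with your combined dataset
toy <- data.frame(
  PRA        = runif(40, 5, 45),
  Conference = rep(c("East", "West"), each = 20)
)
toy$STOCKS <- 0.5 + 0.05 * toy$PRA + rnorm(40, sd = 0.5)

pra_stocks_plot <- ggplot(toy, aes(x = PRA, y = STOCKS, color = Conference)) +
  geom_point(alpha = 0.7) +
  labs(
    title   = "Offensive vs. defensive production",
    x       = "PRA (points + rebounds + assists)",
    y       = "STOCKS (steals + blocks)",
    color   = "Conference",
    caption = "Simulated data for illustration only"
  ) +
  theme_minimal()

pra_stocks_plot
```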

13.4 4. Correlation Analysis


13.4.1 A. Point-Biserial Correlation

Test whether Conference (East = 1, West = 0) is related to:

  • PRA
  • STOCKS

Report:

  • Correlation coefficient (r)
  • Direction (positive or negative)
  • Strength (weak, moderate, strong)
  • Statistical significance
  • Plain-language interpretation
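
A point-biserial correlation is a Pearson correlation in which one variable is coded 0/1, so base R's `cor.test()` is enough. The sketch below uses simulated data and an assumed column name `Conference_bin`; it also shows that the p-value matches a pooled-variance two-sample t-test, which can help when explaining the result in plain language.

```r
set.seed(7)
# Simulated stand-in data; Conference_bin uses East = 1, West = 0
toy <- data.frame(
  Conference_bin = rep(c(1, 0), each = 20),
  PRA            = rnorm(40, mean = 20, sd = 6)
)

# Pearson correlation with a binary variable = point-biserial correlation
pb <- cor.test(toy$PRA, toy$Conference_bin)

r_pb <- unname(pb$estimate)  # correlation coefficient
p_pb <- pb$p.value           # significance

# Same question asked as a difference in means (equal variances assumed)
tt <- t.test(PRA ~ Conference_bin, data = toy, var.equal = TRUE)

c(r = r_pb, p_cor = p_pb, p_ttest = tt$p.value)
```

Repeat the same call with STOCKS in place of PRA for the second test.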

13.4.2 B. Correlation Matrix

Create a small correlation matrix including at least:

  • Age
  • PRA
  • STOCKS

Visualize the matrix using ggcorrplot.

Then answer:

  • Which relationships are strongest?
  • Are offensive and defensive metrics strongly related?
  • Does Age appear meaningfully associated with performance?
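
A sketch of building and plotting a small correlation matrix follows. The data are simulated and the plot title is a placeholder; the arguments shown for `ggcorrplot()` (`type`, `lab`, `title`) are only one reasonable configuration.

```r
library(ggcorrplot)

set.seed(7)
# Simulated stand-in data; replace with your combined dataset
toy <- data.frame(
  Age = round(runif(40, 19, 38)),
  PRA = rnorm(40, mean = 20, sd = 6)
)
toy$STOCKS <- 0.5 + 0.05 * toy$PRA + rnorm(40, sd = 0.5)

# Pairwise Pearson correlations among the three variables
corr_mat <- cor(toy[, c("Age", "PRA", "STOCKS")],
                use = "pairwise.complete.obs")
round(corr_mat, 2)

corr_plot <- ggcorrplot(
  corr_mat,
  type  = "lower",  # show each pair once
  lab   = TRUE,     # print the coefficients in the cells
  title = "Correlations among Age, PRA, and STOCKS"
)
corr_plot
```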

13.4.3 C. Partial Correlation

Run a partial correlation examining the relationship between:

  • PRA and STOCKS
  • While controlling for either Age or Minutes Played

Interpret:

  • How the relationship changes when controlling for the third variable
  • Whether the association appears independent of that control variable

Explain in plain language what “controlling for” means.
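
To make "controlling for" concrete, the sketch below computes a partial correlation two equivalent ways in base R: by correlating the residuals left after regressing each variable on the control, and by the closed-form formula. The data are simulated and the column name `Minutes` is an assumption. This is an illustration of the idea, not necessarily the method used in the textbook; a packaged option such as `ppcor::pcor.test()` also reports a p-value.

```r
set.seed(7)
n <- 60
# Simulated stand-in data in which Minutes drives both metrics
toy <- data.frame(Minutes = runif(n, 5, 38))
toy$PRA    <- 0.8  * toy$Minutes + rnorm(n, sd = 4)
toy$STOCKS <- 0.05 * toy$Minutes + rnorm(n, sd = 0.5)

# Zero-order (ordinary) correlation
r_zero <- cor(toy$PRA, toy$STOCKS)

# Residual method: remove what Minutes explains, then correlate the leftovers
pra_resid    <- resid(lm(PRA ~ Minutes, data = toy))
stocks_resid <- resid(lm(STOCKS ~ Minutes, data = toy))
r_partial    <- cor(pra_resid, stocks_resid)

# Closed-form first-order partial correlation
r_xz <- cor(toy$PRA, toy$Minutes)
r_yz <- cor(toy$STOCKS, toy$Minutes)
r_formula <- (r_zero - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))

round(c(zero_order = r_zero, partial = r_partial, formula = r_formula), 3)
```

Comparing the zero-order and partial values shows how much of the PRA and STOCKS relationship is shared with the control variable.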


13.5 5. Executive Memo to Adam Silver

Write a short professional memo addressed to:

Adam Silver
Commissioner, National Basketball Association

Your memo must include:

  • A summary of your key findings
  • Whether Eastern and Western Conference teams appear statistically different
  • Whether offensive and defensive performance tend to move together
  • One limitation of your analysis
  • One recommended next step

Your tone should be professional and executive-level, not academic.


14 Guidelines for Interpretation

When interpreting correlations, always include:

  • The direction (positive or negative)
  • The strength (weak, moderate, strong)
  • Whether it is statistically significant
  • A plain-language explanation

Example of plain-language style:

As PRA increases, STOCKS also tends to increase, suggesting that teams with stronger offensive production may also contribute more defensive playmaking.
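
One way to keep written interpretations tied to computed values is a small helper like the hypothetical `describe_r()` below. The strength cutoffs (0.3 and 0.5) and the 0.05 significance level are assumptions chosen for illustration; use the thresholds given in your textbook or by your instructor.

```r
# Hypothetical helper: turn r and p into a reusable phrase.
# Cutoffs are assumptions: |r| < 0.3 weak, < 0.5 moderate, otherwise strong.
describe_r <- function(r, p, alpha = 0.05) {
  strength  <- if (abs(r) < 0.3) "weak" else if (abs(r) < 0.5) "moderate" else "strong"
  direction <- if (r >= 0) "positive" else "negative"
  sig       <- if (p < alpha) "statistically significant" else "not statistically significant"
  sprintf("a %s, %s correlation (r = %.2f, p = %.3f; %s)",
          strength, direction, r, p, sig)
}

describe_r(0.42, 0.003)
describe_r(-0.12, 0.410)
```

Called from an inline R expression, the phrase updates automatically if the data change.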


15 Reflection

In 4–6 sentences, answer:

  1. Why is a reproducible workflow especially important in sports analytics?
  2. How does programmatically loading data (rather than manual steps) improve reliability?
  3. What did you learn about interpreting correlations versus visual patterns?

16 Reproducibility Practice

This assignment emphasizes reproducible data pipelines.

Your document must:

  • Load all sheets programmatically
  • Avoid manual data entry
  • Use clearly labeled code chunks
  • Avoid hard-coding values in written interpretation
  • Use inline R expressions where appropriate
  • Render cleanly to HTML

The goal is that another analyst could rerun your entire workflow and verify your findings without guessing what steps were taken.


17 Publishing Instructions

  1. Render your .qmd file to HTML.
  2. Confirm that warnings and messages do not appear in the final document.
  3. Submit both your .qmd file and your rendered .html report.
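
For step 2, one common approach is to set chunk options once in a setup chunk near the top of the document. The sketch below assumes the document is rendered with the knitr engine, which is typical for .qmd files containing R code.

```r
# Suppress warnings and package startup messages in the rendered output.
# Assumes the knitr engine; place this in the first code chunk.
knitr::opts_chunk$set(
  warning = FALSE,
  message = FALSE
)
```

Suppressing output hides warnings but does not fix them, so read them during development before turning them off.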