Exploring NFL Stats with Data Science

Oct 9, 2024

Lecture Notes: Merging Football and Data Science

Introduction

  • Combining interests in football and data science.
  • Building a simple web application to explore NFL player stats data.

Initial Steps

  1. Open your web browser.
  2. Navigate to the 2019 NFL season page on Pro Football Reference.
  3. Access "Team Stats and Standings" > "Player Stats" > "Rushing".
  4. Scrape the rushing table from that page.

Setting Up the Environment

  • If you use Conda, activate the environment that holds the required libraries and dependencies.
  • Use a web application framework (like Streamlit) and a code editor (like Atom).

Web Application Overview

  • Side Panel Inputs:
    • Year: Default is 2019; can be adjusted from 1990 onwards.
    • Teams: Extracted from data frame.
    • Position: Extracted from POS column.
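
The side panel described above could be sketched roughly as follows. This is a minimal illustration, not the lecture's exact code: the function names year_options and render_sidebar and the widget labels are assumptions, and Streamlit is imported inside the function only so that the year helper can be tried without Streamlit installed.

```python
def year_options(first=1990, last=2019):
    """Selectable seasons, newest first, so the default (first) entry is 2019."""
    return list(reversed(range(first, last + 1)))


def render_sidebar(teams, positions):
    """Draw the sidebar widgets and return the user's selections.

    `teams` and `positions` are the sorted unique values taken from the
    scraped data frame (hypothetical arguments for this sketch).
    """
    import streamlit as st  # lazy import: only needed when the app actually runs

    st.sidebar.header("User Input Features")
    # selectbox defaults to the first option, i.e. the most recent season
    year = st.sidebar.selectbox("Year", year_options())
    # every team and position is pre-selected by default
    selected_teams = st.sidebar.multiselect("Team", teams, teams)
    selected_positions = st.sidebar.multiselect("Position", positions, positions)
    return year, selected_teams, selected_positions
```

Inside an app started with `streamlit run`, calling render_sidebar(sorted_teams, sorted_positions) would return the three selections on every rerun.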

Data Cleaning

  • The application shows 117 "clean" rows out of 344 total.
  • Data cleaning has not been performed yet; it is suggested as a side project.

Web Application Features

  • Intercorrelation Heat Map: Visualizes relationships between variables.

Code Walkthrough

Importing Libraries

  • Streamlit: For building web applications.
  • Pandas: For data manipulation.
  • Base64: For encoding/decoding CSV downloads.
  • Matplotlib & Seaborn: For plotting histograms and heatmaps.
  • NumPy: Utilized in histogram creation.

Application Title

  • Line 8: Title - "NFL Football Stats Rushing Explorer"

Application Functionality

  • Lines 10-14: Explanation of the app and libraries used.
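
A title and description block of this kind might look like the sketch below. The title string comes from the notes; the description wording and the names APP_TITLE, APP_DESCRIPTION and render_header are assumptions made for illustration.

```python
APP_TITLE = "NFL Football Stats Rushing Explorer"

# Assumed wording; the lecture's actual description text is not reproduced here.
APP_DESCRIPTION = """
This app scrapes NFL player rushing stats and lets you explore them.
* **Python libraries:** base64, pandas, streamlit, numpy, matplotlib, seaborn
* **Data source:** Pro Football Reference
"""


def render_header():
    """Write the page title and a short Markdown description."""
    import streamlit as st  # lazy import: only needed when the app actually runs

    st.title(APP_TITLE)
    st.markdown(APP_DESCRIPTION)
```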

Data Loading

  • URL setup for web scraping from pro-football-reference.com.
  • The selected year is inserted into the URL programmatically; the scraping itself takes one line of Pandas.
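
The loading step might be sketched as follows. The URL pattern is an assumption based on the site's rushing pages, and the function names are hypothetical. pandas.read_html needs an HTML parser such as lxml installed, and the site may throttle or reject automated requests, so treat this as a starting point rather than a guaranteed recipe.

```python
import pandas as pd

# Assumed URL pattern for a season's rushing table.
BASE_URL = "https://www.pro-football-reference.com/years/{year}/rushing.htm"


def rushing_url(year: int) -> str:
    """Build the page URL for the given season."""
    return BASE_URL.format(year=year)


def load_raw_rushing(year: int) -> pd.DataFrame:
    """Scrape the first HTML table on the season's rushing page."""
    # header=1 uses the second header row as column names, skipping the
    # grouped header row that sits above it on the page
    tables = pd.read_html(rushing_url(year), header=1)
    return tables[0]
```

In a Streamlit app, decorating load_raw_rushing with st.cache_data would avoid re-downloading the page on every rerun.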

Data Preprocessing

  • Dropping repeated header rows and redundant columns.
  • Assigning the cleaned data to the player_stats variable.
  • Sorting the unique team values.
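
As a rough illustration of these preprocessing steps, the sketch below works on a tiny made-up frame rather than real scraped data. The column names Rk, Tm, Age and Pos are assumptions about the table layout, and clean_rushing is a hypothetical helper.

```python
import pandas as pd


def clean_rushing(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop repeated header rows and the rank column from the scraped table."""
    # Long HTML tables repeat their header inside the body; those rows
    # contain the column name itself (e.g. "Age") instead of a value.
    cleaned = raw[raw["Age"] != "Age"]
    # The running rank column carries no information of its own.
    cleaned = cleaned.drop(columns=["Rk"])
    return cleaned.reset_index(drop=True)


# Made-up rows for illustration only (not real player data).
raw = pd.DataFrame({
    "Rk": ["1", "Rk", "2"],
    "Player": ["Player A", "Player", "Player B"],
    "Tm": ["TEN", "Tm", "CAR"],
    "Age": ["25", "Age", "23"],
    "Pos": ["RB", "Pos", "QB"],
})

player_stats = clean_rushing(raw)
sorted_teams = sorted(player_stats["Tm"].unique())
```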

User Input Features

  • Team and position selectors are implemented in the sidebar.
  • The unique values of each column are sorted and offered as the selectable options.
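
Once the user has picked teams and positions, the data frame can be narrowed with a boolean mask. The sketch below assumes the columns are named Tm and Pos, and filter_players is a hypothetical helper; the sample rows are made up.

```python
import pandas as pd


def filter_players(df: pd.DataFrame, teams, positions) -> pd.DataFrame:
    """Keep only rows whose team AND position are among the selections."""
    mask = df["Tm"].isin(teams) & df["Pos"].isin(positions)
    return df[mask]


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C"],
    "Tm": ["TEN", "CAR", "TEN"],
    "Pos": ["RB", "RB", "QB"],
})

selected = filter_players(sample, teams=["TEN"], positions=["RB"])
```

Using isin keeps the filter correct for any number of selected values, including an empty selection, which simply yields an empty frame.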

Data Display

  • Lines 43-45: Display filtered player stats.
  • Line 45: Shows the data frame.

Download Feature

  • Lines 47-55: Allows CSV download of data.
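
One common way to offer a CSV download with base64 is to embed the encoded file in an HTML link. The sketch below shows that idea; the function name csv_download_link, the default file name and the link text are assumptions.

```python
import base64

import pandas as pd


def csv_download_link(df: pd.DataFrame, filename: str = "playerstats.csv") -> str:
    """Return an HTML anchor whose href embeds the frame as base64-encoded CSV."""
    csv_text = df.to_csv(index=False)
    # base64 works on bytes: encode the text, then decode the result to str
    b64 = base64.b64encode(csv_text.encode()).decode()
    return (
        f'<a href="data:file/csv;base64,{b64}" download="{filename}">'
        "Download CSV File</a>"
    )
```

In the app, the link would be rendered with st.markdown(csv_download_link(df), unsafe_allow_html=True). Recent Streamlit versions also provide st.download_button, which removes the need for manual encoding.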

Heat Map

  • Final code block creates the intercorrelation heat map.
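
A heat map of this kind might be produced as sketched below. The helper names are hypothetical, and masking the upper triangle is one common presentation choice rather than a confirmed detail of the lecture's code. Scraped columns often arrive as text, so the sketch converts them to numbers first.

```python
import numpy as np
import pandas as pd


def correlation_and_mask(df: pd.DataFrame):
    """Return the correlation matrix of the numeric columns and a triangle mask."""
    # Coerce text to numbers; columns that are entirely non-numeric
    # (e.g. player names) become all-NaN and are dropped.
    numeric = df.apply(pd.to_numeric, errors="coerce").dropna(axis=1, how="all")
    corr = numeric.corr()
    # True marks cells to hide: the diagonal and everything above it,
    # since the matrix is symmetric.
    mask = np.triu(np.ones(corr.shape, dtype=bool))
    return corr, mask


def plot_heatmap(corr: pd.DataFrame, mask: np.ndarray):
    """Draw the masked correlation heat map and return the figure."""
    import matplotlib.pyplot as plt  # lazy imports: only needed for plotting
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(7, 5))
    sns.heatmap(corr, mask=mask, vmax=1, square=True, ax=ax)
    return fig
```

In the app, the returned figure would be shown with st.pyplot(fig).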

Conclusion

  • The application is built with under 70 lines of code.
  • Encouragement to like and subscribe on YouTube.
  • Emphasis on learning data science through practical application.