Transcript for:
Exploring NFL Stats with Data Science

Do you like football? Do you like data science? If you answered yes to both, then this video is for you. Today we're going to talk about how you can merge football and data science. And without further ado, we're starting right now.

So in this video, we're going to cover how you can build your very own simple web application for exploring NFL player stats data. Okay, so let's get started. The first thing you want to do is fire up Google Chrome or your internet browser of choice and head to pro-football-reference.com. Then you want to click on the Seasons tab and go to 2019 NFL.

Click on Team Stats and Standings, and then click on Player Stats. Then, under Standard, you want to click on Rushing. Please note that the data we're going to be using today will be based on the rushing data.

And if you would like to use other stat categories, then please feel free to play around with them. So you want to click on Rushing, and we're going to be taken to this page here.

So you're going to see the player stats right here, and we're going to be scraping the data from this website. Let us copy this URL. All right, and now we're going to fire up the terminal.

And as always, I'll be activating my conda environment. So on your own computer, if you have conda installed and you have a conda environment, you can activate your own. But if you don't have any, then you don't have to do anything. I like to use conda because it allows me to keep all of the libraries, packages, and dependencies in a self-contained manner, so that they don't interfere with the other projects that I'm working on on the same computer. So I'm going to the Desktop here.

Then into the Streamlit folder, and into the football folder. All right, and so that's the file that we have: FootballApp.py. Okay, and so we're going to take a look at the code in Atom, and let me also fire up the web app as well.

Okay, and so this is the web application that you're seeing here. On the left, you're going to see a collapsible side panel, and the side panel has a total of three input sections. The first one takes the user input for which year you want the data to be from; here we're using a default of 2019, and if you click the dropdown, you'll notice that it starts from 1990, so we can change the year, and I'll be showing you that in just a moment. The teams are detected directly from the data frame here, and the position is taken from the Pos column here. Okay.

Notice that for the web app I'm demonstrating today, we have not yet done any data cleaning, and so we're using only the complete data (rows without missing values). So you're going to see that there are a total of 117 rows here.

So I'll be leaving it to you as a hobby project to clean the data, and see how much bigger the data set becomes. Let me show you what the raw data looks like. Okay, so that's the raw data, and as you can see, there are a total of about 344 rows here, versus 117 for the complete data.
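If you do take on that project, here is a minimal sketch of two possible approaches, not the app's actual code, assuming the scraped table is held in a pandas DataFrame named playerstats (the variable name used later in the walkthrough):

```python
# Minimal sketch, assuming a pandas DataFrame named playerstats already exists.

# Option 1: keep only the complete cases (rows with no missing values)
complete_df = playerstats.dropna()

# Option 2: keep every row by filling in missing values, for example with 0
filled_df = playerstats.fillna(0)
```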

Okay, so the one that we have here is the clean subset. Let me hide this again. Okay, and let's take a further look at the functionality of the web app that we have here. If you click on the intercorrelation heat map, you're going to see the intercorrelation of the variables here.

Okay, so let's go through a line-by-line explanation of the code here. The first six lines of code import the necessary libraries that we're going to be using today. The first one is streamlit, because it allows us to build, essentially, this web app.

And then we import pandas as pd because of the data frame that we're using here. We're also going to be using base64 for the functionality to download the data as a CSV file; it will essentially handle the encoding and decoding. And here we're going to make use of matplotlib.pyplot as plt and also import seaborn as sns.

We're going to use both of them together to make the heat map plot that we see right here. And as you can see, we're also using NumPy in the creation of this heat map.
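For reference, those first six import lines would look something like this (the comments are my own annotations of how each library is used in the walkthrough):

```python
import streamlit as st             # the web app framework
import pandas as pd                # data frames and web scraping via pd.read_html
import base64                      # encoding/decoding for the CSV download link
import matplotlib.pyplot as plt    # plotting backend for the heat map
import seaborn as sns              # draws the intercorrelation heat map
import numpy as np                 # used when constructing the heat map (e.g., the mask)
```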

All right, and so now let's move on to line number 8. Line 8 here is the title of the web app, and the title is NFL Football Stats Rushing Explorer. Lines 10 through 14 are right here: the description of the web app.

This app performs simple web scraping of NFL football player stats data, focusing on rushing. And the Python libraries that we're using are listed right here. Actually, that list is not yet complete:

we also have NumPy, matplotlib, and seaborn. And the data source is pro-football-reference.com. Notice that a couple of minutes ago, I showed you how I copied this URL link, okay, this link at the top here. All right, and then I pasted it right here as a reference.
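Put together, the title and the description block might look roughly like the following; the exact markdown wording is approximated from what's shown on screen:

```python
st.title('NFL Football Stats Rushing Explorer')

st.markdown("""
This app performs simple web scraping of NFL football player stats data, focusing on rushing.
* **Python libraries:** base64, pandas, streamlit, numpy, matplotlib, seaborn
* **Data source:** [pro-football-reference.com](https://www.pro-football-reference.com/)
""")
```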

So in the load_data function, notice that this is the URL that we are going to web scrape the data from. And notice that here we're setting the year in the range of 1990 to 2020. In the URL, you have the first component, which is essentially the first segment right here. The second segment is the year, which we build programmatically by inserting the string of the selected year. And then we append /rushing.htm right here, with the forward slash. So essentially, this gives us this URL right here if 2019 is selected.

But if we select 2018, then this becomes 2018, and the data will be updated accordingly. And here we set the header to be 1 in order to retrieve the data set correctly. So this is the function

for doing the web scraping using the pandas library. As you can see, the web scraping is done in only one line of code here, and then the other lines essentially pre-process the data set. A rough sketch of the whole function, together with the year selector, is shown below.
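This sketch is not necessarily line-for-line what's in FootballApp.py. The @st.cache decorator and the specific clean-up steps are assumptions on my part; the essential pieces from the walkthrough are the 1990–2020 year range, the URL built from the selected year plus /rushing.htm, the header=1 argument, and the single pd.read_html call:

```python
# Sidebar: year selection (1990 through 2019, defaulting to the most recent year)
st.sidebar.header('User Input Features')
selected_year = st.sidebar.selectbox('Year', list(reversed(range(1990, 2020))))

# Web scraping of NFL player stats (rushing) from pro-football-reference.com
@st.cache
def load_data(year):
    url = "https://www.pro-football-reference.com/years/" + str(year) + "/rushing.htm"
    html = pd.read_html(url, header=1)          # header=1 skips the extra top header row
    df = html[0]
    # Pre-processing (sketch): drop repeated header rows and unneeded columns
    raw = df.drop(df[df.Age == 'Age'].index)    # repeated header rows appear as data rows
    playerstats = raw.drop(['Rk'], axis=1)      # the rank column isn't needed (assumed)
    return playerstats

playerstats = load_data(selected_year)
```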

By pre-processing, I mean that we're dropping some redundant headers, as well as some repetitive columns and values, et cetera. And then finally, we're assigning the result of load_data to the playerstats variable. Then we're sorting the teams, and you're going to see the sorted teams right here in the user input features, which is lines 33 and 34. So after we sort the teams, we show only the unique values, because you're going to see that in the raw data, the team values are repetitive.

So here we're sorting according to the unique values, and we're going to see only unique team names here, in sorted order. All right, and then lines 36 through 38 are right here: the position.

So here we have set it to be RB, QB, WR, FB, and TE. Okay, and we're also using a multiselect, meaning that you can select multiple values at the same time, or you can clear it and then select the ones that you like, adding the values one by one. A sketch of both the team and position selectors is shown below.
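This sketch assumes the scraped table names its team and position columns Tm and Pos (Pos is the column mentioned earlier):

```python
# Sidebar: team selection, built from the unique, sorted team names in the data
sorted_unique_team = sorted(playerstats.Tm.unique())
selected_team = st.sidebar.multiselect('Team', sorted_unique_team, sorted_unique_team)

# Sidebar: position selection (all positions selected by default)
unique_pos = ['RB', 'QB', 'WR', 'FB', 'TE']
selected_pos = st.sidebar.multiselect('Position', unique_pos, unique_pos)
```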

Okay, and then lines 40 and 41: here, we're filtering the data based on the sidebar input for the team selection and the position selection. So line 41 essentially filters the data frame that we're seeing right here, based on our input selection of team and position. In lines 43 through 45, we're displaying the header called Display Player Stats of Selected Team right here, as well as the data dimensions in normal text underneath the header, and then the actual data frame is displayed by line 45 right here; a sketch of this filtering and display step is shown below.
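Here is that sketch; the header text and dimension message are approximations of what appears in the app:

```python
# Filter the data based on the sidebar selections of team and position
df_selected_team = playerstats[
    (playerstats.Tm.isin(selected_team)) & (playerstats.Pos.isin(selected_pos))
]

# Display the header, the data dimensions, and the filtered data frame itself
st.header('Display Player Stats of Selected Team(s)')
st.write('Data Dimension: ' + str(df_selected_team.shape[0]) + ' rows and '
         + str(df_selected_team.shape[1]) + ' columns.')
st.dataframe(df_selected_team)
```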

Okay, and then lines 47 through 55 allow us to download this data frame as a CSV file. So let's download it. And as you'll recall, we're using the base64 library in order to perform the encoding and decoding of the data. So you're going to see that the data is downloaded into this file.
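That download link comes from a small helper that converts the data frame to CSV text, base64-encodes it, and wraps it in an HTML link that Streamlit renders. As a sketch (the function name filedownload and the file name playerstats.csv are my own placeholders):

```python
def filedownload(df):
    csv = df.to_csv(index=False)
    b64 = base64.b64encode(csv.encode()).decode()   # str -> bytes -> base64 bytes -> str
    href = f'<a href="data:file/csv;base64,{b64}" download="playerstats.csv">Download CSV File</a>'
    return href

# Render the link; unsafe_allow_html lets the raw <a> tag through
st.markdown(filedownload(df_selected_team), unsafe_allow_html=True)
```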

And then the remaining part of the code essentially allows us to make the heat map, shown right here. If you click on the Intercorrelation Heatmap button, you will see this heat map of the intercorrelation between the variables, and this block of code is what builds it.
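As a closing sketch, that heat map block could look roughly like this; converting the columns to numeric and masking the upper triangle are my own assumptions about the details, but the core idea is to compute the correlation matrix of the filtered data and draw it with seaborn on a matplotlib figure when the button is clicked:

```python
if st.button('Intercorrelation Heatmap'):
    st.header('Intercorrelation Matrix Heatmap')
    # Keep only the columns that are numeric after coercion, then correlate them
    numeric_df = df_selected_team.apply(pd.to_numeric, errors='coerce').dropna(axis=1, how='all')
    corr = numeric_df.corr()
    # Mask the upper triangle so each pairwise correlation is drawn only once
    mask = np.zeros_like(corr)
    mask[np.triu_indices_from(mask)] = True
    with sns.axes_style("white"):
        fig, ax = plt.subplots(figsize=(7, 5))
        sns.heatmap(corr, mask=mask, vmax=1, square=True)
    st.pyplot(fig)
```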

And so as you can see, all of this is just under 70 lines of code, and it allows you to build a very simple data-driven web application for retrieving, or web scraping, NFL football player stats data. I hope that this video was helpful to you. Please help us beat the YouTube algorithm by clicking the like button, subscribing if you haven't yet done so, and hitting the notification bell in order to be notified of the next video. And as always, the best way to learn data science is to do data science. Please enjoy the journey.