Well, hello Internet, and welcome to my first live coding session on both YouTube and Twitch at exactly the same time. Today I'm going to focus on TensorFlow, and I'm aiming this at beginners: as long as you have a little understanding of Python or some other programming language, you should be 100% fine. I'm here the whole time to answer any questions you have. The only thing that has come up is that sometimes the resolution is better on Twitch and sometimes it's better on YouTube, so if your resolution isn't great, try bouncing over to the other site. After this video is finished I will post a link to all of the code on GitHub. Feel free to ask questions the whole time. The one other weird thing is that if you comment on YouTube it won't show up on my screen, but if you do it on Twitch it will. Sorry, I can't fix that; my software has a problem.

All right, enough talking. I'm going to show you one slide, and it will be the only slide you see in this video, but I think it's important to conceptualize a couple of things, because I like to teach through coding. Is my background real? It is absolutely not. Okay, here comes the one and only slide you're going to see today.

So what exactly is TensorFlow? I'm simplifying this to make it very understandable: TensorFlow is used for many deep learning and machine learning applications, and to simplify everything, what it does is use a bunch of numbers to predict another number. We'll get into why everything is numbers as the tutorial continues. You can see an example of a neural network on your screen, and basically what we're doing here is we're demonstrating a
situation in which our neural network takes in certain input neurons, which are the number of bathrooms, the number of bedrooms and the square footage of a house, and then ultimately spits out a prediction of the price of that house. (Thank you for following.)

Okay, so what is an input neuron? At least in the very beginning, when you're learning about machine learning, you can think of what we're doing as plotting a regression line that fits the data points we provide to the model we're going to build. By regression line I mean a line that runs through the middle of all the different data points, as close to them as possible. When I talk about input neurons, I just mean the number of features that our neural network is going to use to make predictions, and by features, in our example here, I mean the number of bathrooms, the number of bedrooms and the square footage. In the center you have those three columns; those are what we call hidden layers, and what they do is apply different weightings to our features. By weightings I mean how important certain features are to estimating the price. So the network is going to ask: if we apply more importance to bathrooms versus square footage, does that help us get to our final price? Output neurons are just the predictions our neural network makes.

And that is the end of you seeing any slides. I do not like slides, but sometimes I have to use them. For those people who are asking, "What if I don't have a really good computer, can I do this?": yes, you can. You can use something called Google Colab, which is what I am going to be using in this video, and I'm going to be using all original data as well. When I say features, I mean the input data that is going to influence
maybe the price of a home, or, in the example I'm going to use here, the NBA: I'm going to try to predict basketball salaries based on an analysis of different statistics. In that situation the number of three-point shots, the percentage of three-pointers that are made, the number of points, the age, the team, all of that would be features. So the features are just all the inputs that we receive. I'm going to be using Python 100% across this entire tutorial because that's what I use. I'm an inventory analyst, and I use Python, TensorFlow, PyTorch and a whole bunch of other data science platforms to make all my predictions.

All right, so what are we going to do now? Well, you might ask yourself, what is a tensor? I think this explains it pretty well. If you provide an input to your computer, your model, and that input is the color pink, your computer does not know what pink is. It would, however, understand the concept of 100% red, 0.075 green and 0.079 blue, and we are going to represent that data in a format like this. This is how we are going to translate all of the things our computer can't understand into numeric data.

"Are your predictions accurate?" Well, they are going to get progressively more accurate as we continue. When we first start out they're not going to be very accurate, but that's the whole process, and that's going to be true no matter what data we analyze or what we are trying to predict. Over the course of this video we are going to learn many different things that we have to get better at, and those things change depending on the type of data we are working with. And yes, those RGB values in that specific situation represent a vector, and you can see here some
examples of how we will use tensors: this is a vector, this is a scalar (a single number), a matrix is a two-dimensional array, and a tensor is a multi-dimensional array that generalizes all of these.

Here are all the imports that I'm going to be using. Of course I'm going to be using TensorFlow, and I am also going to be using Keras. All this code is going to be available on GitHub after the video is done; I just have to upload it. I am also going to be using NumPy and pandas, and I put in all of the different pip installation commands you would need if you decide you don't want to use Google Colab. So there's pandas; I'm using Seaborn and Matplotlib; and I'm going to be using scikit-learn for a whole bunch of different things that I will cover as the tutorial continues. I would explain more about exactly what these are doing, but I think it makes more sense when you see them in actual code and actual examples. What this guy down here does is test whether I'm using a Google Colab GPU or not. I'm going to run this, and you can see the check mark came up and I did not use the GPU. I'm probably not going to use the GPU, but if you want to use one you can just click up in this area right here. I have no reason to use it, though. All right, let's get that out of here.

Okay, so here are some Google Colab fundamentals, just some ways to work inside of Google Colab. If you focus in on any of these pieces, this is what's called Markdown, so this is how you change to a Markdown cell, and these two hash symbols here are going to allow us to bold that. To run any cell, either Markdown or code, you just hold Shift and hit Return; it's exactly the same on Windows as it is on Macintosh. All right, so there's just a
couple little things, and as we go through here I will explain more about what I'm doing as I work with Google Colab.

Now the very first thing we need to do is download our data. (And I'm not answering questions about net worth right now.) The data that I provide is going to be on GitHub, and I can paste the link if you want to get it. I'm working with two different chats, so it's a little bit weird, but this is the very first time I've ever done live coding on two different platforms at one time, so bear with me. So this is where it is. This is just a bunch of NBA stats that I put together on my own, and you can see all of the NBA players and the insane amount of data we have on every single one of them. Our goal here is to figure out how to predict salary based on all the other statistics that we have. To be able to access the data you have to click on Raw right here, and then you get this, and you just select it. This is all CSV data, comma-separated values.

Then we jump back over here, and I'm going to show you exactly how we import the data we want to work with. I'm going to say nba_data is equal to pd, which stands for pandas (I have a gigantic pandas tutorial on my site if you'd like it), dot read_csv. This is how easy it is to read a CSV file: you just paste in that link to what I just showed you. Any time we do this, of course, we want to verify that we actually read the data. If I just want the first five rows of data, I type in head and run it, and you can see that yes, indeed, I was able to grab all of this data.
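As a minimal sketch of the loading step just described: pd.read_csv accepts a URL, a file path or any file-like object. The real GitHub link is not spelled out in the transcript, so this sketch reads a tiny made-up CSV from memory instead; every name and number in it is hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for the raw CSV on GitHub: invented players and numbers.
SAMPLE_CSV = """First,Last,Team,Pos,Age,PTS,Salary
Ann,Alpha,TOR,C,22,17.0,1500000
Bo,Bravo,BOS,PG,25,22.1,20000000
Cy,Charlie,LAL,SF,31,8.0,900000
Di,Delta,TOR,PF,27,15.3,5250000
Ed,Echo,NYK,SG,24,12.2,2000000
Flo,Foxtrot,MIA,C,38,9.9,1100000
"""

# In the stream, the argument is the pasted "Raw" link; here it is an in-memory file.
nba_data = pd.read_csv(io.StringIO(SAMPLE_CSV))

# head() returns the first five rows, a quick check that the read worked.
print(nba_data.head())
```

Swapping the StringIO object for the raw link (as a string) is the only change needed to read the real file.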
I created this data set; I scraped it off the internet, and it was kind of hard. I was originally going to do baseball, but then I decided to do basketball because it was a little bit easier to work with. (Sorry, I have to drink tea or I will lose my voice.) So yes, my data set is available for anybody who wants it.

All right, after I get the data, what I need to do is delete unknown values. Like I said, I don't think you really need to understand any data science to be able to jump into this; it's going to be very simple. To find any data that is missing, I'm going to type nba_data.isna().sum() (whoops, a little mistake there), and this tells me how many missing values I have in each of these columns. You can see right here I have none, so all of my data is very clean. That's not normal; it's mainly because I already got rid of most of the bad data. But let's say we did not get zeros in all of those columns: how would we delete the missing data? Well, we could say nba_data is equal to nba_data.dropna(), and that would get rid of any rows with missing values in our data set. We don't have any, so why worry about it? I could get rid of this altogether, and I'm just going to click this to hide it.

What else do I want to do? Well, there's some data in here, as you can see up here, that I do not need. Remember, TensorFlow and deep learning platforms in general only care about one thing, and that one thing is numeric data. So we don't care about first names and last names. I'm probably going to get rid of the team and also the position as well. Or do I want to?
Hmm, I might not want to. Actually, let's leave team and position in here, but we definitely don't care about first and last names. So let's come in and delete some unneeded data. ("Imagine the amount of data Google has on me." Ah, I think I'm extremely boring, and Google's advertisements are normally terrible in regard to what they send me. I don't know.)

Okay, so we're going to call this nba_data_numeric. I did call the original nba_data, right? Yes, I did. I'm calling the new one nba_data_numeric just to signify that I am working with numeric data. So I copy all of nba_data over to nba_data_numeric, and then I delete certain columns from nba_data_numeric. You're going to see in a bit what I do to convert the non-numeric data into numeric data, but I don't have any reason to use first name and last name, so I'm going to get rid of those, and I'm going to keep position and team. You could get rid of them if you wanted to. Then, just to verify that the names are actually gone, I display nba_data_numeric, and you can see right here I have team, salary, position and all of the other data. I'll get into why I'm going to be changing those later.

Now, one potential problem you can come upon when you're working with this type of data, especially salary data, is that there may be dollar signs, unneeded commas and so forth. So even though I don't need to do it right now, I am going to create a custom function so that if you ever need it you will be able to use it.

"How many inputs do you think you would need to ensure a model can learn price action?" Oh, that changes; that is
completely different depending on the data set and what you're working on. There is no ironclad number of inputs. Actually, most of the time, if you have too many inputs, that hurts your results, so that's a big problem a lot of the time.

Okay, we're going to create a function here that we may or may not need, and I'm just going to call it clean_currency, because it's very common for us to receive currency data and have to get rid of the dollar signs and all that stuff. What I'm going to do is say if isinstance(x, str): if the value is a string, the function gets rid of any dollar signs and any commas by replacing them with nothing, and otherwise, if the piece of data is already numeric, it just returns it. This is one way we can very easily clean up our data. So I say return x.replace, and if it finds a dollar sign it replaces that dollar sign with nothing, and you can stack these calls, so I say replace once again, and if I get a comma I replace it with nothing. After that I can just say return x. Good stuff, and we can run that.

Then I'm going to come in here and actually run this on the salary data as if it were bad data, meaning data that had dollar signs and so forth in it. Whenever you're working with machine learning or deep learning, whatever you want to call it, you are going to be doing a lot of data cleaning. If you watched any of my data science videos, you saw that I spent a lot of time, especially in the beginning, cleaning up data, because it has to be
clean or it will not work. All right, so I'm going to cycle through my whole salary column and apply the clean_currency function (it sounds like I'm doing something illegal), and then I want everything converted to a float, so I say astype float; I assume you know what a float is. Then we can come down here, display nba_data_numeric and run all this. I know there were no dollar signs there, but let me tell you, getting rid of dollar signs and commas is a very common thing to need, so that's why I decided to cover it.

Just checking to see if I got any other questions. Data cleaning is what I just did: everything needs to be in a numeric form, and it can't have things like commas or double decimal points in it; it all has to be cleaned. You're also going to see later on that we use something called one-hot encoding. Why is all this important? Well, we have all these different data points: we have age, which is 22, this is 73, then we have 17, 125, and then 0.595. It's like, wait a minute, this is completely crazy: the ranges for all of these data points are completely different. You can't work with that type of data; it needs to be what we call normalized, but I'll get back to that later. Also, maybe we want to work with team; maybe we think team or position is important. Well, guess what: this is a C (let me check; C is not a number), and TOR for Toronto, we can't use that either. So we need to turn these into numbers as well, which is where one-hot encoding comes in, so that we can work with them. Don't worry, I'll cover everything as we continue. So yes, data cleaning is getting rid of all the unwanted data and normalizing it so that everything is in a common range, normally between zero and one, sometimes between negative one and one, but you
know, it depends.

Okay, so now, after we did all this, let's say we wanted to get the overall shape of the data we have here. Well, we could just say print and np, which stands for NumPy (and again, NumPy is just a library that provides all kinds of awesome tools for us), dot shape of nba_data_numeric. (Whoops, now it's not giving it to me, silly thing.) Let's run that, and you can see there is the overall shape. Let's say I would like an overall summary of all my data, again something that is extremely important to be able to do. I could just go nba_data_numeric.describe(), and you can use all this as a cheat sheet when I upload it. It gives me the count, the mean, the standard deviation, the minimum and maximum values, all of that. You can see right here the oldest person in the NBA is 38 years old, the youngest is 19, and the mean age is 25, and you get the same for the number of games, the number of games started and all of that stuff. Let me be honest, I don't even know what MP means. It doesn't matter; you don't always need to know what all these things mean. This is field goals, field goal attempts, field goal percentage, three-pointers, three-point attempts; I know most of this. Steals, blocks, turnovers, PF (not sure), and there's points.

Okay, so now we're just going to analyze our data to get accustomed to it. (I'm not going to delete the live stream, don't worry about it.) I'm going to analyze all the data with box plots, and why that's important is that we want to see how many outliers we have in our data. By outliers I mean extreme values. Okay, we saw the average age in the NBA is 25; let's say there was one magical person who was 56, I don't know, and everybody else was near 25.
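The steps covered so far (check for missing values, drop the name columns, clean the currency strings, then look at the shape and the summary) can be sketched roughly as follows. This is a sketch under assumptions rather than the exact notebook code: the DataFrame is a tiny invented one, the column names First, Last, Age, PTS and Salary are hypothetical, and one age of 56 is planted to mirror the thought experiment.

```python
import numpy as np
import pandas as pd


def clean_currency(x):
    """Strip dollar signs and commas from strings; return other values unchanged."""
    if isinstance(x, str):
        return x.replace("$", "").replace(",", "")
    return x


# Hypothetical toy data (invented names and numbers, not the real NBA file).
nba_data = pd.DataFrame({
    "First": ["Ann", "Bo", "Cy", "Di", "Ed", "Flo"],
    "Last": ["Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot"],
    "Age": [24, 25, 26, 25, 24, 56],  # one planted extreme value
    "PTS": [10.5, 22.1, 8.0, 15.3, 12.2, 9.9],
    "Salary": ["$1,500,000", "$20,000,000", "$900,000",
               "$5,250,000", "$2,000,000", "$1,100,000"],
})

print(nba_data.isna().sum())   # count of missing values per column
nba_data = nba_data.dropna()   # drops rows with missing values (none here)

# Keep only what will be used as numbers: drop the name columns.
nba_data_numeric = nba_data.drop(columns=["First", "Last"])
nba_data_numeric["Salary"] = (
    nba_data_numeric["Salary"].apply(clean_currency).astype(float)
)

print(np.shape(nba_data_numeric))   # (rows, columns)
print(nba_data_numeric.describe())  # count, mean, std, min, quartiles, max

# The single 56-year-old pulls the mean up; the median barely notices.
ages = nba_data_numeric["Age"]
print("mean:", ages.mean(), "median:", ages.median())
print("mean without the outlier:", ages[ages < 50].mean())
```

Comparing the mean with and without the planted value is a quick way to see how much a single outlier can move a summary statistic.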
Well, that one person's extreme age could dramatically affect our overall results. So we want to use box plots in the beginning to analyze our data: if the data has a ton of outliers, that is a sign that we probably have a problem with our data, that we're probably going to have problems later on, and that we're ultimately going to have to get rid of those outliers.

("Matches played." "Will you upload a calculus video?" I have no idea when I'm uploading a calculus video, sorry. Covering every potential calculus problem you could ever see in one video is an insanely complicated topic; it's a lot. I apologize, but that is almost impossible to do.)

Okay, I have to get a drink of water or I'm going to lose my voice. All right, so I want to do a box plot. I know that I have 27 columns here and I have eight rows, and I want to plot a whole bunch of box plots all at one time, so I need to define, based on those numbers, how many columns and rows of plots I'm going to have. The closest I can get is seven columns and four rows. I then want to define exactly how big my overall figure of box plots is going to be, so I say figure size is equal to something like 30, 20.
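Here is a minimal sketch of that grid: 27 numeric columns laid out on a 7 by 4 grid of plots in a 30 by 20 figure. Two things are assumptions: the data is random and the stat_0 to stat_26 column names are invented, and the sketch draws with Matplotlib's own Axes.boxplot to keep dependencies small, whereas the stream draws each plot with Seaborn's boxplot on the same kind of axes. The padding values are the ones mentioned in the stream.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical numeric data: 27 invented columns of random numbers.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(50, 27)),
    columns=[f"stat_{i}" for i in range(27)],
)

ncols = 7
nrows = math.ceil(len(df.columns) / ncols)  # 27 columns -> 4 rows (28 slots)

fig, axs = plt.subplots(ncols=ncols, nrows=nrows, figsize=(30, 20))
axs = axs.flatten()  # 2-D grid of axes -> 1-D array, so one index is enough

# DataFrame.items() yields (column name, column values) pairs.
for index, (k, v) in enumerate(df.items()):
    axs[index].boxplot(v)
    axs[index].set_title(k)

# Hide the unused slot(s) at the end of the grid.
for ax in axs[len(df.columns):]:
    ax.set_visible(False)

plt.tight_layout(pad=0.4, h_pad=5.0)
```

Computing the row count from the column count means the same code keeps working if a couple of columns are dropped later.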
I know that's going to be too big, but whatever, who cares. Then I need to cycle through all of this data, so this is my index that's going to help me do that. I am also going to call axs.flatten() here; this turns the grid of axes into a flat list so that I can pick each plot with a single index. Then I just need to cycle through all of my data, so I'm going to get two values out of my nba_data_numeric DataFrame, or whatever you want to call it: I say for k, v in nba_data_numeric.items(). So I cycle through every single column I have, pull out the data for each NBA statistic individually, and then draw a box plot based on it. I might need to get rid of the team and position information, now that I think about it, because I think this is potentially going to throw an error.

So that's what I'm doing there: I'm cycling through all those columns, grabbing each individual piece of data, and drawing a box plot for every single one of them, and then I'll explain more about how those work. So sns.boxplot, and y is equal to k (you'll see when this all prints out), data all comes from nba_data_numeric (I don't know why it keeps messing that up), and ax is axs at index. Then I need to increment my index by one, of course, so that I can jump to my next plot. (I wish I could just shut those pop-ups off.) After I cycle through all these I plot everything, so I say plt.tight_layout. Again, tight layout just packs all the different box plots together, and then I can define exactly how much padding I want to have around each of them: 0.4. When I take a break I'm going to look at how to shut off all those tips, because they're messing up my code, and the only reason it's doing that is because I'm zooming way in on this code right now
and that's the reason why it's giving me that messed-up look. Then h_pad, the vertical padding; I'll make this 5.0, and I think that is everything I need there. Let's run it. I think I might get an error. Yeah, I did, and the reason why I got an error is that I need to get rid of team and position; I didn't get rid of those previously, and I was wondering if I'd have to. Okay, so let's just come in here and delete them. (Yeah, the IntelliSense kind of doesn't have any sense. I don't pay attention to it; I wish it would just turn itself off. I normally have it turned off, and normally I use Visual Studio Code, but I wanted to show you guys what Google Colab looks like, because then everybody can participate in this and borrow Google's servers.)

All right, let's run that. I got rid of team and position, so now I don't have anything except numbers, and I think everything here is going to be perfectly fine. I'm going to run this again just to do it. I see 289, that should be there, and, oh, we got rid of the two, so those columns are gone. I'm just re-running all of this. Let's run the plot again; I might have a little typo in there. Let's see; it's kind of complicated to do. Well, no, no typos.

Okay, so here is a box plot for every single piece of data. I know this is salary here and you can't read it, but basically those outliers are potentially going to cause a lot of problems for us, and the reason why is that they behave kind of like bad data. I can look at this right now and tell you there are a lot of very overpaid NBA stars: a lot of people, especially those earning considerably high salaries, are overpaid based on their statistics. That means we're probably going to have some problems later on. And as you can see here, basically the
box plot is just going to allow you to compare different variables, or features. The box shows the quartiles of the data, the bar in the middle that we have right here is the median, and then you have whiskers: those are the lines that extend above and below the box, where most of the data lies. The whiskers extend to all the other data aside from the points that are considered to be outliers. This is a wonderful way to look and see if you're potentially going to have problems, like we can see here. Salary, which is what we're trying to predict, has a ton of outliers, so that means it's probably going to be pretty hard. Age doesn't, so if age is a good contributor to salary, that's going to be a very good statistic for us to use. There are a lot of outliers for field goal percentage, a lot of outliers for two-pointers, a lot of outliers in general. I specifically picked this data set because I wanted a bad data set that isn't easy to work with. I can also use histogram plots to analyze this data, and then we're going to do a correlation matrix, and then I promise we're going to get into the actual meat of the machine learning.

I'm just looking at the questions you have here. "Is there any chance you could write bullet points describing the step-by-step process?" Yeah, I'm going to have all that, so yes, you're going to see bullet points for everything as we continue onwards.

All right, let's get back to actually writing some code. Now we're going to create a histogram plot. We can use histogram plots to analyze how values are distributed into different bins, and they also show whether the data is roughly normally distributed or not. And remember the label is what you want to predict: in our house example, if the number of bathrooms and the number of bedrooms are the features, the label is the
ultimate price of the house. So any time I say label, think of what we are trying to predict. What we want to see in our histogram plots is that the data we are using to predict salary has a histogram that looks very similar to the histogram for salary.

All right, so let's plot those things. Again, this is the same sort of stuff; this is just boilerplate Seaborn plotting. I'm going to say plt.subplots, and I'm going to use a whole bunch of plots just like I did before: seven columns because that worked for me before, four rows because that worked before, and a figure size of 30 by 20 actually worked, so let's keep that too. So figsize equals 30 by 20, there we go. Again we're going to do basically exactly the same thing: set up the index, cycle through all the different columns of data, and draw a histogram plot for every single one. Any time I work with new data, this is the very first thing I do. Rather than type all this out, I'm just going to come up here and copy and paste: almost all of this is exactly the same, so I grab it and paste it down here. (If you have any trouble hearing me or anything, I probably shouldn't have done that before, just tell me.) Okay, so this is going to be changed to a histogram plot, and the difference is that the data here is going to be v; the axes are going to be the same. I could get into more detail about how Seaborn works, but don't worry about it; it's just boilerplate, and you don't need to worry about absolutely every single thing. You just get information overload if you try to learn everything at once. So I say
density, and I'm going to say that I want my line width to be equal to zero. Everything else is fine on its own: index is exactly the same, and I don't believe I need to change anything with tight layout either. Let's run it. It is taking a while, doing a lot of calculations to make all of these beautiful plots for you, and here we go.

Okay, so what we can see here is salary: we have a lot of players making not so much money, and then a small number of players making an extremely large amount of money. Like I said, what we want is for our data to be in a normal distribution. Let me see if there's one here that's kind of a normal distribution. (My head's sort of in the way here; I'll move it up a little bit.) There you go, this one is a normal distribution over here, field goal percentage. That just means that most of the data is clustered around the mean, the center of all of our data. We are specifically hoping that the inputs we use look similar to the salary data we have right here. So we're hoping that two-point attempts are going to contribute a lot to salary, and basically anything that looks like the salary graph is what we're hoping our model is going to work with. Then what we can do is check whether that is indeed correct.

How long have I been streaming? I'm running out of tea; I'm running out of everything here. Okay, I'm going to do one more thing, maybe two more things, and then take a brief break so I can get more tea and don't lose my voice, because this is going to be a long video and I'm covering so many different things. Okay, so how can we go in and check whether our data is correlated? When I say the
word correlated (I talk about correlation all the time in all my data science videos), I mean that the data is related to each other. One way I like to explain this: let's say you own a candy company. One of your biggest costs as a candy business owner is sugar, so if the price of sugar goes way up, that's going to affect your profitability, because you need to buy a lot of sugar. If the main thing you're selling is mostly sugar and the price of sugar goes up, your candy's price is going to have to go up as well. That means your candy prices are correlated to sugar prices. Likewise, if sugar goes down dramatically for some reason, your profitability would go up, because the cost to make your candy would also fall. The price of hammers, or the price of steel, would more than likely have no bearing on the cost for you to make candy.

So remember what we looked at. First we looked at these box plots to see how our data looks and whether there are tons of outliers; very important. Then we wanted to see which data in here is highly correlated to salary, since salary is the thing we want to predict. I hope you see the value. What we can do next is create what's called a correlation matrix, and it's going to tell us how heavily correlated each of these data points is to salary. All this information is extremely valuable, and it's so easy to get. So we're just going to go in here and plot this; figure size is going to be equal to 30 and 20.
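The candy example can be turned into a small correlation matrix. This is a sketch on invented data: candy_price is constructed to depend on sugar_price plus a little noise, steel_price is unrelated random numbers, and all three column names are hypothetical. The corr().abs() call is the same one applied to the NBA data in the stream.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 200

# Invented prices: candy follows sugar, steel is independent of both.
sugar_price = rng.uniform(1.0, 3.0, size=n)
candy_price = 2.0 + 1.5 * sugar_price + rng.normal(0.0, 0.2, size=n)
steel_price = rng.uniform(50.0, 100.0, size=n)

df = pd.DataFrame({
    "candy_price": candy_price,
    "sugar_price": sugar_price,
    "steel_price": steel_price,
})

# Pairwise Pearson correlations; abs() keeps only the strength, not the sign.
corr = df.corr().abs()

# How strongly each column relates to the thing we care about, strongest first.
print(corr["candy_price"].sort_values(ascending=False))
```

A column is always perfectly correlated with itself, so candy_price tops its own list at 1; sugar_price should come out close to 1 and steel_price close to 0. Passing a matrix like corr to Seaborn's heatmap function with annot=True draws it as an annotated grid of colored cells.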
That figure is the box that I'm going to be putting my plot in, except this is going to be what's called a heat map, which I like to call a correlation matrix. Then we just go nba_data_numeric, dot corr for the correlation, dot abs for the absolute value, and annotation I'm going to mark as true. Then I can run it and look at this beautiful thing that's going to pop up here and tell us everything we want to know. I mean, it's just great. Don't worry, I'm getting into the tensors. There it is.

So now we can come up here and say, all right, we're interested in salary. Now this is a one. Why is it a one? Because salary is 100 percent correlated to itself. Let me just verify you can see what I'm doing. Yes, you can. So salary is 100 percent correlated to itself, and likewise age and all that stuff. What we're looking for here is what other things are highly correlated to salary. Well, we can see 0.52 is games started, and there's a 0.61, and what else do we have, a 0.64 and a 0.63. So right now we're starting to see what data we should use to make our predictions on salary.

Maybe I'll shrink this down slightly, even though you're not going to be able to see it, but I'll be able to describe what I'm doing here. Let's try that, recalculate, reformat. Isn't it beautiful that you can use Google Colab to do all this? You have no idea how many thousands of dollars you're being saved. Well, that didn't even help, but you can see here, here is 0.61, so that means that field goals is highly correlated to salary. I bet you points is also. 0.63, the number of points the player scores, is going to be highly correlated to salary. What's 0.64? What's more important than points scored? Free throws. Oh, the number of free throws, that probably means that player is getting fouled more often than not. So just looking at this data that we have here, I can tell you that free throws, points, field goals and, what else, free throws, free
throw attempts, but not free throw percentage. How crazy is that? And turnovers, at 0.58, are definitely going to pertain to salary. So we have so much information.

One more thing I'm going to do here before I take a very brief break to get some more tea, so I don't lose my voice, is calculate the percentage of outliers, because looking at our data has really told me that the most concerning thing about it is the number of outliers. So what I would like to do is calculate what percentage of outliers we have in each of our columns. I'm going to say for k, v in nba_data_numeric.items(). Again, this is just cycling through all of the different columns of data that we have inside of here: k is the column name and v is the values in that column.

This is a formula that we're going to use. What we're going to do first is use the interquartile range, which tells us the range of the middle half of the data set. If we go up here to the box plot, this is what we're interested in, this box right here, and we want to find out what percentage of all the data points are outliers, so what percentage of all this data here falls out here. It has big fancy names, but it's basically just a formula. What we need to do is get our first quartile and our third quartile, so this edge of the box right here and this edge right here, and the interquartile range is the difference between this and that. So we need to calculate those. How do we do that? I'm just going to call this q1. Let me make sure I'm not getting any questions. Doesn't look like I am. Okay, let's move this over here and try to get this organized. All right, I'm back. So what I'm going to do is go v dot quartile, and I want the first quartile, so that's going to be 0.25, like that. Then I need to get our third quartile, so that is going to be q3 is
equal to v dot, and oh, it's not quartile, it's quantile. Quantile, there it is. I was like, why isn't that working? And that would be 0.75. Then what I'm going to use here is a formula to get the interquartile range. I'm going to call it iqr, and you're going to subtract q1 from q3, so q3 minus q1. Again, this is just a formula that you need to use.

Then I need to find the lower fence as well as the upper fence. Just to make this 100 percent clear, I'm grabbing from here to there, and then I'm trying to figure out what percentage of all of this is outliers. Hopefully that makes sense. All I'm doing is punching in a formula, and you don't even need to know the specifics of it, so don't get your head all bent out of shape. To do this we go v, like that, and we're going to ask: is v less than or equal to q1 minus 1.5 times the interquartile range? Then we're going to do basically the same thing for the other side, but I'm going to type it out, because whenever I copy and paste I sometimes make errors: or is v greater than or equal to q3 plus 1.5 times iqr? I lost my parentheses here, so let's get that parenthesis and throw it here instead. There it is.

Then I need to calculate the percentage of total outliers. I'm just going to call this per, and to do that you say np.shape of v_col (the outlier values we just filtered out), at index zero, times 100.0, and then divide that by np.shape of nba_data_numeric, with that zero index on that as well. Did I get it all? Looks like I did. Sorry about the coughing, but that's because I am slowly losing my voice here. Then what it's going to do as it cycles through is print out the
percentage of outliers for each column. I'm going to say that I want two decimal places for the percentage that is kicked out of this, then a percent sign, and what gets passed in is k and that percentage that we just calculated. All right, let's run it and see if it worked.

What did I do wrong? It's an error about a data type and a scalar of type float. What did we do wrong here? A typo, probably. Let me look: is v less than or equal to q1 minus 1.5 times iqr. Sorry about the complicated formula, but sometimes that happens. Ah, I know why: I did need that parenthesis there. I got rid of a parenthesis that I needed; the comparisons have to be wrapped in parentheses. Let's throw this parenthesis here, and now all that should go away. Let's run it. Hmm, outliers, what does it not like here? Print, percentage, outliers, 0.2. I'm not sure what I did wrong there. Oh, silly. There we are, run it.

You can see here that salary has 9.69, so the percentage of outliers in salary is almost 10 percent. Everything else was really pretty good; that 10 is definitely the worst. So we know where we have our potential problems. What I'm going to do next, after I take my short break, is cover TensorFlow neural network regressions in detail. I'll be back in one, probably three minutes.

Thank you to everybody that stayed with me there. A couple of you said that you can't hear me, so I raised my volume up; hopefully you can hear me a little bit better than before. And yeah, I don't think you need to be an expert in data science to be able to understand this stuff. Let me just verify a couple of different things, and then I will get back into the more TensorFlow side of all of this.

So what did we learn? We learned a lot about our data. We learned that our data is going to be difficult to work with. We have a lot of outliers in the one thing that
we're trying to analyze, which is salary. We learned a lot about what specific data points, or features (get used to saying features), are most important in calculating salary. We used our histogram plots to see that our data is not normally distributed; don't worry, we will normalize most of it. The box plots told us we also have a lot of outliers. So we know that we're going to have some problems with our data, but don't worry, it will get better and better.

Okay, so a little bit of a review: TensorFlow neural network regression. A regression is just going to be used to determine the relationships between different features in regards to how they will affect the predictions that we are going to make. Let me see here, I'll go and grab one. Oh, I just clicked on the world's smallest regression. Can I see this full screen? And I'm on Wikipedia, yay. There we go. So this is a regression. I have no idea what this is plotting, but basically what we have here are all of these different data points, and how we create this regression line is we take this data point and this data point here and sort of push them together, finding the line that runs through the middle of all of these different points of data. Whenever we do that we get a straight line. This is a linear regression, because it is a straight line. We are not only going to be dealing with linear regression analysis, but that is where we will start, because we have to start somewhere.

Remember, this is all just using a bunch of numbers to predict another number, whether it is number of bathrooms, number of bedrooms or square footage to predict the house price, or NBA statistics to try to make a guess on salary. So what I want to do now is get more into tensors and how
to work with them and how to create them, from the ground level. First I want to create a constant tensor, so let's create a scalar. We can say scalar is equal to tf.constant(5), and then we can say scalar, like that, and run it. You can see that it is a 32-bit integer and it has a value of five.

Now we're going to create a vector. This is the part of the tutorial where things start getting really fast. I'll just call this vector, and it's tf.constant again, and we can say something like 10 and 10 inside of square brackets. Run it, and there we go: you can see the shape is two in this circumstance, and it's still a 32-bit integer.

Let's create a matrix. I'm going to call it matrix, and I'll make it a constant (I'll also show you how to make non-constant tensors). In this circumstance we put our square brackets around here like that, and then we'll go one and two, and three and four, just to do something. And matrix, boom, there we go. We'll be manipulating all this data; I'm actually going to show you how to multiply matrices, and I'll explain why that is a useful skill to have.

Now what I want to do is create a tensor, which is like a 3D matrix, so I will call this tensor, equal to tf.constant. This is mainly what we will be working with, data that looks like this, because sending one piece of data doesn't make even partial sense in any circumstance. In this circumstance I'm just going to say one and two and three.

Can we re-watch later? Yes, I will have this on both Twitch and YouTube. I've also entertained the idea of potentially making a version where I don't talk that much. Not exactly that, but like an
alternate version that's not live streamed, in case anybody wanted that. I don't know if I'm going to do that or not, because that's kind of a pain in my rear end.

Okay, so what do I want to do here? I want to keep this bracket right here. I'm just trying to figure out how I want to organize all this dummy data that I'm not even really going to do anything with; I'm organizing it so that you guys can mess around with it. That is really what you need to do to master this stuff: sit, hunt for things on the internet, and create bigger and bigger problems. Okay, so seven, eight, nine, and we'll just keep the party going here, up to 12. Let's close that there, like that. Thankfully we never do this. I'm just showing you what it looks like, but you will never sit here and type in all of this stuff by hand. Anytime you see me struggling with something, that probably means I never do it. I never manually type in a tensor, and neither should you. So there you go, and you can see the overall format of the tensor that we just created.

Now what I want to do is create a tensor that can be changed. How do you do that? I'm going to call this v_tensor, for variable tensor, and you just type in tf.Variable, and to keep this simple I'm just going to do five and six, like that. Don't worry, we will be creating models very soon. Let's say you want to either get or change values. How would you do that? I'm going to use print in this circumstance, so I'm going to say that I want index zero, and I'm going to go v_tensor at zero. Let's put that inside of there, like that. And if I wanted to change the value, I can just say v_tensor
at zero, and follow that up with assign, and let's say that I want to put eight inside of there instead. Then I can print this out again, and you can see that we were able to get the values and also able to change them. Good.

Another thing you're going to want to do whenever you're practicing and messing around with TensorFlow, and data science in general, is create random tensors. So I'm going to create one. I'm going to say random_tensor is equal to tf.random.Generator, and then you say from_seed. You may ask, why from_seed? Well, let's say that you want this to be random, but you want to be able to get the same random tensor at some point in the future. That's what the seed does: it guarantees that the values you receive back are going to be exactly the same. You can change the seed to any number you want; the only reason we use it is that we want to be able to reproduce these random values.

Another thing you can do, and I will do this later on in the video, is say tf.random.set_seed, like this, with 66, and then that'll be a global seed that you will use throughout the code. That's what I'm going to be using later, but I want to show you everything so that you know everything.

So now I want to generate values that are near the mean, which is going to be zero, and these values are going to form a bell curve, meaning that they're normally distributed. I'm going to say random_tensor is equal to random_tensor.normal (a bell-shaped curve, because it looks like a bell), with a shape equal to three and two, because you're almost always going to want to work with normalized data, so
0.0, that's what I said the mean is, and then I'm going to do a standard deviation, which is going to be just one. Then we can look at our random tensor and see how wonderful it is. There you can see the random tensor that we generated, and it's three by two, exactly as I said it would be.

Well, thank you, I'm glad you are enjoying the video. It makes me very happy. Believe me, we're scratching the surface of the craziness, but I wanted this to be approachable to somebody that is a beginner. Don't worry, in about three weeks we'll be doing things that are ridiculously complex.

Now, how would you turn a NumPy array into a tensor? It's something that's important and something you will do all the time. Just call it numpy_array, and you say np.arange. I want to generate values from 1 to 24, so I'm going to say arange and one, and then you have to put 25: it goes up to 25 but doesn't include 25. What am I doing? I forgot my parentheses. Sometimes I talk and forget to type. Then you can also define what data type you want this to be, so I'm going to say dtype is equal to np.int32.
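As a small sketch of the call just described, this is what generating 1 through 24 as 32-bit integers looks like in NumPy. The variable name numpy_array is a guess at what is typed on screen.

```python
import numpy as np

# Start at 1 and stop *before* 25, so the last value is 24.
numpy_array = np.arange(1, 25, dtype=np.int32)

print(numpy_array)
print(numpy_array.dtype, numpy_array.size)
```

The stop value being excluded is the part that trips people up: asking for 24 values starting at 1 means passing 25 as the second argument.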
Did that get rid of my parentheses? Yes it did. Then we can say numpy_array, like that, and there it is, generated from 1 to 24 just as I said it would be. Now let's convert this into a tensor. I'm just going to call this tensor_2 to be simple, go tf.constant and pass in your NumPy array, and guess what, now you have a tensor. There it is: you can see it's a tensor, it has the same exact values, 1 through 24, and we're still using 32-bit integers.

Now let's say that you wanted to be able to change the shape of your tensors. How do you make your windows faster? I don't know if I understand your question, but I'm using Google Colab, and Google Colab does all the processing for all of this. That's the reason why I'm using it. It's like Google's giving you its high-powered computers to mess around with this stuff, which is pretty awesome. centrock, thanks for stopping by.

Okay, so changing the shape of my tensor. I have 24 different pieces of data, so the dimensions of the shape I'm converting into must ultimately multiply out to 24. Let me show you an example of what I'm talking about. If I make it six by two by two, well, guess what, that equals 24, and that is a valid shape change from what we have right here. Now if this last one was a one, obviously this would be 12, and this would not work. So just make sure that whatever you're working with multiplies out to 24. So let's make this 2, and this will be 24.
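Here is a minimal sketch of that element-count rule. It uses plain NumPy as a stand-in so it runs without TensorFlow; that substitution is an assumption of this example, but the same rule applies to tensors: the dimensions of the new shape have to multiply out to the number of values you have.

```python
import numpy as np

values = np.arange(1, 25, dtype=np.int32)  # 24 values

# 6 * 2 * 2 = 24, so this shape is valid.
reshaped = values.reshape(6, 2, 2)
print(reshaped.shape)

# 6 * 2 * 1 = 12, which cannot hold 24 values, so NumPy raises ValueError.
try:
    values.reshape(6, 2, 1)
    bad_shape_worked = True
except ValueError:
    bad_shape_worked = False

print("6x2x1 accepted:", bad_shape_worked)
```

Any other combination that multiplies to 24, such as 2 by 3 by 4 or 4 by 6, would be accepted the same way.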
That's just a little bit of linear algebra thrown in here. Believe me, linear algebra, algebra and trig are what you need to understand more than absolutely anything, not calculus. So I'm going to create a constant here, pass in numpy_array like that, and say that I want the shape for this tensor to be six by two by two. There it is, tensor_3, boom, and we converted it. If you want me to change this to one just to show you that it doesn't work, I will do that. See, it doesn't work. So this has to be two, or it has to be some combination of rows and columns and sheets that multiply together to equal 24.

What do we want to do now? No, you cannot play games on Google Colab as far as I know. Is there a way? There might be some weird way of programming something, I have no idea. I've never tried to make games on Google Colab.

Okay, so you can get a whole bunch of different information about a tensor, and I'm going to show you a whole bunch of different ways of getting it. Let's say I want to get the first index of this: I'm going to say index 0, like this, and tensor_3 at zero. What else would I want to get? Let's say that I also want the size of each of the dimensions of our tensor. I'm going to call this values in dimensions, something like that, and you can just say tensor_3.shape, exactly like that. Let's say I just want to get the number of dimensions, so we'll say dimensions, like that, and tensor_3.ndim. What else do we want to get? Let's say the total values, so total values, and you can go tf.size of tensor_3, and we convert this to NumPy, like that. And what else do we want? Let's say
that we want to just get the data type: tensor_3.dtype, I believe. So there's a bunch of different types of information we can get about our tensors that are sometimes useful, sometimes not so much. Run it. Oops, I typed something in wrong. As you can tell, I don't do this a whole lot; I don't look that much at how my tensors are set up. Oh, this one doesn't have parentheses, I don't think. Yeah, there you go. So there's all the information you can get on that.

Okay, so why is linear algebra and all this nonsense useful? I'll just give you a brief explanation of matrix multiplication. You can do so many awesome things program-wise using matrices, and speed up the calculation of very complex things. Thank you for helping the person that's having trouble with their computer.

So we've got matrix multiplication, which is what we specifically refer to as finding the dot product between two matrices. I think it's pretty self-explanatory how it's going to work. You have this matrix right here and this matrix right here. What's important is that their shapes have to be opposite each other: in this circumstance we have two rows and three columns here, and three rows and two columns here. What that means is that the number of columns, three, for the matrix on the left must be equal to the number of rows for the matrix on the right. The reason why is that we're going to take one and multiply it times one, then add that to two multiplied times three over here, and so on; see, we're multiplying these across, and ultimately we get a value of 22, then 28, and all these other values. You might say, boy, that's really dumb, why are we doing that? Maybe you wouldn't say that. I just want to show you one brief example, so let's say these are
going to be the number of different ice creams sold. There are so many things you can do with linear algebra that are amazing, but I'm just showing you this one brief example. Let's say we sold 26 chocolate ice creams, 18 vanilla ice creams, nine strawberry and 24 chocolate chip ice creams on Saturday. So you can see these are the different flavors, and how many of each flavor were sold on each of the individual days. Then this is how much a chocolate ice cream costs, this is what a vanilla ice cream costs, then four for strawberry and four for chocolate chip. What we're going to be able to do with these two matrices is find the earnings on each day with matrix multiplication. You might say, well, how would that work? I'm going to show you.

I'm going to take this data right here and create a matrix. I'm going to call this ice_cream_sales, equal to tf.constant, like that, and then I need to put each of those rows of data inside of here. This is just the tip of the iceberg of awesome things you can do when it comes to working with matrices. This isn't even that awesome, but it's kind of awesome. After this I'm going to start getting into programming our models to make predictions on our NBA salaries. This isn't a learn-all-of-TensorFlow-in-one-video by any stretch of the imagination, but it is about as much information as I think one person who's new to this stuff can understand in one video.

I'm going to try to make a new video every single, what is this, Wednesday? Okay, every Wednesday, depending upon whether you guys are interested or not. If you're not, then I'll make one every other Wednesday or something like that. It'll all depend upon interest, because sometimes things that I think are ridiculously interesting other people don't. Actually, most of the time I think something's
really interesting, most of the world does not agree with me. I don't know how that works in your life, but that's my life in general.

Okay, so eight and six and fourteen. That is how many of each of the different flavors we had. Did I close all of my parentheses? Oh, I need one more bracket right there. So I just created all of those, and I can show you them: ice_cream_sales, run it, there we are. That is our first tensor. Then we're going to create another one that is going to represent the ice cream price, so ice_cream_price, and make another constant like this. Our prices are going to be three, two, four and four. Is that right? ice_cream_price, boom, there we are: three, two, four, four. Yes, chocolate ice cream costs three dollars a cone, then vanilla, strawberry and all that.

So I plugged that information in, and how easy is it now to find how much money we were able to net on each of these individual days? Let me just go tf.matmul, for matrix multiplication, with ice_cream_price and ice_cream_sales, and run it. You can see right here that, based off of how many items we sold and the price of each of those ice creams, we were able to net 246 dollars on Saturday (obviously this is made-up data, but whatever), and then Sunday, Monday, Tuesday and so forth.

All right, so there we are. We went through a lot of stuff about tensors, and now we're finally going to start creating models and making predictions. Here you see something that is extremely important, and it is called normalizing and one-hot encoding our data. Remember what I said before about how all of our data is all messed up? I'm going to come in here and go nba_data_numeric, just as a review. What we're going to do is convert our non-numeric data into
numbers. But all of our data is basically on completely different scales. Like I said before, we have MP (I don't even know what MP is, did we figure that out?). Anyway, for field goals we have 265, then for field goal percentage we have 0.439. It's all over the place. Salary is two million seven hundred thousand. This is crazy, we can't work with data like this. So we're going to have to normalize our data to a common scale, and we can do this without distorting the differences across the wide range of values. We will convert all the values to between zero and one while preserving their original distribution. I'm going to show you exactly how to do that right now, and then I'm going to jump into actually creating models. So let's get rid of this; we know that our data is messy and not normalized.

By one-hot encoding, what that means is we are going to take things that are letters, like position, and convert those into numerics as well.

Do you have a tutorial? I do not have any videos on OpenGL, sorry.

Okay, so we have all of these different pieces here. Let me get this over here. I'm just adjusting my window a little bit so that I can see all the comments, because I have to look at comments on YouTube and also on Twitch, which is a little bit weird. So what are we going to do here? I am going to one-hot encode, which means converting things like letters and non-numerics into a numeric format, and I'm going to normalize, which means I'm going to take all these other values and put them on a common scale between 0 and 1.
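To show what putting values on a common scale between 0 and 1 means, here is a minimal sketch of min-max scaling written out by hand in NumPy. The ages and salaries are invented for illustration, and min_max_scale is a hypothetical helper, not a function from the stream; scikit-learn's MinMaxScaler applies this same formula to each column with its default settings.

```python
import numpy as np

def min_max_scale(values):
    """Rescale values to the range [0, 1]: (x - min) / (max - min).

    Assumes the values are not all identical, otherwise the
    denominator would be zero.
    """
    values = np.asarray(values, dtype=np.float64)
    lowest, highest = values.min(), values.max()
    return (values - lowest) / (highest - lowest)

# Made-up numbers on wildly different scales.
ages = [19, 25, 31, 37]
salaries = [900_000, 2_700_000, 12_000_000, 30_000_000]

scaled_ages = min_max_scale(ages)
scaled_salaries = min_max_scale(salaries)

print(scaled_ages)
print(scaled_salaries)
```

Both columns now run from 0 to 1, and the ordering and relative spacing inside each column are unchanged, which is what preserving the original distribution refers to.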
It's pretty easy to do. I am going to do a column transformation here, so I'm going to call this ct, and it's going to be make_column_transformer. Then, and let's put this on a separate line, I'm going to go MinMaxScaler, which is in our imports at the very top of the screen, the very first thing we did when we started this. Let's say that I want to do a transformation on just age and points, so I'm going to say age and points. That means I'm going to try to make predictions based off of only the age of the player as well as the total number of points the player scored. So let's go like this, and like this, and there we are. What this is going to do is normalize the values in those two columns to values between zero and one.

What's up next? One-hot encoding. That means I'm going to convert things like letters and words into numerics as well. OneHotEncoder is what you use to do that. I actually can't do anything with this here, and I'm not going to work with it, but I'm doing it so that you can see it. I'm going to say handle_unknown and set it to ignore. Let's say you kept position or team or something like that inside of your data set; I'm going to show you how to one-hot encode it. How you would do it is just come in here and put in something like position. I would have to go back and redo all of this to make it work, so I'm not going to do that. I believe this is all covered, MinMaxScaler and all; let me make sure I closed all the parentheses. So what this would do for us is convert age and points to a common scale between zero and one, and it would convert all the different positions, like center and forward and all those different things, into numerics
as well. You could also put team inside of here; that was another one. I'm not going to do it in this video, but I'm showing you how to do it in case you have data that needs to be one-hot encoded. So I'm going to leave that there, copy this, comment that out, and then paste it again, and I'm going to use just my numeric data here. So we have age and points, that's all good, and we can just go like this and that. So there we are, we are going to normalize our data right like that. Whoops, what did I do wrong? Did I have too many of these? I might have. Let's go there. Okay, so it worked. I'm going to add in even more data points as we continue, but this is where we're going to stop for now.

The next thing I need to do is separate my data into features and labels. The features are going to be all the NBA player stats, and the labels are going to be one thing, which is salary, which is what we're trying to predict. How you do that is also pretty easy. I'm going to label all the features as X, because that is just something that is basically always done; these are mainly going to be X and y. So I'm going to come in here and go nba_data, like that. I'm going to be using nba_data instead of nba_data_numeric. I used the numeric one before because I just wanted to, and in this circumstance I'm going to be normalizing this data. I'm going to drop salary, because that's the thing we're trying to guess, so I'm separating the salary column from the other ones, and what's left are what we call our features. These are just the NBA stats. I am also going to come in and have y. This is going to be the label, so I'm going to say nba_data and salary. This is going to be the label, or in other words it's going to be the player salary, so
we're taking all the NBA stats and we're trying to figure out what the salary for the player is based off of that information.

Okay, so I got that. What do I need to do now? Well, I need to separate into training and testing data. What we're going to do is train our model on 80 percent of our data, and then we're going to use the remaining 20 percent of our known data, where we know the answer is correct, and have our model try to predict that last 20 percent. Based off of that, we'll know how accurate our model is. So, pretty awesome stuff. How do we separate our training and our testing data? Again, the training is going to be 80 percent and the test is going to be 20 percent. Another problem with this data set is there's only about 350 data points in it. I'll get more into why models work and why they don't work, but that's not that much data. We would always like to have more data, so I'm just giving you a heads up that that is something to keep in mind.

Okay, so I want to get the training data for our features, the test data for our features, the training data for our labels, which is y, and the test data for our labels. I think I said all that right. Maybe I didn't; I think you understand either way. So we're just going to go X train, X test, y train, and y test. There's a function that I imported earlier on, and it is called train and test and split, and you just give it your data. So that's going to be my features, X, and y, my labels. I am going to say that the test size is going to be 20 percent, just like I explained. Then I'm going to do random_state. This is the random seed; remember, if we ever want this randomness to be reproducible in the future, that's the seed. Okay, did I do everything right? No, I didn't. There's something wrong here. Okay.
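As a rough illustration of what an 80/20 split with a fixed seed does, here is a minimal plain-Python sketch. It is not the scikit-learn function used in the session: the function name `split_80_20`, the seed value, and the toy rows are all hypothetical, made up only to show the idea.

```python
import random

def split_80_20(rows, test_fraction=0.2, seed=42):
    """Shuffle row indices with a fixed seed, then hold out a fraction as test data.

    Hypothetical helper for illustration; the session uses a library function instead.
    """
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # same seed -> same shuffle on every run
    n_test = int(round(len(rows) * test_fraction))
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test

# Made-up rows standing in for (age, points, salary).
players = [(20 + i, 100 * i, 1_000_000 * (i + 1)) for i in range(10)]
train_rows, test_rows = split_80_20(players)
print(len(train_rows), len(test_rows))  # 8 2
```

Because the shuffle is seeded, calling the function again with the same seed returns the same split, which is the reproducibility the random seed is there to give you.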
I see: train test... yeah, I knew there was something wrong. train_test_split is the name of that. Now it looks right. Let's run it, and it worked.

Okay, so now what I need to do is transform our training and our test data. What are we going to be doing? Well, we're going to be using this stuff up here. We're going to be converting our age and our points to normalized data, and then I'm going to come back and we're going to examine whether adding more data points is going to make our predictions more accurate or not. So I need to transform that data so it is normalized on the same scale. How do we do it? It's very easy, actually. Whoops, hey, a drink. This is green tea, that's what that is. It's kind of gross; I drink very thick green tea.

Okay, so what we're going to do here is go ct and fit. This is going to normalize our data, and one-hot encode our data also if we set that up. So train, like this, and then I'm going to transform the training and test data. Let's just go in here, and I'm going to go X_train_normal, is what I'm going to call this, equal to ct transform X train, and then I'm also going to do the test data, normal, equal to ct transform X test. Run it. Is that right? What did I do wrong? It's at fit X train. Oh, well, this is wrong. You guys should have caught me on that and said, hey, there's something wrong here, that looks like a weird variable name. Okay, let's run it again, and there, it worked.

Okay, so our new data, now that it is normalized (and it would be one-hot encoded if we had set it up like that), is going to look like this. I'm going to say X train normal, and now you can see that everything is in the same range. This is the age. Good Lord, let's just get rid of that. But you can see the difference between the age and so forth and so on, so it's all between the values of zero and one. All right, I'm not going to run that, because it filled up my whole entire screen.
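To make the "fit on the training data, then transform both sets" idea concrete, here is a small plain-Python sketch of min-max scaling. It is a hand-rolled stand-in, not the scaler from the session; the helper names and the ages are made up, and it assumes the training column is not constant (otherwise the division would fail).

```python
def fit_min_max(train_column):
    """Learn the min and max from the training column only."""
    return min(train_column), max(train_column)

def transform_min_max(column, lo, hi):
    """Rescale values using the learned min/max; training values land in [0, 1]."""
    span = hi - lo  # assumes hi > lo, i.e. the training column is not constant
    return [(value - lo) / span for value in column]

train_ages = [19, 25, 31, 37]  # made-up ages
test_ages = [22, 40]           # 40 is older than anything seen in training

lo, hi = fit_min_max(train_ages)
train_scaled = transform_min_max(train_ages, lo, hi)
test_scaled = transform_min_max(test_ages, lo, hi)
print(train_scaled)
print(test_scaled)
```

One thing the sketch makes visible: the test data is scaled with the minimum and maximum learned from the training data, so a test value outside the training range (the 40 here) comes out slightly above 1.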
It was impossible to see what was going on. And now we have finally reached the part where we can build a neural network. I'm going to take a brief break, then I will be back to do so, and that will be the last break of the day. I will be back in a second.

All right, I'm back. I am going to get rid of this. All right, cool, it's gone. Okay, so everybody can hear me good? Did you ever see a person drink so much green tea? I drink green tea like crazy, like most people drink coffee. I can't drink coffee, actually.

Okay, so: build a neural network with our training data. This is what we have been working towards. First off, I am going to set a random seed. I think everybody should understand why I am doing this. So I'm going to say random seed, oh, random.set_seed, and I'm going to just make this 66. 66 has no importance; I could make this anything. It's just so I can reproduce it. Now, if you're wondering what a neuron is in a neural network, neurons are just nodes through which data and computations flow. I don't have the slide here, but remember that pretty picture I showed you on the slide at the very beginning? That represents the neurons and the pathways through which the data is flowing.

Now what I want to do is create a model and sequentially go through all of my data. I'm going to call this nba_model, and it is going to be equal to tf, or TensorFlow, Keras, Sequential. As we continue here, covering more and more, you're going to see multiple different ways of working with this data. Okay, get rid of that. Next time I will not have all this stuff popping up.

Okay, so remember I talked about hidden layers? I am going to create four hidden layers. What you should do is go through and actually play with this; this is one thing you definitely should play with. How accurate is the result if I use one hidden layer? How accurate is it if I
use 10? So, you know, that's something to play with. Okay, so Keras, like this. What this is doing, remember, is taking the features that we input into this, and it is going to try to make sense of them, try to organize that input data, the NBA player statistics, and convert it to a final salary. That's what the hidden layers do. Sometimes having a lot of layers is good, sometimes it is bad. There's something called overfitting, and if you overfit, the model will very accurately convert features into a final price based off the training data, but then it'll be wildly inaccurate whenever we try it on the test data.

I'm going to use activation here. Activation functions are going to help our network decide which parts of the data we provide to it are actually important. I'm going to be using ReLU. There's multiple different ones, and I will cover them as the tutorial continues. I will constantly cover all the different options you can use, so that we can test and see which is the best for us. Why is activation... there it goes.

Okay, now I'm just going to copy this, like this, and make, what did I say, I was going to make four layers, I think. So I'm going to make this a hundred, this also is a hundred, and the last one is just going to be a one. I'm going to get rid of this last parenthesis that is right here, and I think everything else is all set up.

Just to 100 percent explain what's going on when we create this model: these hidden layers are going to receive the data, and they're going to add weightings to the data to try to figure out which of the data provided is the most important to ultimately figure out what the salary for the player should be. How it does this, to be specific, is going to depend upon the error. So let's say we take the age and the... what did we do? I forget. Age and points, is that what it
was? Age and points. So it goes and gets the age and points data, and it's going to say, okay, how much weighting should each get? When I say weighting, I mean how important is age, or how important is points, on a percentage basis, to calculate the final salary amount. Okay, that's what it is. If we do not add in this activation, it's just going to do a simple regression, which is just going to be a straight linear plot, which is something that we do not want. What ReLU stands for is rectified linear unit, and it's just a function that is very computationally efficient in making its calculations, so that's why we're going to use it. Okay, so that is how we're going to create our model.

Now, after we create our model, you can see the steps of actually creating a neural network. We gather our data, we create our model, which is defining our hidden layers and such, and then we have to compile the model. What the compilation is going to do is calculate errors... wow, thank you very much for the donation, Vex code, I greatly appreciate it. It's going to calculate the errors, and it's going to optimize and also evaluate. What we're going to be using is the mean absolute error, and I might as well just go and create it. So I'm going to go nba_model.compile, and here's our loss function. By loss, what I mean is how accurate our predictions are, like how much we are off with each of our predictions. So, tf.keras.losses.mae, there it is. That is mean absolute error, just how much our prediction is off from the actual value. That's what it's calculating.

Then we're going to also define an optimizer. There's multiple different optimizers. In this circumstance, I think that the Adam optimizer is going to work well, but we should test other optimizers, and we shall as the tutorials continue. This is an extremely complicated concept you're learning here, so it can't really all be covered in one video, unless the video is, you know, I don't even know.
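For a feel of the two pieces just named, here is a tiny plain-Python sketch of what ReLU and mean absolute error compute. These are hand-written stand-ins, not the library implementations, and the salary figures are made up for illustration.

```python
def relu(x):
    """Rectified linear unit: pass positive values through, clamp negatives to zero."""
    return max(0.0, x)

def mean_absolute_error(actual, predicted):
    """Average distance between each prediction and the true value."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up salaries, only to show the arithmetic.
actual_salaries = [2_000_000, 5_000_000, 10_000_000]
predicted_salaries = [3_000_000, 5_000_000, 8_000_000]

relu_outputs = [relu(v) for v in (-2.0, 0.0, 3.5)]
mae = mean_absolute_error(actual_salaries, predicted_salaries)
print(relu_outputs)  # [0.0, 0.0, 3.5]
print(mae)           # 1000000.0
```

Here the three predictions are off by 1 million, 0, and 2 million, so the mean absolute error is 1 million: on average, each prediction misses by that much.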
Here is our learning rate. There's two main optimizers you're probably going to use. You're going to use SGD, which stands for stochastic gradient descent, and how it works is it's going to optimize our guesses by smoothing the predictions toward our observed data, which means the actual salaries for our NBA players. Adam is also going to do something quite similar.

What the learning rate is doing, this is starting to get a little bit complicated. I probably should have a slide for the learning rate, and that would make more sense. How do I explain this? All right, so it goes and makes a prediction on an NBA salary based off of the data we provide, like age and points. What the learning rate does is say, okay, well, whoops, we're off by a good bit, so how much should we adjust our weighting for our next prediction? The higher this number is, the quicker it will find an accurate prediction for salary. So 0.01 is a pretty high learning rate. It's very common to find something like zero, whoops, zero point, like that. I'm just going to stick with this right now, though, just to test our results. So the learning rate is saying how much we should adjust our weightings for our next guess.

Now, the problem with having a large learning rate is you might overshoot what your goal is in regards to predicting. However, if you have a very small learning rate, like that, it's going to take longer to get to your desired prediction. All right, hopefully that made sense. If it doesn't, you could ask me, or you could just say that didn't make sense, and I'll try to explain it in another way. I should have had a slide for that.

The metrics that we are going to measure here is going to be our error. So this is going to be our mean absolute error, and that's going to show how much we are
off with our predictions. Okay, so that's what the metrics part is.

Okay, we made our model, we compiled our model, and now we need to fit our model. So I'm going to call this fit_data, equal to nba... "If you have a sound sample and I want to make a multi-label classification?" I'm going to cover multi-label classifications in the next video. That's the next video that I'm going to make. An example of that would be, let's say you had a whole bunch of pictures of food, and you wanted to only pick those pictures of food that had fish in them. That would be multiple labels, so you could have shrimp and cod and salmon, and those would all match, but if it was a steak it wouldn't match. Okay, that's next video.

Okay: "Learning rate makes sense. When is the model done training?" Well, the model knows it's done whenever you tell it to be done, which is what I'm going to do right now. I'm going to go nba_model and fit. That was a perfect question, because that's what I'm providing. What I provide here is going to be the training data for X, which is going to be normalized, then the y training data, and epochs, and I'm going to have that be 100.
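To see how the learning rate and the number of epochs interact, here is a toy plain-Python sketch. It deliberately swaps in simpler pieces than the session uses: plain gradient descent instead of Adam, squared error instead of mean absolute error, and a single weight instead of a network. The function name and the data are made up for illustration.

```python
def train_weight(learning_rate, epochs, xs, ys):
    """Fit y = w * x by gradient descent on mean squared error, starting from w = 0.

    Toy stand-in for model training; not the optimizer or loss used in the session.
    """
    w = 0.0
    n = len(xs)
    for _ in range(epochs):  # one epoch = one full pass over the training data
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad  # the learning rate scales each adjustment
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # made-up data where the ideal weight is 2

w_good = train_weight(0.01, 100, xs, ys)   # settles very close to 2
w_slow = train_weight(0.001, 100, xs, ys)  # still short of 2 after 100 epochs
w_wild = train_weight(0.2, 100, xs, ys)    # overshoots further on every step
print(w_good, w_slow, w_wild)
```

On this toy problem, the moderate rate reaches the target within 100 epochs, the tiny rate is still creeping toward it and would need more epochs, and the large rate overshoots so badly that the weight runs away instead of settling.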
Now, you could change this as much as you want. Basically, what an epoch is, is a full iteration over our training data. So what we're saying here is we want to do a full iteration through our training data 100 times to try to make these predictions, and you're going to see how accuracy is affected depending upon how many epochs we use, because that's something I'm probably going to play with. And I'm going to turn verbose off. All that means is that, as it's creating all this, I don't want this stuff spilling all over the screen. Okay, so let's just do that. Sometimes this can take a couple of seconds, but Google's computers are pretty fast. If I had taken verbose off of there, it would have shown the model's progress as it went, but you can see the check mark here, so that means that everything worked.

Now, based off of this, I want to evaluate my model and see if it's any good or not. So I'm going to say NBA model loss and NBA model MAE, which is going to tell us how much our predictions were off. It's very easy to find that out: you just go nba_model and call the evaluate function. All this code is going to be on GitHub, probably an hour from now; I'm going to put it all up there. Do you guys like this? Do you like me doing live coding better than my normal thing or not? Because I've been kind of considering actually redoing this tutorial in a non-live way also. I don't know. Also, one question I have for you viewers: is the resolution truly better on Twitch than it is on YouTube? If somebody could tell me that, I would greatly appreciate it. I guess I can find out afterwards.

Okay, so I'm going to run it, and we can see that we are off by four million dollars, which sounds horrendous, and it's not spectacularly good, but, you know, we're dealing with almost every NBA player making a million dollars, so it's off by 4 million. What we can do, however, is show how efficient our model was by coming in here and actually plotting
the predictive capabilities as we go. "Do you have any computer vision tutorial available?" I don't have any yet. That's probably going to come not next week, but probably the week after is when I am going to do that. I'm also entertaining doing a TensorFlow Lite tutorial, where you can do stuff with a Raspberry Pi. Oh, there's a bazillion things.

All right, so what I want to do here is plot to see how increasing the number of epochs is going to decrease my overall error. So I'm going to go plt and figure, and I'm going to change the figsize. Do I want this to be 30 by 20? "Eric Rosen was arrested for money laundering." I do not know if that's true or not. Eric Rosen is a YouTuber who does chess videos. I kind of doubt that's true, but I don't know. Anything's possible in this world; I think we all can say that.

Okay, so what I want to do is plot my loss. Remember, we made our prediction right here based off of two things, age and points scored, and that's it. So we're going to take a look at what happens whenever we increase the number of epochs, and also when we increase the number of statistics we use to make predictions. All right, so I'm using Seaborn here, that's what sns stands for, and I'm just going to do a simple line plot that's going to show how our accuracy improved with our model. fit_data.history is going to tell us this. So we run that, and there you can see it. When our model first started off, it was off by over nine million dollars, and you can see here it is going down, down, down, down, and now we're off by about five. So hey, we did a lot for each iteration through training. We went from nine to four million; that's very good.

What happens if we take this up to 200? And just so you remember, an epoch is a full iteration over our training data. That's what we got here. Okay, so I'm going to come in here, let's run that, and we can
see if increasing the number of epochs is going to increase the performance of our model. Let's see. Hey, look at that, now we're down to three. So you can see we dropped from four to three, and that is pretty awesome.

Another thing we can do is change what data we are going to be using for our model. Where did I define that? Let's come up here. There we are. "At what point is the model overfit or underfit?" Well, whenever your results dramatically change. Thank you very much, Bobby Fisher, man, I greatly appreciate it. But I do not know anything about Eric Rosen. I know who he is, but I don't know any specifics about his life. You know, the YouTubers and all that, they have a tendency to get crazy sometimes with all their money. I'm not saying that anything happened, but who knows.

Okay, so just to speed things up here a little bit. I just made our prediction based off of age and points. That's all I used; that's like no data at all. But if we come up here and look at this (and this is why charting all this information is extremely important), we go age and points. Here is age, 0.39, so it doesn't look like salary has much to do with the age of the player. I just picked that randomly. Points most definitely does. And remember, 0.64, this guy right here, free throws, the number of free throws a player makes in a game, is the highest. So what we can do now is take that different data and go and make changes. Now, would I lose everything? Okay, so what was it, free throws, and... I see, I wasn't paying attention.

Okay, well, let's see if I can outperform a statistician, an expert on analyzing players. One of the most common things used to make predictions in regards to the quality of an NBA player is something that's called NBA efficiency. Let's see if the experts are better at making predictions
than I am. Normally I would guess the answer would be yes. So, NBA efficiency. I am not an expert on NBA statistics, but I have read that NBA efficiency has a great bearing on, and is a great way of judging, the overall capabilities of a player. Let's see if it's better than me being off by about 3 million. The major difference here is going to be the different statistics. I'm just going to copy and paste so you don't have to watch me do this. Okay, this is how the NBA judges this: this is points, this is rebounds, TRB I think, I don't even know, steals, blocks, field goal percentage, free throw percentage, turnovers. Who cares, we got all that. All right, so we're just going to run that. That is how the NBA judges that.

Now we have to run all this other stuff, so bear with me while I do that. I'm transforming everything here, compile the model. So I got 3.7. And don't worry, my models are going to get much, much, much more accurate. It's just that whenever you start off, if I started getting into weightings and things like that, this could go off the rails, so I just want to try to keep it simple. Okay, so I'm more efficient. I'm better at making salary predictions than the NBA experts, at least in that situation.

Another thing: let's come in and say, well, what if we use everything? What if we use every single statistic to make player evaluations? Is that better? Well, we can just throw this in here and find out. So there's everything, there's every single statistic, and we know some of them are horrible in regards to correlation with salary and some of them are very, very good. So we'll do that and run that, and we'll build our model, and we will test again. Here's our model. It's unbelievable how fast everything is. Okay, we ran it. Let's see if everything is better than the NBA experts. Oh yes, so everything was better. We dropped down to 3.6 million, which is considerably better than what we had before. Do I have anything else? Well, I know
that highly correlated features, like field goals and free throws, free throw attempts, turnovers, and points, were all highly correlated to salary, so let's try just doing those. I will grab the highly correlated features, so that's these guys right here. Actually, age was not something that was good. So throw this in here, run it, and run it. Everything is the best so far, so let's see if highly correlated is better. We got 3.6, and as we make changes, our accuracy is going to continue to go up and up and up until we are really good. Okay, and look at that. That's kind of surprising, actually. Adding only the correlated features actually made our accuracy go down.

All right, so there's a lot of stuff. I think that if you watched this whole video and you completely understand it, you have taken a major step in regards to understanding how TensorFlow works. Anybody have any other questions? "Has this practice of analyzing data with Python helped you make capital gains?" Yes. We are going to eventually start applying all of this stuff to the stock market, of course for educational purposes only. I am not going to recommend any stocks. I only did that one time, and I got lucky, to be honest. This was in my Python for finance video tutorial series. Somebody dared me to pick a stock. They were like, oh, you don't know what you're talking about, if you're so smart give us a stock prediction and see what happens. And I got lucky; the stock went up over 300 percent in like three days. So that was luck. It was some education, but it was also a little bit of luck.

All right, so that's it. I guess everything was all right. I don't know, what do you guys think about this becoming an every-Wednesday type of thing? Is this an every-other-Wednesday type of thing? What do you think in regards to that? I already told you I'm going to be doing lots of things in
the next video. Probably what I'm going to do is analyze photographs and things like that, and make predictions based off of whatever the photograph is. That's probably the next video. I don't know if that'll be up next week or not. Either way, thank you to everybody that joined me today. I normally hang around a little bit after the video is over to answer any other questions.

"Every other Wednesday would be nice." Every other Wednesday, okay. I might do that. I might say that not next Wednesday, but the Wednesday after that, I will do another video like this, and it'll be like a two-hour video that covers crazy stuff.

"Any prerequisites for upcoming computer vision tutorials?" A basic understanding of Python, that's what you need. I'll structure the videos so that you will completely understand them as long as you understand Python. And just so you understand, sometimes I just say, hey, this is a common formula that we use to do whatever, and just accept that. I can either go deep into the weeds and require you to understand linear algebra and very, very complicated statistics and probability mathematics, or I can sort of just say, hey, in this circumstance, just understand this is a formula you use; you don't necessarily need to understand every single thing that's going on with it. I think that's probably a better way of approaching that.

Thank you very much for the donation, Bobby Fisher, man, you are the man. All right, thank you so much. "I'd be interested in stuff like this however often you decide to upload them." "I need an excuse to use both of them." All right, cool. So, everything, thank you, thank you, thank you. I greatly appreciate you guys. I hope you really understood that, and otherwise I will catch you the Wednesday after next. All right, thank you, talk to you later, bye.