ai and machine learning have become a revolution in the 21st century companies across the globe are using ai and machine learning techniques to solve business problems and increase growth hi everyone welcome to this live session on ai and machine learning full course by simply learn in this session we will cover all the important concepts techniques and algorithms that you need to know to become a successful ai or machine learning engineer our highly experienced trainers will take you through this live session but before we begin make sure to subscribe to the simply learn channel and hit the bell icon to never miss an update so we'll start off by learning the basics of ai and machine learning from short animated videos then we will understand the different supervised and unsupervised learning algorithms such as linear regression logistic regression decision trees svm knn as well as k-means clustering then we will shift our focus towards deep learning where we will look at what is deep learning and how neural networks work we will also cover two important deep learning frameworks tensorflow and keras finally we will learn about generative adversarial networks and recurrent neural networks so let's get started picture this a machine that could organize your cupboard just as you like it or serve every member of the house a customized cup of coffee makes your day easier doesn't it these are the products of artificial intelligence but why use the term artificial intelligence well these machines are artificially incorporated with human-like intelligence to perform tasks as we do this intelligence is built using complex algorithms and mathematical functions but ai may not be as obvious as in the previous examples in fact ai is used in smartphones cars social media feeds video games banking surveillance and many other aspects of our daily life the real question is what does an ai do at its core here is a robot we built in our lab which is now dropped onto a field in spite of a variation in lighting landscape and dimensions of the field the ai robot must perform as expected this ability to react appropriately to a new situation is called generalized learning the robot is now at a crossroad one path is paved and the other rocky the robot must determine which path to take based on the circumstances this portrays the robot's reasoning ability after a short stroll the robot now encounters a stream that it cannot swim across using the plank provided as an input the robot is able to cross this stream so our robot uses the given input and finds the solution for a problem this is problem solving these three capabilities make the robot artificially intelligent in short ai provides machines with the capability to adapt reason and provide solutions well now that we know what ai is let's have a look at the two broad categories an ai is classified into weak ai also called narrow ai focuses solely on one task for example alphago is a maestro of the game go but you can't expect it to be even remotely good at chess this makes alphago a weak ai you might say alexa is definitely not a weak ai since it can perform multiple tasks well that's not really true when you ask alexa to play despacito it picks up the keywords play and despacito and runs a program it is trained to run alexa cannot respond to a question it isn't trained to answer for instance try asking alexa the status of traffic from work to home alexa cannot provide you this information as she is not trained to and that brings us to our second
category of ai strong ai now this is much like the robots that only exist in fiction as of now ultron from avengers is an ideal example of a strong ai that's because it's self-aware and eventually even develops emotions this makes the ai's response unpredictable you must be wondering well how is artificial intelligence different from machine learning and deep learning we saw what ai is machine learning is a technique to achieve ai and deep learning in turn is a subset of machine learning machine learning provides a machine with the capability to learn from data and experience through algorithms deep learning does this learning through ways inspired by the human brain this means through deep learning data and patterns can be better perceived ray kurzweil a well-known futurist predicts that by the year 2045 we would have robots as smart as humans this is called the point of singularity well that's not all in fact elon musk predicts that the human mind and body will be enhanced by ai implants which would make us partly cyborgs so here's a question for you which of the below ai projects don't exist yet a an ai robot with citizenship b a robot with a musculoskeletal system c ai that can read its owner's emotions d ai that develops emotions over time give it a thought and leave your answers in the comments section below since the human brain is still a mystery it's no surprise that ai too has a lot of unexplored domains for now ai is built to work with humans and make our tasks easier however with the maturation of technology we can only wait and watch what the future of ai holds for us well that is artificial intelligence for you in short we know humans learn from their past experiences and machines follow instructions given by humans but what if humans can train the machines to learn from the past data and do what humans can do and much faster well that's called machine learning but it's a lot more than just learning it's also about understanding and reasoning so today we will learn about the basics of machine learning so that's paul he loves listening to new songs he either likes them or dislikes them paul decides this on the basis of the song's tempo genre intensity and the gender of voice for simplicity let's just use tempo and intensity for now so here tempo is on the x-axis ranging from relaxed to fast whereas intensity is on the y-axis ranging from light to soaring we see that paul likes songs with fast tempo and soaring intensity while he dislikes songs with relaxed tempo and light intensity so now we know paul's choices let's say paul listens to a new song let's name it as song a song a has fast tempo and a soaring intensity so it lies somewhere here looking at the data can you guess whether paul will like the song or not correct so paul likes this song by looking at paul's past choices we were able to classify the unknown song very easily right let's say now paul listens to a new song let's label it as song b so song b lies somewhere here with medium tempo and medium intensity neither relaxed nor fast neither light nor soaring now can you guess whether paul will like it or not not able to guess whether paul will like it or dislike it are the choices unclear correct we could easily classify song a but when the choice became complicated as in the case of song b yes and that's where machine learning comes in let's see how in the same example for song b if we draw a circle around song b we see that there are four votes for like whereas only one vote for dislike so if we go for the majority votes we can say that paul will definitely like the song.
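just to make that concrete, here's a minimal sketch of a k nearest neighbors classifier on made-up tempo and intensity values, this is an illustration of the idea rather than code from the course, and the numbers are invented

from sklearn.neighbors import KNeighborsClassifier

# made-up training data: [tempo, intensity] on a 0 to 1 scale, label 1 = paul liked it, 0 = disliked it
songs = [[0.9, 0.8], [0.8, 0.9], [0.85, 0.7], [0.9, 0.95], [0.2, 0.1], [0.1, 0.2], [0.15, 0.25]]
likes = [1, 1, 1, 1, 0, 0, 0]

knn = KNeighborsClassifier(n_neighbors=5)  # look at the 5 closest songs and take a majority vote
knn.fit(songs, likes)

print(knn.predict([[0.95, 0.9]]))  # fast and soaring, like song a, expect 1
print(knn.predict([[0.5, 0.5]]))   # medium tempo and intensity, like song b, decided by the neighbors' vote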
and that's all there is to it this was a basic machine learning algorithm it's called k nearest neighbors so this is just a small example of one of the many machine learning algorithms quite easy right believe me it is but what happens when the choices become complicated as in the case of song b that's when machine learning comes in it learns the data builds the prediction model and when a new data point comes in it can easily predict for it the more the data the better the model and the higher the accuracy there are many ways in which the machine learns it could be either supervised learning unsupervised learning or reinforcement learning let's first quickly understand supervised learning suppose your friend gives you one million coins of three different currencies say one rupee one euro and one dirham each coin has a different weight for example a coin of one rupee weighs three grams one euro weighs seven grams and one dirham weighs four grams your model will predict the currency of the coin here the weight becomes the feature of the coins while the currency becomes the label when you feed this data to the machine learning model it learns which feature is associated with which label for example it will learn that if a coin is of 3 grams it will be a 1 rupee coin let's give a new coin to the machine on the basis of the weight of the new coin your model will predict the currency hence supervised learning uses labeled data to train the model here the machine knew the features of the object and also the labels associated with those features on this note let's move to unsupervised learning and see the difference suppose you have a cricket data set of various players with their respective scores and wickets taken when you feed this data set to the machine the machine identifies the pattern of player performance so it plots this data with the wickets on the x-axis and the runs on the y-axis while looking at the data you will clearly see that there are two clusters one cluster is of the players who scored more runs and took fewer wickets while the other cluster is of the players who scored fewer runs but took many wickets so here we interpret these two clusters as batsmen and bowlers the important point to note here is that there were no labels of batsmen and bowlers hence learning with unlabeled data is unsupervised learning so we saw supervised learning where the data was labeled and unsupervised learning where the data was unlabeled and then there is reinforcement learning which is reward based learning or we can say that it works on the principle of feedback here let's say you provide the system with an image of a dog and ask it to identify it the system identifies it as a cat so you give negative feedback to the machine saying that it's a dog's image the machine will learn from the feedback and finally if it comes across any other image of a dog it will be able to classify it correctly that is reinforcement learning to generalize a machine learning model let's see a flowchart input is given to a machine learning model which then gives the output according to the algorithm applied if it's right we take the output as a final result else we provide feedback to the training model and ask it to predict until it learns.
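going back to the cricket example for a second, here's a minimal sketch of how that kind of unsupervised grouping might look in code, using k-means clustering on invented runs and wickets numbers, both the algorithm choice and the data here are illustrative assumptions rather than something shown in the video

from sklearn.cluster import KMeans

# invented player stats: [wickets taken, runs scored]
players = [[2, 450], [1, 500], [0, 380], [3, 420], [25, 60], [30, 90], [28, 40], [22, 75]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)  # ask for two clusters, no labels are given
kmeans.fit(players)

# players with the same label ended up in the same cluster, which we can interpret as batsmen and bowlers
print(kmeans.labels_)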
i hope you've understood supervised and unsupervised learning so let's have a quick quiz you have to determine whether the given scenario uses supervised or unsupervised learning simple right scenario 1 facebook recognizes your friend in a picture from an album of tagged photographs scenario 2 netflix recommends new movies based on someone's past movie choices scenario 3 analyzing bank data for suspicious transactions and flagging the fraud transactions think wisely and comment below your answers moving on don't you sometimes wonder how is machine learning possible in today's era well that's because today we have humongous data available everybody is online either making a transaction or just surfing the internet and that's generating a huge amount of data every minute and that data my friend is the key to analysis also the memory handling capabilities of computers have largely increased which helps them to process such huge amounts of data at hand without any delay and yes computers now have great computational powers so there are a lot of applications of machine learning out there to name a few machine learning is used in healthcare where diagnostics are predicted for doctor's review the sentiment analysis that the tech giants are doing on social media is another interesting application of machine learning fraud detection in the finance sector and also predicting customer churn in the e-commerce sector while booking a cab you must have encountered surge pricing often where it says the fare of your trip has been updated continue booking yes please i'm getting late for office well that's an interesting machine learning model which is used by global taxi giant uber and others where they have differential pricing in real time based on demand the number of cars available bad weather rush hour etc so they use the surge pricing model to ensure that those who need a cab can get one it also uses predictive modeling to predict where the demand will be high with the goal that drivers can take care of the demand and surge pricing can be minimized great hey siri can you remind me to book a cab at six pm today okay i'll remind you thanks no problem comment below some interesting everyday examples around you where machines are learning and doing amazing jobs so that's all for machine learning basics today from my side welcome to the artificial intelligence tutorial what's in it for you today what is artificial intelligence so we'll start with the general concept types of artificial intelligence covering the four main areas ways of achieving artificial intelligence and some general applications of artificial intelligence in today's world finally we'll dive into a use case predicting if a person has diabetes or not and we'll be using tensorflow for that in the python environment what is artificial intelligence well artificial intelligence in today's world is probably one of the most exciting advancements that we're in the middle of experiencing as humans so what is artificial intelligence and here we have a robot with nice little clampy hands hey what am i you are what we call artificial intelligence i am your creator reminding him who programmed him and who he's supposed to take care of artificial intelligence is a branch of computer science dedicated to creating intelligent machines that work and react like humans in today's place where we're at with artificial intelligence i really want to highlight the fact that they work and react like humans because that is where the development of artificial intelligence is and that's what we're comparing it to how it looks next to a human thanks any tasks you want me to do for you get me a cup of coffee poof here you go he brings him a cup of coffee that's my kind of robot i love my coffee in the morning you'll see
there's kind of a side theme of coffee in today's tutorial so here we have a robot that's able to bring him a cup of coffee seems pretty far out there you know that we have a walking robot that's doing this but it's not as far out there as you think we have automatic coffee pots we have devices that can move an object from one point to the other you know anything from amazon delivering packages or whatever you want all these things are in today's world we're right on the edge of all this that's what makes this such an exciting topic and place of study so let's take a real general look at types of artificial intelligence in the really big picture i mean if we drill into it there's machine learning and all kinds of different aspects but this is just a generic level of artificial intelligence so hi there i've discovered four different types come have a look so we have our kind of einstein looking guy or professor at some college the very first type is reactive machines they've been around a long time this kind of ai is purely reactive and does not hold the ability to form memories or use past experiences to make decisions these machines are designed to do specific jobs remember i talked about the programmable coffee maker that makes coffee in the morning it doesn't remember yesterday from tomorrow it runs its program even our washing machines have automatic load balancers that have been around for decades so reactive machines we program to do certain tasks and they have no memory but they're very functional then we have limited memory this is kind of right where we're at right now this kind of ai uses past experience and the present data to make a decision self-driving cars are kind of limited memory ai we bring up self-driving cars because that's a big thing especially in today's market they have all these images that they brought in and videos of what's gone on before and they use that to teach the automatic car what to do so it's based on a limited memory limited memory means it's not evolving new ideas on its own when they need to make a change they have the program built on that memory but they have to make those changes outside of it and then put the new setup in there so it's continually being reprogrammed rather than reprogramming itself theory of mind these ai machines can socialize and understand human emotions machines with such abilities are yet to be developed of course that's under debate there are a lot of places that are working on theory of mind and on being able to cognitively understand somebody based on the environment their facial features all kinds of things and evolve with them so they have some kind of reflective ability to evolve with that and finally self-awareness this is the future of ai these machines will be super intelligent sentient and conscious so they'll be able to react very much like a human being although they'll have their own flavor i'm sure achieving artificial intelligence so how in today's world right now are we going to achieve artificial intelligence well the main arena right now is machine learning machine learning provides artificial intelligence with the ability to learn this is achieved by using algorithms to discover patterns and generate insights from the data they are exposed to and here we have a computer as we get more and more into the human realm of artificial intelligence you see this guy split in two and we're inside his hidden networks and machine learning programs deep learning
which is a subcategory of machine learning deep learning provides artificial intelligence the ability to mimic a human brain's neural network it can make sense of patterns noise and sources of confusion in the data let's try to segregate different kinds of photos using deep learning so first we have our pile of photographs much more organized than my boxes in my closet of photographs the machine goes through the features of every photo to distinguish them this is called feature extraction bingo it figures out the different features in the photos and so based on those different features it labels the photos it says these are landscapes these are portraits these are others so it separates them if you've ever been on google photos a favorite of mine i go through there and i type in dog and i get all the images of my dog it's pretty amazing how it's able to pull those features out so as we're achieving artificial intelligence and deep learning let's take a closer look and see just how deep learning works there are a lot of models out there this is the most basic model used this is a neural network there are three main layers in a neural network the photos that we want to segregate go into the input layer and so we have a picture of mona lisa it looks like and a couple of people when it goes into the first input layer the arrows are drawn to individual dots each one of those white dots in the yellow layer or the input layer would be a pixel in the picture maybe a value for that pixel so the picture of the mountains goes in there and it fills in that whole input layer and then the next picture of the nightscape goes in there the hidden layers are responsible for all the mathematical computations or feature extraction on our inputs so when you see here we have the two kind of orangish red layers and there's all these lines in there those lines are the weights each one of those represents usually a float number or a decimal number and then it multiplies it by the value in the input layer and you'll see that as it adds all these up in the hidden layer each one of those dots in the hidden layer then has a value based on the sum of the weights and then it takes those and puts it into another hidden layer and you might ask well why do we have multiple hidden layers well the hidden layers function as different alternatives to some degree so the more hidden layers you have the more complex the data that can go in and the more complex what they can produce coming out the accuracy of the predicted output generally depends on the number of hidden layers we have so there we go the accuracy is based on how many hidden layers we have and again it has to do with how complex the data is going in the output layer gives us the segregated photos so once it adds up all these weights and you'll see there's weights also going into the output layer based on those weights it'll say yes it's a portrait no it's a portrait yes it's a landscape no it's a landscape and that's how we get our setup in there so now we were able to label our photos.
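to make the weighted-sum idea concrete, here's a tiny sketch of a feedforward pass in plain numpy, the layer sizes, random weights and sigmoid activation are illustrative assumptions, not the actual network from the video

import numpy as np

def sigmoid(z):
    # squashes each weighted sum into the 0 to 1 range
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.random(4)              # input layer: say 4 pixel values from a photo
w1 = rng.random((4, 3))        # weights from the input layer to a 3-node hidden layer
w2 = rng.random((3, 2))        # weights from the hidden layer to a 2-node output layer

hidden = sigmoid(x @ w1)       # each hidden node is a weighted sum of the inputs passed through an activation
output = sigmoid(hidden @ w2)  # output nodes, for example "portrait" vs "landscape" scores
print(output)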
let's take a look and see what that looks like in another domain we've done the photograph domain now let's look at airline tickets so let's predict airline ticket prices using machine learning these are the factors based on which we're going to make the predictions and choosing your factors is so important but for right here we'll just go ahead and take a look at what we've got we have the airline the origin airport the destination airport the departure date here is some historical data of ticket prices to train the machine so we pull in the old data this is what's happened over the years now that our machine is trained let's give it new data for which it will predict the prices and if you remember from our four different kinds of machines we have machines with memory well this is the memory part it remembers all the old data and then it compares what you put in against that old data to predict the new prices the price is 1000 that's an expensive flight of course if that is us dollars if it's india i don't know what the price range is on that hopefully it's a good deal for wherever you're going so we predict the prices and this is kind of nice because if you're looking way ahead it'd be nice to know if you're planning a trip for next year how much those tickets are going to cost and they certainly fluctuate a lot so let's take a look at the applications of artificial intelligence we're going to dive just a little deeper because we talked about photos we've talked about airline ticket prices kind of very specific you get one specific number but let's look at some more things that are probably more in home more common right now speaking of in home this young gentleman is entering his room and of course he makes a comment like we all do when we walk into a room this room is dark isn't it let's see what happens when i enter it the sensors in my room detect my presence and switch on the lights this is an example of non-memory machines okay you know it senses you it doesn't have a memory of whether you've gone in or not some of the new models start running a prediction as to whether you're in the room or not when you show up and when you don't so they turn things on before you come down the stairs in the morning especially if you're trying to save energy you might have one of those fancy thermostats which starts guessing when you get up in the morning so it doesn't start the heater until it knows you're going to get up so here he comes in and this is one of the examples of a smart machine and that is one of the many applications of artificial intelligence one of the things we're developing here in our homes okay bye abruptly he leaves leaving his nice steaming cup of coffee there on the table and when he leaves the sensors detect he's gone and turn off the lights let's watch some tv did someone say tv the sound sensors on the tv detect my voice and turn on the tv sounds kind of strange but with the google dongle and a google home mini you can actually do just that you can ask it to play a movie downstairs on your downstairs tv true also with the amazon fire stick and all the different artificial intelligent home appliances that are coming out pretty amazing time to be alive and in this industry here you have one more application of artificial intelligence and you can probably think of dozens of others that are right now on the cutting edge of development so let's jump into my favorite part which is a use case predict if a person has diabetes now if you read any of these they say predict if a person has diabetes but in the medical domain we would want to restate this as what is the risk is a person at high risk of diabetes is it something that should be on their radar to look out for and we'll be helping out with the use case there we are with a cup of coffee again i forgot to shave as you can see we'll start with the problem statement the problem statement is to predict if a person has diabetes or not and we might start with this
prediction statement but if you were actually using this in a real case again saying you're at high risk of diabetes would be the proper way to present that to somebody if you ran their test data through here but that's very domain specific which is outside of the problem statement as far as the computer programmers are concerned so we have a look at the features these are the things that have been recorded and they already have this data from the hospitals one of them would be number of times pregnant obviously if you're a guy that would be zero glucose concentration so these are people who've had their glucose measured blood pressure age age is a big factor an easy thing to look at insulin level so if someone's on insulin they probably have diabetes otherwise they wouldn't be taking insulin but that would affect everything else so we add that in there if they're using insulin let's start off with the code so we're going to start off by importing parts of our data set and if we're going to do this let's jump into one of our favorite tools we're going to use the anaconda jupyter notebook so let me flip over there and we'll talk just briefly about that and then we'll look over this code so here we are in the jupyter notebook and this is an inline editor which is just really cool for messing with python on the fly you can do demos in it it's very visual you can add pieces of code it has cells so you can run each cell individually you can delete them or you can tell a cell to be a comment so that instead of running it it just skips over it there's all kinds of cool things you can do with the jupyter notebook we're not going to go into that we're actually going to go with the tensorflow and you do have to import all the different modules in your python now if you're in a different python ide that's fine it'll work in just about any of your python ides and i'll point out the one difference in code when we get to that which is the inline for doing our graph and in this code we're going to go through we're going to load our data up we're going to explore it we're going to label it and then we'll use the tensorflow and then we'll get down to the end where we're going to take a look at how good the model is does it work so let's start with the top part here and let me go ahead and paste that code in there i've added a line in at the top i've added the folder because i didn't put the file in the same folder as the program that's running you don't need to worry too much about that but if you have a folder issue or it says the file doesn't exist you probably have it in the wrong folder or you need to add the folder path and then we're going to import pandas and one of the cool things since it's in chrome we can just do control plus and zoom right in there and we have the import pandas as pd so go ahead and draw a line up there you can see the pandas as pd i'll add a little drawing in there to make it easier to see pandas is a data handling library it's a really nice package you can add into your python usually it's imported as pandas as pd that's very common to see the pd here let's circle that it's basically like an excel spreadsheet it adds columns it adds some headers it has a lot of functionality to look at your data and the first thing we're going to do with that is we're going to come down here and we're going to read and this is a csv it's a csv file you can open up the file on your computer if you want.
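as a rough sketch of what that first cell looks like, with the file name and location assumed since the video doesn't spell them out:

import pandas as pd

# the file name and folder here are assumptions, point this at wherever your copy of the diabetes csv lives
diabetes = pd.read_csv('pima-indians-diabetes.csv')

print(diabetes.head())    # first five rows, with the column headers across the top
print(diabetes.columns)   # the column names we'll refer to later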
if you need access to the file you can either search for it and download it online or type a note down below on the youtube video and simplylearn will help you get that file and that's true of any questions you have on this you can add comments down there and somebody at simplylearn actually keeps an eye on that and we'll try to answer those questions for you i put in the full path you don't have to if you have your data files saved in the same folder you're running your program in and this is just a csv file comma separated values and with pandas we can read it in and we're going to put it right in a variable called diabetes and then finally we take diabetes and you'll see diabetes.head this is a pandas command and it's really nice because it just displays the data for us let me go ahead and erase that and so we go up to the run button up here in our jupyter notebook which runs the script in the cell we're in and you can see with the diabetes.head since it's a pandas data frame it prints everything out nice and neat for us this is really great and if i go ahead and do a control minus i can get it all to fit on the same screen and this is the first five lines when you do the head of the data it just prints out the first five lines in programming of course we start with zero so we have zero one two three and four we have our titles going across the top and of course the first column is number of pregnancies glucose concentration for the second one blood pressure measurements of triceps as we keep going you'll see you can guess what most of these things are group i'm not sure this is probably the test group i didn't look it up to see specifically what that is anytime you start exploring data and for this example we don't really need to get too much in detail you want to know the domain what does this information mean are we duplicating information going in that's beyond the scope of this for right now though we're just going to look at this data and we're going to take apart the different pieces of this data so let's jump in there and take a look at what the next set of code is so in our next step we've got to start looking at cleaning the data we've got to start looking at these different columns and how we're going to bring them in which columns are what i'm going to jump ahead a little bit and also as we look at the columns we're going to do our import of tensorflow as tf that's a common import that's our tensorflow module tensorflow was developed by google and then they put it into open source so that it's free for everybody to use which is always a nice thing so let's take a look at what these two steps look like in our jupyter notebook so let's paste this code in here and this is something that happens when you're messing with data you'll repeat yourself numerous times we already have diabetes head so i'm going to put the little pound sign which tells python that this is just going to be a comment we're not going to run it a second time in here we have diabetes dot columns and then diabetes of columns to norm equals diabetes of columns to norm dot apply lambda x let's talk about that but let me first run that and see what comes out of it and then let's put the next set of code in also and i'm going to run that let me just hit the run button it doesn't really have an output but we did a lot of stuff here so let's go over what we did on this so i've turned on my drawing implement to help go through all this there's a
lot going on here the first thing is we have all of our columns up here all of our column names let me just highlight that and draw a circle around that like we did earlier and we want to look at just specific columns and we're going to start let me change the color on this so it really sticks out i'll use blue i'm going to look at these columns and we're not going to do age class or group we're going to look at those in a different way and i'll explain in a little bit what that's about so we're looking at number of pregnancies glucose and basically anything that can be read as a float number and you might say age could be read as a float number we're going to address that differently because when you're doing statistics we usually take the ages and group them into buckets and we'll talk about that and so we're going to take these columns and we're going to do what they call normalizing them hence the name columns to norm what is normalizing if you're using the sklearn module and one of their neural networks they call it scaling or a scaler this applies in all of these neural networks what happens is if i have two pieces of data let's say one of them in this case goes zero to six and the other one the insulin level goes zero to maybe point two where point two would be a very high level i don't know what they actually go to a neural network will look at this 6 and it'll go 0 to 6 and as you get up to 6 or here's an 8 an even higher number it will weight those heavier and so it'll skew the results based more on the number of pregnancies than on the insulin or the blood pressure so we want to level the playing field we want these values to all be in the same range we do that by normalizing and we use a lambda function this is always kind of a fun thing we take the diabetes it's a pandas setup and let me just erase all that so we can go down here put it back to red and we take our diabetes which is a pandas data frame and we only want to look at the columns to normalize because that's what we built we built a list of those columns and we're going to apply a lambda function and you'll see lambda in all kinds of python programming it just means it's a function we put x in and whatever expression is over here on the right it computes that and returns the value so the new value is going to be whatever x is minus x min again it's a pandas data set so we can get by with that which is really cool makes it easy to read so we take x subtract the minimum value of x and then we divide by x max minus x min or the width and what this essentially does is if you're going from 0 to 15 it's going to subtract 0 and divide by 15 or say the range is 20 to 115 instead first off we subtract x min so right off the bat the minimum value is going to be zero and then we divide by the difference x max minus x min this basically sets all the values between zero and one and that's very important if we want to make sure there's not going to be a skew in the results that it's not going to weight the values one direction or the other zero to one that's what this is doing and again there are so many different ways to scale our data this is the most basic one min-max scaling and it's very common and we spelled it out in the lambda there are actual modules that do this for you but lambda is just such an easy thing to do and you can really see what we're doing here so now we've taken these columns and scaled them all to a zero to one value.
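here's roughly what that normalization cell looks like, the column names are assumptions based on the spoken descriptions, so match them to whatever headers your csv actually has:

# columns we want rescaled to the 0 to 1 range (age, class and group are handled separately)
cols_to_norm = ['Number_pregnant', 'Glucose_concentration', 'Blood_pressure',
                'Triceps', 'Insulin', 'BMI', 'Pedigree']

# min-max scaling: subtract the column minimum, then divide by the column's range
diabetes[cols_to_norm] = diabetes[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))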
we now need to let the tensorflow model that we're going to create know what these things are so to do that i create a variable called number pregnancy keep it simple and we've imported tensorflow as tf one line of code brings our tensorflow in for us awesome love python so you get number of pregnancies and tf.feature_column.numeric_column we're going to tell it it's a numeric column so it knows that this is a float value zero to one and then number underscore pregnant so we're taking that column and we've reshaped that column from zero to one and no that doesn't mean you have a point one pregnancy it just means the count has been rescaled into the zero to one range number pregnant so we're just mapping this data so tensorflow knows what to do with it and we go through all of these we go through all the way up to again we're not doing age let me just separate age down here these have all been adjusted age we're going to handle differently we still put it as a numeric column because it is one so let me just circle age and group we didn't even touch group yet we're going to look at group and age and do some special stuff with those class we're going to look at last why because this is whether they have diabetes or not that's what we're testing for so this is our history we know whether these people have diabetes or not and we want to create a model and then we want to test that model predicting whether somebody new has diabetes or not so here i am still with my cup of coffee working away refill number three let's explore both the categorical group which is going to be the a b c and d on the end and also the age we're going to talk about those a little bit more and then we're going to combine all those features and we'll jump into splitting the data although that's kind of the next step but we'll put that all together what are we doing with the age and what are we doing with the group down there so let's flip on over and take a look at our jupyter notebook so the first thing is let's put this new set of code in here and run it and let's talk about what we're doing in this set of code i actually want to go one more step forward on the next set of code or part of it there we go and i can go ahead and just run that while we have it that's fine let's just scroll up a little bit here and let's go through what we just did with this set of code if we go back up here to the top part we have group b c b b c we learned that from the header so when i come down here we're going to create an assigned group and let me get my drawing pad out so we're looking at this first one up here assigned group tf feature column categorical column with vocabulary list that is a mouthful this is saying that this is a categorical list in this case we have a b c d you see this a lot in tensorflow a lot of times tensorflow is used to analyze text or logs error logs and they're just full of all this information and they take these in this case a b c d but in other cases it'd be words and they create numbers out of them so it has zero this is one column this is another column this is another column this is another column and it's just a way of being able to look at this in a different light and assigning them different values you really don't want to assign these a float value.
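pieced together, those feature column definitions look roughly like this, again the exact column names are assumptions, and this uses the tensorflow 1.x estimator-style api the walkthrough is working with:

import tensorflow as tf

# one numeric feature column per scaled column
num_preg = tf.feature_column.numeric_column('Number_pregnant')
glucose = tf.feature_column.numeric_column('Glucose_concentration')
blood_pressure = tf.feature_column.numeric_column('Blood_pressure')
triceps = tf.feature_column.numeric_column('Triceps')
insulin = tf.feature_column.numeric_column('Insulin')
bmi = tf.feature_column.numeric_column('BMI')
pedigree = tf.feature_column.numeric_column('Pedigree')
age = tf.feature_column.numeric_column('Age')

# the group column is categorical, a b c d are names not numbers
assigned_group = tf.feature_column.categorical_column_with_vocabulary_list('Group', ['A', 'B', 'C', 'D'])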
a first thought might be to assign 0 1 2 3 for a b c d but if you did that it would try to add 1 plus 2 to get 3 as part of its math and functions and that just isn't how it works c does not equal a plus b c is just a categorical name it could have been the frank group and laura's group and abby's group but it's not you know these are not float numbers so it's very important you have that distinction in your data the next step we're looking at is our matplotlib and we're going to do this because it's a cool useful tool we're going to import matplotlib.pyplot as plt a standard import and this next line here is really important we're telling matplotlib we want it to be inline that's because we're in a jupyter notebook remember i told you before that if you're running this in a different ide you might get a different result this is the line you need if you want to see the graph on this page and then we do diabetes just the age column and dot hist with bins equals 20 what the heck is that well it turns out that because this is remember it's pandas there's our pd pandas as pd because this is a pandas data frame it automatically knows to hand off to the matplotlib library and hist just means it's going to be a histogram and we're going to separate it into 20 bins and that's what this graph is here so when i take the diabetes age and do a histogram of it it produces this really nice graph we can see that at 22 most of the participants probably around 174 of the people that were recorded were of this age bracket they're right here in this first column and as they get further and further along there are less and less and less until when you're at 80 you can actually see down here there's almost none in a couple of these categories and this is important to know because when you're doing a census you know the older the group is the more of the people have passed away so you're going to have this in any census you take you're always going to have a much larger group in the younger crowd and then it gets lower and lower as they get older and i mentioned the word buckets here we go we're going to create an age bucket we're going to put people in buckets can you imagine people sitting in buckets kind of a funny way of looking at it this just makes it easier for us to separate the data when i go to the doctor's office i don't want to be treated as exactly age 22 you usually get grouped into an age bracket say your twenties and that's the idea behind bucketing the age column.
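the histogram and the age buckets would look something like this in code, the bucket boundaries here are the usual decade cut-offs and are an assumption rather than something read off the video:

# jupyter-only magic so the plot shows up inside the notebook
%matplotlib inline
import matplotlib.pyplot as plt

diabetes['Age'].hist(bins=20)   # pandas hands the plotting off to matplotlib
plt.show()

# turn the continuous age column into buckets, e.g. 20s, 30s, 40s and so on
age_buckets = tf.feature_column.bucketized_column(age, boundaries=[20, 30, 40, 50, 60, 70, 80])

# gather every feature column, using the bucketed age instead of the raw one
feat_cols = [num_preg, glucose, blood_pressure, triceps, insulin, bmi, pedigree, assigned_group, age_buckets]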
here we are back to my cup of coffee i told you coffee was an underlying theme these next couple steps are very specific to tensorflow up until now we had some tensorflow as we set the categories up but you'd have to set that up with any model you use to make sure they're set correctly they'd just be set up a little differently but we run input functions and in tensorflow this allows for some really cool things to happen and this is why tensorflow is so predominant in a lot of areas so let's take a look at these next couple lines of code here we're going to create the input function and we're going to go ahead and create the model and let me go ahead and put these codes in here i went ahead and split it up onto different lines so we can talk a little bit about that and then the actual model and let me go ahead and run this so we can see what that looks like and in my particular environment it just prints out a little bit of information about the model not too worried about looking at that but we do want to take a closer look at what we did here so we've created an input function and again this is very specific to tensorflow with the input function we have our train we have our x equals x train and our y equals y train because we want to train it with that particular information but we have these other settings in here these two settings and the number of epochs is how many times it's going to go over our training data an epoch means one full pass over all the data so we're going to go over it a thousand times that's actually a huge overkill for this amount of data usually it only needs probably about 200 but you know when we're putting it together and you're trying things out you just kind of throw the numbers in there and then you go back and fine-tune it sometimes the batch size is really important this is where tensorflow does some really cool things if you're processing this over a huge amount of data and you try to batch everything at once you end up with a problem this means we're only going to read 10 lines at a time through our data so of all those rows of testing they've done we're only going to look at 10 of them at a time and put that through our model and train it and then shuffle self-explanatory we're just randomly selecting which data and what order we go in that way if there are like five in a row that are kind of weighted one way and vice versa it mixes them up and then finally we create our model so the model goes in there and goes okay i have a tf.estimator.LinearClassifier we're going to put in the feature columns equals feature columns we defined all our columns up above and then we have n classes equals two and that just means our result is zero or one you have diabetes or you don't have diabetes or in this case we actually call it high risk of diabetes and then i put one more line of code in there which i forgot we've set up our model we set up our feature columns now we need to actually train it model.train you'll see this so commonly in so many different neural network models this is like a standard what's different though is we have to feed it the function remember we created this function with all this information on it and then we have steps in there steps is similar to the number of batches it's more like individual lines we step through a thousand is a lot more common for steps than for epochs but steps is used you could probably leave it out in this particular example and let's go ahead and run this all together.
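put together, the training cell looks roughly like this, the train test split (roughly two thirds train, one third test, as mentioned later) and the variable and column names are assumptions to make the sketch self-contained:

from sklearn.model_selection import train_test_split

# x_data is everything except the label, labels is the class column (0 or 1); 'Class' is an assumed column name
x_data = diabetes.drop('Class', axis=1)
labels = diabetes['Class']
X_train, X_test, y_train, y_test = train_test_split(x_data, labels, test_size=0.33, random_state=101)

# tensorflow 1.x estimator-style input function fed straight from the pandas data frames
input_func = tf.estimator.inputs.pandas_input_fn(
    x=X_train, y=y_train, batch_size=10, num_epochs=1000, shuffle=True)

model = tf.estimator.LinearClassifier(feature_columns=feat_cols, n_classes=2)
model.train(input_fn=input_func, steps=1000)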
when we start training it we get a different output so here we go i've run it it's given me the information from when i created the model and now it just starts going through we get this information tensor loss equals global step all kinds of stuff going on here and if you're following this it's just going through the steps and training it it gives you information you could dig deep into here but for this particular setup we're not going to go too deep on what's going on just know that we've trained our model this model now has the information we need in it to start running predictions so as we take our next sip of coffee or maybe it's tea or if you're one of those strange late night workers maybe it's a sip of wine we go into the next step and we actually want to run some predictions on here but we don't want to run the training we want to run the test on there we want to take our test data and see what it did and so that's what we're going to do next we're going to run the test through and actually get some answers so if you were actually deploying it you would pull the answers out of the data it's bringing back let's take a look at that in our jupyter notebook so here we go let's paste it in there i'm going to go ahead and run it and you'll see that as it goes it's actually putting the answers out if you remember correctly well we'll walk through this here in just a second but it goes through and it runs each line and it gives you a prediction for each line one at a time and prints them out so let's take a look and see what that actually looks like let's start with this first part we've created another function this is just like the other function except for x equals x test and there's no y why is there no y because we don't know what the answer is on this yet we don't want to tell it the answer we want it to guess what the answer is so we can evaluate that and see how good it did on that 33 percent of the data so this is our x test batch size is 10 again so if we were watching this roll down here we would see that it actually processes it 10 lines at a time it's only going to go through once it goes through all the x test data one time we don't need to have it predict multiple times and shuffle equals false very important we set the shuffle to false because if we were tracking this and actually giving people answers we'd want to make sure it connects to the right person so they get the right results of course we're just doing this to evaluate it so let's take a look down here at what it put out and as i scroll down in my jupyter notebook we have some information as far as the tensorflow running and then the actual output and if we look at the output we know by this first bracket that in python it's an array we know by this squiggly bracket that this is a dictionary so this is labeled the dictionary has logistic probabilities class ids classes and so this whole first part let me redo that this whole first part is one output and we know that because there is the bracket there and there is this bracket here for the dictionary and it's a dictionary of terms so if i pulled this out and looked at object 0 in here i would go down to and let me just find it here it is classes remember how we defined classes we defined classes as our answer and so it's guessing our tensorflow says based on all the information you gave me i'm guessing this first entry of our test data is high risk diabetes uh oh go get tested change your diet watch what you're eating you know high risk a one.
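that prediction cell, sketched out with the same assumed variable names as before:

# same kind of input function, but only x_test, no labels, one pass, no shuffling
pred_input_func = tf.estimator.inputs.pandas_input_fn(
    x=X_test, batch_size=10, num_epochs=1, shuffle=False)

predictions = list(model.predict(pred_input_func))
print(predictions[0])               # a dictionary with 'logistic', 'probabilities', 'class_ids', 'classes', ...
print(predictions[0]['class_ids'])  # 1 means flagged as high risk, 0 means not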
and if we go down far enough we can see down here there's another one where classes equals zero i skipped a couple because they're all ones up here the b in front of the one in this output just means python is displaying it as a bytes string the class itself only comes out as zero or one and there's a lot of other information in here you can dig deep into tensorflow and explain these different numbers that's way beyond the scope of what we're working on today so the basics of it though is you get an output and we have an output of whether the person has diabetes or not well in this case it's high risk of diabetes or not so now that we've run our predictions take a sip of coffee a short break and we say well what do we need to do next well we need to know how good our predictions were we need to evaluate our model so if you're going to publish this to a company or something like that they want to know how good you did let's take a look at what that looks like in the code so let's paste that in here and just real quick go back over this by now this function should look very familiar we're going to call it the eval input function it should be pretty straightforward here we have our tf estimator inputs pandas input function and then we have our x test and our y test because this time we want to give it the answers so it has something to compare against to see how good it did batch size is 10 processing 10 at a time we're only going once through the data we're not shuffling it although it doesn't really matter with this and then we're going to do results equals model.evaluate and we put our evaluation function in there this should all look familiar because we're repeating the same thing it's very similar with a couple of different changes as far as what we're feeding it and the fact that we're doing an evaluation let's see what that looks like when we run it and we go up here and run the model you'll see warnings on some of these because i didn't install this on its own install so a lot of it is just temporary files because i'm using the jupyter notebook instead of setting it up on a regular machine and we get our output and we'll go ahead and just look at this output real quick and then we'll flip over and from here you'll see that it generates an accuracy a baseline average loss it gives you a precision a prediction mean it gives you all kinds of information on here so let's flip over and see what's important and we'll look at the slide here we are in this slide and this should be exciting because we've just about wrapped up our whole tensorflow example and tensorflow is one of the more complicated models out there so give yourself a pat on the back for getting all the way through this when we look at this output we have an accuracy of 0.7165 that's really what we want to look at that means that we have an accuracy if you're just truncating it of 71 percent that's quite good for our model you know given a small amount of data we came up with 71 percent accuracy in letting people know whether they're at high risk or not for diabetes so we created a model that can predict if a person has diabetes based on some previous records of people who were diagnosed with diabetes and we've managed to have an accuracy of 71 percent which is quite good the model was implemented in python using tensorflow again pat yourself on the back because tensorflow is one of the more complicated scripts out there it's also one of the more diverse and useful ones.
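and the evaluation cell, again a sketch with the same assumed names:

# this time we pass y_test as well, so the model can compare its guesses against the real answers
eval_input_func = tf.estimator.inputs.pandas_input_fn(
    x=X_test, y=y_test, batch_size=10, num_epochs=1, shuffle=False)

results = model.evaluate(eval_input_func)
print(results)               # accuracy, average loss, auc, precision and so on
print(results['accuracy'])   # around 0.71 in the walkthrough above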
let's look at another example of machine learning based on the amount of rainfall how much would the crop yield be so here we have our crops we have our rainfall and we want to know how much we're going to get from our crops this year so we're going to introduce two variables independent and dependent the independent variable is a variable whose value does not change by the effect of other variables and is used to manipulate the dependent variable it is often denoted as x in our example rainfall is the independent variable this is a wonderful example because you can easily see that we can't control the rain but the rain does control the crop so we talk about the independent variable controlling the dependent variable let's define the dependent variable as a variable whose value changes when there is any change in the values of the independent variables it is often denoted as y and you can see here our crop yield is the dependent variable and it is dependent on the amount of rainfall received now that we've taken a look at a real life example let's go a little bit into the theory and some definitions of machine learning and see how that fits together with linear regression numerical and categorical values let's take our data coming in and this is kind of random data from any kind of project we want to divide it up into numerical and categorical so numerical is numbers age salary height where categorical would be a description the color a dog's breed gender categorical is limited to very specific items where numerical is a range of information now that you've seen the difference between numerical and categorical data let's take a look at some different machine learning definitions when we look at our different machine learning algorithms we can divide them into three areas supervised unsupervised reinforcement we're only going to look at supervised today unsupervised means we don't have the answers we're just grouping things reinforcement is where we give positive and negative feedback to our algorithm to program it and it doesn't have the information until after the fact but today we're just looking at supervised because that's where linear regression fits in in supervised data we have our data already there and our answers for a group and then we use that to program our model and come up with an answer the two most common uses for that are regression and classification now we're doing linear regression so we're just going to focus on the regression side and in regression we have simple linear regression we have multiple linear regression and we have polynomial linear regression now of these three simple linear regression is the example we've looked at so far where we have a lot of data and we draw a straight line through it multiple linear regression means we have multiple variables remember where we had the rainfall and the crops we might add additional variables in there like how much food do we give our crops when do we harvest them those would be additional information added into our model and that's why it'd be multiple linear regression and finally we have polynomial linear regression that is instead of drawing a straight line we can draw a curved line through the data now that you see where the regression model fits into the machine learning algorithms and we're specifically looking at linear regression let's go ahead and take a look at applications for linear regression let's look at a few applications of linear regression economic growth used to determine the economic growth of a country or a state in the coming quarter it can also be used to predict the gdp of a country product price can be used to predict what would be the price of a
product in the future we can guess whether it's going to go up or down or should i buy today housing sales to estimate the number of houses a builder would sell and at what price in the coming months score predictions cricket fever to predict the number of runs a player would score in the coming matches based on their previous performance i'm sure you can figure out other applications you could use linear regression for so let's jump in and understand linear regression and dig into the theory understanding linear regression linear regression is a statistical model used to predict the relationship between independent and dependent variables by examining two factors the first important one is which variables in particular are significant predictors of the outcome variable and the second one that we need to look at closely is how significant the regression line is to make predictions with the highest possible accuracy if it's inaccurate we can't use it so it's very important we find the most accurate line we can get since linear regression is based on drawing a line through data we're going to jump back and take a look at some euclidean geometry the simplest form of a simple linear regression equation with one dependent and one independent variable is represented by y equals m times x plus c and if you look at our model here we plotted two points on here x1 y1 and x2 y2 y being the dependent variable remember that from before and x being the independent variable so y depends on whatever x is m in this case is the slope of the line where m equals y2 minus y1 divided by x2 minus x1 and finally we have c which is the coefficient of the line the intercept where it crosses the y axis let's go back and look at an example we used earlier of linear regression we're going to go back to plotting the amount of crop yield based on the amount of rainfall and here we have our rainfall remember we cannot change the rainfall and we have our crop yield which is dependent on the rainfall so we have our independent and our dependent variables we're going to take this and draw a line through it as best we can through the middle of the data and then we look at that the red point we put on the y axis is the amount of crop yield you can expect for the amount of rainfall represented by the green dot so if we have an idea what the rainfall is for this year and what's going on then we can guess how good our crops are going to be and we've created a nice line right through the middle to give us a nice mathematical formula let's take a look and see what the math looks like behind this let's look at the intuition behind the regression line now before we dive into the math and the formulas that go behind this and what's going on behind the scenes i want you to note that when we get into the case study and we actually apply some python script this math that you're going to see here is already done automatically for you you don't have to have it memorized it is however good to have an idea what's going on so if people reference the different terms you'll know what they're talking about let's consider a sample data set with five rows and find out how to draw the regression line we're only going to do five rows because if we did something like the rainfall with hundreds of points of data it would be very hard to see what's going on with the mathematics so we'll go ahead and create our own two sets of data and we have our independent variable x and our dependent variable y and when x was 1 we got y
equals 2 when x was 2 y was 4 and so on and so on if we go ahead and plot this data on a graph we can see how it forms a nice line through the middle you can see where it's kind of grouped going upwards to the right the next thing we want to know is what the mean is of each of the data coming in the x and the y the mean doesn't mean anything other than the average so we add up all the numbers and divide by the total so 1 plus 2 plus 3 plus 4 plus 5 over 5 equals 3 and the same for y we get 4. if we go ahead and plot the means on the graph we'll see we get 3 comma 4 which draws a nice line down the middle a good estimate here we're going to dig deeper into the math behind the regression line now remember before i said you don't have to have all these formulas memorized or fully understand them even though we're going to go into a little more detail of how it works and if you're not a math wiz and you've never seen the sigma character before which looks a little bit like an e that's opened up that just means summation that's all that is so when you see the sigma character it just means we're adding everything in that row and for computers this is great because as a programmer you can easily iterate through each of the x y points and create all the information you need so in the top half you can see where we've broken that down into pieces and as it goes through the first two points it computes the squared value of x the squared value of y and x times y and then it takes all of x and adds them up all of y adds them up all of x squared adds them up and so on and so on and you can see we have the sum of x equal to 15 the sum of y equal to 20 all the way up to x times y where the sum equals 66. this all comes from our formula for calculating a straight line where y equals the slope times x plus the coefficient c so we go down below and we're going to compute more like the averages of these and we'll explain exactly what that is in just a minute and where that information comes from is the sum of squared errors but we'll go into that in detail in a few minutes all you need to do is look at the formula and see how we've gone about computing it line by line instead of trying to have a huge set of numbers pushed into it and down here you'll see where the slope m equals and then the top part if you read through the brackets you have the number of data points times the sum of x times y which we computed one line at a time there and that's just the 66 and from that you subtract the sum of x times the sum of y and those have both been computed so you have 15 times 20.
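just to make that arithmetic concrete here is a small python sketch of those same summations it's not the code from the case study just an illustration using the five sample points of this walkthrough filled in so they match the sums and totals quoted here and it lands on the same slope and intercept we're about to see

# the five sample points, consistent with the sums quoted in the walkthrough
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

sum_x = sum(x)                              # 15
sum_y = sum(y)                              # 20
sum_xy = sum(a * b for a, b in zip(x, y))   # 66
sum_x2 = sum(a * a for a in x)              # 55

# slope from the summation formula, then the intercept from the means
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # 0.6
c = (sum_y - m * sum_x) / n                                    # 2.2

# sum of squared errors for this line, roughly the 2.4 discussed next
sse = sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y))
print(m, c, sse)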
and on the bottom we have the number of data points times the sum of x squared which works out to 5 times 55 and then we take all that and subtract the square of the sum of x which is 15 squared and as we come across with our formula you can plug in all those numbers which is very easy to do on the computer you don't have to do the math on a piece of paper or calculator and you'll get a slope of 0.6 and you'll get your c coefficient if you continue to follow through that formula you'll see it comes out as equal to 2.2 continuing deeper into what's going on behind the scenes let's find out the predicted values of y for corresponding values of x using the linear equation where m equals 0.6 and c equals 2.2 we're going to take these values and we're going to go ahead and plot them we're going to predict them so y equals 0.6 times x where x equals 1 plus 2.2 which equals 2.8 so on and so on and here the blue points represent the actual y values and the brown points represent the predicted y values based on the model we created the distance between the actual and predicted values is known as residuals or errors the best fit line should have the least sum of squares of these errors also known as e squared if we put these into a nice chart where you can see x and you can see y what the actual values were and you can see y predicted you can easily see where we take y minus y predicted and we get an answer the difference between those two and if we square that y minus y prediction squared we can then sum those squared values that's where we get the 0.64 plus the 0.36 plus 1 all the way down until we have a summation equals 2.4 so the sum of squared errors for this regression line is 2.4 we check this error for each line and conclude the best fit line having the least e squared value in a nice graphical representation we can see here where we keep moving this line through the data points to make sure the best fit line has the least square distance between the data points and the regression line now we only looked at the most commonly used formula for minimizing the distance there are lots of ways to minimize the distance between the line and the data points like sum of squared errors sum of absolute errors root mean square error etc what you want to take away from this is whatever formula is being used you can easily calculate the different parts of it using computer programming and iterating through the data that way these complicated formulas you see with the different summations and absolute values are easily computed one piece at a time up until this point we've only been looking at two values x and y well in the real world it's very rare that you only have two values when you're figuring out a solution so let's move on to the next topic multiple linear regression let's take a brief look at what happens when you have multiple inputs so in multiple linear regression we have uh well we'll start with the simple linear regression where we had y equals m times x plus c and we're trying to find the value of y now with multiple linear regression we have multiple variables coming in so instead of having just x we have x1 x2 x3 and instead of having just one slope each variable has its own slope attached to it as you can see here we have m1 m2 m3 and we still just have the single coefficient so when you're dealing with multiple linear regression you basically take your single linear regression and you spread it out so you have y equals m1 times x1 plus m2 times x2 so on all the way to m sub n times x sub n and then
you add your coefficient c on there implementation of linear regression now we get into my favorite part let's understand how multiple linear regression works by implementing it in python if you remember before we were looking at a company and just based on its r and d trying to figure out its profit we're going to start looking at the expenditure of the company we're going to go back to that we're going to predict its profit but instead of predicting it just on the r d we're going to look at other factors like administration costs marketing costs and so on and from there we're going to see if we can figure out what the profit of that company is going to be to start our coding we're going to begin by importing some basic libraries and we're going to be looking through the data before we do any kind of linear regression we're going to take a look at the data to see what we're playing with then we'll go ahead and format the data to the format we need to be able to run it in the linear regression model and then from there we'll go ahead and solve it and just see how valid our solution is so let's start with importing the basic libraries now i'm going to be doing this in the anaconda jupyter notebook a very popular ide i enjoy it it's such a visual to look at and so easy to use but just about any ide for python will work just fine for this so break out your favorite python ide so here we are in our jupyter notebook let me go ahead and paste our first piece of code in there and let's walk through what libraries we're importing first we're going to import numpy as np and then i want you to skip one line and look at import pandas as pd these are very common tools that you need with most of your linear regression the numpy which stands for number python is usually denoted as np and you just about have to have that for your sklearn toolbox you always import that right off the beginning pandas although you don't have to have it for your sklearn libraries it does such a wonderful job of importing data setting it up into a data frame so we can manipulate it rather easily and it has a lot of tools also in addition to that so we usually like to use the pandas when we can and i'll show you what that looks like the other three lines are for us to get a visual of this data and take a look at it so we're going to import matplotlib.pyplot as plt and then seaborn as sns seaborn works with the matplotlib library so you have to always import matplotlib and then seaborn sits on top of it and we'll take a look at what that looks like you could use any of your own plotting libraries you want there's all kinds of ways to look at the data these are just very common ones and the seaborn is so easy to use it just looks beautiful it's a nice representation that you can actually take and show somebody and the final line is the percent sign matplotlib inline that is only because i'm doing an inline ide my interface in the anaconda jupyter notebook requires i put that in there or you're not going to see the graph when it comes up let's go ahead and run this it's not going to be that interesting we're just setting up variables in fact it's not going to do anything that we can see but it is importing these different libraries and setting them up the next step is to load the data set and extract independent and dependent variables now here in the slide you'll see companies equals pd.read csv and it has a long line there with the file at the end 1000 companies.csv you're going to have to change this to fit whatever setup you have
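as a rough sketch and not the exact notebook from the video the import cell and the load cell being described here look something like this assuming the csv has been saved next to the notebook rather than at the path on the slide

import numpy as np                 # number python, the math workhorse
import pandas as pd                # data frames for loading and handling the csv
import matplotlib.pyplot as plt    # base plotting library
import seaborn as sns              # sits on top of matplotlib for nicer plots
%matplotlib inline                 # jupyter-only magic so the plots show up in the notebook

# load the data set, change the path to wherever 1000_companies.csv lives on your machine
companies = pd.read_csv('1000_companies.csv')
companies.head()                   # peek at the first five rows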
the file itself you can request just go down to the comments below this video and put a note in there and simply learn will try to get in contact with you and supply you with that file so you can try this coding yourself so we're going to add this code in here and we're going to see that i have companies equals pd dot read underscore csv and i've changed this path to match my computer c colon slash simply learn slash 1000 underscore companies dot csv and then below there we're going to set the x equal to companies dot iloc and because companies is a pandas data set i can use this nice notation that says take every row that's what the first colon is comma except for the last column that's what the second part is where we have a colon minus one and we want the values set into there so x is no longer a pandas data set but we can easily extract the data from our pandas data set with this notation and then y we're going to set equal to the last column well the question is going to be what are we actually looking at so let's go ahead and take a look at that and we're going to look at the companies dot head which lists the first five rows of data and i'll open up the file in just a second so you can see where that's coming from but let's look at the data in here as far as the way the pandas sees it when i hit run you'll see it breaks it out into a nice setup this is one of the things pandas is really good at it looks just like an excel spreadsheet you have your rows and remember when we're programming we always start with zero we don't start with one so it shows the first five rows zero one two three four and then it shows your different columns r d spend administration marketing spend state profit it even notes that the top are column names it was never told that but pandas is able to recognize that they're not the same as the data rows why don't we go ahead and open this file up as a csv so you can actually see the raw data so here i've opened it up in a text editor and you can see at the top we have r d spend comma administration comma marketing spend comma state comma profit carriage return i don't know about you but i'd go crazy trying to read files like this that's why we use the pandas you could also open this up in excel and it would separate it since it is a comma separated variable file but we don't want to look at this one we want to look at something we can read rather easily so let's flip back and take a look at that top part the first five rows now as nice as this format is where i can see the data to me it doesn't mean a whole lot maybe you're an expert in business and investments and you understand what uh 165 349 dollars and 20 cents compared to the administration cost of a hundred and thirty six thousand eight hundred ninety seven dollars and eighty cents so on so on it helps to create the profit of 192 261 and 83 cents that makes no sense to me whatsoever no pun intended so let's flip back here and take a look at our next set of code where we're going to graph it so we can get a better understanding of our data and what it means so at this point we're going to use a single line of code to get a lot of information so we can see where we're going with this let's go ahead and paste that into our notebook and see what we got going and so we have the visualization and again we're using sns which is seaborn as you can see we imported the matplotlib dot pyplot as plt which then the seaborn uses and we imported the seaborn as sns
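before walking through that visualization line in detail here is a minimal sketch of the extraction and the plot assuming the companies data frame from above these are illustrative lines rather than the video's exact cells

# features are every column except the last one, the target is the last column, the profit
X = companies.iloc[:, :-1].values
y = companies.iloc[:, -1].values

# one line to plot a heatmap of the correlation matrix
# on newer pandas add numeric_only=True inside corr() since the state column is text
sns.heatmap(companies.corr())
plt.show()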
and then that final line of code helps us show this in our inline coding without this it wouldn't display and you could display it to a file by other means and that's the matplotlib inline with the percent sign at the beginning so here we come down to the single line of code seaborn is great because it actually recognizes the pandas data frame so i can just take the companies dot corr for correlation and i can put that right into the seaborn and when we run this we get this beautiful plot and let's just take a look at what this plot means if you look at this plot on mine the colors are probably a little bit more purplish and blue than the original one uh we have the columns and the rows we have r and d spending we have administration we have marketing spending and profit and if you cross index any two of these since we're interested in profit if you cross index profit with profit it's going to show up if you look at the scale on the right way up at the dark end why because those are the same data they have an exact correspondence so r and d spending is going to be the same as r and d spending and the same thing with administration costs right down the middle you get this dark diagonal row that shows that this is the highest corresponding data that's exactly the same and as it becomes lighter there's less connection between the data so we can see with profit obviously profit is the same as profit and next it has a very high correlation with r d spending which we looked at earlier and it has a slightly less connection to marketing spending and even less to how much money we put into the administration so now that we have a nice look at the data let's go ahead and dig in and create some actual useful linear regression models so that we can predict values and have a better profit now that we've taken a look at the visualization of this data we're going to move on to the next step instead of just having a pretty picture we need to generate some hard data some hard values so let's see what that looks like we're going to set up our linear regression model in two steps the first one is we need to prepare some of our data so it fits correctly and let's go ahead and paste this code into our jupyter notebook and what we're bringing in is we're going to bring in the sklearn pre-processing where we're going to import the label encoder and the one hot encoder to use the label encoder we're going to create a variable called label encoder and set it equal to label encoder with a capital l and a capital e this creates a class that we can reuse for transferring the labels back and forth now about now you should ask what labels are we talking about let's go take a look at the data we processed before and see what i'm talking about here if you remember when we did the companies dot head and we printed the top five rows of data we have our columns going across we have column zero which is r and d spending column one which is administration column two which is marketing spending and column three is state and you'll see under state we have new york california florida now to do a linear regression model it doesn't know how to process new york it knows how to process a number so the first thing we're going to do is we're going to change that new york california and florida and we're going to change those to numbers that's what this line of code does here x equals and then it has the colon comma 3 in brackets the first part the colon comma means that we're going to look at all the different rows so
we're going to keep them all together but the only column we're going to edit is the third column and in there we're going to take the label encoder and we're going to fit and transform the x also the third column so we're going to take that third column we're going to set it equal to a transformation and that transformation basically tells it that instead of having a new york it has a zero or a one or a two and then finally we need to do a one hot encoder which equals one hot encoder with categorical features equals three and then we take the x and we go ahead and set that equal to one hot encoder fit transform x to array this final transformation preps our data for us so it's completely set the way we need it as just a row of numbers even though it's not in here let's go ahead and print x and just take a look at what this data is doing you'll see you have an array of arrays and then each array is a row of numbers and if i go ahead and just do row 0 you'll see i have a nice organized row of numbers that the computer now understands we'll go ahead and take this out there because it doesn't mean a whole lot to us it's just a row of numbers next on setting up our data we have avoiding the dummy variable trap this is very important why because the computer automatically transformed our header into the setup and it's automatically transferred all these different variables so when we did the encoder the encoder created a column for each category and what we need to do is drop one of them because the remaining columns already carry that information that's what this piece of code does here let's go ahead and paste this in here and we have x equals x colon comma one colon all this is doing is removing that one extra column we put in there when we did our one hot encoding and our label encoding let's go ahead and run that and now we get to create our linear regression model and let's see what that looks like here and we're going to do that in two steps the first step is going to be splitting the data now whenever we create a predictive model of data we always want to split it up so we have a training set and we have a testing set that's very important otherwise we'd have no honest way of testing it to see how good our fit is and then we'll go ahead and create our multiple linear regression model and train it and set it up let's go ahead and paste this next piece of code in here and i'll go ahead and shrink it down a size or two so it all fits on one line so from the sklearn model selection module we're going to import train test split and you'll see that we've created four completely different variables we have capital x train capital x test lower case y train lower case y test that is the standard way that they usually reference these when we're doing different models you usually see that a capital x and you see the train and the test and the lower case y what this is is x is our data going in that's our r d spend our administration our marketing and then y which we're training on is the answer that's the profit because we want to know the profit of an unknown entity so that's what we're going to shoot for in this tutorial the next part train test split we take x and we take y we've already created those x has the columns with the data in it and y has a column with profit in it and then we're gonna set the test size equals point two that basically means twenty percent so twenty percent of the rows are going to be tested we're going to put them off to the side so since we're using a thousand lines of data that means that 200 of those lines we're going to hold off to the side to test for later
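for reference here is one way to sketch that preparation and split the video uses the older label encoder and one hot encoder with categorical features equals three which has since been removed from scikit-learn so this substitute leans on pandas get_dummies instead and the column names state and profit are assumed from the head output above

# one-hot encode the state column, drop_first=True also takes care of the dummy variable trap
encoded = pd.get_dummies(companies, columns=['State'], drop_first=True)

X = encoded.drop('Profit', axis=1).values   # everything except profit goes in
y = encoded['Profit'].values                # profit is what we want to predict

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)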
and then the random state equals zero which just seeds how it shuffles which ones it picks to hold off to the side we'll go ahead and run this it's not overly exciting we're just setting up our variables but in the next step we actually create our linear regression model now that we got to the linear regression model we get that next piece of the puzzle let's go ahead and put that code in there and walk through it so here we go we're going to paste it in there and let's go ahead and since this is a shorter line of code let's zoom up there so we can get a good look and we have from the sklearn linear underscore model we're going to import linear regression now i don't know if you recall from earlier when we were doing all the math let's go ahead and flip back there and take a look at that do you remember this where we had this long formula on the bottom and we were doing all this summation and then we also looked at setting it up with the different lines and then we also looked all the way down to multiple linear regression where we're adding all those formulas together all of that is wrapped up in this one section so what's going on here is i'm going to create a variable called regressor and the regressor equals the linear regression that's a linear regression model that has all that math built in so we don't have to have it all memorized or have to compute it individually and then we do the regressor dot fit in this case we do x train and y train because we're using the training data x being the data in and y being profit what we're looking at and this does all that math for us so within one click and one line we've created the whole linear regression model and we fit the data to the linear regression model and you can see that when i run the regressor it gives an output linear regression it says copy x equals true fit intercept equals true n jobs equals one normalize equals false it's just giving you some general information on what's going on with that regressor model now that we've created our linear regression model let's go ahead and use it and if you remember we kept a bunch of data aside so we're going to do a y predict variable and we're going to put in the x test and let's see what that looks like scroll up a little bit paste that in here predicting the test set results so here we have y predict equals regressor dot predict x test going in and this gives us y predict now because i'm in the jupyter notebook inline i can just put the variable up there and when i hit the run button it'll print that array out i could have just as easily done print y predict so if you're in a different ide that's not an inline setup like the jupyter notebook you can do it this way print y predict and you'll see that for the 200 different test variables we kept off to the side it's going to produce 200 answers this is what it says the profits are for those 200 predictions but let's not stop there let's keep going and take a closer look we're going to take just a short detour here and calculate the coefficients and the intercepts this gives us a quick flash at what's going on behind the line we're going to take a short detour here and we're going to be calculating the coefficient and intercepts so you can see what those look like what's really nice about the regressor we created is it already has the coefficients for us we can simply just print regressor dot coef underscore when i run this you'll see our coefficients here
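pulling those cells together a compact sketch of the model part looks roughly like this variable names follow the walkthrough's convention and your exact numbers will differ because of the random split

from sklearn.linear_model import LinearRegression

regressor = LinearRegression()          # all the multiple regression math lives in here
regressor.fit(X_train, y_train)         # learn the slopes and the intercept from the training data

y_pred = regressor.predict(X_test)      # predicted profits for the 20 percent we held back

print(regressor.coef_)                  # one slope per input column, the m1, m2, m3 from the formula
print(regressor.intercept_)             # the constant c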
and if we can do the regressor coefficient we can also do the regressor intercept and let's run that and take a look at that this all came from the multiple regression model and we'll flip over so you can remember where this is coming from you can see the formula down here where y equals m1 times x1 plus m2 times x2 and so on and so on plus c the coefficient so these variables fit right into this formula y equals slope 1 times column 1 variable plus slope 2 times column 2 variable all the way to m sub n times x sub n plus c the coefficient or in this case you have minus 8.89 to the power of 2 etc etc times the first column and the second column and the third column and then our intercept is the minus 1 0 3 0 0 9 point boy it gets kind of complicated when you look at it this is why we don't do this by hand anymore this is why we have the computer to make these calculations easy to understand and calculate now i told you that was a short detour and we're coming towards the end of our script as you remember from the beginning i said if we're going to divide this information we have to make sure it's a valid model that this model works and understand how good it works so calculating the r squared value that's what we're going to use to predict how good our prediction is and let's take a look at what that looks like in code and so we're going to use this from sklearn.metrics we're going to import r2 score that's the r squared value we're looking at the error so in the r2 score we take our y test versus our y predict y test is the actual values we're testing that was the one that was given to us that we know are true the y predict of those 200 values is what we think is true and when we go ahead and run this we see we get a 0.9352 that's the r2 score now it's not exactly a straight percentage so it's not saying it's 93 percent correct but you do want that in the upper 90s or higher which shows that this is a very valid prediction based on the r2 score and an r squared value of 0.91 or 0.92 as we got on our model because remember it does have a random generation involved this proves the model is a good model which means success yay we successfully trained our model with certain predictors and estimated the profit of the companies using linear regression what is a decision tree let's go through a very simple example before we dig in deep a decision tree is a tree shaped diagram used to determine a course of action each branch of the tree represents a possible decision occurrence or reaction let's start with a simple question how to identify a random vegetable from a shopping bag so we have this group of vegetables in here and we can start off by asking a simple question is it red and if it's not then it's going to be the purple vegetable to the left probably an eggplant if it's true it's going to be one of the red vegetables is the diameter greater than two if false it's going to be what looks to be a red chili and if it's true it's going to be a bell pepper from the capsicum family so it's a capsicum problems that a decision tree can solve so let's look at the two different categories the decision tree can be used on it can be used on classification the true false yes no and it can be used on regression when we figure out what the next value is in a series of numbers or a group of data in classification the classification tree will determine a set of logical if-then conditions to classify problems for example discriminating between three types of flowers based on certain features in regression a
regression tree is used when the target variable is numerical or continuous in nature we fit the regression model to the target variable using each of the independent variables each split is made based on the sum of squared error before we dig deeper into the mechanics of the decision tree let's take a look at the advantages of using a decision tree and we'll also take a glimpse at the disadvantages the first thing you'll notice is that it's simple to understand interpret and visualize it really shines here because you can see exactly what's going on in a decision tree little effort is required for data preparation so you don't have to do special scaling there's a lot of things you don't have to worry about when using a decision tree it can handle both numerical and categorical data as we discovered earlier and non-linear parameters don't affect its performance so even if the data doesn't fit an easy curved graph you can still use it to create an effective decision or prediction if we're going to look at the advantages of a decision tree we also need to understand the disadvantages of a decision tree the first disadvantage is overfitting overfitting occurs when the algorithm captures noise in the data that means you're solving for one specific instance instead of a general solution for all the data high variance the model can get unstable due to small variation in data low bias tree a highly complicated decision tree tends to have a low bias which makes it difficult for the model to work with new data decision tree important terms before we dive in further we need to look at some basic terms we need to have some definitions to go with our decision tree and the different parts we're going to be using we'll start with entropy entropy is a measure of randomness or unpredictability in the data set for example we have a group of animals in this picture there's four different kinds of animals and this data set is considered to have a high entropy you really can't pick out what kind of animal it is based on looking at just the four animals as a big clump of entities so as we start splitting it into subgroups we come up with our second definition which is information gain information gain it is a measure of decrease in entropy after the data set is split so in this case based on the color yellow we've split one group of animals on one side as true and those who aren't yellow as false as we continue down the yellow side we split based on the height true or false equals ten and on the other side height is less than ten true or false and as you see as we split it the entropy continues to be less and less and less and so our information gain is simply the entropy e1 from the top and how it's changed to e2 and the bottom and we'll look at the deeper math although you really don't need to know a huge amount of math when you actually do the programming in python because they'll do it for you but we'll look on the actual math of how they compute entropy finally we went on the different parts of our tree and they call the leaf node leaf node carries the classification or the decision so it's the final end at the bottom the decision node has two or more branches this is where we're breaking the group up into different parts and finally you have the root node the topmost decision node is known as the root node how does a decision tree work wonder what kind of animals i'll get in the jungle today maybe you're the hunter with the gun or if you're more into photography you're a photographer with a camera so 
let's look at this group of animals and let's try to classify different types of animals based on their features using a decision tree so the problem statement is to classify the different types of animals based on their features using a decision tree the data set is looking quite messy and the entropy is high in this case so let's look at a training set or a training data set and we're looking at color we're looking at height and then we have our different animals we have our elephants our giraffes our monkeys and our tigers and they're of different colors and shapes let's see what that looks like and how do we split the data we have to frame the conditions that split the data in such a way that the information gain is the highest note that gain is the measure of decrease in entropy after splitting so the formula for entropy is a summation that's what that sigma symbol is the one that looks kind of like a funky e and we sum from i equals 1 to k where k represents the number of different kinds of animals in there and p of i would be the proportion of that animal times the log base two of that same proportion with a minus sign out in front let's try to calculate the entropy for the current data set and take a look at what that looks like and don't be afraid of the math you don't really have to memorize this math just be aware that it's there and this is what's going on in the background and so we have three giraffes two tigers one monkey two elephants a total of eight animals gathered and if we plug that into the formula we get an entropy that starts with three over eight so we have three giraffes out of a total of eight times the log usually they use base two on the log so log base two of three over eight plus two over eight that's the two elephants over a total of eight times log base two of two over eight plus one monkey over a total of eight times log base two of one over eight and plus two over eight for the tigers times log base two of two over eight and if we plug that into our computer or calculator i obviously can't do logs in my head we get an entropy equal to 0.571 the program will actually calculate the entropy of the data set similarly after every split to calculate the gain now we're not going to go through each set one at a time to see what those numbers are we just want you to be aware that this is the formula or the mathematics behind it gain can be calculated by finding the difference of the subsequent entropy values after a split now we'll try to choose a condition that gives us the highest gain we will do that by splitting the data using each condition and checking the gain we get out of them the condition that gives us the highest gain will be used to make the first split can you guess what that first split will be just by looking at this image as a human it's probably pretty easy to split it let's see if you're right if you guessed the color yellow you're correct let's say the condition that gives us the maximum gain is yellow so we will split the data based on the color yellow if it's true that group of animals goes to the left if it's false it goes to the right the entropy after the splitting has decreased considerably however we still need some splitting of both the branches to attain an entropy value equal to 0.
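if you want to see that sum as code rather than on a slide here is a tiny sketch it's just an illustration of the formula not part of the course notebook and note the exact number you get depends on the log base with a strict log base two this particular mix works out to about 1.91 while a base ten log gives roughly the 0.57 quoted on the slide

import math

def entropy(counts, base=2):
    # entropy = minus the sum over classes of p_i times log(p_i)
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts)

# three giraffes, two tigers, one monkey, two elephants out of eight animals
print(entropy([3, 2, 1, 2]))            # about 1.91 with log base two
print(entropy([3, 2, 1, 2], base=10))   # about 0.57 with log base ten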
so we decide to split both the nodes using height as a condition since every branch now contains single label type we can say that entropy in this case has reached the least value and here you see we have the giraffes the tigers the monkey and the elephants all separated into their own groups this tree can now predict all the classes of animals present in the dataset with a hundred percent accuracy that was easy use case loan repayment prediction let's get into my favorite part and open up some python and see what the programming code in the scripting looks like in here we're going to want to do a prediction and we start with this individual here who's requesting to find out how good his customers are going to be whether they're going to repay their loan or not for his bank and from that we want to generate a problem statement to predict if a customer will repay loan amount or not and then we're going to be using the decision tree algorithm in python let's see what that looks like and let's dive into the code in our first few steps of implementation we're going to start by importing the necessary packages that we need from python and we're going to load up our data and take a look at what the data looks like so the first thing i need is i need something to edit my python and run it in so let's flip on over and here i'm using the anaconda jupiter notebook now you can use any python ide you like to run it in but i find the jupyter notebook's really nice for doing things on the fly and let's go ahead and just paste that code in the beginning and before we start let's talk a little bit about what we're bringing in and then we're going to do a couple things in here we have to make a couple changes as we go through this first part of the import the first thing we bring in is numpy as np that's very standard when we're dealing with mathematics especially with very complicated machine learning tools you almost always see the numpy come in for your num your number it's called number python it has your mathematics in there in this case we actually could take it out but generally you'll need it for most of your different things you work with and then we're going to use pandas as pd that's also a standard the pandas is a data frame setup and you can liken this to taking your basic data and storing it in a way that looks like an excel spreadsheet so as we come back to this when you see np or pd those are very standard uses you'll know that that's the pandas and i'll show you a little bit more when we explore the data in just a minute then we're going to need to split the data so i'm going to bring in our train test and split and this is coming from the sk learn package cross validation in just a minute we're going to change that and we'll go over that too and then there's also the sk.tree import decision tree classifier that's the actual tool we're using remember i told you don't be afraid of the mathematics it's going to be done for you well the decision tree classifier has all that mathematics in there for you so you don't have to figure it back out again and then we have sklearn.metrics for accuracy score we need to score our setup that's the whole reason we're splitting it between the training and testing data and finally we still need the sklearn import tree and that's just the basic tree function is needed for the decision tree classifier and finally we're going to load our data down here and i'm going to run this and we're going to get two things on here one we're going to get an error and two 
we're going to get a warning let's see what that looks like so the first thing we had is we have an error why is this error here well it's looking at this it says i need to read a file and when this was written the person who wrote it this is their path where they stored the file so let's go ahead and fix that and i'm going to put in here my file path i'm just going to call it full file name and you'll see it's on my c drive and this is very lengthy setup on here where i stored the data2.csv file don't worry too much about the full path because on your computer it'll be different the data.2 csv file was generated by simplylearn if you want a copy of that you can comment down below and request it here in the youtube and then if i'm going to give it a name full file name i'm going to go ahead and change it here to full file name so let's go ahead and run it now and see what happens and we get a warning when you're coding understanding these different warnings and these different errors that come up is probably the hardest lesson to learn so let's just go ahead and take a look at this and use this as a opportunity to understand what's going on here if you read the warning it says the cross validation is depreciated so it's a warning on it's being removed and it's going to be moved in favor of the model selection so if we go up here we have sklearn dot cross validation and if you research this and go to the sklearn site you'll find out that you can actually just swap it right in there with model selection and so when i come in here and i run it again that removes a warning what they've done is they've had two different developers develop it in two different branches and then they decided to keep one of those and eventually get rid of the other one that's all that is and very easy and quick to fix before we go any further i went ahead and opened up the data from this file remember the the data file we just loaded on here the data underscore 2.csv let's talk a little bit more about that and see what that looks like both as a text file because it's a comma separated variable file and in a spreadsheet this is what it looks like as a basic text file you can see at the top they've created a header and it's got one two three four five columns and each column has data in it and let me flip this over because we're also going to look at this uh in an actual spreadsheet so you can see what that looks like and here i've opened it up in the open office calc which is pretty much the same as excel and zoomed in and you can see we've got our columns and our rows of data little easier to read in here we have a result yes yes no we have initial payment last payment credit score house number if we scroll way down we'll see that this occupies a thousand and one lines of code or lines of data with the first one being a column and then one thousand lines of data now as a programmer if you're looking at a small amount of data i usually start by pulling it up in different sources so i can see what i'm working with but in larger data you won't have that option it'll just be too too large so you need to either bring in a small amount that you can look at it like we're doing right now or we can start looking at it through the python code so let's go ahead and move on and take the next couple steps to explore the data using python let's go ahead and see what it looks like in python to print the length and the shape of the data so let's start by printing the length of the database we can use a simple lin function from 
python and when i run this you'll see that it's a thousand long and that's what we expected there's a thousand lines of data in there if you subtract the column head this is one of the nice things when we did the balance data from the panda read csv you'll see that the header is row 0 so it automatically removes a row and then shows the data separate it does a good job sorting that data out for us and then we can use a different function and let's take a look at that and again we're going to utilize the tools in panda and since the balance underscored data was loaded as a panda data frame we can do a shape on it and let's go ahead and run the shape and see what that looks like what's nice about this shape is not only does it give me the length of the data we have a thousand lines it also tells me there's five columns so we were looking at the data we had five columns of data and then let's take one more step to explore the data using python and now that we've taken a look at the length and the shape let's go ahead and use the pandas module for head another beautiful thing in the data set that we can utilize so let's put that on our sheet here and we have print data set and balance data dot head and this is a panda's print statement of its own so it has its own print feature in there and then we went ahead and gave a label for a print job here of data set just a simple print statement and we run that and let's just take a closer look at that let me zoom in here there we go pandas does such a wonderful job of making this a very clean readable data set so you can look at the data you can look at the column headers you can have it when you put it as the head it prints the first five lines of the data and we always start with zero so we have five lines we have zero one two three four instead of one two three four five that's a standard scripting and programming set as you wanna start with the zero position and that is what the data head does it pulls the first five rows of data puts in a nice format that you can look at and view very powerful tool to view the data so instead of having to flip and open up an excel spreadsheet or open office cal or trying to look at a word doc where it's all scrunched together and hard to read you can now get a nice open view of what you're working with we're working with a shape of a thousand long five wide so we have five columns and we do the full date ahead you can actually see what this data looks like the initial payment last payment credit scores house number so let's take this now that we've explored the data and let's start digging into the decision tree so in our next step we're going to train and build our data tree and to do that we need to first separate the data out we're going to separate into two groups so that we have something to actually train the data with and then we have some data on the side to test it to see how good our model is remember with any of the machine learning you always want to have some kind of test set to to weigh it against so you know how good your model is when you distribute it let's go ahead and break this code down and look at it in pieces so first we have our x and y where do x and y come from well x is going to be our data and y is going to be the answer or the target you can look at its source and target in this case we're using x and y to denote the data in and the data that we're actually trying to guess what the answer is going to be and so to separate it we can simply put in x equals the balance of the 
data.values the first brackets means that we're going to select all the lines in the database so it's all the data and the second one says we're only going to look at columns one through five remember we always start with zero zero is a yes or no and that's whether the loan went default or not so we want to start with one if we go back up here that's the initial payment and it goes all the way through the house number well if we want to look at one through five we can do the same thing for y which is the answers and we're going to set that just equal to the zero column so it's just that first column and then it's all rows going in there so now we've divided this into two different data sets one of them with the data going in and one with the answers next we need to split the data and here you'll see that we have it split into four different parts the first one is your x train your x test your y train your y test simply put we have x going in where we're going to train it and we have to know the answer to train it with and then we have x test where we're going to test that data and we have to know in the end what the y was supposed to be and that's where this train test split comes in that we loaded earlier in the modules this does it all for us and you can see they set the test size equal to 0.3 so roughly 30 percent will be used in the test and then we use a random state to seed which rows it randomly takes out of there and then finally we get to actually build our decision tree and they've called it here clf underscore entropy that's the actual decision tree or decision tree classifier and in here they've added a couple variables which we'll explore in just a minute and then finally we need to fit the data to that so we take our clf entropy that we created and we fit the x train and since we know the answers for x train that's the y train we go ahead and put those in and let's go ahead and run this and what most of these sklearn modules do is when you set up the variable in this case we set the clf entropy equal to the decision tree classifier it automatically prints out what's in that decision tree there's a lot of variables you can play with in here and it's quite beyond the scope of this tutorial to go through all of these and how they work but we're working on entropy that's one of the options we've added then there's a random state of 100 which is just a seed for the randomness not a percentage and we have a max depth of three now the max depth if you remember above when we were doing the different graphs of animals means it's only going to go down three layers before it stops and then we have minimum samples per leaf of five so every leaf at the end is going to have at least five samples in it so i'll have no more than three layers and each final leaf at the bottom will be based on at least five rows of data now that we've created our decision tree classifier not only created it but trained it let's go ahead and apply it and see what that looks like so let's go ahead and make a prediction and see what that looks like we're going to paste our predict code in here and before we run it let's just take a quick look at what it's doing here we have a variable y predict that we're going to do and we're going to use our variable clf entropy that we created and then you'll see dot predict and it's very common in the sklearn modules that their different tools have a predict function for when you're actually running a prediction in this case we're going to put our x test data in here
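as a recap here is a rough end to end sketch of those cells under the same settings it assumes the balance_data frame loaded from the csv earlier and the seed values are just examples so your exact accuracy will vary a little

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X = balance_data.values[:, 1:5]    # initial payment, last payment, credit score, house number
Y = balance_data.values[:, 0]      # the yes or no result column

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=100)

clf_entropy = DecisionTreeClassifier(criterion='entropy', random_state=100,
                                     max_depth=3, min_samples_leaf=5)
clf_entropy.fit(X_train, y_train)

y_pred_en = clf_entropy.predict(X_test)
print(accuracy_score(y_test, y_pred_en) * 100)   # lands around 93 percent on this data set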
now if you delivered this for actual commercial use and distributed it this would be the new loans you're putting in here to guess whether the person is going to pay them back or not in this case so we need to test out the data and just see how good our sample is and how good our tree does at predicting the loan payments and finally since the anaconda jupyter notebook works as a command line for python we can simply put the y predict e n to print it i could just as easily put the print and put brackets around y predict en to print it out we'll go ahead and do that it doesn't matter which way you do it and you'll see right here that runs a prediction this is roughly 300 in here remember it's 30 percent of a thousand so you should have about 300 answers in here and this tells you that each one of those lines of our test went in there and this is what our y predict came out as so let's move on to the next step we're going to take this data and try to figure out just how good a model we have so here we go since sklearn does all the heavy lifting for you and all the math we have a simple line of code to let us know what the accuracy is and let's go ahead and go through that and see what that means and what that looks like let's go ahead and paste this in and let me zoom in a little bit there we go so you have a nice full picture and we'll see here we're just going to do a print accuracy is and then we do the accuracy score and this was something we imported earlier if you remember at the very beginning let me just scroll up there real quick so you can see where that's coming from that's coming from here down here from sklearn.metrics import accuracy score and you could probably make your own script to do this very easily how accurate is it how many out of 300 do we get right and so we put in our y test that's the one we ran the predict on and then we put in our y predict e n that's the answers we got and we're just going to multiply that by a hundred because this is just going to give us an answer as a decimal and we want to see it as a percentage and let's run that and see what it looks like and if you see here we got an accuracy of 93.66667 so when we look at the number of loans and we look at how good our model fit we can tell people it has about a 93.6 fitting to it so just a quick recap on that we now have accuracy set up on here and so we have created a model that uses the decision tree algorithm to predict whether a customer will repay the loan or not the accuracy of the model is about 93.7 percent the bank can now use this model to decide whether it should approve the loan request from a particular customer or not and so this information is really powerful we may not be able to as individuals understand all these numbers because they have thousands of numbers that come in but you can see that this is a smart decision for the bank to use a tool like this to help them to predict how good their profit's going to be off of the loan balances and how many are going to default or not so why random forest it's always important to understand why we use this tool over the other ones what are the benefits here and so with the random forest the first one is there's no overfitting if you use multiple trees you reduce the risk of overfitting and training time is less overfitting means that we have fit the data so close to what we have as our sample that we pick up on all the weird parts and instead of predicting the overall data you're predicting the weird stuff which you don't want high accuracy it runs efficiently in a large
database for large data it produces highly accurate predictions in today's world of big data this is really important and this is probably where it really shines this is where why random forest really comes in it estimates missing data data in today's world is very messy so when you have a random forest it can maintain the accuracy when a large proportion of the data is missing what that means is if you have data that comes in from five or six different areas and maybe they took one set of statistics in one area and they took a slightly different set of statistics in the other so they have some of the same same shared data but one is missing like the number of children in the house if you're doing something over demographics and the other one is missing the size of the house it will look at both of those separately and build two different trees and then it can do a very good job of guessing which one fits better even though it's missing that data let us dig deep into the theory of exactly how it works and let's look at what is a random forest random forest or random decision forest is a method that operates by constructing multiple decision trees the decision of the majority of the trees is chosen by the random forest as a final decision and this uh we have some nice graphics here we have a decision tree and they actually use a real tree to denote the decision tree which i love and given a random some kind of picture of a fruit this decision tree decides that the output is it's an apple and we have a decision tree 2 where we have that picture of the fruit goes in and this one decides that it's a lemon and the decision three tree gets another image and it decides it's an apple and then this all comes together in what they call the random forest and this random forest then looks at it and says okay i got two votes for apple one vote for lemon the majority is apples so the final decision is apples to understand how the random forest works we first need to dig a little deeper and take a look at the random forest and the actual decision tree and how it builds that decision tree in looking closer at how the individual decision trees work we'll go ahead and continue to use the fruit example since we're talking about trees and forests a decision tree is a tree shaped diagram used to determine a course of action each branch of the tree represents a possible decision occurrence or reaction so in here we have a bowl of fruit and if you look at that it looks like they switched from lemons to oranges we have oranges cherries and apples and the first decision of the decision tree might be is a diameter greater than or equal to three and if it says false it knows that they're cherries because everything else is bigger than that so all the cherries fall into that decision so we have all that data we're training we can look at that we know that that's what's going to come up is the color orange well goes hmm orange or red well if it's true then it comes out as the orange and if it's false that leaves apples so in this example it sorts out the fruit in the bowl or the images of the fruit a decision tree these are very important terms to know because these are very central to understanding the decision tree when working with them the first is entropy everything on the decision tree and how it makes this decision is based on entropy entropy is a measure of randomness or unpredictability in the data set then they also have information gain the leaf node the decision node and the root node we'll cover these 
other four terms as we go down the tree but let's start with entropy so starting with entropy we have here a high amount of randomness what that means is that whatever is coming out of this decision if it was going to guess based on this data it wouldn't be able to tell you whether it's a lemon or an apple it would just say it's a fruit so the first thing we want to do is we want to split this apart and we take the initial data set we're going to sit create a data set 1 and a data set 2 we just split it into and if you look at these new data sets after splitting them the entropy of each of those sets is much less so for the first one whatever comes in there it's going to sort that data and it's going to say okay if this data goes this direction it's probably an apple and if it goes into the other direction it's probably a lemon so that brings us up to information gain it is the measure of decrease in the entropy after the data set is split what that means in here is that we've gone from one set which has a very high entropy to two lower sets of entropy and we've added in the values of e1 for the first one and e2 for the second two which are much lower and so that information gain is increased greatly in this example and so you can find that the information grain simply equals decision e1 minus e2 as we're going down our list of definitions we'll look at the leaf node and the leaf node carries the classification or the decision so we look down here to the leaf node we finally get to our set 1 or our set 2 when it comes down there and it says okay this object's gone into set one if it's gone into set one it's going to be split by some means and we'll either end up with apples on the leaf node or a lemon on the leaf node and on the right it'll either be an apple or lemons those leaf nodes are those final decisions or classifications that's the definition of leaf node in here if we're going to have a final leaf where we make the decision we should have a name for the nodes above it and they call those decision nodes a decision node decision node has two or more branches and you can see here where we have the five apples and one lemon and in the other case the five lemons in one apple they have to make a choice of which tree it goes down based on some kind of measurement or information given to the tree and that brings us to our last definition the root node the top most decision node is known as the root node and this is where you have all of your data and you have your first decision it has to make or the first split in information so far we've looked at a very general image with the fruit being split let's look and see exactly what that means to split the data and how do we make those decisions on there let's go in there and find out how does a decision tree work so let's try to understand this and let's use a simple example and we'll stay with the fruit we have a bowl of fruit and so let's create a problem statement and the problem is we want to classify the different types of fruits in the bowl based on different features the data set in the bowl is looking quite messy and the entropy is high in this case so if this bowl was our decision maker it wouldn't know what choice to make it has so many choices which one do you pick apple grapes or lemons and so we look in here we're going to start with the dra a training set so this is our data that we're training our data with and we have a number of options here we have the color and under the color we have red yellow purple we have a three one 
so how do we split the data? we have to frame the split conditions so that the information gain is the highest. it's key that we're looking for the best gain, not just peeling off the smallest piece: we measure the decrease in entropy after each candidate split and choose the condition that gives us the highest gain. the candidate conditions here come from color and diameter: diameter equal to 3, color equal to yellow, color equal to red, diameter equal to 1. looking at the table, four of the rows have a diameter of 3, and the diameter condition turns out to give the maximum gain, so our first split from the root decision node is: is the diameter greater than or equal to 3? if that's false, the fruit goes into the grape bowl, and if it's true, it goes into a bowl of lemons and apples. the entropy after this split has decreased considerably. the grape node has already reached an entropy of 0, there's only one kind of label left on that branch, so no further splitting is required there. the node on the right still needs a split to decrease the entropy further, so we split it on color, the only feature we have left: if the color is yellow it goes to the right bowl, otherwise to the left bowl. now the entropy in every bowl is zero, there's only one type of fruit in each, so we can predict a lemon with 100 percent accuracy, an apple with 100 percent accuracy, and the grapes as well. that's one basic tree in our forest; what we really want to know is how a random forest works as a whole. so say we've already built three trees. the first tree looks just like our example: it checks whether the diameter is greater than or equal to 3, so one side gets the smaller fruit and the other the larger, and then it checks whether the color is orange (we're using oranges now instead of lemons), sending orange to the right and red to the left. the second tree is built differently: its first bowl has a lot of red objects, so asking "is the color red" brings the entropy down fastest; true goes left, false goes right, and then it splits on shape. the third tree asks whether the diameter is equal to 1, because its bowl has a lot of cherries and that's the split that drops the entropy quickest, and then it adds another feature, "does it grow in the summer", sending false to the left and true to the right. put the three trees side by side and you have three completely different trees categorizing fruit. now take a new fruit where the color has been blacked out, so that piece of data is missing; remember, one of the strengths of a random forest we mentioned earlier is that it copes well with missing data. what we do know is that the diameter is 3, it grows in the summer, and the shape is a circle. tree one asks whether the diameter is at least 3 and whether the color is orange; it can't really check the color, but the path it follows comes out as orange. tree two asks whether the color is red and whether the shape is a circle, and classifies it as a cherry. tree three asks whether the diameter is 1 (false) and whether it grows in the summer (true), and lands on orange. so the first tree says orange, the second says cherry, the third says orange; add the votes together and the majority says orange, so the fruit is classified as an orange even though the color was missing. i'm getting tired of fruit, so let's switch, as promised, to a case study and some python coding: the iris flower analysis. this is the exciting part where we roll up our sleeves and actually write some code. before we start, we need a problem statement: what species of iris do these flowers belong to? let's try to predict the species of the flowers using machine learning in python. you'll find that the first half of the implementation is all about organizing and exploring the incoming data, and the first step is loading the different modules into python. put the code in your favorite editor; i'm using the anaconda jupyter notebook, but notepad++, eclipse, or even the plain python terminal will work just fine. so let's flip over to the jupyter notebook, paste the code into a new python 3 page, and take a look at what we're bringing into python.
the first thing we do is from sklearn.datasets import load_iris. this isn't the actual data, it's just the module that lets us bring the data in. the iris data set is so popular it's been around since 1936, when ronald fisher published a paper on it; it contains measurements of the different parts of the flower, and from those measurements we predict what kind of flower it is. since we're building a random forest classifier, we also import RandomForestClassifier from sklearn.ensemble. then we bring in two more modules that are probably the most commonly used in python data science: pandas, imported as pd, which gives us a data frame that looks a lot like an excel spreadsheet and plays nicely with the other modules, and numpy, the numbers python, imported as np, which handles the math for us. right away we use numpy to seed the randomness with np.random.seed(0). this cell doesn't print anything, but we run it to make sure everything is loaded. the next several cells are all about exploring the data; remember, half of this work is looking at the data and getting it set up. so we paste in the next piece of code: first we actually load the iris data and assign it to the variable iris, then we define a data frame, df, equal to pd.DataFrame with iris.data as the data and the columns set to iris.feature_names, and we call df.head(). when we run it, df looks like a small spreadsheet: the four column names across the top and the iris.data values below. it's a little confusing without knowing where this data comes from, so change the cell for a moment and print all of iris. you get a long block of information, and the braces at the start tell you this is a python dictionary: each key, or label, pulls up whatever information comes after it. feature_names, the key we just used for the columns, is one of them.
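here's a minimal sketch of what has been loaded and run so far, assuming scikit-learn, pandas and numpy are installed:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
import numpy as np

np.random.seed(0)                 # fix the randomness so the later split is reproducible

iris = load_iris()                # a bunch with .data, .target, .feature_names, .target_names
df = pd.DataFrame(iris.data, columns=iris.feature_names)
print(df.head())                  # first five rows of the four measurement columns
```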
under feature_names you'll find an array: sepal length, sepal width, petal length and petal width, the names of the four columns. scroll through and you'll also see a data key holding the measurements themselves, a target key, which we'll pull up in a minute, and target_names further down, which we'll also get to. set the cell back to df.head(): one of the neat features of a pandas data frame is that .head() prints the first five rows along with the headers, and here the column headers are the iris feature names. notice the rows are numbered 0, 1, 2, 3, 4; in python, arrays almost always start at zero, so the first five rows are 0 through 4, not 1 through 5. now that the iris data is in a data frame, the next piece of code looks at the target. before running it, notice df['species']: putting a new name in brackets is how you create another column, and we set it equal to the iris target (by the way, iris.target and iris['target'] both work here, the bunch object accepts either). run it and you'll see we've added the target from the iris data set as a new column on the end. the species is what we're trying to predict, so alongside the measurements we now have a column with the answer: setosa, versicolor or virginica, the three species in the data. next we add one more column. i know we keep reorganizing this data, but the nice thing about putting everything in one data frame is that a single printout shows exactly what we're working with. one of the challenges in machine learning is knowing how good your model is, whether it actually worked, and to answer that we split the data into two parts, usually called training and testing. so we create another column, is_train, set equal to np.random.uniform(0, 1, len(df)) <= .75: a random number between zero and one is generated for every row, and if it's below 0.75 the row is marked true, otherwise false. that means roughly 75 percent of the data (roughly, because randomness is involved) will be used to train the model, and the other 25 percent is held off to the side to test it later on. now that every row is labeled as training or testing, we sort them into two variables, two new data frames: train is df where is_train equals true, and test is df where is_train equals false. when we run it and print the number of rows in each, we get 118 in the training set and 32 in the test set, which tells us there were 150 rows in total, roughly 75 percent in one and 25 percent left to test the model on afterward. next we want to make the data a bit more readable to humans, starting with the features: we know the first four columns of the data frame, columns 0 through 3, are the feature columns, so features = df.columns[:4] gives us an index of sepal length, sepal width, petal length and petal width, matching the first four column titles up above. one thing to notice: in a jupyter notebook or a terminal window, just typing a variable name on the last line prints it out, the same as print(features); if you were saving this as a script and running it remotely you'd really want the explicit print. with the features done, we need to create the labels for the other part.
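a minimal sketch of these preparation steps, continuing from the imports above; note the video assigns the raw iris.target codes to the species column, while mapping the codes to their names with pd.Categorical.from_codes is one way to get the readable setosa/versicolor/virginica labels that show up later:

```python
# species column plus the rough 75/25 train/test flag
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
df['is_train'] = np.random.uniform(0, 1, len(df)) <= 0.75

train, test = df[df['is_train'] == True], df[df['is_train'] == False]
print(len(train), len(test))      # e.g. 118 and 32 of the 150 rows

features = df.columns[:4]         # sepal length, sepal width, petal length, petal width
```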
for our final step in prepping the data, before we actually start training and testing, we convert the species into something the computer understands. we set y equal to pd.factorize(train['species'])[0]. let's break that down: pd is pandas, and factorize looks at the species column of the training set, the one with setosa, versicolor and virginica in it, and encodes it as numbers. factorize actually returns two arrays, the codes and the unique values, which is why we take element [0] to get just the codes. run it and y comes out as an array of 0s, 1s and 2s for the training set, one number per row, representing the three kinds of flower. so now we have something the computer understands alongside a table we can read, and we finally get to the actual predicting. it's two lines of code; that was a lot of work to get to two lines, but there's a lot in them. we create a variable clf and set it equal to RandomForestClassifier, passing in two parameters. there are many parameters you can play with, but these two are very standard: n_jobs just controls how the work is prioritized across your system, and n_jobs=2 is fine on your own computer (you'd tune it differently on a big data setup), and random_state=0 simply fixes how the randomness starts. then clf.fit(train[features], y): fit means train, and this is the line that actually builds our random forest classifier. we pass in the training features, the measurement columns of the training set we kept separate from the test set, and the target y, the 0s, 1s and 2s we just created. when we run it, something interesting happens: the notebook prints out the random forest classifier and all of its settings, because in a notebook or terminal window the last expression in a cell gets echoed, just like earlier when we typed y by itself instead of print(y); if you were running this as a script, that wouldn't be printed.
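a minimal sketch of the encoding and training step, continuing the train data frame and features index from above:

```python
y = pd.factorize(train['species'])[0]     # setosa/versicolor/virginica -> 0, 1, 2

clf = RandomForestClassifier(n_jobs=2, random_state=0)
clf.fit(train[features], y)               # the two lines that build and train the forest
```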
what gets printed out shows all the different parameters we could change, and if you look closely you can see n_jobs=2 and random_state=0, the two we passed in. you'd have to dig through the sklearn documentation to learn what every setting means, but some are fairly self-explanatory. max_features='auto' means it will use all the features we send it, which is fine for our four columns; if you were processing words from legal documents you might have over a million features and would want to limit how many get considered. max_leaf_nodes limits the leaf nodes, the end nodes from the fruit example; here there are probably only three or four, but with thousands of leaf nodes you'd want a cap so the model doesn't use up all your resources. that's really what most of these settings are about: limiting the processing so you don't overwhelm the system. warm_start=False only matters if you're training the model one piece at a time externally, which we aren't. again, look these up in the sklearn documentation if you're digging deep on a major project; for today, all we needed was to fit, that is train, on our features and our target y. so now we have a trained model, and the next step is to test it; remember we set aside the test set, roughly 25 percent of the data. we paste in the prediction code and run it, and it produces a string of zeros, ones and twos representing the three flower types, setosa, versicolor and virginica. what we're feeding into predict is test[features], and it's worth seeing what that is: features is an index of sepal length, sepal width, petal length and petal width, so indexing the test data frame with it gives a pandas data frame of just those four columns. each row of measurements, like 5.0, 3.4, 1.5, 0.2, goes into the classifier, and the forest predicts a class for it.
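and a minimal sketch of running the trained forest on the held-out rows:

```python
test_preds = clf.predict(test[features])    # one 0/1/2 code per held-out row
print(test_preds[:10])                      # e.g. [0 0 0 1 2 1 1 2 2 2]
```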
it predicts something like 0 0 0 1 2 1 1 2 2 2, and again those numbers are the flower types: setosa, versicolor and virginica. now that we've run the test features through, let's explore what that data means. the first thing we can do is generate a different view of the same predictions, not different data, just viewed differently: we call clf.predict_proba instead of clf.predict, proba for probability, and look at just the first ten rows. instead of a single class per row this gives a much larger field: three numbers per row, one per class, because we had three classes at the end. as we covered in the theory, these are the vote shares: the first row predicts 1 for setosa, 0 for versicolor and 0 for virginica, and so on. change the slice to rows 10 through 20 and the data gets more interesting: one row comes out 0, 0.5, 0.5, meaning setosa gets no votes and the other two classes split the vote evenly; when there's a tie like that, predict simply falls back to the first of the tied classes. scan down the list and you can see how the shares vary. so we've looked at the basic predict and at predict_proba; next we want to map the numeric predictions to the plant names so the output makes more sense to us. here's iris.target_names, which is exactly the list of names we've been talking about, setosa, versicolor and virginica, and we run the prediction again (we could have stored the result in a variable the first time instead of re-running it) and index target_names with it, assigning the result to the variable predictions. run that and it comes out as an array of names: setosa, setosa, setosa and so on for the first five; print the first 25 and you start to see the other flower types, versicolor and virginica, show up as well. now let's see how this lines up with the actual species.
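a minimal sketch of those two views of the predictions, continuing the variables above:

```python
print(clf.predict_proba(test[features])[10:20])     # vote shares, one column per class
preds = iris.target_names[clf.predict(test[features])]
print(preds[:25])                                   # e.g. ['setosa' 'setosa' ... 'virginica' ...]
```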
so let's look at the top part of our species data; we take that code, put it in the script and run it. there are two pieces here: the first is our predictions, and looking at the first five you see setosa, setosa, setosa, setosa, which is what our forest predicts from the test features; the second is test['species'].head(), the actual data, which also reads setosa, setosa, setosa, setosa, setosa, and they match. so the first list is what our forest is doing and the second is the ground truth, and now we need to combine the two so we can measure how good the forest is at predicting the species. that's the next step, and it's a single, rather fun line of code that combines our predictions and our actuals into a nice chart: pd.crosstab. this pandas function takes two sets of data and builds a table out of them, so when we run it we get the predicted species across the top, setosa, versicolor, virginica, and the actual species down the side. the way to read the chart is that wherever the row and column for the same species meet, the actual and the predicted agree, so those cells are the accurate predictions: in this case 13 plus 5 plus 12 gives 30 correct. we also notice a couple of cells off that diagonal, for example where the model said virginica but the flower was actually versicolor; those are the inaccurate predictions, two of them here. thirty accurate out of thirty-two total: 30 divided by 32 is 0.9375, so multiplied by 100 the model is roughly 94 percent accurate on this test split.
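a minimal sketch of the comparison chart and the accuracy figure (this assumes the species column holds the readable names, as set up earlier):

```python
print(pd.crosstab(test['species'], preds,
                  rownames=['Actual Species'], colnames=['Predicted Species']))

accuracy = (preds == test['species']).mean()   # fraction of test rows predicted correctly
print(round(accuracy * 100, 1))                # roughly 94 for a split like the one above
```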
before we wrap this model up, one more quick thing on the scripting. take the line from above, where the predictions are iris.target_names indexed by clf.predict, and notice we ran it on the test features. but we don't just want to test the model, we want to deploy it, so at this point i'd change what goes into predict. note that the input is an array of arrays, which is really important to remember when you run these, so you need the double brackets; i can create my own data, say two flowers i measured in my front yard, and pass both rows in at once, since the classifier is designed to process many rows of incoming data. print the result and you'll see the two flowers predicted, in this case versicolor and versicolor, which isn't surprising since i put the same numbers in for both. that is the actual end product: running the model on data you don't know the answer for. now let's take a detour, connect this to the human experience, and ask why a support vector machine. last week my son and i visited a fruit shop, and he asked, dad, is that an apple or a strawberry? after a couple of seconds you could figure out that it was a strawberry. so let's take that a step further and build a model that can predict unknown data, in this case sweet strawberries versus crisp apples. we want to be able to label those two and decide what a new fruit is, and we do that with data we've already labeled: a bunch of strawberries labeled as strawberries and a bunch of apples labeled as apples. once the model is trained, we hand it new data, here an image with a question mark on it, and it comes back with: it's a strawberry. the model we're using is the support vector machine. svm is a supervised learning method that looks at data and sorts it into one of two categories, and here it sorts the strawberry onto the strawberry side. at this point you should be asking how the prediction works. before we dig into an example with numbers, apply it to the fruit scenario: the support vector machine takes the labeled samples of strawberries and apples and draws a line down the middle between the two groups. that split lets us take new data, an apple and a strawberry, and place each in the appropriate group based on which side of the line it falls on, and that's how we predict the unknown. as colorful and tasty as the fruit example is, let's look at an example with some numbers so we can see how the math works: classifying men and women.
we start with a set of people of different heights and weights. to make this work we need a labeled sample: for the females, height and weight pairs like 174, 65 and 174, 88 and so on, and for the males pairs like 179, 90 and 180, 80 and so on. put these on a graph of height versus weight and you get two groups, the women on the left and the men on the right. now, if we're going to build a classifier, let's add a new data point and figure out whether it's male or female. before we can do that we need to split the data, and we could split it with any number of lines; here we've drawn two candidate lines through the middle that separate the men from the women. to predict the gender of a new point, though, we should split the data in the best possible way, and "best" means the line with the maximum space separating the two classes. one of the candidate lines has a clear gap on either side of it; the other doesn't have that maximum separation, and that's why the first line splits the data best. we don't want to decide this just by eyeballing it, so let's add some technical terms. we want the distance between the nearest points and the line to be as large as possible; in technical terms, the distance between the support vectors and the hyperplane should be as far as possible. the support vectors are the extreme points of the data set, the circled points sitting on the outskirts of the women's group and of the men's group, and the hyperplane is the boundary with the maximum distance to the support vectors of either class. we call the line down the middle a hyperplane because when you're dealing with multiple dimensions it isn't just a line but a plane of intersection, and the support vectors are drawn here as dashed lines. the math behind this is simple: d+ is the shortest distance to the closest positive point, on the men's side, and d- is the shortest distance to the closest negative point, on the women's side. the sum of d+ and d- is called the distance margin, the distance between the two dashed support-vector lines, and by finding the hyperplane with the largest distance margin we get the optimal hyperplane. once we've created the optimal hyperplane, we can easily see which side new data falls on, and based on this hyperplane the new data point belongs to the male group. as a data scientist you should also ask what happens if the hyperplane isn't optimal: if we select a hyperplane with a low margin, there's a high chance of misclassification. the model we've discussed so far is also referred to as LSVM, the linear support vector machine.
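for reference, this is the textbook way the margin picture above is usually written down for a linear svm; the formulation isn't spelled out in the video, it's the standard one:

```latex
\[
w \cdot x + b = 0 \ \text{(hyperplane)}, \qquad
w \cdot x + b = \pm 1 \ \text{(support-vector lines)},
\]
\[
d^{+} + d^{-} = \frac{2}{\lVert w \rVert}, \qquad
\text{optimal hyperplane: } \min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\ \text{ s.t. }\ y_i\,(w \cdot x_i + b) \ge 1 \ \text{for all } i.
\]
```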
so far so clear, but a question should be coming up: we had a nicely separated sample data set, but what if it looked like this instead, with one class sitting in the middle of the other, blue, then yellow, then blue again along the data line? in a data set like that we can't separate the classes with a straight hyperplane. when you see data like this, you need to move from a one-dimensional view of the data to a two-dimensional view, and for that transformation we use what's called a kernel function. the kernel function takes the 1d input and transforms it into a 2d output, and as you can see in the picture, once the data is lifted into two dimensions it becomes very easy to draw a line between the two data sets. what if we make it even more complicated? here we have a two-dimensional data set where one class sits in the middle, surrounded by the green class on the outside. to segregate these two classes, any straight line we draw through the data is obviously not an optimal hyperplane, so we use the kernel to transform the 2d data into a 3d space, and once it's in three dimensions you can place a hyperplane right through it and easily split the data. before we look at a programming example and dive into the script, let's look at the advantages of the support vector machine. first, high dimensional input spaces, sometimes discussed in terms of the curse of dimensionality: one, two or three dimensions are easy, but at a thousand dimensions a lot of problems start occurring with most algorithms, which have to be adjusted, whereas the svm handles high-dimensional spaces well. one high-dimensional space we work with is sparse document vectors, where we tokenize the words in documents so we can run machine learning over them; i've seen setups with as many as 2.4 million distinct tokens, which is a lot of dimensions to look at. and finally there's the regularization parameter, or lambda, which controls the balance between bias and overfitting, whether the model latches onto a very specific instance or is biased toward high or low values; the svm naturally avoids the overfitting and bias problems we see in many other algorithms. these three advantages make the support vector machine a very powerful tool to add to your repertoire of machine learning tools.
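before the use case, here's a minimal sketch of the kernel idea from a moment ago, using scikit-learn's make_circles as my own stand-in for the "one class inside another" picture (not data from the video):

```python
# a straight line can't split concentric classes, but an rbf kernel
# implicitly lifts the data into a higher dimension where it can
from sklearn import svm
from sklearn.datasets import make_circles

X, y = make_circles(n_samples=100, factor=0.3, noise=0.05, random_state=0)

linear_clf = svm.SVC(kernel='linear').fit(X, y)
rbf_clf = svm.SVC(kernel='rbf').fit(X, y)

print(linear_clf.score(X, y))   # struggles, well below 1.0
print(rbf_clf.score(X, y))      # close to 1.0
```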
now, we did promise a use case and some actual python programming, so here's the problem statement, and it starts at the zoo. a family is at the zoo and the young child asks, dad, is that a group of crocodiles or alligators? they're hard to tell apart. crocodiles are larger in size and alligators smaller, and as for snout width, crocodiles have a narrow snout while alligators have a wider one. the father, of course, is wondering how to turn this into a lesson for his son: let a support vector machine segregate the two groups. in this example we're not using actual measurements and data, the zoo story is just for imagery, and that's very common when setting up machine learning examples. so let's roll up our sleeves and break into the python script. first we'll cover the setup: you'll find that only two lines of code actually create the svm, and the rest goes so quickly it all fits on the first page, including generating our data, which will come out as two nice blobs. the second part bumps it up a notch and shows what it looks like behind the scenes. i like the anaconda jupyter notebook because it's very easy to use, but any of your favorite python editors will do. we're on python 3, and this should run on any 3.x version; if you're on an older 2.x version, check your sklearn install. in jupyter i'll go to view and toggle the line numbers on to make the code a little easier to talk about. the first step is the imports, and there are four pieces. lines one and two are import numpy as np and import matplotlib.pyplot as plt, both very standardized imports: we need numpy because sklearn works on numpy arrays (more on why in a minute), and np is the usual alias; matplotlib is purely how we view our data, the svm doesn't need it, but it gives us a nice visual aid and a nice display at the end for everybody to look at. jumping ahead to line four, we import make_blobs from sklearn.datasets (older sklearn versions had it under sklearn.datasets.samples_generator); i told you we were going to make up data, and this is the tool that does it, because i personally don't want to hop the fence at the zoo and get eaten by the crocodiles or alligators while measuring their snout widths and lengths. make_blobs is a wonderful tool when you're ready to test your setup and aren't sure yet what data you'll put in: you can just create a blob of sample points.
and it makes everything really easy to use. finally, on line three, we have the actual svm: from sklearn import svm. that covers all our imports. next we create the data: a capital X and a lowercase y equal to make_blobs with n_samples=40 and centers=2, so 40 rows of data in two groups, and random_state=20, which just seeds the random generator (with 40 samples over two centers you end up with 20 points in each group). X holds the pairs of measurements, think of them as an x-y plane, and y holds 0 or 1 for the two centers, our yes or no, alligator or crocodile. then, as promised, the svm itself is two lines of code: clf = svm.SVC(kernel='linear', C=1) and clf.fit(X, y). the kernel is linear because this is a very simple example with only two dimensions, so the hyperplane will just be a nice straight line rather than a full plane, and C is left at 1 because we aren't worrying about regularizing the data here (you'll often see it set much higher, even a thousand, in other setups). once fit runs, clf has been created, and then we go ahead and display it. running this gives the two blobs, one blue and one orange, representing the crocodiles and the alligators, with our made-up measurements of snout width and length. a quick note on the plotting line: plt.scatter means we're just putting dots on the graph, and the notation X[:, 0] is numpy notation; on a regular python list it would throw an error, but make_blobs returns a numpy array. the colon means take all the rows, all the data in the blob, and the 0 after the comma means take only the first column; we do the same with X[:, 1] for the second column (remember we start with zero, then one). you can look at those as the x and y plot values: the first runs 0 to 10 along the bottom and the second runs roughly 4 to 10 up the left-hand side. s=30 is just the size of the dots so we can see them instead of tiny specks, and c=y with cmap=plt.cm.Paired colors the points by their label, which is why we get the two nice colors for alligator and crocodile. notice the actual fit took two lines of code; a lot of times there's a third line where you regularize the data, rescaling it between -1 and 1 and reshaping it, but it's not necessary here, and keeping it simple makes it easier to see what's going on. and if we want to run a prediction on new data, we simply assign the new data, in this case a width and length of 3, 4 and a width and length of 5, 6, and note that it's a set of brackets with brackets inside: the model is designed to process a large amount of data coming in, not one line at a time, so here we're processing two rows at once. then print(clf.predict(new_data)) gives the answer, and you'll see 0 and 1: the 3, 4 point lands on one side and the 5, 6 point on the other, so one comes out an alligator and one a crocodile.
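pulling the pieces above together, a minimal sketch of this first part (the import path is the current sklearn.datasets location for make_blobs, and the exact 0/1 outputs depend on the randomly generated blobs):

```python
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.datasets import make_blobs

# 40 made-up 'snout width / length' points in two groups
X, y = make_blobs(n_samples=40, centers=2, random_state=20)

clf = svm.SVC(kernel='linear', C=1)        # simple linear kernel, no tuning of C here
clf.fit(X, y)

plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired)
plt.show()

new_data = [[3, 4], [5, 6]]                # two new animals, width and length
print(clf.predict(new_data))               # e.g. [0 1], one from each group
```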
that's a pretty short explanation of the setup, but we really want to dig in, see what's going on behind the scenes, and put it all in a nice graph. we'll actually spend more work on this than we did generating the original model. we start by creating the original data exactly as before (i'll explain why we redo that, and how to avoid redoing it), then we add in the lines, the support vectors and the hyperplane, and finally we plot everything so we get the picture from the theory section, with the support vectors shown as dashed lines and the hyperplane as a solid line. before scrolling down, notice that the earlier cell ended with plt.show(); in the new section that call has moved to the bottom. the top four lines of the new section are the same code we ran before: fit the values on the svm and scatter-plot the blobs. so why repeat the same code? because plt.show() clears out what's in the plot, so once it has run we have to reload that data; we could instead have removed the show() call up above and re-run that cell, and here i'll delete the extra show() that was sitting on line five of the new block so we don't blank out the screen partway through. skipping the four repeated lines, the next line is ax = plt.gca(). from here on we're really just graphing: ax is the conventional variable name for the current axes, and it refers to the plot we've already drawn our two blobs on, which keeps things from getting confusing if you're layering many things onto one graph. next we want to know how big the graph is, which we get with ax.get_xlim() and ax.get_ylim(), both part of matplotlib. then we build a grid: xx = np.linspace(xlim[0], xlim[1], 30) and the same for yy, so we're back in numpy creating 30 evenly spaced points between the limits in each direction; this has nothing to do with the evaluation itself, we're just laying a grid of points across the plot. np.meshgrid then combines them: imagine a point at every corner of a 30 by 30 lattice, with the grid's y coordinates collected in YY and the x coordinates in XX. np.vstack stacks those into a single xy array of grid points. now comes the interesting part: we set z equal to clf.decision_function(xy), our trained support vector machine's decision function evaluated at every one of those points, reshaped with XX.shape so it lines back up with the grid; if you do the math, 30 times 30 is 900 points of data. the decision function tells us where each grid point sits relative to the boundary, and the three levels we care about correspond to the support-vector line on one side, the hyperplane in the middle, and the support-vector line on the other side. so we go back to matplotlib and call ax.contour with XX, YY and z, colors='k', levels=[-1, 0, 1] for those three lines, alpha=0.5 so the contours are a little see-through and the data shows up behind them, and linestyles that make the two support-vector lines dashed and the hyperplane a single solid line. finally, ax.scatter plots the support vector points themselves, clf.support_vectors_ columns zero and one, with s=100 so they're larger, linewidth 1 and facecolors='none' so they show up as open circles on the grid, and then we call plt.show().
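here's a minimal sketch of that plotting section, continuing the clf, X and y from the block above; it follows the standard scikit-learn maximum-margin example:

```python
import numpy as np
import matplotlib.pyplot as plt

plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired)
ax = plt.gca()                                   # current axes, with the blobs already on it

xlim, ylim = ax.get_xlim(), ax.get_ylim()
xx = np.linspace(xlim[0], xlim[1], 30)           # 30 points across
yy = np.linspace(ylim[0], ylim[1], 30)           # 30 points up
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T       # the 900 grid points
Z = clf.decision_function(xy).reshape(XX.shape)

# level 0 is the hyperplane, levels -1 and 1 are the support-vector lines
ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1],
           alpha=0.5, linestyles=['--', '-', '--'])
ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
           s=100, linewidth=1, facecolors='none', edgecolors='k')
plt.show()
```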
when we run it we get our end result, a really nice graph: the two support vectors as dashed lines passing through the nearest data points, the four points that nicely cleave the data, and the hyperplane down the middle, as far from the two groups as possible, giving the maximum distance. so for body size against snout width we've cleanly separated the crocodiles from the alligators. congratulations, you've done it; of course this is pretend data for our crocodiles and alligators, but this hands-on example should help you tackle any support vector machine project in the future, and you can see how easy they are to set up and inspect in depth. next up is knn, which is really a fundamental place to start in machine learning: it's the basis of a lot of other things, and the logic behind it is easy to understand and gets incorporated into other forms of machine learning. so here's what's in it for you today: why do we need knn, what is knn, how do we choose the factor k, when do we use knn, how does the knn algorithm work, and then my favorite part, a use case predicting whether a person will have diabetes or not, a very common and popular data set for testing out models. by now we all know that machine learning models make predictions by learning from past data: we have our input values, the model builds on what we already know, and we use that to create a predicted output. picture a little kid pointing at a black cat crossing their path and asking, is that a dog? no, dear. you can differentiate between a cat and a dog based on their characteristics: cats have sharp claws they use to climb, shorter ears, they meow and purr, and they don't love to run around; dogs have duller claws, bigger ears, they bark and they love to run around. so we can evaluate the sharpness of the claws and the length of the ears, and we can usually sort cats from dogs on just those two characteristics. now, is this new animal a cat or a dog? looking at claw sharpness and ear length, it has smaller ears and sharper claws than the other animals, so its features are more like a cat's: it must be a cat. it goes in the cat group, and because knn is based on feature similarity, that's exactly the kind of classification a knn classifier can do: the input is the picture of the black cat, it goes into our trained model, and the prediction comes out, it's a cat. so what is knn?
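before the formal definition, here's a minimal sketch of that cat-versus-dog idea with a k-nearest-neighbors classifier (the claw-sharpness and ear-length numbers are made up for illustration):

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[9, 3], [8, 4], [9, 2],      # cats: sharp claws, short ears
     [2, 8], [3, 9], [1, 7]]      # dogs: dull claws, long ears
y = ['cat', 'cat', 'cat', 'dog', 'dog', 'dog']

knn = KNeighborsClassifier(n_neighbors=3)   # the 3 nearest neighbors vote
knn.fit(X, y)
print(knn.predict([[8, 3]]))                # -> ['cat']
```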
what is the knn algorithm k nearest neighbors is what that stands for it's one of the simplest supervised machine learning algorithms mostly used for classification so we want to know is this a dog or not a dog is it a cat or not a cat it classifies a data point based on how its neighbors are classified knn stores all available cases and classifies new cases based on a similarity measure and here we've gone from cats and dogs right into wine another favorite of mine knn stores all available cases and classifies new cases based on a similarity measure and here you see we have a measurement of sulfur dioxide versus the chloride level for the different wines they've tested and where they fall on that graph based on how much sulfur dioxide and how much chloride k in knn is a parameter that refers to the number of nearest neighbors to include in the majority voting process and so if we add a new glass of wine red or white we want to know what its neighbors are in this case we're going to put k equals 5. we'll talk about k in just a minute a data point is classified by the majority of votes from its five nearest neighbors here the unknown point would be classified as red since four out of five neighbors are red so how do we choose k how do we know k equals five i mean that was the value we put in there so we're going to talk about it how do we choose a factor k the knn algorithm is based on feature similarity choosing the right value of k is a process called parameter tuning and is important for better accuracy so at k equals 3 we can classify the question mark in the middle as either a square or in this case a triangle if we set k equal to 3 we're going to look at the three nearest neighbors and say this is a square and if we put k equal to 7 we classify it as a triangle depending on what the other data around it is and you can see as k changes depending on where that point is that can drastically change your answer so how do we choose the factor k you'll find this in all machine learning choosing these factors it's like oh my gosh did i choose the right k did i set my values right in whatever machine learning tool you're looking at so that you don't have a huge bias in one direction or the other in terms of knn if the number k you choose is too low the result is biased by noise it's just looking at whatever couple of points happen to be right next to it and you might get a skewed answer and if your k is too big then it's going to take forever to process so you're going to run into processing issues and resource issues so what we do the most common approach and there are other options for choosing k is to use the square root of n where n is the total number of values you have you take the square root of it and if that comes out even so in this case with squares and triangles you want to make your k value odd that helps it select better in other words you don't end up with a tied vote between two classes so usually you take the square root of n and if it's even you add one to it or subtract one from it and that's where you get the k value from that is the most common approach and it's pretty solid it works very well when do we use knn we can use knn when data is labeled so you need a label on it we know we have a group of pictures with dogs dogs cats cats data is noise free and so you can see here when
we have a class and we have like underweight 140 23 hello kitty normal that's pretty confusing we have a high variety of data coming in so it's very noisy and that would cause an issue data set is small so we're usually working with smaller data sets where you might get into a gig of data if it's really clean doesn't have a lot of noise because k n is a lazy learner i.e it doesn't learn a discriminative function from the training set so it's very lazy so if you have very complicated data and you have a large amount of it you're not going to use the knn but it's really great to get a place to start even with large data you can sort out a small sample and get an idea of what that looks like using the knn and also just using for smaller data sets k n works really good how does a k n algorithm work consider a data set having two variables height in centimeters and weight in kilograms and each point is classified as normal or underweight so we see right here we have two variables you know true false they're either normal or they're not they're underweight on the basis of the given data we have to classify the below set as normal or underweight using knn so if we have new data coming in that says 57 kilograms and 177 centimeters is that going to be normal or underweight to find the nearest neighbors we'll calculate the euclidean distance according to the euclidean distance formula the distance between two points in the plane with the coordinates x y and a b is given by distance d equals the square root of x minus a squared plus y minus b squared and you can remember that from the two edges of a triangle we're computing the third edge since we know the x side and the y side let's calculate it to understand clearly so we have our unknown point and we placed it there in red and we have our other points where the data is scattered around the distance d1 is the square root of 170 minus 167 squared plus 57 minus 51 squared which is about 6.7 and distance 2 is about 13. 
and distance 3 is about 13.4 similarly we will calculate the euclidean distance of unknown data point from all the points in the data set and because we're dealing with small amount of data that's not that hard to do it's actually pretty quick for a computer and it's not a really complicated mass you can just see how close is the data based on the euclidean distance hence we have calculated the euclidean distance of unknown data point from all the points as shown where x1 and y1 equal 57 and 170 whose class we have to classify so now we're looking at that we're saying well here's the euclidean distance who's going to be their closest neighbors now let's calculate the nearest neighbor at k equals three and we can see the three closest neighbors puts them at normal and that's pretty self-evident when you look at this graph it's pretty easy to say okay what you know we're just voting normal normal normal three votes for normal this is going to be a normal weight so majority of neighbors are pointing towards normal hence as per k n algorithm the class of 170 should be normal so a recap of knn positive integer k is specified along with a new sample we select the k entries in our database which are closest to the new sample we find the most common classification of these entries this is the classification we give to the new sample so as you can see it's pretty straightforward we're just looking for the closest things that match what we got so let's take a look and see what that looks like in a use case in python so let's dive into the predict diabetes use case so use case predict diabetes the objective predict whether a person will be diagnosed with diabetes or not we have a data set of 768 people who were or were not diagnosed with diabetes and let's go ahead and open that file and just take a look at that data and this is in a simple spreadsheet format the data itself is comma separated very common set of data and it's also a very common way to get the data and you can see here we have columns a through i that's what one two three four five six seven eight eight columns with a particular attribute and then the ninth column which is the outcome is whether they have diabetes as a data scientist the first thing you should be looking at is insulin well you know if someone has insulin they have diabetes because that's why they're taking it and that could cause issue in some of the machine learning packages but for a very basic setup this works fine for doing the knn and the next thing you notice is it didn't take very much to open it up i can scroll down to the bottom of the data there's 768. 
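before we open the notebook, here's a tiny hand-rolled sketch of that euclidean-distance-and-vote idea from the height and weight example — only the first known point comes from the worked calculation above (it gives the d1 of about 6.7), the rest of the points are made up for illustration:

```python
# hand-rolled knn vote on the (height, weight) example -- sample points are assumed
import math
from collections import Counter

known = [
    ((167, 51), "underweight"),   # ~6.7 away, matches the worked d1 above
    ((182, 62), "normal"),        # hypothetical, ~13 away
    ((176, 69), "normal"),        # hypothetical, ~13.4 away
    ((173, 59), "normal"),        # hypothetical
    ((169, 58), "normal"),        # hypothetical
]
new_point = (170, 57)             # the person we want to classify

def euclidean(p, q):
    # d = sqrt((x - a)^2 + (y - b)^2)
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

# sort the known points by distance and let the k nearest ones vote
k = 3
nearest = sorted(known, key=lambda item: euclidean(new_point, item[0]))[:k]
votes = Counter(label for _, label in nearest)
print(votes.most_common(1)[0][0])   # majority class of the 3 nearest -> 'normal'
```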
it's pretty much a small data set you know at 769 i can easily fit this into my ram on my computer i can look at it i can manipulate it and it's not going to really tax just a regular desktop computer you don't even need an enterprise version to run a lot of this so let's start with importing all the tools we need and before that of course we need to discuss what ide i'm using certainly can use any particular editor for python but i like to use for doing very basic visual stuff the anaconda which is great for doing demos with the jupiter notebook and just a quick view of the anaconda navigator which is the new release out there which is really nice you can see under home i can choose my application we're going to be using python36 i have a couple different versions on this particular machine if i go under environments i can create a unique environment for each one which is nice and there's even a little button there where i can install different packages so if i click on that button and open the terminal i can use a simple pip install to install different packages i'm working with let's go ahead and go back under home and we're going to launch our notebook and i've already you know kind of like the old cooking shows i've already prepared a lot of my stuff so we don't have to wait for it to launch because it takes a few minutes for it to open up a browser window in this case it's going to open up chrome because that's my default that i use and since the script is pre-done you'll see have a number of windows open up at the top the one we're working in and since we're working on the k n predict whether a person will have diabetes or not let's go and put that title in there and i'm also going to go up here and click on cell actually we want to go ahead and first insert a cell below and then i'm going to go back up to the top cell and i'm going to change the cell type to markdown that means this is not going to run as python it's a markdown language so if i run this first one it comes up in nice big letters which is kind of nice remind us what we're working on and by now you should be familiar with doing all of our imports we're going to import the pandas as pd import numpy as np pandas is the pandas data frame and numpy is a number array very powerful tools to use in here so we have our imports so we've brought in our pandas our numpy our two general python tools and then you can see over here we have our train test split by now you should be familiar with splitting the data we want to split part of it for training our thing and then training our particular model and then we want to go ahead and test the remaining data to see how good it is pre-processing a standard scalar preprocessor so we don't have a bias of really large numbers remember in the data we had like number of pregnancies isn't going to get very large where the amount of insulin they take can get up to 256. so 256 versus 6 that will skew results so we want to go ahead and change that so they're all uniform between minus 1 and 1. 
and then the actual tool this is the k neighbors classifier we're going to use and finally the last three are three tools to test all about testing our model how good is it we just put down test on there and we have our confusion matrix our f1 score and our accuracy so we have our two general python modules we're importing and then we have our six modules specific from the sk learn setup and then we do need to go ahead and run this so these are actually imported there we go and then move on to the next step and so in this set we're going to go ahead and load the database we're going to use pandas remember pandas is pd and we'll take a look at the data in python we looked at it in a simple spreadsheet but usually i like to also pull it up so we can see what we're doing so here's our data set equals pd.read csv that's a pandas command and the diabetes folder i just put in the same folder where my ipython script is if you put in a different folder you need the full length on there we can also do a quick length of the data set that is a simple python command len for length we might even let's go ahead and print that we'll go print and if you do it on its own line link that data set the jupyter notebook it'll automatically print it but when you're in most of your different setups you want to do the print in front of there and then we want to take a look at the actual data set and since we're in pandas we can simply do data set head and again let's go ahead and add the print in there if you put a bunch of these in a row you know the data set 1 had data set two head it only prints out the last one so i usually always like to keep the print statement in there but because most projects only use one data frame panda's data frame doing it this way doesn't really matter the other way works just fine and you can see when we hit the run button we have the 768 lines which we knew and we have our pregnancies it's automatically given a label on the left remember the head only shows the first five lines so we have zero through four and just a quick look at the data you can see it matches what we looked at before we have pregnancy glucose blood pressure all the way to age and then the outcome on the end and we're going to do a couple things in this next step we're going to create a list of columns where we can't have zero there's no such thing as zero skin thickness or zero blood pressure zero glucose uh any of those you'd be dead so not a really good factor if they don't if they have a zero in there because they didn't have the data and we'll take a look at that because we're going to start replacing that information with a couple of different things and let's see what that looks like so first we create a nice list as you can see we have the values talked about glucose blood pressure skin thickness and this is a nice way when you're working with columns is to list the columns you need to do some kind of transformation on a very common thing to do and then for this particular setup we certainly could use the there's some panda tools that will do a lot of this where we can replace the n a but we're going to go ahead and do it as a data set column equals dataset column.replace this is this is still pandas you can do a direct there's also one that's that you look for your nan a lot of different options in here but the nan numpy nan is what that stands for is none it doesn't exist so the first thing we're doing here is we're replacing the zero with a numpy none there's no data there that's what that says that's 
what this is saying right here so put the zero in and we're going to replace zeros with no data so if it's a zero that means the person's well hopefully not dead they just didn't get the data the next thing we want to do is create the mean which is the integer of the data set column dot mean where we skip the na values we can do that that's the pandas skipna option so we're going to figure out the mean of that column and then we're going to take that data set column and replace all of those np nan values with the mean why did we do that we could actually have combined these steps there are pandas tricks for replacing the zeros more directly but in this case we want to do it this way so you can see that we're switching the zeros to a non-existent value and then filling them in with the mean this is the average person so if we don't know what the value is if they did not get the data and the data is missing one of the tricks is to replace it with the average the most common data for that column this way you can still use the rest of the values in that row to do your computation and it pretty much takes those missing values out of the equation let's go ahead and take this and run it it doesn't actually print anything so we're still preparing our data if you want to see what that looks like it's not going to show up in the first few lines but we certainly could look at a column let's do that let's go into our data set with a printed data set and let's pick in this case glucose and if i run this it's going to print all the different glucose levels going down and thankfully we don't see anything in here that looks like missing data at least on the ones it shows you can see it skipped a bunch in the middle that's what jupyter notebook does if you have too many lines it skips a few and goes on to the next let me go ahead and remove this and zero that cell out and of course before we do any processing before proceeding any further we need to split the data set into our training and testing data that way we have something to train with and something to test on and you're going to notice we did a little something here with the pandas dataframe code there we go my drawing tool we've added in this right here on the data set and what this says is the first part this is from pd pandas it's going to say within the data set we want to use iloc the index location and the colon means all rows so we're going to keep all the rows but we're only looking at the column slice 0 to 8.
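here's how those prep steps might look gathered into one sketch — the column names are assumptions based on the usual pima indians diabetes csv header, so adjust them to whatever your file actually has:

```python
# sketch of the cleanup, split and scaling steps (assumed column names)
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

dataset = pd.read_csv("diabetes.csv")

# zeros in these columns really mean "missing", so swap them for the column mean
zero_not_accepted = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]
for column in zero_not_accepted:
    dataset[column] = dataset[column].replace(0, np.nan)
    mean = int(dataset[column].mean(skipna=True))
    dataset[column] = dataset[column].replace(np.nan, mean)

# columns 0-7 are the features, column 8 (Outcome) is the label
X = dataset.iloc[:, 0:8]
y = dataset.iloc[:, 8]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.2)

# scale so no single column (e.g. insulin) dominates; fit only on the training data
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
```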
remember column 9 here it is right up here we printed it in here is outcome well that's not part of the training data that's part of the answer yes column nine but it's listed as eight number eight so zero to eight is nine columns so eight is the value and when you see it in here zero this is actually zero to seven it doesn't include the last one and then we go down here to y which is our answer and we want just the last one just column eight and you can do it this way with this particular notation and then if you remember we imported the train test split that's part of the sklearn right there and we simply put in our x and our y we're going to do random state equals zero you don't have to necessarily seat it that's a seed number i think the default is one when you seed it i have to look that up and then the test size test size is 0.2 that simply means we're going to take 20 percent of the data and put it aside so that we can test it later that's all that is and again we're going to run it not very exciting so far we haven't had any printout other than to look at the data but that is a lot of this is prepping this data once you prep it the actual lines of code are quick and easy and we're almost there with the actual writing of our knn we need to go ahead and do a scale the data if you remember correctly we're fitting the data in a standard scalar which means instead of the data being from you know five to 303 in one column and the next column is 1 to 6. we're going to set that all so that all the data is between -1 and 1. that's what that standard scalar does keeps it standardized and we only want to fit the scaler with the training set but we want to make sure the testing set is the x test going in is also transformed so it's processing it the same so here we go with our standard scalar we're going to call it sc underscore x for the scalar and we're going to import the standard scalar into this variable and then our x train equals sc underscore x dot fit transform so we're creating the scalar on the x train variable and then our x test we're also going to transform it so we've trained and transformed the x train and then the x test isn't part of that training it isn't part of that of training the transformer it just gets transformed that's all it does and again we're going to go and run this and if you look at this we've now gone through these steps all three of them we've taken care of replacing our zeros for key columns that shouldn't be zero and we replace that with the means of those columns that way that they fit right in with our data models we've come down here we split the data so now we have our test data and our training data and then we've taken and we've scaled the data so all of our data going in no no we don't tr we don't train the y part the y train and y test that never has to be trained it's only the data going in that's what we want to train in there then define the model using k neighbors classifier and fit the train data in the model so we do all that data prep and you can see down here we're only going to have a couple lines of code where we're actually building our model and training it that's one of the cool things about python and how far we've come it's such an exciting time to be in machine learning because there's so many automated tools let's see before we do this let's do a quick length of and let's do y we want let's just do length of y and we get 768 and if we import math we do math dot square root let's do y train there we go it's actually supposed to be x 
train before we do this let's go ahead and import math and do math dot square root of the length of y test and when i run that we get 12.409 i want to show you where this number comes from we're about to use it 12 is an even number and if you're ever voting on things remember the neighbors all vote you don't want an even number of neighbors voting so we want to make it odd and we'll just take one away and make it 11. let me delete this out of here one of the reasons i love jupyter notebook is you can flip around and do all kinds of things on the fly so we'll go ahead and put in our classifier we're creating our classifier now and it's going to be the k neighbors classifier with n neighbors equal to 11. remember we did 12 minus 1 for 11 so we have an odd number of neighbors and p equals 2 which corresponds to the euclidean metric there are other ways of measuring the distance things like manhattan or squared distances but euclidean is the most common one and it works quite well it's important to evaluate the model so let's use the confusion matrix to do that a wonderful tool and then we'll jump into the f1 score and finally the accuracy score which is probably the most commonly quoted number when you go into a meeting or something like that so let's go ahead and paste that in there and we'll set cm equal to confusion matrix of y test and y pred those are the two values we're going to put in there and let me go ahead and run that and print it out and the way you interpret this is you have the predicted values across the top and the actual values going down it's always hard to write in here so that means the column down the middle is the important one the prediction and the actual agreed on 94 and on 32 while the 13 and the 15 are what was wrong if you were looking at three different classes instead of two you'd end up with a third row and a third column with the diagonal going down the middle so in the first row i believe the zeros are the 94 people who don't have diabetes and the prediction said that 13 of those people did have diabetes and were at high risk and of the people who do have diabetes it got 32 correct but classified another 15 incorrectly so you can see how that classification works on the confusion matrix then we're going to go ahead and print the f1 score let me just run that and you see we get a 0.69 the f1 score takes into account both sides of the balance the false positives and the false negatives where if we go ahead and just do the accuracy score that's what most people think of it looks at just how many we got right out of the total so a lot of people when you're a data scientist and you're talking to other data scientists they're going to ask you what the f1 score is if you're talking to the general public or the decision makers in the business they're going to ask what the accuracy is and on data like this the accuracy usually comes out higher than the f1 score but the f1 score is more telling it lets us know that there are more false positives than we would like on here but 82 percent is not too bad for a quick first look at this data running sklearn and the knn the k nearest neighbor on it.
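putting the modeling and scoring steps together in one place, here's a sketch that picks up from the prep sketch earlier — it assumes the X_train, X_test, y_train and y_test variables from that sketch:

```python
# choose k from the square root of the test-set size (made odd), fit, then score
import math
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, f1_score, accuracy_score

k = int(math.sqrt(len(y_test)))          # ~12.4 for 154 test rows -> 12
if k % 2 == 0:
    k -= 1                               # keep the neighbour vote odd, so k = 11

# p=2 with the minkowski metric is just euclidean distance
classifier = KNeighborsClassifier(n_neighbors=k, p=2, metric="euclidean")
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

print(confusion_matrix(y_test, y_pred))   # rows = actual, columns = predicted
print(f1_score(y_test, y_pred))           # balances false positives and false negatives
print(accuracy_score(y_test, y_pred))     # plain fraction of correct predictions
```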
so we have created a model using knn which can predict whether a person will have diabetes or not or at the very least whether they should go get a checkup and have their glucose checked regularly the printed accuracy score we got 0.818 which was pretty close to what we saw before and we can pretty much round that off and say we have an accuracy of roughly 80 percent which tells us it is a pretty fair fit of the model welcome to the session on k-means clustering so what is k-means clustering k-means clustering is an unsupervised learning algorithm in this case you don't have labeled data unlike in supervised learning so you have a set of data and you want to group it and as the name suggests you want to put it into clusters which means objects that are similar in nature similar in characteristics need to be put together so that's what k-means clustering is all about the term k is basically a number we need to tell the system how many clusters we need to perform so if k is equal to two there will be two clusters if k is equal to three three clusters and so on and so forth that's what the k stands for and of course there is a way of finding out the best or optimum value of k for a given data set we will look at that so that is k-means clustering so let's take an example k-means clustering is used in many many scenarios but let's take the example of cricket the game of cricket let's say you received data on a lot of players from maybe all over the country or all over the world and this data has information about the runs scored by the player and the wickets taken by the player and based on this information we need to cluster this data into two clusters batsmen and bowlers so this is an interesting example let's see how we can perform this so we have the data which consists of primarily two characteristics which is the runs and the wickets so the bowlers basically take wickets and the batsmen score runs there will of course be a few bowlers who can score some runs and similarly there will be some batsmen who would have taken a few wickets but with this information we want to cluster those players into batsmen and bowlers so how does this work let's say this is how the data is so there is information on the y-axis about the runs scored and on the x-axis about the wickets taken by the players so if we do a quick plot this is how it would look and when we do the clustering we need to have the clusters like shown in the third diagram that means we need to have a cluster which consists of people who have scored high runs which is basically the batsmen and then we need a cluster of people who have taken a lot of wickets which is typically the bowlers there may be a certain amount of overlap but we will not talk about it right now so with k-means clustering we will have k equal to 2 and we will have two clusters batsmen and bowlers so how does this work the way it works is the first step in k-means clustering is the allocation of two centroids randomly so two points are assigned as so-called centroids so in this case we want two clusters which means k is equal to two so two points have been randomly assigned as centroids keep in mind these points can be anywhere they are random points initially they are not really the centroids centroid means the central point of a given data set but in this case when it starts off it's not really the centroid okay so these points though in our
presentation here we have shown them one point closer to these data points and another closer to these data points but they can be assigned randomly anywhere okay so that's the first step the next step is to determine the distance of each of the data points from each of the randomly assigned centroids so for example we take this point and find its distance from this centroid and its distance from the other centroid and this point is taken and its distance is found from each centroid and so on and so forth so for every point the distance is measured from both the centroids and then whichever distance is less that point is assigned to that centroid so for example in this case visually it is very obvious that all these data points are assigned to this centroid and all these data points are assigned to the other centroid and that's what is represented here in blue color and in yellow color the next step is to actually determine the central point or the actual centroid for these two clusters so we have this one initial cluster and this other initial cluster but as you can see these points are not really the centroids a centroid should be the central position of its data set so that is what needs to be determined as the next step the actual centroid is determined and the original randomly allocated centroid is repositioned to the actual centroid of these new clusters and this process is then repeated now what might happen is some of these points may get reallocated in our example that is not happening but it may so happen that the distance is found between each of these data points once again with these centroids and if it is required some points may be reallocated we will see that in a later example but for now we will keep it simple so this process is continued till the centroid repositioning stops and that is our final cluster so after a few iterations we come to this situation where the centroid doesn't need any more repositioning which means our algorithm has converged convergence has occurred and we have our two clusters each with a centroid so this process of calculating the distance and repositioning the centroid is repeated till the repositioning stops which means that the algorithm has converged and we have the final clusters with the data points and the centroids so this is what you're going to learn from this session we will talk about the types of clustering what is k-means clustering applications of k-means clustering k-means clustering is done using distance measures so we will talk about the common distance measures and then we will talk about how k-means clustering works and go into the details of the k-means clustering algorithm and then we will end with a demo and a use case for k-means clustering so let's begin first of all what are the types of clustering there are primarily two categories of clustering hierarchical clustering and partitional clustering and each of these categories is further subdivided into agglomerative and divisive clustering and k-means and fuzzy c-means clustering let's take a quick look at what each of these types of clustering is in hierarchical clustering the clusters have a tree-like structure and hierarchical clustering is further divided into agglomerative and divisive agglomerative clustering is a bottom-up approach we begin with each element as a separate cluster and merge them into
successively larger clusters so for example we have abcdef we start by combining b and c form one cluster d e and e form one more then we combine d e and f one more bigger cluster and then add b c to that and then finally a to it compared to that divisive clustering or divisive clustering is a top-down approach we begin with the whole set and proceed to divide it into successively smaller clusters so we have abcdef we first take that as a single cluster and then break it down into a b c d e and f then we have partitional clustering split into two subtypes k means clustering and fuzzy c means in k means clustering the objects are divided into the number of clusters mentioned by the number k that's where the k comes from so if we say k is equal to 2 the objects are divided into two clusters c1 and c2 and the way it is done is the features or characteristics are compared and all objects having similar characteristics are clubbed together so that's how k means clustering is done we will see it in more detail as we move forward and fuzzy c means is very similar to k-means in the sense that it clubs objects that have similar characteristics together but while in k-means clustering two objects cannot belong to or any object a single object cannot belong to two different clusters in c means objects can belong to more than one cluster so that is the primary difference between k-means and fuzzy c means so what are some of the applications of k-means clustering k-means clustering is used in a variety of examples or variety of business cases in real life starting from academic performance diagnostic systems search engines and wireless sensor networks and many more so let us take a little deeper look at each of these examples academic performance so based on the scores of the students students are categorized into a b c and so on clustering forms a backbone of search engines when a search is performed the search results need to be grouped together the search engines very often use clustering to do this and similarly in case of wireless sensor networks the clustering algorithm plays the role of finding the cluster heads which collects all the data in its respective cluster so clustering especially k means clustering uses distance measure so let's take a look at what is distance measure so while these are the different types of clustering in this video we will focus on k-means clustering so distance measure tells how similar some objects are so the similarity is measured using what is known as distance measure and what are the various types of distance measures there is euclidean distance there is manhattan distance then we have squared euclidean distance measure and cosine distance measure these are some of the distance measures supported by k-means clustering let's take a look at each of these what is euclidean distance measure this is nothing but the distance between two points so we have learnt in high school how to find the distance between two points this is a little sophisticated formula for that but we know a simpler one is square root of y2 minus y1 whole square plus x2 minus x1 whole square so this is an extension of that formula so that is the euclidean distance between two points what is the squared euclidean distance measure it's nothing but the square of the euclidean distance as the name suggests so instead of taking the square root we leave the square as it is and then we have manhattan distance measure in case of manhattan distance it is the sum of the distances across the x-axis and the 
y-axis and note that we are taking the absolute value so that negative values don't come into play so that is the manhattan distance measure then we have the cosine distance measure in this case we take the angle between the two vectors formed by joining the points from the origin so that is the cosine distance measure okay so that was a quick overview of the various distance measures that are supported by k-means now let's go and check how exactly k-means clustering works okay so this is how k-means clustering works this is like a flowchart of the whole process there is a starting point and then we specify the number of clusters that we want now there are a couple of ways of doing this we can do it by trial and error so we specify a certain number maybe k is equal to 3 or 4 or 5 to start with and then as we progress we keep changing it until we get the best clusters or there is a technique called the elbow technique whereby we can determine the value of k what the best value of k should be how many clusters should be formed so once we have the value of k we specify that and then the system will assign that many centroids it picks randomly that many points to start with and considers them to be the centroids of these clusters and then it measures the distance of each of the data points from these centroids and assigns each point to the centroid from which its distance is minimum so each data point will be assigned to the centroid which is closest to it and thereby we have k initial clusters however these are not the final clusters the next step is for the new groups for the clusters that have been formed it calculates the mean position and thereby the new centroid position so the position of the centroid moves compared to the randomly allocated one so it's an iterative process once again the distance of each point is measured from this new centroid and if required the data points are reallocated to the new centroids and the mean position or the new centroid is calculated once again if the centroid moves the iteration continues which means the convergence has not happened the clustering has not converged so as long as there is movement of the centroid this iteration keeps happening but once the centroid stops moving which means the clustering process has converged that will be the end result now we have the final position of the centroid and the data points are allocated accordingly to the closest centroid i know it's a little difficult to understand from this simple flowchart so let's do a little bit of visualization and see if we can explain it better let's take an example let's say we have a data set for a grocery shop and now we want to find out how many clusters this has to be spread across so how do we find the optimum number of clusters there is a technique called the elbow method so when these clusters are formed there is a parameter called the within sum of squares and the lower this value is the tighter the cluster is that means all the points are very close to each other so we use this within sum of squares as a measure to find the optimum number of clusters that can be formed for a given data set so we create clusters or we let the system create clusters for a variety of numbers maybe up to 10 clusters and for each value of k the within ss or wss is measured now the wss itself keeps decreasing as k increases so we don't simply pick the k with the least wss instead the value of k beyond which the wss stops dropping sharply the elbow of the curve is taken as the optimum
value of k so this is the diagrammatic representation so we have on the y axis the within sum of squares or wss and on the x axis we have the number of clusters so as you can imagine if you have k is equal to 1 which means all the data points are in a single cluster the within s value will be very high because they are probably scattered all over the moment you split it into two there will be a drastic fall in the within ss value that's what is represented here but then as the value of k increases the decrease the rate of decrease will not be so high it will continue to decrease but probably the rate of decrease will not be high so that gives us an idea so from here we get an idea for example the optimum value of k should be either 2 or 3 or at the most 4 but beyond that increasing the number of clusters is not dramatically changing the value in wss because that pretty much gets stabilized okay now that we have got the value of k and let's assume that these are our delivery points the next step is basically to assign two centroids randomly so let's say c1 and c2 are the centroids assigned randomly now the distance of each location from the centroid is measured and each point is assigned to the centroid which is closest to it so for example these points are very obvious that these are closest to c1 whereas this point is far away from c2 so these points will be assigned which are close to c1 will be assigned to c1 and these points or locations which are close to c2 will be assigned to c2 and then so this is the how the initial grouping is done this is part of c1 and this is part of c2 then the next step is to calculate the actual centroid of this data because remember c1 and c2 are not the centroids they have been randomly assigned points and only thing that has been done was the data points which are closest to them have been assigned but now in this step the actual centroid will be calculated which may be for each of these data sets somewhere in the middle so that's like the main point that will be calculated and the centroid will actually be positioned or repositioned there same with c2 so the new centroid for this group is c2 in this new position and c1 is in this new position once again the distance of each of the data points is calculated from these centroids now remember it's not necessary that the distance still remains the or each of these data points still remain in the same group by recalculating the distance it may be possible that some points get reallocated like so you see this so this point earlier was closer to c2 because c2 was here but after recalculating repositioning it is observed that this is closer to c1 than c2 so this is the new grouping so some points will be reassigned and again the centroid will be calculated and if the centroid doesn't change so that is a repetitive process iterative process and if the centroid doesn't change once the centroid stops changing that means the algorithm has converged and this is our final cluster with this as the centroid c1 and c2 as the centroids these data points as a part of each cluster so i hope this helps in understanding the whole process iterative process of k-means clustering so let's take a look at the k-means clustering algorithm let's say we have x1 x2 x3 n number of points as our inputs and we want to split this into k clusters or we want to create k clusters so the first step is to randomly pick k points and call them centroids they are not real centroids because centroid is supposed to be a center point but they are 
just called centroids and we calculate the distance of each and every input point from each of the centroids so the distance of x1 from c1 from c2 c3 each of the distances we calculate and then find out which distance is the lowest and assign x1 to that particular random centroid repeat that process for x2 calculate its distance from each of the centroids c1 c2 c3 up to ck and find which is the lowest distance and assign x2 to that particular center same with x3 and so on so that is the first round of assignment that is done now we have k groups because there are we have assigned the value of k so there are k centroids and so there are k groups all these inputs have been split into k groups however remember we picked the centroids randomly so they are not real centroids so now what we have to do we have to calculate the actual centroids for each of these groups which is like the mean position which means that the position of the randomly selected centroids will now change and they will be the main positions of these newly formed k groups and once that is done we once again repeat this process of calculating the distance right so this is what we are doing as a part of step four we repeat step two and three so we again calculate the distance of x1 from the centroid c1 c2 c3 and then c which is the lowest value and assign x1 to that calculate the distance of x2 from c1 c2 c3 or whatever up to ck and find whichever is the lowest distance and assign x2 to that centroid and so on in this process there may be some reassignment x1 was probably assigned to cluster c2 and after doing this calculation maybe now x1 is assigned to c1 so that kind of reallocation may happen so we repeat the steps 2 and three till the position of the centroids don't change or stop changing and that's when we have convergence so let's take a detailed look at it at each of these steps so we randomly pick k cluster centers we call them centroids because they are not initially they are not really the centroids so we let us name them c1 c2 up to ck and then step two we assign each data point to the closest center so what we do we calculate the distance of each x value from each c value so the distance between x1 c1 distance between x1 c2 x1 c3 and then we find which is the lowest value right that's the minimum value we find and assign x1 to that particular centroid then we go next to x2 find the distance of x2 from c1 x2 from c2 x2 from c3 and so on up to ck and then assign it to the point or to the centroid which has the lowest value and so on so that is step number two in step number three we now find the actual centroid for each group so what has happened as a part of step number two we now have all the points all the data points grouped into k groups because we we wanted to create k clusters right so we have k groups each one may be having a certain number of input values and they need not be equally distributed by the weight based on the distance we will have k groups but remember the initial values of the c1 c2 were not really the centroids of these groups right we assigned them randomly so now in step 3 we actually calculate the centroid of each group which means the original point which we thought was the centroid will shift to the new position which is the actual centroid for each of these groups okay and we again calculate the distance so we go back to step two which is what we calculate again the distance of each of these points from the newly positioned centroids and if required we reassign these points to the new 
centroids so as i said earlier there may be a reallocation so we now have a new set or a new group we still have k groups but the number of items and the actual assignment may be different from what was in step to here okay so that might change then we perform step 3 once again to find the new centroid of this new group so we have again a new set of clusters new centroids and new assignments we repeat this step two again once again we find and then it is possible that after iterating through three or four or five times the centroid will stop moving in the sense that when you calculate the new value of the centroid that will be same as the original value or there will be very marginal change so that is when we say convergence has occurred and that is our final cluster that's the formation of the final cluster all right so let's see a couple of demos of k-means clustering we will actually see some live demos in python notebook using pattern notebook but before that let's find out what's the problem that we are trying to solve the problem statement is let's say walmart wants to open a chain of stores across the state of florida and it wants to find the optimal store locations now the issue here is if they open too many stores close to each other obviously they will not make profit but if they if the stores are too far apart then they will not have enough sales so how do they optimize this now for an organization like walmart which is an e-commerce giant they already have the addresses of their customers in their database so they can actually use this information or this data and use k-means clustering to find the optimal location now before we go into the python notebook and show you the live code i wanted to take you through very quickly a summary of the code in the slides and then we will go into the python notebook so in this block we are basically importing all the required libraries like numpy matplotlib and so on and we are loading the data that is available in the form of let's say the addresses for simplicity's sake we will just take them as some data points then the next thing we do is quickly do a scatter plot to see how they are related to each other with respect to each other so in the scatter plot we see that there are a few distinct groups already being formed so you can actually get an idea about how the cluster would look and how many clusters what is the optimal number of clusters and then starts the actual k-means clustering process so we will assign each of these points to the centroids and then check whether they are the optimal distance which is the shortest distance and assign each of the points data points to the centroids and then go through this iterative process till the whole process converges and finally we get an output like this so we have four distinct clusters and which is if we can say that this is how the population is probably distributed across florida state and the centroids are like the location where the store should be the optimum location where the store should be so that's the way we determine the best locations for the store and that's how we can help walmart find the best locations for their stores in florida so now let's take this into python notebook let's see how this looks when we are learning running the code live all right so this is the code for k-means clustering in jupyter notebook we have a few examples here which we will demonstrate how k means clustering is used and even there is a small implementation of k-means clustering as well okay 
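before we jump into the notebook, here's a small sketch of that elbow idea from a few minutes ago — the blobs below are made-up stand-ins for the customer addresses, and the exact parameter values are assumptions, not the notebook's:

```python
# sketch of the elbow method: fit k-means for a range of k and plot the wss
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=0)

# record the within-cluster sum of squares (inertia_ in scikit-learn) for each k
wss = []
k_values = range(1, 11)
for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wss.append(km.inertia_)

plt.plot(k_values, wss, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("within sum of squares (wss)")
plt.show()   # the elbow -- where the curve stops dropping sharply -- is the k to pick
```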
so let's get started okay so this block is basically importing the various libraries that are required like matplotlib and numpy and so on and so forth which would be used as a part of the code then we are going and creating blobs which are similar to clusters now this is a very neat feature which is available in scikit-learn make blobs is a nice feature which creates clusters of data sets so that's a wonderful functionality that is readily available for us to create some test data kind of thing okay so that's exactly what we are doing here we are using make blobs and we can specify how many clusters we want so centers we are mentioning here so it will go ahead and so we just mentioned four so it will go ahead and create some test data for us and this is how it looks as you can see visually also we can figure out that there are four distinct classes or clusters in this data set and that is what make blobs actually provides now from here onwards we will basically run the standard k-means functionality that is readily available so we really don't have to implement k-means itself the k-means functionality or the function is readily available you just need to feed the data and we'll create the clusters so this is the code for that we import k-means and then we create an instance of k means and we specify the value of k this n underscore clusters is the value of k remember k means in k means k is basically the number of clusters that you want to create and it is a integer value so this is where we are specifying that so we have k is equal to 4 and so that instance is created we take that instance and as with any other machine learning functionality fit is what we use the function or the method rather fit is what we use to train the model here there is no real training kind of thing but that's the call okay so we are calling fit and what we are doing here we are just passing the data so x has these values the data that has been created right so that is what we are passing here and this will go ahead and create the clusters and then we are using after doing uh fit we run the predict which uh basically assigns for each of these observations which cluster where it belongs to all right so it will name the clusters maybe this is cluster one this is two three and so on or i will actually start from zero cluster zero one two and three maybe and then for each of the observations it will assign based on which cluster it belongs to it will assign a value so that is stored in y underscore k means when we call predict that is what it does and we can take a quick look at these y underscore k means or with the cluster numbers that have been assigned for each observation so this is the cluster number assigned for observation 1 maybe this is for observation 2 observation 3 and so on so we have harmony about i think 300 samples right so all the 300 samples there are 300 values here each of them the cluster number is given and the cluster number goes from 0 to 3. so there are four clusters so the numbers go from 0 1 2 3. 
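here are the cells just described, gathered into one runnable sketch — the exact values like cluster_std are assumptions rather than the notebook's own settings:

```python
# make_blobs test data, a KMeans fit/predict, and a scatter of the result
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# make_blobs gives us test data with four obvious groups
X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)   # n_clusters is the k
kmeans.fit(X)                      # "training" here is just running the clustering
y_kmeans = kmeans.predict(X)       # cluster number (0-3) assigned to each observation

plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=30, cmap="viridis")
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            c="black", s=200, alpha=0.5)    # the final centroid positions
plt.show()
```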
so that's what is seen here okay now so this was a quick example of generating some dummy data and then clustering it okay and this can be applied if you have proper data you can just load it up into x for example here and then run the k-means so this is the central part of the k-means clustering program example you basically create an instance and you mention how many clusters you want by specifying the parameter n underscore clusters which is also the value of k and then you pass the data to get the cluster assignments now the next section of this code is an implementation of k-means itself now this is kind of a rough implementation of the k-means algorithm so i will walk you through the code at each step what it is doing and then we will see a couple more examples of how k-means clustering can be used in some real-life use cases all right so in this case here what we are doing is basically implementing the k-means clustering and there is a library function pairwise distances argmin which given two sets of points will calculate the distances between them and see which one is the closest and so on and this is pretty much like what k-means does right it calculates the distance of each data point from the predefined centroids and then based on whichever is the lowest that particular data point is assigned to that centroid so that is available as a standard function and we will be using it here so as explained in the slides the first step in k-means clustering is to randomly assign some centroids so as a first step we randomly allocate a couple of centroids which here we are calling centers and then we put this in a loop and take it through an iterative process for each of the data points we first find out using this pairwise distances argmin function which center or which randomly selected centroid is the closest and accordingly we assign that data point to that particular centroid or cluster and once that is done for all the data points we calculate the new centroid by finding out the mean position which is the center position right so we calculate the new centroid and then we check if the new centroid's position is the same as the previous centroid's position we compare the positions and if they are the same that means the process has converged so remember we do this process till the centroid doesn't move anymore right the centroid gets relocated each time this reallocation is done so the moment the position of the centroid doesn't change anymore we know that convergence has occurred so till then you see here this is like an infinite loop while true is an infinite loop it only breaks when the centers are the same the new center and the old center positions are the same and once that is done we return the centers and the labels now of course as explained this is not a very sophisticated or advanced implementation it's a very basic implementation because one of the flaws is that sometimes the centroid position will keep moving but the change will be very minor and in that case also that is effectively convergence right so for example if the change is 0.0001 we can consider that as convergence otherwise what will happen is this will either take forever or it will never end so that's a small flaw here.
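here's a rough version of the kind of hand-rolled loop being described, with the small tolerance check the speaker says is missing added in — a sketch only, not the notebook's exact code, and it assumes no cluster ever ends up empty:

```python
# rough hand-rolled k-means using pairwise_distances_argmin (sketch)
import numpy as np
from sklearn.metrics import pairwise_distances_argmin

def find_clusters(X, n_clusters, rng_seed=2, tol=1e-4):
    # 1. pick n_clusters points at random and call them the centroids
    rng = np.random.RandomState(rng_seed)
    i = rng.permutation(X.shape[0])[:n_clusters]
    centers = X[i]

    while True:
        # 2. assign every point to its closest centroid
        labels = pairwise_distances_argmin(X, centers)

        # 3. recompute each centroid as the mean of the points assigned to it
        new_centers = np.array([X[labels == j].mean(axis=0)
                                for j in range(n_clusters)])

        # 4. stop once the centroids have (almost) stopped moving
        if np.allclose(centers, new_centers, atol=tol):
            break
        centers = new_centers

    return centers, labels

# usage, assuming X is a 2-d array of points:
# centers, labels = find_clusters(X, 4)
```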
additional checks may have to be added here but again as mentioned this is not the most sophisticated implementation uh this is like a kind of a rough implementation of the k-means clustering okay so if we execute this code this is what we get as the output so this is the definition of this particular function and then we call that find underscore clusters and we pass our data x and the number of clusters which is four and if we run that and plot it this is the output that we get so this is of course each cluster is represented by a different color so we have a cluster in green color yellow color and so on and so forth and these big points here these are the centroids this is the final position of the centroids and as you can see visually also this appears like a kind of a center of all these points here right similarly this is like the center of all these points here and so on so this is the example or this is an example of an implementation of k means clustering and next we will move on to see a couple of examples of how k-means clustering is used in maybe some real-life scenarios or use cases in the next example or demo we are going to see how we can use k-means clustering to perform color compression we will take a couple of images so there will be two examples and we will try to use k-means clustering to compress the colors this is a common situation in image processing when you have an image with millions of colors but then you cannot render it on some devices which may not have enough memory so that is the scenario where where something like this can be used so before again we go into the python notebook let's take a look at quickly the the code as usual we import the libraries and then we import the image and then we will flatten it so the reshaping is basically we have the image information is stored in the form of pixels and if the image is like for example 427 by 640 and it has three colors so that's the overall dimension of the of the initial image we just reshape it and then feed this to our algorithm and this will then create clusters of only 16 clusters so this this colors there are millions of colors and now we need to bring it down to 16 colors so we use k is equal to 16 and this is how when we visualize this is how it looks there are these are all about 16 million possible colors the input color space has 16 million possible colors and we just some compress it to a 16 color so this is how it would look when we compress it to 16 colors and this is how the original image looks and after compression to 16 colors this is how the new image looks as you can see there is not a lot of information that has been lost though the image quality is definitely reduced a little bit so this is an example which we are going to now see in python notebook let's go into the python node and once again as always we will import some libraries and load this image called flower dot jpg okay so let me load that and this is how it looks this is the original image which has i think 16 million colors and this is the shape of this image which is basically what is the shape is nothing but the overall size right so this is 427 pixel by 640 pixel and then there are three layers which is this three basically is for rgb which is red green blue so color image will have that right so that is the shape of this now what we need to do is data let's take a look at how data is looking so let me just create a new cell and show you what is in data basically we have captured this information so data is what let me 
just show you here. all right, so let's take a look at china and see what the values in china are. if we look here, this is how the data is stored: these are nothing but the pixel values, so it's like a matrix, one entry for each of the 427 by 640 pixels. now the issue here is that these values are large numbers, so we need to normalize them to between zero and one. that's why we create one more variable, data, which will contain values between 0 and 1, and the way to do that is to divide by 255, so we divide china by 255 and we get the new values in data. let's just run this piece of code, and this is the shape; also, what we have done using reshape is convert the three-dimensional array into a two-dimensional data set. let me insert a cell here and take a look at how data is looking. so this is how data looks, and now you see the values are between zero and one; if you noticed earlier, in the case of china the values were large numbers, and now everything is between zero and one, which is one of the things we need to do. after that, the next thing is to visualize this: we can take a random set of maybe 10,000 points and plot them to see how this looks. so this is the original pixel distribution; these are two plots, one is red against green and the other is red against blue, and this is the original distribution of the colors. then what we will do is use k-means clustering to create just 16 clusters for the various colors and apply that to the image. now, since the data is large, because there are millions of colors, using regular k-means may be a little time consuming, so there is another version of k-means called mini-batch k-means. we will use that; the overall concept remains the same, but it processes the data in smaller batches, that's the only difference, and the results will be pretty much the same. so let's go ahead and execute this piece of code and also visualize it, so that we can see how the 16 colors would look: this is red against green and this is red against blue, and there is quite a bit of similarity between the original color schema and the new one, so it doesn't look completely different or anything like that. now we apply the newly created colors to the image and we can compare both images. this is our original image and this is our new image, and as you can see, not a lot of information has been lost; it pretty much looks like the original. yes, here for example it appears a little dull compared to the original, because we took off some of the finer details of the color, but the high-level information has been maintained, and the main advantage is that this image can now be rendered on a device which may not be very sophisticated. now let's take one more example with a different image: in the second example we will take an image of the summer palace in china and repeat the same process. this is a high-definition color image with millions of colors, again three-dimensional, and we will reduce it to 16 colors using k-means clustering.
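here is a minimal sketch of the color compression flow just described, assuming the file is flower.jpg and that the working array is named china as in the notebook; reading a jpg with matplotlib needs pillow installed, and the figure layout is an illustrative assumption.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import MiniBatchKMeans

china = plt.imread('flower.jpg')          # e.g. shape (427, 640, 3), uint8 pixel values
data = china / 255.0                      # scale pixel values to the 0-1 range
data = data.reshape(-1, 3)                # flatten to (n_pixels, 3): one row per pixel

kmeans = MiniBatchKMeans(n_clusters=16)   # mini-batch k-means with k = 16 colors
kmeans.fit(data)
new_colors = kmeans.cluster_centers_[kmeans.predict(data)]

# rebuild the recolored image and show it next to the original
china_recolored = new_colors.reshape(china.shape)
fig, ax = plt.subplots(1, 2, figsize=(12, 5))
ax[0].imshow(china); ax[0].set_title('original image')
ax[1].imshow(china_recolored); ax[1].set_title('16-color image')
plt.show()
```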
the same process like before we reshape it and then we cluster the colors to 16 and then we render the image once again and we will see that the color the quality of the image slightly deteriorates as you can see here this has much finer details in this which are probably missing here but then that's the compromise because there are some devices which may not be able to handle this kind of high density images so let's run this chord in python notebook all right so let's apply the same technique for another picture which is even more intricate and has probably much complicated color schema so this is the image now once again we can take a look at the shape which is 427 by 640 by 3 and this is the new data would look somewhat like this compared to the flower image so we have some new values here and we will also bring this as you can see the numbers are much big so we will much bigger so we will now have to uh scale them down to values between 0 and 1 and that is done by dividing by 255 so let's go ahead and do that and reshape it okay so we get a two-dimensional matrix and we will then as a next step we will go ahead and visualize this how it looks does the 16 colors and this is basically how it would look 16 million colors and now we can create the clusters out of this the 16 k means clusters we will create so this is how the distribution of the pixels would look with 16 colors and then we go ahead and apply this and visualize how it is looking for with the new just the 16 color so once again as you can see this looks much richer in color but at the same time and this probably doesn't have as we can see it doesn't look as rich as this one but nevertheless the information is not lost the shape and all that stuff and this can be also rendered on a slightly a device which is probably not that sophisticated okay so that's pretty much it so we have seen two examples of how color compression can be done uh using k-means clustering and we have also seen in the previous examples of how to implement k-means the code to roughly how to implement k-means clustering and we use some sample data using blob to just execute the k-means clustering ever wondered how google translates an entire web page to a different language in a matter of seconds or your phone gallery group's images based on their location all of this is a product of deep learning but what exactly is deep learning deep learning is a subset of machine learning which in turn is a subset of artificial intelligence artificial intelligence is a technique that enables a machine to mimic human behavior machine learning is a technique to achieve ai through algorithms trained with data and finally deep learning is a type of machine learning inspired by the structure of the human brain in terms of deep learning this structure is called an artificial neural network let's understand deep learning better and how it's different from machine learning say we create a machine that could differentiate between tomatoes and cherries if done using machine learning we'd have to tell the machine the features based on which the two can be differentiated these features could be the size and the type of stem on them with deep learning on the other hand the features are picked out by the neural network without human intervention of course that kind of independence comes at the cost of having a much higher volume of data to train our machine now let's dive into the working of neural networks here we have three students each of them write down the digit nine on a 
piece of paper notably they don't all write it identically the human brain can easily recognize the digits but what if a computer had to recognize them that's where deep learning comes in here's a neural network trained to identify handwritten digits each number is present as an image of 28 times 28 pixels now that amounts to a total of 784 pixels neurons the core entity of a neural network is where the information processing takes place each of the 784 pixels is fed to a neuron in the first layer of our neural network this forms the input layer on the other end we have the output layer with each neuron representing a digit with the hidden layers existing between them the information is transferred from one layer to another over connecting channels each of these has a value attached to it and hence is called a weighted channel all neurons have a unique number associated with it called bias this bias is added to the weighted sum of inputs reaching the neuron which is then applied to a function known as the activation function the result of the activation function determines if the neuron gets activated every activated neuron passes on information to the following layers this continues up till the second last layer the one neuron activated in the output layer corresponds to the input digit the weights and bias are continuously adjusted to produce a well-trained network so where is deep learning applied in customer support when most people converse with customer support agents the conversation seems so real they don't even realize that it's actually a bot on the other side in medical care neural networks detect cancer cells and analyze mri images to give detailed results self-driving cars what seem like science fiction is now a reality apple tesla and nissan are only a few of the companies working on self-driving cars so deep learning has a vast scope but it too faces some limitations the first as we discussed earlier is data while deep learning is the most efficient way to deal with unstructured data a neural network requires a massive volume of data to train let's assume we always have access to the necessary amount of data processing this is not within the capability of every machine and that brings us to our second limitation computational power training a neural network requires graphical processing units which have thousands of cores as compared to cpus and gpus are of course more expensive and finally we come down to training time deep neural networks take hours or even months to train the time increases with the amount of data and number of layers in the network so here's a short quiz for you arrange the following statements in order to describe the working of a neural network a the bias is added b the weighted sum of the inputs is calculated c specific neuron is activated d the result is fed to an activation function leave your answers in the comments section below some of the popular deep learning frameworks include tensorflow pytorch keras deep learning 4j cafe and microsoft cognitive toolkit considering the future predictions for deep learning and ai we seem to have only scratched the surface in fact horus technology is working on a device for the blind that uses deep learning with computer vision to describe the world to the users replicating the human mind at the entirety may be not just an episode of science fiction for too long the future is indeed full of surprises and that is deep learning for you in short welcome to this tutorial on deep learning now there are several 
applications of deep learning really very interesting and innovative applications and one of them is identifying the geographic location based on a picture and how does this work the way it works is pretty much we train an artificial neural network with millions of images which are tagged their geolocation is tagged and then when we feed a new picture it will be able to identify the geolocation of this new image for example you have all these images especially with maybe some significant monuments or significant locations and you train with millions of such images and then when you feed another image it need not be exactly one of those that you have claimed it can be completely different that is the whole idea of training it will be able to recognize for example that this is a picture from paris because it is able to recognize the eiffel tower so the way it works internally if we have to look a little bit under the heart as these images are nothing but this is digital information in the form of pixels so each image could be a certain size it can be 256 by 256 pixel kind of a resolution and then each pixel is either having a certain grade of color and all that is fed into the neural network and it then gets trained in and it's able to based on these pixels pixel information it is able to get trained and able to recognize the features and extract the features and thereby it is able to identify these images and the location of these images and then when you feed a new image it kind of based on the training it will be able to figure out where this image is from so that's the way a little bit under the hood how it works so what are we going to do in this tutorial we will see what is deep learning and what do we need for deep learning and one of the main components of deep learning is neural network so we will see what is neural network what is a perceptron and how to implement logic gates like and or nor and so on using perceptrons the different types of neural networks and then applications of deep learning and we will also see how neural networks works so how do we do the training of neural networks and at the end we will end up with a small demo code which will take you through in tensorflow now in order to implement deep learning code there are multiple libraries or development environments that are available and tensorflow is one of them so the focus at the end of this would be on how to use tensorflow to write a piece of code using python as a programming language and we will take up an example which is a very common one which is like the hello world of deep learning the handwriting number recognition which is a mnist commonly known as mnist database so we will take a look at mnist database and how we can train a neural network to recognize handwritten numbers so that's what you will see in this particular video so let's get started what is deep learning deep learning is like a subset of what is known as a high level concept called artificial intelligence you must be already familiar must have heard about this term artificial intelligence so artificial intelligence is like the high level concept if you will and in order to implement artificial intelligence applications we use what is known as machine learning and within machine learning a subset of machine learning is deep learning machine learning is a little bit more generic concept and deep learning is one type of machine learning if you will and we will see a little later in maybe the following slides a little bit more in detail how 
deep learning is different from traditional machine learning but to start with we can mention here that deep learning uses one of the differentiators between deep learning and traditional machine learning is that deep learning uses neural networks and we will talk about what are neural networks and how we can implement neural networks and so on and so forth as a part of this tutorial so a little deeper into deep learning deep learning primarily involves working with complicated unstructured data compared to traditional machine learning with where we normally use structured data in deep learning the data would be primarily images or voice or maybe text files so and it is large amount of data as well and deep learning can handle complex operations it involves complex operations and the other difference between traditional machine learning and deep learning is that the feature extraction happens pretty much automatically in traditional machine learning feature engineering is done manually the data scientists we data scientists have to do feature engineering feature extraction but in deep learning that happens automatically and of course deep learning for large amounts of data complicated unstructured data deep learning gives very good performance now as i mentioned one of the secret sources of deep learning is neural networks let's see what neural networks is neural networks is based on our biological neurons the whole concept of deep learning and artificial intelligence is based on human brain and human brain consists of billions of tiny stuff called neurons and this is how a biological neuron looks and this is how an artificial neuron look so neural networks is like a simulation of our human brain human brain has billions of biological neurons and we are trying to simulate the human brain using artificial neurons this is how a biological neuron looks it has dendrites and the corresponding component with an artificial neural network is or an artificial neuron are the inputs they receive the inputs through dendrites and then there is the cell nucleus which is basically the processing unit in a way so in artificial neuron also there is a piece which is an equivalent of this cell nucleus and based on the weights and biases we will see what exactly weights in biases are as we move the input gets processed and that results in an output in a biological neuron the output is sent through a synapse and in an artificial neuron there is an equivalent of that in the form of an output and biological neurons are also interconnected so there are billions of neurons which are interconnected in the same way artificial neurons are also interconnected so this output of this neuron will be fed as an input to another neuron and so on now in neural network one of the very basic units is a perceptron so what is a perceptron a perceptron can be considered as one of the fundamental units of neural networks it can consist at least one neuron but sometimes it can be more than one neuron but you can create a perceptron with a single neuron and it can be used to perform certain functions it can be used as a basic binary classifier it can be trained to do some basic binary classification and this is how a basic perceptron looks like and this is nothing but a neuron you have inputs x 1 x 2 x 2 x n and there is a summation function and then there is what is known as an activation function and based on this input what is known as the weighted sum the activation function either gets gives an output like a zero or a one so we 
say the neuron is either activated or not so that's the way it works so you get the inputs these inputs are each of the inputs are multiplied by a weight and there is a bias that gets added and that whole thing is fed to an activation function and then that results in an output and if the output is correct it is accepted if it is wrong if there is an error then that error is fed back and the neuron then adjusts the weights and biases to give a new output and so on and so forth so that's what is known as the training process of a neuron or a neural network there's a concept called perceptron learning so perceptron learning is again one of the very basic learning processes the way it works is somewhat like this so you have all these inputs like x1 to xn and each of these inputs is multiplied by a weight and then that sum this is the formula the equation so that sum w i x i sigma of that which is a sum of all these product of x and w is added up and then a bias is added to that the bias is not dependent on the input but are the input values but the bias is common for one neuron however the bias value keeps changing during the training process once the training is completed the values of these weights w1 w2 and so on and the value of the bias gets fixed so that is basically the whole training process and that is what is known as the perceptron training so the weights and biases keep changing till you get the accurate output and the summation is of course passed through the activation function as you see here this w i x i summation plus b is passed through activation function and then the neuron gets either fired or not and based on that there will be an output that output is compared with the actual or expected value which is also known as labeled information this is the process of supervised learning so the output is already known and that is compared and thereby we know if there is an error or not and if there is an error the error is fed back and the weights and biases are updated accordingly till the error is reduced to the minimum so this iterative process is known as perceptron learning or perceptron learning rule and this error needs to be minimized so till the error is minimized this iteratively the weights and biases keep changing and that is what is the training process so the whole idea is to update the weights and the bias of the perceptron till the error is minimized the error need not be zero the error may not ever reach zero but the idea is to keep changing these weights and bias so that the error is minimum the minimum possible that it can have so this whole process is an iterative process and this is the iteration continues till either the error is zero which is uh unlikely situation or it is the minimum possible within these given conditions now in 1943 two scientists warren mcculloch and walter pitts came up with an experiment where they were able to implement the logical functions like and or and nor using neurons and that was a significant breakthrough in a sense so they were able to come up with the most common logical gates they were able to implement some of the most common logical gates which could take two inputs like a and b and then give a corresponding result so for example in case of an and gate a and b and then the output is a b in case of an or gate it is a plus b and so on and so forth and they were able to do this using a single layer perceptron now most of these gates it was possible to use single layer perceptron except for xor and we will see why that is in 
a little bit. so this is how an and gate works: with inputs a and b, the neuron should be fired only when both inputs are one, so for zero zero the output should be zero, for zero one it is again zero, for one zero again zero, and for one one the output should be one. so how do we implement this with a neuron? it was found that by changing the values of the weights it is possible to achieve this logic. for example, if we have equal weights like 0.7 and 0.7 and we take the sum of the weighted products, then 0.7 into 0 plus 0.7 into 0 gives you 0, and so on, and only in the last case, when both inputs are 1, you get 0.7 plus 0.7, which is 1.4, a value greater than the threshold of 1. so only in that case does the neuron get activated and produce an output; in all the other cases there is no output, because the sum never crosses the threshold. that is the implementation of an and gate using a single perceptron, or a single neuron. similarly, for an or gate, the output should be one if either of the inputs is one, so every combination gives one except zero zero. how do we implement this using a perceptron? once again, take a perceptron with weights of 1.2. in the first case, when both inputs are zero, the weighted sum is zero; in the second case, with inputs 0 and 1, 1.2 into 0 is 0 and 1.2 into 1 is 1.2, so the sum is 1.2, which crosses the threshold of 1; in the third case it is similarly 1.2; and in the last case, when both inputs are 1, the sum is 2.4. during the training process these weights keep changing, and at the point where w1 is equal to 1.2 and w2 is equal to 1.2 the system learns to give the correct output. so that is the implementation of an or gate using a single neuron, or single-layer perceptron. now, the xor gate was one of the challenging ones: they tried to implement an xor gate with a single-layer perceptron, but it was not possible, and this was like a roadblock in the progress of neural networks. however, it was subsequently realized that an xor gate can be implemented using a multi-layer perceptron, or mlp. in this case there are two layers instead of a single layer, and this is how you can implement an xor gate: x1 and x2 are the inputs, there is a hidden layer, which is why the hidden units are denoted h3 and h4, and you take the output of those and feed it to the output unit o5 with a threshold there. looking at the numerical calculation, the weights in this case are 20 and minus 20 from x1, and once again 20 and minus 20 from x2.
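before carrying on with how those hidden units behave below, here is a minimal sketch that checks the single-neuron and / or weights quoted above (0.7 and 1.2, threshold 1) and also wires up the two-layer xor network with the 20 / -20 weights; the hidden and output biases (-10, +30, -30) are assumptions chosen so that h3 acts like an or, h4 like a nand, and o5 fires only for exclusive-or.

```python
import numpy as np

def step_neuron(x1, x2, w1, w2, threshold=1.0):
    # single neuron: fire (output 1) only when the weighted sum exceeds the threshold
    return 1 if (w1 * x1 + w2 * x2) > threshold else 0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def xor_mlp(x1, x2):
    # two-layer perceptron for xor; the 20 / -20 weights follow the slide,
    # the biases are assumed values that make the wiring work
    h3 = sigmoid(20 * x1 + 20 * x2 - 10)     # behaves like an or gate
    h4 = sigmoid(-20 * x1 - 20 * x2 + 30)    # behaves like a nand gate
    o5 = sigmoid(20 * h3 + 20 * h4 - 30)     # fires only when both h3 and h4 are active
    return round(o5)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    and_out = step_neuron(a, b, 0.7, 0.7)    # fires only for 1,1 (0.7 + 0.7 = 1.4 > 1)
    or_out  = step_neuron(a, b, 1.2, 1.2)    # fires whenever at least one input is 1
    xor_out = xor_mlp(a, b)                  # needs the hidden layer: 0, 1, 1, 0
    print(a, b, and_out, or_out, xor_out)
```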
so these inputs are fed into h3 and h4 so you'll see here for h3 the input is 0 1 1 1 and for h4 it is 1 0 1 1 and if you now look at the output final output where the threshold is taken as one if we use a sigmoid with the threshold one you will see that in these two cases it is zero and in the last two cases it is one so this is a implementation of xor in case of x or only when one of the inputs is 1 you will get an output so that is what we are seeing here if we have either both the inputs are 1 or both the inputs are 0 then the output should be zero so that is what is an exclusive or gate so it is exclusive because only one of the inputs should be one and then only you will get an output of one which is satisfied by this condition so this is a special implementation xor gate is a special implementation of perceptron now that we got a good idea about perceptron let's take a look at what is the neural network so we have seen what is a perceptron we have seen what is a neuron so we will see what exactly is a neural network so neural network is nothing but a network of these neurons and they are different types of neural networks there are about five of them these are artificial neural network convolutional neural network then recursive neural network or recurrent neural network deep neural network and deep belief network so and each of these types of neural networks have a special you know they can solve a special kind of problems for example convolutional neural networks are very good at performing image processing and image recognition and so on whereas rnn are very good for speech recognition and also text analysis and so on so each type has some special characteristics and they can they're good at performing certain special kind of tasks what are some of the applications of deep learning deep learning is today used extensively in gaming you must have heard about alphago which is a game created by a startup called deepmind which got acquired by google and alphago is an ai which defeated the human world champion lee sedol in this game of go so gaming is an area where deep learning is being extensively used and a lot of research happens in the area of gaming as well in addition to that nowadays there are neural networks a special type called generative adversarial networks which can be used for synthesizing either images or music or text and so on and they can be used to compose music so the neural network can be trained to compose a certain kind of music and autonomous cars you must be familiar with google google's self-driving car and today a lot of automotive companies are investing in this space and deep learning is a core component of this autonomous cars the cars are trained to recognize for example the road the lane markings on the road signals any objects that are in front any obstruction and so on and so forth so all this involves deep learning so that's another major application and robots we have seen several robots including sofia you may be familiar with sophia who was given a citizenship by saudi arabia and there are several such robots which are very human like and the underlying technology in many of these robots is deep learning medical diagnostics and healthcare is another major area where deep learning is being used and within healthcare diagnostics again there are multiple areas where deep learning and image recognition image processing can be used for example for cancer detection as you may be aware if cancer is detected early on it can be cured and one of the 
challenges is in the availability of specialists who can diagnose cancer using these diagnostic images and various scans and and so on and so forth so the idea is to train neural network to perform some of these activities so that the load on the cancer specialist doctors or oncologists comes down and there is a lot of research happening here and there are already quite a few applications that are claimed to be performing better than human beings in this space can be lung cancer it could be breast cancer and so on and so forth so health care is a major area where deep learning is being applied let's take a look at the inner working of a neural network so how does an artificial neural network let's say identify can we train a neural network to identify the shapes like squares and circles and triangles when these images are fed so this is how it works any image is nothing but it is a digital information of the pixels so in this particular case let's say this is an image of 28 by 28 pixel and this is an image of a square there's a certain way in which the pixels are lit up and so these pixels have a certain value maybe from 0 to 256 and 0 indicates that it is black or it is dark and 256 indicates it is completely it is white or lit up so that is like an indication or a measure of the how the pixels are lit up and so this is an image is let's say consisting of information of 784 pixels so all the information what is inside this image can be kind of compressed into this 784 pixels the way each of these pixels is lit up provides information about what exactly is the image so we can train neural networks to use that information and identify the images so let's take a look how this works so each neuron the value if it is close to one that means it is white whereas if it is close to zero that means it is black now this is a an animation of how this whole thing works so these pixels one of the ways of doing it is we can flatten this image and take this complete 784 pixels and feed that as input to our neural network the neural network can consist of probably several layers there can be a few hidden layers and then there is an input layer and an output layer now the input layer takes the 784 pixels as input the values of each of these pixels and then you get an output which can be of three types or three classes one can be a square a circle or a triangle now during the training process there will be initially obviously you feed this image and it will probably say it's a circle or it will say it's a triangle so as a part of the training process we then send that error back and the weights and the biases of these neurons are adjusted till it correctly identifies that this is a square that is the whole training mechanism that happens out here now let's take a look at a circle same way so you feed these 784 pixels there is a certain pattern in which the pixels are lit up and the neural network is trained to identify that pattern and during the training process once again it would probably initially identify it incorrectly saying this is a square or a triangle and then that error is fed back and the weights and biases are adjusted finally till it finally gets the image correct so that is the training process so now we will take a look at same way a triangle so now if you feed another image which is consisting of triangles so this is the training process now we have trained our neural network to classify these images into a triangle or a circle and a square so now this neural network can identify these 
three types of objects now if you feed another image and it will be able to identify whether it's a square or a triangle or a circle now what is important to be observed is that when you feed a new image it is not necessary that the image or the the triangle is exactly in this position now the neural network actually identifies the patterns so even if the triangle is let's say positioned here not exactly in the middle but maybe at the corner or in the side it would still identify that it is a triangle and that is the whole idea behind pattern recognition so how does this training process work this is a quick view of how the training process works so we have seen that a neuron consists of inputs it receives inputs and then there is a weighted sum which is nothing but this x i wi summation of that plus the bias and this is then fed to the activation function and that in turn gives us a output now during the training process initially obviously when you feed these images when you send maybe a square it will identify it as a triangle and when you maybe feed a triangle it will identify as a square and so on so that error information is fed back and initially these weights can be random maybe all of them have zero values and then it will slowly keep changing so the as a part of the training process the values of these weights w1 w2 up to wn keep changing in such a way that towards the end of the training process it should be able to identify these images correctly so till then the weights are adjusted and that is known as the training process so and these weights are numeric values it could be 0.5 0.25 0.35 and so on it could be positive or it could be negative and the value that is coming here is the pixel value as we have seen it can be anything between 0 to 1 you can scale it between 0 to 1 or 0 to 256 whichever way 0 being black and 256 being white and then all the other colors in between so that is the input so these are numerical values this multiplication or the product w i x i is a numerical value and the bias is also a numerical value we need to keep in mind that the bias is fixed for a neuron it doesn't change with the inputs whereas the weights are one per input so that is one important point to be noted so but the bias also keeps changing initially it will again have a random value but as a part of the training process the weights the values of the weights w1 w2 wn and the value of b will change and ultimately once the training process is complete these values are fixed for this particular neuron w1 w2 up to wn and plus the value of the b is also fixed for this particular neuron and in this way there will be multiple neurons and each there may be multiple levels of neurons here and that's the way the training process works so this is another example of multi-layer so there are two hidden layers in between and then you have the input layer values coming from the input layer then it goes through multiple layers hidden layers and then there is an output layer and as you can see there are weights and biases for each of these neurons in each layer and all of them gets keeps changing during the training process and at the end of the training process all these weights have a certain value and that is a trained model and those values will be fixed once the training is completed all right then there is something known as activation function neural networks consists of one of the components in neural networks is activation function and every neuron has an activation function and there are 
different types of activation functions that are used: it could be a relu, it could be a sigmoid, and so on, and the activation function is what decides whether a neuron should be fired or not. so whether the output should be 0 or 1 is decided by the activation function, and the activation function in turn takes as input the weighted sum, remember we talked about the sum of wi xi plus b; that weighted sum is fed into the activation function and then the output can be either a zero or a one. there are different types of activation functions, and they are covered in an earlier video which you might want to watch. all right, so as part of the training process we feed the inputs, the labeled or training data, and the network gives an output, which is the predicted output, indicated as y hat. then there is the labeled data, because for supervised learning we already know what the output should be, so that is the actual output. before the training is complete there will obviously be errors, and that is measured by what is known as a cost function: the difference between the predicted output and the actual output is the error, and the cost function can be defined in different ways, there are different types of cost functions. in this case it is the average of the squares of the errors, and when all the errors are added up this is sometimes called the sum of squared errors, or sse. that is then fed back in what is known as backward propagation, or backpropagation, and that helps the network adjust the weights and biases, so the weights and biases get updated till the value of the cost function is at a minimum. now there is an optimization technique used here called gradient descent optimization, and this algorithm works in such a way that the error, which is the cost function, gets minimized. there is a lot of mathematics behind this, for example finding the local minima and the global minima using differentiation and so on, but the idea is this: as part of training, the whole point is to bring down the error. at certain values of the weights the output of the cost function is very high, so the weights, and of course the bias, have to be adjusted in such a way that the cost function is minimized, and gradient descent is the optimization technique used for that. with gradient descent you need to specify what is known as the learning rate, and the learning rate should be optimal, because if you have a very high learning rate the optimization will not converge, at some point it will overshoot to the other side, and on the other hand if you have a very low learning rate it might take forever to converge. so you need to come up with an optimal value of the learning rate, and once that is done, using gradient descent optimization the error function is reduced, and that is like the end of the training process. all right, so this is another view of gradient descent: this is your cost function, or rather the output of the cost function, which has to be minimized using the gradient descent algorithm, and these are the parameters, weight being one of them. initially we start with certain random values, so the cost will be high, and then the weights keep changing.
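here is a minimal sketch of gradient descent minimizing a mean-squared-error cost for a single weight and bias; the toy data points, the number of steps and the learning rate of 0.01 are assumptions for illustration.

```python
import numpy as np

# toy data: y is roughly 2x + 1, so gradient descent should recover w close to 2, b close to 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

w, b = 0.0, 0.0
lr = 0.01                                   # learning rate: too high diverges, too low is slow

for step in range(2000):
    y_hat = w * x + b                       # predicted output of the model
    error = y_hat - y                       # difference from the labelled output
    cost = np.mean(error ** 2)              # mean squared error cost function
    dw = 2 * np.mean(error * x)             # gradient of the cost with respect to w
    db = 2 * np.mean(error)                 # gradient of the cost with respect to b
    w -= lr * dw                            # move downhill on the cost surface
    b -= lr * db

print(w, b, cost)
```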
the weights are adjusted in such a way that the cost function comes down, and at some point it reaches the minimum value; beyond that it would increase again, so that is where the gradient descent algorithm decides that it has reached a minimum value and tries to stay there. this is known as the global minimum. now, these curves have been drawn in a nice way for explanation purposes, but sometimes they can be pretty erratic: there can be a local minimum here, then a peak, and so on, and the whole idea of gradient descent optimization is to identify the global minimum and to find the weights and the bias at that particular point. that's what gradient descent is. and this is another example: you can have multiple local minima, so as the cost is coming down it may appear that a point is the minimum value, but it is not, this other point is actually the global minimum, and the gradient descent algorithm makes an effort to reach that level and not get stuck at the local minimum. the algorithm knows how to identify this global minimum, and that's what it does during the training process. now, in order to implement deep learning there are multiple platforms and languages available, but the most common platform nowadays is tensorflow, and that's the reason we have created this tutorial around tensorflow, so we will take you through a quick demo of how to write tensorflow code using python. tensorflow is an open-source platform created by google, so let's take a look at the details: it is a library, a python library, so you can use python, and it is also supported in other languages like java and r, but python is the most common language used. it is a library for developing deep learning applications, especially using neural networks, and it consists of primarily two parts, if you will: one is the tensors, and the other is the graphs, or the flow, and that's the reason for the name tensorflow. so what are tensors? tensors are like multi-dimensional arrays, that's one way of looking at it. first of all you can have what is known as a scalar, which is just a number; then you have a one-dimensional array, which is like a list of numbers; then you can have a two-dimensional array, which is like a matrix; and beyond that it sometimes gets difficult to picture, this is a three-dimensional array, but tensorflow can handle many more dimensions, so it can have multi-dimensional arrays. that is the strength of tensorflow, and it makes deep learning computation much faster, which is why tensorflow is used for developing deep learning applications. so tensorflow is a deep learning tool, and the way it works is that the data flows in the form of tensors, and the way the programming works is that you first create a graph of how to execute things and then you actually execute that graph in the form of what is known as a session; we will see this in the tensorflow code as we move forward. so all the data is managed or manipulated in tensors, and the processing happens using these graphs. there are certain terms, like the rank of a tensor; the rank of a tensor is basically its dimensionality.
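here is a minimal sketch of tensors of the different ranks spelled out right after this; it uses the tf 1.x session style to stay consistent with the rest of this walkthrough, which is an assumption about the version being used.

```python
import tensorflow as tf

scalar = tf.constant(3)                          # rank 0: just a single number
vector = tf.constant([1, 2, 3])                  # rank 1: a one-dimensional array
matrix = tf.constant([[1, 2], [3, 4]])           # rank 2: a two-dimensional array (matrix)
cube   = tf.constant([[[1], [2]], [[3], [4]]])   # rank 3: a three-dimensional array

with tf.Session() as sess:                       # tf 1.x style session, as in this tutorial
    print(sess.run([tf.rank(scalar), tf.rank(vector),
                    tf.rank(matrix), tf.rank(cube)]))   # [0, 1, 2, 3]
```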
just a number just one number the rank is supposed to be zero and then it can be a one dimensional vector in which case the rank is supposed to be one and then you can have a two dimensional vector typically like a matrix then in that case we say the rank is two and then if it is a three-dimensional array then rank is three and so on so it can have more than three as well so it is possible that you can store multi-dimensional arrays in the form of tensors so what are some of the properties of tensorflow i think today it is one of the most popular platform tensorflow is the most popular deep learning platform or library it is open source it's developed by google developed and maintained by google but it is open source one of the most important things about tensorflow is that it can run on cpus as well as gpus gpu is a graphical processing unit just like cpus central processing unit now in earlier days gpu was used for primarily for graphics and that's how the name has come and one of the reasons is that it cannot perform generic activities very efficiently like cpu but it can perform iterative actions or computations extremely fast and much faster than a cpu so they are really good for computational activities and in deep learning there is a lot of iterative computation that happens so in the form of matrix multiplication and so on so gpus are very well suited for this kind of computation and tensorflow supports both gpu as well as cpu and there's a certain way of writing code in tensorflow we will see as we go into the code and of course tensorflow can be used for traditional machine learning as well but then that would be an overkill but just for understanding it may be a good idea to start writing code for a normal machine learning use case so that you get a hang of how tensorflow code works and then you can move into neural networks so that is uh just a suggestion but if you're already familiar with how tensorflow works then probably yeah you can go straight into the neural networks part so in this tutorial we will take the use case of recognizing handwritten digits this is like a hello world of deep learning and this is a nice little mnist database is a nice little database that has images of handwritten digits nicely formatted because very often in deep learning and neural networks we end up spending a lot of time in preparing the data for training and with mnis database we can avoid that you already have the data in the right format which can be directly used for training and mnist also offers a bunch of built-in utility functions that we can straight away use and call those functions without worrying about writing our own functions and that's one of the reasons why mnist database is very popular for training purposes initially when people want to learn about deep learning and tensorflow this is the database that is used and it has a collection of 70 000 handwritten digits and a large part of them are for training then you have test just like in any machine learning process and then you have validation and all of them are labeled so you have the images and their label and these images they look somewhat like this so they are handwritten images collected from a lot of individuals people have these are samples written by human beings they have written these numbers these numbers going from zero to nine so people have written these numbers and then the images of those have been taken and formatted in such a way that it is very easy to handle so that is mnis database and the way we are 
the way we are going to implement this in tensorflow is that we will feed this data, especially the training data, along with the label information, and the data, these images, is stored in the form of pixel information, as we have seen in one of the previous slides. an image is nothing but an arrangement of pixels, and the value of each pixel says whether it is lit up, not lit up, or somewhere in between; that's how the images are stored and that is how they are fed into the neural network for training. once the network is trained, when you provide a new image it will be able to identify it, within a certain error of course. for this we will use one of the simpler neural network configurations, with a softmax output layer, and for simplicity we will flatten the pixels: instead of taking them in a two-dimensional arrangement, we just flatten them out. for example, it is a 28 by 28 image, so there are 784 pixels; pixel number one starts here, it goes up to 28, then pixel 29 starts the next row and goes up to 56, and so on, and pixel number 784 is at the end. we take all these pixels, flatten them out, and feed them as one single line into our neural network, and the last layer is what is known as a softmax layer. what it does is, once it is trained, it will be able to identify which digit this is: in the output layer there are 10 neurons, each signifying a digit, and at any given point of time, when you feed an image, only one of these 10 neurons gets activated. so for example, if the network has been trained properly and you feed a number nine like this, then the neuron for nine gets activated and you get an output from that neuron. if you feed a one to the trained network, the neuron for one gets activated; if you feed a two, that neuron gets activated, and so on, i hope you get the idea. so this is one type of output layer known as a softmax layer, one of the simpler ones, good for quick and easy understanding. this is how the code would look; we will go into our lab environment in the cloud and show you there directly, but very quickly let me run you through it here, and then we will go into the jupyter notebook where the actual code is and run that as well. as a first step, we are using python here, and that's why the syntax of the language is python, and the first step is to import the tensorflow library. we do this with the line of code import tensorflow as tf; tf is just for convenience, you can give it any name, and once you do this, tensorflow is available as an object by the name tf, and you can run its methods and access its attributes. the mnist database is actually bundled with tensorflow, which is again another reason why we always use this mnist example as a first step, so you simply import the mnist data as well using this line of code, and you slightly modify it so that the labels are in the one-hot format, by setting one_hot to true, which means the label information is stored like an array. let me just use the pen to show what exactly that means: when you set one_hot to true, what happens is that each label is stored in the form of an array
of 10 digits and let's say the number is 8 okay so in this case all the remaining values there will be a bunch of zeros so this is like array at position 0 this is at position 1 position 2 and so on and so forth let's say this is position 7 then this is position 8 that will be 1 because our input is 8 and again position 9 will be 0 okay so one hot encoding this one hot encoding true will kind of load the data in such a way that the labels are in such a way that only one of the digits has a value of one and that indicates so based on which digit is 1 we know what is the label so in this case the 8th position is 1 therefore we know this sample data the value is 8. similarly if you have a 2 here let's say then the labeled information will be somewhat like this so you have your labels so you have this as 0 the 0th position the first position is also 0 the second position is 1 because this indicates number two and then you have third as zero and so on okay so that is the significance of this one hot true all right and then we can check how the data is uh looking by displaying the data and as i mentioned earlier this is pretty much in the form of digital form like numbers so all these are like pixel values so you will not really see an image in this format but there is a way to visualize that image i will show you in a bit and uh this tells you how many images are there in each set so the training there are 55 000 images in training and in the test set there are 10 000 and then validation there are five thousand so all together there are seventy thousand images all right so let's uh move on and we can view the actual image by uh using the matplotlib library and this is how you can view this is the code for viewing the images and you can view them in color or you can view them in grayscale so the c map is what tells in what way we want to view it and what are the maximum values and the minimum values of the pixel values so these are the max and minimum values so of the pixel values so maximum is 1 because this is a scaled value so 1 means it is white and 0 means it is black and in between is it can be anywhere in between black and white and the way to train the model there is a certain way in which you write your tensorflow code and the first step is to create some placeholders and then you create a model in this case we will use the softmax model one of the simplest ones and placeholders are primarily to get the data from outside into the neural network so this is a very common mechanism that is used and then of course you will have variables which are your you remember these are your weights and biases so for in our case there are 10 neurons and each neuron actually has 784 because each neuron takes all the inputs if we go back to our slide here actually every neuron takes all the 784 inputs right this is the first neuron it has it receives all the 784 this is the second neuron this also receives all the 700 so each of these inputs needs to be multiplied with the weight and that's what we are talking about here so these are this is a matrix of 784 values for each of the neurons and so it is like a 10 by 784 matrix because there are 10 neurons and similarly there are biases now remember i mentioned bias is only one per neuron so it is not one per input unlike the weights so therefore there are only 10 biases because there are only 10 neurons in this case so that is what we are creating a variable for biases so this is something little new in tensorflow you will see unlike our regular programming 
languages where everything is a variable here the variables can be of three different types you have placeholders which are primarily used for feeding data you have variables which can change during the course of computation and then a third type which is not shown here are constants so these are like fixed numbers all right so in a regular programming language you may have everything as variables or at the most variables and constants but in tensorflow you have three different types placeholders variables and constants and then you create what is known as a graph so tensorflow programming consists of graphs and tensors as i mentioned earlier so this can be considered ultimately as a tensor and then the graph tells how to execute the whole implementation so that the execution is stored in the form of a graph and in this case what we are doing is we are doing a multiplication tf you remember this tf was created as a tensorflow object here one more level one more so tf is available here now tensorflow has what is known as a matrix multiplication or matmal function so that is what is being used here in this case so we are using the matrix multiplication of tensorflow so that you multiply your input values x with w right this is what we were doing x w plus b you're just adding b and this is in very similar to one of the earlier slides where we saw sigma x i wi so that's what we are doing here matrix multiplication is multiplying all the input values with the corresponding weights and then adding the bias so that is the graph we created and then we need to define what is our loss function and what is our optimizer so in this case we again use the tensorflow's apis so tf.nn softmax cross entropy with logits is the api that we will use and reduce mean is what is like the mechanism whereby which says that you reduce the error and optimizer for doing deduction of the error what optimizer are we using so we are using gradient descent optimizer we discussed about this in a couple of slides uh earlier and for that you need to specify the learning rate you remember we saw that there was a slide somewhat like this and then you define what should be the learning rate how fast you need to come down that is the learning rate and this again needs to be tested and tried and to find out the optimum level of this learning rate it shouldn't be very high in which case it will not converge or shouldn't be very low because it will in that case it will take very long so you define the optimizer and then you call the method minimize for that optimizer and that will kickstart the training process and so far we've been creating the graph and in order to actually execute that graph we create what is known as a session and then we run that session and once the training is completed we specify how many times how many iterations we want it to run so for example in this case we are saying thousand steps so that is a exit strategy in a way so you specify the exit condition so training will run for thousand iterations and once that is done we can then evaluate the model using some of the techniques shown here so let us get into the code quickly and see how it works so this is our cloud environment now you can install tensorflow on your local machine as well i'm showing this demo on our existing cloud but you can also install tensorflow on your local machine and there is a separate video on how to set up your tensorflow environment you can watch that if you want to install your local environment or you can go for other any 
cloud service, like for example google cloud, amazon, or cloud labs; any of these you can use to run and try the code. okay, so it has started and we will log in. all right, so this is our deep learning tutorial code and this is our tensorflow environment, so let's get started. we have seen a bit of a code walkthrough in the slides, and now you will see the actual code in action. the first thing we need to do is import tensorflow, and then we import the data, adjusting it so that the one-hot encoding is set to true, as i explained earlier; in this case the label values will be represented appropriately. if we check the type of the data, you can see that this is a python datasets object, and if we check the images, this is how they look, an array of type float32. similarly, if you want to see the number of training images, there are 55,000, then there are 10,000 test images and 5,000 validation images. now let's take a quick look at the data itself and visualize it; we will use matplotlib for this. if we take a look at the shape, shape gives us the dimensions of the tensors, or the arrays if you will, so the training data set, if we check its size using shape, is 55,000 by 784, and remember the 784 is nothing but 28 by 28, 28 into 28 is 784, so that's what it is showing. we can take just one image, the first image, and see what its shape is; obviously its size is just 784. similarly you can look at the data of that first image itself, and this is how it shows: a large part of it will be zeros, because, as you can imagine, in the image only certain areas are written and the rest is blank, and the values are actually scaled, so they lie between zero and one. so in certain locations there are some values and in the other locations there are zeros; that is how the data is stored and loaded. if we want to actually see the handwritten image, to view it, this is how you do it: you do this reshape, and matplotlib has a feature to show these images, so we will use the function called imshow, and if you pass the parameters appropriately you will be able to see the different images. i can change the index here to decide which image we are looking at, so if i want to see what is at, say, index 5000, that has a 3; similarly index 5 has an 8, and index 50, again an 8. by the way, if you are wondering how i am executing this code: in case you are not familiar with jupyter notebooks, shift enter is how you execute each individual cell, and if you want to execute the entire program you can go up here and say run all. and here again we can check the maximum value and the minimum value of the pixels; as i mentioned, the data is scaled, so the values lie between 1 and 0.
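here is a minimal sketch of the data-exploration steps just shown: loading mnist with one-hot labels, checking the shapes, and reshaping one flattened image back to 28 by 28 to view it with imshow; the data directory name and the exact colormap are assumptions, and the index 5000 follows the walkthrough.

```python
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data

# load mnist with one-hot labels, e.g. digit 3 -> [0,0,0,1,0,0,0,0,0,0]
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

print(mnist.train.images.shape)        # (55000, 784): 28 x 28 pixels flattened
print(mnist.test.images.shape)         # (10000, 784)
print(mnist.validation.images.shape)   # (5000, 784)
print(mnist.train.images.max(), mnist.train.images.min())   # pixel values scaled to 1.0 and 0.0

# reshape one flattened image back to 28 x 28 and view it in grayscale
single_image = mnist.train.images[5000].reshape(28, 28)
plt.imshow(single_image, cmap='gray')
plt.show()
print(mnist.train.labels[5000])        # the one-hot label for the same image
```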
now this is where we create our model the first thing is to create the required placeholders and variables and that's what we are doing here as we have seen in the slides so we create one placeholder and we create two variables which are for the weights and the biases the weights variable is actually a matrix with 784 by 10 values the 10 is for each neuron there are 10 neurons and 784 is for the pixel values that are given as inputs which is 28 into 28 and the biases as i mentioned one for each neuron so there will be 10 biases they are stored in a variable by the name b and this is the graph which is basically the matrix multiplication of x into w and then the bias is added for each of the neurons and the whole idea is to minimize the error so let me just execute i think this code is executed then we define the y value which is basically the label value so this is another placeholder we had x as one placeholder and y underscore true as a second placeholder and this will have values in the form of 10 digit arrays and since we said one hot encoded the position which has a one value indicates what is the label for that particular number all right then we have cross entropy which is nothing but the loss function and we have the optimizer we have chosen gradient descent as our optimizer then the training process itself so the training process is nothing but to minimize the cross entropy which is again nothing but the loss function so we define all of this in the form of a graph so up to here remember we have not exactly executed any tensorflow code till now we are just preparing the graph the execution plan that's how tensorflow code works so the whole structure and format of this code will be completely different from how we normally do programming so even people with programming experience may find this a little difficult to understand and it needs quite a bit of practice so you may want to view this video a couple of times to understand this flow because the way tensorflow programming is done is slightly different from normal programming some of you who have done maybe spark programming to some extent will be able to easily understand this but even in spark the code itself is pretty straightforward and behind the scenes the execution happens slightly differently whereas in tensorflow even the code has to be written in a completely different way so the code doesn't get executed in the same way as you have written it that's something you need to understand and a little bit of practice is needed for this so far what we have done up to here is creating the variables or rather not feeding but setting up the variables and the graph defining what kind of a network you want to use for example we want to use a softmax and so on so you have created the variables loaded the data viewed the data and prepared everything but you have not yet executed anything in tensorflow now the next step is the execution in tensorflow so the first step for doing any execution in tensorflow is to initialize the variables so anytime you have any variables defined in your code you have to run this piece of code always so you need to basically create what is known as a node for initializing so this is a node you are still not executing anything here you just created a node for the
initialization so let us go ahead and create that and here onwards is where you will actually execute your code in tensorflow and in order to execute the code what you will need is a session a tensorflow session so tf.session will give you a session and there are a couple of different ways in which you can do this but one of the most common methods is what is known as a with block so you have with tf dot session as sess and the colon here and this is like the start of the block and these indentations tell how far this block goes and the session is valid till this block gets executed so that is the purpose of creating this with block so with tf dot session as sess you say sess dot run init now sess dot run will execute a node that is specified here sess is basically an instance of the session so here we are saying tf.session so an instance of the session gets created and we call it sess and then we run one of the nodes in the graph so one of the nodes here is init so we say run that particular node and that is when the initialization of the variables happens now if you have any variables in your code in our case w is a variable and b is a variable any variables that we created you have to run this initialization otherwise you will get an error okay so that's what this is doing then within this with block we specify a for loop and we are saying we want the system to iterate for a thousand steps and perform the training that's what this for loop does run training for thousand iterations and what it is doing basically is it is fetching the data or these images remember there are about 55 000 images but it cannot get all the images in one shot because it will take up a lot of memory and there will be performance issues so this is a very common way of performing deep learning training you always do it in batches so we have maybe 55 000 images but you always do it in batches of 100 or maybe 500 depending on the size of your system and so on so in this case we are saying okay get me 100 images at a time and get me only the training images remember we use only the training data for training purpose and then we use test data for test purpose you must be familiar with this from machine learning but in case you are not this is not specific to deep learning in machine learning in general you have what is known as a training data set and a test data set you typically split your available data into two parts and use the training data set for training purpose and then to see how well the model has been trained you use the test data set to check or test the validity or the accuracy of the model so that's what we are doing here and you observe that we are actually calling an mnist function here mnist train dot next batch this is the advantage of using the mnist database because they have provided some very nice helper functions which are readily available otherwise we would have had to write a piece of code to fetch this data in batches which is itself a lengthy exercise so we can avoid all that if we are using the mnist database and that's why we use it for the initial learning phase okay so when we say fetch what it will do is it will fetch the images into x and the labels into y
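pulling those pieces together, a minimal sketch of the initialization the with block and the batched training loop, reusing the names from the earlier sketch (x, y_true, train, mnist)

```python
# node that initializes all the variables (w and b in our case)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    # run the init node first, otherwise the variables will give an error
    sess.run(init)

    # run training for a thousand iterations, 100 training images per batch
    for step in range(1000):
        batch_x, batch_y = mnist.train.next_batch(100)
        sess.run(train, feed_dict={x: batch_x, y_true: batch_y})
```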
and then you use this batch of 100 images and you run the training so sess dot run basically what we are doing here is we are running the training mechanism which passes the images through the neural network finds out what is the output and obviously initially it will be wrong so all that feedback is given back to the neural network and thereby all the w's and b's get updated till it reaches 1000 iterations in this case the exit criterion is a thousand iterations but you could also specify an accuracy rate or something like that as an exit criterion so here it just says that okay this particular image was wrongly predicted so you need to update your weights and biases that's the feedback given to each neuron and that is run for 1000 iterations and typically by the end of these thousand iterations the model would have learned to recognize these handwritten images obviously it will not be 100 percent accurate okay so once that is done after 1000 iterations you then test the accuracy of the model by using the test data set so this is what we are trying to do here the code may appear a little complicated if you're seeing this for the first time because you need to understand the various methods of tensorflow and so on but it is basically comparing the output with what is actually there that's all it is doing so you have your test data and you're trying to find out what is the actual value and what is the predicted value and seeing whether they are equal or not tf dot equal and how many of them are correct and so on and based on that the accuracy is calculated as well so this is the accuracy and that is what we are trying to see how accurate the model is in predicting these numbers or digits okay so let us run this this entire thing is in one cell so we will have to just run it in one shot it may take a little while let us see and not bad so it has finished the thousand iterations and what we see here as an output is the accuracy so we see that the accuracy of this model is around 91 percent which is pretty good for such a short exercise within such a short time we got over 90 percent accuracy however in real life this is probably not sufficient so there are other ways to increase the accuracy we will see probably in some of the later tutorials how to improve this accuracy how to change maybe the hyper parameters like the number of neurons or the number of layers and so on so that this accuracy can be increased beyond 90 percent
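the evaluation it describes would look roughly like this, as a continuation that still sits inside the same with tf.session block as the training sketch above

```python
    # still inside the with tf.Session() block from the training sketch above:
    # compare the predicted digit against the actual one hot label
    correct = tf.equal(tf.argmax(y, 1), tf.argmax(y_true, 1))
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

    # evaluate on the test data set, not the data used for training
    print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                        y_true: mnist.test.labels}))
```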
let's look at the different types of artificial neural networks and this is like the biggest growing area how these all come together let's see the different types of neural network and again we're comparing this to human learning so here's a human brain i feel sorry for that poor guy so we have a feed forward neural network the simplest form of what they call an ann an artificial neural network data travels only in one direction input to output this is what we just looked at so as the data comes in all the weights are added it goes to the hidden layer all the weights are added it goes to the next hidden layer all the weights are added and it goes to the output the only time you use the reverse propagation is to train it so when you actually use it it's very fast when you're training it it takes a while because it has to iterate through all your training data and you start getting into big data because you can train these with a huge amount of data the more data you put in the better trained they get the applications vision and speech recognition actually pretty much everything we talked about almost all of them use this form of neural network at some level radial basis function neural network this model classifies the data point based on its distance from a center point what that means is that you might not have training data so you want to group things together and you create central points and it looks for all the things where some of these things are just like the others if you've ever watched sesame street as a kid that dates me so it brings things together and this is a great way if you don't have the right training model you can start finding things that are connected that you might not have noticed before applications power restoration systems they try to figure out what's connected and then based on that they can fix the problem if you have a huge power system kohonen self-organizing neural network vectors of random dimensions are input to a discrete map comprised of neurons so they basically find a way to separate the data they call them dimensions or vectors or planes because they actually chop the data in one dimension two dimensions three four five six they keep adding dimensions and finding ways to separate the data and connect different data pieces together applications used to recognize patterns in data like in medical analysis recurrent neural networks the hidden layer saves its output to be used for future prediction so the hidden layer remembers its output from last time and that becomes part of its new input you might use that especially in robotics or flying a drone you want to know what your last change was and how fast it was going to help predict what your next change needs to be to get to where the drone wants to go applications text-to-speech conversion models so i talked about drones but also just identifying on alexa or google assistant or any of these they're starting to add in i'd like to play a song on my pandora and i'd like it to be at volume 90 so you can now add different things in there and it connects them together convolutional neural network the input features are taken in batches like a filter this allows the network to remember an image in parts today's world in photo identification and taking apart photos you've ever seen that on google where you have five people together this is the kind of thing that separates all those people so that it can do a face recognition on each person applications used in signal and image processing in this case i use facial images or google picture images as one of the options modular neural network it has a collection of different neural networks working together to get the output so wow we just went through all these different types of neural networks and the final one is to put multiple neural networks together i mentioned that a little bit when we separated people in a larger photo into individuals and then do the facial recognition on each person so one network is used to separate them and the next network is then used to figure out who they are and do the facial recognition applications still undergoing research this is cutting edge you hear the term pipeline and in python code and in almost all the different neural network setups out there they now have a pipeline feature usually and it just means you take the data from one
neural network and maybe another neural network or you put it into the next neural network and then you take three or four other neural networks and feed them into another one so how we connect the neural networks is really just cutting edge and it's so experimental it's almost creative in its nature there's not really a science to it because each specific domain has different things it's looking at so if you're in the banking domain it's going to be different than the medical domain than the automatic car domain and figuring out how those all fit together is just a lot of fun and really cool so we have our types of artificial neural network we have our feed forward neural network our radial basis function neural network our kohonen self-organizing neural network recurrent neural network convolutional neural network and modular neural network where it brings them all together and no the colors on the brain do not match what your brain actually does but they do bring out that most of these were developed by understanding how humans learn and as we understand more and more of how humans learn we can build something in the computer industry to mimic that to reflect that and that's how these were developed so exciting part use case problem statement so this is where we jump in this is my favorite part let's use the system to identify between a cat and a dog if you remember correctly i said we're going to do some python code and you can see over here my hair is kind of sticking up over the computer cup of coffee on one side and a little bit of old school a pencil and a pen on the other side yeah most people now take notes i love the stickies on the computer that is my computer i have sticky notes on my computer in different colors so not too far from today's programmer so the problem is we want to classify photos of cats and dogs using a neural network and you can see over here we have quite a variety of dogs and cats in the pictures and just sorting out that it is a cat is pretty amazing and why would anybody want to even know the difference between a cat and a dog well i have a cat door and it'd be kind of fun if instead of having a little collar with a magnet on it which is what my cat has the door would be able to see oh that's our cat coming in oh that's the dog we have a dog too that's the dog i want to let in maybe i don't want to let this other animal in because it's a raccoon so you can see where you could take this one step further and actually apply this you could actually start a little startup company idea a self-identifying door so this use case will be implemented in python i am actually in python 3.6 it's always nice to tell people the version of python because that does sometimes affect which modules you load and everything and we're going to start by importing the required packages i told you we're going to do this in keras so we're going to import from the keras models a sequential and from the keras layers convolution 2d or conv2d max pooling 2d flatten and dense and we'll talk about what each one of these does in just a second but before we do that let's talk a little bit about the environment we're going to work in and in fact let me go ahead and open the keras website so we can learn a little bit more about keras so here we are on the keras website and it's k-e-r-a-s dot io that's the official website for keras and the first thing you'll
notice is that keras runs on top of either tensorflow cntk or theano what's important here is that tensorflow and the same is true for all of these but tensorflow is probably one of the most widely used packages out there with keras currently and of course tomorrow this is all going to change it's all going to disappear and they'll have something new out there so make sure when you're learning this code that you understand what's going on and also know the code when you look at the code it's not as complicated once you understand what's going on the code itself is pretty straightforward and the reason we like keras and the reason people jumping on it right now is such a big deal is if we come down here let me just scroll down a little bit they talk about user friendliness modularity easy extensibility and working with python python is a big one because a lot of people in data science now use python although you can actually access keras other ways if we continue down here is layers and this is where it gets really cool when we're working with keras you just add layers on remember those hidden layers we were talking about and we talked about the relu activation you can see right here let me just up that a little bit in size there we go that's big i can add in a relu layer and then i can add in a softmax layer in the next instance we didn't talk about softmax so you can do each layer separately now if i'm working in some of the other kits i use i take that and i have one set up and then i feed the output into the next one with this one i can just add hidden layer after hidden layer with different information in it which makes it very powerful and very fast to spin up and try different setups and see how they work with the data you're working on and we'll dig a little bit deeper in here and a lot of this is very much the same so when we get to that part i'll point that out to you also now just a quick side note i'm using anaconda with python in it and i went ahead and created my own environment and i called it keras python36 because i'm in python 3.6 anaconda is cool that way you can create different environments real easily if you're doing a lot of experimenting with these different packages you probably want to create your own environment in there and the first thing you can see right here is there's a lot of dependencies a lot of these you should recognize by now if you've done any of these videos if not kudos to you for jumping in today pip install numpy scipy scikit-learn pillow and h5py these are needed for tensorflow and then for putting keras on there and pip is just a standard installer that you use with python you'll see here that we did pip install tensorflow since we're going to do keras on top of tensorflow and then pip install and i went ahead and used the github so git plus git and you'll see here github.com this is one of their releases the most current release on there that goes on top of tensorflow and you can look up these instructions pretty much anywhere this is for doing it on anaconda certainly you'd want to install these if you're doing it on an ubuntu server setup i don't think you need the h5py on ubuntu but you do need the rest in there because they are dependencies and it's pretty straightforward and that's actually in some of the instructions they have on their website so you don't have to initially go through this just remember their website
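to tie that back to the layer stacking described on keras.io, a minimal sketch of adding a relu layer and then a softmax layer one after the other, where the layer sizes and input dimension are just assumed placeholder values

```python
from keras.models import Sequential
from keras.layers import Dense

# stack layers one after another: a relu hidden layer then a softmax output
model = Sequential()
model.add(Dense(units=64, activation='relu', input_dim=100))
model.add(Dense(units=10, activation='softmax'))
```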
and then when i'm under my anaconda navigator which i like you'll see where i have environments and on the bottom i created a new environment and i called it keras python 36 just to separate everything you can see i have python 3.5 and python 3.6 i used to have a bunch of other ones but i kind of cleaned house recently and of course once i go in here i can launch my jupyter notebook making sure i'm using the right environment that i just set up this of course opens up in my case google chrome and in here i can go and just create a new document and this is all in your browser window when you use anaconda do you have to use anaconda and jupyter notebook no you can use any kind of python editor whatever setup you're comfortable with so let's go ahead and paste the code in and we're importing a number of different things here we have import sequential that's under the models because that's the model we're going to use as far as our neural network and then we have the layers convolution 2d max pooling 2d flatten and dense and you can actually just kind of guess at what these do we're working with a 2d photograph and if you remember correctly i talked about how the actual input layer is a single array it's not in two dimensions it's one dimension all these do is these are tools to help flatten the image so it takes a two-dimensional image and then it creates its own proper setup you don't have to worry about any of that you don't have to do anything special with the photograph you let keras do it we're going to run this and you'll see right here they have some stuff that is going to be deprecated and changed because that's what it does everything's being changed as we go you don't have to worry about that too much if you have warnings if you run it a second time the warning will disappear and this has just imported these packages for us to use jupyter is nice about this in that you can do each thing step by step and i'll go ahead and also zoom in there a little control plus that's one of the nice things about being in a browser environment so here we are back another sip of coffee if you're familiar with my other videos you'll notice i'm always sipping coffee i always have in my case a latte or espresso next to me so the next step is to go ahead and initialize what we're going to call the cnn our convolutional neural network and the reason we call the object a classifier is because it's going to classify between two things it's going to be cat or dog so when you're doing classification you're picking specific objects it's a true or false yes no it is something or it's not so the first thing we're going to do is create our classifier and it's going to equal sequential so the sequential setup is the classifier that's the actual model we're using that's the neural network so we call it a classifier and the next step is to add in our convolution and let me just shrink that down in size so you can see the whole line and let's talk a little bit about what's going on here i have my classifier and i add something what am i adding well i'm adding my first layer this first layer we're adding in is probably the one that takes the most work to make sure you have it set correctly and the reason i say that is this is your actual input and we're going to jump here to the part that says input shape equals 64 by 64 by 3.
what does that mean well that means our picture is coming in and remember we had like the picture of the car that was 128 by 128 pixels well this one is 64 by 64 pixels and each pixel has three values that's where these numbers come from and it is so important that this matches i mentioned a little bit that if you have a larger picture you have to reformat it to fit this shape if it comes in as something larger there are no input nodes there's no input neural network there that will handle that extra space so you have to reshape your data to fit in here now the first layer is the most important because after that keras knows what shape is coming in and it knows what's coming out and so that really sets the stage the most important thing is that the input shape matches your data coming in and you'll get a lot of errors if it doesn't you'll go through there and picture number 55 doesn't match it correctly and guess what it usually gives you an error and then the activation if you remember we talked about the different activations on here we're using the relu model like i said that is the most commonly used now because one it's fast it doesn't have the added calculations in it it just says here's the value coming out based on the weights and the value going in and from there if it's over zero it's good and if it's under zero then it's considered not active and then we have this conv2d what the heck is conv2d i'm not going to go into too much detail on this because it has a couple of things it's doing in here that are a little more in depth than we're ready to cover in this tutorial but this is used to work on the photo because we have 64 by 64 by three and it keeps a two dimensional kind of setup so it's very aware that this is a photograph and that different pieces are next to each other and then we're going to add in a second convolutional layer that's what the conv 2d stands for so these are hidden layers so we have our input layer and our two hidden layers and they are two-dimensional because we're dealing with a two-dimensional photograph and you'll see down here that on the last one we add a max pooling 2d and we put a pool size equals 2 2.
and so what this is is that as you get to the end of these layers one of the things you always want to think of is what they call mapping and then reducing wonderful terminology from the big data world we're mapping this data through all these layers and now we want to reduce it in this case the photograph is two-dimensional but we actually have 64 by 64 by three so now we're getting it down to a two by two pooling just the two dimensions instead of having the third dimension of colors and we'll go ahead and run these we're not really seeing anything in our run script because we're just setting up this is all setup and this is where you start playing because maybe you'll add a different layer in here to do something else to see how it works and see what your output is that's what makes keras so nice is that with just a couple flips of code i can put in a whole new layer that does a whole new processing and see whether that improves my run or makes it worse and finally we're going to do the final setup which is to flatten the classifier add a flatten step and then we're going to add a dense layer and then another dense layer and then we're going to build it we're going to compile this whole thing together so let's flip over and see what that looks like and we've even numbered them for you so we're going to do the flattening and flatten is exactly what it sounds like we've been working on a two-dimensional array of picture which actually is in three dimensions because the pixels have a whole other dimension to them of three different values and we've kind of resized those down to two by two but now we're just going to flatten it i don't want to have multiple dimensions being worked on by tensorflow and by keras i want just a single array so it's flattened out and then step four full connection so we add in our final two layers and you could actually do all kinds of things with this you could leave out some of these layers and play with them you do need to flatten it that's very important then we want to use the dense again we're taking whatever came into it so once we take the two dimensions or three dimensions as they are and flatten them to one dimension we want to take that and pull it into units of 128 you might ask where did they get 128 from you could actually play with that number and get all kinds of weird results but in this case we took 64 plus 64 is 128 you could probably even do this with 64 or 32.
usually you want to keep it in the same multiple of whatever data shape you're already using and we're using the relu activation just like we did before and then we finally filter all that into a single output and it has how many units one why because we want to know whether it's true or false it's either a dog or a cat you could say one is dog zero is cat or maybe you're a cat lover and it's one is cat and zero is dog if you love both dogs and cats you're going to have to choose and then we use the sigmoid activation if you remember from before we had the relu and there's also the sigmoid the sigmoid just makes it clear it's yes or no we don't want any kind of in-between number coming out and we'll go ahead and run this and you'll see it's still all in setup and then finally we want to go ahead and compile compiling our classifier neural network and we're going to use the optimizer adam and i hinted at this just a little bit before where does adam come in where does an optimizer come in well the optimizer is the reverse propagation when we're training it it goes all the way through and says error and then how does it readjust those weights there are a number of them adam is the most commonly used and it works best on large data most people stick with adam because when they're testing on smaller data to see if their model is going to go through and get all their errors out before they run it on larger data sets they're going to run it on adam anyway so they just leave it on adam most commonly used but there are some other ones out there you should be aware of that you might try if you're stuck in a bind or that you might explore in the future but usually adam is just fine on there and then you have two more settings you have loss and metrics we're not going to dig too much into loss or metrics these are things you really have to explore in keras because there are so many choices this is how it computes the error there are so many different ways for your back propagation and your training so we're using the adam optimizer but you can compute the error by standard deviation or standard deviation squared here they use binary cross entropy i'd have to look that up to even know exactly what that is there are so many of these a lot of times you just start with the ones that look correct that are most commonly used and then you have to go read the keras site and actually see what different losses and metrics and options they have so we're not going to get too much into them other than to reference you over to the keras website to explore them deeper but we are going to go ahead and run them and now we've set up our classifier so we have an object classifier and if you go back up here you'll see that in step one we added in our layer for the input a layer that comes in there and uses the relu for activation and then it pools the data so even though these are two layers the actual neural network layer is up here and then it uses this to pool the data into a two by two so into a two dimensional array from a three dimensional array with the colors then we flatten it so there's our flatten and then we add another what they call dense layer this dense layer goes in there and downsizes it to 128 it reduces it so you can look at this as we're mapping all this data down to the two-dimensional setup then we flatten it so we map it to a flattened map and then we take it and reduce it down to 128 and we use the relu again and then finally we reduce that down to just a single output and we use the sigmoid to do that to figure out whether it's yes no true false in this case cat or dog and then finally once we put all these layers together we compile them that's what we've done here and we've compiled them as far as how it trains to use these settings for the training back propagation
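putting steps one through four together, a rough sketch of the whole build and compile, where the 32 filters and 3 by 3 kernel for the convolution layers are assumed values since they aren't stated here

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

classifier = Sequential()

# step 1: convolution layer, input shape must match the 64 x 64 x 3 images
# (32 filters and a 3 x 3 kernel are assumed, not stated in the walkthrough)
classifier.add(Conv2D(32, (3, 3), input_shape=(64, 64, 3), activation='relu'))

# step 2: a second convolution layer, then max pooling with a 2 x 2 pool size
classifier.add(Conv2D(32, (3, 3), activation='relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2)))

# step 3: flatten everything down to a single one dimensional array
classifier.add(Flatten())

# step 4: full connection - a dense layer of 128 units, then one sigmoid output
classifier.add(Dense(units=128, activation='relu'))
classifier.add(Dense(units=1, activation='sigmoid'))

# compile with the adam optimizer and binary cross entropy loss
classifier.compile(optimizer='adam', loss='binary_crossentropy',
                   metrics=['accuracy'])
```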
so if you remember we talked about training our setup and when we go into this you'll see that we have two data sets one called the training set and one called the testing set and that's pretty common in any data processing you need a certain amount of data to train it and then you've got to know whether it works or not is it any good and that's why you have a separate set of data for testing where you already know the answer but you don't want to use it as part of the training set so in here we jump into part two fitting the classifier neural network to the images and then from keras let me just zoom in there i always love that about working with jupyter notebooks you can really see it we come in here to the keras preprocessing image module and we import image data generator so nice of course it's such a high-end product right now and since images are so common they already have all this stuff to help us process the data which is great and so we come in here we do train data gen and we're going to create our object for helping us train it for reshaping the data so that it's going to work with our setup and we use an image data generator and we're going to rescale it and you'll see here we have the one point which tells us it's a float value on the rescale over 255 where does 255 come from well that's the scale of the colors in the pictures we're using their values run from 0 to 255 so we want to divide by 255 and it'll generate a number between 0 and 1 they have shear range and zoom range and horizontal flip equals true and this of course has to do with the photos being different shapes and sizes like i said it's a wonderful package you really need to dig in deep to see all the different options you have for setting up your images for right now though we're going to stick with some basic stuff here and let me go ahead and run this code and again it doesn't really do anything because we're still setting up the pre-processing let's take a look at this next set of code and this one is just huge we're creating the training set so the training set is going to go in here and it's going to use the train data gen we just created dot flow from directory it's going to access in this case the path dataset training set that's a folder so it's going to pull all the images out of that folder now i'm actually running this in the folder that the data set is in so if you're doing the same setup and you load your data in there make sure wherever your jupyter notebook is saving things that you create this path or you can do the complete path if you need to you know c colon slash etc and then the target size the batch size and class mode is binary so for the classes we're switching everything to a binary value batch size what the heck is batch size well that's how many pictures we're going to batch through the training each time and the target size 64 by 64.
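a rough sketch of that preprocessing step, with the folder path augmentation ranges and batch size filled in with assumed values since they aren't spelled out here

```python
from keras.preprocessing.image import ImageDataGenerator

# rescale pixel values from 0-255 down to 0-1, plus some light augmentation
train_datagen = ImageDataGenerator(rescale=1./255,
                                   shear_range=0.2,     # assumed value
                                   zoom_range=0.2,      # assumed value
                                   horizontal_flip=True)

# pull the training images straight out of the folder on disk
training_set = train_datagen.flow_from_directory('dataset/training_set',
                                                 target_size=(64, 64),
                                                 batch_size=32,  # assumed
                                                 class_mode='binary')
```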
a little confusing but you can see right here that this is just a general training setup and you can go in there and look at all the different settings for your training set and of course with different data we're doing pictures there are all kinds of different settings depending on what you're working with let's go ahead and run that and see what happens and you'll see that it found 8 000 images belonging to 2 classes so we have 8 000 images in the training set and if we're going to do this with the training set we also have to format the pictures in the test set now we're not actually doing any predictions we're not actually running the model yet all we're doing is preparing the data so we're going to prepare the training set and the test set so any changes we make to the training set at this point also have to be made to the test set so we've done our train data generator we've done our training set and then remember we also have our test set of data so i'm going to do the same thing with that i'm going to create a test data gen and we're going to do this image data generator with just rescale 1 over 255 we don't need the other settings just the single setting for the test data gen and we're going to create our test set we're going to do the same thing we did with the training set except that we're pulling it from the test set folder and we'll run that and you'll see in our test set we found 2 000 images that's about right we're using 20 percent of the images as test and 80 percent to train it and then finally we've set up all our data we've set up all our layers which is where all the work is cleaning up that data and making sure it's going in there correctly and we're actually going to fit it we're going to train our data set and let's see what that looks like and here we go let's put the information in here and let's just take a quick look at what we're looking at with our fit generator we have our classifier dot fit generator that's our back propagation so the information goes through forward with the picture and it says yup you're either right or you're wrong and then the error goes backward and reprograms all those weights so we're training our neural network and of course we're using the training set remember we created the training set up here and then we have steps per epoch so that's 8 000 steps and the epochs are how many times we go through all the pictures so we're going to go through the whole data set 25 times with 8 000 steps in each epoch so we're really programming the heck out of this and going back over it and then they have validation data equals test set so we have our training set and then we have our test set to validate it so we're gonna do this all in one shot and they're gonna do 2 000 steps for each validation and we'll see what that looks like in just a minute let's go ahead and run our training here and we're going to fit our data and as it goes it says epoch one of 25 and you start realizing that this is going to take a while on my older computer it takes about 45 minutes i have a dual processor and we're processing 10 000 photos that's not a small amount of photographs to process so if you're on your laptop which i am it's going to take a while so let's go ahead and go get our cup of coffee and a sip and come back and see what this looks like
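for reference, a sketch of the test set preparation and that fit call using the steps per epoch epochs and validation steps just described, with the folder path and batch size assumed

```python
# the test images only get the rescale, no augmentation
test_datagen = ImageDataGenerator(rescale=1./255)
test_set = test_datagen.flow_from_directory('dataset/test_set',
                                            target_size=(64, 64),
                                            batch_size=32,  # assumed
                                            class_mode='binary')

# forward pass plus back propagation, validated against the test set
classifier.fit_generator(training_set,
                         steps_per_epoch=8000,
                         epochs=25,
                         validation_data=test_set,
                         validation_steps=2000)
```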
so i'm back you didn't know i was gone that was actually a lengthy pause there i made a couple of changes so let's discuss those changes real quick and why i made them the first thing i'm going to do is go up here and insert a cell above and paste the original code back in there and you'll see that the original setting was steps per epoch 8 000 25 epochs and validation steps 2 000 and i changed these to 4 000 steps per epoch 10 epochs and just 10 validation steps and this will cause problems if you're doing this as a commercial release but for demo purposes this should work and if you remember our steps per epoch that's how many photos we're going to process in fact let me go ahead and get my drawing pen out and let's just highlight that right here well we have 8 000 pictures we're going through so for each epoch i'm going to change this to four thousand and cut that in half so it's going to randomly pick 4 000 pictures each time it goes through an epoch and the epochs are how many passes so this is 25 and i'm just going to cut that to 10 so instead of doing 25 runs through 8 000 photos each which you can do the math of 25 times 8 000 i'm only going to do 10 through 4 000 so i'm going to run this 40 000 times through the process and the next thing you'll want to notice is that i also changed the validation steps and this would cause some major problems for a release because i dropped it all the way down to 10 what the validation steps setting does is it says we have 2 000 photos in our testing set and we're going to use that for validation well i'm only going to use a random 10 of those to validate so not really the best settings but let me show you why we did that let's scroll down here just a little bit and look at the output and see what's going on there so i've got my drawing tool back on and you'll see here it lists a run so each time it goes through an epoch it's going to do 4 000 steps and this is where the 4 000 comes in so that's where we have epoch one of ten four thousand steps which is randomly picking half the pictures in the file and going through them and then we're going to look at this number right here that is for the whole epoch and that's 2411 seconds and if you remember correctly you divide that by 60 you get minutes and if you divide that by 60 again you get hours or you can just divide the whole thing by 60 times 60 which is 3600 if 3600 is an hour this is roughly 45 minutes right here and that's 45 minutes to process half the pictures so if i was doing all the pictures we're talking an hour and a half per epoch times 25 as they had up above
so that's roughly a couple of days of processing well for this demo we don't want to do that i don't want to come back the next day plus my computer did a reboot in the middle of the night so we look at this and say okay we're just testing this out my computer that i'm running this on is a dual core processor running at point nine gigahertz for a laptop it was good about four years ago but for running something like this it's probably a little slow so we cut the times down and the last change was validation we're only validating on a random 10 photos and this comes into effect because you're going to see down here where we have accuracy and loss and value accuracy and value loss those are very important numbers to look at so the 10 means i'm only validating across 10 pictures so here we have acc which is for accuracy and value loss which we're not going to worry about too much and accuracy while it's running is putting these numbers together per step and value accuracy is at the end of the epoch what's our accuracy in this tutorial we're not going to go so deep but these numbers are really important when you start talking about how these two numbers reflect bias that is really important let me just put that up there and bias is a little bit beyond this tutorial but the short of it is if this accuracy which is our per step number is going down and the value accuracy continues to go up that means there's a bias that means i'm memorizing the photos i'm looking at i'm not actually looking for what makes a dog a dog and what makes a cat a cat i'm just memorizing them and so the more this discrepancy grows the bigger the bias is and that is really the beauty of the keras neural network there are a lot of built-in features like this that make that really easy to track so let's go ahead and take a look at the next set of code so here we are in part three we're going to make a new prediction and so we're going to bring in a couple of tools for that and then we have to process the image coming in and find out whether it's an actual dog or cat we can actually use this to identify it and of course the final step of part three is to print the prediction we'll go ahead and combine these and of course you can see me there adding more sticky notes to my computer screen hidden behind the screen and the last one was don't forget to feed the cat and the dog so let's go take a look at what that looks like in code and put it in our jupyter notebook all right let's paste that in here and we'll start by importing numpy as np numpy is a very common package i pretty much import it on any python project i'm working on another one i use regularly is pandas they're just ways of organizing the data and np is usually the standard in most machine learning tools as the format for the data array although you can use the standard data array from python and we have from keras preprocessing import image which should all look familiar because we're going to take a test image and we're going to set that equal to in this case cat or dog one as you can see over here and let me get my drawing tool back on so let's take a look at this we have our test image we're loading and in here we have test image one and the data hasn't seen this one at all so this is all new let me shrink the screen down and start that over so here we have my test image and we went ahead and the keras preprocessing has this
nice image setup so we're going to load the image and resize it to 64 by 64 so right off the bat keras is nice that way it automatically sets it up for us so we don't have to redo all our images and find a way to resize them ourselves and then we also use image to array to set the image to an array so again we're pre-processing the data just like we pre-processed our test information and our training data before and then we use numpy here's our numpy from right up here where we imported numpy as np expand dims test image axis equals zero so it puts it into a single batch array and then finally after all that pre-processing all we do is run the result we click on here we go result equals classifier predict test image and then we find out well what is the test image let's take a quick look and see and you can see when i ran it it comes up dog and if we look at those images there it is cat or dog image number one that looks like a nice floppy eared lab friendly with his tongue hanging out it's either that or a very floppy eared cat i'm not sure which but our software says it's a dog and we have a second picture over here let's just see what happens when we run the second picture we can go up here and change this from image one to two we'll run that and it comes down here and says cat you can see me highlighting it down there as cat so our process works we're able to label a dog a dog and a cat a cat just from the pictures there we go i cleared my drawing tool and the last thing i want you to notice when we come back up here to where i ran it you'll see it has an accuracy of 1 and a value accuracy of 1 well the value accuracy is the important one because the value accuracy is what it actually runs on the test data remember i'm only validating it on a random 10 photos and those 10 photos just happened to come up as one now when they ran this on the server it actually came up at about 86 percent this is why cutting these numbers down so far for a commercial release is bad so you want to make sure you're a little careful of that when you're testing your stuff that you change these numbers back when you run it on a more enterprise computer other than the old laptop you're just practicing on or messing with and we come down here and again we had the validation of cat and so we have successfully built a neural network that can distinguish between photos of a cat and a dog imagine all the other things you could distinguish imagine all the different industries you could dive into just by being able to understand the difference between two kinds of pictures what about mosquitoes could you find the mosquitoes that bite versus mosquitoes that are friendly it turns out the mosquitoes that bite us are only four percent of the mosquito population if even that maybe two percent there's all kinds of industries that use this and there are so many industries that are just now realizing how powerful these tools are just in the photos alone there is a myriad of industries sprouting up and i said it before i'll say it again what an exciting time to live in with these tools that we get to play with
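a sketch of that prediction step, with an assumed file path since only cat or dog one is mentioned, and with the usual caveat that in practice you would check training_set.class_indices to see which class maps to 1

```python
import numpy as np
from keras.preprocessing import image

# load one new image and resize it to the 64 x 64 shape the network expects
test_image = image.load_img('dataset/single_prediction/cat_or_dog_1.jpg',  # assumed path
                            target_size=(64, 64))
test_image = image.img_to_array(test_image)
test_image = np.expand_dims(test_image, axis=0)  # add the batch dimension

result = classifier.predict(test_image)
print('dog' if result[0][0] == 1 else 'cat')
```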
welcome to the tensorflow 2.0 tutorial what's in it for you we're going to cover deep learning frameworks what is tensorflow features of tensorflow tensorflow applications how tensorflow works tensorflow 1.0 versus 2.0 tensorflow 2.0 architecture and then we'll go over a tensorflow demo where we roll up our sleeves and dive right into the code so let's start with deep learning frameworks to start with this chart doesn't even do the field justice because it's just exploded these are just some of the major frameworks out there there's keras which happens to sit on tensorflow so they're very integrated there's tensorflow pytorch is out there caffe theano dl4j and chainer these are just a few of the deep learning frameworks we're talking about neural networks if you're just starting out and have never seen a neural network you can go into python's scikit-learn and do a neural network in there which is probably the simplest version i know but the most robust version out there the most top of the ladder as far as the technology right now is tensorflow and that of course is changing from day to day and some of these are better for different purposes so let's dive into tensorflow let's see what is tensorflow tensorflow is a popular open source library released in 2015 by the google brain team for building machine learning and deep learning models it is based on the python programming language and performs numerical computations using data flow graphs to build models so let's take a look at some of the features of tensorflow it works efficiently with multi-dimensional arrays if you've ever played with any of the simpler neural network packages you're going to find that you have to pretty much flatten everything and make sure your data is set in a flat model tensorflow works really well here so we're talking pictures where you have x and y coordinates for where things are in the picture and then each pixel has three or four different channels that's a very complicated very multi-dimensional array it provides scalability of computation across machines and large data sets this is so new right now and you might think that's a minor thing but when python is operating on one computer and it has a float value and it truncates it differently on each computer you don't get the same results and so your training model might work on one machine and then on another it doesn't this is one of the things that tensorflow addresses and does a very good job on it supports fast debugging and model building this is why i love tensorflow i can go in there and build a model with different layers each layer might have different properties they have things like the convolutional neural network which you can then sit on top of a regular neural network with reverse propagation there are a lot of tools in here and a lot of options and each layer that it goes through can utilize those different options and stack differently and it has a large community and provides tensorboard to visualize the model tensorboard is pretty recent but it's a really nice tool to have so when you're working with other people or showing your clients or the shareholders in the company you can give them a nice visual model so they know what's going on and what they are paying for and let's take a glance at some of the different uses or applications for tensorflow when we talk about tensorflow applications clearly this is data analytics we're getting into data science i like to use data science as probably a better term this is the programming side and really the sky is the limit we can look at face detection language translation fraud detection video detection there are so many different things out there that tensorflow can be used for when you think of neural networks because tensorflow is a neural network library think
of complicated chaotic data this is very different than if you have set numbers like when you're looking at the stock market you can use this on the stock market but if you're doing something where the numbers are very clear and not as chaotic as what you have in a picture then you're talking more about linear regression models and different regression models when you're talking about these really complicated data patterns then you're talking neural networks and tensorflow and if we're going to talk about tensorflow we should talk about what tensors are after all that is what this is named after so we talk about tensors in tensorflow tensorflow is derived from its core component known as a tensor a tensor is a vector or a matrix of n dimensions that represents all types of data and you can see here we have the scalar which is just a single number you have your vector which might be a number and a direction you have a simple matrix and then we get into the tensor i mentioned how a picture is a very complicated tensor because it has your x y coordinates and then each one of those pixels has three to four channels for your different colors and so each image coming in would be its own tensor and in tensorflow tensors are defined by a unit of dimensionality called rank and you can see here we have our scalar which is a single number that has a rank of zero because it has no real dimensions to it other than being a single point and then you have your vector which would be a single list of numbers so that's rank one a matrix would have rank two and then as you can see right here as we get into the full tensor it has a rank of three and so the next step is to understand how tensorflow works and if you haven't looked at the basics of a neural network and reverse propagation that is the basis of tensorflow and then it goes through a lot of different options and properties that you can build into your different tensors so tensorflow performs computations with the help of data flow graphs it has nodes that represent the operations in your model and if you look at this you should see a neural network going on here we have our inputs b c and d and you might have x equals b plus c y equals d minus 4 a equals x times y and then you have an output and so even though this isn't a neural network it's just a simple set of computations going across you can see how the more complicated it gets the more you can actually do one of the things you can build this way is a neural network with reverse propagation but it's not limited to that there's so much more you can do with it and this here is just a basic flow of computations of the data going across and you can see we can plug in the numbers b equals four c equals three d equals six and you get x equals four plus three so x equals seven y equals six minus four so y equals two and finally a equals seven times two or a equals fourteen like i said this is a very simplified version of how tensorflow works each one of these layers can get very complicated but tensorflow does such a nice job that you can spin different setups up very easily and test them out so you can test out these different models to see how they work
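a quick sketch of that same little data flow in tensorflow 2.x style code where eager execution runs it immediately, plus the rank idea just described

```python
import tensorflow as tf  # tensorflow 2.x, eager execution by default

b = tf.constant(4.0)
c = tf.constant(3.0)
d = tf.constant(6.0)

x = b + c   # x = 7
y = d - 4   # y = 2
a = x * y   # a = 14

print(a.numpy())  # 14.0

# rank: a scalar is rank 0, a vector rank 1, a matrix rank 2, and so on
print(tf.rank(tf.constant([[1, 2], [3, 4]])).numpy())  # 2
```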
now tensorflow has gone through two major stages there was the original tensorflow 1.0 release and then the 2.0 version which addressed a lot of things that 1.0 really needed so when we talk about tensorflow 1.0 versus 2.0 the old style is mostly something you'd need for a legacy programming job where you're pulling apart somebody else's code the first big change is that tensorflow 2.0 supports eager execution by default which lets you build your models and run them instantly you can see here that going from tensorflow 1 to tensorflow 2 the 1.x version needs almost double the code to do the same thing with tf.session you open a session run your variables run your tables initializer and then do your model fit with x_train y_train your validation data x_val y_val your epochs and your batch size in 2.0 all of that is compressed you just create a model and call fit on it and you're left with only that last bit of code that's what they mean by eager so if you ever see all that session code you're looking at tensorflow 1.0 and when you get into 2.0 it's just nice and clean the second change if you remember keras from the framework list at the beginning is that keras is the official high level api of tensorflow 2.0 it has been incorporated as tf.keras keras provides a number of model building apis sequential functional and subclassing so you can choose the right level of abstraction for your project sequential is the most common form your layers go from one side to the other so everything runs in sequential order functional lets you split the layers you might have your input come in split into two completely different models where one does classification and the other does a linear regression style model or a basic backpropagation neural network and then they come back together into another layer subclassing is the most involved because you're building your own models and subclassing them into keras so these are very powerful tools and this is the direction the tensorflow keras setup has been moving a third big change is that in tensorflow 1.0 in order to use tf layers as variables you had to write a tf variable block and pre-define everything in tensorflow 2 you just add your layers under the sequential model and it defines them automatically as long as they're flat layers this changes a little as your incoming tensors get more complicated but all of it is very easy to do and that's what 2.0 does a really good job of you can see how tensorflow 1 asks you to set up these layers and values with a variable scope and a default name all that code to create the variable scope isn't even necessary in tensorflow 2.0 you just create your sequential model and add your layers without pre-creating the variable scope so if you ever see variable scope you know the code came from an older version
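to make those changes concrete here's a rough sketch that pulls the eager execution point and the keras layer style together the toy data layer sizes and epoch count here are made up placeholders not anything from the video

```python
import numpy as np
import tensorflow as tf

# eager execution: operations run immediately, no session.run() needed
a = tf.constant(2.0)
b = tf.constant(3.0)
print((a * b).numpy())  # 6.0 -- in 1.x this would have gone through a Session

# the 2.x "just build a model and fit it" style from the slide,
# with small made-up toy data standing in for x_train / y_train
x_train = np.random.rand(100, 4).astype("float32")
y_train = np.random.rand(100, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x_train, y_train, epochs=2, batch_size=16, verbose=0)
```

that's roughly what eager by default buys you the whole session block from the 1.x side collapses into a create the model and fit it pattern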
and then we have the last two changes the api cleanup and the tf.function and autograph feature in the api cleanup tensorflow 1 let you build models using tf.gans tf.app tf.contrib tf.flags and so on in tensorflow 2 a lot of those apis have been removed they were cleaned up because people weren't really using them so tf.app tf.flags and tf.logging are all gone those are three legacy features you won't find in 2.0 then there's tf.function and autograph in the old tensorflow 1.0 python functions were limited and could not be compiled or exported and re-imported so you were continually redoing your code you couldn't easily just point at a function and say let's reuse this in tensorflow 2 you can write a plain python function mark it with tf.function for just-in-time compilation and tensorflow will run it as a single graph the autograph feature of tf.function lets you write that graph code using natural python syntax and we just threw in a new word there graph a graph here isn't a picture it's the set of edges connecting the different operations so if you remember the layers flowing sequentially earlier each of those arrows is an edge in the graph and that's where the computation gets carried from one node to the next so if you have your own special python logic for how information moves forward you can now drop your own function in there in a neural network that would typically be your activation function although it could be almost anything depending on what you're doing there's a small code sketch of autograph right after the hierarchy overview below next let's go over the hierarchy and architecture and then we'll cover three basic pieces of tensorflow before we roll up our sleeves and dive into the example looking at the tensorflow toolkits and their hierarchy at the high level you have the object oriented apis that's tf.keras and the estimators these sit on top of tf.layers tf.losses and tf.metrics your reusable libraries for model building and this is really where tensorflow shines keras running your estimators and being able to swap in different layers losses and metrics is all built in and makes it really easy to use below that you can drop down to the low level tf api where you have extensive control you can put in your own formulas procedures or models you can split the flow as we talked about earlier with 2.0 you can send one branch through a linear regression model and another through a neural network maybe with a different activation on each and then bring them together into another layer so you can build really complicated models and at the low level you can move all of that around and most recently tf code can run on multiple platforms you have your cpu like the one in the computer i'm using with eight cores and sixteen threads i hear there's now one out there with over a hundred cores you have your gpu which is your graphics card and most recently the tpu which is built specifically for tensorflow style neural network workloads so you can export your tf code and run it on all kinds of platforms for the most diverse setup out there
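and here's the small autograph sketch promised above a python function with ordinary if and for control flow that tf.function compiles into a single graph the clipped sum function itself is just an invented example

```python
import tensorflow as tf

@tf.function
def clipped_sum(values, limit):
    # plain python-looking control flow; autograph converts the loop and the
    # if statement into graph operations (tf.while_loop / tf.cond) for us
    total = tf.constant(0.0)
    for i in tf.range(tf.shape(values)[0]):
        v = values[i]
        if v > limit:
            v = limit
        total += v
    return total

print(clipped_sum(tf.constant([1.0, 5.0, 2.0]), tf.constant(3.0)).numpy())  # 1 + 3 + 2 = 6.0
```

autograph rewrites that python control flow into graph operations behind the scenes and moving on from the hierarchy to the architecture in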
the tensorflow 2.0 architecture we have you can see on the left this is usually where you start out with and 80 of your time in data science is spent pre-processing data making sure it's loaded correctly and everything looks right so the first level in tensorflow is going to be your read and pre-processed data your tf data feature columns this is going to feed into your tf cross or your pre-made estimators and kind of you have your tensorflow hub that sits on top of there so you can see what's going on uh once you have all that set up you have your distribution strategy where are you going to run it you're going to be running it on just your regular cpu are you going to be running it with the gpu added in like i have a pretty high-end graphics card so it actually grabs that gpu processor and uses it or do you have a specialized tpu set up in there that you paid extra money for it could be if you're in later on when you're distributing the package you might need to run this on some really high processors because you're processing at a server level for uh let's say net you might be processing this at a distribute you're distributing it not the distribution strategy but you're distributing it into a server where that server might be analyzing thousands and thousands of purchases done every minute and so you need that higher speed to give them a to give them a recommendation or a suggestion so they can buy more stuff off your website or maybe you're looking for data fraud analysis working with the banks you want to be able to run this at a high speed so that when you have hundreds of people sending their transactions in it says hey this doesn't look right someone's scamming this person and probably has their credit card so when we're talking about all those fun things we're talking about saved model this is we were talking about that earlier where it used to be when you did one of these models it wouldn't truncate the float numbers the same and so a model going from one you build the model on your machine in the office and then you need to distribute it and so we have our tensorflow serving cloud on premium that's what i was talking about if you're like a banking or something like that now they have tensorflow lite so you can actually run a tensorflow on an android or an ios or raspberry pi a little breakout board there in fact they just came out with a new one that has a built-in there's just a little mini tpu with the camera on it so it can pre-process a video so you can load your tensorflow model on to that um talking about an affordable way to beta test a new product you have the tensorflow js which is for browser and node server so you can get that out on the browser for some simple computations that don't require a lot of heavy lifting but you want to distribute to a lot of end points and now they also have other language bindings so you can now create your tensorflow backend save it and have it accessed from c java go c sharp rust r or from whatever package you're working on so we kind of have an overview of the architecture and what's going on behind the scenes and in this case what's going on as far as distributing it let's go ahead and take a look at three specific pieces of tensorflow and those are going to be constants variables and sessions so very basic things you need to know and understand when you're working with the tensorflow setup so constants in tensorflow in tensorflow constants are created using the function constant in other words they're going to stay static the whole 
time whatever you're working with the syntax is tf.constant value dtype equals none shape equals none name equals const verify shape equals false that's roughly what you're looking at and we'll explore it hands on in a bit you can see here we do z equals tf.constant 5.2 name equals x dtype is a float that means we're never going to change that 5.2 it stays a constant value then we have variables variables in tensorflow are in-memory buffers that store tensors so we can declare a two by three tensor populated by ones with v equals tf.variable of tf.ones with shape two comma three which creates a nice two by three array filled with ones you could create constants that way too by the way i'm not sure why you'd want an array of constant ones but you might need it for some reason the difference is that variables can be changed it's a tensor and you have full control over it and then of course you have sessions a session in tensorflow is used to run a computational graph to evaluate the nodes and remember when we talk about a graph we're talking about all those arrows and whatever computations they carry from one node to the next you can see down here we import tensorflow as tf do x equals a tf constant of 10 y equals a tf constant of 20.0 and z equals a tf variable wrapping tf.add of x and y then you create your init with the global variables initializer open a session with tf.session as session run the init and print the session run of z and when you run this you end up with 10 plus 20 which is 30 we'll look at this a lot more closely as we roll up our sleeves and put some code together for my coding today i'm going to go through anaconda and use the jupyter notebook and of course this code will work on whatever platform you choose whether it's the notebook jupyter lab which is just the jupyter notebook with tabs for larger projects pycharm or whatever you use in anaconda you also have spyder and the qt console as different programming environments
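before we open the notebook here's a compact sketch that pulls those three pieces together constants variables and a session run through tf.compat.v1 so the 1.x style session code still works on a 2.x install the values mirror the slides

```python
import tensorflow as tf
tf.compat.v1.disable_eager_execution()  # switch to 1.x-style graph/session behavior

x = tf.constant(10.0, name="x")              # constant: stays fixed
y = tf.constant(20.0, name="y")
v = tf.Variable(tf.ones(shape=(2, 3)))       # variable: an in-memory buffer of tensors
z = tf.add(x, y, name="z")

init = tf.compat.v1.global_variables_initializer()
with tf.compat.v1.Session() as sess:
    sess.run(init)
    print(sess.run(z))   # 30.0
    print(sess.run(v))   # a 2x3 array of ones
```

one more thing to note and it's a little hard to see on screen but i have my main py36 environment selected because right now as i'm writing this tensorflow works with python version 3.6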
if you have python version 3 7 or 3 8 you're probably going to get some errors in there might be that they've already updated and i don't know it now you have an older version but you want to make sure you're in python version 3 6 in your environment and of course in anaconda i can easily set that environment up make sure you go ahead and pip in your tensor flow or if you're in anaconda you can do a conda install tensorflow to make sure it's in your package so let's just go ahead and dive in and bring that up this will open up a nice browser window i just love the fact i can zoom in and zoom out depending on what i'm working on making it really easy to just demo for the right size go under new and let's go ahead and create a new python and once we're in our new python window this is going to leave it untitled uh let's go ahead and import import tensorflow as tf uh at this point we'll go ahead and just run it real quick no errors yay no errors i i do that whenever i do my imports because i unbearably will have opened up a new environment and forgotten to install tensorflow into that environment uh or something along those lines so it's always good to double check and if we're going to double check that we also it's also good to know what version we're working with and we can do that simply by using the version command in tensorflow which you should know is is probably intuitively the tf dot underscore underscore version underscore underscore and you know it always confuses me because sometimes you do tf.version for one thing you do dot underscore version underscore for another thing this is a double underscore in tensorflow for pulling your version out and it's good to know what you're working with we're going to be working in tensorflow version 2.1.0 and i did tell you the the um we were going to dig a little deeper into our constants and you can do an array of constants and we'll just create this nice array a equals tf.constant and we're just going to put the ray right in there 4 3 six one uh we can run this and now that is what a is equal to and if we want to just double check that remember we're in jupiter notebook where i can just put the letter a and it knows that that's going to be print otherwise you round you surround it in print and you can see it's a tf tensor it has the shape the type and the and the array on here it's a two by two array and just like we can create a constant we can go and create a variable and this is also going to be a two by two array and if we go ahead and print the v out we'll run that and sure enough there's our tf variable in here then we can also let's just go back up here and add this in here i could create another tensor and we'll make it a constant this time and we're going to put that in over here we'll have b tf constant and if we go and print out v and b i'm going to run that and this is an interesting thing that always that happens in here uh you'll see right here when i print them both out what happens it only prints the last one unless you use print commands uh so important to remember that in jupyter notebooks we can easily fix that by go ahead and print and surround v with brackets and now we can see what the two different variables we have uh we have the three one five two which is a variable and this is just a flat uh constant so it comes up as a tf tensor shape two kind of two and that's interesting to note that this label is a tf.tensor and this is a tf variable so that's how it's looking in the back end when you're talking about the 
difference between a variable and a constant the other thing i want you to notice is that in variable we capitalize the v and with the constant we have a lowercase c little things like that can lose you when you're programming and you're trying to find out hey why doesn't this work uh so those are a couple of things to note in here and just like any other array in math we can do like a concatenate or concatenate the different values here and you can see we can take a b concatenated you just do a dot concat values and there's our a b axes on one hopefully you're familiar with axes and how that works when you're dealing with matrixes and if we go ahead and print this out you'll see right here we end up with a tensor so let's put it in as a constant not as a variable and you have your array four three seven eight and six one four five it's concatenated the two together and again i wanna highlight a couple things on this our axis equals one this means we're doing the columns so if you had a longer array like right now we have an array that is like you know has a shape one whatever it is two comma two um axis zero is going to be your first one and axes one is going to be your second one and it translates as columns and rows if we had a shape let me just put the word shape here um so you know what i'm talking about it's very clear and this is i'll tell you what i spend a lot of time looking at these shapes and trying to figure out which direction i'm going in and whether to flip it or whatever so you can get lost in which way your matrix is going which is column which is rows are you dealing with the third axes or the second axes axis one you know zero one two that's going to be our columns and if you can do columns then we also can do rows and that is simply just changing the concatenate uh we'll just grab this one here and copy it we'll do the whole thing over ctrl copy ctrl v and changes from axis one to axis zero and if we run that you'll see that now we concatenate by a row as opposed to column and you have four three six one seven eight four seven so it just brings it right down and turns it into rows versus columns you can see the difference there your output this really you want to look at the output sometimes just to make sure your eyes are looking at it correctly and it's in the format i find visually looking at it is almost more important than understanding what's going on because conceptually your mind just just too many dimensions sometimes the second thing i want you to notice is this says a numpy array so tensorflow is utilizing numpy as part of their format as far as python is concerned and so you can treat you can treat this output like a numpy array because it is just that it's going to be a numpy array another thing that comes up uh more than you would think is filling uh one of these with zeros or ones and so you can see here we just create a tensor tf.zeros and we give it a shape we tell it what kind of data type it is in this case we're doing an integer and then if we print out our tensor again we're in jupiter so i just type out tensor and i run this you can see i have a nice array of with shape three comma four of zeros one of the things i want to highlight here is integer 32. 
if i go to the um tensorflow data types i want you to notice how we have float 16 float 32 float 64 complex if we scroll down you'll see the integer down here 32 the reason for this is that we want to control how many bits are used in the precision this is for exporting it to another platform so what would happen is i might run it on this computer where python goes does a float to indefinite however long it wants to and then we can take it but we want to actually say hey we don't want that high precision we want to be able to run this on any computer and so we need to control whether it's a tf float 16 in this case we did an integer 32 we could also do this as a float so if i run this as a float 32 that means this has a 32 bit precision you'll see zero point whatever and then to go with zeros we have ones if we're going from the opposite side and so we can easily just create a tensorflow with ones and you might ask yourself why would i want zeros and ones and your first thought might be to initiate a new tensor usually we initiate a lot of this stuff with random numbers because it does a better job solving it if you start with a uniform set of ones or zeros you're dealing with a lot of bias so be very careful about starting a neural network for one of your rows or something like that with ones and zeros on the other hand i use this for masking you can do a lot of work with masking you can also have it might be that one tenser row is masked you know zero is false one is true or whatever you want to do it and so in that case you do want to use the zeros and ones and there are cases where you do want to initialize it with all zeros or all ones and then swap in different numbers as the tensor learns so it's another form of control but in general you see zeros and ones you usually are talking about a mask over another array and just like in numpy you can also do reshapes so if we take our remember this is shaped three comma four maybe we want to swap that to four comma three and if we print this out you will see let me just go and do that ctrl v let me run that and you'll see that the the order of these is now switched instead of four across now we have three across and four down and just for fun let's go back up here where we did the ones and i'm going to change the ones to tf.random uniform uh and we'll go ahead and just take off well we'll go and leave that we'll go ahead and run this and you'll see now we have uh 0.0441 and this way you can actually see how the reshape looks a lot different 0.041.15.71 and then instead of having this one it rolls down here to the 0.14 and this is what i was talking about sometimes you fill a lot of times you fill these with random numbers and so this is the random dot uniform is one of the ways to do that now i just talked a little bit about this float 32 and all these data types one of the things that comes up of course is recasting your data so if we have a d type float 32 we might want to convert these two integers because of the project we're working on i know one of the projects i've worked on ended up wanting to do a lot of round off so that it would take a dollar amount or a float value and then have to round it off to a dollar amount so we only wanted two decimal points um in which case you have a lot of different options you can multiply by 100 and then round it off or whatever you want to do there's a lot or then converted to an integer was one way to round it off uh kind of a cheap and dirty trick uh so we can take this and we can take this same 
tensor and we'll go ahead and create an integer version of it so we take this tensor and tf.cast it and then print the original tensor and the cast tensor i'll do a quick copy and paste here when i'm actually programming i usually type things out to double check them for a demo copy and paste is fine just be aware it can carry the wrong code over and you can see here we took a float32 with 4.6 4.2 and so on and it converts straight down to integer values that's our int32 setup now remember we talked a little about reshape where i did four comma three up above and about axis zero and axis one another important thing to be able to do is transpose so i'll take that last tensor the integer one and do a equals tf.transpose of the integer tensor and if i print a out and run this you'll see it's the same array but flipped so the columns and rows have swapped this looks similar to reshaping but it isn't quite the same if you look at the numbers when we did the reshape up above the values just roll down into the next row so you're not maintaining the structure of your matrix whereas the transpose keeps each row and column intact and just flips them so the results are similar but not the same and there are so many times you have to transpose your data that it's worth knowing you can flip your rows and columns this quickly and just like numpy you can also do your different math functions we'll look at multiplication so for matrix multiplication of tensors we create a as a constant five eight three nine and a vector v of four comma two we could have made these match as two by two arrays but instead we're doing a two by one the code for that is tf.matmul the matrix multiply of a times v and if we run this and print out av you'll see we end up with 36 and 30 and if it's been a while since you've seen matrix math that's 5 times 4 plus 8 times 2 and 3 times 4 plus 9 times 2 which is where the 36 and 30 come from
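pulling those last few operations together here's a small sketch of the cast the transpose and the matrix multiply using the numbers from the demo only the 4.6 and 4.2 style floats are shown on screen so the rest of that array is made up

```python
import tensorflow as tf

t = tf.constant([[4.6, 4.2, 1.3], [2.7, 3.1, 5.5]])  # a few example float32 values
t_int = tf.cast(t, tf.int32)          # recast float32 -> int32 (drops the decimals)
t_T = tf.transpose(t_int)             # flip rows and columns, structure preserved

a = tf.constant([[5, 8], [3, 9]])
v = tf.constant([[4], [2]])
av = tf.matmul(a, v)                  # matrix multiply: [[5*4 + 8*2], [3*4 + 9*2]]
print(av.numpy())                     # [[36] [30]]
```

same math just spelled out in one place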
now i know we're covering a lot really quickly as far as basic functionality the matrix multiply is a very commonly used back-end tool for computing different models linear regression and the like and note that just like in numpy you have all your different math in tf.math if you look in there you have cosines absolute values angles and so on all available to use in your tensorflow model back in our example let's do some element-wise multiplication we'll stick with our constant a and our vector v and create av which is a times v print that out and you can see with the four two coming in against the five eight three nine it produces twenty thirty-two and six eighteen which is pretty straightforward 4 times 5 is 20 and 4 times 8 is 32 so that's where those numbers come from we can also quickly create an identity matrix which is ones down the main diagonal and zeros everywhere else we'll sketch both of these in code at the end of this section first we grab the shape very similar to numpy you do a dot shape and it returns a tuple of rows and columns do a quick print and you can see we have three rows and two columns then we create the identity matrix with the number of rows equal to rows the number of columns equal to columns and dtype int32 and if we print it out we get a nice identity matrix with the ones running down the diagonal now clearly we're not going to go through every math module available we want to start looking at tensorflow as a prediction model and see how it functions so let's move on to a more direct setup where we can see the full tensorflow in use we'll go back and create a new python 3 notebook and bring it out so it takes up the whole window because i like to do that hopefully you made it through that first part with a basic understanding of tensorflow as a set of numpy-style arrays with math operations flowing through them now we're going to build a full setup so you can see how keras sits on top of tensorflow and the different aspects of how it works the first thing we do is a lot of imports datetime warnings and scipy which is the scientific math back end warnings because with older and newer package versions you sometimes want to suppress the warnings that come up pandas as pd for our data frame think rows and columns numpy for the number arrays which tensorflow is heavily integrated with seaborn as sns for graphics which sits on top of matplotlib imported as mpl and then matplotlib pyplot imported as plt
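and before we leave the basics here's the companion sketch promised above for the element-wise multiply and the identity matrix again with the numbers from the demo the three by two matrix m is just a stand-in for whatever tensor you grab the shape from

```python
import tensorflow as tf

a = tf.constant([[5, 8], [3, 9]])
v = tf.constant([[4], [2]])

# element-wise multiply with broadcasting: row 1 is scaled by 4, row 2 by 2
print((a * v).numpy())        # [[20 32] [ 6 18]]

# identity matrix sized from another tensor's shape (3 rows, 2 columns here)
m = tf.constant([[1, 2], [3, 4], [5, 6]])
rows, cols = m.shape
print(tf.eye(num_rows=rows, num_columns=cols, dtype=tf.int32).numpy())
```

with that out of the way let's get back to the setup for the full example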
and right off the bat we're going to set some graphic options patch force edgecolor equals true and the fivethirtyeight style for the plots when you get into matplotlib and seaborn there are so many options it's just nice to make things look good up front so we don't have to think about it later then we have the matplotlib rc settings a patch edgecolor of dim gray and a line width again all part of our graphics setup we set the interactive shell node interactivity to last expression pandas options display max columns is set to 50 so we don't display more than 50 columns and matplotlib is set to inline which is a jupyter notebook thing then for warnings we filter and just ignore them so they don't bother us when they come up that's not really what you want on a major project there you want to know what those warnings are and only filter them out and ignore them later on and if we run this it just loads all of that into the background so that's the back end housekeeping next we want to import the specific packages we'll be working with which are under keras now remember keras sits on top of tensorflow so when we import keras and the sequential model we are in effect importing tensorflow underneath it we also brought in math which probably should have gone up top with the other imports so from keras models we import sequential and if you remember from the slide there were three different options let me flip back for a quick recall in keras we have sequential functional and subclassing sequential goes one tensorflow layer at a time think of it as going from left to right or top to bottom but always in one direction functional can have a very complicated graph of directions you can split the data into two separate tensors and bring them back together later and subclassing is the really involved one where you add your own subclasses into the model to do external computations right in the middle of the flow we're going to stick with sequential it's not a big jump from sequential to functional but we're running a sequential model and that's what this first import is then we have our layers and let's talk a little bit about these because this is where keras and tensorflow really shine all of these layers are pre-built so from keras layers we import dense and from keras layers we import lstm the dense layer is your standard fully connected neural network layer and you pass it an activation such as relu if you don't specify one the default is actually linear and the lstm is a long short term memory layer since we're going to be looking at sequential time series data we want the lstm and if we go to the keras website we can scroll through the layers that are built in
on that page we can get down and look at the base layer class the layer activations the layer weights there's a lot in here we have relu which is the basic activation listed for the layer activations and you can change those then we have the core layers and the dense layers there's an input layer a dense layer and we've added a more customized one with the long short term memory layer and of course you can even write your own custom layers in keras there's a whole functionality for that if you're doing your own thing what's really nice is that it's all built in even the convolutional layers for processing images there are a lot of cool things you can do in here and that's why keras is so popular it's open source and you have all these tools right at your fingertips so from keras we're just importing two layers the dense layer and the long short term memory layer and then from sklearn our scikit-learn we bring in minmaxscaler and standardscaler for preprocessing the data and metrics so we can look at the errors and compute those let me run this it just loads things up we're not expecting any output the file we're bringing in is air quality.csv so let's take a quick look at it this is in openoffice it's just a standard spreadsheet like you'd see in excel or whatever you use and you can see we have a number of columns and a number of rows it actually goes down to around 8000 the first thing to notice is that the first column is just a running number probably not something we'll work with the second column is bandung which i'm guessing is a reference for the profile or location and if we scrolled to the bottom which i won't because it takes forever to get back up they're all the same the same goes for the status column then we have a date so there's a sequential order here and the jam column which i'm guessing is the timestamp so we have a date and a time then our o3 co no2 so2 no co2 and voc readings and then some other numbers pm1 pm2.5 pm4 and pm10
without actually looking through the data i mean some of this i can guess is like temperature humidity i'm not sure what the pms are but we have a whole slew of data here so we're looking at air quality as far as an area in a region and what's going on with our date time stamps on there and so code wise we're going to read this into a pandas data frame so our data frame df is a nice abbreviation commonly used for data frames equals pd.read csv and then our the path to it just happens to be on my d drive uh separated by spaces and so if we go ahead and run this we'll print out the head of our data and again this looks very similar to what we were just looking at being in jupiter i can take this and go the other way make it real small so you can see all the columns going across and we get a full view of it or we can bring it back up in size that's pretty small on there overshot but you can see it's the same data we were just looking at we're looking at the number we're looking at the profile which is the bandung the date we have a timestamp our o3 count co and so forth on here and this is just your basic printing out the top five rows we could easily have done three rows uh five rows ten whatever you want to put in there by default that's five for pandas now i talk about this all the time so i know i've already said it at least once or twice during this video most of our work is in pre-formatting data what are we looking at how do we bring it together so we want to go ahead and start with our date time it's come in in two columns we have our date here and we have our time and we want to go ahead and combine that and then we have this is just a simple script in there that says combined date time that's our formula we're building our we're going to submit our pandas data frame and the tab name when we go ahead and do this that's all of our information that we want to go ahead and create and then goes for i in range df shape 0 so we're going to go through the whole setup and we're going to list tab append df location i and here is our date going in there and then return the numpy array list tab d types date time 64. 
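as a rough sketch of that step here's one way to read the file and build the combined datetime column the file path and the exact column names date and jam are assumptions based on what's visible in the spreadsheet so adjust them to your copy and note this version is vectorized where the video's helper loops row by row

```python
import pandas as pd

# read the raw file; the path and separator are whatever matches your copy
df = pd.read_csv("air quality.csv")
print(df.head())

# combine the separate date and time columns into one datetime64 column
def combine_date_time(frame, date_col="Date", time_col="Jam"):
    combined = frame[date_col].astype(str) + " " + frame[time_col].astype(str)
    return pd.to_datetime(combined).to_numpy(dtype="datetime64[ns]")

df["datetime"] = combine_date_time(df)
print(df.head())
```

either way the result is a single datetime64 column sitting on the frame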
that's all we're doing we're just switching this to a date time stamp and if we go ahead and do df date time equals combine date time and then i always like to print we'll do df head just we can see what that looks like and so when we come out of this we now have our setup on here and of course it's added it on to the far right here's our date time you can see the format's changed uh so there's our we've added in the date time column and we've brought the date over and we've taken this format here and it's an actual variable with a 0 0 0 on here well that doesn't look good so we need to also include the time part of this and we want to convert it into hourly data so let's go ahead and do that to do that to finish combining our date time let's go ahead and create a little script here to combine the time in there same thing we just did we're just creating a numpy array returning a numpy array and cr forcing this into a date time format and we can actually spend hours just going through these conversions how do you pull it from the panda's data frame how do you set it up um so i'm kind of skipping through it a little fast because i want to stay focused on tensorflow and cross keep in mind this is like 80 of your coding when you're doing a lot of this stuff is going to be reformatting these things resetting them back up uh so that it looks right on here and you know it just takes time to get through all that but that is usually what the companies are paying you for that's what the big bucks are for and we want to go ahead and a couple things going on here is we're going to go ahead and do our date time we're going to reorganize some of our setup in here convert into hourly data we just put a pause in there now remember we can select from df are different columns we're going to be working with and you're going to see that we actually dropped a couple of the columns those ones i showed you earlier they're just repetitive data so there's nothing in there that exciting and then we want to go ahead and we'll create a second data frame here let me just get rid of the df head and df2 is we're going to group by date time and we're looking at the mean value and then we'll print that out so you can see what we're talking about we have now reorganized this so we put in date time 03 co so now this is in the same order as it was before and you'll see the date time now has our 0 0 same date 1 2 3 and so on so let's group the data together so there's a lot more manageable and in the format we want and in the right sequential order and if we go back to there we go our air quality you can see right here we're looking at these columns going across we really don't need since we're going to create our own date time column we can get rid of those these are the different columns of information we want and that should reflect right here in the columns we picked coming across so this is all the same columns on there that's all we've done is reformatted our data grouped it together by date and then you can see the different data coming out set up on there and then as a data scientist first thing i want to do is get a description what am i looking at and so we can go ahead and do the df2 describe and this gives us our you know describe gives us our basic data analytics information we might be looking for like what is the mean standard deviation uh minimum amount maximum amount we have our first quarter second quarter and third quarter numbers also in there so you can get a quick look at a glance describing the data or 
descriptive analysis and even though we have our quantile information in here we're going to dig a little deeper into that we're going to calculate the quantile for each variable we're going to look at a number of things for each variable and we'll see right here q1 we can simply do the quantile 0.25 percent which should match our 25 percent up here and we're going to be looking at the min the max and we're just going to do this is basically we're breaking this down for each uh different variable in there one of the things that's kind of fun to do we're going to look at that in just a second let me get put the next piece of code in here clean out some of our we're going to drop a couple thing our last rows and first row because those have usually have a lot of null values in the first row is just our titles so that's important it's important to drop those rows in here and so this right here as we look at our different quantiles again it's it's the same you know we're still looking at the 25 quantile here we're going to do a little bit more with this so now that we've cleared off our first and last rows we're going to go ahead and go through all of our columns and this way we can look at each column individually and so we'll just create a q1 q3 min max min iqr max iqr and calculate the quantile of i of df2 we're basically doing uh the math that they did up here but we're splitting it apart that's all this is and this happens a lot because you might want to look at individual if this was my own project i would probably spend days and days going through what these different values mean one of the biggest data science uh things we can look at that's important is uh use your use your common sense you know if you're looking at this data and it doesn't make sense and you go back in there and you're like wait a minute what the heck did i just do at that point you probably should go back and double check what you have going on now we're looking at this and you can see right here here's our attribute for our o3 so we've broken it down we have our q1 5.88 q3 10.37 if we go back up here here's our 5 8 we've rounded it off 10.37 is in there so we've basically done the same math just split it up we have our minimum and our max iqr and that's computed let's see where is it here we go uh q1 minus 1.5 times iqr and the iqr is your q3 minus q1 so that's the difference between our two different quarters and this is all data science as far as the hard math we're really not we're actually trying to focus on cross and tensorflow you still got to go through all this stuff i told you 80 of your programming is going through and understanding what the heck uh happened here what's going on what does this data mean and so when we're looking in that we're going to go ahead and say hey um we've computed these numbers and the reason we've computed these numbers is if you take the minimum value and it's less than your minimum iqr that means something's going wrong there and they usually in this case is going to show us an outlier so we want to go ahead and find the minimum value if it's less than the min minimum iqr it's an outlier and if the max value is greater than the max iqr we have an outlier and that's all this is doing low outliers found minimum value high outliers found really important actually outliers are almost everything in data sometimes sometimes you do this project just to find the outliers because you want to know crime detection what are we looking for we're looking for the outliers what doesn't fit a 
normal business deal and then we'll go ahead and start cleaning up these outliers so we define a convert nan function it takes x a max iqr and a min iqr that's just the data we're passing in and if x is greater than the max iqr or less than the min iqr we set x to nan why because we want to clear these outliers out of the data now again if you were doing fraud detection you'd do the opposite you'd clean out everything that isn't an outlier so you can study just the outliers then we have a convert nan hum version where the max iqr is capped at 100 percent since humidity can't go above that and the min iqr is the min iqr and again anything outside that range becomes nan otherwise x just stays x and you can see in the code that when the column i equals hum that's the column specific to humidity we run a map with convert nan hum over it this is just cleanup we found that humidity probably has some weird values in it those are our outliers a compact sketch of this cleanup follows below and when we finish this and run the code we see a low outlier of 2.04 a high outlier of 99.06 and a message that the outliers have been interpolated meaning they've been given new values chances are with sensors like these a reading that far out means a failed sensor something went wrong and that's not the kind of value you want in your analysis so we pull it out and fill it back in with interpolate using method linear which fills the gap along a straight line between the neighbouring points the same thing happens with the df2 interpolate call up above again this is all data prep we're not actually talking about tensorflow yet we're just getting the data set up correctly so that when we run the model it doesn't cause problems or carry a huge bias so we've dealt with the outliers specifically in humidity we also know there are other issues in this data how do we know some of it is just looking at the data and playing with it until you start to understand what's going on for example we take the temp value and apply a logarithmic function to it and it's interesting because how would you even know to use a logarithm on temperature that's domain specific knowledge we're talking about being an expert in air quality which i'm not this is probably the first air quality data set i've looked at but the experts come to you and say this variable grows exponentially and skews everything else so transform it and use a logarithmic scale
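here's the compact sketch of that cleanup for a single column the iqr fences follow the recipe described above the column name hum is the one from this data set and interpolate with method linear is the pandas call that does the filling

```python
import numpy as np
import pandas as pd

def clean_outliers(series):
    # classic iqr fences: anything outside them is treated as a bad reading
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    min_iqr, max_iqr = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    # replace outliers with NaN, then fill the holes on a straight line
    cleaned = series.where((series >= min_iqr) & (series <= max_iqr), np.nan)
    return cleaned.interpolate(method="linear")

# e.g. for the humidity column
# df2["hum"] = clean_outliers(df2["hum"])
```

again knowing that the temperature column needs the log treatment rather than this is domain knowledge so at that point that call would be coming from your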
data here we go data science programmer overview does a lot of stuff connecting the database and connecting in with the experts data analytics a lot of times you're talking about somebody who is a data analysis might be all the way usually a phd level data science programming level interfaces database manager that's going to be the person who's your admin working on it so when we're looking at this we're looking at something they've sent to me and they said hey domain air care this needs to be this is a skew because the data just goes up exponentially and affects everything else and we'll go ahead and take that data let me just go ahead and run this just for another quick look at it we have our uh we'll do a distribution df we'll create another data frame from the temp values and then from a data set from the log temp so we can put them side by side and we'll just go ahead and do a quick histogram and this is kind of nice plot a figure figure size here's our plt from matplot library and then we'll just do a distribution underscore df there's our data frames this is nice because it just integrates the histogram right into pandas love pandas and this is a chart you would send back to your data analysis and say hey is this what you wanted this is how the data is converting on here as a data science scientist the first thing i note is we've gone from a 10 20 30 scale to 2.5 3.0 3.5 scale and the data itself has kind of been uh adjusted a little bit based on some kind of a skew on there so let's jump into we're getting a little closer to actually doing our cross on here we'll go ahead and split our data up and this of course is any good data scientists you want to have a training set and a test set uh and we'll go ahead and do the train size we're going to use 0.75 of the data make sure it's an integer we don't want to take a slice as a float value give you a nice error and we'll have our train size of 75 percent and the test size is going to be of course the train size minus the length of the data set and then we can simply do train comma test here's our data set which is going to be the train size the test size and then if we go and print this let me just go ahead and run this we can see how these values split it's a nice split of 1298 and then 433 points of value that are going to be for our setup on here and if you remember we're specifically looking at the data set where did we create that data set from um that was from up here that's what we called the logarithmic value of the temp that's where the data set came from so we're looking at just that column with this train size and the test with the train and test data set here and let's go ahead and do a converted an array of values into a data set matrix we're going to create a little um set up in here we create our data set our data set's going to come in we're going to do a look back of one so we're going to look back one piece of data going backward and we have our data x and our data y for i and range length of data set look back minus one uh this is creating let me just go ahead and run this actually the best way to do this is to go ahead and create this data and take a look at the shape of it let me go ahead and just put that code in here so we're going to do the lookback one here's our train x our train y and it's going to be adding the data on there and then when we come up here and we take a look at the shape there we go and we run this piece of code here we look at the shape on this and we have a new slightly different change on 
here but we have a shape for train x of 1296 by 1 along with the shapes for train y test x and test y and what we're looking at is that x comes in and we only have a single value out we want to predict what the next one is that's what this little piece of code is for what are we looking for well we want to look back one step we're training with yesterday's data yesterday says the humidity was at 97 percent so what should today's value be does it go up does it go down so we're pairing each value with the next piece of data all this is doing is stacking the data so that y is basically x plus one or x is y minus one a couple of things to note our x data is only one column but it needs to keep a shape that includes the column dimension so you have the two numbers while train y doesn't need that extra shape now this is still going to run into a problem and the reason is that the lstm layer expects a time step that long short term memory layer wants its input shaped as sample time step features and we're only looking at one feature this is one of those things where when you're playing with it you wonder why am i getting an error in the numpy array so we add one more level with a reshape instead of 1296 by 1 we go one more and when the code puts it together you can see we kept the same 1296 samples added the one dimension for the time step and then have train x shape one for the features how many steps you use depends on how far back you want the long short term memory to look we're going back one so the new shape is 1296 by 1 by 1 instead of 1296 by 1 and the same thing happens for test x while the y stays a single column because we're only predicting one output
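here's a sketch of that windowing and reshaping step a look back of one paired with the samples time steps features shape the lstm layer expects the helper name mirrors the one in the video but the exact slicing is my reconstruction of the standard pattern and train and test are the arrays from the earlier split

```python
import numpy as np

def create_dataset(dataset, look_back=1):
    # dataset is expected to be a 2-d numpy array of shape (n, 1)
    # pair each value with the value look_back steps later:
    # x = today's (log) temperature, y = tomorrow's
    data_x, data_y = [], []
    for i in range(len(dataset) - look_back - 1):
        data_x.append(dataset[i:i + look_back, 0])
        data_y.append(dataset[i + look_back, 0])
    return np.array(data_x), np.array(data_y)

train_x, train_y = create_dataset(train, look_back=1)
test_x, test_y = create_dataset(test, look_back=1)

# lstm layers want (samples, time steps, features); we have 1 time step and 1 feature
train_x = np.reshape(train_x, (train_x.shape[0], 1, train_x.shape[1]))
test_x = np.reshape(test_x, (test_x.shape[0], 1, test_x.shape[1]))
```

so train x ends up as samples by 1 by 1 and train y as a flat column of next-step values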
so now that we've done the 80 percent the reformatting and reorganizing of the data we get to the fun part creating and fitting the lstm neural network the first thing we need is a model and we'll use the sequential model if you remember sequential means the layers just go in order if you have two layers you go from layer one to layer two this is different from functional which lets you split the data run two completely separate models and bring them back together we're doing sequential here and we're going to add two layers first the long short term memory layer with its input shape which is what all that reshaping was for we could have gone back one two three or four steps we just stuck with going back one and it's always a good idea at this point to ask where these layers come from and what kinds are available the second layer we add is a dense layer so we have model equals sequential then we add the lstm layer and then we add a dense layer and if you remember from the very top of our code this is where those imports come in here's the dense import and here's the lstm import just about every tensorflow model uses a dense layer it's your basic fully connected layer the network makes a forward pass and then backpropagation takes the error at the output and sends it backwards to adjust the weights that's the standard neural network you've already looked at the long short term memory layer is a little different the real question right now is where you find these layers and what's available and for that we go to the keras website keras.io go under api then layers i always just do a search for keras api layers it opens up and you can see the base layer class trainable weights all kinds of things then your activations a lot of layers let you switch how they activate relu which you'll see a lot with convolutional neural networks sigmoid all the way up to softmax and softplus all these different choices for how a layer fires scroll down and you'll see the core layers and here is your dense layer you have an input object a dense layer an activation layer an embedding layer that's kind of your standard set then the convolutional layers which are for things like image work finding objects in a picture pooling layers for bringing layers down toward a single layer things like global max pooling 3d and the list just goes on and on there are all kinds of things hidden in here and it changes so you go in do a search for what you're after and figure out what works best for your project long short term memory is a big one because when we start talking about text what comes after the in the cat in the hat it's something with a history and the history behind it tells you what comes next now once we've built the model and added our layers remember that if you were in functional you could link layers that branch out and come back together and with subclassing you can embed your own model maybe a linear regression as part of a functional split and bring that back together with everything else we're going to go ahead and compile the model which brings everything together
For the loss we'll use mean squared error, and we'll use the Adam optimizer; clearly there are a lot of choices here depending on what you're doing. And just like any of these prediction models, if you've done any scikit-learn in Python you'll recognize that we then have to fit the model. What are we doing here? We send in our train x and train y, and we decide how many epochs to run it through; 500 is probably a lot for this, I'm guessing two or three hundred would do just fine. Then the batch size, which is how many samples it processes at a time. There's math behind this, and someone on the data analytics side might know exactly why a particular batch size is right; as a data scientist without that PhD-level math, you play with the number a little and dig into how it affects the results for what you're doing. There are a number of other settings; we set verbose to 2, which mostly controls what gets printed while training. The big ones are the epochs and the batch size.

So we run this, and it takes a few minutes because it goes through all the data 500 times. If you had a huge data set, this is where you start wondering whether it will finish by tomorrow: if this were my personal computer and I were running a terabyte of data through it, even with a 3.5 gigahertz, eight-core machine, we'd be talking days or weeks, and at that point you'd want to look at a framework like Spark that spreads the processing across multiple processors and multiple computers. If we scroll to the bottom you can see the mean squared error, about 0.0088, and if you scroll up you can watch it oscillate around 0.0088 to 0.0089. It's right around epoch 200 to 250 that the oscillation starts and it's really not going anywhere anymore, so we didn't need the full 500 epochs. If you're retraining this over and over, it's good to know where that plateau is so you don't do all the extra processing.

Of course, once you've built a model you want to run a prediction on it, so let's make our predictions. Remember we have train x and train y for training, and test x and test y so we can test how well it did. We do train predict equals model.predict on train x and test predict equals model.predict on test x. Why run the prediction on train x? Because the model isn't a hundred percent accurate even on its own training data; it has a certain amount of error, and we want to compare the error on the data we programmed it with against the error we get on new data the model has never seen before.
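A sketch of that compile, fit, and predict sequence, with the settings mentioned in the walkthrough (the batch size here is an assumption; it's one of the values you tune):

```python
# Compile, fit, then predict on both the training and the test inputs so the
# two errors can be compared; batch_size is an assumed value you would tune.
model.compile(loss="mean_squared_error", optimizer="adam")
model.fit(train_x, train_y, epochs=500, batch_size=32, verbose=2)

train_predict = model.predict(train_x)
test_predict = model.predict(test_x)
```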
One of the things we do next is invert the predictions; this levels things off and undoes the earlier transform so everything is back on the same scale. We have train predict equals np.expm1 of train predict, train y equals np.expm1 of train y, and then the same again for test predict and test y: it's just reformatting the data so it all matches. Then we calculate the root mean squared error: the train score is the square root of the mean squared error between train y and train predict, and the same for the test set. This is just feeding the data through so we can compare the two, and the answer is what really matters, so let's zoom in: we have a train score of 2.40 root mean squared error and a test score of 3.16.

If these were reversed, if the test score were better than the training score, something would be really wrong, and you'd have to go back and figure out what you did, because you should never get a better result on data the model has never seen than on the data it was trained with. That's why we put both through, and why we look at the error for both the training and the testing. When you go out and quote this, when you publish it and say how good your model is, it's the test score you show people: this is what it did on test data the model had never seen before, and that's how good the model is.
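Here's a sketch of the inversion and scoring step just described; np.expm1 assumes the series was log1p-transformed earlier in the notebook, and the root mean squared error comes from scikit-learn:

```python
# Undo the earlier transform (np.expm1 assumes a log1p scaling was applied to
# the series) and score both sets with the root mean squared error.
import math
import numpy as np
from sklearn.metrics import mean_squared_error

train_predict = np.expm1(train_predict)
train_y_inv = np.expm1(train_y)
test_predict = np.expm1(test_predict)
test_y_inv = np.expm1(test_y)

train_score = math.sqrt(mean_squared_error(train_y_inv, train_predict))
test_score = math.sqrt(mean_squared_error(test_y_inv, test_predict))
print("Train RMSE: %.2f  Test RMSE: %.2f" % (train_score, test_score))
```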
A lot of times you also want a little formal printout of that, and if we print it out you can see down here: test prediction, 3.16 is less than 4.40, the standard deviation of the data set. Up above, remember, we looked at the mean squared error; this check is against the standard deviation, which is why the numbers are different, but it's saying the same thing we just talked about: 3.16 is less than 4.40, the model is good enough, so we're saying this is a valid model and we can go ahead with it.

Along with the formal printout, we want to plot what's going on; we just want a pretty graph so people can see it. When I walk into a meeting with a number of people, they really don't want these numbers; unless you know statistics, the head of sales might not work with standard deviation or have any idea what it means, so at this point you want to put it in a graph so you have something to display. And remember what we're displaying: the data going in today and what happens tomorrow. So first we shift the train predictions for plotting: train predict plot is np.empty_like of the data set, filled with null values. It's a slightly odd step where we create the data arrays the way we want them and then put the data in: the slice of train predict plot from look back up to the length of the predictions plus look back gets set to train predict, so basically we take train predict and dump it into that array, and now we have a nice train predict plot. Then we shift the test predictions for plotting and continue more of the same (it looks like that cell got put in twice; we really only need it once; here we go, the second one is the test predict). So we do the same thing for the test predict plot, with its length and offsets set up; we're just creating those two data sets.

Then we plot the original data set and the predictions. We have a time axis (always nice to have your time on there), we set up the time array, and that's all the setup for the time variable along the bottom. Then there's a fair amount of code setting up the figure, so let's run it and then break it down. There are two plots going on: the ISPU over time with the data, and a second panel with its own settings. We have ax, the main plot, and ax1, the secondary plot over on the right; we do plt.figure and put those two graphs on it. Going through the code piece by piece (which we won't do in detail): we take the data set, reverse the exponential so it reads correctly, and label it the original data set; we plot the train predict plot we just created, make it orange, and label it train prediction; the test predict plot is red and labeled test prediction; and so forth. Then we set up the ticks, the little marks along the axes, from the time axis.

Now let's look at the graphs, because these are the fun part: when you show up to a meeting, this is the final output. Here's the original data in blue, and here's the training prediction; you can see it tracks pretty close to the data. I'd probably also put a little marker right before and right after the point where we go from train to test prediction. With the test prediction, the data comes in in red, and you can see what the original data set looked like behind it and how it differs; the panel on the right just isolates the test prediction. There are peaks and lows in the original data that it misses, but as far as the actual test prediction goes it does pretty well, pretty much right on, and you get a good idea of what to expect for your ISPU. From this we would probably publish it and say, here's what to expect and here's our range of error. That's the kind of thing you might put out on a daily basis, or weekly, so you get a nice, somewhat flattened view of the data coming out, and you say, here's what we're looking at.
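A minimal sketch of the NaN-padded shifting and plotting just described, assuming `dataset` is the full scaled series of shape (n, 1), `look_back` is 1, and the predictions were already inverted in the previous step:

```python
# Pad the predictions with NaNs so they line up with the full series on the
# time axis, then plot everything on one figure.
import numpy as np
import matplotlib.pyplot as plt

look_back = 1

train_predict_plot = np.empty_like(dataset)
train_predict_plot[:] = np.nan
train_predict_plot[look_back:len(train_predict) + look_back] = train_predict

test_predict_plot = np.empty_like(dataset)
test_predict_plot[:] = np.nan
test_predict_plot[len(train_predict) + 2 * look_back:] = test_predict

plt.plot(np.expm1(dataset), label="original data set")  # invert the series for display
plt.plot(train_predict_plot, color="orange", label="train prediction")  # already inverted above
plt.plot(test_predict_plot, color="red", label="test prediction")
plt.legend()
plt.show()
```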
The big takeaway, and let me scroll back up to it, is this model here; this is what it's all about. We worked through all of those pieces, all of that TensorFlow, to build this sequential model, and we're only putting in the two layers. It can get pretty complicated, and if you make it too complicated it never converges into a usable model; if you have something like 30 layers in here, there's a good chance you'll just crash it, so don't go too haywire. You learn that as you go; again, it's domain knowledge, and also starting to understand what all these different layers mean. The data analytics behind those layers is something your data analytics professional will come in and say, this is what we want to try; but I'll tell you, as a data scientist a lot of these basic setups are common, and I don't know how many times I've worked with somebody who said, oh my gosh, if only I'd used a tanh instead of a ReLU activation, I spent two weeks figuring that out. As the data scientist I can run it through the model in five minutes instead of spending two weeks on the math behind it. That's one of the advantages of being a data scientist: we come at it from the programming side, while data analytics asks how it works mathematically. And this is really the core of TensorFlow and Keras: being able to build your data model quickly and efficiently, and, as with any data science, putting out a pretty graph for your shareholders. We want to reduce the information down to something people can look at and say, oh, that's what's going on, and see the dates and the change in the ISPU.

In this video I will walk you through the TensorFlow code to perform object detection in a video, so let's get started. This first part is importing all the libraries: we need NumPy, imageio, datetime, PIL, and so on, and of course matplotlib, so we import all of those. Then there are a bunch of variables holding paths for the files and folders; this is regular stuff, so let's keep moving. Then we import matplotlib and make it inline, and a few more imports. There are some warnings we can just ignore; if I run the cell once again they go away. From here onwards we do the model preparation. We're going to use an existing neural network model rather than training a new one, because training would take a long time, it needs a lot of computational resources, and it really isn't required: there are models that have already been trained, and in this case it's the SSD with MobileNet. That's the model we'll use; it's trained to detect objects and it's readily available as open source. If you want to use other models, there are a few more available: you can click on this link, and let me just take you there. There are more models on that page, but we've chosen this particular one because it's faster; it may not be the most accurate, but it's one of the faster models. On that link you'll see a lot of other pretrained models; some take a little longer but may be more accurate, so you can play around with those. So we'll be using that model, and this next line of code basically imports it.
This is also known as a frozen model; that's the term we use. We download and import it, and then we'll actually use the model in our code. So these two cells download and import the model, and once it's available locally we load it into our program, into memory. Then we need to perform a couple of additional steps, one of which is mapping numbers to text. As you may be aware, when we build a model and run predictions, the model doesn't give us text; the output is usually a number, so we need to map that to a label. For example, if the network predicts that the output is 5, we need to know that 5 means it's an airplane, and that mapping is done in the next cell. Then we have some helper code that loads the images and transforms them into NumPy arrays. This was also used for object detection in still images, and we're going to reuse it, because a video is nothing but frames, which in turn are images, so we can pretty much reuse the same code we used for detecting objects in images.

This is where the actual detection starts. Here is the path where the images are stored, and the extension; this was done for two or three images in the image version, so we'll keep it and skip that section. Then comes the cell where we actually load the video, convert it into frames, and detect the objects frame by frame. In this code there are a few lines that, once an object is found, draw a box around each of those objects. The name of the input video file is traffic, with the extension mp4, and we have a video reader, which is an excellent object that's part of imageio, so we can read and write videos with it. The video we're using is traffic.mp4; you can use any mp4 file, but in our case I picked a video that has cars in it. Let me show you: in this object detection folder I have the mp4 file, and I'll quickly play it. It's a relatively short video, which suits this demo, and once we run our code it will detect each of these cars and annotate them as cars. In this particular video we only have cars; later we can try another video (I think I have a cat one), but let's first check with this traffic video.

So, going back: we'll be reading the video file, but analyzing it frame by frame, at 10 frames per second, which is the rate we set here, then annotating each frame and writing it back out. You'll end up with a video file named something like traffic_annotated, and we'll watch the annotated video. Let's run through this piece of code and then come back to see the result; it might take a little while, so I'll pause after running this cell and come back to show you the results.
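As a sketch of that frame-by-frame loop, using the imageio reader and writer described here; `detect_objects` is a hypothetical stand-in for the notebook's TensorFlow detection-and-annotation code, not a real function in any library:

```python
# A sketch of the frame-by-frame loop; detect_objects is a hypothetical helper
# standing in for the notebook's TensorFlow detection-and-annotation code and
# should return the frame with boxes and labels drawn on it.
import imageio

video_reader = imageio.get_reader("traffic.mp4")
video_writer = imageio.get_writer("traffic_annotated.mp4", fps=10)

for frame in video_reader:                # each frame is a NumPy image array
    annotated = detect_objects(frame)     # run the frozen SSD/MobileNet model here
    video_writer.append_data(annotated)

video_writer.close()                      # always close the writer, like a file
```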
All right, let's go ahead and run it. It's running now, and it's also important that at the end you close the video writer; it's similar to a file pointer, where if you open a file you should make sure you close it so it doesn't hog resources. So at the end, the last line of code should be video_writer.close(). All right, I'll pause and come back in a little bit. Okay, now the processing is done, the hourglass has disappeared, which means the video has been processed, so let's go back and check the annotated video. Back in the file manager, this was the original traffic.mp4, and now we have traffic_annotated.mp4, so let's run it and see how it looks. You can see each of these cars getting detected. Let me pause and show you: it says car, 70 percent. Let it go a little further and it detects something on top; what is that, a truck? I think because of the board up there it somehow thinks there's a truck. Let's play some more and see if it detects anything else. This one looks like another car, with a confidence level of 69 percent, and this is again a car. Basically it goes all the way to the end and detects each and every car that passes by.

Now we can quickly repeat this process for another video. Let me show you the other one, which is a cat; the cat isn't really moving much, it's just standing there staring and moving a little, slowly, and our network will detect that it's a cat; even when the cat turns a little in the other direction it should continue to detect and show that it's a cat. Okay, so that's the original video; let's change our code to analyze this one and see if our network detects the cat. I'll close this and go back to my code. All we need to do is change "traffic" to "cat"; the extension is picked up automatically because it's given here, and then it runs through. Very quickly, once again, what it's doing: video_reader has a neat little interface where you can say "for frame in video_reader", so you're in a loop, frame by frame, and you take each frame it gives you and analyze it as if it were an individual image. That's the way it works, so it's very easy to handle. Now let's run just this cell again; the rest of the stuff stays the same. It will take a little while, so I'll pause and come back in a little while. All right, the processing is done; let's check the annotated video. Here we have cat_annotated.mp4, so let's play it. You can see it detecting the cat, and at the beginning it also detected something else; it looks like it found one more object, so let's go back and see what it detected. It's too small to see; it's trying to detect something, I think it says it's a car, I don't know. In this video there's pretty much only one object, which is the cat, so let's wait a little and see if it continues to detect it
when the cat turns around and moves as well; that's going to happen in just a little bit, and we'll see. There we go: in spite of turning the other way, our network is still able to detect that it's a cat. Let me freeze it and show you; it really does continue to detect it as a cat. And that's pretty much it; I think that's the only object it detects in this particular video, so let's close this. That's pretty much it for this video.

In this video we will learn about an important and popular deep learning neural network called generative adversarial networks. Yann LeCun, one of the pioneers in the field of machine learning and deep learning, described it as the most interesting idea in the last 10 years in machine learning. In this video you will learn what generative adversarial networks are, look briefly at the generator and the discriminator, then understand how GANs work and the different types of GANs, and finally look at some applications of GANs. So let's begin: what are generative adversarial networks? Generative adversarial networks, or GANs, introduced in 2014 by Ian J. Goodfellow and co-authors, became very popular in the field of machine learning. A GAN is an unsupervised learning setup in machine learning: it consists of two models that automatically discover and learn the patterns in input data. The two models, called the generator and the discriminator, compete with each other to analyze, capture, and copy the variations within a data set, and GANs can be used to generate new examples that plausibly could have been drawn from the original data set. In the image below you can see there's a database of real hundred-rupee notes; the generator, which is basically a neural network, generates fake hundred-rupee notes, and the discriminator network identifies whether the notes are real or fake.

Let us now understand, in brief, what a generator is. A generator in a GAN is a neural network that creates fake data for the discriminator to be trained on; it learns to generate plausible data, and the generated instances become negative training examples for the discriminator. It takes a fixed-length random vector carrying noise as input and generates a sample, and its main aim is to make the discriminator classify its output as real. The portion of the GAN that trains the generator includes the noisy input vector, the generator network that transforms the random input into a data instance, the discriminator network that classifies the generated data, and a generator loss that penalizes the generator for failing to fool the discriminator. The backpropagation method is used to adjust each weight in the right direction by calculating that weight's impact on the output; backpropagation gives us gradients, and those gradients are used to change the generator's weights.

Now let us understand, in brief, what a discriminator is. A discriminator is a neural network model that identifies real data from the fake data generated by the generator. The discriminator's training data comes from two sources: real data instances, such as real pictures of birds, humans, or currency notes, are used as positive samples during training, and the fake data instances created by the generator are used as negative examples. While training, the discriminator connects with two loss functions; during discriminator training it ignores the generator loss and uses only the discriminator loss.
In the process of training the discriminator, the discriminator classifies both real data and fake data from the generator, and the discriminator loss penalizes the discriminator for misclassifying a real data instance as fake or a fake data instance as real. Now, moving ahead, let's understand how GANs work. A GAN consists of two networks, a generator, represented as G(z), and a discriminator, represented as D(x). They play an adversarial game: the generator tries to fool the discriminator by generating data similar to the data in the training set, and the discriminator tries not to be fooled by identifying fake data among the real data. They work simultaneously to learn and train on complex data like audio, video, or image files. The generator network takes a sample and generates a fake sample of data; it's trained to increase the probability of the discriminator network making a mistake. The discriminator network, on the other hand, decides whether the data is generated or taken from the real sample; it's a binary classification problem, solved with the help of a sigmoid function that gives an output in the range 0 to 1.

Here is an example of a generative adversarial network trying to identify whether hundred-rupee notes are real or fake. First, a noise vector, the input vector, is fed to the generator network, and the generator creates fake hundred-rupee notes. The real images of hundred-rupee notes, stored in a database, are passed to the discriminator along with the fake notes. The discriminator then identifies the notes and classifies them as real or fake. We train the model, calculate the loss function at the end of the discriminator network, and backpropagate the loss into both the discriminator and the generator. The mathematical equation for training a GAN can be written as you see here, where G represents the generator, D the discriminator, p_data(x) the probability distribution of the real data, p_z(z) the distribution of the generator's noise input, x a sample from p_data(x), z a sample from p_z(z), D(x) the discriminator network, and G(z) the generator network. The discriminator tries to maximize the objective function such that D(x) is close to 1 and D(G(z)) is close to 0.
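Written out, the value function being described is the standard GAN minimax objective from Goodfellow et al., using exactly the symbols listed above:

```latex
\min_G \max_D \; V(D,G) \;=\;
\mathbb{E}_{x \sim p_{\text{data}}(x)}\!\left[\log D(x)\right] \;+\;
\mathbb{E}_{z \sim p_z(z)}\!\left[\log\!\left(1 - D(G(z))\right)\right]
```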
That simply means the discriminator should identify all the images from the training set as real, that is 1, and all the generated images as fake, that is 0. The generator wants to minimize the objective function such that D(G(z)) is 1, meaning the generator tries to produce images that the discriminator network classifies as real.

Next, let's see the steps for training a GAN. First we define the problem and collect the data. Then we choose the architecture of the GAN: depending on your problem, decide what your GAN should look like. Then we train the discriminator on real data, which helps it predict those samples as real, for n iterations. Next we generate fake inputs with the generator, and train the discriminator on that fake data so it predicts the generator's output as fake. Finally, we train the generator on the output of the discriminator: with the discriminator's predictions available, we train the generator to fool the discriminator. (A minimal code sketch of this training loop appears at the end of this section.)

Let us now look at the different types of GANs. First we have vanilla GANs: they use the min-max optimization formulation we saw earlier, where the discriminator is a binary classifier using sigmoid cross-entropy loss during optimization. In vanilla GANs the generator and the discriminator are simple multi-layer perceptrons, and the algorithm tries to optimize the objective using stochastic gradient descent. Up next we have deep convolutional GANs, or DCGANs, which use convolutional neural networks instead of plain vanilla networks for both the discriminator and the generator; they're more stable and generate higher-quality images. The generator is a set of convolutional layers with fractionally strided convolutions, also called transpose convolutions, so it upsamples the input image at every convolutional layer, while the discriminator is a set of convolutional layers with strided convolutions, so it downsamples the input image at every convolutional layer. The third type is conditional GANs, or cGANs: vanilla GANs can be extended into conditional models by using extra label information to generate better results. In a cGAN an additional parameter y is added to the generator so it generates data for the corresponding label, and the labels are also fed as input to the discriminator to help it distinguish the real data from the fake generated data. Finally we have super-resolution GANs: SRGANs use a deep neural network along with an adversarial network to produce higher-resolution images, generating a photorealistic high-resolution image when given a low-resolution one.

Let's look at some of the important applications of GANs. With the help of DCGANs you can train on images of cartoon characters to generate faces of anime characters, and Pokemon characters as well. Next, GANs can be trained on images of humans to generate realistic faces; the faces you see on your screen were generated using GANs and do not exist in reality. The third application is building realistic images from textual descriptions of objects like birds, humans, and other animals: we input a sentence and generate multiple images fitting the description. Here is an example of text-to-image translation using GANs for a bird with a black head, yellow body, and short beak. The final application is creating 3D objects: GANs can generate 3D models using 2D pictures of objects taken from multiple perspectives, and they are very popular in the gaming industry.
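Before moving on, here is the minimal code sketch of the training loop outlined in the steps above; it's a toy Keras setup with assumed layer sizes, latent dimension, and data dimension, not any specific paper's architecture:

```python
# A toy sketch of the GAN training loop: train D on real and fake batches,
# then train G through the frozen discriminator to fool it.
import numpy as np
from tensorflow.keras import layers, models

latent_dim, data_dim = 100, 784   # e.g. flattened 28x28 images (assumed sizes)

generator = models.Sequential([
    layers.Dense(128, activation="relu", input_shape=(latent_dim,)),
    layers.Dense(data_dim, activation="sigmoid"),
])
discriminator = models.Sequential([
    layers.Dense(128, activation="relu", input_shape=(data_dim,)),
    layers.Dense(1, activation="sigmoid"),     # probability the sample is real
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Combined model: freeze D *after* compiling it, so only G's weights move
# when the combined model is trained (the usual Keras GAN pattern).
discriminator.trainable = False
z = layers.Input(shape=(latent_dim,))
gan = models.Model(z, discriminator(generator(z)))
gan.compile(optimizer="adam", loss="binary_crossentropy")

def train_step(real_batch):
    batch_size = len(real_batch)
    noise = np.random.normal(size=(batch_size, latent_dim))
    fake_batch = generator.predict(noise, verbose=0)
    # 1) discriminator: real samples labeled 1, generated samples labeled 0
    discriminator.train_on_batch(real_batch, np.ones((batch_size, 1)))
    discriminator.train_on_batch(fake_batch, np.zeros((batch_size, 1)))
    # 2) generator: try to make D output 1 for generated samples
    noise = np.random.normal(size=(batch_size, latent_dim))
    gan.train_on_batch(noise, np.ones((batch_size, 1)))
```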
GANs can help automate the task of creating 3D characters and backgrounds to give them a realistic feel.

So we're going to dive right into what Keras is, and we'll go all the way through to a couple of tutorials, because where you really learn a lot is when you roll up your sleeves. What is Keras? Keras is a high-level deep learning API, written in Python, for easy implementation of neural networks. It uses deep learning frameworks such as TensorFlow or PyTorch as a backend to make computation faster. This is really nice, because as a programmer there is so much out there, and it's evolving so fast, that it can get confusing; having some kind of high-level layer through which we can view and easily program these different neural networks is really powerful. It's powerful to get something running quickly and start testing your models to see where you're going. Keras works by using complex deep learning frameworks such as TensorFlow, PyTorch, or PlaidML as a backend for fast computation while providing a user-friendly, easy-to-learn front end. You can see here the Keras API specification, and under that you'd have implementations like tf.keras for TensorFlow, Theano Keras, and so on, all sitting on top of the TensorFlow workflow. Like I said, it organizes everything; the heavy lifting is still done by TensorFlow or whatever underlying package you put in there. That's really nice because you don't have to dig as deeply into the heavy-end internals while still having a very robust package, you can get up and running quickly, and it doesn't add to the processing time, because the heavy lifting is done by packages like TensorFlow; Keras is the organization on top of it.

Now, the working principle of Keras. Keras uses computational graphs to express and evaluate mathematical expressions; you can see them here in blue. The idea is to express complex problems as a combination of simple mathematical operators (in Python, for instance, % is your modulo or remainder, * is multiplication, and you might have an operator like x to the power of 0.3). This is useful for calculating derivatives by backpropagation: with neural networks we send the error back up to figure out how to change the weights, and computational graphs make that easy without you having to hand-write everything. It also makes distributed computation easier to implement, and for solving complex problems you specify the inputs and outputs and make sure all the nodes are connected. That matters because, as your layers go in, you can end up with some very complicated setups nowadays, which we'll look at in a second, and this just makes it really easy to spin them up and try different models.

So, Keras models. First there's the Sequential model: a linear stack of layers where the previous layer leads into the next layer. If you've done anything else, even scikit-learn with its neural networks and propagation, this should look familiar: you have your input layer, it goes into layer 1, then layer 2, and then the output layer. It's useful for simple classifier and decoder models, and down here you can see the actual code, model = keras.Sequential(...).
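A rough sketch of that sequential pattern (layer sizes and names here are illustrative, not the slide's exact values):

```python
# A sketch of a Keras Sequential model: each Dense layer feeds into the next.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(64, activation="relu", name="layer1"),
    layers.Dense(64, activation="relu", name="layer2"),
    layers.Dense(10, name="output"),
])
```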
You can see how easy it is: you have a Dense layer, layer one, with an activation (they're using ReLU in this particular example), then layer two, another Dense with ReLU, and so forth, and they just feed right into each other. It's really easy to stack them, and Keras automatically takes care of everything else for you.

Then there's the functional model, and this is really where things are at. It's newer, so make sure you update your Keras; if you're on an old version you'll run into an error when you try the functional model, because it's a fairly recent release. It's for multi-input and multi-output models: complex models that fork into two or more branches. You can see here we have image inputs equal to keras.Input with shape (32, 32, 3), and then the dense layers, Dense(64) with ReLU activation; this should look similar to what you already saw, but if you look at the graph on the right it's a lot easier to see what's going on, because there are two different inputs. One way to think of it is that maybe one of those is a small image and one is full size; that input might go into one node, because it's looking for one thing, and only into another node for the other input. So you can start to get an idea that there's a lot of use for this kind of split: you have multiple streams of information coming in, and the information is very different even though it overlaps, so you don't want to send it all through the same neural network. They're finding this can train faster and give a better result, depending on how you split the data up and how you fork the model. So here we have the two inputs coming in: the image input of 32 by 32 by 3 (three channels, or four if you have an alpha channel), then x = layers.Dense(64, activation='relu')(inputs), another Dense(64) with ReLU, outputs = layers.Dense(10)(x), and model = keras.Model(inputs=inputs, outputs=outputs, name='mnist_model'), so we add a little name on there. Again, this kind of split sets us up to send the input into different areas; there's a short runnable sketch of this functional pattern just below.

If you're already looking at Keras you probably know the answer to the question "what are neural networks?", but it's always good to get on the same page, and for people who don't fully understand neural networks, to do a quick overview. Neural networks are deep learning algorithms modeled after the human brain; they use multiple neurons, which are mathematical operations, to break down and solve complex mathematical problems. Just like in the brain, one neuron fires in and fires out to other neurons, or nodes as we call them, and eventually everything comes down to the output layer. You can see the really standard picture here: an input layer, a hidden layer, and an output layer.

One of the biggest parts of any data processing is data pre-processing, so we always have to touch on that. A neural network, like many of these models, is a bit of a black box when you first start using it: you put your data in, you train it, you test it and see how good it was, and you have to pre-process that data, because bad data in means bad output. So for data pre-processing we will create our own example data set with Keras.
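As promised above, here is a minimal runnable sketch of the functional pattern: explicit Input objects, layers called like functions, and a Model tying inputs to outputs. The Flatten layer is added here so the Dense layers operate on a flat vector; the shapes only loosely follow the slide's example.

```python
# A sketch of the Keras functional API: inputs and outputs are wired explicitly.
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(32, 32, 3), name="image_inputs")
x = layers.Flatten()(inputs)                 # flatten the image before the dense layers
x = layers.Dense(64, activation="relu")(x)
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(10)(x)

model = keras.Model(inputs=inputs, outputs=outputs, name="mnist_model")
model.summary()
```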
The data consists of a clinical trial conducted on 2,100 patients ranging in age from 13 to 100, with half the patients under 65 and the other half over 65 years of age. We want to find the probability of a patient experiencing side effects due to their age; you can think of this in today's world with COVID. We're going to do a hands-on example of it, because, like I said, most of this you really need to do hands-on to understand. So let's bring up Anaconda and open a Jupyter notebook for the Python code. If you're not familiar with these, you can use pretty much any setup you like; I just like them for demos, and for showing people, especially shareholders, because it's a nice visual. Anaconda has a lot of cool tools (they just added Datalore and IBM Watson Studio Cloud to the framework), but we'll be in Jupyter Notebook; I use JupyterLab for large projects with multiple pieces, because of the multiple tabs, and the Notebook works fine for what we're doing. It opens in a browser window, because that's how our Jupyter Notebook is set to run, and we go under New, create a new Python 3 notebook, and it creates an untitled notebook; we'll give it a title, call it "Keras Tutorial", fix the capitals, and rename it.

The first thing we want to do is bring in some pre-processing tools, so we import a few things: NumPy, some random number generation, and sklearn, that is scikit-learn. If you're installing it, scikit is what you want to look up, and it should be a tool for anybody doing data science; the toolkit is huge, and there are so many things in there we always go back to. We create train_labels and train_samples for training our data. Then, just as a note on what we're actually doing here, let me change this cell's type from Code to Markdown; markdown cells are nice for documenting examples once you've built them. Our example data: an experimental drug was tested on 2,100 individuals between 13 and 100 years of age; half the participants are under 65; 95 percent of the participants under 65 experienced no side effects, while 95 percent of the participants over 65 did experience side effects. That's where we're starting from, and this is just a quick example; we'll do another one later with more complicated information. So we generate our data: for i in range, create random integers, and append to train_labels and train_samples. Let me run that. Once we've created the random data (and you can certainly ask Simplilearn for a copy of this code, or zoom in on the video to see how we did the train_samples.append), you'll see it's just appending values. I do this kind of thing all the time; I was recently working on something where the errors were supposed to follow a bell-shaped curve, a standard distribution of errors.
So what did I do? I generated data on a standard distribution to see what it looked like and how my code processed it, since that was the baseline I was looking for. Here we're just generating random data for our setup, and we can print some of it: print train_samples and just the first five pieces of data. The first five values are 49, 85, 41, 68, 19, just random numbers generated in there. We generated significantly more than that, and if we want to check we can do a quick print of the length (or use shape if you're in NumPy, though len is fine here): it's actually 2,100, like we said in the setup. Then the labels: we did samples, so let's print the same thing for the labels, change this to train_labels, and run it just to double-check. Sure enough, there are 2,100 of them, labeled one, zero, one, zero; that's whether they have symptoms or not, one for symptoms, zero for none.

Next we take train_labels and convert it into a NumPy array, and the same with the samples, and run that. We also shuffle, which is a neat feature you can do with NumPy: it just randomizes the order. We've already randomized the data, so it's kind of overkill and not really necessary here, but in a larger pipeline where the data comes in organized somehow, you want to randomize it to make sure the input doesn't follow a pattern that might create a bias in your model. Then we create a scaler, a MinMaxScaler with a feature range of zero to one, and compute the scaled train samples by fitting and transforming the data so it's nicely scaled. That's the age: you can see up above we had 49, 85, 41, and we're just mapping that so everything ends up between 0 and 1.
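A sketch of the generation-and-scaling cell described above; the loop counts are chosen so the total comes to 2,100 samples with roughly the 95/5 split mentioned (they're an assumption, not necessarily the notebook's exact loop), and the shuffle uses a NumPy permutation:

```python
# Generate the example data (1 = side effects, 0 = none), shuffle it, and
# scale the ages into the 0-1 range with MinMaxScaler.
import numpy as np
from random import randint
from sklearn.preprocessing import MinMaxScaler

train_samples, train_labels = [], []

for _ in range(50):                        # the ~5% exceptions in each age group
    train_samples.append(randint(13, 64))
    train_labels.append(1)
    train_samples.append(randint(65, 100))
    train_labels.append(0)

for _ in range(1000):                      # the ~95% typical cases in each group
    train_samples.append(randint(13, 64))
    train_labels.append(0)
    train_samples.append(randint(65, 100))
    train_labels.append(1)

train_samples = np.array(train_samples)
train_labels = np.array(train_labels)

perm = np.random.permutation(len(train_samples))      # shuffle both arrays together
train_samples, train_labels = train_samples[perm], train_labels[perm]

scaler = MinMaxScaler(feature_range=(0, 1))
scaled_train_samples = scaler.fit_transform(train_samples.reshape(-1, 1))
```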
This is true with any of your neural networks: you really want to convert the data to between zero and one, otherwise you create a bias. If you leave in a value like a hundred, the math behind it gets really complicated; there's a lot of multiplication and addition going on in there, and that higher value eventually multiplies through and has a huge influence on how the model fits, so it won't fit as well. One of the fun things about a Jupyter notebook is that if a variable sits by itself on the last line of a cell, it prints automatically, so we just look at the first five scaled samples; everything is between 0 and 1, which shows we scaled it properly and it looks good. It really helps to do these kinds of printouts partway through; you never know what's going on in there, and I don't know how many times I've gotten way down the line and found that data sent to me, which I thought was scaled, was not, and then I have to go back and track it down and figure it out.

So let's go ahead and create our artificial neural network, and for that we start diving into TensorFlow and Keras. If you don't know the history of TensorFlow, it helps to look it up; we'll just use Wikipedia (be careful, don't quote Wikipedia for serious work, but it's a good place to start). Back in 2011 Google Brain built DistBelief as a proprietary machine learning system, and TensorFlow became its open-source successor. So TensorFlow was a Google product, then it was open-sourced, and now it's probably one of the de facto standards for neural networks, with a huge following. There are some other options (scikit-learn has its own little neural network), but TensorFlow is the most robust one out there right now, and Keras sitting on top of it makes it a very powerful tool: we can leverage the ease with which Keras builds a sequential setup on top of TensorFlow.

So here we do our import of TensorFlow, and from line two down the rest is Keras: from tensorflow we import keras, then from tensorflow.keras.models we import Sequential, a specific kind of model we'll look at in just a second (if you remember from the slides, it means it goes from one layer to the next layer with no funky splits or anything like that). Then from tensorflow.keras.layers we import Activation and Dense, and we bring in the Adam optimizer. How you optimize your data is a big thing to be aware of; Adam is as good as any to start with. It's usually associated with bigger data, but it works fine on smaller data too, and it's probably the most commonly used, though there are others out there, and depending on what you're doing, different layers might have different activations on them. Finally, down here you'll see we're going to use the tensorflow.keras metrics for categorical cross-entropy.
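As a sketch, that import cell looks roughly like this, using tensorflow.keras rather than the standalone keras package:

```python
# Imports for the model-building section: Sequential model, Dense/Activation
# layers, the Adam optimizer, and the categorical cross-entropy metric.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import categorical_crossentropy
```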
That's so we can see how everything performs when we're done. A lot of times you'll see us go back and forth, because scikit-learn also has a lot of really good metrics for measuring these things; again, at the end of the story, what matters is how good your model does. We load all of that, and then comes the fun part. I actually like to spend hours messing with these things, and it's four lines of code; you might ask, you're going to spend hours on four lines of code? It's not the typing, and I'll explain in a second. We have a model, and it's a Sequential model; if you remember, we mentioned sequential up above, where it goes from one layer to the next. The first layer is your input: it's what they call Dense, and you give it the number of units coming in (we have 16), the input shape, and the activation. This is where it gets interesting, because we have ReLU on two of these layers and a softmax activation on the last one. There are so many options for what these activations mean and how they function, how ReLU behaves versus softmax, and they do very different things. We're not going to go into all the activations here; that's what you really can spend hours on, and some of it you develop a feel for, almost like an artist. For example, the inverse tangent, or tanh, activation takes a huge amount of processing, so you don't see it a lot, yet it can come up with a better solution, especially when you're analyzing word documents and tokenizing words, so you'll see people shift from one activation to another as they balance building a better model against crashing the system or taking too long on a huge data set. Then you see things like softmax. ReLU has a shape where anything less than zero is zero and then it goes up linearly, and there are "leaky" variants with a slight negative slope so that the errors translate through better; softmax has a similar kind of softness to it at the output that helps errors translate. All these little details make a huge difference to your model.

One of the really cool things about data science that I like is what I call "build to fail", and it's an interesting design idea: you want the model as a whole to work end to end, so you can get to the end, test the quality of your setup with the cross-entropy metrics and so on, and then say this model works better than that model, and this is why, and start swapping models in and out. To make those four lines concrete, there's a short sketch just below.
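A sketch of the model as described: 16 units on the input layer, a hidden Dense layer (the 32 here is an assumed size), and a 2-unit softmax output, one unit per class, using the imports from the cell above.

```python
# The "four lines": input Dense layer on a single feature (scaled age),
# a hidden Dense layer, and a softmax output over the two classes.
model = Sequential([
    Dense(units=16, input_shape=(1,), activation="relu"),
    Dense(units=32, activation="relu"),     # hidden layer size is an assumption
    Dense(units=2, activation="softmax"),   # no side effects / side effects
])
model.summary()
```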
When I say you spend a huge amount of time on pre-processing: pre-processing is probably 80 percent of your programming time, or between pre-processing and modeling it's something like 80/20. You will spend a lot of time on the models too; once you get the model down, once you get the whole code and flow set, your models get more and more robust as you start experimenting with different inputs, different data streams, and all kinds of things. We can do a simple model summary here: here's our sequential model, each layer, its output shape, and the parameters. This is one of the nice things about Keras: everything is set out clearly and is easy to read.

Once we have our model built, the next thing we want to do is train it, so the next step is, of course, model training. A lot of times this is paired right with the model definition because it's so straightforward, and it's nice to print out the model summary so you have a record. The keyword in Keras is compile. We pass the Adam optimizer with a learning rate; that's another term we're skipping right over that really becomes the meat of the setup. A lot of times the learning rate is set to something like 0.01, or smaller, depending on what you're doing, and the learning rate can lead to overfitting or underfitting, so it's worth looking up; we have a number of tutorials on overfitting and underfitting that are really worth reading once you get to that point. We also set the loss, sparse categorical cross-entropy, which tells Keras what it's minimizing, and we ask for accuracy as a metric. We run that, and now that we've compiled the model we go ahead and fit it. Here's model.fit: we have our scaled train samples, our train labels, a validation split (in this case 10 percent of the data for validation), and a batch size, another number you play with; it doesn't make a huge difference to how well it works, but it does affect how long it takes to run and can affect the bias a little. Most of the time the batch size is between 10 and 100, depending on how much data you're processing. We shuffle, go through 30 epochs, and set verbose to 2.

Let me go ahead and run this. You can see each epoch with its training loss; remember, up where we compiled the model we set the loss to sparse categorical cross-entropy, and that tells us, as it goes, how much the error goes down: the lower the number the better, and it just keeps dropping. Accuracy goes the other way; you can see 0.619, 0.69, 0.74, climbing. It would be ideal if accuracy made it all the way to 1, but the loss is more important, because it's a balance: you can have 100 percent accuracy and a model that doesn't work because it's overfitted (again, it's worth looking up overfitting and underfitting). We went through the 30 epochs. It's always fun to watch your code the first time you run it; to be honest, after the second run you'd rather not see all of that, and of course you can suppress the warnings and the printing in your code.
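A sketch of that compile-and-fit cell with the settings mentioned; the 0.0001 learning rate is an assumed value (the walkthrough only says it should be small):

```python
# Compile with Adam and sparse categorical cross-entropy, then fit with a 10%
# validation split, batch size 10, and 30 epochs, as described above.
model.compile(optimizer=Adam(learning_rate=0.0001),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x=scaled_train_samples, y=train_labels,
          validation_split=0.1, batch_size=10,
          epochs=30, shuffle=True, verbose=2)
```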
So the next step is building a test set and predicting on it. We build our test set just like we did the training set; a lot of times you'd simply split your initial data, but here we'll make a separate set, generated with the same kind of randomness we used above. The one difference is the scaler: we already fit it up above, so down here it should just be transform rather than fit_transform, because you don't want to re-fit the scaler on your testing data. That's an easy mistake to make, especially in an example like this where we're randomizing the data anyway, so it doesn't matter too much and we're not expecting anything weird, but in general you never fit on the test data; you only transform it.

Then we do our predictions, which is the whole reason we built the model: model.predict with the scaled test data, a batch size of 10, and verbose set. Now we have our predictions, and we can print the first five to take a look. What we have is, for each age, the model's prediction of whether that person is going to have symptoms or not, and the first thing you notice is that it's hard to read, because we really want a yes or no answer. So we round the predictions off using NumPy's argmax, so each one becomes 0 or 1, and since this is a Jupyter notebook I don't even have to write print; I can just put rounded_predictions and the first five on the last line. You can see zero, one, zero, zero, zero: no symptoms, symptoms, no symptoms, no symptoms, no symptoms.

And just as we talked about at the beginning, we want a confusion matrix for the accuracy check; the most important part at the end of the story is how accurate your model is, before you go back and play with the model to see if you can get better accuracy out of it. For this we use scikit-learn: from sklearn.metrics we import confusion_matrix, plus some iteration tools and of course matplotlib, which makes a big difference, because it's always nice to have a graph to look at; a picture is worth a thousand words. Then we compute cm, the confusion matrix, with y_true equal to the test labels and y_pred equal to the rounded predictions, and load it in. I'm not going to spend too much time on the plotting code (we have whole tutorials on plotting), but we define a plot_confusion_matrix function that takes the cm, the classes, normalize=False, the title "Confusion Matrix", and a colormap of blues, and it handles the nearest-value coloring, the titles, whether you put tick marks, the classes, the color bar, all the details of how the confusion matrix gets printed.
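A sketch of the prediction and confusion-matrix cells just described; scaled_test_samples and test_labels are assumed to come from the separately generated test set:

```python
# Predict class probabilities, take the argmax to get 0/1 labels, then compare
# against the true test labels with scikit-learn's confusion matrix.
import numpy as np
from sklearn.metrics import confusion_matrix

predictions = model.predict(x=scaled_test_samples, batch_size=10, verbose=0)
rounded_predictions = np.argmax(predictions, axis=-1)

cm = confusion_matrix(y_true=test_labels, y_pred=rounded_predictions)
print(cm)
```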
You can also just dump the confusion matrix into seaborn and get an output real quick, but it's worth knowing how to do the full version: when you're presenting to shareholders you don't want to do this on the fly, you want to take the time to make it look nice, like our folks in the back did. I forgot to define the cm_plot_labels, so we run that, then call the plot_confusion_matrix definition we just wrote and pass in our data — the confusion matrix, the classes, the title "Confusion Matrix" — and run it. And here's the result: 195 correctly predicted as having no side effects and 200 correctly predicted as having side effects, with the misclassifications split across the other two cells — about 10 on one side and 15 on the other. That's roughly a five percent error: 10 out of about 200 is the same as 5 out of 100, and the 15 against 195 isn't as tidy a number but it's in the same ballpark. These confusion matrices matter so much, because at the end of the day this is where you show whoever you're working with how good — or how messed up — the model actually is. I spent a lot of time on some of the parts, but the flow was simple: we randomly generated data, built the model (there's the model summary with the layers we defined), trained it, and ran the prediction. Now it gets more complicated, so let me flip back over to the slides before the next demo. So: implementing a neural network with Keras. After creating our samples and labels we create the Keras neural network model. We work with a Sequential model with three layers, which is what we did — an input layer taking in the age feature, a hidden layer, and an output layer saying whether the person will have side effects or not. Training the model is a two-step process: first we compile the model, then we train it on the training data. Compiling converts the code into a form the machine can execute — we used Adam, a gradient-descent-based algorithm, to optimize the model — and training means letting it learn on the training data. That's exactly what we just did in the code: there's the model we created and summarized, then we compile it, which tells Keras we're ready to build and use this model, and then we fit it, feeding the training data in. And of course we scaled the data first, which was really important to do.
Then you saw we created a confusion matrix with Keras: since we're performing classification, we need a confusion matrix to check the results. A confusion matrix breaks down the various misclassifications as well as the correct classifications so you can get at the accuracy — true positives, false positives, true negatives, false negatives — and that's what we printed at the end: a nice plot with the true positives and true negatives in the darker blue cells (we want those to be the biggest numbers, because that's the correct side) and the false predictions — no side effects predicted as side effects and vice versa — in the other two cells. Next up: saving and loading models with Keras. We're going to dive into a more complicated demo — and you might say the last one was already a lot, but if you break it down we just randomized some data, created the Keras setup, compiled it, trained it, predicted, and ran our matrix. This one is a bit more fun: face mask detection with Keras. We'll build a Keras model that checks whether a person is wearing a mask in real time — something that might matter at the front of a store, and very relevant today for keeping people safe. We're looking at mask versus no mask, so let's start with the data. In my data set I have a folder of images of people wearing masks and another of people without; if you'd like this data, contact Simplilearn and they can send you the images with and without masks so you can try it yourself. One note before diving in: for this example I had to upgrade to Python 3.8 — it might run on 3.7, but I skipped 3.7 and installed 3.8 — and you also want your TensorFlow up to date, because we'll be using the functional-style layers. If you remember from earlier, the functional model lets us feed different layers into different nodes and split them — a very powerful tool and very popular right now for building better neural network models. So, running on Python 3.8, we'll go through the next example: multiple layers, trained to recognize whether someone is wearing a mask, then saving that model so we can use it in real time. It's almost a full end-to-end development of a product — a very simplified version, of course; in a real deployment there would be a lot more to it, like detecting whether something is actually a face in the first place, and all kinds of other things go into this.
Let's jump into that code. We open a new Python 3 notebook and name it something like train_mask_detection, since we're going to train the mask detector and save it — not to be confused with masking data; we mean a physical mask on your face. From the Keras standpoint there are a lot of imports here, and I won't dig too deep into them, but a few are worth calling out. We have the image-processing imports, which deal with how images come in: most images are a square grid where every pixel carries three values for the three color channels, and Keras and TensorFlow do such a good job with that format that you don't have to do the heavy lifting yourself. We have MobileNetV2 and AveragePooling2D, which again are about how we handle and pool the images. Dropout is a cool one worth reading up on as you get deeper into Keras and TensorFlow: it randomly drops out certain nodes during training, because extra nodes can end up adding bias and processing time, and removing some of them actually helps. Flatten takes that big array with the three color channels and flattens it into a one-dimensional array instead of something like 2×2×3. Dense and Input we used in the previous example, so those should look familiar, along with Model and our Adam optimizer. Then there's the preprocessing that goes along with bringing the data in — preprocess_input, img_to_array, load_img. It looks like a lot of work importing all these modules, but the truth is they do everything for you; you're letting the library do the preprocessing. We also have to_categorical — just a conversion from a number to a category, since a raw 0 or 1 doesn't mean anything by itself — and LabelBinarizer, which similarly reshapes our labels, plus train_test_split and classification_report from scikit-learn. Then something a little different: imutils. That isn't part of TensorFlow or scikit-learn; it goes along with OpenCV (which you'll usually see referenced as cv2), and we'll have a separate tutorial on OpenCV coming out — you'll get a glimpse of it here, especially in the second part when we load the model back up and hook it into a video camera. imutils handles things like rotating and resizing pictures. Finally, matplotlib for plotting (it's nice to have a graph telling us how we're doing), NumPy for the number arrays, and plain os for file-system access.
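Here's roughly what that import block looks like, reconstructed from the narration — a sketch, since the original notebook's exact import list may differ slightly.

```python
# Import block for the mask-detector training script, reconstructed from the
# narration (a sketch -- the original notebook may differ slightly).
from tensorflow.keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.layers import AveragePooling2D, Dropout, Flatten, Dense, Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import os
```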
That was a lot of imports; I went through them briefly, but we don't want to spend too much time there. Next we create and initialize a few variables: the learning rate, the number of epochs to train for, and the batch size. If you remember, we talked about the learning rate — here it's 1e-4, that is 0.0001; often it's 0.01 or 0.001, usually somewhere in that range depending on what you're doing. The epochs value is how many times we go through all of the data, and it's another number you play with. I have it set to 2 here, while the actual setup uses 20 — and 20 works great — but the reason I'm using 2 is that it takes a long time to process. One of the downsides of Jupyter is that it pins the work to a single kernel, so even though I'm on an 8-core processor with 16 threads, only one of them is running this no matter what, and it takes much longer than TensorFlow could otherwise scale to. The batch size is how many pictures we load and process at once; again, these are numbers you learn to tune depending on your data. The last thing to set up is the directory with the data set we're going to use, which just holds images of masks and no masks. If you look inside, you'll see the data set folder with pictures of people wearing masks, and then the opposite folder of people without masks. They look a little askew because they've been formatted into very similar shapes — mostly squares, with a few slightly different — and that's an important thing to do with data sets like this: get the images as close to each other as you can. The Keras image-processing layers and loaders do such a nice job of converting these that we don't have to do much beyond that. Now we load the images: we create data and labels lists — the features going in are the pictures, the labels are what comes out — and for each category in the directory listing (with mask or without mask) we append the image and its label, building one big array. You can see how this could become an issue with more data; I happen to have 32 GB of RAM, but you could load all of this with a lot less, probably under 16 or even 8 GB. There's also a conversion going on in there: every image gets resized so the data coming in is all identical. Looking at the labels you see "without_mask", "without_mask", and so on, with "with_mask" as the other value. Then we need the one-hot encoding: we take the labels and convert them to categorical, so labels becomes a categorical array.
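A sketch of that setup and loading step, building on the import sketch above. The folder names, the 224×224 target size (MobileNetV2's usual input) and the batch size are assumptions; the narration doesn't spell them out.

```python
# Hyperparameters and image loading (values and folder names are illustrative).
INIT_LR = 1e-4      # initial learning rate
EPOCHS = 2          # 20 in the full run; 2 here just to keep the demo short
BS = 32             # batch size: how many images get processed at once

DIRECTORY = "dataset"
CATEGORIES = ["with_mask", "without_mask"]

data, labels = [], []
for category in CATEGORIES:
    path = os.path.join(DIRECTORY, category)
    for img_name in os.listdir(path):
        image = load_img(os.path.join(path, img_name), target_size=(224, 224))
        image = img_to_array(image)
        image = preprocess_input(image)   # MobileNetV2-style pixel scaling
        data.append(image)
        labels.append(category)

# One-hot encode the string labels ("with_mask"/"without_mask")
lb = LabelBinarizer()
labels = lb.fit_transform(labels)
labels = to_categorical(labels)

data = np.array(data, dtype="float32")
labels = np.array(labels)
```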
We run that, and if we look at the last ten labels (labels[-10:], just so we see the other end of the array), we now have pairs like [1, 0]: after the encoding, one position stands for "with mask" and the other for "without mask". If that doesn't quite make sense the first time you see it, look at the other end and you'll find [0, 1] for the other class — it's the same thing we saw up above where the raw label read "without_mask". As with any data processing, if your output represents a category rather than an actual numeric or regression value, a plain 0/1 column causes issues for training, so we always want a one-hot encoding instead. Now we split into trainX, testX, trainY and testY with a train/test split — randomized, with 20 percent held out for testing and the rest used for training the model. And here's something that has become really cool when training these image models: data augmentation. What does augment mean? If I rotate the image, zoom in and out, shift it, shear it a little, flip it horizontally, set a fill mode — doing all these things to the data is kind of like increasing the number of samples I have. Perfect, centered samples don't tell you what happens when only part of the face is visible or the face is tilted sideways, and those little shifts cause problems if you train on a plain, tidy data set. So we create an augmenter with ImageDataGenerator that rotates, zooms and so on — it's worth looking up all the options it has. A lot of times I'll leave augmentation out on the first pass through a model; there's an approach we call "build to fail", where you build the whole pipeline end to end first and only then start adding these extras so you can train the model better. Next we load the base model, MobileNetV2, and the big thing here is include_top=False: that leaves off the network's original classification layers at the top, so we can attach our own head. You'll see a warning about the weights and input shape — I'm ignoring it, since it just means some defaults are being filled in for the image size — and with this kind of imagery we're already augmenting by moving things around and flipping them, so it's not a problem here, though in a different domain it might be. Now we build the head of the model that will sit on top of the base model.
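A sketch of the split, the augmentation and the base model just described; the specific augmentation ranges and the stratify/random_state arguments are illustrative assumptions.

```python
# Split, augmentation and MobileNetV2 base model (parameter values are illustrative).
(trainX, testX, trainY, testY) = train_test_split(
    data, labels, test_size=0.20, stratify=labels, random_state=42)

# Augmentation: rotate/zoom/shift/shear/flip to effectively multiply the samples
aug = ImageDataGenerator(
    rotation_range=20, zoom_range=0.15,
    width_shift_range=0.2, height_shift_range=0.2,
    shear_range=0.15, horizontal_flip=True, fill_mode="nearest")

# Pretrained MobileNetV2 without its original classification head
baseModel = MobileNetV2(weights="imagenet", include_top=False,
                        input_tensor=Input(shape=(224, 224, 3)))
```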
The head model starts from the base model's output: headModel = baseModel.output, then an AveragePooling2D with a pool size of 7×7, then a Flatten — so we're flattening the pooled image features (pooling is one of the ways the image data gets condensed, and we'll touch on it again lower down). Then comes a Dense layer, which we've already talked about, and then a Dropout of 0.5. Dropout means a certain fraction of nodes gets dropped while training — when you actually use the model all the nodes are used, but dropping some during training helps stop biases from forming, which is a really nice feature they discovered a while back. Then another Dense layer, this time with a softmax activation. There are lots of activation options — softmax is very popular, and so is relu — and we could do a whole talk on activation functions, what they're for and how they work; when you're starting out you'll mostly use relu and softmax because they're the basic go-to choices. Then model = Model(inputs=baseModel.input, outputs=headModel) — we're still assembling the model — and we run that. Next we loop over all the layers in the base model and freeze them so they won't be updated during the first training pass: for layer in baseModel.layers, layer.trainable = False. This is the usual transfer-learning move — the pretrained MobileNetV2 weights stay fixed while only our new head gets trained — and it's similar in spirit to how, when I work with a series like stock data, I'll sometimes hold back the first stretch before doing anything with it. Then we compile the model: Adam with the initial learning rate and a decay of the initial learning rate over the number of epochs, loss set to binary crossentropy, the optimizer passed in, and accuracy as the metric — not a huge jump from the previous code. With all of that in place, we fit the model and train the head of the network. I skipped ahead a little in the recording because it runs at about 80 seconds per epoch on a single kernel, so it takes a couple of minutes to get through. A few things to notice while it's processing: the augmenter is applied whenever trainX and trainY go in, so there's some randomness jiggling what the model sees; we're batching, so it processes however many images we set as the batch size at a time; and we pass the steps per epoch along with the validation data, testX and testY. Validation matters because when your training data and your validation data land at about the same accuracy, that's when you want to stop: it means the model isn't biased.
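A hedged sketch of the head construction, the freezing, the compile and the fit. The Dense layer size is an assumption, and the decay argument follows the older Keras optimizer API the narration describes (it may be deprecated in newer TensorFlow releases).

```python
# Head model on top of the frozen MobileNetV2 base (layer sizes are assumptions).
headModel = baseModel.output
headModel = AveragePooling2D(pool_size=(7, 7))(headModel)
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(128, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(2, activation="softmax")(headModel)

model = Model(inputs=baseModel.input, outputs=headModel)

# Freeze the pretrained base so only the new head trains on the first pass
for layer in baseModel.layers:
    layer.trainable = False

opt = Adam(learning_rate=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(loss="binary_crossentropy", optimizer=opt, metrics=["accuracy"])

# Fit with the augmenter feeding randomized variations of the training images
H = model.fit(
    aug.flow(trainX, trainY, batch_size=BS),
    steps_per_epoch=len(trainX) // BS,
    validation_data=(testX, testY),
    validation_steps=len(testX) // BS,
    epochs=EPOCHS)
```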
If instead your training accuracy ends up noticeably higher than the accuracy on your test data, then something in there probably has a bias and the model is overfitted — that's what the validation data and validation steps are really about. It looks like it's finished: we've gone through the two epochs. Again, with this amount of data you could run it for about 20 and you'd get a nicely refined model at the end; we're stopping at 2 because I really don't want to sit here all afternoon running on a single thread. Now that training is done we need to evaluate the model and see how good it is, and for that we make predictions on testX to see what the network thinks — evaluating the network. Then, because the output is a pair (wearing a mask, not wearing a mask), we take the argmax at the end to turn each prediction into a 0 or a 1. To finish it off we print a nicely formatted classification report, and there we have it: precision around 0.97 with a mask and around 0.97 without a mask, along with the f1-scores and support. That's a pretty high result — roughly three people without masks would sneak past scored as masked, and about three people with masks would get flagged, which, if this were set up at the front of a store, just means someone gets a second look. And of course, if someone is walking into the store and you capture multiple frames of them, you could average the predictions over those pictures on the back end to make the flagging more reliable. Now the important step — and this is just cool, I love this part — we take the model and save it: model.save("mask_detector.model", save_format="h5"), giving it a name and saving in the H5 format. The model we just trained is now on disk, so I can load it into another program. What's nice about that is, say I want somebody else to work on the other half of the application: I save the model, they load it, and they can use it for whatever they need. And if I get more data later and train a better model, it's easy to push the updated model out to the end user — that's true of so many systems, whether it's mask detection or a model predicting revenue for a company. Finally, here's a nice graph of the training loss and accuracy across the epochs — with only a couple of epochs in this run there isn't much of a curve yet, but you can see the training loss, validation loss, training accuracy and validation accuracy heading toward each other.
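A sketch of the evaluation, save and plot steps, reusing the names from the earlier sketches (H, lb, testX/testY); the plot labels assume TensorFlow 2-style history keys.

```python
# Evaluate the trained network, save it to disk, and plot the training curves.
predIdxs = model.predict(testX, batch_size=BS)
predIdxs = np.argmax(predIdxs, axis=1)          # (mask, no-mask) probs -> 0/1

print(classification_report(testY.argmax(axis=1), predIdxs,
                            target_names=lb.classes_))

# Persist the model so a separate program can load and use it later
model.save("mask_detector.model", save_format="h5")

# Training/validation loss and accuracy over the epochs
N = EPOCHS
plt.figure()
plt.plot(np.arange(N), H.history["loss"], label="train_loss")
plt.plot(np.arange(N), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(N), H.history["accuracy"], label="train_acc")
plt.plot(np.arange(N), H.history["val_accuracy"], label="val_acc")
plt.legend()
plt.show()
```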
As those curves come together, that's the convergence people talk about — the same thing scikit-learn means when its neural networks report on convergence: the training and validation loss and accuracy coming together. You can see they still haven't fully converged here, which would be my cue to keep training for more than just two epochs. But what we want to do now is create a new Python 3 notebook: we just finished train_mask, so we'll import that saved model and use it, and get a live-action view of me in the afternoon along with my office background, which is still mid-reconstruction for another month. We'll call this one mask_detector, and we grab a few imports: MobileNetV2's preprocess_input (we still need that), TensorFlow's img_to_array, load_model (where most of the work happens), cv2 — that's OpenCV, which I won't dig too deep into since we have a tutorial on it coming out — NumPy, imutils, which goes along with OpenCV, and then time and the operating system module. With those set up, we create the function that does the heavy lifting: detect_and_predict_mask, which takes a frame, a faceNet and a maskNet — the face detector and the mask model we'll load in a moment. The frame comes in, and we want to find the face region so we know what to run through our model. We grab the frame's shape — just the height and width, which is what h and w stand for — and build what OpenCV calls a blob with cv2.dnn.blobFromImage(frame, ...), which reformats the frame that's literally coming off my camera (I'll show you that bit of code in a minute). We pass the blob through the network to obtain the face detections: faceNet.setInput(blob), then detections = faceNet.forward(), and we can print detections.shape. Then we initialize our lists — the faces, their corresponding locations, and the list of predictions from the face-mask network — and loop over the detections. That's a bit more work than you'd think, since there could be a whole crowd of faces in view. For each detection we take the associated probability, its confidence, and filter out the weak detections by requiring the confidence to be greater than a minimum — remember it runs from 0 to 1, so 0.5 is a reasonable minimum confidence. I'm zipping through this because it's mostly OpenCV and I want to stick to the Keras part; you can get a copy of this code from Simplilearn and take it apart, or look out for the OpenCV tutorial when it comes out.
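Here is a sketch of that detect_and_predict_mask helper, covering this step and the bounding-box/prediction steps described next. The blob parameters and the 224×224 resize follow common usage for OpenCV's DNN face detector and MobileNetV2, and are assumptions.

```python
# Sketch of the face-detection + mask-prediction helper (parameters are assumptions).
import cv2
import numpy as np
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array

def detect_and_predict_mask(frame, faceNet, maskNet, min_confidence=0.5):
    (h, w) = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1.0, (224, 224), (104.0, 177.0, 123.0))

    # Pass the blob through the face detector network
    faceNet.setInput(blob)
    detections = faceNet.forward()

    faces, locs, preds = [], [], []
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence < min_confidence:        # filter out weak detections
            continue

        # Bounding box, clipped to the frame dimensions
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (startX, startY, endX, endY) = box.astype("int")
        (startX, startY) = (max(0, startX), max(0, startY))
        (endX, endY) = (min(w - 1, endX), min(h - 1, endY))

        # Face ROI: BGR -> RGB, resize, preprocess for MobileNetV2
        face = frame[startY:endY, startX:endX]
        face = cv2.cvtColor(face, cv2.COLOR_BGR2RGB)
        face = cv2.resize(face, (224, 224))
        face = preprocess_input(img_to_array(face))
        faces.append(face)
        locs.append((startX, startY, endX, endY))

    if faces:
        preds = maskNet.predict(np.array(faces, dtype="float32"), batch_size=32)
    return locs, preds
```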
Next, inside that loop, we compute the bounding box for the object — a box around what we hope is the face — and make sure the bounding box falls within the dimensions of the frame. Then we extract the face ROI and convert it from BGR to RGB: that's an OpenCV thing — not really an issue, just the channel order — and I can't count how many times I've forgotten to check the color order when working with OpenCV, because all kinds of fun things happen when red becomes blue and blue becomes red. We resize the face, convert it with img_to_array, run preprocess_input, and append the face and its location (start x, start y, end x, end y). That was a huge amount, and I skipped over a ton of it, but the bottom line is we're building a box around the face — OpenCV does a decent job of finding it — and that boxed region is what goes in to be checked for a mask. Then we finally get to predictions = maskNet.predict(faces, batch_size=32): the candidate face images go through the mask model, and for each one we ask, does this face have a mask on it? That prediction is the big thing we're working for. We return the locations and the predictions — the location tells us where on the picture the face is, and the prediction tells us whether it's a mask or not. With all of that defined, we load the serialized face detector model from disk: there's a path for the prototxt file and a path for the weights (obviously yours will differ depending on where and how you saved them), and then faceNet = cv2.dnn.readNet(prototxtPath, weightsPath). We run that, and then — I always like keeping these separate — we load the actual mask detector model from disk, the model we saved earlier, with load_model. So now we've pulled in all the pieces the program needs. The next part is opening up the video, which is just kind of fun because it's all part of the OpenCV video setup: we start our video stream with source 0 — that grabs the main camera I have hooked up — print "starting video stream", and loop over the frames from the stream.
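A sketch of loading the two models and opening the camera stream. The file paths are placeholders — the face-detector files shown are the commonly used OpenCV DNN face detector (deploy prototxt plus Caffe weights), which is an assumption about what the original used.

```python
# Load the face detector, the saved mask model, and the webcam stream.
from tensorflow.keras.models import load_model
from imutils.video import VideoStream
import time

prototxtPath = "face_detector/deploy.prototxt"                       # placeholder path
weightsPath = "face_detector/res10_300x300_ssd_iter_140000.caffemodel"  # placeholder path
faceNet = cv2.dnn.readNet(prototxtPath, weightsPath)

maskNet = load_model("mask_detector.model")   # the model we saved earlier

print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()
time.sleep(2.0)   # give the camera a moment to warm up
```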
So, while True, we grab the frame from the threaded video stream and resize it to a maximum width of 400 pixels. We read the frame, resize it, and then — remember our helper returns the locations and the predictions — we call detect_and_predict_mask, passing it the frame, the faceNet and the maskNet, and get back the locations and predictions. Then, for each box and prediction in those lists (the box is again an OpenCV rectangle from the location, since you have the two corner points), we unpack the box, unpack the prediction into (mask, withoutMask), build the label — "Mask" or "No Mask" — pick a color for each case, add the probability to the label, and draw the label and the bounding rectangle on the output frame. Then we show the output with cv2.imshow, and cv2.waitKey just waits for the next frame from the feed; we keep doing this until we hit the stop button, pretty much. Ready to see if it works? We've saved our model, loaded it into this second piece of code, hooked it into the camera — let's run it. The data starts coming down, we wait for the pop-up, and there I am in my office with my funky headset on, unfinished wall in the background, and up at the top it says "No Mask". Oh no, I don't have a mask on. I wonder what happens if I cover my mouth — you can see the "No Mask" score drop a little; I wish I'd brought a mask into the office, but it's up at the house. It's saying there's a 95 to 98 percent chance I don't have a mask on, and that's true, I don't. And this could be deployed: it's an excellent little piece of script you could install on a video feed — a security camera at a store entrance, public transportation, wherever masks are required — and you'd have a neat setup telling people whether they have a mask on as they enter. Let me stop that. If you want a copy of this code, definitely give us a holler, and we'll be going into OpenCV in another session — I skipped a lot of the OpenCV detail here to focus on the Keras side: saving the model, loading the model, and processing a streaming video through it, so you can see the model actually works, hooked into a live camera, which is pretty cool and a lot of fun. I told you we'd roll up our sleeves and do a lot of coding today: we did the basic Keras demo up above, and then a Keras model trained on image data to tell whether someone is wearing a mask or not — very useful in today's world as a fully running application.
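Here is a sketch of that display loop. The green/red colors and the "press q to quit" convention are assumptions where the narration doesn't state them.

```python
# Sketch of the display loop: grab frames, run the detector, draw the results.
import imutils

while True:
    frame = vs.read()
    frame = imutils.resize(frame, width=400)

    (locs, preds) = detect_and_predict_mask(frame, faceNet, maskNet)

    for (box, pred) in zip(locs, preds):
        (startX, startY, endX, endY) = box
        (mask, withoutMask) = pred

        label = "Mask" if mask > withoutMask else "No Mask"
        color = (0, 255, 0) if label == "Mask" else (0, 0, 255)
        label = "{}: {:.2f}%".format(label, max(mask, withoutMask) * 100)

        cv2.putText(frame, label, (startX, startY - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.45, color, 2)
        cv2.rectangle(frame, (startX, startY), (endX, endY), color, 2)

    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord("q"):      # press 'q' to stop the stream
        break

cv2.destroyAllWindows()
vs.stop()
```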
Welcome to the RNN tutorial — that's the recurrent neural network. Let's start with the feed-forward neural network. In a feed-forward network, information flows only in the forward direction: from the input nodes, through the hidden layers (if any), into the output nodes. There are no cycles or loops in the network. You can see the input layer feeding straight forward: each node connects into the hidden layer, that connects into the next hidden layer, and that connects into the output layer; in the simplified drawing there's a predicted output, with the input usually labelled x and the output y. Decisions are based only on the current input — no memory of the past, no view of the future. So why a recurrent neural network? The issues with a feed-forward network come exactly from that lack of memory or sense of time: it doesn't know how to handle sequential data, because it considers only the current input. If you have a series where a point three steps back affects what's happening now, and where your output affects the next step, that matters — whatever I produce as an output influences the next prediction — and a feed-forward network looks at none of that; it just sees what's coming in right now, and it can't memorize previous inputs. The solution is the recurrent neural network: in the diagram you still have x at the bottom going to h going to y, which is the feed-forward path, but right in the middle there's a value c — a whole extra process that memorizes what's going on in the hidden layers. As the hidden layers produce output, that output not only goes on toward y but also feeds back into the next prediction coming in. That lets the network handle sequential data: it considers the current input and the previously received inputs. Alongside the general picture we should look at applications of the RNN. Image captioning: an RNN can caption an image by analyzing the activities in it — "a dog catching a ball in mid-air" — which is tough; we have plenty of systems that recognize a dog or a ball in an image, but this adds the extra piece that the dog is actually catching the ball in mid-air. Time series prediction: any time series problem, like predicting the prices of a stock in a particular month, can be approached with an RNN, and we'll dive into that in our use case with some real stock data. One thing to know about analyzing stock today: it's very difficult, and if you tried to analyze the whole market, the New York Stock Exchange produces somewhere around three terabytes of data a day once you count every individual trade and fluctuation by the second. We'll look at just one stock — analyzing even one is tricky — to give you a jump start, so don't expect to get rich off it immediately. Another application is natural language processing: text mining and sentiment analysis can be carried out with an RNN, and the phrase "natural language processing" itself makes the point — string those three words together in that order and it means something very different than if I said "processing language naturally". So when we're analyzing sentiment, the order of the words matters a great deal.
Just switching the words around — or only counting the words instead of reading them in order — can change the whole sentiment: counting words alone might give you one result, while the actual ordering gives a completely different one. "When it rains, look for rainbows; when it's dark, look for stars" — both are positive sentiments, and that depends on the order the sentences run in. Machine translation: given input in one language, an RNN can translate it into a different language as output. I'm very linguistically challenged myself, but if you've studied languages you know that in English you'd say "big cat" while in Spanish the adjective follows the noun — so getting the order right matters, along with all the parts of speech that word order signals. In the diagram a person speaks English (denoted by the flags) and it gets translated into Chinese, Italian, French, German and Spanish. Some of the tools coming out are just so cool: someone like me can now travel to places I'd never have considered, because something can translate my English back and forth and I'm not stuck with a communication gap. So let's dive into what a recurrent neural network is. A recurrent neural network works on the principle of saving the output of a layer and feeding it back into the input, in order to help predict the output of that layer. That sounds a little confusing, but it makes more sense as we break it down. Usually we draw the feed-forward network with its input layer, hidden layers and output layer; for the recurrent network we turn that on its side, so x comes up from the bottom into the hidden layers and on to y, and the simplified drawing is x to h to y, with c as a loop on h, where a, b and c are the parameters. Zooming in on h and reading left to right: c comes in and x comes in (x moving upward, c moving to the right), then a goes out and c goes out — that's where it gets a little confusing. So we have x and c in, and y and c out, where c is based on h at t minus 1: the y value and the h value are connected to each other, though not necessarily equal, because h can be its own thing. We usually write this as h_t = f_C(h_{t−1}, x_t): the new state h_t is a function f with parameters C of the previous state h_{t−1} — the last h output — combined with the new input vector x_t at time step t. We should also cover the types of recurrent neural networks. The first and most common is one-to-one, a single input and a single output, usually known as a vanilla neural network and used for regular machine learning problems. Why "vanilla"? Because vanilla is the plain, basic flavor, and this very basic shape often gets called the vanilla neural network.
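To make that recurrence concrete, here's a minimal sketch of a single vanilla RNN step in NumPy. The tanh activation, the shapes and the random weights are illustrative assumptions — this is the general idea, not the exact cell from the slides.

```python
# Minimal vanilla RNN step: the new state depends on the previous state and the
# current input, h_t = tanh(W_h @ h_prev + W_x @ x_t + b). Illustrative only.
import numpy as np

hidden_size, input_size = 4, 3
rng = np.random.default_rng(0)
W_h = rng.standard_normal((hidden_size, hidden_size)) * 0.1
W_x = rng.standard_normal((hidden_size, input_size)) * 0.1
b = np.zeros(hidden_size)

def rnn_step(h_prev, x_t):
    return np.tanh(W_h @ h_prev + W_x @ x_t + b)

# Run a short input sequence through the cell, carrying the state forward
h = np.zeros(hidden_size)
for x_t in rng.standard_normal((5, input_size)):   # 5 time steps
    h = rnn_step(h, x_t)
print(h)
```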
That's not the formal term, but it's common enough slang that people will usually know what you mean. Then there's one-to-many: a single input with multiple outputs — image captioning, as we looked at earlier, where it's not just "a dog" but "a dog catching a ball in the air". Many-to-one takes in a sequence of inputs — sentiment analysis, for example, where a given sentence gets classified as expressing positive or negative sentiment, like the "if it rains, look for rainbows" line that reads as positive even though "rain" by itself might count as negative if you were just adding up words. And alongside one-to-one, one-to-many and many-to-one there are many-to-many networks: a sequence of inputs and a sequence of outputs, like machine translation, where a lengthy English sentence comes in and goes out in a different language — a wonderful tool and a very complicated set of computations; if you've ever translated anything you know how difficult it is. One of the biggest things to understand when working with this kind of network is the vanishing gradient problem. While training an RNN, your gradient can end up either too small or very large, and that makes training difficult. When it's too small, the problem is known as a vanishing gradient — the slide shows it as a loss of information through time: if you're not pushing enough information forward, it gets lost, and when you train you start losing, say, the third word in the sentence, or the model stops following the full logic of what you're working on. The exploding gradient problem is the one everybody runs into with this kind of network: the gradient grows exponentially instead of decaying. The symptoms of gradient problems are long training times, poor performance and bad accuracy — and I'll add one more: on a lower-end computer, a test model will lock up and give you a memory error. To see the gradient problem in action, consider two examples of predicting the next word in a sequence: "the person who took my bike and ___ a thief" and "the students who got into engineering ___ from Asia". In the diagram, the x values go in, the previous values carry forward, and you backpropagate the error like you would with any neural network. Filling in the blanks, we'd want "the person who took my bike and ___ was a thief" and "the students who got into engineering ___ were from Asia". To predict the next word, the RNN has to memorize the earlier context — whether the subject was a singular or a plural noun: "was a thief" is singular, while "the students" needs a plural. When the network can't hold onto that context, you've run into the gradient problem, and you need a solution. For the gradient problem, let's look first at the exploding gradient, where there are three different fixes depending on what's going on.
One is identity initialization: initialize things so the network isn't trying to identify everything coming in, just the important information. Next is truncated backpropagation: instead of propagating through everything the network passes on to the next step, we truncate it — cut the backward pass off after a certain number of steps. And finally there's gradient clipping: while training, we clip the gradient so it can't blow up, which reins the training model in. For the vanishing gradient, the options are a careful weight initialization — similar in spirit to identity initialization, but giving the network weights that let it pick up the different aspects of the input better — choosing the right activation function, which is huge (we haven't talked much about activations yet, so we'll only touch on it, but there are a lot of choices out there), and finally long short-term memory networks, the LSTMs, which we can adjust: just as we can clip the gradient, we can also expand the memory network — make it larger so it handles more information. One of the most common problems this addresses is what's called long-term dependencies. Suppose we try to predict the last word in "the clouds are in the ___" — you probably said "sky", and we don't need any further context; it's pretty clear. Now suppose the text is "I have been staying in Spain for the last 10 years, I can speak fluent ___". Maybe you said Portuguese or French — no, you probably said Spanish. The word we predict depends on the context from several words back: we need "Spain" to predict the last word, and the gap between the relevant information and the point where it's needed can become very large. LSTMs help us solve this problem. LSTMs are a special kind of recurrent neural network capable of learning long-term dependencies — remembering information for long periods of time is their default behavior. All recurrent neural networks have the form of a chain of repeating modules; in a standard RNN the repeating module has a very simple structure, such as a single tanh layer. LSTMs also have that chain-like structure, but the repeating module is different: instead of a single neural network layer there are four interacting layers communicating in a very particular way. As you can see, the deeper we dig into this, the more complicated the diagrams get. Note that you have x at t minus 1, x at t and x at t plus 1 coming in, h at t minus 1 and h at t coming in and h at t plus 1 going out, with the output on the other side; and in the middle the tanh shows up in two different places.
So when we compute the step at t plus 1, we don't just get the tanh output from the step at t — we also get the value carried forward from t minus 1. The short of it is that as you look at these layers, the output doesn't only propagate from the first layer into the second and back into itself; it also flows on into the third layer, so the states stack up. That can get complicated as it grows, and it grows in memory and compute as well, but it's a very powerful tool for handling long, sequential information like the sentences we were just looking at. In a long short-term memory network there are three processing steps. First, forget the irrelevant parts of the previous state — little words like "a" or "as" don't play a huge part in the meaning unless we're checking whether a noun is plural, so we want to drop them. Second, selectively update the cell-state values, so we only update the parts that reflect what we're working on. Third, output only certain parts of the cell state, so we also limit what goes out. Let's dig a little deeper and see what that really looks like. Step one decides how much of the past to remember. The first step in the LSTM is deciding which information to omit from the cell at that time step, and it's decided by a sigmoid function that looks at the previous state h at t minus 1 and the current input x at t. The formula is f_t = σ(W_f · [h_{t−1}, x_t] + b_f) — and as with any neural network there's a bias term in there. f_t is the forget gate: it decides which information from the previous time step isn't important and can be deleted. Say the LSTM is fed the following. Previous input: "Alice is good at physics. John, on the other hand, is good at chemistry." Current input: "John plays football well. He told me yesterday over the phone that he had served as captain of his college football team." The forget gate realizes there may be a change of context after it encounters the first full stop, and compares that with the current input sentence x_t. The next sentence talks about John, so the information about Alice is deleted — the position of the subject is vacated and assigned to John. We've weeded out a whole bunch of information and are only passing along the information about John, since that's the new topic. Step two decides how much this unit should add to the current state. The second layer has two parts: a sigmoid function and a tanh. The sigmoid decides which values to let through (0 or 1), while the tanh assigns a weight to the values that pass, deciding their level of importance on a scale from minus 1 to 1. The two formulas that come up are:
i_t = σ(W_i · [h_{t−1}, x_t] + b_i), and C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C). Here i_t is the input gate: it determines which information to let through based on its significance at the current time step. If this seems a little complicated, don't worry — by the time we get to the case study a lot of the programming is already done for you — but understanding that this is what's going on matters when you're trying to figure out what to set your settings to. You should also notice that these formulas have the same shape as ordinary forward propagation: a value times a weight, plus a bias — the same important step in any neural network layer, whether we're propagating information from one step to the next or just doing a straightforward pass. Let's look at it from the human standpoint for a moment. Take the current input again: "John plays football well. He told me yesterday over the phone that he had served as captain of his college football team." The input gate's analysis: "John plays football and he was the captain of his college team" is important; "he told me over the phone yesterday" is less important, hence it is forgotten. This process of adding some new information is done via the input gate. As a human being — maybe it's a voice assistant listening in on the conversation — how do I weed out the fact that he was talking to me on the phone yesterday? I don't want to memorize that detail; maybe in another situation it matters, but here it doesn't. I want to remember that John plays football and that he was the captain of his college team. Those are the two take-aways, and measuring things from the human viewpoint is also how we go about training and understanding these networks. Finally, step three decides what part of the current cell state makes it to the output. First we run a sigmoid layer, which decides what parts of the cell state make it to the output; then we put the cell state through tanh to push the values between minus 1 and 1, and multiply that by the output of the sigmoid gate. In formulas: o_t = σ(W_o · [h_{t−1}, x_t] + b_o), and h_t = o_t · tanh(C_t). Here o_t is the output gate: it allows the passed-in information to impact the output at the current time step. Consider predicting the next word in "John played tremendously well against the opponent and won for his team. For his contributions, brave ___ was awarded player of the match." There could be a lot of choices for the empty space, but the current input "brave" is an adjective, and adjectives describe a noun — so "John" could be the best output after "brave". Thumbs up for John, awarded player of the match.
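Pulling the three gates together, these are the standard LSTM cell equations as usually written (the cell-state update line isn't spelled out in the narration but is part of the standard formulation; σ is the sigmoid and ⊙ is element-wise multiplication):

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) && \text{forget gate}\\
i_t &= \sigma\!\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) && \text{input gate}\\
\tilde{C}_t &= \tanh\!\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) && \text{candidate cell state}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{updated cell state}\\
o_t &= \sigma\!\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) && \text{output gate}\\
h_t &= o_t \odot \tanh(C_t) && \text{new hidden state}
\end{aligned}
```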
If you pulled just the nouns out of that sentence, "team" doesn't look right because it isn't really the subject we're talking about, and "brave contributions", "brave team", "brave match" don't fit either. So as you train this network it starts to look at that and go: no, John is what we're talking about; "brave" is an adjective, John is the best output, and John gets a big thumbs up. And then we jump into my favorite part, the case study: a use-case implementation of LSTM. Let's predict stock prices with an LSTM network: based on stock price data from 2012 to 2016, we'll try to predict the prices for 2017. This will be a narrow data set — we're not doing the whole stock market. It turns out the New York Stock Exchange generates roughly three terabytes of data a day if you count every individual trade and fluctuation of every stock, second to second. We'll limit it to some very basic, fundamental information on a single stock, so don't think you're going to get rich off this today, but it's a solid step toward processing something like stock prices — a very valid use of machine learning in today's markets. Let's dive in: we're going to import our libraries, import the training set, and get the scaling going. If you've watched any of our other tutorials, a lot of these pieces will look familiar, because the setup is very similar. A reminder: we're using Anaconda with the Jupyter Notebook. In my Anaconda Navigator, under Environments, I've set up a Keras environment on Python 3.6. One nice thing about Anaconda, especially the newer versions — I remember a year ago wrestling with different Anaconda and Python versions across environments — is that it now has a clean interface, and I have this installed on both an Ubuntu Linux machine and on Windows, so it works fine either way. From an environment you can open a terminal window, and that's where you'd use pip to install the different modules. We've pre-installed everything here, so we don't need to, but if the modules aren't installed in your particular environment you'll need to do that. And of course you don't have to use Anaconda or Jupyter — use whatever Python IDE you like; I'm just a big fan of this because it keeps my setups separate, and you can see on this machine I've created an environment specifically for Keras, since we'll be working with Keras under TensorFlow. Back on the Home tab I've pointed the application at that environment, and we click to launch the Jupyter Notebook. I've already set a lot of this up in my notebook — a bit like an old Martha Stewart cooking show, everything prepped so you're not waiting on things to load — and up under New you can see where you'd create a new Python 3 notebook.
that's what we did here underneath the setup, so it already has all the modules installed, and i've renamed it, under file you can rename the notebook, i'm calling it rnn stock. now let's get into the exciting part and start diving into the code; you might be using a different tool, which is fine. the first half is kind of boring when we hit the run button, because we're just importing numpy as np, numerical python, which gives us the numpy array, the matplotlib library because we're going to do some plotting at the end, and pandas as pd for our data set. when i hit run it doesn't do anything except load those modules. a quick sketch of the overall flow: the first part is data prep, and there's a lot of prepping involved, depending on your setup close to half the code can be data prep. i've drawn it overlapping with keras because keras has its own pre-built pieces, so a couple of the usual steps are already handled for you, and we'll see that as we go through the stock example. the last part is to evaluate, and if you're working with shareholders or a classroom, the evaluation is the next biggest piece. the actual keras model code is a bit longer than in some other packages, where you might have three lines and everything else is pre-processing, because keras is cutting edge and you load the individual layers yourself, but you still spend a lot of time on the evaluation, since you want something you can present and say, this is what i did and this is what it looks like. so that's the general overview; let's look at the next block of code. here we have dataset_train, which is read using pd.read_csv on googlestockpricetrain.csv, and under that, training_set equals dataset_train.iloc with a slice that sorts out part of the data. so what's going on here? let's look at the actual file. ignoring the extra files, i already have a train set and a test set split out. that's important to notice, because a lot of times we do that split as part of pre-processing: we take about 20 percent of the data out so we can test with it, and train on the rest, which is what we use to create the neural network so we can find out how good it is. let's take a look at the file itself; i opened it in a basic text editor, though you could certainly open it in excel or any other kind of spreadsheet.
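as a quick sketch of the imports and the loading step just described, assuming the training csv sits next to the notebook under a name like the one used in the video:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# read the training file (filename assumed; adjust the path for your machine)
dataset_train = pd.read_csv('Google_Stock_Price_Train.csv')

# iloc[:, 1:2] keeps every row and, because the end of a slice isn't included,
# only column 1, the "Open" price, as a 2-D array the scaler can work with later
training_set = dataset_train.iloc[:, 1:2].values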
we note that this is comma separated values: we have a date, open, high, low, close, and volume, the most basic set of stock information you can look at, and it's all free to download. in this case we downloaded it from google, which is why it's called google stock price, and these are specifically the google stock values starting at 1/3/2012. so when we look at the first setup up here, we have dataset_train equals pd.read_csv, and if you noticed, the original notebook had the full path, home/ubuntu/downloads/googlestockpricetrain.csv; i changed that because the csv is in the same folder i'm running the code from, so i don't need the full path. then we want to take out certain values, and since we're in pandas, which basically looks like a spreadsheet, we use iloc to get specific locations: the first value says we're pulling all the rows, and the second is the column slice from one to two, which, because python slices don't include the end point, keeps just column one, the open price (remember the columns start at zero with the date). you can certainly extrapolate and do this on all the columns, but for this example we'll limit it so we can focus on a few key aspects of stock. then we go up and run the code, and again, the first half is very boring, hitting run doesn't show anything because we're still just loading and setting up the data. now that we've loaded our data we want to scale it, what's called feature scaling, and for that we pull in MinMaxScaler from sklearn.preprocessing, that is, from scikit-learn. the reason is bias in the data: if one stock has a value of 100 and another has a value of 5, you start to get a bias between them, so we take 100 as the max and 5 as the min and squish everything, i like the word squish, down to between zero and one, so 100 maps to one and 5 maps to zero, and everything in between is just a simple calculation, the value minus 5 divided by 95.
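here's what that feature scaling looks like, a minimal sketch following the steps above:

from sklearn.preprocessing import MinMaxScaler

# min-max scaling squishes every value into [0, 1]:
#   x_scaled = (x - min) / (max - min), e.g. with min 5 and max 100, (x - 5) / 95
sc = MinMaxScaler(feature_range=(0, 1))
training_set_scaled = sc.fit_transform(training_set)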
and once we've created our scaler sc, set to scale from zero to one, we take our training set and create training_set_scaled by calling sc.fit_transform on the training set. we keep the sc object around because we'll use it later on the testing set, since we have to scale that the same way when we go to test the model. we click run again, still no output, because we're just setting up variables. next we create the data structure with 60 time steps and one output. first we create our x_train and y_train variables and set them to empty python lists, it's important to remember what kind of array we're working with, and then we loop for i in range(60, 1258). there are our 60 time steps, and the reason we start at 60 is that each sample includes the 60 values below it, so there's nothing below index 60 to use until we get there; start any lower and you'd get an index error. inside the loop we append training_set_scaled[i-60:i, 0] to x_train, these are scaled values between 0 and 1, so when i equals 60 that slice runs from 60 minus 60, which is 0, up to i, and since the end point isn't counted that's really indices 0 through 59, sixty values; then 1 through 60, 2 through 61, and so on. the 0 after the comma means we're only looking at the open column; we did set the slice up as one to two earlier, but since the end column isn't counted it's just open. then y_train.append(training_set_scaled[i, 0]) takes the value at i, so while x_train gets indices 0 through 59, y_train gets number 60, and this marches all the way up to 1258, which is the length of the data we loaded. so we've built two arrays: one filled with windows of 60 values each, and one holding the single value that comes right after each window. think of it as a time sequence: here's open, open, open, sixty of them, what's the next one in the series? we're looking at the google stock and each time we want to know, given 0 through 59, what's 60; given 1 through 60, what's 61; and so on up the data.
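as code, the sliding-window step just walked through looks roughly like this (1258 is the number of rows in this particular training file):

# each sample is the previous 60 scaled opens; the label is the open that follows
X_train, y_train = [], []
for i in range(60, 1258):
    X_train.append(training_set_scaled[i-60:i, 0])   # indices i-60 .. i-1, sixty values
    y_train.append(training_set_scaled[i, 0])        # the single value at index i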
once we've loaded those in our for loop, we take x_train and y_train and set them equal to np.array of x_train and np.array of y_train, just converting the python lists back into numpy arrays so we can use all the tools we get with numpy, including reshaping. so what does reshape mean here? we have an array that is so many rows by 60 wide, and x_train.shape[0] gets one of those dimensions while x_train.shape[1] gets the other, so we use them to reshape the data into 1198 by 60 by 1, where 1198 is 1258 minus 60, the number of windows we built. the extra 1 on the end is there because numpy treats shapes as nested layers, like branches of a tree ending in a leaf, and the lstm layer wants that last dimension to be the number of features, which here is one, just the open price; that's where np.reshape comes in, using the existing shape values to form the new one. we run that piece of code, again no real output, and then we import the keras modules we need: from keras.models we import the sequential model, since we're dealing with sequential data, and from keras.layers we bring in three layer types, dense, lstm, which is what we're focusing on, and dropout; we'll discuss these three more in a moment, but with the lstm you do want the dropout, and the final layer will be the dense. when we run this we get a message, and if you read it closely it's not actually an error, it's a warning. these come up all the time when you're working with cutting-edge modules that are being updated constantly. all it's saying is that the h5py module, which keras uses, is going to change at some point, so if you update your keras setup later, make sure h5py is updated too or you may hit an error down the road; you could run an update on h5py now if you wanted, but it's not a big deal today. now let's start looking at what those layers mean. we'll initialize the rnn and then add the layers, and you'll see the pattern lstm then dropout, lstm then dropout, so let's explore what that's doing. we start by initializing the rnn: regressor equals Sequential(), because we're using the sequential model, and we run that to load it up. then we start adding our first lstm layer and some dropout regularization, and dropout regularization is the cue here: while recurrent networks also need help with things like the exploding gradient we talked about earlier, the job of dropout is to drop out part of the network during training so we aren't leaning on, and over-fitting to, any particular set of neurons.
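putting the reshape, the keras imports, and the model initialization just described into code (this matches the imports used in the video; with a current tensorflow install you would likely import from tensorflow.keras instead):

# convert the python lists to numpy arrays and reshape to (samples, timesteps, features),
# the 3-D shape a keras LSTM layer expects
X_train, y_train = np.array(X_train), np.array(y_train)
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))   # (1198, 60, 1)

from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout

# initialise the RNN as a plain sequential stack of layers
regressor = Sequential()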
so let's go ahead and add these layers in. i'll run the first one, and since we have several of them, let me put the rest in as well, the second one, then two more, and one more after that, and as you can see, each time i run these there's no output, so let's take a closer look at what's going on. we add our first lstm layer with units set to 50. units is a positive integer and it's the dimensionality of the output space, what's going out into the next layer, so we might have 60 values coming in but we have 50 going out. return_sequences is set to true because this is sequence data and we want to keep passing the sequence along, and then you have to tell it what shape the input is in, which we already know just by looking at x_train's shape, so input_shape equals (x_train.shape[1], 1). that makes it really easy: you don't have to remember whether it was 60 or whatever else, you just let the data tell the regressor what shape to use. we follow the lstm with a dropout layer, and understanding the dropout layer is kind of exciting, because one of the things that can happen is we over-train the network, meaning the neural network memorizes such specific data that it has trouble predicting anything outside that specific realm. to fix that, each time we run through training we take 0.2, or twenty percent, of our neurons and just turn them off, so we only train on the rest, and it's random, so we don't over-train any particular nodes; those nodes come back in the next training cycle and we randomly pick a different 20 percent.
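the stack of lstm and dropout layers described above looks roughly like this; the video uses four lstm layers of 50 units each, with 20 percent dropout after each one:

# first LSTM layer needs the input shape; return_sequences=True passes the full
# sequence on so the next LSTM layer has something to consume
regressor.add(LSTM(units=50, return_sequences=True,
                   input_shape=(X_train.shape[1], 1)))
regressor.add(Dropout(0.2))   # randomly switch off 20% of the units each update

regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))

# the last LSTM layer returns only its final output, ready for the Dense head
regressor.add(LSTM(units=50))
regressor.add(Dropout(0.2))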
and you'll see a big difference as we go from the first layer to the second, third, and fourth: we don't have to specify the input shape anymore, because the output units here are 50 and the next layer automatically knows the previous layer is putting out 50, so it sets that itself, with the dropout in between handled the same way. so for the next three lstm layers we don't tell them the shape, we keep the units at 50, and it's still a sequence coming through. the next piece of code is what brings it all together: we add the output layer, the dense layer. remember we imported three layer types, lstm, dropout, and dense, and dense just says we're going to bring everything down into one output; instead of putting out a sequence, we just want the answer at this point, so let's run that. notice that all we've been doing is setting things up one step at a time: way up at the top we brought in our data and our modules, we formatted the data for training, we have our x_train and y_train, the source data and the known answers, we reshaped it, we imported the keras layers, and we now have our four lstm layers plus the dense output. keras is a little different from a lot of other systems: others put this all in one line and do it automatically, but they don't give you the options of how the layers interface or how the data comes in. keras is cutting edge for that reason, so even though there are a few extra steps in building the model, it has a huge impact on the output and what we can do with these models. so we have our full model put together in the regressor, and now we need to compile it and then fit the data: compiling brings the pieces together, then we run our training data through to actually train the regressor so it's ready to be used. let's compile it. if you've looked at any of our other neural network tutorials, you'll recognize the optimizer, adam. adam works well for big data; there are a couple of other optimizers out there, beyond the scope of this tutorial, but adam will work fine here. loss is set to mean squared error, so when we're training, that's what the loss is based on, how bad our error is; we use the mean squared error for the loss and the adam optimizer for the weight updates, and you don't have to know the math behind them, though it helps to know where they fit into the bigger model. finally we do the fit, fitting the rnn to the training set: regressor.fit with x_train, y_train, epochs, and batch_size. x_train is our sequential input data coming in, y_train is the answer we're looking for, and epochs is how many times we go over the whole data set.
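and the output layer, the compile, and the fit from this step, as a sketch:

# single-unit Dense head: collapse everything down to one predicted price
regressor.add(Dense(units=1))

# adam optimiser with mean squared error as the training loss
regressor.compile(optimizer='adam', loss='mean_squared_error')

# 100 passes over the data, 32 windows per weight update
regressor.fit(X_train, y_train, epochs=100, batch_size=32)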
we created a whole data set of x_train windows, each row a time sequence of 60. and batch_size is another one of those places where keras really shines: if you were pulling the data from a large file, instead of trying to load it all into ram it can pick up smaller batches and load them incrementally. we're not worried about pulling from a file today, because this data set isn't big enough to strain the computer's resources, but you can imagine what would happen if i was doing a lot more than one column on one stock, in this case google. imagine doing this across all the stocks, with open, close, high, and low instead of just open; you can easily end up with about 13 different variables times 60 time steps, and suddenly you're loading a gigabyte of memory into ram, and unless you're on multiple computers or a cluster you'll start running into resource problems. for this we don't have to worry about that, so let's run it, and this will take a little while on my computer, it's an older laptop, give it a second to kick in, there we go. so it's running the first epoch, the first pass through all the data, and as it goes it's batching the rows in groups of 32, 32 lines each time, and there are 1198 windows in total, matching the 1258 minus 60 we worked out earlier. each epoch takes around 13 seconds, so this is roughly 20 to 30 minutes of run time on this computer, an older laptop running at 0.9 gigahertz on a dual processor, and that's fine: i'll go get a drink of coffee, come back, and we'll see where this takes us. and like any good cooking show, i've got my latte; i also have some other things running in the background, so you'll see the epoch times jump up to 19 or 15 seconds, but you can scroll through and see we've run it through all 100 epochs. so what does all this mean? one of the first things you'll notice is the loss: it comes down steadily and then settles at about 0.0014, staying essentially the same for the last few epochs, so we guessed our epoch count pretty well, since the loss has stopped improving. to find out how we did, we're going to load up the test data, the data we didn't process yet, and build real_stock_price from dataset_test with iloc, the same thing we did when we prepped the training data. this part of the notebook is labeled part three, making the predictions and visualizing the results. the first thing we do is read the data in from our test csv, you'll see i've changed the path for my computer, and we call it real_stock_price, again taking just the one column with iloc, all the rows and just the open values. we run that so it's loaded, and then we create our inputs, and this will all look familiar because it's the same thing we did before.
we're going to take our dataset_total and do a little pandas concat with dataset_train, because remember, the end of the training set is part of the data going in. let's visualize that a bit: here's our train data, i'll mark it tr for train, and it ran up to this last value here, but each of those values generated a window 60 across, where this value came from this window, and that one from that window, and so on. so we need those last 60 rows of the training data to go into our new test inputs, because they're part of the next windows.
so that's what this first setup is: we load dataset_test, and real_stock_price comes from dataset_test.iloc, just the first data column, the open price. then for dataset_total we use pandas concat to join the open column of dataset_train with the open column of dataset_test, and this is another way you can reference these columns; earlier we referenced them by position with the one-to-two slice, but since pandas labels the column as open, we can use the label too, lots of versatility there. we go back up and run it, and you'll notice it's the same idea as before: we've concatenated our two data sets into one open series. then inputs equals dataset_total from length of dataset_total minus length of dataset_test minus 60 onward, taking the values. you'll see why this works: normally you run your test set and training set completely separately, but here the test windows need the last 60 training days as context, and when we graph it we'll only be looking at the part we didn't train on to see how well it does. then inputs gets reshaped like we did before, and we transform it, remember, onto the same 0-to-1 scale. finally we create x_test with a loop for i in range(60, 80), appending inputs[i-60:i, 0], which is the previous 60 values from the first column, our open column. once again we convert x_test to a numpy array and do the same reshape we did before, and then we get down to the final two lines, and here we have something new, so let me highlight them: predicted_stock_price equals regressor.predict(x_test), so we're predicting the prices for the test windows, and then we take that prediction and inverse the transform, because remember, we scaled everything between 0 and 1.
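the whole test-set preparation and prediction step just walked through, sketched in code (the test filename is assumed, as with the training file):

# read the held-out prices and keep just the "Open" column
dataset_test = pd.read_csv('Google_Stock_Price_Test.csv')
real_stock_price = dataset_test.iloc[:, 1:2].values

# stitch the train and test "Open" columns together so the first test day
# still has the 60 previous days it needs as input
dataset_total = pd.concat((dataset_train['Open'], dataset_test['Open']), axis=0)
inputs = dataset_total.iloc[len(dataset_total) - len(dataset_test) - 60:].values
inputs = inputs.reshape(-1, 1)
inputs = sc.transform(inputs)            # reuse the scaler fitted on the training set

X_test = []
for i in range(60, 60 + len(dataset_test)):   # 20 test days -> range(60, 80)
    X_test.append(inputs[i-60:i, 0])
X_test = np.array(X_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))

predicted_stock_price = regressor.predict(X_test)
predicted_stock_price = sc.inverse_transform(predicted_stock_price)   # back to dollars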
well, a float between zero and one isn't going to mean much to me to look at; i want the dollar amount, the cash value. we run this, and you'll see it runs much quicker than the training, which is what's so wonderful about these neural networks: once you've put them together, the same network that took half an hour to train takes just a second to use. now let's plot the data: we're going to plot what we think the price will be against the real data, what the google stock actually did. pulling up the code, we have plt, which, if you remember from the very beginning, comes from importing matplotlib.pyplot as plt. plt always threw me a little when doing graphs in python, because i expect to create an object from a class, but plt works more like a canvas you put things on, like the canvas element if you've done html5. we plot the real stock price in red, labeled real google stock price, then the predicted stock price in blue, labeled predicted, and we give it a title, because it's always nice to title your graph, especially if you're presenting it to shareholders around the office. the x label is time, since it's a time series, we didn't put the actual dates on, but we know the points are incremented by time, and the y label is the stock price. plt.legend builds the legend so the colors and labels show up, and the plot call shows us the actual graph. running it, we get a nice plot, so let's talk about it a little before we wrap up. there's the legend i was telling you about, the title, and along the bottom a time sequence; we could have plotted the actual dates, since we know what they are, but we also know we're only looking at the last slice of the data, somewhere around 20 percent of it or less. the real google price has a little jump up and then comes down, while our prediction follows the same pattern but doesn't go up as high or come down as far, so the shape is right while the values are fairly far off, which isn't surprising, because we're only looking at one column, the open price, and ignoring things like how many shares were traded.
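for reference, here's a minimal sketch of the plotting step just described:

plt.plot(real_stock_price, color='red', label='Real Google Stock Price')
plt.plot(predicted_stock_price, color='blue', label='Predicted Google Stock Price')
plt.title('Google Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Google Stock Price')
plt.legend()
plt.show()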
as i pointed out earlier, right off the bat stock data has six columns, open, high, low, close, and the volume of shares, plus the adjusted values, adjusted open, adjusted high, adjusted low, and adjusted close, which use a special formula to estimate what the stock would really be worth, and from there, there's all kinds of other information you can bring in. so we're only looking at one small aspect, the opening price of the stock, and as you can see, we did a pretty good job: the predicted curve follows the real curve quite well. it has little jumps and dips that don't quite line up, this one here doesn't quite match that one there, but it's pretty darn close, we have the basic shape, and the prediction isn't too far off. you can imagine that as we add more data and look at more aspects of the stock domain, we should get a better representation each time we drill in deeper, though of course this already took half an hour for my computer to train, so running it across all those extra variables would take quite a bit longer, not so good for a quick tutorial like this. and that brings us to the end of this live session on the ai and machine learning full course. hope you liked watching the video. if you have any questions then please feel free to put them in the comments section. thanks for watching, stay safe, and keep learning.