Transcript for:
TensorFlow / Machine Learning Course

Hello everybody, and welcome to an absolutely massive TensorFlow, machine learning and artificial intelligence course. Please stick with me for this short introduction, as I'm going to give you a lot of important information regarding the course content, the resources for the course, and what you can expect after going through it. First, who is this course aimed at? It's aimed at people who are beginners in machine learning and artificial intelligence, or who have a little bit of understanding and want to get better, but who do have basic fundamental knowledge of programming and Python. This is not a course to take if you haven't done any programming before, or if you don't know any Python syntax; it's highly advised that you understand the basic syntax of Python, as I'm not going to be explaining that throughout this course. In terms of your instructor, that's me: my name is Tim, and some of you may know me as Tech With Tim from my YouTube channel, where I teach all kinds of different programming topics. I've also been working with freeCodeCamp and have posted some of my series on their channel as well. Now let's get into the course breakdown and talk about exactly what you're going to learn and what you can expect. Since this course is geared towards beginners and people just getting started in the machine learning and AI world, we're going to start by breaking down exactly what machine learning and artificial intelligence are: the differences between them, and the different types of machine learning, for example reinforcement learning versus neural networks versus simple machine learning. Then we'll get into a general introduction of TensorFlow. For those of you that don't know, TensorFlow is a module developed and maintained by Google, which can be used within Python for a ton of different scientific computing, machine learning and artificial intelligence applications; we'll be working with it through the entire tutorial series. After that general introduction to TensorFlow, we'll get into our core learning algorithms. These are the learning algorithms you need to know before we can get further into machine learning: they build a really strong foundation, they're pretty easy to understand and implement, and they're extremely powerful. After that, we'll get into neural networks, discuss all the different things that go into how neural networks work and how we can use them, and then do a bunch of different examples. Then we'll get into some more complex aspects of machine learning and artificial intelligence: convolutional neural networks, which can do things like image recognition and detection, and then recurrent neural networks, which can do things like natural language processing, chatbots and text processing. Finally, we'll end off with reinforcement learning. Now, in terms of resources for this course, there are a ton, and to make this really easy for you and for me, we're going to be doing everything through Google Colaboratory.
Now, if you haven't heard of Google Colaboratory, it's essentially a collaborative coding environment that runs an IPython notebook in the cloud on a Google machine, where you can do all of your machine learning for free. You don't need to install any packages, you don't need to use pip, you don't need to get your environment set up; all you need to do is open a new Google Colaboratory window and you can start writing code. That's what we're going to be doing in this series. If you look in the description right now, you'll see links to all of the notebooks that I use throughout this guide. So if there's anything you want cleared up, if you want the code for yourself, or if you just want text-based descriptions of the things I'm saying, you can click those links and gain access to them. With that being said, I'm very excited to get started, I hope you guys are as well, so let's go ahead and get into the content. In this first section, I'm going to spend a few minutes discussing the difference between artificial intelligence, neural networks and machine learning. The reason we need to go into this is that we're going to be covering all of these topics throughout the course, so it's vital that you understand what they actually mean and can differentiate between them. That's what we're going to focus on now. Quick disclaimer, just so everyone's aware: I'm using something called Windows Ink, which comes with Windows by default, and I have a drawing tablet here. This is what I'm going to use for some of the explanatory parts where there's no real coding, just to illustrate some concepts and topics to you. I have very horrible handwriting and I'm not artistic whatsoever; programming is definitely more my thing than drawing and doing diagrams, but I'm going to try my best, and this is just the way I find I can convey information best to you guys. So anyway, let's get started and discuss the first topic here, which is artificial intelligence. Artificial intelligence gets a huge amount of hype nowadays, and it's funny because a lot of people actually don't know what it means, or they try to tell people that what they've created is not artificial intelligence when in reality it actually is. The formal definition of AI, and I'm just going to read it off my slide here to make sure I'm not messing it up, is the effort to automate intellectual tasks normally performed by humans. Now, that's a fairly big definition, right? What is considered an intellectual task? Really, that doesn't help us too much. So what I'm going to do is bring us back to when AI was first created, to explain how AI has evolved and what it really started out being. Back around 1950, a question was being asked by scientists and researchers: can computers think? Can we get them to figure things out? Can we get away from hard coding and have a computer think and do its own thing? That was the question being asked, and that's when the term artificial intelligence was coined. Now, back then, AI was simply a predefined set of rules. So if you're thinking about an AI for, say, tic-tac-toe, or an AI for chess, all they would have had back then was predefined rules that humans had come up with and typed into the computer in code.
And the computer would simply execute that set of rules and follow those instructions. There was no deep learning, no machine learning, no crazy algorithms happening; if you wanted the computer to do something, you had to tell it beforehand: if you're in this position and this happens, do this. That's what AI was. A very good AI was simply a very good set of rules, or a ton of different rules, that humans had implemented into some program. You could have AI programs stretching half a million lines of code, just with tons and tons of different rules that had been created for that AI. So just be aware that AI does not necessarily mean anything crazy, complex or super complicated. Essentially, if you're trying to simulate with a computer some intellectual task that a human would do, like playing a game, that is considered AI. Even a very basic artificial intelligence for a tic-tac-toe game that plays against you is still considered AI. And if we think of something like Pac-Man, where we have our little ghost (this will be my rough sketch of a ghost) and our Pac-Man guy, when we consider this ghost's AI, what it does is attempt to figure out how to get to Pac-Man. The way this works is just a very basic pathfinding algorithm. It has nothing to do with deep learning or machine learning or anything crazy, but it is still considered artificial intelligence: the computer is figuring out how to play and do something by following an algorithm. So we don't need anything crazy, stupid complex for something to be considered AI; it simply needs to be simulating some intellectual human behaviour. That's the definition of artificial intelligence. Now, obviously, today AI has evolved into a much more complex field where we have machine learning and deep learning and all these other techniques, which is what we're going to talk about now. I want to start by just drawing a circle here and labelling it AI. This circle is going to define AI, because everything I'm about to put inside of it is considered artificial intelligence. So now let's get into machine learning. I'm going to draw another circle inside of here and label it ML, for machine learning. Notice I put this inside the artificial intelligence circle; that's because machine learning is a part of artificial intelligence. Now, what is machine learning? Well, what we talked about previously was the idea that AI used to just be a predefined set of rules: we would feed in some data, the program would analyze the data with the rules, and then it would spit out some output, which is what we're going to do next. In the classic example of chess, say we're in check: we pass the board information to the computer, it looks at its set of rules, it determines we're in check, and then it moves us somewhere else. So what is machine learning in contrast to that? Well, machine learning is really the first field that figures out the rules for us. Rather than us hard coding the rules into the computer, machine learning attempts to take the data, and what the output should be, and figure out the rules for us.
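Just to make that "predefined set of rules" idea concrete, here is a tiny, made-up sketch of what old-style rule-based AI looks like in code. The board representation and the three rules are completely hypothetical, they're not from any real game engine, but the point is that nothing here is learned from data: every behaviour is a rule a human typed in.

```python
# A minimal sketch of rule-based "AI": every behaviour is a hand-written rule.
def choose_move(board):
    # board is a hypothetical list of 9 cells: "X", "O", or "" for empty.
    # Rule 1: take the centre if it is free.
    if board[4] == "":
        return 4
    # Rule 2: otherwise take the first free corner.
    for corner in (0, 2, 6, 8):
        if board[corner] == "":
            return corner
    # Rule 3: otherwise take any free cell.
    return next(i for i, cell in enumerate(board) if cell == "")

print(choose_move(["X", "", "", "", "", "", "", "", "O"]))  # -> 4 (the centre)
```

However many rules you pile on, the program never gets better on its own; that is exactly the limitation machine learning addresses.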
So you'll often hear that machine learning requires a lot of data, and that you need a ton of examples and input data to really train a good model. The reason for that is that the way machine learning works is it generates the rules for us: we give it some input data, we give it what the output data should be, and it looks at that information and figures out what rules it can generate so that when it looks at new data it can produce the best possible output. That's also why, a lot of the time, machine learning models do not have 100% accuracy, which means they may not get the correct answer every single time. Our goal when we create machine learning models is to raise the accuracy as high as possible, which means making the fewest mistakes possible, because just like a human, our machine learning models, which are trying to simulate human behaviour, can make mistakes. To summarize: the difference between machine learning and basic, rule-based artificial intelligence is that rather than us, the programmers, giving it the rules, it figures out the rules for us. We might not explicitly know what those rules are when we create machine learning models, but we know that we give it some input data and the expected output data, it looks at all of that information, runs some algorithms (which we'll talk about later) and figures out the rules, so that later, when we give it some input data and we don't know the output, it can use the rules it figured out from our examples and all that training data to generate some output. Okay, so that's machine learning. Now that we've covered AI and machine learning, it's time to cover neural networks, or deep learning. This circle goes right inside the machine learning one, and I'm going to label it NN, which stands for neural networks. Neural networks get a lot of hype; they're usually the first thing people want to learn when they get into machine learning, because neural networks are cool and they're capable of a lot. But let's discuss what they really are. The easiest way to define a neural network is as a form of machine learning that uses a layered representation of data. We're not going to fully understand that right now, but as we get further in, that definition should start to make more sense. What I need to illustrate is that in the previous example, where we just talked about machine learning, we essentially had some input, some set of rules in between, and some output. We feed the input to the set of rules, something happens in there, and we get some output; that's what our program does, that's what we get from the model. We pretty much just have two layers: the input layer and the output layer, and the rules are just what connects those two layers together. In neural networks, and what we call deep learning, we have more than two layers. Let me erase all this quickly so I can show you that, and I'll grab another colour, because why not?
If we're talking about neural networks, what we might have (and this will vary, and I'll talk about that in a second) is an input layer, which is our first layer of data, some layers in between that are all connected together, and then some output layer. Essentially, our data is going to be transformed through different layers; different things are going to happen, there are going to be different connections between these layers, and eventually we reach an output. Now, it's very difficult to explain neural networks without going completely in depth, so we'll just cover a few more notes that I have here. Essentially, in neural networks we have multiple layers; that's the way to think of them, and as we see more machine learning you should start to understand this better. A lot of people actually call this a multi-stage information extraction process. I did not come up with that term, I think it's from a book, but essentially what happens is we have our data at the first layer, which is the input information we're going to pass to the model. It then goes to another layer, where it gets transformed into something else using a set of rules and weights that we'll talk about later. It then passes through all of these different layers, where different features of the data (which, again, we'll discuss later) are extracted and figured out, until eventually we reach an output layer, where we combine everything we've discovered about the data into some kind of output that's meaningful to our program. That's about the best I can do to explain neural networks without going to a deeper level. I understand a lot of you probably don't fully understand what they are right now, and that's totally fine; just know that they are a layered representation of data, with multiple layers of information, whereas in standard machine learning we only have one or two layers, and in artificial intelligence in general we don't necessarily have a predefined set of layers at all. Okay, that's pretty much it for neural networks. There's one last thing I'll say about them, which is that they're actually not modelled after the brain. A lot of people seem to think that neural networks are modelled after the brain, with the idea that you have neurons firing in your brain relating to neural networks. There is a biological inspiration for the name and for the way they work, but they are not modelled after the way our brain works. In fact, we don't really know how a lot of the things in our brain operate, so it would be impossible for us to say that neural networks are modelled after the brain: we don't know enough about how information transfers through our brain to say that this is exactly what a neural network is. Anyway, that was the last point there. Okay, so now we need to talk about data. Data is the most important part of machine learning and artificial intelligence, and of neural networks as well.
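Before we dive into data, here is a minimal sketch of what that "layered representation" looks like when written with TensorFlow's Keras API. The layer sizes and the two-feature input are arbitrary choices of mine, just to show an input layer, some hidden layers and an output layer stacked together; it's a sketch of the idea, not a model we'll actually train yet.

```python
import tensorflow as tf

# Input (2 features) -> two hidden layers -> one output value.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(2,)),  # hidden layer 1
    tf.keras.layers.Dense(8, activation="relu"),                    # hidden layer 2
    tf.keras.layers.Dense(1)                                        # output layer
])

model.summary()  # prints each layer, so you can see the data passing through stages
```

Each line in that summary is one of the "stages" in the multi-stage information extraction idea described above.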
It's very important that we understand how important data is and what its different parts are, because they're going to be referenced a lot in any of the resources we use. What I want to do is create an example here: I'm going to make a data set about students' final grades in a school system. We're going to make this a very easy example. All we're going to have in this data set is information about students: their midterm one grade, their midterm two grade, and their final grade. So I'm just going to write Midterm 1, Midterm 2 (and again, excuse my handwriting, it's not the easiest thing to write with this drawing tablet), and then Final. This is going to be our data set, and we'll actually see some similar data sets as we go through some examples later on. For student one, we'll say their midterm one grade is a 70, their midterm two grade is maybe an 80, and their final grade (meaning their final grade for the term, not just the mark on the final exam) is a 77. For the next student, midterm one is a 60, midterm two is a 90, and the final grade, let's say, is an 84. And then we can have a student with lower grades, so a 40 and a 50, and maybe a 38 for the final grade. Now, obviously we could have other information here that we're omitting, like exams, assignments, or whatever else contributed to their grade. But the problem I want to consider is this: given midterm one, midterm two, and final grades, how can I use this information to predict any one of these three columns? So if I were given a student's midterm one grade and their final grade, how could I predict their midterm two grade? This is where we talk about features and labels. Whatever information we have as input, the information we will always have and that we need to give to the model to get some output, is what we call our features. In the example where we're trying to predict midterm two (and let me highlight this so we can see what we're working with), our features, our input information, are going to be midterm one and final, because that's the information we're going to use to predict something: it's the input, it's what we need to give the model. And if we train a model to look at midterm one and final grade, whenever we want to make a new prediction we need to have that information to do so. Now, this midterm two column is what we call the label, or the output. The label is simply what we are trying to look for or predict. So when we talk about features versus labels: features are our input information, the information we have and need to use to make a prediction, and our label is the output information representing what we're looking for. When we feed our features to a model, it will give us a label. That's the point we need to understand. So those are the basics, and now I'm going to talk a little bit more about data and its importance, because we'll get into this more as we continue.
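Here is that same grades example written out as plain Python data, just to pin down which columns are the features and which one is the label when midterm two is what we're trying to predict. The dictionary layout is just one possible way to write it down.

```python
# The toy grades data set from above.
# Features: midterm 1 and final (the inputs we will always have).
features = [
    {"midterm1": 70, "final": 77},
    {"midterm1": 60, "final": 84},
    {"midterm1": 40, "final": 38},
]

# Label: midterm 2 (the output we want the model to predict).
labels = [80, 90, 50]

# During training we have both lists; when predicting later we only have features.
```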
The reason data is so important is that it's the key thing we use to create models. Whenever we're doing AI and machine learning, we need data, pretty much always, unless you're doing a very specific type of machine learning which we'll talk about later. For most of these models we need tons of different data, tons of different examples, and that's because of how machine learning works: we're trying to come up with rules for a data set. We have some input information and some output information, or some features and some labels; we can give that to a model and tell it to start training, and it will come up with rules such that in the future we can give just the features to the model and it should be able to give us a pretty good estimate of what the output should be. So when we're training, we have a set of training data, which is data where we have all of the features and all of the labels. Then, when we test the model or use it later on, we would not have the midterm two information; we wouldn't pass it into the model. We would just pass our features, midterm one and final, and we would get midterm two as the output. I hope that makes sense. It just means data is extremely important: if we feed the model incorrect data, or data we shouldn't be using, that can definitely result in a lot of mistakes, and if we have incorrect output or input information, that will cause a lot of mistakes as well, because that is what the model uses to learn and to figure out what to do with new input information. Anyway, that's enough about data; now let's talk about the different types of machine learning. So, now that we've discussed the difference between artificial intelligence, machine learning and neural networks, and we have a decent idea of what data is and the difference between features and labels, it's time to talk about the different types of machine learning: unsupervised learning, supervised learning and reinforcement learning. These are just different types of learning, different ways of figuring things out, and different kinds of algorithms fit into these categories, within artificial intelligence, within machine learning and within neural networks. The first one we'll talk about is supervised learning, which is essentially what we've already discussed, so I'll just write "supervised" up here (again, excuse the handwriting). What is supervised learning? Well, it's pretty much everything we've already covered: we have some features, and those features correspond to some label, or potentially labels, since sometimes we might predict more than one piece of information. When we have this information, the features and the labels, we pass it to some machine learning model, it figures out the rules for us, and then later on all we need is the features, and it will give us the labels using those rules.
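As a rough preview of what that looks like in code, here is a minimal supervised-learning sketch using Keras on the toy grades data from above. With only three students it won't learn anything meaningful, and the layer choice and epoch count are arbitrary; it's only meant to show the shape of the workflow, features and labels in, a trained model out, predictions from features alone afterwards.

```python
import numpy as np
import tensorflow as tf

# Features: [midterm1, final] for each student. Labels: midterm2.
features = np.array([[70, 77], [60, 84], [40, 38]], dtype=float)
labels = np.array([80, 90, 50], dtype=float)

# A tiny model: one dense layer mapping the two features to one output.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(2,))])
model.compile(optimizer="adam", loss="mse")

# "Supervising": the model predicts, compares with the label, gets nudged, repeats.
model.fit(features, labels, epochs=500, verbose=0)

# Later we only have features for a new student and ask for the label.
print(model.predict(np.array([[65.0, 70.0]])))
```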
Essentially, supervised learning is when we have both of these pieces of information. The reason it's called supervised is that when we train our machine learning model, we pass in the input information, it makes some arbitrary prediction using the rules it already knows, and then it compares that prediction to what the actual answer is, which is the label. So we supervise the model: we say, okay, you predicted that the colour was red, but really the colour of whatever we passed in should have been blue, so we need to tweak you just a little bit so you get a little bit better and move in the correct direction. That's the way this works. For example, say we're predicting a student's final grade: if we predict the final grade is 76 but the actual grade is 77, we were pretty close but not quite there, so we supervise the model and say, hey, we're going to tweak you a little bit, move you in the correct direction, and hopefully get you to 77. That's the way to explain this: you have the features and you have the labels; when you pass in the features, the model, using the rules it's already built, makes a prediction; it then compares that prediction to the label, re-tweaks the model, and keeps doing this with thousands upon thousands of pieces of data until eventually it gets so good that we can stop training. That is supervised learning. It's the most common type of learning, it's definitely the most applicable in a lot of instances, and most machine learning algorithms that are actually used use a form of supervised learning. A lot of people seem to think this is a less complicated, less advanced way of doing things; that is definitely not true. All of the methods I'm going to show you have different advantages and disadvantages, and supervised learning has a massive advantage when you have a ton of information and you have the output for that information as well. But sometimes we don't have that luxury, and that's where unsupervised learning comes in. So hopefully that made sense for supervised learning; I tried my best to explain it. Now let's go into unsupervised learning. If we know the definition of supervised learning, we should be able to come up with a definition of unsupervised learning: it's when we only have features. Given a bunch of features, and absolutely no labels, no output for those features, we want the model to come up with those labels for us. Now, this is kind of weird; you might be thinking, wait, how does that work, and why would we even want to do that? Well, let's take this as an example. We have some axes and some two-dimensional data points, so I'm just going to call the axes x and y, and I'm going to put a bunch of dots on the screen that represent a scatter plot of some of our data. I'm going to put some dots specifically closer to other ones, just so you get the point of what we're trying to do. Okay, so let's say I have this data set; this is what we're working with, and the features in this instance are x and y.
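As a preview of where this scatter-plot example is heading, here is a minimal clustering sketch. I'm using scikit-learn's KMeans purely for brevity rather than anything TensorFlow-specific, and the points are made up, but it shows the key idea: we hand over features only, and the algorithm invents the groups itself, which is exactly what I'm about to describe.

```python
import numpy as np
from sklearn.cluster import KMeans  # used here just for a short example

# Only features: (x, y) points, no labels at all.
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # one blob of points
                   [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])  # another blob

groups = KMeans(n_clusters=2).fit_predict(points)
print(groups)  # e.g. [0 0 0 1 1 1] -- group labels the model came up with on its own
```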
Now, we don't have any output for these data points. What we actually want to do is create some kind of model that can cluster them, which means figure out unique groups in the data and say: you're in group one, you're in group two, you're in group three, you're in group four. We may not necessarily know how many groups we have, although sometimes we do, but we want to group them: we want to figure out which points are similar and combine those together. So hopefully, with an unsupervised machine learning model, we could pass in all of these features and have the model create these groupings. Maybe these are the groups if we wanted four of them, and if we wanted two groupings, we might get groupings that look something like this. Then, when we pass a new data point in, we could figure out which group it belongs to by determining which one it's closest to. This is a rough example, and it's hard to explain all of this without going very deep into the specific algorithms, but unsupervised learning is when you don't have the output information: you want the model to figure out the output for you, and you don't really care how it gets there, you just want it to get there. A good example is clustering data points, and we'll talk about some specific applications of when we might want to use that later on. Just understand that you have the features, you don't have the labels, and you get the unsupervised model to figure it out for you. Okay, so now our last type, which is very different from the two I just explained, is called reinforcement learning. Personally (and I don't even know if I want to try to spell this, because I feel like I'm going to mess it up), I think reinforcement learning is the coolest type of machine learning. This is when you actually don't have any data; instead you have what's called an agent, an environment and a reward. I'm going to explain this very briefly with a very, very simple example, because it's hard to go too deep here. Let's say we have a very basic game, maybe one we made ourselves, where the objective is to get to the flag. That's all it is: we have some ground, you can move left or right, and we want to get to this flag. We want to train some machine learning model that can figure out how to do this. So we call this character our agent, and we call this entire thing here the environment, which I'll write down (and I think I spelt that correctly). And then we have something called a reward, which is essentially what the agent gets when it does something correctly. So let's say the agent takes one step towards the flag, so his new position is over here; I don't want to keep drawing him, so I'm just going to use a dot. Well, he got closer to the flag, so I'm going to give him a positive reward. Let's say he moves closer to the flag again; I give him some more reward, because this time he got even closer, and as he gets closer I'll give him more and more reward. Now, what happens if he moves backwards? Let's erase this.
Let's say that at some point, rather than moving closer to the flag, he moves backwards; well, he might get a negative reward. Essentially, the objective of this agent is to maximize its reward. So if you give it a negative reward for moving backwards, it's going to remember that; it's going to say, okay, at this position, when I moved backwards, I got a negative reward, so if I get to this position again, I don't want to go backwards anymore, I want to go forwards, because that should give me a positive reward. The whole point is that we have this agent that starts off with absolutely no knowledge of the environment, and it starts exploring, with a mixture of random exploration and exploiting some of the things it's figured out so far, to try to maximize its reward. Eventually, when the agent gets to the flag, it will have the highest possible reward it can get, and the next time we plug this agent into the environment, it will know how to get to the flag immediately, because it's figured out that in all these different positions, this is the best place to move: if I get to this position, move there. Again, this is hard to explain without more detailed examples and going into the math, but essentially, just understand that we have the agent, which is the thing that moves around in our environment; we have the environment, which is just what the agent can move around in; and then we have the reward. The reward is something we, as the programmers, need to figure out: a way to reward the agent correctly so that it reaches the objective in the best possible way. The agent simply maximizes that reward: it figures out where it needs to go to maximize it, starting off by randomly exploring the environment, because it doesn't know the rewards it will get at any of the positions, and then, as it explores more areas, it figures out the rules and the way the environment works, and determines how to reach the objective, whatever that is. This is a very simple example; you could train a reinforcement learning model to do this in, you know, half a second. But there are way more advanced examples: there have been examples of reinforcement learning where AIs pretty much figure out how to play games together. It's actually pretty cool, some of the stuff reinforcement learning is doing, and it's a really awesome advancement in the field, because it means we don't need all this data anymore; we can just get the agent to figure out how to do things for us, explore the environment and learn on its own. Now, this can take a really long time or a very short amount of time; it really depends on the environment. But a real application of this is training AIs to play games, as you might be able to tell from what I was explaining here. And yeah, those are the fundamental differences between supervised, unsupervised and reinforcement learning. We're going to cover all three of these topics throughout this course, and it's really interesting to see some of the applications we can actually do with this.
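To make the flag example a bit more concrete, here is a toy sketch of one common reinforcement-learning technique, Q-learning, applied to that idea. The five-square line, the reward values and all the numbers (learning rate, discount, exploration rate, episode count) are arbitrary choices of mine, not anything from a library or from the course notebook.

```python
import random

# The agent stands on a line of 5 squares; the flag is at square 4.
# Moving towards the flag gives +1 reward, moving away gives -1.
N_STATES, FLAG = 5, 4
q_table = [[0.0, 0.0] for _ in range(N_STATES)]  # value of [move left, move right] per square
alpha, gamma, epsilon = 0.5, 0.9, 0.2            # learning rate, discount, exploration rate

for episode in range(200):
    state = 0
    while state != FLAG:
        # Mix of random exploration and using what it has figured out so far.
        if random.random() < epsilon:
            action = random.randint(0, 1)
        else:
            action = q_table[state].index(max(q_table[state]))
        next_state = max(0, state - 1) if action == 0 else min(FLAG, state + 1)
        reward = 1 if next_state > state else -1
        # Q-learning update: nudge the value toward reward plus discounted future value.
        q_table[state][action] += alpha * (reward + gamma * max(q_table[next_state]) - q_table[state][action])
        state = next_state

print(q_table)  # "move right" ends up with the higher value in every square
```

After training, the "move right" entry is larger than "move left" everywhere, which is exactly the "it remembers which direction paid off" behaviour described above.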
So with that being said, I'm going to end what I'm calling module one, which was a general overview of the different topics, some definitions, and building fundamental knowledge. In the next one, we're going to talk about what TensorFlow is, we're going to get into code a little bit, and we're going to discuss different aspects of TensorFlow and the things we need to know to be able to move forward and do more advanced things. So now, in module two of this course, we're going to get a general introduction to TensorFlow: understanding what a tensor is, understanding shapes and data representation, and then how TensorFlow actually works at a bit of a lower level. This is important because, while you can definitely go through and learn how to do machine learning without gaining this knowledge, it makes it a lot more difficult to tweak your models and really understand what's going on if you don't have that fundamental lower-level knowledge of how TensorFlow actually works and operates. So that's exactly what we're going to cover here. Now, for those of you that don't know what TensorFlow is, essentially it's an open-source machine learning library, one of the largest and most well known in the world, and it's maintained and supported by Google. TensorFlow essentially allows us to create machine learning models and neural networks, and all of that, without having to have a very complex math background. As we get further in and start discussing in more detail how neural networks and machine learning algorithms actually work, you'll realize there's a lot of math that goes into this. It starts off fairly fundamental, basic calculus and basic linear algebra, and then it gets much more advanced into things like gradient descent and more regression and classification techniques. A lot of us don't know that math, and we don't really need to, so long as we have a basic understanding of it; we can use the tools that TensorFlow provides to create models, and that's exactly what TensorFlow does. Now, what I'm in right now is Google Colaboratory. I'm going to talk about this more in depth in a second, but what I've done for this whole course is write out, in a lot of detail, everything that I'm going to be covering in each module. So this is the written version of this module, the introduction to TensorFlow; you can see it's not crazy long, but I wanted to do this so any of you can follow along with the text, my lecture notes as I almost want to call them, as I go through the different content. In the description there will be links to all of these different notebooks. This is in something called Google Colaboratory, which again we'll discuss in a second, but you can see here that I have a bunch of text, and then it gets down to some coding sections. To make sure I stay on track, I'm simply going to follow along through this; I might deviate slightly, I might go into some other examples, but this is essentially everything I'm going to be covering in each module. So again, to follow along, click the link in the description. Alright, so what can we do with TensorFlow? Well, I've listed some of the different things here.
So I don't forget: we can do image classification, data clustering, regression, reinforcement learning, natural language processing, and pretty much anything you can imagine with machine learning. Essentially, TensorFlow gives us a library of tools that lets us avoid having to do these very complicated math operations ourselves; it just does them for us. There is a bit we need to know about them, but nothing too complex. Now let's talk about how TensorFlow actually works. TensorFlow has two main components that we need to understand to figure out how operations and math are actually performed: graphs and sessions. The way TensorFlow works is it creates a graph of partial computations. I know this is going to sound a little complicated; just try not to get hung up on the vocabulary and follow along. Essentially, when we write code in TensorFlow, we create a graph. If I create some variable, that variable gets added to the graph, and maybe that variable is the sum of two other variables. What the graph then defines is: we have variable one, which is equal to the sum of variable two and variable three. What we need to understand is that it doesn't actually evaluate that; it simply states that that is the computation we've defined. It's like writing down an equation without actually performing any math: we have the equation there, we know what defines the value, but we haven't evaluated it. We don't know that the value is, say, 7; we just know that it's the sum of vector one and vector two, or the cross product, or the dot product. We've just defined all of the different partial computations, because we haven't evaluated those computations yet, and that is what's stored in the graph. The reason it's called a graph is that different computations can be related to each other. For example, if I want to figure out the value of vector one, but vector one is equal to vector three plus vector four, I need to determine the values of vector three and vector four before I can do that computation, so they're linked together. I hope that makes a little bit of sense. Now, what is a session? Well, a session is essentially a way to execute part of, or the entire, graph. When we start a session, we start executing different aspects of the graph: we start at the lowest level of the graph, where nothing depends on anything else, where we have maybe constant values or something like that, and then we move our way through the graph, doing all of the different partial computations we've defined. I hope this isn't too confusing; I know it's a lot of lingo, but you'll understand it as we go through, and again, you can read through these sections I have in Colaboratory if I'm skipping over anything you don't truly understand. That is the way graphs and sessions work. We won't go too in depth with them, but we do need to understand that this is the way TensorFlow works, and there are times when we can't use a specific value in our code yet, because we haven't evaluated the graph, we haven't created a session and gotten the values, which we might need to do before we can actually use that specific value.
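Here is a small sketch of that graph-then-session idea. Note that this is the old TensorFlow 1.x style of working, which in TensorFlow 2.x is only reachable through the tf.compat.v1 compatibility module (2.x normally evaluates things immediately, so the old behaviour has to be switched on explicitly); the constants 2 and 5 are just placeholder values.

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # switch back to graph/session behaviour

a = tf.constant(2)
b = tf.constant(5)
c = a + b  # this only ADDS the computation to the graph; nothing is evaluated yet

with tf.compat.v1.Session() as sess:
    print(sess.run(c))  # the session actually executes the graph and prints 7
```

Until sess.run is called, c is just the "written-down equation" from the explanation above, not the number 7.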
So that's just something to consider. Alright, so now we're actually going to get into coding: importing and installing TensorFlow. This is where I'm going to introduce you to Google Colaboratory and explain how you can follow along without having to install anything on your computer. It doesn't matter if you have a really crappy computer, or even if you're on an iPhone; you can actually do this, which is amazing. All you need to do is Google "Google Colaboratory" and create a new notebook. What Google Colaboratory is, essentially, is a free Jupyter notebook in the cloud for you. You can open up this notebook, and you can see the file type is .ipynb, which I believe stands for IPython notebook. In here you can write code and write text as well. So this is what's called a Google Colaboratory notebook, and the reason it's called a notebook is that not only can you put in code, you can also put in notes, which is what I've done here with these titles. You can actually use markdown inside of it; if I open up one of these, you can see I've used markdown text to create these sections. And that's how Colaboratory works. What you can also do in Colaboratory is forget about having to install all of these modules: they're already installed for you. When you open a Colaboratory window, Google automatically connects you to one of their servers, one of their machines, that has all of this set up for you, and you can start writing code, executing it on their machine and seeing the result. So for example, if I want to print hello (and I'll zoom in a little bit so you can read this), all I do is create a new code block, which I can do by clicking Code (and I can delete one like that as well), and I hit run. Notice, give it a second, it does take longer than it typically would on your own machine, and we get "hello" popping up here. The great thing about Colaboratory is that we can have multiple code blocks and we can run them in whatever sequence we want. To create another code block, you can do it from up here, or down here, where you get Code and Text, and I can run them in whatever order I want. So I can do print "yes", for example; I run it and we see the output "yes", and then I can print hello one more time. Notice it shows a number on the left-hand side indicating the order in which these code blocks were run. Now, all these code blocks can access each other. For example, if I define a function func that takes some parameter h and just prints h, then if I create another code block down here, I can call func with, say, "Hello". I make sure I run the first block, so we define the function, then we run func, and notice we get the output "Hello". So we can access all of the variables, functions, anything we've defined in other code blocks, from code blocks below them or code blocks that are executed after them. Another thing that's great about Colaboratory is that we can import pretty much any module we can imagine, and we don't need to install it. So I'm not actually going to go through how to install TensorFlow completely.
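Written out, the two cells I described a moment ago look roughly like this; func and the strings are just the throwaway examples from above, nothing more.

```python
# Cell 1 -- define a function; running this cell makes func available to later cells
def func(h):
    print(h)

# Cell 2 -- any cell run afterwards can use what earlier cells defined
func("Hello")  # prints: Hello
```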
There is a little bit on how to install TensorFlow on your local machine inside this notebook, which I'll refer you to, but essentially, if you know how to use pip, it's pretty straightforward: you can pip install tensorflow, or pip install tensorflow-gpu if you have a compatible GPU, which you can check from the link in the notebook. Now, if I want to import something, I can literally just write the import. So I can say import numpy, like this, and usually NumPy is a module that you need to install, but we don't need to do that here; it's already installed on the machine. So again, we hook up to those Google servers and we can use their hardware to perform machine learning, which is awesome and gives you performance benefits when you're running on a lower-end, crappier machine. We can have a look at the RAM and the disk space of the machine: we have about 12 gigs of RAM and about 107 gigabytes of disk space, and we can keep an eye on that if we want. We can also connect to our local runtime, which I believe connects to your local machine, but I'm not going to go through all that; I just want to show you some basic components of Colaboratory. Now, some other things that are important to understand are in this Runtime tab, which you might see me use. Restart runtime essentially clears all of your output and restarts whatever has happened. The great thing with Colaboratory is that since I can run specific code blocks, I don't need to execute all of the code every time I want to run something; if I've just made a minor change in one code block, I can just run that code block, I don't need to run everything before it or even everything after it. But sometimes you want to restart everything and just rerun everything. To do that, you click Restart runtime, which clears everything you have, and Restart and run all will restart the runtime as well as run every single block of code in the order in which it appears in the notebook. So I recommend you open up one of these windows. You can obviously follow along with this notebook if you want, but if you want to type it out on your own and mess with it, open up a notebook and save it; it's very easy, and these are extremely similar to Jupyter notebooks, they're pretty much the same thing. Okay, so that's the Google Colaboratory aspect and how to use it; let's get into importing TensorFlow. Now, this part is going to be specific to Google Colaboratory. You can see here the steps we need to follow to import TensorFlow. Since we're working in Google Colaboratory, they have multiple versions of TensorFlow available: the original 1.x version of TensorFlow and the 2.x version. To specify that we want to use TensorFlow 2.x, just because we're in this notebook, we need to write this line of code at the very beginning of all of our notebooks: %tensorflow_version 2.x. This simply says we want to use TensorFlow 2.x, whatever version that is, and it's only required in a notebook; if you're doing this on your local machine in a text editor, you're not going to need to write it. Once we do that, we typically import TensorFlow under the alias name tf.
To do that, we simply import the TensorFlow module and write "as tf". If you're on your local machine, again, you'll need to install TensorFlow first to make sure you're able to do this, but since we're in Colaboratory we don't need to. Now, since we've specified that we're using version 2.x, when we print the TensorFlow version we can see it says version two, which is exactly what we're looking for; in this case it's TensorFlow 2.1.0. So make sure you print your version and confirm you're using version 2.x, because a lot of what I'm using in this series will not work if you're on TensorFlow 1.x: it's either new in TensorFlow 2.x, or it's been refactored and the names have changed. Okay, so now that we've imported TensorFlow, I'm going to go over to my fresh notebook and do this: we'll just copy these lines over so we have some fresh code and I don't have all this text to deal with. So let's import TensorFlow as tf, and then we can print tf.version and have a look at that. Okay, let's run our code. We can see a message saying TensorFlow has already been loaded, and oh, it says 1.x. If you get this error, it's actually good that I ran into it: when TensorFlow has already been loaded, all you need to do is restart your runtime. So I'm going to Restart and run all, and just click yes. Now we should see that we get version 2.x. Give it a second once it starts running: TensorFlow 2.x selected, we import the module, and there we go, we get version 2.x. Okay, so now it's time to talk about tensors. What is a tensor? The name immediately seems kind of complicated, and you might think, alright, this is confusing, but obviously this is going to be a primary aspect of TensorFlow, considering the name similarity. Essentially, all a tensor is, is a vector generalized to higher dimensions. Now, what is a vector? Well, if you've ever done any linear algebra, or even some basic vector calculus, you should hopefully know what that is, but essentially it's kind of a data point, which is the way I like to describe it. The reason we call it a vector is that it doesn't necessarily have a fixed number of coordinates: if you're talking about a two-dimensional data point, you have maybe an x and a y value, or an x1 value and an x2 value. A vector can have any number of dimensions: it could have one dimension, which simply means just one number; two dimensions, meaning two numbers, like an x and a y value on a two-dimensional graph; three dimensions if we're thinking about a three-dimensional graph, so three values; four dimensions, which we sometimes see with image data; five dimensions with some video data; and we can keep going with vectors. So that's essentially what a tensor is, and I'll just read the formal definition from the actual TensorFlow website to make sure I haven't butchered anything: a tensor is a generalization of vectors and matrices to potentially higher dimensions. Internally, TensorFlow represents tensors as n-dimensional arrays of base data types. We'll understand what that means in a second, but hopefully that makes sense.
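Before we dig into tensors properly, here is roughly what that whole setup looks like put together as the first cell of a notebook. The exact version string printed will depend on what Colab currently ships (2.1.0 was just what appeared for me), and I'm using tf.version.VERSION here to get the plain version string.

```python
# %tensorflow_version 2.x   <- Colab-only magic; run it on its own line in a notebook cell
import tensorflow as tf

print(tf.version.VERSION)  # should start with "2.", e.g. 2.1.0
```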
Since tensors are so important to TensorFlow, they're the main object we're going to be working with, manipulating and viewing, and they're the main object that gets passed around through our program. What we can see here is that each tensor represents a partially defined computation that will eventually produce a value. Just like we talked about with graphs and sessions, when we create our program we'll be creating a bunch of tensors, and TensorFlow will be creating them as well, and those store partially defined computations in the graph. Later, when we actually build the graph and have the session running, we will run different parts of the graph, which means we'll execute different tensors and be able to get different results from them. Now, each tensor has what we call a data type and a shape, and that's what we're going to get into now. A data type is simply what kind of information is stored in the tensor. It's very rare that we see data types other than numbers, although there is a string data type and a few others as well; I haven't included all of them here because they're not that important, but some examples are float32, int32, string and others. The shape is simply a representation of how the tensor is structured in terms of its dimensions, and we'll get to some examples, because I don't want to explain the shape until we can see some examples to really dial it in. But here are some examples of how we would create different tensors. What you can do is simply call tf.Variable, passing the value and the data type your tensor should be. In this case, we've created a string tensor, which stores one string and is of type tf.string (we define the data type second); we have a number tensor, which stores some integer value and is of type tf.int16; and we have a floating point tensor, which stores a single floating point number. These tensors have a scalar shape, which simply means they each hold just one value. A scalar value (and you might hear me say this a lot) simply means one value; when we talk about vector values, that typically means more than one value, and with matrices it goes up from there, but scalar simply means one number. So that's the different data types and how we create tensors. We're not really going to do this very much in our program, but just for some examples, that's how we do it. Since we've imported everything, I can actually run these, although we're not going to get any output from this code, because there's nothing to see. Now we're going to talk about the rank, or degree, of tensors. Another word for rank is degree; they're used interchangeably, and this simply means the number of dimensions involved in the tensor. When we create a tensor of rank zero, which is what we've done up here, we call that a scalar. The reason it has rank zero is that it's simply one thing: there's no dimensionality to it, it's just one value. Whereas here we have an array, and when we have an array or a list, we immediately have at least rank one. The reason for that is that this array can store more than one value in one dimension, right?
So I can put something like "test", then "ok", then "Tim", which is my name, and we can run this. We're not going to get any output here, obviously, but this is what we would call a rank one tensor, because it is simply one list, one array, which means one dimension; again, that's also like a vector. Now, what we're looking at here is a rank two tensor. The reason it's rank two is that we have a list inside of a list, or in this case, multiple lists inside of a list. The way you can determine the rank of a tensor, at least in Python with this representation, is the deepest level of nested lists. Here we can see we have a list inside of a list, and then another list inside of that outer list, so this gives us rank two, and this is what we typically call a matrix. This one is again of type tf.string, so that's the data type for this tensor variable. All of these things we've created are tensors: they have a data type, and they have some rank and some shape, and we're going to talk about the shape in a second. To determine the rank of a tensor, we can simply use the method tf.rank. Notice that when I run this on the rank-two tensor, we get the shape, which is blank, and then we get numpy=2, which simply means this is of rank two. Now, if I do the same for that rank-one tensor and print it out, we get numpy=1, which tells us it's simply rank one. And if I want to check one of the ones from up here, let's try it: we can do tf.rank(number), print that, and we get numpy=0, because that one is rank zero. Okay, so we'll go back to what we had, which was the rank-two tensor; those are the examples we want to look at. Okay, so: shapes of a tensor. This is a little bit different. What a shape tells us is how many items we have in each dimension. In this case, when we look at rank2_tensor.shape (.shape is an attribute of all of our tensors), we get two, two. Looking up here: we have two elements in the first dimension, and then two elements in the second dimension; that's pretty much what this is telling us. Now let's look at the shape of the rank-one tensor: we get three. Because it's only rank one, notice we only get one number, whereas with rank two we got two numbers, telling us how many elements are in each of those lists. So if I go and add another element to just one of these lists and look at the shape... oops, I've got to run this first... oh, we get an error saying it can't convert that to a tensor, because the lists aren't uniform. I need to have the same number of elements in each list; I can't just do what I did there. So we'll add a third element to the other list as well. Now we can run this, we shouldn't get any issues, and let's look at the shape: notice we now get two, three, so we have two lists, and each of those lists has three elements inside of it. Now, I could go ahead and add another list in here if I wanted to, with another three elements. Let's run this, hopefully no errors, looks like we're good, and let's look at the shape again: now we get a shape of three, three, because we have three interior lists, and in each of those lists we have three elements.
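Here is a compact sketch of the rank and shape checks we just walked through; the strings inside the tensors are just placeholder values like the ones I typed on screen.

```python
import tensorflow as tf

# rank 0 (a scalar), rank 1 (a list), rank 2 (a list of lists)
number = tf.Variable(324, dtype=tf.int16)
rank1_tensor = tf.Variable(["test", "ok", "Tim"], dtype=tf.string)
rank2_tensor = tf.Variable([["test", "ok", "yes"], ["test", "yes", "no"]], dtype=tf.string)

print(tf.rank(number))        # rank 0
print(tf.rank(rank1_tensor))  # rank 1
print(tf.rank(rank2_tensor))  # rank 2

print(rank1_tensor.shape)     # (3,)    -> three elements in its one dimension
print(rank2_tensor.shape)     # (2, 3)  -> two lists with three elements each
```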
And that is pretty much how that works. Now, again, we could go even further here: we could put another list inside of each of these lists, which would give us a rank three tensor, and then the shape would be three numbers representing how many elements we have in each of those dimensions. Okay, so changing shape. This is something we need to do a lot when we're dealing with tensors in TensorFlow. Essentially, there are many different shapes that can represent the same number of elements. Up here, we have three elements in a rank one tensor, and here we have nine elements in a rank two tensor. There are ways we can reshape this data so that we keep the same number of elements but in a different shape. For example, I could flatten this, take all of these elements and throw them into a rank one tensor that simply has a length of nine elements. So how do we do that? Well, let me just run this code and have a look. What we've done is create tensor1 with tf.ones, which creates a tensor populated completely with ones, of the shape [1, 2, 3]. Let's print out tensor1 so I can better illustrate this. Look at the shape we have: (1, 2, 3). So we have one interior list, which we're looking at here, then two lists inside of that list, and then each of those lists has three elements. That's the shape we just defined, and there are six elements in total inside here. So there must be a way to reshape this data to keep six elements but in a different shape. In fact, what we can do is reshape this into a [2, 3, 1] shape: we're going to have two lists, three lists inside each of those, and then inside each of those, one element. So let's have a look at that one; actually, let's just print all of them and have a look. When we looked at tensor1 we saw its shape, and now when we look at tensor2, we can see that we have two lists, inside each of those lists we have three lists, and inside each of those we have one element. Now finally, tensor3 has a shape of [3, -1]. What is negative one? When we put negative one here, it tells TensorFlow to infer what that number actually needs to be. So if we define an initial dimension of three, this says, okay, we're going to have three lists, that's our first level, and then TensorFlow needs to figure out, based on how many elements we have, what the next dimension should be. We're doing this with the reshape method, which I haven't even talked about yet; we'll get to it in a second. In this case the inferred dimension is going to be two, so the shape comes out to (3, 2): we have three lists, and inside each of those lists we have two elements. Right, so we have three lists and six elements in total.
That inferred number obviously needs to be two, because, well, three times two gives us six, and that is essentially how you can determine how many elements are in a tensor just by looking at its shape. Now, this is the reshape method: all we need to do is call tf.reshape, give it the tensor and give it the shape we want to change it to, and so long as that's a valid shape, meaning that when we multiply all the numbers in it we get the number of elements in the original tensor, it will reshape it for us and give us that newly shaped data. This is very useful, and we'll actually use it a lot as we go through TensorFlow, so make sure you're familiar with how it works. Alright, so now we're moving on to types of tensors. There are a bunch of different types of tensors that we can use. So far, the only one we've looked at is Variable: we've created tf.Variable tensors and kind of hard coded our own values, which we're not really going to do very much outside of these examples. But we have these different types: Constant, Placeholder, SparseTensor and Variable, and there are actually a few other ones as well. We're not going to talk about all of these in detail, although the difference between constant and variable is important to understand, and we can read this: with the exception of Variable, all of these tensors are immutable, meaning their value may not change during execution. So essentially, when we create any of those other tensors, we have a constant value, which means that whatever we've defined is not going to change, whereas a Variable tensor can change. That's just something to keep in mind: we use Variable when we think we might need to change the value of that tensor later on, whereas if we're using a constant value tensor, we cannot change it; we can obviously copy it, but we can't change it. Okay, so evaluating tensors. We're almost at the end of this section, I know, and then we'll get into some deeper code. There will be some times throughout this guide where we need to evaluate a tensor, of course, and what we need to do to evaluate a tensor is create a session. Now, we're not going to do this very much, but I figured I'd mention it to make sure you're aware of what I'm doing if I start typing this later on. Essentially, sometimes we have some tensor object, and somewhere in our code we actually need to evaluate it to be able to do something else. To do that, all we need to do is use this default template, a block of code where we say with tf.Session() as sess (it doesn't really matter what we call the session), and then we call .eval() on whatever the tensor is named. Calling that will have TensorFlow figure out what it needs to do to find the value of this tensor, evaluate it, and then let us actually use that value. I've put this in here, and you can read through it if you want to understand in more depth how it works. The source for this is straight from the TensorFlow website; a lot of this is copied from there, and I've just added my own spin to it and made it a little bit easier to understand. Okay, so we've done all that. So let's just go in here and do a few examples of reshaping, just to make sure that everyone's on the same page.
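(Before those examples, here's a compact sketch that pulls the reshape call and the evaluation idea together. Note that in TensorFlow 2.x, with eager execution on by default, you can usually just call .numpy() on a tensor; the session template above is the TF1-style way, which in 2.x lives under tf.compat.v1 and only applies to graph-mode code.)

```python
import tensorflow as tf

# Reshaping: the total number of elements has to stay the same.
tensor1 = tf.ones([1, 2, 3])              # shape (1, 2, 3): six ones
tensor2 = tf.reshape(tensor1, [2, 3, 1])  # same six ones, shape (2, 3, 1)
tensor3 = tf.reshape(tensor2, [3, -1])    # -1 tells TensorFlow to infer the 2

# Evaluating: with eager execution you can read the value directly.
print(tensor3.numpy())

# The TF1-style session template mentioned above would look like this:
# with tf.compat.v1.Session() as sess:
#     value = some_tensor.eval()
```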
And then we'll move on to actually talking about some simple learning algorithms. So I want to create a tensor that we can mess with and reshape. What I'm going to do is just say t equals, and we'll use tf.zeros here. tf.ones would create a tensor with all the values set to one, and tf.zeros just gives us a bunch of zeros instead. Let's create some crazy shape and visualize it, say five by five by five by five. So obviously, if we want to figure out how many elements are going to be in here, we multiply these values together, and that's going to be 625, because that's five to the power of four, so five times five times five times five. Let's actually print t and have a look at what this is. So we run this now, and you can see this is the output we're getting. Obviously, this is a pretty crazy looking tensor, but you get the point, and it tells us the shape is (5, 5, 5, 5). Now, watch what happens when I reshape this tensor. If I want to take all of these elements and flatten them out, what I can do is simply say t equals tf.reshape(t, [625]). If we do this and run it, oops, I've got to print t at the bottom after we've done that, and spell the print statement correctly, you can see that now we just get one massive list with 625 zeros in it. And if we wanted to reshape this to something like [125, -1], and maybe we weren't that good at math and couldn't figure out that the last value should be five, we could put a negative one, which means TensorFlow will infer what that dimension needs to be. Now when we look at it, we can see that what we get is 125 little lists with five elements each, call them vectors or whatever you want, and our shape is (125, 5). So that is essentially how that works. That's how we reshape, that's how we deal with tensors and create variables, and how things work in terms of sessions and graphs. Hopefully that gives you enough of an understanding of tensors, shapes, ranks and values so that when we move into the next part of the tutorial, where we're actually writing code, and I promise we're going to be writing some more advanced code, you'll understand how it works. So with that being said, let's get into the next section. So welcome to Module Three of this course. What we're going to be doing in this module is learning the core machine learning algorithms that come with TensorFlow. Now, these algorithms are not specific to TensorFlow, but they are used within it, and we'll use some tools from TensorFlow to implement them. Essentially, these are the building blocks: before moving on to things like neural networks and more advanced machine learning techniques, you really need to understand how these work, because they're used in a lot of different techniques and combined together, and one of them that I'm going to show you is actually very powerful if you use it in the right way. A lot of what machine learning actually is, a lot of machine learning algorithms, implementations, businesses and applications, really just use pretty basic models, because these models are capable of doing very powerful things.
When you're not dealing with anything that's crazy complicated and you just need some basic machine learning, some basic classification, you can use these fundamental core learning algorithms. Now, the first one we're going to go through is linear regression, but we will also cover classification, clustering and hidden Markov models, and those are going to give us a good spread of the different core algorithms. There are tons, literally thousands, of different machine learning algorithms; these are just the main categories you'll come across, and within these categories there are more specific algorithms you can get into. I feel like I need to mention that because I know a lot of you may have seen some different ways of doing things, and this course might show you a different perspective on them. So let me just quickly talk about how I'm going to go through this. It's very similar to before: I have this notebook, as I've talked about, and there is a link in the description. I would recommend that you open it and follow along with what I'm doing and read through the notebook. I will mostly be going through the notebook, and then occasionally, oops, I need to open this up here, I'll go to this untitled tab I have and write some code in it, because most of what I'm going to do is copy code over into there so we can see it all in one block, and then we'll be good to go. And one last note before we really get into it, and I'm sorry I'm talking a lot, but it is important to make you aware of this: you're going to see that we use a lot of complicated syntax throughout this section and the rest of the course in general. I just want to make it extremely clear that you should not have to memorize, or even feel obligated to memorize, any of the syntax that you see. I personally don't have everything here memorized; there's a lot in here that I can't just come up with off the top of my head. When we're dealing with a library as big as TensorFlow, it's hard to memorize all the different components. So just make sure you understand what's happening, but you don't need to memorize it. If you're ever going to need to use any of these tools, you're going to look them up, you're going to recognize that you've used them before, you're going to understand them, and then you can copy that code in and use it in whatever way you need. You don't need to memorize anything that we do here. Alright, so let's go ahead and get started with linear regression. So what is linear regression? It's one of the most basic forms of machine learning, and essentially what we try to do is find a linear correspondence between data points. I'm just going to scroll down here to a good example. What I've done is use matplotlib to plot a little graph, so we can see this one right here, and essentially this is our data set, or at least what we'll call our data set. What we want to do is use linear regression to come up with a model that can give us good predictions for our data points. So in this instance, maybe what we want to do is, given some x value for a data point, predict the y value. Now, in this case, we can see there's some linear correspondence between these data points.
Now, what that means is we can draw something called a line of best fit through these data points that can pretty accurately predict them, if that makes any sense. So I'm going to scroll down here and look at what our line of best fit for this data set actually is. We can see this blue line, and it is pretty much the perfect line of best fit for this data set, and using this line we can actually predict future values in our data set. So essentially, linear regression is used when you have data points that correlate in a linear fashion. Now, this is a very basic example, because we're doing it in two dimensions with x and y, but oftentimes you'll have data points with, say, eight or nine input values, which gives us a nine dimensional data set, and what we'll do is predict one of those values. So in the instance where we were talking about students before, maybe we have a student's first midterm grade and their second midterm grade, and we want to predict their final grade. We can use linear regression to do that, where our input values are the two midterm grades and the output value is the final grade we're looking to predict. If we were to plot that, we would plot it on a three dimensional graph, and we would draw a three dimensional line that represents the line of best fit for that data set. Now, for any of you that don't know what line of best fit means, this is just a definition I got from a website: a line of best fit refers to a line through a scatter plot of data points that best expresses the relationship between those points. So, exactly what I've been trying to explain: when we have data that correlates linearly, what we can do is draw a line through it, and then we can use that line to predict new data points. Because if that line is a good line of best fit for the data set, then hopefully we can just pick some point, find where it would be on that line, and that'll be our predicted value. So I'm going to go into an example now where I start drawing and going into a little bit of math, so we understand how this works on a deeper level, but that should give you a surface level understanding. I'll leave this up, because I was messing with it beforehand: this is a data set I've drawn on here, so we have our x, we have our y, and we have our line of best fit. Now, what I want to do is use this line of best fit to predict a new data point. All these red data points are ones we've trained our model with, the information we gave to the model so that it could create this line of best fit, because essentially all linear regression really does is look at all of these data points and create a line of best fit for them. That's all it does. This algorithm is not that complicated, it's not that advanced, and that's why we start with it here, because it just makes sense to explain. So, as I hope a lot of you know, in two dimensions a line can be defined with the equation y = mx + b. Now, b stands for the y intercept, which is where the line crosses the y axis, so essentially where the line starts. So in this instance, our b value is going to be right here.
So this is going to be b, because that is the y intercept. If we label the axis here, say 1, 2, 3, we might say b is something like 0.4, and I can just pencil that in: 0.4. And then what are m, x and y? Well, x and y stand for the coordinates of a data point, so a point would have some x, y value; in this case we might say it's something like (2, 2.7), that might be the value of this data point. So that's our x and y. And then m stands for the slope, which is probably the most important part. The slope simply defines the steepness of the line of best fit that we've drawn here. Now, the way we calculate slope is rise over run, and rise over run essentially just means how much we went up versus how much we went across. So if you want to calculate the slope of a line, what you can do is draw a right angled triangle anywhere on the line, so just pick two points on it, calculate this vertical distance and this horizontal distance, and then simply divide the distance up by the distance across, and that gives you the slope. I'm not going to go too far into slope, because I feel like you probably understand what that is, but let's pick some values for this line, because I want to show you a real example of the math and how we're going to do this. So let's say our linear regression algorithm comes up with this line. I'm not going to discuss exactly how it does that, although it pretty much just looks at all these data points and finds the line that splits them evenly: essentially, you want the line to be as close to every data point as possible, and you want roughly the same number of data points on the left side and the right side of the line. So in this example, we have a data point on the left, another on the left, one or two that are pretty much on the line, and then two on the right, so this is a pretty good line of best fit, because all of the points are very close to the line and they're split evenly. That's roughly how you come up with a line of best fit. So let's say the equation for this line is something like y = 1.5x + 0.5, just to make it easy. This is going to be the equation of our line. Now, notice that x and y don't have values, and that's because we need to give one of them a value to come up with the other. So if we have either the y value or the x value of some point, and we want to figure out where it sits on the line, we can feed one in, do the calculation, and that gives us the other value. So in this instance, let's say I'm trying to predict something and I'm given the fact that x equals two: I know x equals two, and I want to figure out what y would be. Well, I can use this line to do so. What I would do is say y = 1.5 times 2 plus 0.5, and all you quick math majors out there will get the value 3.5, which means that if x is at two, my predicted data point is here on this line, and I would say, okay, you're telling me x is two, so my prediction is that y is going to be equal to 3.5.
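(Just to make that concrete, here's the same calculation as a tiny Python sketch. The slope and intercept are made-up numbers from my drawing, not anything computed by TensorFlow.)

```python
# Hypothetical line of best fit from the drawing: y = 1.5x + 0.5
m = 1.5   # slope (rise over run)
b = 0.5   # y intercept

def predict_y(x):
    # Given an x value, use the line to predict the corresponding y value
    return m * x + b

print(predict_y(2))  # 3.5, the same prediction we worked out by hand
```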
Because, given the line of best fit for this data set, that's where the point will lie on that line. So I hope that makes sense. You can actually do this the reverse way as well: if I'm just given some y value, say I know my y value is at 2.7 or something, I can plug that in, rearrange the equation, and solve for x. Now, obviously this is a very basic example, because we're doing all of it in two dimensions, but you can do this in higher dimensions as well. Most of the time, what's going to end up happening is you'll have, say, eight or nine input variables, and then one output variable that you're predicting. So long as our data points are correlated linearly, in three dimensions we can still do this. I'm going to attempt to show you this in three dimensions, just to hopefully clear some things up, because it is important to get a grasp and some perspective on the different dimensions. So let's say we have a bunch of data points that look kind of like this; I'm trying my best to draw them in some linear fashion using all the dimensions here, but it is hard, because drawing three dimensions on a two dimensional screen is not easy. Okay, so let's say this is roughly what our data points look like. I would say these correlate linearly pretty well, they kind of go up in one direction, and we don't know the exact scale, so this is approximate. The line of best fit for this data set, and I'll just turn my pen thickness up, might be something like this, right? Notice that this line is in three dimensions: it crosses our x, y and z axes, so we have a three dimensional line. Now, the equation for this line is a little more complicated, and I'm not going to talk about exactly what it is, but essentially what we do is make this line and then say, okay, which value do I want to predict, x, y or z? So long as I have two of the values, I can always predict the other one. If I have the x and y of a data point, that gives me the z, and if I have the z and y, that gives me the x. So, as long as you have all of the coordinates except one, you can always find the missing one, based on the fact that we have this line and we're using it to predict. So I think I'm going to leave it at that for the explanation. I hope that makes sense. Again, just understand that we use linear regression when our data points are correlated linearly. Now, some good examples of linear regression: there's that student example, predicting the final grade, where you would assume that if someone has a low grade now, they'll finish with a lower grade, and if they have a higher grade now, they'll finish with a higher grade. You could also do something like predicting future life expectancy. This is kind of a darker example, but essentially, if someone is older, they're expected to live, you know, not as long, or you could look at health conditions: if someone has a critical illness, then chances are their life expectancy is lower. So that's an example of something that's correlated linearly: essentially, as one thing goes up, the other goes down, or as one thing goes up, the other goes up too.
That's kind of what you need to think of when you think of a linear correlation. The magnitude of that correlation, so how much one thing goes up versus how much the other goes down, is exactly what our algorithm figures out for us; we just need to know to pick linear regression when we think things are going to be correlated in that sense. Okay, so that is enough of the explanation of linear regression. Now we're going to get into actually coding and creating a model, but we first need to talk about the data set we're going to use in the example we'll illustrate linear regression with. Okay, so I'm back in the notebook, and these are the imports we need to start with to actually start programming and getting some stuff done. The first thing we need to do is install sklearn. Even in the notebook you actually need to do this, because for some reason it doesn't come by default, and to do it we just write an exclamation point: !pip install -q sklearn. If you're going to be working on your own machine, again, you can use pip to install it, and I'm assuming you know how to use pip if you're going that direction. Next, since we're in the notebook, we need to say that we're going to use TensorFlow version 2.x, and we do that up here with the percent sign, %tensorflow_version 2.x. Then we have all these imports, which we're going to be using throughout: from __future__ we import absolute_import, division, print_function and unicode_literals, and then the big ones, NumPy, pandas, matplotlib, the IPython display helper, and TensorFlow. I'm actually going to explain what some of these modules are, because I feel like some of you may not know. NumPy is essentially a very optimized version of arrays in Python. What it allows us to do is lots of multi dimensional calculations: if you have a multi dimensional array, which we've talked about before, right, when we had those crazy shapes like (5, 5, 5, 5), NumPy lets us represent data in that form and then very quickly manipulate it and perform operations on it. So we can do things like cross products, dot products, matrix addition and subtraction, element wise addition and subtraction, vector operations; that's what it does for us. It's pretty complex, but we're going to be using it a fair amount. Pandas is kind of a data analytics tool, I almost want to say; I don't know the formal definition of what pandas is, but it allows us to very easily manipulate data: loading data sets, viewing data sets, cutting specific columns or rows out of a data set, visualizing the data. That's what pandas does for us. Matplotlib is a visualization library for graphs and charts, so we'll use that a little lower down when I actually graph some different aspects of our data set. The IPython display import is specific to this notebook; it's just used to clear the output, nothing crazy there. And then obviously we know what TensorFlow is, with this long import here, tensorflow.compat.v2.feature_column as fc, which we'll talk about later, but we need something called a feature column when we create a linear regression model in TensorFlow, so we're going to use that. Okay.
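(For reference, here's roughly what that setup cell looks like. The pip install line and the %tensorflow_version magic only matter inside Google Colaboratory; on your own machine you'd install the packages yourself and skip them.)

```python
# These two lines are Colab-specific setup (skip them elsewhere):
!pip install -q sklearn
%tensorflow_version 2.x

from __future__ import absolute_import, division, print_function, unicode_literals

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import clear_output
import tensorflow as tf
import tensorflow.compat.v2.feature_column as fc
```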
So now that we've gone through all that, we need to start talking about the data set we're going to use for linear regression and for this example, because what we're going to do is actually create this model and start using it to predict values. The data set we're going to use, and I need to read this because I forget exactly what it's called, is the Titanic data set. Essentially, the aim is to predict who is going to survive, or the likelihood that someone will survive, being on the Titanic, given a bunch of information about them. So what we need to do is load in this data set. I know this seems like a bunch of gibberish, but this is how we need to load it: we're going to use pandas, so pd.read_csv from this URL. What this is going to do is take this CSV file, which stands for comma separated values, and we can actually look at it if we want, I think, if I Ctrl+click the link. Let's actually download it and open it up ourselves and have a look at what it is in Excel. So I'm going to bring this up here, you can see that link, and this is what our data set is. We have our columns, which stand for the different attributes, the different features and labels, of our data set. We have survived, which is what we're actually going to be aiming to predict, so we'll call this our label, our output information. Here, a zero means someone did not survive, and a one means someone did survive. Now, just thinking about it on your own for a second, and looking at some of the categories we have up here, can you think about why linear regression would be a good algorithm for something like this? Well, for example, if someone is female, we can kind of assume they're going to have a higher chance of surviving on the Titanic, just because of the way things worked, women and children first, right? And if we look through this data set, we'll see that when we see females, it's pretty rare that they don't survive; as I go through there are quite a few that didn't, but compared to males there's definitely a strong correlation between being female and a higher survival rate. Now, if we look at age, can we think of how age might affect this? Well, I would assume that if someone is much younger, they probably have a higher chance of surviving, because they would be prioritized for lifeboats or whatever it was; I don't know much about the Titanic, so I can't speak to that specifically, I'm just trying to go through the categories and explain why we'd pick this algorithm. Number of siblings, that one might not be as influential, in my opinion. Parch, I don't actually remember what parch stands for, so unfortunately I can't tell you that one, but we'll talk more in a second. Fare, again, I'm not exactly sure what fare stands for here; I'm going to look on the TensorFlow website after this and get back to you. And we have class, which is what class they were in on the boat, so first class, second class, third class; you might think someone in a higher class has a higher chance of surviving. And we have deck.
So this is what deck they were on when it crashed; unknown is pretty common, and then we have all these other decks. You know, if someone was standing on the deck that took the initial impact, we might assume they'd have a lower chance of survival. Embark_town is where they boarded, and then alone: yes or no, were they alone? This one is interesting; we're going to see whether it makes a difference. If someone is alone, is that a higher chance of survival or a lower one? This is what I want you to think about: when we have information and data like this, we don't necessarily know what correlations there might be, but we can assume there's some linear pattern we're looking for, right? Where if something is true, then maybe it's more likely someone survives, and if it's not, maybe it's less likely, and maybe there's no correlation whatsoever. That's what we're going to find out as we build this model. So let me actually look on the TensorFlow website and see if I can figure out what parch and fare are. Let's go up to the top here. Again, a lot of this is straight up copied from the TensorFlow website, I've just added my own stuff to it, so let's see what it says about the different columns and whether it gives any exact explanations. Okay, so I couldn't find what parch or fare stand for; for some reason it's not really explained on the TensorFlow website either, so if you know, leave a comment down below, but it's not that important, we just want to use this data to do a test. So what I've done here is load in my data set, and notice that I've loaded a training data set and a testing data set. We'll talk about this more later, but it's important: I have two different data sets, one to train the model with and one to test the model with. The basic reason we do this is that when we test our model for accuracy, to see how well it's doing, it doesn't make sense to test it on data it has already seen; it needs to see fresh data so we can make sure there's no bias and that it hasn't simply memorized the data we gave it. Now, what I'm doing at the bottom with y_train and y_eval is essentially popping a column off of each data set. If I print out the data set here, and actually I'm going to show you a nice trick with pandas we can use to look at it, I can say dftrain.head() and print that out. I might need to import some stuff above first, so we'll see if this works; let me just do these imports, run them, and wait for that to finish. Okay, so I've selected TensorFlow 2.0, we're importing everything now, it should be done in a second, and now we'll print out the data frame. So essentially, what read_csv does is load the CSV into a pandas DataFrame, which is a specific type of object. Now, we're not going to go into this in detail.
But a DataFrame allows us to view a lot of different aspects of the data and store it in a nice form, as opposed to just loading it in and storing it in a list or a NumPy array, which is what we might do if we didn't know how to use pandas. This is a really nice way to do it: read_csv loads the file into a DataFrame object, which means we can reference specific columns and specific rows in the data frame. So let's run this and have a look at it. Yep, I need to print dftrain.head(), so let's do that, and there we go. This is what our data frame head looks like. Now, head shows us the first five entries in our data set, as well as a lot of the different columns in it. Since we have quite a few columns, it's not showing all of them, it's just giving us the dot dot dot, but we can see this is what the data frame looks like, and this is kind of the internal representation. So we have entry zero, survived zero, survived one, we have male, female, all that. Now, notice that this still has the survived column, okay? Because what I'm going to do is print the data frame head again, so dftrain.head(), after we run these two lines. What this line does is take this entire survived column, all these zeros and ones, remove it from the data frame, and store it in the variable y_train. The reason we need to do that is that we need to separate the values we're going to be predicting from the data that is our input information, our initial data set. Since we're looking for the survived information, we put it in its own variable. We do the same thing for the evaluation data set, which is dfeval, our testing data. And notice that up here, this one was train.csv and this one was eval.csv; they have the exact same form, they look completely identical, it's just that the entries have been somewhat arbitrarily split, so we have a lot of entries in the training set and a smaller number in the testing set that we'll just use to do an evaluation on the model later on. So we pop the labels off: pop removes and returns this column. If I print out the data frame first, just to show you how it's been removed, we can see that we had the survived column here, we popped it, and now the survived column is gone from that data set. That's just important to understand. Now, we can print out some other stuff too. Let's look at y_train, just to make sure we really understand this data. You can see that we have 627 entries, just zeros or ones representing whether someone survived or not. The corresponding indexes in this list, or data frame, correspond to the indexes in the training data frame. What I mean by that is that entry zero in the training data frame corresponds to entry zero in our y_train variable; so if the person at entry zero survived, it would say one here, or in this case, the person at entry zero did not survive. I hope that's clear and I'm not confusing you, but I just want to show one more example to make sure.
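(For reference, the loading and popping we just walked through boils down to a cell roughly like this, continuing from the imports above. The CSV URLs are the ones the TensorFlow Titanic tutorial uses, so check them against the notebook linked in the description.)

```python
# Load the Titanic training and evaluation data into pandas DataFrames.
# (URLs assumed from the TensorFlow Titanic tutorial this notebook follows.)
dftrain = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')
dfeval = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv')

# Pop the label column off of each DataFrame so the inputs and the values
# we want to predict live in separate variables.
y_train = dftrain.pop('survived')
y_eval = dfeval.pop('survived')

print(dftrain.head())  # first five rows, now without the survived column
```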
So we'll print dftrain at row zero, and then we'll print y_train at index zero, oops, hopefully I didn't mess up my brackets, and we'll have a look at it. Okay, so I've just looked up the documentation, because I totally forgot that I couldn't do it the way I had it. If I want to find one specific row in my data frame, what I can do is use .loc: I take my data frame, then .loc, and then whatever index I want. In this case I'm locating row zero, which is this, and then on y_train I'm doing the same thing, locating row zero. Now, what I had before, if I do dftrain and put square brackets after it, what that actually does is reference a specific column. So if I want to look at, say, the column for age, I can do dftrain['age'] and print it out like this, and it gives me all of the different age values. That's how we use a data frame, and we'll see more of it as we go on. Now, let's go back to the other example I had, because I just erased it, where I wanted to show you row zero in the training data frame and then the corresponding value in y_train, so the survival. You can see here that this is what we get from printing dftrain.loc[0], so row zero, this is all the information, and then here this corresponds to the fact that the person at row zero did not survive, because the output is simply the value zero. I know the output looks a bit weird, printing things like Name: 0 and dtype: object, but don't worry about that; it's just pandas including some extra information. Essentially, this just means this person, who was male, 22, and had one sibling, did not survive. Okay, so let's get out of this and close it. We've pretty much already done what I have down here: we can look at the data frame head, which is a little bit of a nicer output when we just have dftrain.head() on its own line, and we get a nice little table; we've already looked at this information, so we know some of the attributes of the data set. Next we can describe the data set: what describe does is give us some overall statistics. Let's have a look: we can see that we have 627 entries, the mean age is about 29, the standard deviation is 12 point whatever, and we get the same kind of information about the other numeric attributes, for example the mean fare, the minimum fare, and so on. If you understand these statistics, great; if you don't, it doesn't really matter. The important thing to look at is typically just how many entries we have, because sometimes we need that information, and sometimes the mean can be helpful as well, because you can get an idea of the average value in the data set, so if there's any bias later on you can figure it out, but it's not crazy important. Okay, so let's have a look at the shape. Just like NumPy arrays and tensors have a shape attribute, so do data frames. If we print dftrain.shape, we get (627, 9), which means we have 627 rows and nine columns, or nine attributes; and yes, that's what it says here, 627 entries and nine features, and we can use attributes and features interchangeably. And we can look at the head of y_train, which we've already looked at before.
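(A quick sketch of those pandas lookups, using the column names from the CSV we opened earlier:)

```python
# Look up a single row by its index label with .loc
print(dftrain.loc[0])      # every feature for the first passenger
print(y_train.loc[0])      # 0 or 1: did that passenger survive?

# Square brackets select a whole column instead of a row
print(dftrain['age'])      # every passenger's age

# A few quick ways to summarize the whole DataFrame
print(dftrain.describe())  # count, mean, std, min/max for the numeric columns
print(dftrain.shape)       # (627, 9): 627 rows, nine features
```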
Looking at the head of y_train also shows us its name, which was survived. Okay, so now what we can actually do is make some graphs from this data. I've taken this code straight from the TensorFlow website, and I wouldn't expect you to write any of it yourself; what we're going to do is create a few histograms and some plots just to look at some correlations in the data, so that when we start creating this model we have some intuition about what to expect. So let's look at age. This gives us a histogram of the ages: we can see there are about 25 people between zero and five, maybe five people between five and ten, and the largest group of people is somewhere between their twenties and thirties, around the mid twenties. This is good information to know, because it's going to introduce a little bit of bias into our linear model, right? So it's worth understanding that we have a large subset there, and that there are some outliers, like one person who's 80 right over here and a few people in their seventies; still, these are important things to understand before we move on to the algorithm. Now let's look at the sex values, so how many female and how many male passengers: we can see there are many more males than females. We can have a look at class, so whether they're in first, second or third class: most people are in third, followed by first and then second. And lastly we can look at, what is this that we're doing, oh, the percentage survival by sex. So we can see how likely a specific sex is to survive just by plotting this: males have about a 20% survival rate, whereas females are all the way up at about 78%. That's important to understand, and it confirms what we were seeing before when we were exploring the data set. You don't need to do this every time you look at a data set, but it is good to build some intuition about it. So this is what we've learned so far: the majority of passengers are in their twenties or thirties, the majority of passengers are male, most are in third class, and females have a much higher chance of survival, which we kind of already knew. Alright, so training and testing data sets. We already went through this, so I'll skim through it quickly. Something we did above is load in two different data sets. The first was the training data set, which had a shape of 627 by nine. What I'm actually going to do is create a code block here and have a look at dfeval.shape, to show you how many entries we have in there. In our testing data set, you can see we have significantly fewer, at 264 entries, or rows, whatever you want to call them. So that's how much data we have to actually test our model with. What we do is use the training data to create the model and then the testing data to evaluate it and make sure it's working properly. This is important: whenever we're building machine learning models, we typically have separate testing and training data. And yeah, that's pretty much it. Now I'm just going to take one second to copy a lot of this code over into the other notebook I have, just so we can see all of it at once, and then we'll be back and get into actually making the model. Okay, so I've copied in some code here.
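(Before digging into that, here's roughly what those exploration and plotting cells look like, borrowed from the TensorFlow tutorial the notebook is based on; in the notebook each of these lives in its own cell so the charts don't draw over each other.)

```python
# Histogram of passenger ages
dftrain.age.hist(bins=20)

# Bar charts for a couple of the categorical columns
dftrain.sex.value_counts().plot(kind='barh')
dftrain['class'].value_counts().plot(kind='barh')

# Survival rate broken down by sex
pd.concat([dftrain, y_train], axis=1).groupby('sex').survived.mean() \
    .plot(kind='barh').set_xlabel('% survive')

# Shape of the evaluation set mentioned above
print(dfeval.shape)  # (264, 9)
```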
I know the feature column code I've just copied into the other notebook seems like a lot of gibberish right now, but I'm going to break down line by line what it's doing and why it's here. First, though, we need to discuss something called feature columns, and the difference between categorical and numeric data. Categorical data is actually fairly common. Looking at our data set, and actually I don't have it open in Excel anymore, so let's open it from my downloads, let's see, downloads, where is this, train, okay, awesome. So we have this Excel sheet here, and we can see what categorical data is: it's anything that's not numeric. For example, unknown, C, first, third, the city names, y and n, right? Anything that takes values from a specific set of categories. For age, the set of values we could have is numeric, so that's different, but for the categorical columns: for sex we can have male or female, and I suppose we could have other, but in this data set we just have male and female; for class we have first, second and third; and for deck we can have unknown, C, A, and I'm sure other letters of the alphabet, but that is still categorical. Now, what do we do with categorical data? Well, we always need to transform this data into numbers somehow, so what we end up doing is encoding it with integer values. For the example of male and female, what we might say, and this is what we're going to do in a second, is that female is represented by zero and male is represented by one. We do this because, although it's interesting to us to know what the actual category is, the model doesn't care: female and male make no difference to it, it just needs to know whether two values are the same or different. So rather than using strings and trying to find some way to pass those in and do math with them, we turn them into integers, zeros and ones. For class, first, second and third, you can probably guess we'll encode those as 0, 1 and 2. Now, this doesn't necessarily need to be in order: third could be represented by one and first by two; it doesn't matter, so long as every third has the same number, every first has the same number, and every second has the same number. And it's the same thing with deck, with embark_town, and with alone. We could even have an instance where every single value in a categorical column is different, in which case we'd end up with, in this data set, 627 different integer encodings; that's fine, we can do that. And actually, we don't really need to do any of this by hand, because you're going to see how TensorFlow can handle it for us. So that's categorical data. Numeric columns are pretty straightforward: they're anything that already has integer or float values, in this case age and fare. And that's what we've done here: we've defined our categorical columns here and our numeric columns here. This is important, because we're going to loop through them, which is what we're doing here, to create something called feature columns.
Feature columns are nothing special; they're just what we need to feed to our linear estimator, our linear model, to actually make predictions. So the steps we've gone through so far are: import, load the data set, explore the data set and make sure we understand it, and define our categorical columns and our numeric columns. I've just hard coded those in, right, sex, n_siblings_spouses, parch, class, deck, embark_town and alone, and the same thing with the numeric columns. Now, for the linear estimator we need to turn these into feature columns, using some slightly advanced syntax that we're going to look at here. We create a blank list, feature_columns, which will store our different feature columns, and we loop through each feature name in the categorical columns. What we do is define a vocabulary, which is equal to the unique values of the data frame at that feature name: so first we'd do sex, then n_siblings_spouses, then parch, then class, and so on. That's what .unique() does: it gets a list of all the unique values in that column. I can actually print this out; I'll put it on a different line, and we'll just take this value and have a look at what it actually is. So let me run all of these in order, and we'll create a new code block while we wait for that to happen; let's see if this installs fast enough, run, run, run. Okay, so now we go to dftrain, and we can see this is what it looks like: these are all the different unique values we had for that specific feature name. Now, rather than the loop variable, let's just put sex in directly and have a look: we can see the two unique values are male and female. And I actually want to look at embark_town as well, to see how many different values it has, so we'll copy that in, and we can see we have Southampton, a city I can't pronounce, the other cities, and unknown. That is how we get the unique values, so that's what that method is doing there. Let's delete this code block, because we don't need it anymore. Alright, so that's what we do, and then down here we say feature_columns.append, so we just add to this list, tf.feature_column.categorical_column_with_vocabulary_list. I know this is a mouthful, but it's something you're just going to look up when you need to use it. Understand that you need to make feature columns for linear regression; you don't need to completely understand how, you just need to know that it's something you need to do, and then you can look up the syntax. What this does is create a column for us, kind of like a NumPy array, that holds the feature name, whichever one we've looped to, and then all of the different vocabulary associated with it. We need this because we have to create these columns so that we can create our model from them, if that makes sense. Our linear model needs to know all of the different columns we're going to use, all of the different values that could appear in each column, and whether each one is a categorical column or a numeric column.
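(Pulling that together, the whole feature column cell looks roughly like this; it follows the TensorFlow Estimator tutorial, the column names come from the Titanic CSV we've been using, and the numeric columns at the bottom are what we cover next.)

```python
CATEGORICAL_COLUMNS = ['sex', 'n_siblings_spouses', 'parch', 'class',
                       'deck', 'embark_town', 'alone']
NUMERIC_COLUMNS = ['age', 'fare']

feature_columns = []
for feature_name in CATEGORICAL_COLUMNS:
    # All the unique values this column can take, e.g. ['male', 'female']
    vocabulary = dftrain[feature_name].unique()
    feature_columns.append(
        tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary))

for feature_name in NUMERIC_COLUMNS:
    # Numeric columns just need a name and a data type
    feature_columns.append(
        tf.feature_column.numeric_column(feature_name, dtype=tf.float32))

print(feature_columns)
```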
In previous examples, what we might have done is change the data set manually, so encode it by hand; TensorFlow can do this for us now in TensorFlow 2.0, so we'll just use that. Okay, so that's what we did with the categorical feature columns. Now, the numeric columns are a little different, actually easier: all we need to do is give the feature name and the data type and create a column with that. Notice we can omit the vocabulary, because for a numeric column there could be an infinite number of possible values. Then I've just printed out the feature columns so you can see what they look like. So we have a vocabulary list categorical column for the number of siblings, and the vocabulary list shows all of the different values it found; then the same thing down here for parch, with its different encodings, and notice they're not necessarily in order, like I was saying before. Let's look at a numeric one: for a numeric column we just have the key, the shape we're expecting, and the data type. So that's pretty much it; we're loading these in, and now it's almost time to create the model. What we're going to do first, though, is talk about the training process, how we actually train a machine learning model like this. Okay, so the training process. The training process for our model is actually fairly simple, at least for a linear model. The way we train the model is we feed it information, those data points from our data set. But how do we do that? Do we just give it everything at once? Well, in our case we only have 627 rows, which isn't much data; we can fit that in the RAM of our computer, right? But what if we're training a huge machine learning model and we have, say, 25 terabytes of data to pass it? We can't load that into RAM; at least, I don't know of any RAM that's that large. So we need a way to load it in what are called batches. The way we actually feed this model is in batches: we don't need to understand exactly how the batching happens under the hood, but what we do is give 32 entries at once to the model. The reason we don't just feed one entry at a time is that it's a lot slower; loading a small batch of 32 at a time increases our speed dramatically. That's more of a lower level detail, so I'm not going to go too far into it. Now that we understand we load the data in batches rather than all at once, we also have something called epochs. What are epochs? Well, epochs are essentially how many times the model is going to see the same data. What often happens is that when we pass the data to our model the first time, the result is pretty bad: it looks at the data and creates a line of best fit, but it's not great, it's not working perfectly. So we use something called an epoch, which means we're just going to feed the model the same data again, but in a different order.
We do this multiple times, so the model looks at the data in a few different orders, sees the same data a few different times, and picks up on patterns, because the first time it sees a new data point it probably doesn't have a good idea of how to make a prediction for it; by feeding it the data more and more, we can get a better prediction. Now, this is where we have to talk about something called overfitting. Sometimes we show the data to our model too many times, to the point where it just straight up memorizes those data points, and then it's really good at making predictions for those exact points, but when we pass it some new data points, like our testing data, it's horrible at classifying them. What we do to help prevent that is start with a lower number of epochs and then work our way up, incrementally increasing it if we find we need more. So that's it for epochs. I will say that this training process applies to most of the different machine learning models we're going to look at: we have epochs, we have batches, we have a batch size, and now we have something called an input function. Now, this is pretty complicated; this is the code for the input function. I don't love that we need to do this, but it's necessary. Essentially, an input function is the way we define how our data is going to be broken into epochs and batches to feed to our model. You're probably never going to need to code one of these from scratch yourself, and this is one I've just taken from the TensorFlow website, pretty much like everything else in this series. What it does is take our data and encode it in a tf.data.Dataset object. This is because our model needs that specific object to work: it needs to see a Dataset object to be able to use the data to create the model. So we need to take this pandas data frame and turn it into that object, and the way we do that is with the input function. So let's see what this is doing. This is make_input_fn, and we actually have a function defined inside of another function; I know that's a bit complicated for some of you, so what I'm going to do is copy this into the other page, because I think it's easier to explain without all the text around it. Let's create a new code block, paste this in, and have a look at what it does; let me just tab this in properly. Okay, so make_input_fn. Our parameters are data_df, which is our pandas data frame; label_df, which is the data frame of labels, so y_train or y_eval, right; num_epochs, which is how many epochs we're going to do, with a default of 10; shuffle, which means are we going to shuffle the data and mix it up before we pass it to the model; and batch_size, which is how many elements we're going to give to the model at once while it's training. Inside this function we have an inner input_function, and we say ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df)).
Now, what this does, and we can read the comment, is create a tf.data.Dataset object with the data and its label. I can't explain to you exactly how this works on a lower level, but essentially we pass a dictionary representation of our DataFrame, which is whatever we passed in here, and then we pass the label DataFrame, which is going to be all those y values, and we create this object. That's what this line of code does. So tf.data.Dataset.from_tensor_slices is just what you're going to use; we can read the documentation: creates a dataset whose elements are slices of the given tensors. The given tensors are sliced along their first dimension; this operation preserves the structure of the input tensors, removing the first dimension of each tensor and using it as the dataset dimension. So you guys can read through the documentation if you want, but essentially what it does is create the dataset object for us. Now, if shuffle, then ds equals ds.shuffle(1000); what this does is just shuffle the dataset, and you don't really need to understand more than that. Then we say the dataset equals dataset.batch with the batch size, which is going to be 32, and then repeat for the number of epochs. So what this is going to do is essentially take our data set and split it into a number of, I don't know what to call them, blocks that are going to be passed to our model. It can do this by knowing the batch size, it obviously knows how many elements there are because that's the dataset object itself, and then it repeats for the number of epochs, so it can figure out how many blocks it needs to split the data into to feed it to my model. Then we return the dataset from this inner function, so we'll return that dataset object, and on the outside return, we actually return this function. So what this exterior function does, and I'm really just trying to break this down so you guys understand, is make an input function: it literally makes a function and returns the function object to wherever we called it from. So that's how that works. Now we have a train input function and an eval input function, and what we need to do to create these is use the function we've defined above. So we say make_input_fn with df_train and y_train, so our DataFrame for training and the labels for that, and you can see the comment, here we will call the input function. And then the eval input function is the same thing, except for the evaluation: we don't need to shuffle the data because we're not training on it, we only need one epoch because we're just evaluating, and we pass the evaluation data set and the evaluation values from y. Okay, so that's it for making the input functions. Now, I know this is complicated, but that's the way we have to do it, and unfortunately if you don't understand it after that, there's not much more I can do; you might just have to read through some of the documentation. Alright, creating the model. We're finally here. I know this has been a while, but I need to get through everything. So, the linear estimator: we're going to copy this, I'm just going to put it in here, and we'll talk about what this does. So linear_est equals tf.estimator.LinearClassifier, and we're giving it the feature columns that we created up here. So that work was not for nothing.
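Before going on with the estimator, here is a minimal sketch of the data-preparation code built up over the last few paragraphs, essentially following the TensorFlow Titanic tutorial this section is based on. The DataFrame and label names (dftrain, dfeval, y_train, y_eval) are assumed from earlier in the notebook.

```python
import tensorflow as tf

# Column names from the Titanic dataset used in this example.
CATEGORICAL_COLUMNS = ['sex', 'n_siblings_spouses', 'parch', 'class',
                       'deck', 'embark_town', 'alone']
NUMERIC_COLUMNS = ['age', 'fare']

feature_columns = []
for feature_name in CATEGORICAL_COLUMNS:
    # Grab every unique value in the column and let TensorFlow handle the encoding.
    vocabulary = dftrain[feature_name].unique()
    feature_columns.append(
        tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary))

for feature_name in NUMERIC_COLUMNS:
    # Numeric columns just need a name and a dtype.
    feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.float32))


def make_input_fn(data_df, label_df, num_epochs=10, shuffle=True, batch_size=32):
    def input_function():
        # Wrap the pandas DataFrame and its labels in a tf.data.Dataset.
        ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))
        if shuffle:
            ds = ds.shuffle(1000)                      # mix up the order of the rows
        ds = ds.batch(batch_size).repeat(num_epochs)   # split into batches, repeat per epoch
        return ds
    return input_function                              # return the inner function object itself


train_input_fn = make_input_fn(dftrain, y_train)
eval_input_fn = make_input_fn(dfeval, y_eval, num_epochs=1, shuffle=False)
```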
We have these feature columns, which define what we should expect for our input data, we pass them to a LinearClassifier object from the estimator module of TensorFlow, and that creates the model for us. Now, this again is syntax, so you don't need to memorize it, you just need to understand how it works: what we're doing is creating an estimator. All of these core learning algorithms use what are called estimators, which are just basic implementations of algorithms in TensorFlow, and again we pass the feature columns. That's how that works. Alright, so now let's get to training the model. Okay, so I'm just going to copy this again; I know you guys think I'm just copying the code back and forth, but I'm not going to memorize the syntax, I just want to explain to you how all this works, and you guys will have all this code so you can mess with it, play with it, and learn on your own that way. So to train is really easy: all we need to do is say linear_est.train and give it that input function. So that input function we created up here, which was returned from make_input_fn; this train_input_fn here is actually equal to a function, it's equal to a function object itself. If I were to call train_input_fn with parentheses, that would actually call the function. That's how this works in Python; it's a slightly complicated bit of syntax, but we pass that function here, and then train will use the function to grab all of the input it needs and train the model. Next, rather than train, we're going to evaluate, and notice that we didn't store the training call in a variable, but we are storing the evaluation result in a variable so that we can look at it. Now, clear_output, which we imported above, is just going to clear the console output, because there will be some output while we're training, and then we can print the accuracy of this model. So let's actually run this and see how it works. This will take a second, so I'll be back once it's done. Okay, so we're back, and we've got a 73.8% accuracy. Essentially what we've done is train the model, you might have seen a bunch of output while you were doing this on your screen, and then we printed out the accuracy after evaluating the model. This accuracy isn't very good, but for our first shot it's okay, and we're going to talk about how to improve it in a second. Okay, so we've evaluated on the data set and stored that in result. I actually want to look at what result is, because obviously you can see we've referenced the accuracy part as if this was a Python dictionary. So let's run this one more time; it's just going to take a second again. Okay, so we printed out result here, and we can see that we actually have a bunch of different values: we have accuracy, accuracy baseline, AUC, and all these different statistical values. Now, these aren't really going to mean much to you guys, but I just want to show you that we do have those statistics, and to access any specific one, since this really is just a dictionary object, we can just reference the key that we want, which is what we did with accuracy. Now notice our accuracy actually changed here, we went to 76%, and the reason for this is, like I said, our data is getting shuffled, it's getting fed in in a different order.
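For reference, a rough sketch of the training and evaluation cells just described, again following the TensorFlow tutorial; train_input_fn and eval_input_fn come from the sketch above, and feature_columns is the list built earlier.

```python
from IPython.display import clear_output

# The estimator is one of TensorFlow's pre-made models.
linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)

linear_est.train(train_input_fn)              # train on the training data
result = linear_est.evaluate(eval_input_fn)   # evaluate on the held-out data

clear_output()                                # hide the training log
print(result['accuracy'])                     # result is a dict of metrics (accuracy, auc, ...)
```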
And based on the order in which it sees data, our model will make different predictions and be trained differently. So if we had another epoch, right, if I changed epochs to, say, 11 or 15, our accuracy would change. Now, it might go up, it might go down; that's something we have to play with as the machine learning developer, because your goal is to get the most accurate model. Okay, so now it's time to actually use the model to make predictions. Up until this point, we've just been doing a lot of work to understand how to create the model: what the model is, how we make an input function, training and testing data, I know, a lot of stuff. Now, to actually use this model and make predictions with it is somewhat difficult, but I'm going to show you how. Essentially, TensorFlow models are built to make predictions on a lot of things at once; they're not great at making predictions on one piece of data, like one passenger you want a prediction for, they're much better at working on large batches of data. You can definitely do it with one, but I'm going to show you how we can make a prediction for every single point that's in that evaluation data set. So right now we looked at the accuracy, and the way we determined the accuracy was essentially by comparing the predictions our model gave versus what the actual results were for every single one of those passengers; that's how we came up with an accuracy of 76%. Now, if we want to actually get predictions from the model and see what those predictions are, what we can do is use a method called .predict. So what I'm going to do is say result equals, and in this case we use the model name, which is linear_est.predict, and inside here what we pass is the input function we used for the evaluation. So just like we needed to pass an input function to train the model, we also need to pass an input function to make a prediction. Now, this input function could be a little bit different, we could modify it a bit if we wanted to, but to keep things simple we'll use the same one for now. So I'm going to use this eval input function, the one we've already created where we did one epoch and no shuffling, because it's just the evaluation set. So inside here we pass eval_input_fn. Now, what we need to do, though, is convert this to a list, just because we're going to loop through it, and I'm actually going to print out this value so we can see what it is before we get to the next step. So let's run this and have a look at what we get. Okay, so we get this logits array, we can see all these different values: we have this array with this value, we have probabilities with this value, we have logistic, all_classes, all this stuff in here. What you hopefully should notice, and I know I'm just whizzing through, is that we have a dictionary that represents each prediction, and I'll see if I can find the end of the dictionary here, one for every single prediction. So since we passed, you know, 267 input entries from this eval input function, what was returned to us is a list of all of these different dictionaries that represent each prediction.
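Here is roughly what that prediction cell looks like as a sketch; indexing into 'probabilities' and comparing against the real outcome are unpacked in the next couple of paragraphs.

```python
# predict() returns a generator of dicts, one per row of the evaluation set,
# so we wrap it in a list to be able to index into it.
result = list(linear_est.predict(eval_input_fn))

print(result[0])                          # one prediction: logits, probabilities, class_ids, ...
print(result[0]['probabilities'][1])      # predicted chance that passenger 0 survived
print(dfeval.loc[0])                      # that passenger's features
print(y_eval.loc[0])                      # what actually happened (1 = survived, 0 = didn't)
```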
So what we need to do is look at each dictionary so that we can determine what the actual prediction was. What I'm going to do is actually just print result[0], because this is a list, so that should mean we can index it, and then we're looking at one prediction. Okay, so this is the dictionary of one prediction. I know this seems like a lot, but this is what we have, this is our prediction. So for logits we get some array, we have logistic in here in this dictionary, and then we have probabilities. What I actually want is the probabilities. Now, since what we ended up having was a prediction between two classes, right, either zero or one, we're predicting either that someone survived or that they didn't survive, and what their percentage chance is; we can see that the percentage of survival here looks like it's 96%, and the percentage that it thinks they won't survive is 3.3%. So if we want to access this, what we need to do is take result at some index, whatever one we want, so we say result[0], and then here we put probabilities, so I'm just going to print that like that, and then we can see the probabilities. So let's run this, and now we see our probabilities are 96 and 3.3. Now, if we want the probability of survival, I think I actually might have messed this up: I'm pretty sure the survival probability is actually the last one, whereas the non-survival is the first one, because zero means you didn't survive and one means you did survive. So that's my bad, I messed that up. If I actually want their chance of survival, that's index one, and if I index one you see we get 3.3%, but if I wanted their chance of not surviving, I would index zero. And that makes sense, because zero represents they didn't survive, whereas one represents they did survive. So that's how we do that, that's how we get it. Now, if we wanted to loop through all of these, we could: we could loop through every dictionary, print every single probability for each person, and we could also look at that person's stats and then look at their probability. So let's see, the probability of surviving in this case is 3.3%, but let's look at the person we were actually predicting for and see if that makes sense. So if I go df_eval.loc[0], we print that and then we print the result, and what we can see is that for this person, who was male and 35, had no siblings, their fare was this, they were in third class, we don't know what deck they were on, and they were alone, they have a 3.3% chance of survival. Now, if we change this index, let's go to two and have a look at this second person and see what their chance of survival is. Okay, so they have a higher percent chance, a 38% chance; they're female and they're a little bit older, so that might be a reason why their survival chance is a bit higher. I mean, we can keep doing this and look through and see what it is, right? If we want to get the actual value, like whether this person survived or didn't survive, what I can do is print df_eval, actually no, it's not going to be df_eval, it's going to be y_eval, yep, and that's going to be .loc[3]. Now, this will give us whether they survived or not. So actually, in this case, that person did survive, but we're only predicting a 32% chance, and you can see that that's represented in the fact that we only have about a 76% accuracy.
Because this model is not perfect. And in this instance, it was pretty bad. It's saying they have a 32% chance of surviving, but they actually get survived. So maybe that should be higher, right? So we could change this number, and go for four. I'm just messing around and showing you guys you know how we use this. So in this one, you know, same thing, this person survived, although, what is it they only were given a 14% chance of survival. So anyways, that is how that works. This is how we actually make predictions and look at the predictions, you understand that now what's happening is I've converted this to a list just because it's actually a generator object, which means it's meant to just be looped through rather than just look at it with a list but that's fine. We'll use a list and then we can just print out you know, result that whatever index probabilities and then one to represent their chance of survival. Okay, so that has been it for linear regression. Now, let's get into classification. And now we are on to classification. So essentially classification Is differentiating between, you know, data points and separating them into classes. So rather than predicting a numeric value, which we did with regression earlier, so linear regression, and you know, the percentage survival chance, which is a numeric value, we actually want to predict classes. So what we're going to end up doing is predicting the probability that a specific data point or a specific entry, or whatever we're going to call it is within all of the different classes it could be. So for example, here, we're gonna use flowers. So it's called the iris, I think it's the iris flower data set, or something like that. And we're gonna use some different properties of flowers to predict what species of flower it is. So that's a difference between classification and regression. Now, I'm not going to talk about the specific algorithm we're going to use here for classification, because there's just so many different ones you can use. But yeah, I mean, if you really care about how they work on a lower mathematical level, I'm not going to be explaining that because it doesn't make sense to explain it for one algorithm when there's like hundreds, and they all work a little bit differently. So you guys can kind of look that up. And I'll tell you some resources and where you can find that. I'm also going to go faster through this example, just because I've already covered kind of a lot of the fundamental stuff in linear regression. So hopefully, we should get this one done a little bit quicker, and move on to the next kind of aspects in this series. Alright, so first steps, load TensorFlow, import TensorFlow, we've done that already. data set, we need to talk about this. So the data set we're using is that Iris flowers data set, like I talked about, and this specific data set separates flowers into three different species. So we have these different species. This is the information we have. So sepal, length, width, petal length, petal width, we're going to use that information, obviously, to make the predictions. So given this information, you know, in our final model, can it tell us which one of these flowers it's most likely to be? Okay. So what we're going to do now is define the CSV call names and the species. So the column names are just going to define, what we're going to have in our data set is like the headers for the columns. species, obviously, it's just the species and we'll throw them there. 
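To make this concrete, here is roughly what these next few cells look like, following the TensorFlow premade-estimator tutorial this example is based on; the download URLs are the ones that tutorial uses, so treat this as a sketch rather than the exact notebook code. Each step is walked through below.

```python
import pandas as pd
import tensorflow as tf

CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

# Download the CSV files locally with Keras.
train_path = tf.keras.utils.get_file(
    "iris_training.csv",
    "https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv")
test_path = tf.keras.utils.get_file(
    "iris_test.csv",
    "https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv")

# Load them into separate DataFrames; row 0 of each file is the header.
train = pd.read_csv(train_path, names=CSV_COLUMN_NAMES, header=0)
test = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0)

# Pop the label column off into its own Series.
train_y = train.pop('Species')
test_y = test.pop('Species')

print(train.head())
print(train.shape)   # (120, 4): 120 rows, 4 features
```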
Alright, so now we're going to load in our data sets. This is going to be different every time you're working with models, depending on where you're getting your data from. In our example, we're going to get it from Keras, which is a submodule of TensorFlow that has a lot of useful data sets and tools we'll be using throughout the series. So keras.utils.get_file, and again don't really focus on this, just understand what it does: it's going to save this file onto our computer as iris_training.csv, grabbing it from this link. And then what we're going to do down here is load the train and test data, and notice this one is training and this one is testing, into two separate DataFrames. So here we're going to use the CSV column names as the names for the columns, the path is whatever we loaded here, and header equals zero, which just means row zero is the header. Alright, so now we'll move down and have a look at our data set, like we've done before. Oh, I've got to run this code first; the CSV column names aren't defined yet, so we're just running things in the wrong order here, apparently. Okay, so let's look at the head. We can see this is what our DataFrame looks like, and notice that our species here are actually defined numerically. So rather than before, when we had to do that thing where we made those feature columns and converted the categorical data into numeric data with those kind of weird TensorFlow tools, this is actually already encoded for us. So zero stands for Setosa, and then one and two obviously stand for these other ones, respectively, and that's how that works. Now, these measurements I believe are in centimeters, the sepal length and width, petal length and width; that's not super important, but sometimes you do want to know that information. Okay, so now we're going to pop those species columns off, like we did before, and separate them into train_y and test_y, and then have a look at the head again. So let's do that and run this. Notice that the species column is gone; again, we've talked about how that works. And then if we want to have a look at the labels, let's just add a new block and say train_y.head, if I can spell head correctly. Okay, so we run head, and we can see this is what it looks like, nothing special, that's what we're getting. Alright, so let's delete that and look at the shape of our training data. I mean, we can probably guess what it is already: we're going to have a shape of four along one axis, because we have four features, and then how many entries do we have? Well, I'm sure this will tell us. So 120 entries and shape four, awesome, that's our shape. Okay, the input function. So we're moving fast here, we're already getting into a lot of the coding. What I'm actually going to do is, again, copy this over into a separate document, and I'll be back in a second with all that. Okay, so input function time. We already know what the input function does, because we used it previously. Now, this input function is a little bit different than before, just because we're changing things slightly: here we don't actually have any, what do you call it, we don't have any epochs, and our batch size is different. So what we've done here, rather than defining make_input_fn like before, is we just have input_fn like this, and the way we pass this input function is going to be a little bit different too. I'll show you; it's a little bit more complex.
But you can see that we've cleaned this up a little bit. So, exactly, we're doing what we do before, we're converting this data, which is our features, which we're passing in here into a data set. And then we're passing those labels as well. And then if we're training, so if training is true, what we're going to do is say data set is equal to the data set dot shuffled. So we're going to shuffle the information, and then repeat that. And that is all we really need to do, we can do data set dot batch at the batch size 256, return that, and we're good to go. So this is our input function. Again, these are kind of complicated, you kind of have to just get experience seeing a bunch of different ones to understand how to actually make one on your own. From now on, don't worry about it too much, you can pretty much just copy the input functions you've created before and modify them very slightly if you're going to be doing your own models. But by the end of this, you should have a good idea of how these input functions work, we will have seen like four or five different ones. And then you know, we can kind of mess with them and tweak them as we go on. But don't focus on it too much. Okay, so input function, this is our input function, I'm not really going to go into much more detail with that. And now our feature columns. So this is, again, pretty straightforward. For the feature columns, all we need to do for this is since they're all numeric feature columns is rather than having to for loops, where we were separating the numeric and categorical feature columns before, we can just loop through all of the keys in our training data set. And then we can append to my feature columns blank list, the feature column dot numeric column, in the key is equal to whatever key we've looped through here. Let me show you what this means in case anyone's confused. Again, you can see when I print my feature columns, we get key equals sepal length, we get our shape, and we get all of that other nice information. So let's copy this into the other one, have a look at our output after this. Okay, so my feature columns for key and train keys. So notice trainers here, train keys, what that does is actually give us all the columns. So this was a really quick and easy way to kind of loop through all the different columns, although I could have looped through CSV column names and just removed the species column to do that, but again, we don't really need to. So for key and trade keys, my featured columns dot append TF, feature column, numeric column key equals key, this was just gonna create those feature columns, we don't need to do that vocabulary thing. And that dot unique because again, these are all already encoded for us. Okay, awesome. So that was the next step. So let's go back here, building the model. Okay. So this is where we need to talk a bit more in depth of what we're actually going to build. So the model for this is a classification model. Now there is like hundreds of different classification models we can use that are pre made in TensorFlow. And so far, what we've done with that linear classifier is that's a pre made model that we kind of just feed a little bit of information to, and it just works for us. Now here, we have two kind of main choices that we can use for this kind of classification tasks that are pre built in TensorFlow, we have a dnn classifier, which stands for a deep neural network, which we've talked about very vaguely, very briefly, and we have a linear classifier. 
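Before getting into the choice of model, here is a sketch of the input function and feature columns just described for the iris data, assuming the train DataFrame from the loading step above.

```python
def input_fn(features, labels, training=True, batch_size=256):
    """Convert the inputs to a tf.data.Dataset; shuffle and repeat only while training."""
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    if training:
        dataset = dataset.shuffle(1000).repeat()
    return dataset.batch(batch_size)


# Every feature here is numeric, so one loop over the column names is enough.
my_feature_columns = []
for key in train.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))
```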
Now, a linear classifier works very similarly to linear regression, except it does classification: rather than a numeric value, we get the probability of the data point being each specific label. But in this instance, we're actually going to go with the deep neural network. That's simply because the TensorFlow website recommends the deep neural network for this as the better choice, and all of this is building off of the TensorFlow website, because the code is very similar; I've just added my own spin and explained things more in depth. Typically, when you're creating machine learning apps, you'll mess around with different models and tweak them, and you'll notice that it's not that difficult to change models, because most of the work comes from loading and pre-processing our data. Okay, so what we need to do is build a deep neural network with two hidden layers, with 30 nodes in the first and 10 nodes in the second. Now, I'm going to draw out the architecture of this neural network in just one second, but I want to show you what we've done here. We said classifier equals tf.estimator, and this estimator module just stores a bunch of pre-made models from TensorFlow; in this case, DNNClassifier is one of those. What we need to do is pass our feature columns, just like we did to our linear classifier, and now we need to define the hidden units. The hidden units are essentially us building the architecture of the neural network. So like you saw before, we had an input layer, we had some middle layers called our hidden layers in a neural network, and then we had our output layer. I'm going to explain neural networks in the next module, so this will all click and make sense; for now, we've arbitrarily decided on 30 nodes in the first hidden layer and 10 in the second, and the number of classes is going to be three. That's something we need to decide: we know there are three classes of flowers, so that's what we've defined. Okay, so let's copy this and go back to the other page here, and that is now our model. And now it is time to talk about how we can actually train the model, which is coming down here. Okay, so I'm going to copy this, I'm going to paste it over here, and let's just dig through it, because this is a bit more complicated than the code we usually work with. I'm also going to remove these comments just to clean things up. So we've defined the classifier, which is a deep neural network classifier, and we have our feature columns, hidden units, and number of classes. Now, to train the classifier, we have this input function here, and this input function is different than the one we created previously. Remember, what we had previously was make_input_fn, where we defined another function inside of it and actually returned that function from the outer function. I know, complicated; if you're not a Python pro, I don't expect that to make perfect sense. But here we just have one function; we are not returning a function from another function, it's just one function. So when we want to use this to train our model, what we do is create something called a lambda. Now, a lambda is an anonymous function that can be defined in one line: when you write lambda, what that means is, essentially, this is a function.
And whatever's after the colon is what this function does. Now, this is a one line function. So like if I create a lambda here, right? And I say lambda, print, hi, and I said, x equals lambda. And I called x like that. This works. This is a valid line of syntax. Actually, I want to make sure that I'm not just like messing with you when I say that, and that this is actually correct. Okay, so sorry, I just accidentally trained the model. So I just commented that out. You can see we're printing Hi, right at the bottom of the screen. I know, it's kind of small, but it does say hi, that's how this works. Okay, so this is a cool thing. If you haven't seen this in Python before, that's what a lambda does, allows you to define a function in one line. Now, the thing that's great about this is that we can say like, you know, x equals lambda, and here put another function, which is exactly what we've done with this print function. And that means when we call x, it will, you know, execute this function, which will just execute the other function. So it's kind of like a chain where you call x, x is a function. And inside that function, it does another function, right? It just like calling a function from inside a function. So what is lambda doing here? Well, since we need the actual function object, what we do is we define a function that returns to us a function. So this actually just like it calls this function, when you put this here, now, there's no I can't, it's very difficult to explain this, if you don't really understand the concept of lambdas. And you don't understand the input functions. But just know we're doing this because of the fact that we didn't embed another function and return the function object. If we had done that, if we had done that, you know, input function that we created before where we had the interior function, then we wouldn't need to do this, because what would happen is we would return the input function, right like that, which means when we passed it into here, it could just call that directly, it didn't need to have a lambda. Whereas here, though, since we need to just put a lambda, we need to define what this is. And then and then this works. That's just this, there's no other way to really explain this. So yeah, what we do is we create this input function. So we pass we have train, we have train, why we have training equals true, and then we do steps equals 5000. So this is similar to an epoch, except this is just defining a set amount of steps we're going to go through. So rather than saying, like, we'll go through the data set 10 times, we're just gonna say we'll go through the data set until we've hit 5000 numbers like 5000, things that have been looked at. So that's what this does with that train. Let's run this and just look at the training output. From our model, it gives us some like, things here, we can kind of see how this is working. Notice that if I can stop here for a second, it tells us the current step, it tells us the loss, the lowest, the lower this number, the better. And then it tells us global steps per second. So how many steps we're completing per second. Now at the end here, we get final step, loss of 39, which is pretty high, which means this is pretty bad. But that's fine. This is kind of just our first test at training a neural network. So this is just giving us output, while it's training to kind of see what's happening. 
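Here is the lambda idea in isolation, followed by a sketch of the classifier definition and the training call that was just run; classifier, input_fn, train and train_y are the names assumed from the sketches above.

```python
# A lambda is just an anonymous one-line function.
x = lambda: print("hi")
x()   # prints "hi"

# Two hidden layers (30 and 10 nodes) and 3 output classes, one per species.
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=[30, 10],
    n_classes=3)

# input_fn takes arguments, so we wrap the call in a lambda: the estimator then
# gets a zero-argument function it can call itself whenever it needs data.
classifier.train(
    input_fn=lambda: input_fn(train, train_y, training=True),
    steps=5000)
```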
Now, in our case, we don't really care because this is a very small model, when you're training models that are massive and take terabytes of data, you kind of care about the progress of them. So that's when you would use kind of that output, right? And you would actually look at that. Okay, so now that we've trained the model, let's actually do an evaluation on the model. So we're just going to say classifier dot evaluate. And what we're going to do is a very similar thing to what we've done here is just past this input function, right, like here with a lambda once again, and reason we add the lambda when we don't have this like, double function going on, like a nested function, we need the lambda. And then in here, what we do is rather than passing train and train why we're gonna pass test, I believe, and I think it's, I just call it test why. Okay, and then for training, obviously, this is false. So we can just set that false like that. I'm just gonna look at the other screen to make sure I didn't mess this up. Because again, I don't remember the syntax. Yeah, so class classifier, dot evaluate TEST TEST why looks good to me. We'll take this print statement just so we get a nice output for our accuracy. Okay, so let's look at this. Again. We're gonna have to wait for this to train. But I will show you a way that we don't need to wait for this to train Every time and one second, and I'll be right back. Okay, so what I'm actually going to do, and I'm just kind of paused like the execution of this code is throw this in the next block under. Because the nice thing about Google Collaboratory is that I can run this block of code, right, I can train all this stuff, which is what I'll run now while we're talking just so it happens. And then I can have another code block kind of below it, which I have here. And it doesn't matter, I don't need to rerun that block every time I change something here. So if I change something in any lower blocks, I don't need to change the upper block, which means I don't need to wait for this to train every time I want to do an evaluation on it. Anyways, so we've done this, we can test we got test y, I just need to change this instead of eval result. Actually, I need to say eval underscore result equals classifier dot evaluate, so that we can actually store this somewhere and get the answer. And that will print this and notice this happens much, much faster, we get a test accuracy of 80%. So if I were to retrain the model, chances are this accuracy would change, again, because of the order in which we're seeing different flowers. But this is pretty decent, considering we don't have that much test data. And we don't really know what we're doing right? We're kind of just messing around and experimented for right now. So to get 80% is pretty good. Okay, so actually, what am I doing? We need to go back now and do prediction. So how am I going to predict this for specific flowers. So let's go back to our core learning algorithms. And let's go to predictions. Now, I've written a script already, just to save a bit of time that allows us to do a prediction on any given flower. So what I'm going to do is create a new block down here, code block and copy this function. And now we're going to digest this and kind of go through this on our own to make sure this makes sense. 
But what this little script does is allow the user to type in some numbers, so the sepal length and width and the petal length and width, and then it will spit out what the predicted class of that flower is. We could do a prediction on every single one of our data points, like we did previously, and we already know how to do that, I showed you that with linear regression; but here I just wanted to do it on one entry. So what do we do? I start by creating an input function, and it's very basic: we have a batch size of 256, and all we do is take some features and create a dataset from those features as a dict, then batch with the batch size. Notice we don't give any y values, right, we don't give any labels; the reason we don't do that is because when we're making a prediction, we don't know the label. We actually want the model to give us that answer. So here, I wrote down the features, I create a predict dictionary, because I'm going to add things to it, and then I prompt the user with a print statement: please type numeric values as prompted. Then, for each feature in features, we set valid equal to true, and while valid, we do val equals input of the feature name plus a colon. So this just means that for each feature, we're going to wait until we get some valid response. Once we get a valid response, what we're going to do is add that to our dictionary: we say predict at that feature, so whatever that feature was, sepal length, sepal width, petal length, or petal width, is equal to a list that holds, in this instance, whatever that value was. Now, the reason we need to do this is because, again, the predict method from TensorFlow works on predicting for multiple things, not just one value. So even if we only have one value we want to predict for, we need to put it inside of a list, because it's expecting that we'll probably have more than one value, in which case we would have multiple values in the list, each representing a different row or a new flower to make a prediction for. Okay, now we say predictions equals classifier.predict, and in this case the input function is a lambda calling that predict input function, which is the one up here. Then we say for prediction dictionary in predictions, because remember, every prediction comes back as a dictionary, and we say the class ID is equal to the class_ids of the prediction dictionary at index zero. And these are, well, I don't know exactly how to explain this; we'll look at it in a second, I'll go through that. Then we have the probability, which is equal to the prediction dictionary's probabilities at that class ID. Then we print "prediction is" with this format thing that I just took from TensorFlow, and it's going to be the species at that class ID, and then 100 times the probability, which gives us an actual percentage value. We're going to digest this, but let's run it right now and have a look. So, please type numeric values as prompted: sepal length, let's type like 2.4, sepal width 2.6, petal length, let's just say that's like 6.5, and petal width, like 6.3. Okay, so then it calls this and it says the prediction is Virginica, I guess that's how you say it, that's the class we're going with, and it says there's an 86.3% chance that that is the right prediction. So yeah, that is how that works. So that's what this does.
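For reference, the little prediction script being walked through looks roughly like this; it is adapted and slightly simplified from the TensorFlow tutorial version (the input-validation loop in particular is tidied up), so treat it as a sketch rather than the exact notebook code.

```python
def predict_input_fn(features, batch_size=256):
    # No labels here -- we want the model to tell us the answer.
    return tf.data.Dataset.from_tensor_slices(dict(features)).batch(batch_size)

features = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']
predict = {}

print("Please type numeric values as prompted.")
for feature in features:
    val = None
    while val is None:
        try:
            val = float(input(feature + ": "))   # keep asking until we get a number
        except ValueError:
            val = None
    predict[feature] = [val]   # a list, because predict() expects a batch of rows

predictions = classifier.predict(input_fn=lambda: predict_input_fn(predict))
for pred_dict in predictions:
    class_id = pred_dict['class_ids'][0]                 # which class the model thinks it is
    probability = pred_dict['probabilities'][class_id]   # how confident it is in that class
    print('Prediction is "{}" ({:.1f}%)'.format(SPECIES[class_id], 100 * probability))
```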
I wanted to give a little script I wrote most of this, I mean, I stole some of this from TensorFlow, but just to show you how you actually predict on one value, so let's look at these prediction dictionary because I just want to show you what one of them actually is. So I'm going to say print pred underscore dict, and then this will allow me to actually walk through what class IDs our probabilities are and how I've kind of done this. So let's run this up the length scales just go like 1.4 2.3. I don't know what these values are going to end up being. And we get prediction is same one with 77.1%, which makes sense because these values are similar kind of in difference to what I did before. Okay, so this is the dictionary. So let's look for what we were looking for. So probabilities, notice we get three probabilities, one for each of the different classes. So we can actually say what you know the percentages for every single one of the predictions, then what we have is class IDs. Now, class IDs, what this does is tell us what class ID it predicts is actually the flower right? So here, it says two, which means that this probability is 77%. That's at index two in this array, right? So that's why this value is two. So it's saying that that class is two, it thinks it's class two, like that's whatever was encoded in our system is two. And that's how that works. So that's how I know which one to print out is because this tells me it's class two. And I know for making this list all the way back up here if I could get rid of this output. Whereas when I say species, that number two is virginica, or I guess that's how you say. So that is what the classification is. That's what the prediction is. So that's how I do that. And that's how that works. Okay, so I think that is pretty much it for actually classification. So it's pretty basic, I'm going to go and see if there's anything else that I did for classification in here. Okay, so here, I just put some examples. So here's some example input expected classes. So you guys could try to do these if you want. So for example, on this one, sepal, length sepal width. So for 5.1 3.3 1.7 and 0.5, the output should be setosa. For 5.9 3.0 4.2 1.5, it should be this one. And then obviously, this for this, just so you guys can mess with them if you want. But that's pretty much it for classification, and now on to clustering. Okay, so now we're moving on to clustering. Now, clustering is the first unsupervised learning algorithm that we're going to see in this series. And it's very powerful. Now, clustering only works for a very specific set of problems. And you use clustering when you have a bunch of input information or features, but you don't have any labels or open information. Essentially, what clustering does, is finds clusters of like data points, and tells you the location of those clusters. So you give a bunch of training data, you can pick how many clusters you want find. So maybe we're going to be classifying digits, right, handwritten digits, using k means clustering. In that instance, we would have 10 different clusters for the digits zero through nine, and you pass all this information. And the algorithm actually finds those clusters in the data set for you, we're gonna walk through an example it'll make sense. But I just want to quickly explain the basic algorithm behind k means is essentially the set of steps because I'm going to walk you through them and with a visual example. 
So we're going to start by randomly picking k points to place a K centroids. Now a centroid stands for where our current cluster is kind of defined. And we'll see in a second, the next step is we're going to assign all of the data points to the centroids by distance. So actually, now that I'm talking about this, I think it just makes more sense to get right into the example because if I keep talking about this, you guys are probably just getting be confused, although I might come back to this just to reference those points. Okay, so let's create a little graph like this in two dimensions for our basic example. And let's make some data points here. So I'm just gonna make them all red. And you're gonna notice that I'm gonna make this kind of easier for ourselves by putting them in like their own unique little groups, right? So actually, we'll add one up here, then we can add some down here, and down here. Now the algorithm starts for K means clustering. And you'll understand how this works as we continue by randomly picking k centroids. I'm going to denote a centroid by a little filled in triangle like this. And essentially what these are, is where these different clusters currently exist. So we start by randomly picking K, which is what we've defined. So let me in this instance, we're going to say k equals 3k, centroid, wherever. So maybe we put one, you know, somewhere like here, you know what I might not bother filling these in, because we're going to take a while, maybe we put one here. Maybe we end up putting one over here. Now, I've kind of put them close to where the clusters are. But these are going to be completely random. Now what happens next is each group, or each data point, is assigned to a cluster by distance. So essentially, what we do is for every single data point that we have, we find what's known as the Euclidean distance, or it actually could be a different distance you'd use like Manhattan distance if you guys know what that is. To all of the centroids. So let's say we're looking at this data point here, what we do is find the distance to all of these different centroids. And we assign this data point to the closest centroid. So the closest one by distance, now in this instance is looking like it's going to be a bit of a tie between this centroid and this centroid, but I'm going to give it to the one on the left. So what we do is we're going to say this is now part of this central. So I'm calling this like, let's just say this is centered one, this is century two, and this is centroid three, than this now is going to be a part of centroid one because it's closest to centroid one. And we can go through and we do this for every single data point. So obviously, we know all of these are going to be our ones, right. And we know these are going to be our two, so two to two. And then these are obviously going to be our three. Now, I'm actually just going to add a few other data points, because I want to make this a little bit more sophisticated, almost, if that makes any sense. So add those data points here, we've an add one here. And that will give these labels. So these ones are close. So I'm going to say this one to one, I'm going to say this one's two, I know it's not closest to it. But just because I want to do that for now. We'll say two for that. And we'll say three here. Okay, so now that we've done that we've labeled all these points, what we do is we now move these centroids that we've defined into the middle of all of their data points. 
So what I do is I essentially find it's called center of mass, the center of mass between all of the data points that are labeled the same. So in this case, these will be all the ones that are labeled the same. And I take this centroid, which I'm going to have to erase, get rid of it here, and I put it right in the middle. So let's go back to blue. And let's say the middle of these data points ends up being somewhere around here. So we put it in here, and this is what we call center mass. And this again, would be centroid two. So let's just erase this. And there we go. Now we do the same thing with the other centroid. So let's remove these ones, these ones. So for three, I'm saying it's probably going to be somewhere in here. And then for one, our center mass is probably going to be located somewhere about here. Now what I do is I repeat the process that I just did, and I reassign all the points now to the closest center. So all these points are labeled one, two, all that, you know, we can kind of remove their labels, and this is just going to be great me trying to erase the labels, I shouldn't have wrote them on top. But essentially, what we do is we're just going to be like reassigning them. So I'm going to say okay, so this is two, and we just do the same thing as before, find the closest distance. So we'll say you know, these can stay in the same cluster, maybe this one actually here gets changed to one now, because it's closest to centroid one. And we just reassign all these points. And maybe you know this one. Now, if it was to before, let's say like this one's one, and we just reassign them. Now we repeat this process of finding the closest, or assigning all the points that are closest centroid, moving the centroid into the center of mass. And we keep doing this until eventually we reach a point where none of these points are changing which centroid they're part of. So eventually, we reach a point where I'm just gonna erase this and draw like a new graph, because it'll be a little bit cleaner. But what we have is, you know, like a bunch of data points. So we have some over here, some over here, maybe we'll just put some here. And maybe we'll do like a k equals, for example, for this one, and we have all these centroids. And I'll just draw these centroids with blue, again, that are directly in the middle of all of their data points. They're like as in the middle as they can get, none of our data points have moved. And we call this now our cluster. So now we have these clusters, we have these centroids, right, we know where they are. And what we do is when we have a new data point that we want to make a prediction for or figure out what cluster, it's a part of, what we do is we will plot that data points. So let's say it's this new data point here, we find the distance to all of the clusters that exist, and then we assign it to the closest one. So obviously, it would be assigned to that one. And we can do this for any data point, right. So even if I put a data point all the way over here, well, it's closest cluster is this so it gets assigned to this cluster. And my output will be whatever this label of this cluster is. And that's essentially how this works. So you're just clustering data points, figuring out which ones are similar. And there's a pretty basic algorithm, I mean, you draw your little triangle, you find distance from every point in the triangle, or to all of the triangles actually. 
And then what you do is just simply assign those values to that centroid, you move that centroid to the center of mass, and you repeat this process constantly, until eventually you get to a point where none of your data points you're moving. That means you found the best clusters that you can, essentially. Now the only thing with this is you do need to know how many clusters you want for K means clustering, because k is a variable that you need to define, although there is some algorithms that can actually determine the best amount of clusters for a specific data set. But that's a little bit beyond what we're going to be focused on focusing on right now. So that is pretty much clustering. There's not really much more to talk about it, especially because we can't really code anything for it now. So we're going to move on to hidden Markov models. Now hidden Markov models are way different than what we've seen so far. We've been using kind of algorithms that rely on data. So like k means clustering, we gave a lot of data and we don't clustered all those data points found those centroids. Use those centroids to find where new data points should be. Same thing with linear regression and classification. Whereas hidden Markov models, we actually deal with probability distributions. Now, example we're going to go into here and kind of I have to do a lot of examples for this, because it's a very abstract concept is a basic weather model. So what we actually want to do is predict the weather on any given day, given the probability of different events occurring. So let's say we know, you know, maybe in like a simulated environment or something like that this might be an application, that we have some specific things about our environment, like we know, if it's sunny, there's an 80% chance that the next day, it's going to be sunny again, and a 20% chance that it's going to rain. Maybe we know some information about sunny days and about cold days. And we also know some information about the average temperature on those days. Using this information, we can create a hidden Markov model that will allow us to make a prediction for the weather in future days, given kind of that probability that we've discovered. Now, you might be like, Well, how do we know this? Like, how do I know this probability, a lot of the times you actually do know the probability of certain events occurring, or certain things happening, which makes these models really good. But there's some times where what you actually do is you have a huge data set, and you calculate the probability of things occurring based on that data set. So we're not going to do that part, because that's just kind of going a little bit too far. And the whole point of this is just to introduce us to some different models. But in this example, what we will do is use some predefined probability distributions. So let me just read out the exact definition of a hidden Markov model and start going more in depth. So the hidden Markov model is a finite set of states, each of which is associated with a generally multi dimensional probability distribution. Transitions among the states are governed by a set of probabilities called transition probabilities. So in a hidden Markov model, we have a bunch of states. Now in the example I was talking about with this weather model, the states we would have is hot day, and cold day. 
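Before going deeper into hidden Markov models, a quick aside on the k-means procedure described above: even though this course doesn't code clustering, the steps just walked through (pick centroids, assign points by distance, move each centroid to the centre of mass, repeat until nothing changes) can be sketched in plain NumPy. This is purely illustrative and not part of the course notebooks.

```python
import numpy as np

def k_means(points, k, iterations=100):
    """Basic k-means on an (n, d) array of points."""
    # 1. Randomly pick k of the points as the starting centroids.
    centroids = points[np.random.choice(len(points), k, replace=False)]
    for _ in range(iterations):
        # 2. Assign every point to its closest centroid (Euclidean distance).
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # 3. Move each centroid to the centre of mass of its assigned points.
        new_centroids = np.array([
            points[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)])
        # 4. Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```

That's all clustering really is; now, back to those hidden weather states.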
Now, these are what we call hidden because never do we actually access or look at these states, while we interact with the model, in fact, when we look at is something called observations. Now, at each state, we have an observation, I'll give you an example of an observation. If it is hot outside team has an 80% chance of being happy. If it is cold outside, Tim has a 20% chance of being happy. That is an observation. So at that state, we can observe the probability of something happening during that state is x, right? Or is y or whatever it is. So we don't actually care about the States. In particular, we care about the observations we get from that state. Now in our example, what we're actually going to do is we're going to look at the weather as an observation for the state. So for example, on a sunny day, the weather has, you know, the probability of being between five and 15 degrees Celsius with an average temperature of 11 degrees. That's like that's a probability we can use. And I know this is slightly abstract, but I just want to talk about the data we're going to work with here, I'm going to draw out a little example go through it, and we actually get into the code. So let's start by discussing the type of data we're going to use. So typically, in previous ones, right, we use like hundreds, if not, like 1000s of entries, or rows or data points for our models to train for this. We don't need any of that. In fact, all we need is just constant values for probability and our What is it transition distributions and observation distributions. Now, what I'm going to do is go in here and talk about states observations and transitions. So we have a certain amount of states. Now we will define how many states we have, we don't really care what that state is. So we could have states for example, like warm cooled, high, low, red, green, blue, you can have as many states as we want, we could have one state to be honest, although that would be kind of strange to have that. And these are called hidden because we don't directly observe. Now observations. So each state has a particular outcome or observation associated with it based on a probability distribution. So it could be the fact that during a hot day, it is 100% true that Tim is happy. Although in a hot day, we could observe that 80% of the time, Tim is happy, and 20% of the time, he is sad, right? Those are observations we make about each state, and each state will have their different observations and different probabilities of those observations occurring. So if we were just going to have like an outcome for the state, that means it's always the same, there's no probability that something happens. And in that case, that's just called an outcome because the probability of the event occurring will be 100%. Okay, then we have transitions. So each state will have a probability to find the likelihood of transitioning to a different state. So for example, if we have a hot day, there'll be a percentage chance that the next day will be a cold day and if we have a cold day, there'll be a percentage chance of the next day is either a hot day or a cold day. So we're going to go through like the exact what we have for our specific model below. Just understand there's a probability that we could transition into a different state. And from each state, we can transition into every other state or a defined set of states given a certain probability. So I know it's a mouthful, I know it's a lot. But let's go into a basic drawing example. 
Because I just want to illustrate like graphically a little bit kind of how this works. In case these are ideas are a little bit too abstract for any of you. Okay, I'm just pulling out the drawing tablet, just one second here, and let's do this basic weather model. So what I'm going to do is just simply draw two states, actually, let's do it with some colors, because why not? So we're gonna use yellow. And this is going to be our hot day, okay, this is going to be our Sun. And then I'm just going to make a cloud, we'll just do like a gray cloud, this will be my cloud. And we'll just say it's gonna be raining over here. Okay, so these are my two states. Now, in each state, there's a probability of transitioning to the other state. So for example, in a hot day, we have a, let's say, 20% chance of transitioning to a cold day, and we have a 80% chance of transitioning to another hot day, like the next day, right? Now, in a cold day, we have, let's say, a 30% chance of transitioning to a hot day. And we have in this case, what is that going to be a 70% chance of transitioning to another cold day. Now, on each of these days, we have a list of observations. So these are what we call states, right? So this could be s one. And this could be s two, it doesn't really matter. Like if we named them or anything, we just we have two states. That's what we know. We know the transition probability. That's what we've just defined. Now we want the observation probability or distribution for that. So essentially, on a hot day, our observation is going to be that the temperature could be between 15 and 25 degrees Celsius with an average temperature of let's say, 20. So we could say observation, right, say, observation. And we'll say that the mean, so the average temperature is going to be 20. And then the distribution for that will be like the minimum value is going to be 15. And the max is going to be 25. So this is what we call actually like a standard deviation, I'm not really going to explain exactly what standard deviation is, although you can kind of think of it as something like this. So essentially, there's a mean, which is the middle point, the most common event that could occur, and at different levels of standard deviation, which is going into statistics, which I don't really want to mention that much, because I'm definitely not an expert, we have a probability of hitting different temperatures as we move to the left and right of this value. So on this curve, somewhere, we have 15. And on this curve to the right, somewhere, we have 25. Now we're just defining the fact that this is where we're going to kind of end our curve. So we're going to say that, like the probability is in between these numbers is gonna be in between 15 and 25, with an average of 20. And then our model will kind of figure out some things to do with that. That's as far as I really want to go in standard deviation. And I'm sure that's like a really horrible explanation. But that's kind of the best I'm going to give you as for right now. Okay, so that's our observation here, our observation over here is going to be similar. So we're going to say mean, on a cold day temperature is going to be five degrees, we'll say the minimum temperature, maybe it's going to be something like negative five, and the max could be something like 15, or like, yeah, I guess a 15. So we'll have some distribution. That's just what we want to understand, right. 
And this is kind of a strange distribution because we're dealing with a standard deviation, although we could also deal with straight percentage observations. For example, there's a 20% chance that Tim is sad, or an 80% chance that he is happy; those are probabilities we could have as our observation probabilities in the model. Okay, so there's a lot of lingo, there's a lot going on, and we're going to get into a concrete example now, so hopefully this will make more sense. But again, just understand states, transitions and observations; we don't actually ever look at the states, we just have to know how many we have, and the transition probability and observation probability for each of them. Okay. What I want to say now, though, is: what do we even do with this model? Once I make this hidden Markov model, what's the point of it? Well, the point is to predict future events based on past events. We know the probability distributions, and I want to predict the weather for the next week. Well, I can use this model to do that, because I can say: if the current day today is warm, what is the likelihood that tomorrow is going to be cold? That's what we're doing with this model: making predictions for the future based on the probability of past events occurring. Okay, important stuff. So let's just run this; it's already loaded. Import TensorFlow, and notice that here I've imported TensorFlow Probability as tfp. This is a separate module from TensorFlow that deals with probability. We also need TensorFlow 2.x for this hidden Markov model; we're going to use it to actually build the model, not a huge deal. Okay, so: the weather model. This is just going to define what our model actually is, the different parts of it. This is taken directly from the TensorFlow documentation, so you guys can see where I have all this information from; I've sourced all of it. Essentially, in the model we're going to create, cold days are encoded by 0 and hot days are encoded by 1. The first day in our sequence has an 80% chance of being cold, so whatever day we start on has an 80% chance of being cold, which means a 20% chance of being hot. A cold day has a 30% chance of being followed by a hot day, and a hot day has a 20% chance of being followed by a cold day, which means 70% cold to cold and 80% hot to hot. On each day, the temperature is normally distributed with mean and standard deviation 0 and 5 on a cold day, and mean and standard deviation 15 and 10 on a hot day. What that means, reading it here, is that on a hot day the average temperature is 15 (that's the mean), and it ranges from 5 to 25, because the standard deviation is 10, which you can roughly think of as 10 on each side, kind of a min and max value. Again, I'm not a statistician, so please don't quote me on any definitions of standard deviation; I'm just trying to explain it enough so that you can understand what we're doing. Okay, so to model this — and I'm going through this fairly quickly, because it's pretty easy to do — I'm going to load the TensorFlow Probability distributions module and save it as tfd, just so I don't need to write tfp.distributions.whatever every time; I can just shortcut it.
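Just so you can see the setup in one place, here is a minimal sketch of the imports I'm describing. It assumes you're in a TensorFlow 2.x environment with a matching version of tensorflow_probability installed (that version mismatch comes up again in a moment).

    # Minimal imports for this section (assumes TensorFlow 2.x and a
    # compatible tensorflow_probability are installed in the environment).
    import tensorflow as tf
    import tensorflow_probability as tfp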
You'll notice I'm referencing tfd here, which just stands for tfp.distributions, and tfp is TensorFlow Probability. Okay, so my initial distribution is a tfp categorical distribution with probabilities of 80% and 20%. This refers to point 2 above: the first day in our sequence has an 80% chance of being cold. That's essentially what this is saying: the initial probability of being cold is 80%, and then 20% hot. Categorical is just the way we express this kind of distribution, and since we have two states, we've defined two probabilities: the probability of landing in each of these states at the very beginning of our sequence. Next, the transition distribution, again a tfp categorical distribution, with probabilities of 70% and 30%, and 20% and 80%. This is the transition probability, referring to points 3 and 4 above. So from a cold day we have a 70% chance of staying cold and a 30% chance of going to a hot day, and then the reverse idea for the hot day. Okay, so: the observation distribution. This one is a little bit different, but essentially we use tfd.Normal. When you're working with a standard deviation, you do it like this: you give loc, which stands for your average or your mean — so the average temperature is going to be 0 on a cold day and 15 on a hot day. The standard deviation on the cold day is 5, which means we range from negative 5 to 5 degrees, and on a hot day it's 10, so we range from 5 to 25 degrees, with an average temperature of 15. The reason we've added a dot after each number is that these need to be float values; rather than inserting integers here and potentially getting type errors later on, we just use floats. So the loc argument represents the mean, and scale the standard deviation, exactly what we just defined there. Alright, so let's run this; I think we actually already did. Now we can create our model. Creating the model is pretty easy: all we do is say model equals the TensorFlow Probability distributions HiddenMarkovModel, and give it the initial distribution, the transition distribution, the observation distribution and the number of steps. What is steps? Steps is how many days we want to predict for; the number of steps is how many times we step through this probability cycle and run the model, essentially. Remember, what we want to do is predict the average temperature on each day; that's the goal of our example. So given this information, using these observations and these transitions, the model will predict that. So I'm going to run this model. Hmm, what is the issue here? "Tensor is unhashable"... okay, give me one sec. Have a look here; I haven't had this issue before. Okay. So after a painful amount of searching on Stack Overflow and Google, and actually just reading through more documentation on TensorFlow, I have determined the issue. Remember, the error we were getting was actually on this line here.
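To make those three distributions concrete, here is roughly what they look like in code. This follows the TensorFlow Probability HiddenMarkovModel documentation example the notebook is based on; the variable names are just the ones I use in the notebook.

    tfd = tfp.distributions  # shortcut so we don't keep typing tfp.distributions

    # Point 2: the first day has an 80% chance of being cold (state 0), 20% hot (state 1).
    initial_distribution = tfd.Categorical(probs=[0.8, 0.2])

    # Points 3 and 4: a cold day has a 30% chance of being followed by a hot day,
    # and a hot day has a 20% chance of being followed by a cold day.
    transition_distribution = tfd.Categorical(probs=[[0.7, 0.3],
                                                     [0.2, 0.8]])

    # Point 5: temperatures are normally distributed; loc is the mean, scale the standard deviation.
    observation_distribution = tfd.Normal(loc=[0., 15.], scale=[5., 10.])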
I think I can see what the output is on this. Okay, well, this is a different error, but there was an error at this line. Essentially, what was happening is that we had a mismatch between the two versions here. The most recent version of TensorFlow is not compatible with the older version of TensorFlow Probability, at least for the things we're trying to do with it, so I just needed to make sure I installed the most recent version of TensorFlow Probability. This should actually work fine for you by the time you get there, because the notebook will be updated, but in case you run into the issue, here's how to deal with it. Essentially, you select version 2.x of TensorFlow, run this install command to install TensorFlow Probability, and after you run it you need to restart your runtime: go to Runtime and then Restart runtime. Then you can just continue on with the script: select TensorFlow 2.x again, do your imports, and then we'll test whether it actually works for us here: run our distributions, create the model without any issues (this time, notice, no red text), and then run this final line, which gives you the output. Now, this is what I wanted to talk about that we didn't quite get to because of the bugs: how we actually run our model and see the output. What you do is say mean equals model.mean(). What this is going to do is essentially calculate the expected temperatures from the model. Now, model.mean() is what we call a partially defined tensor; remember, our tensors were like partially defined computations. Well, that's what model.mean() actually is; that's what this method returns. So to get the value out of it, we need to create a new session in TensorFlow and run this part of the graph, which we do by calling mean.numpy(), and then we can print that out. I know this might seem a little confusing, but essentially, to run a session in the new version of TensorFlow, so 2.x, or 2.1, or whatever it is, you type: with tf.compat.v1.Session() as sess (it doesn't really matter what you name it), and then I'm just printing mean.numpy(). To actually get the value from this variable, I call .numpy() on it, and it prints out this array, which gives me the expected temperature on each day. So we have, you know, 3, then 6, then 7.5, 8.25, and you can see these are the temperatures based on the fact that we start with an initial probability of starting on a cold day. We see that here: we're starting at 3 degrees, that's what it's determined we start at, and then we have all of these other temperatures as predictions for the following days. Now notice, if we recreate this model — so just rerun the distributions, rerun them and call model.mean() again — this stays the same, right? Because our probabilities are the same, the model does the calculation exactly the same way; there isn't really any training that goes into this. So we get very similar, if not the exact same, values; I can't remember if these are identical.
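Putting that together, here is roughly what the model creation and the mean calculation look like once the version issue is sorted out. The keyword argument is num_steps in the TensorFlow Probability API, and the 7 is just the seven days we want to predict; the session wrapper is what I use in the notebook to evaluate the tensor.

    model = tfd.HiddenMarkovModel(
        initial_distribution=initial_distribution,
        transition_distribution=transition_distribution,
        observation_distribution=observation_distribution,
        num_steps=7)  # predict 7 days

    mean = model.mean()  # a partially defined tensor

    # Evaluate it and print the expected temperature for each of the 7 days.
    with tf.compat.v1.Session() as sess:
        print(mean.numpy())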
But that's what it looks like to me. I mean, we can run this again — see, we get the same one — and we'll create the model one more time, and let me just check these values to make sure I'm not lying to you guys. Yes, they're exactly the same. Okay, so let's start messing with a few probabilities and see what changes we can cause in this temperature. So if I put 0.5 and 0.5 here for the categorical probability — remember, this refers to points 3 and 4 above, that a cold day has a 30% chance of being followed by a hot day, and a hot day has a 20% chance of being followed by a cold day — what I've just done is change the probability to 50%, so that a cold day now has a 50% chance of being followed by a hot day and a 50% chance of being followed by a cold day. Let's recreate this model, rerun this, and see if we get a difference. And we do: notice the temperature is now going a little bit higher. We get the same starting temperature, because that's just the average based on the initial probability we have here. But if we wanted to start out hotter, we could reverse those numbers, so 0.2 and 0.8. Let's rerun all of this, and now look at what our temperatures are: we start at 12, and then we actually drop down to 10. So that's how this hidden Markov model works, and this is nice because you can just tweak the probabilities, it happens pretty much instantly, and we can look at our output very easily. Obviously, this first value represents the temperature on the first day, then the second day, third day, fourth day, fifth, sixth, seventh. And obviously, the more days out you go, the less accurate this is probably going to be, because it just runs off probability; if you try to predict a year in advance using the weather you have from the previous year, you're probably not going to get a very accurate prediction. But anyways, these are hidden Markov models. They're not extremely useful, but there are situations where you might want something like this, so that's why we're implementing them in this course and showing you how they work. It's also another feature of TensorFlow that a lot of people don't talk about or see, and personally, I hadn't really heard of hidden Markov models until I started developing this course. So anyways, that has been it for this module. I hope this gave you guys a bit of an idea of how we can actually implement some of these machine learning algorithms, a bit of an idea of how to work with data, how we can feed it to a model, and the importance of separating testing and training data. And then, obviously, linear regression is the one we focused a lot on, so I hope you guys are very comfortable with that algorithm. Then the second one we did — I'd have to scroll up to remember exactly the sequence we had here — classification, that was important as well, so I hope you guys really understood that. Clustering, we didn't go too far into, but again, it's an interesting algorithm, and if you ever need to do some kind of clustering, you now know one algorithm for it, called k-means clustering, and you understand how it works. And now you know hidden Markov models.
So in the next module, we're going to start covering neural networks; we now have the knowledge we need to really dive in there and start doing some cool stuff. And then in the future modules, we're going to do deep computer vision, I believe we're going to do chatbots with recurrent neural networks, and then some form of reinforcement learning at the end. So with that being said, let's go to the next module. Hello everybody, and welcome to module four. In this module of the course, we're going to be talking about neural networks: discussing how neural networks work, a little bit of the math behind them, talking about gradient descent and backpropagation and how information actually flows through the neural network, and then getting into an example where we use a neural network to classify articles of clothing. I know that's a lot, but that's what we're going to be covering here. Now, neural networks are complex; there are a lot of components that go into them, and I'm going to apologize right now, because it's very difficult to explain it all at once. What I'm going to try to do is piece things together and explain them in blocks, and then at the end combine everything together. I will say, in case any of you didn't watch the beginning of this course, I have very bad handwriting, but this is the easiest way to explain things to you guys, so bear with me; I'm sure you'll be able to understand what I'm saying, even if it's painful to read some of it. Alright, so let's get right into it and start discussing what neural networks are and how they work. The whole point of a neural network is to provide classifications or predictions for us. We have some input information, we feed it to the neural network, and we want it to give us some output. So if we think of the neural network as a black box, we have all this input — maybe it's an image, maybe it's just some random data points, maybe it's a dataset — we give all this data to the neural network, and we get some meaningful output. If we're looking at a neural network from the outside, we can think of it as this magical black box: we give it some input, it gives us some output. And we could call this black box just some function, a function that takes the input and maps it to some output. That's exactly what a neural network does: it takes input and maps it to output, just like any other function. Just like if you had a straight line, that's a function: say y equals 4x; you give some input x and it gives you some value y; that's a mapping of your input to your output. Alright, so now that we have that down, what is a neural network made up of? Well, a neural network is made up of layers, and remember, we talked about the layered representation of data when we talked about neural networks. So I'm going to draw a very basic neural network, starting with the input layer. The input layer is always the first layer in our neural network, and it's what accepts our raw data. By raw data, I mean whatever data we want to give to the network, whatever we want to classify, whatever our input information is; that's what this layer receives.
So we can say these arrows represent our input, and they come into our first layer, the input layer. This means, for example, if you had an image — and I'll just draw one like this — with all these different pixels, and you want to make a classification on this image. Maybe it has a width and a height, and a classic width-and-height example is 28 by 28. If you had 28 by 28 pixels and you wanted to make a classification on this image, how many input neurons do you think you would need in your neural network to do this? Well, it's a tough question if you don't know a lot about neural networks. If you're going to be looking at the entire image to make a prediction, you're going to need every single one of those pixels, which is 28 times 28, which is 784. So you would need 784 input neurons, and that's totally fine; that might seem like a big number, but we deal with massive numbers when it comes to computers, so this really isn't that many. That's an example of how you would use a neural network input layer to represent an image: you'd have 784 input neurons and you would pass one pixel to each of those neurons. Now, if we're doing an example where we just have one piece of input information, maybe literally just one number, then all we need is one input neuron. If we have an example with four pieces of information, we would need four input neurons. This can get a little more complicated, but the basic idea I want you to understand is that for the pieces of input you have, regardless of what they are, you need one input neuron for each piece of that information, unless you're going to be reshaping or putting that information into a different form. Okay, so let's skip ahead now and go to our output layer. What is our output layer? Well, our output layer is going to have as many neurons (and again, a neuron is just a node in the layer) as output pieces we want. Let's say we're doing a classification on images, and maybe there are two classes we could predict. There are a few different ways we could design our output layer. One option is to say, okay, we're going to use one output neuron, and this output neuron is going to give us some value; we want this value to be between zero and one, inclusive. Now, if we're predicting two classes, we can say: my output neuron is going to give me some value; if that value is closer to zero, the prediction is class zero, and if it's closer to one, the prediction is class one. And that would mean, for our training data — and we've talked about training and testing data — we give our input, and the label would need to be the value zero or one, because it's either class zero or class one. So our labels for our training dataset would be zero and one, and the value on our output neuron would be guaranteed to be between zero and one, based on something I'm going to talk about a little bit later.
That's one way to approach it: we have a single value, we look at that value, and based on what it is, we determine what class we predicted. That doesn't always work, though. In other instances, when we're doing classification, what makes more sense is to have as many output neurons as classes you're looking to predict. So let's say we have five classes we're predicting for, and maybe these three pieces of input information are enough to make that prediction. Well, then we would actually have five output neurons, and each of these neurons would have a value between zero and one, and the sum of every single one of those values would be equal to one. Now, can you think of what that means, if every one of these neurons has a value between zero and one and their sum is one? What does that look like to you? Well, to me, that looks like a probability distribution. Essentially, we're going to make predictions for how strongly we think our input belongs to each class. So if we think it's class one — maybe we'll just label these like this — then we would say, okay, this is going to be 0.9, representing 90%; maybe this one is 0.001, maybe this is 0.05, 0.003, and you get the point: it all adds up to one, and that's a probability distribution over our output layer. So that's another way to do it. And then obviously, if we're doing some kind of regression task, we can just have one neuron that predicts some value, and we'll define what we want that value to be. Okay, so that's my example for the output. Now, let's erase this and go back to one output neuron, because that's what I want to use for this example. Now, we have something in between these layers, because obviously we can't just go from input to output with nothing else. What we have here is called a hidden layer. In neural networks, we can have many different hidden layers; we can add hidden layers that connect to other hidden layers, and we could have hundreds or thousands if we wanted to. For this basic example, we'll use one, and I'll label it "hidden". So now we have our three layers. Why is this called hidden? The reason is that we don't observe it. When we're using the neural network, we pass information to the input layer and get information from the output layer; we don't see what happens in this hidden layer, or in these hidden layers. Now, how are these layers connected to each other? How do we get from the input layer to the hidden layer to the output layer and get some meaningful output? Well, every layer is connected to the next layer with something called weights. We can have different architectures of connections, which means I could have something like: this one connects to this, this connects to this, this connects to this, and that could be my connection architecture. Or we could have another one, where this one goes here, and maybe this one goes here. And actually, after I've drawn this line, we get what we're going to be talking about a lot, which is called a densely connected neural network. A densely connected neural network, or a densely connected layer, essentially means that every node is connected to every node from the previous layer.
So in this case, you can see every single node in the input layer is connected to every single node in the output layer — or in the hidden layer, my bad. And these connections are what we call weights. These weights are actually what the neural network is going to change and optimize to determine the mapping from our input to our output, because again, remember, that's what we're trying to do: we have some kind of function, we give it some input, it gives us some output. How do we get from that input to that output? By modifying these weights. It's a little more complex than that, but this is the starting point. These lines that I've drawn are really just numbers; every single one of these lines is some numeric value. Typically, these values are between zero and one, but they can be large, they can be negative; it really depends on what kind of network you're building and how you've designed it. Let's just write some random numbers: this could be 0.1, this could be 0.7, you get the point; we have a number for every single one of these lines. And these are what we call the trainable parameters, the things our neural network will tweak and change as it trains to get the best possible result. So we have these connections. Our hidden layer is connected to our output layer as well; this is again another densely connected layer, because every neuron from the previous layer is connected to every neuron in the next layer. If you want to determine how many connections you have, you can say: there are three neurons here, there are two neurons here, and three times two equals six connections. That's how that works between layers, and you can just multiply the neuron counts of adjacent layers together as you go through to figure that out. Okay, so that's how we connect these layers; we have these weights, so let's write a W on them so we remember those are weights. Now, we also have something called biases. Let's add a bias here; I'm going to label it B. Biases are a little different from the regular nodes we have: there's only one bias, and a bias exists in the previous layer to the layer that it affects. So in this case, what we actually have is a bias that connects to each neuron in the next layer, so it's still densely connected, just a little different. Notice that this bias doesn't have an arrow beside it, because it doesn't take any input information; it's another trainable parameter for the network. This bias is just some constant numeric value that we connect to the hidden layer so we can do a few things with it. Now, the weights on these bias connections always have a value of one; we'll talk about why in a second, but just know that whenever a bias is connected to another layer, or to another neuron, its connection weight is typically one. Okay, so we have that connected, we have our bias, and that actually means we have a bias here as well, and this bias connects to this. Notice that our biases do not connect with each other. The reason, again, is that they're just some constant value, something we're adding into the network as another trainable parameter we can use. Now let's talk about how we actually pass information through the network, and why we even use these weights and biases and what they do.
So let's say — I can't really think of a great example, so we're just going to do something arbitrary — let's say we have data points with coordinates x, y, z, and all these data points have some value we're looking for, or some class we're trying to put them in. Maybe we're separating them into red dots and blue dots. Let's do that: a point x, y, z is either part of the red class or the blue class. So what we want this output neuron to give us is red or blue. Since we have just one output neuron, we'll keep its value in the range zero to one and say: if it's closer to zero, that's red; if it's closer to one, that's blue. That's what we'll do for this network and this example. Now, our input neurons are obviously going to be x, y and z. So let's pick some data point; let's say we have the values 2, 2, 2. That's our data point, and we want to predict whether it's red or blue. How do we pass it through? Well, we need to determine the value of this hidden-layer node. We already know the values of the input nodes, but now we need to go to the next layer using these connections and find the values of these nodes. The way we determine these values — and I've written n1 to represent node one, and maybe this one is node two — is that n1 is equal to what we call a weighted sum of all of the previous nodes connected to it. A weighted sum looks like this; I'll write the equation and then explain it: n1 equals the sum from i = 0 to n of w_i times x_i, plus b. I know this equation looks really mathy and complicated, but it's really not. What this summation symbol means is that we take the weighted sum over all the neurons connected to this neuron. In this case, we have neuron x, neuron y and neuron z connected to n1. So when we calculate this, what it's really equal to is the weight on the connection from neuron x, w_x, times the value at neuron x, which in this case is just 2, plus whatever the weight is from neuron y, w_y, times 2, and then, you get the point, w_z (and I'm writing on the edge of my drawing tablet here) times 2. Now, obviously, these weights have some numeric value. When we start our neural network, these weights are completely random; they don't make any sense, they're just some random values. As the neural network gets better, these weights are updated and changed to make more sense for our network. So right now we'll just leave them as w_x, w_y, w_z, but know that they are numeric values. So this sum works out to some value; let's just call that value v. Then what we do is add the bias. Remember, the bias was connected with a weight of one, which means that if we take the weighted sum including the bias, all we're doing is adding whatever that bias's value is. So if the bias value was 100, we add 100.
Now, I've written the plus b explicitly to state the fact that we're adding the bias, although it could really be considered part of the summation, because it's just another connection in the network. Let's also talk about what this summation symbol means, for anyone confused by it. Essentially, it stands for a sum: i is an index, and n is the index we count up to, which is how many neurons we had in the previous layer. And what we're doing here, with w_i times x_i, is saying weight 0 times x_0, plus weight 1 times x_1, plus weight 2 times x_2; it's almost like a for loop where we just add them all together, and then we add b, the bias. I hope that makes enough sense. So that is our weighted sum and our bias. Essentially, we go through and calculate these values: this node gets some value, maybe it's 0.3, maybe that one is 7, whatever it is, and then we do the same thing at our output neuron. We take this value times its weight, plus that value times its weight, plus the bias, which gives us some value here, and then we can look at that value and determine what the output of our neural network is. So that's pretty much how it works in terms of the weighted sums, the weights and the biases. Now let's talk about the training process and another thing called an activation function. So I've lied to you a little bit — and I'm just going to start erasing some stuff so we have a little more room on here — I've said that this is completely how it works, but we're missing one key piece that I want to talk about, which is called an activation function. Remember how we want this value at our output layer to be between zero and one? Well, right now we can't really guarantee that's going to happen. Especially if we're starting with random weights and random biases in our neural network, by the time we pass this information through, we could get to this point here and have something like 700 as our value. That's kind of crazy, right? We have this huge value; how do we look at 700 and determine whether this is red or whether this is blue? Well, we can use something called an activation function. I'm going to go back to my slides here — or whatever we want to call this notebook — to talk about what an activation function is, and you guys can follow along; I have all the equations written out here as well. So let's go to activation functions, which are right here. Okay, these are some examples of activation functions, and I just want you to look at what they do. This first one is called a rectified linear unit. Notice that essentially what this activation function does is take any values that are less than zero and make them zero; any x values that are negative get a y of zero, and any values that are positive are just equal to whatever their positive value is, so if it's 10, it stays 10. This lets us pretty much eliminate any negative numbers; that's what a rectified linear unit does. Now, tanh, or hyperbolic tangent — what does this one do? It actually squishes our values between negative one and one. It takes whatever values we have, and the more positive they are, the closer to one they get, and the more negative they are, the closer to negative one they get.
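If the summation notation is hard to read, here is the same weighted-sum idea as a tiny sketch in plain Python with NumPy. The specific numbers are made up purely for illustration, just like the random weights I scribbled on the drawing.

    import numpy as np

    x = np.array([2.0, 2.0, 2.0])   # values of the input neurons x, y, z
    w = np.array([0.1, 0.7, 0.4])   # weights on the connections into n1 (random to start)
    b = 1.0                         # bias connected to n1 with a connection weight of 1

    n1 = np.sum(w * x) + b          # the weighted sum plus the bias
    print(n1)                       # some value v, before any activation function is applied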
So you can see why this might be useful for a neural network. And the last one is sigmoid; what this does is squish our values between zero and one. A lot of people call it the "squishifier" function, because all it does is take extremely negative numbers and put them close to zero, take extremely positive numbers and put them close to one, and for any values in between you get some number in between, based on the equation 1 / (1 + e^(-z)), where z is the value going in. Okay, so those are some activation functions. I hope that's not too much math for you, but let's talk about how we use them. Essentially, at each of our neurons we have an activation function that is applied to the output of that neuron. So we take the weighted sum plus the bias, and then we apply an activation function to it before we send that value on to the next neuron. So in this case, n1 isn't actually just equal to the weighted sum; n1 is equal to f — which stands for the activation function — of that equation, so f of the sum from i = 0 to n of w_i times x_i, plus b. That's what n1's value is by the time it reaches the output neuron. Each of these has an activation function on it, n2 has the same activation function as n1, and we can define which activation function we want to apply at each neuron. Now, at our output neuron, the activation function is very important, because we need to determine what we want our value to look like. Do we want it between negative one and one? Between zero and one? Do we want it to be some massively large number, between zero and positive infinity? What do we want? So we pick some activation function for our output neuron, and based on what I said, where we want our value between zero and one, I'm going to pick the sigmoid function. Sigmoid, recall, squishes our values between zero and one. So what we'll do here is take n1 times whatever the weight is there, so weight 0, plus n2 times weight 1, plus a bias, apply sigmoid, and this gives us some value between zero and one; then we can look at that value and determine the output of the network. And that makes sense as a reason to use it on the output neuron: we squish our value into a known range so we can actually look at it and decide what to do with it, rather than having these crazy numbers. Let me see if I can make this eraser any bigger — that's much better. Okay, let's erase some of this, and now let's talk about why we use an activation function on an intermediate layer like this. Well, the whole point of an activation function is to introduce complexity into our neural network. At this point, we just have these basic weights and biases; it's really just one function made up of a bunch of weights and a bunch of biases.
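Here is a quick sketch of those three activation functions applied to a few sample values, so you can see what they do. I'm writing them by hand with NumPy just for illustration; TensorFlow also ships them built in as tf.nn.relu, tf.nn.tanh and tf.nn.sigmoid.

    import numpy as np

    def relu(x):     # rectified linear unit: negatives become 0, positives pass through
        return np.maximum(0, x)

    def tanh(x):     # hyperbolic tangent: squishes values between -1 and 1
        return np.tanh(x)

    def sigmoid(x):  # sigmoid: squishes values between 0 and 1
        return 1 / (1 + np.exp(-x))

    v = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
    print(relu(v))     # [ 0.  0.  0.  1. 10.]
    print(tanh(v))     # values approach -1 and 1 at the extremes
    print(sigmoid(v))  # values approach 0 and 1 at the extremes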
And those are the only things we're training, the only things we're changing to make our network better. Now, what an activation function can do is, for example, take a bunch of points that all sit on the same plane and, by applying a function that introduces a higher dimensionality — an activation function like sigmoid is a higher-dimension function — hopefully spread those points out, move them up or down off the plane, in hopes of extracting some different features. It's hard to explain this fully until we get into the training process of the neural network, but I'm hoping it gives you a bit of the idea: if we can introduce a complex activation function into this process, it allows us to make more complex predictions and pick up on different patterns. If I can see that when sigmoid or a rectified linear unit is applied to this output it moves my point up, or down, or in whatever direction in n-dimensional space, then I can pick out patterns I couldn't see in the previous dimension. It's just like looking at something in two dimensions: if I can move it into three dimensions, I immediately see more detail; there are more things I can look at. Let me try a decent example of why we might use it like this. Let's say we have a square, like this, and I ask you to tell me some information about this square. Well, you can immediately tell me the width, you can tell me the height, and I guess you could tell me the color. You could tell me it has one face, you could tell me it has four vertices, you could tell me its area; you can tell me a fair amount about the square. Now what happens as soon as I extend this square and make it into a cube? Well, now you can immediately tell me a lot more information: the width, height and depth, how many faces it has, what color each of the faces is, how many vertices it has, whether this cube is uniform or not; you can pick up on a lot more information. This is a big oversimplification of what an activation function actually does, but that's the concept: if we're in two dimensions and we can somehow move our data points into a higher dimension by applying some function to them, then we can extract more information about those data points, which will lead to better predictions. Okay. So now that we've talked about all of this, let me talk about how neural networks train; I think you guys are ready for it. It's a little bit more complicated, but again, it's not that crazy. Alright, so we talked about these weights and biases, and these weights and biases are what our network will come up with and adjust to make itself better. So what we're going to talk about now is something called a loss function.
As our network starts out, the way we train it, just like we've trained our other machine learning models, is that we give it some information along with the expected output, and then we compare the output from the network to that expected output and modify the network based on the difference. So essentially, we start by saying, okay, for the point 2, 2, 2, this class is red — I forget what I labeled that as, but let's just say it was zero. So I want this network to give me a zero for the point 2, 2, 2. Now, this network starts with completely random weights and completely random biases, so chances are, when we get to this output here, we're not going to get zero; maybe we get some value, after applying the sigmoid function, that's like 0.7. Well, that's pretty far away from red. But how far away is it? This is where we use something called a loss function. What a loss function does is calculate how far away our output was from the expected output. So if the expected output is zero and our output was 0.7, the loss function gives us some value that represents how bad or how good this network was. If it tells us the network was really bad — it gives us a really high loss — then that tells us we need to tweak the weights and biases more and move the network in a different direction. We're starting to get into gradient descent here, but let's understand the loss function first. If the network was really bad, we move it more: we change the weights more drastically, we change the biases more drastically. Whereas if it was really good, it says: okay, that one was actually decent, you only need to tweak a little bit, only move this, this and this. That's the point of the loss function: it calculates some value, and the higher the value, the worse our network was. A few examples of loss functions — let's go down here, because I think I listed a few back here: mean squared error, mean absolute error and hinge loss. Mean absolute error — you know what, let's actually look one up. So, mean absolute error; let's have a look at what this is. Images, let's pick something; this is mean absolute error, this is the equation for it: the sum of the absolute value of y_i minus the predicted value for x_i, divided by n. This is kind of complicated, so I'm not going to go into it too much; I was hoping to find a better picture for mean squared error. Okay, so those are the three loss functions: mean squared error, mean absolute error and hinge loss. Obviously there are a ton more we could use, and I'm not going to talk about how each of these works specifically; you can look them up pretty easily. Also, just so you know, these are also referred to as cost functions, so you might hear "cost" and "loss" used interchangeably; they essentially mean the same thing. You want your network to cost the least; you want your network to have the least amount of loss. Okay, so now that we've talked about the loss function, we need to talk about how we actually update these weights and biases. Let's go back to here, because I think I have some notes on it. This is what we call gradient descent.
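To make the loss idea concrete, here is a small sketch of mean squared error and mean absolute error computed by hand on a few made-up predictions. Keras has both of these built in; the point is just that the arithmetic really is this simple, and bigger numbers mean a worse network.

    import numpy as np

    y_true = np.array([0.0, 1.0, 0.0])   # expected outputs (the labels)
    y_pred = np.array([0.7, 0.9, 0.2])   # what the network actually produced

    mse = np.mean((y_true - y_pred) ** 2)    # mean squared error
    mae = np.mean(np.abs(y_true - y_pred))   # mean absolute error

    print(mse, mae)  # the higher these values, the worse the network did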
So essentially, the parameters for our network are the weights and biases, and by changing these weights and biases we either make the network better or make it worse; the loss function tells us whether it's getting better or worse, and then we can determine how to move the network to change that. This is where gradient descent comes in, and where the math gets a little more complicated. So this is an example of what your neural network's loss surface might look like. As you get into higher-dimensional math, you have a lot more dimensions, a lot more space to explore across all the different parameters, biases and activation functions; as we apply our activation functions, we're spreading our network into higher dimensions, which makes things much more complicated. Essentially, what we're trying to do with the neural network is optimize this loss function. The loss function tells us how good or how bad the network is, so if we can get the loss as low as possible, then we should, in theory, have the best neural network. So this picture is a kind of mapping of our loss function, and what we're looking for is something called a global minimum: the point where we get the least possible loss from our neural network. If we start where these red circles are — I've just stolen this image off Google Images — what we're trying to do is move downwards into that global minimum, and that is the process called gradient descent. We calculate the loss and use an algorithm called gradient descent, which tells us what direction we need to move to get to this global minimum. It essentially looks at where we are, it notes what the loss was, and then it calculates what's called a gradient, which is literally just a steepness or a direction, and we move in that direction. Then an algorithm called backpropagation goes backwards through the network and updates the weights and biases so that we move in that direction. I think this is as far as I really want to go, because I know this is already getting more complicated than some of you guys can probably handle, and than I can properly explain, but that's the basic principle. Let's go back to the drawing board and do a very quick recap before we get into some of the other stuff. Neural networks: input, output, hidden layers connected with weights, and biases that connect into each layer. These biases can be thought of like y-intercepts: they simply shift that entire activation function up or down, or left or right, which allows us to get a better prediction; they're another parameter we can train and a little bit of extra complexity in our neural network model. The way information is passed through these layers is that each neuron takes the weighted sum of all of the neurons connected to it, we then add the bias neuron, and we apply some activation function that puts the value in between two set values. So, for example, sigmoid squishes our values between zero and one, and hyperbolic tangent squishes our values between negative one and one.
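Here is a deliberately tiny sketch of the gradient-descent idea on a one-variable loss function, just to show the "step downhill" loop. Real networks do this over thousands of weights at once, with backpropagation computing all the gradients, but the loop is the same in spirit; the loss function, learning rate and step count here are all made up for illustration.

    # Toy loss function: loss(w) = (w - 3)^2, whose global minimum is at w = 3.
    def loss(w):
        return (w - 3) ** 2

    def gradient(w):          # derivative of the loss with respect to w
        return 2 * (w - 3)

    w = 0.0                   # start from an arbitrary (random-ish) weight
    learning_rate = 0.1
    for step in range(50):
        w -= learning_rate * gradient(w)   # step in the downhill direction

    print(w, loss(w))         # w ends up close to 3, and the loss close to 0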
And rectified linear unit squishes our values between zero and positive infinity. So we apply those activation functions and continue the process: n1 gets its value, n2 gets its value, and then finally we make our way to the output layer (we might have passed through some other hidden layers before that) and do the same thing: take the weighted sum, add the bias, apply an activation function, look at the output, and determine whether we're class one or class two, or whether this is the value we're looking for. And that's how it works when we're making a prediction. Now for the training process: essentially, what happens is we make predictions, we compare those predictions to the expected values using the loss function, and then we calculate what's called a gradient, which is the direction we need to move to minimize that loss function — this is where the advanced math happens, and why I'm skimming over this aspect. Then we use an algorithm called backpropagation, where we step backwards through the network and update the weights and biases according to the gradient we calculated. That is pretty much how this works. So the more information we have — likely, unless we're overfitting — the better: if we can keep feeding the network data, it starts off being really horrible, having no idea what's going on, and then as more and more information comes in, it updates these weights and biases, gets better and better, sees more examples, and after a certain number of epochs, or a certain amount of data, our network is making better and better predictions and has a lower and lower loss. And the way we measure how well our network is doing is by passing it our validation dataset, where it can say, okay, we got 85% accuracy on this dataset, we're doing okay, let's tweak this, let's tweak that. So: the loss function — the lower, the better — also known as the cost function. And that is neural networks in a nutshell. I know this wasn't really in a nutshell, because it was 30 minutes long, but that's as much of an explanation as I can give you without going too far into the mathematics behind everything. And again, remember: the activation function moves us up in dimensionality, and the bias is another layer of complexity and a trainable parameter for our network that allows us to shift the activation function left, right, up or down. Okay, so now we have an optimizer. This is the last piece of how neural networks work: the optimizer is just the algorithm that does the gradient descent and backpropagation for us. You guys can read through some of them here; we'll probably be using the Adam optimizer for most of our examples, although there are lots of different ones we can pick from. Each optimization technique is just a different algorithm; some are faster, some are slower, some work a little differently, and we're not really going to get into picking optimizers in this course, because that's more of an advanced machine learning topic.
Alright, so enough explaining, enough math, enough drawings, enough talking; now it is time to create our first official neural network. These are the imports we're going to need: import tensorflow as tf, from tensorflow import keras — again, this does actually come with TensorFlow; I forget if I said you needed to install it separately before, my apologies — and then import numpy as np and import matplotlib.pyplot as plt. Alright, I'm going to do something similar to what I did before and copy some of this code into another notebook, just so we can look at everything together at the end and step through the code step by step, rather than all the text that's in here. So, the dataset and the problem we're going to consider for our first neural network is the Fashion MNIST dataset. The Fashion MNIST dataset contains 60,000 images for training and 10,000 images for validation and testing — 70,000 images total — and it is essentially pixel data of clothing articles. To load this dataset from Keras — it's built into Keras as a kind of beginner training and testing dataset — we say fashion_mnist = keras.datasets.fashion_mnist. This gets us the dataset object, and then we can load it by calling fashion_mnist.load_data(). By assigning the result to the tuples (train_images, train_labels), (test_images, test_labels), this automatically splits our data into the sets we need: the training set and the testing set. We've talked about all of that before, so I'm going to skim through it; now we have everything in these tuples. Alright, let's have a look at this dataset to see what we're working with. Okay, let's run some of this code; let's get the import going, if it doesn't take forever. Okay, let's get the dataset — this will take a second to download for you guys if you don't already have it cached — and then let's look at train_images.shape to see what our dataset looks like. So we have 60,000 images that are 28 by 28. What that means is we have 28 rows of 28 pixels each, so we're going to have 784 pixels in total, which I've noted here. Let's have a look at one pixel. To reference one pixel, this is what I'm doing — it comes in as, actually, I'm not sure exactly what type of structure this is, so let's check: type(train_images). So that's a NumPy array. To reference the different indexes, similar to pandas, we can do [0, 23, 23], which stands for image zero, row 23, column 23, and that gives us one pixel. Okay, let's run this, and we see this value is 194. That's what one pixel looks like. Now let's look at what multiple pixels look like, so we'll print train_images[0] — and okay, we get all these zeros; that's the border of the picture, so I can't quite show you what I wanted to show you. Anyways, one pixel — and I wanted to have you guys guess this — is simply represented by a number between 0 and 255.
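So you can see it all in one place, here is roughly what that loading-and-inspecting code looks like pulled together. It matches what I'm running in the notebook, give or take variable names, and the image index I plot at the end is arbitrary.

    import tensorflow as tf
    from tensorflow import keras
    import numpy as np
    import matplotlib.pyplot as plt

    fashion_mnist = keras.datasets.fashion_mnist
    (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

    print(train_images.shape)       # (60000, 28, 28): 60,000 images of 28x28 pixels
    print(train_images[0, 23, 23])  # one pixel: a grayscale value between 0 and 255
    print(train_labels[:10])        # the first 10 labels: integers from 0 to 9

    class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
                   'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

    plt.figure()
    plt.imshow(train_images[1])     # have a look at one of the images
    plt.colorbar()
    plt.show()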
What this number stands for is the grayscale value of the pixel. We're dealing with grayscale images here, although we can also deal with images that have RGB values, where, for example, every pixel would have three numbers, each between 0 and 255, whereas here each pixel is just one value. So our pixel values are between 0 and 255, with 0 being black and 255 being white: if it's 255, the pixel is white; if it's 0, it's black. Alright, let's have a look at the first ten training labels. That was our training images; what are the training labels? Okay, so we have an array with values from 0 to 9. That's because we have 10 different classes in our dataset — 10 different articles of clothing — and they're listed right here: T-shirt/top, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag, ankle boot. So let's run this class_names list, just so we have it saved, and now I'm going to use matplotlib to show you what one of the images looks like. In this case, this is a shirt. I know it's printing out in odd colors, but that's just because we haven't told it to draw in grayscale; anyways, that's what we get for the shirt. Let's go to another image and have a look at this one — I actually don't know what that is, so we'll skip it; maybe that's a t-shirt or top. This one, I guess, is a dress — yes, we do have dresses in there. Let's have a look at another; some of these are hard to even make out when I look at them myself. And I guess this one is like a hoodie or something. I'm trying to get a sandal, to show you guys a few different ones — there we go, that's a sandal or a sneaker. So that's how we look at the different images. If you want to draw one out, all you do is make a figure, show the image, add the color bar (which gives you this scale), say you don't want a grid, and then show the image. Actually, it's not showing the grid even without that line; I thought it was going to show the pixel grid, so I guess you don't need it. Alright, so: data preprocessing. This is an important step for neural networks. A lot of the time our data comes in random forms, or we're missing data, or there's information we haven't seen, and typically we need to do some preprocessing. What I'm going to do here is squish all my values to between 0 and 1. Typically, it's a good idea to get all of your input values to a neural network into a small range — I would say between negative one and one is what you're aiming for; you're trying to make your numbers as small as possible before feeding them to the neural network. The reason is that your neural network starts out with random weights and biases that are between zero and one, unless you change that.
So if you have massive input values and tiny weights, you have a bit of a mismatch, and you're going to make it much more difficult for your network to actually classify your information, because it has to work harder to update those weights and biases to compensate for how large those values are, if that makes sense. So it's usually a good idea to pre-process the inputs and scale them between zero and one. Since we know our pixel values are in the range zero to 255, we can just divide by 255, and that will scale everything down for us. Although, it is extremely important that we do this to not only the training images, but the testing images as well. If you only pre-process your training images and then you pass in new data that's not pre-processed, that's going to be a huge issue; you need to make sure that your data always comes in the same form. And that means when we use the model to make predictions later, whatever pixel data we have, we need to pre-process it in the same way that we pre-processed our other data. Okay, so let's pre-process that, so train_images and test_images both divided by 255. And I'm just going to steal some of this and throw it in my other notebook before we get too far. Let's grab this data set step and throw it in here, just so we can come back and reference all of this together. Let's grab class_names. We don't actually need the figures, there are a few things I can skip, but we do need this pre-processing step, like that. And then what else do we need, we're going to need this model. Okay, so let's just copy the model into this and make it a little bit cleaner, and have a look at it. So, new code block: creating our model. Now, creating our model is actually really easy. I'm hoping what you guys have realized so far is that data is usually the hardest part of machine learning and neural networks, getting your data in the right form, the right shape, and pre-processed correctly. Building the model is usually pretty easy, because we have tools like TensorFlow and Keras that can do it for us. So we're going to say model = keras.Sequential. Sequential simply stands for the most basic form of neural network, which is what we've talked about so far: information going from the left side to the right side, passing through the layers sequentially, hence the name; we have not talked about recurrent or convolutional neural networks yet. Inside here, we define the layers that we want in our neural network. The first one is keras.layers.Flatten, and this is our input layer. What Flatten does is take an input of shape 28 by 28, which we've defined here, and flatten all of the pixels into a single row of 784. So we take this 28-by-28, matrix-like structure and just flatten it out, and Keras will do that for us, so we don't need to transform our matrix data ourselves before passing it in. So we've done that. Next we have keras.layers.Dense(128, activation='relu'), so rectified linear unit. This is our first hidden layer, layer two, which is what I've denoted here, and it is a dense layer. Dense, again, means that every neuron in the previous layer is connected to every neuron in this layer. So we have 128 neurons here, and how do we pick that number?
We don't really know, we kind of just came up with it. Usually it's a good idea to make this a little bit smaller than your input layer, although sometimes it's going to be bigger, sometimes it's going to be half the size; it really depends on the problem, so I can't give you a straight answer for that. And then our activation function we'll define as rectified linear unit. We could pick a bunch of different activation functions here, we could pick sigmoid, we could pick tanh, which is hyperbolic tangent, it doesn't really matter for this example. And then we define our last layer, which is our output layer, a dense layer of 10 output neurons with the activation softmax. Okay, so can you think of why we would have picked 10 here? I'll give you guys a second to think about it, based on the fact that our output layer is supposed to have as many neurons as the classes we're predicting for. So that is exactly why we have 10: if we look, we have 10 classes, so we have 10 output neurons in our output layer. And again, we want a probability distribution, and the way we get that is by using the activation function softmax. Softmax makes sure that all of the values of our output neurons add up to one and that each is between zero and one. So that is our model, we've created the model now. So let's actually run this, see if we get any errors, it should just run, and then we'll go on to the next step, which is actually going to be training and testing the model. Okay, so let's create the model now, we shouldn't get any issues, and we're good. And now let's move on to the next step, I'm forgetting what it is, oh sorry, compiling the model. Okay, so compiling the model. We've now built what we call the architecture of our neural network, we've defined the number of neurons in each layer, the activation functions, the types of layers and the types of connections. The next thing we need to pick is the optimizer, the loss, and the metrics we're going to be looking at. The optimizer we're going to use is Adam. This is, again, just the algorithm that performs the gradient descent; you don't really need to look at these too much, although you can read up on different optimizers if you want to see the difference between them. Then we pick a loss, in this case sparse categorical crossentropy; again, I'm not going to go into depth about that, you can look it up if you want to see how it works. And then metrics, which is the output we want to track from the network, in this case accuracy. With our current knowledge, we're just going to stick with these as what we compile our neural network with, although we could pick different values if we wanted to, and that is what we call hyperparameter tuning. The parameters that are inside the network, like the weights and the biases, are things that we can't manually change, but these are things we can change: the optimizer, the loss, the metrics, the activation functions. So these are called hyperparameters. Same thing with the number of neurons in each layer.
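Before we talk about tuning any of these, here is a sketch of the model definition and the compile step, with all of those choices in one place (continuing from the imports above):

```python
# A sketch of the architecture and compile settings described above
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),    # input layer: 28x28 image -> 784 values
    keras.layers.Dense(128, activation='relu'),    # hidden layer: 128 neurons, rectified linear unit
    keras.layers.Dense(10, activation='softmax')   # output layer: one neuron per class, probabilities
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```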
So hyperparameter tuning is the process of changing all of these values and looking at how the model performs with different hyperparameters. I'm not going to talk about that too much, but it is something to note, because you'll hear this hyperparameter idea come up a lot. Okay, so we've compiled the model using this, which just means we picked all of the different settings we need for it, and now on to training the model. I'm just going to copy this in. Again, remember, these parts are pretty syntax-heavy but fairly easy to actually do. So we're going to fit the model. Fit just means we're fitting it to the training data; it's another word for training, essentially. We pass it the training images and the training labels, and notice how much easier it is to pass these in now: we don't need to write an input function or any of that, because Keras handles it for us. And we define our epochs as 10; epochs is another hyperparameter that you can tune and change if you want to. Alright, so that will actually fit our model. What I'm going to do is put this in another code block, so I don't need to keep retraining it. So we'll go like that, and let's actually look at this training process. We run the model, this should compile, and now let's fit it and see what we actually end up getting. Alright, so epoch one, and we can see that we're getting a loss and an accuracy printing out on the side here. Now this is going to take a little while compared to the other models we made, not a few minutes, but a good few seconds. When you have 60,000 images and a network made up of 784 neurons, then 128 neurons, then 10 neurons, you have a lot of weights and biases and a lot of math that needs to happen, so this will take a few seconds to run. If you're on a much faster computer, you'll probably be faster than this, but this is why I like Google Colaboratory, because this isn't using any of my computer's resources to train, it's using Google's. And we can see the RAM and the disk usage. How do I look at this for this network? Okay, I don't know why it's not letting me click this, but usually you can have a look at it. And now we've trained and fit the model, and we can see that we had an accuracy of 91%. But the thing is, this is the accuracy on our training data. So if we want to find what the true accuracy is, we need to actually test it on our testing data. So I'm going to steal this line of code here, this is how we test our model, pretty straightforward. I'll just close this and go to a new code block. So we have test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=1). Now, what is verbose? I was hoping it was going to give me the docstring so I could just read it to you guys, but verbose essentially controls how much output we see as the model evaluates, like how much is printed out to the console, that's what it means. And yes, this will just split up the metrics that are returned into test loss and test accuracy, so we can have a look at them. Now, you will notice when I run this that the accuracy will likely be lower than it was during training. The accuracy we had from training was about 91,
and now we're only getting about 88.5. So this is an example of something we call overfitting. Our model seemed like it was doing really well on the training data, but that's because it was seeing that data so often: with 10 epochs it started to essentially memorize that data and get good at classifying exactly that data. Whereas now, when we pass it new data it has never seen before, it's only 88.5% accurate, which means we overfit our model, and it's not as good at generalizing to other data, which is usually the goal. When we create a model, we want the highest accuracy possible, but we want the highest accuracy possible on new data, so we need to make sure our model generalizes properly. Now in this instance, it's hard to figure out exactly how to do that, because we don't know that much about neural networks yet, but this is the idea behind overfitting and hyperparameter tuning. So if we start changing some of this architecture, maybe the optimizer, maybe the loss function, or maybe we set epochs to eight, let's see if this does any better. So let's now fit the model with eight epochs, have a look at the training accuracy, and then test it and see if we get a higher accuracy on the testing data set. And this is the idea of hyperparameter tuning: we look at each parameter, we tweak it a little bit, and usually we'll write some code that automates this for us, but the goal is to get the most generalizable accuracy we can. So I'll wait for this to train, we're actually almost done, so I won't even bother cutting the video, and then we'll run the evaluation and see if we got a better accuracy. Now I'm getting a little bit nervous because the training accuracy is getting very high here, and while you do want the accuracy to be high on your training data, when it gets to a point where it's very high, it's likely that you've overfit. So let's look at this now and see what we get. So 88.4, so we actually dropped down a little bit, and it seems like those epochs didn't make a big difference. So what if I train it on one epoch? Make your prediction, do you think we're going to do better or worse when it's only seen the training data one time? Let's run this, and let's see: 89.34. So in this situation, fewer epochs was actually better. That's something to consider: a lot of people I see just go for 100 epochs and think their model is going to be great, but that's actually not a good idea. A lot of the time you'll get a worse model, because what ends up happening is it sees the same information so much, and tweaks itself so specifically to that information, that when you show it new data, it can't actually classify and generalize on it. Alright, so let's go back and see what we're doing next. Okay, so now that we've done that, we need to make predictions. Making predictions is actually pretty easy, so I'm just going to copy this line in, and we'll go into a new code block down here. All you have to do is say model.predict, and then you give it an array of images that you want to predict on. So in this case, if we look at the test images shape, actually, let's make a new code block.
And let's go here, so let's say test_images.shape. Alright, so we have 10,000 by 28 by 28, so this is an array of 10,000 images. Now, if I just wanted to predict on one image, what I could do is take test_images[0] and put it inside of an array. The reason I need to do that is because this model is used to seeing an array of images to make predictions on; that's what the predict method expects, and it's much better at making predictions on many things at once than on one specific item. So if you are predicting one item only, you do need to put it in an array, because the model is used to seeing that form. We could do that, but I'm just going to leave it so we predict on every single one of the test images, because then we can have a look at a cool function I've made. So let's do predictions = model.predict(test_images), and let's print predictions and look at what it actually is. Where is my autocomplete? There it is. Okay, so let's have a look. Is this some object? Whoa, okay, so this is an array of arrays, and it looks like we have some really tiny numbers in them. What this is, essentially, is that every single image has a list representing the prediction for it, just like we've seen with the linear models and so on. So if I want to see the prediction for test image zero, I would say predictions[0], so let's print this out. And this is the array we're getting: this is the probability distribution that was calculated on our output layer for that image. If we want to figure out which class it is actually predicting, we can use a handy function from NumPy called argmax, which just returns the index of the maximum value in this list. So if I'm looking for the least negative exponent here, which I believe is this one, this should return nine, because that's the index of the highest value in the list, unless I'm just reading the exponents wrong. So nine, that's what we got. Okay, now if we want to see what the actual class is, well, we have our class names up here, and we know class nine is ankle boot. So let's see if this is actually an ankle boot. I'm just going to do class_names, I think that's what I called it, like this, so that should print out what it thinks it is. Yeah, class_names. But now, let's actually show the image for this prediction. To do that, I'm just going to steal some code from up here, because I don't remember all of the matplotlib syntax off the top of my head. So let's steal this figure, show this, and see if it actually looks like an ankle boot. To do that, we're going to show test_images[0], because obviously image zero corresponds to prediction zero, and see what we get. Okay, so ankle boot, and the image we're looking at actually is an ankle boot. And we can do this for any of the images we want. So if I do prediction one and image one, let's have a look: pullover, and that kind of looks like a pullover to me, I don't know if it actually is, but that's what it looks like. Let's take a look at another one. Okay, trouser, yep, looks like trousers to me.
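Pulling the last few steps together, here is a sketch of the pre-processing, training, evaluation, and prediction code we just walked through, continuing from the blocks above; the class name list is the standard Fashion MNIST ordering:

```python
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

train_images = train_images / 255.0   # scale pixel values into the 0-1 range
test_images = test_images / 255.0

model.fit(train_images, train_labels, epochs=10)   # epochs is a hyperparameter; try 8 or 1 as well

test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=1)
print('Test accuracy:', test_acc)

predictions = model.predict(test_images)           # one probability distribution per test image
print(np.argmax(predictions[0]))                   # index of the most likely class for image 0
print(class_names[np.argmax(predictions[0])])      # its human-readable name, e.g. Ankle boot
```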
And we can see that that is how we get predictions from our model: we use model.predict. Alright, so let's move down here to the next thing that we did, verifying predictions. This is actually a cool little script that I wrote, I'll zoom out a little bit so we can read it, and I'm borrowing some of it from the TensorFlow documentation. What it does is let us use our model to make a prediction on any entry that we want. It asks us to type in a number, finds that image in the test data set, makes a prediction on it with the model, and then shows us what it actually is versus what it was predicted to be. Now I just need to actually run it. Actually, let's just steal this code and bring it into the other notebook, because I've already trained the model there, so we don't have to wait again. So let's go to a new code block and run that. Let's run this script and have a look down here. So, pick a number, we'll pick some number, let's go 45, and then what it's going to do is say expected: sneaker, guess: sneaker, and actually show us the image there. So we can see this is what our pixel data looks like, this is what the expected label was, and this is what the guess was from the neural network. We can do the same thing if we run it again: pick a number, 34, let's see here, expected: bag, guess: bag. So that's showing you how we can actually use this model. Anyways, that has been it for this module on neural networks. I did this in about an hour, and I'm hoping I explained enough that you guys now understand how neural networks work. In the next module, we're going to move on to convolutional neural networks, which should build up your understanding of neural networks as well as teach you how we can do deep computer vision, object recognition, and detection using convolutional neural networks. So with that being said, let's get into the next module. Hello, everyone, and welcome to the next module in this TensorFlow course. What we're going to be doing here is talking about deep computer vision, which is very exciting and very cool. This has been used for all kinds of things. Have you ever seen the self-driving cars from, for example, Tesla? They actually use a TensorFlow deep learning model, obviously far more complicated than anything I can explain here, to do a lot of their computer vision for self-driving. Computer vision is used in the medical field, and it's used in sports a lot, for things like goal-line technology, and even for detecting players on the field and doing analysis. There are lots of cool things being done with it nowadays. For our purposes, we're going to be using it to perform classification, although it can be used for object detection and recognition, as well as facial detection and recognition. So all kinds of applications, and in my opinion it's one of the cooler areas of deep learning right now. So let's go ahead and talk about what we're actually going to be focusing on here. We're going to start by discussing what a convolutional neural network is, which is essentially the way we do deep computer vision, and we're going to learn about image data.
So, what's the difference between image data and the other data we've been using? We're going to talk about convolutional layers and pooling layers, and how stacks of those work together as what we call a convolutional base for our convolutional neural network. We're going to talk about CNN architectures and get into actually using pre-trained models, developed by companies such as Google and by the TensorFlow team themselves, to perform classification tasks for us. So that is pretty much the breakdown of what we're about to learn. There's quite a bit in this module, and it's probably the most difficult one we've done so far. So if you do get lost at any point and don't understand some of it, don't feel bad, this stuff is genuinely difficult, and I would recommend reading through some of the descriptions I have here in this notebook, which again you can find from the link in the description, or looking up, in your own time, some things that maybe I don't go into enough depth about, as I can't really spend ten or eleven hours explaining a convolutional neural network. So let's now talk about image data, which is the first thing we need to understand. In our previous examples, the data we fed the neural network was two-dimensional: we had a width and a height when we were trying to classify images using a dense neural network. With a full color image, we actually have three dimensions. And what makes up those dimensions? Well, we have a height, we have a width, and then we have something called color channels. It's very important to understand this, because we're going to see it a lot as we get into convolutional networks: the same image is really represented by three layers, where the first layer gives us all of the red values of the pixels, the second layer gives us all the green values, and the third layer gives us all the blue values. In this case, those are the color channels, and we're going to be talking about channels quite a bit in this series. So just understand that although you think of an image as a two-dimensional thing, in the computer it's really represented by three dimensions, where the channels tell us the color of each pixel. Because remember, in red-green-blue you have three values for each pixel, which means you need three layers to represent that pixel. So we can think of it as a stack of layers, a stack of color values really, giving us the value of each pixel. If we were to draw this to the screen, we would take the red, green, and blue values of each pixel, determine its color, and then draw the two-dimensional image based on the width and the height. Okay, so now let's talk about a convolutional neural network and the difference between that and a dense neural network. In our previous examples, when we used the dense neural network to do image classification, like on that Fashion MNIST data set, what it essentially did was look at the entire image at once and determine, based on finding features in specific areas of the image, what that image was. Maybe it found an edge here, a line there, maybe it found a shape, maybe it found a horizontal or diagonal line.
The important thing to understand, though, is that when it found these patterns and learned the patterns that make up specific shapes, it learned them in specific areas. Looking at this cat image, for example, it knew that if an eye exists on the left side of the image, where the eyes are here, then that's a cat. It doesn't necessarily know that if we flipped this cat horizontally, and the eyes were over here, that that is also a pattern that makes up a cat. So the idea is that the dense network looks at things globally: it looks at the entire image and learns patterns in specific areas. That's why we need things to be centered and very similar when we use a dense neural network to perform image classification, because it cannot learn local patterns and apply them to different areas of the image. For example, some patterns we might look for in an image like this cat would be something like this: we would hope to find the ears, the eyes, the nose, and the paws here, and those features would tell us that this makes up a cat. Now, a dense neural network would find these features and learn these patterns, but only in the specific areas where they're boxed off, which means that if I horizontally flip this image, like that, it's not going to know that this is a cat, because it learned the pattern in a specific area, and it would need to relearn that pattern in the other area. A convolutional neural network, on the other hand, learns local patterns. Rather than learning that the ear exists in this specific location, it just learns what an ear looks like, and it can find that anywhere in the image, and we'll talk about how it does that as we get into the explanation. The whole point is that our convolutional neural network will scan through the entire image and pick up the features it finds, and then, based on the features that exist in that image, it will pass that to a dense neural network, a dense classifier, which looks at the presence of these features and determines which combinations of features make up specific classes or specific objects. So that's the point, and I hope that makes sense. The main thing to remember is that dense neural networks work on a global scale, meaning they learn global patterns, which are specific and are found in specific areas, whereas convolutional neural networks, or convolutional layers, will find patterns that exist anywhere in the image, because they learn what the pattern looks like, not just that it exists in a specific area. Alright, so how do they work? Let's see. When a regular neural network looks at this dog image, this is a good example, I should have been using it before, it might find that there are two eyes that exist here, and it says, okay, I found that these eyes make up a dog. This is its training image, for example, and it learns that this pattern makes up the dog, that the eyes are in this location. Now what happens when we flip the image to the other side? Well, our neural network starts looking for those eyes on the left side of the image, where it found them previously and where it was trained on, and it obviously doesn't find them there.
And so it says that our image isn't a dog, although it clearly is a dog, it's just a dog that's oriented differently, in fact, it's just flipped horizontally. Since it doesn't find the eyes in that location, and it can only use patterns it has learned in specific locations, it's going to say this isn't a dog, even though it is. Whereas our convolutional layer will find the eyes regardless of where they are in the image, and still tell us that this is a dog, because even though the dog moved over, it knows what an eye looks like, so it can find the eye anywhere in the image. That's the point of the convolutional neural network and the convolutional layer. What the convolutional layer does is look at our image and essentially feed back to us what we call an output feature map, which tells us about the presence of specific features, or what we're going to call filters, in our image. So that is roughly how it works. The thing to remember is that our dense layers output just a bunch of numeric values, whereas what our convolutional layers output is what we call a feature map. I'm going to scroll down here to show you this example. What we actually do is run what we call a filter over our image, we sample the image at all these different areas, and then we create an output feature map that quantifies the presence of the filter's pattern at different locations. And we run many, many different filters over our image at a time, so that we have all of these different feature maps telling us about the presence of all of these different features. So one convolutional layer will start by doing that with very small, simple filters, such as straight lines like this, and then the convolutional layers on top of it, because the first one returns a map that looks something like this out of the layer, will take in the map created by the previous layer and say, okay, this map is representing to me, for example, the presence of these diagonal lines, let me try to look for curves, or let me try to look for edges. So it looks at the presence of the features from the previous convolutional layer and says, okay, well, if I have all these lines combined together, that makes up an edge, and it will look for that. And that's the way a convolutional neural network works, and why we stack these different layers. Now, we also use something called pooling, and there are a few other things we'll get into, but that is the basics. I'm going to go into a drawing example and show you exactly how that works, but hopefully this makes a little bit of sense: the convolutional layer returns a feature map that quantifies the presence of a filter at specific locations, and the advantage of the filter is that we slide it across the entire image, so if this filter, this feature, is present anywhere in the image, we will know about it, unlike in our dense network, where it had to learn that pattern in a specific global location. Okay, so let's get on the drawing tablet and do a few examples. Alright, so I'm here on my drawing tablet, and we're going to explain exactly how a convolutional layer works and how the network works as a whole. So this is an image I've drawn on the left side of our screen here.
I know this is very basic, it's just an X, but this is our image, and we're going to assume it's grayscale. We're going to avoid doing anything with color channels for a second, just because they're not that important here, but just understand that what I'm about to show you does apply to color channels as well, and to multiple layers of depth; if we can understand it on a simple level, we can understand it more thoroughly. So what we want, essentially, is for our convolutional layer to give us some output that is meaningful about this image. We're going to assume this is the first convolutional layer, and what it needs to do is return to us a feature map that tells us about the presence of specific filters, as we call them, in this image. Each convolutional layer has a few properties. The first one is the input size, so what shape of data it can expect. Then, how many filters are we going to have, and what's the sample size of our filters? That's what we need to define for each of our convolutional layers. So essentially, what is a filter? A filter is just some pattern of pixels, and we saw them before. We'll use a pretty basic one here as the filter we're going to look for, which looks something like this. This will be the first filter we look for, just to illustrate how this works, but the idea is that at each convolutional layer we look for many different filters, and in fact the number we typically look for is around 32, sometimes 64, and sometimes even 128. We can use as many or as few filters as we want, but the filters are what gets trained. The filter is what is actually going to be found by the neural network, it's what's going to change; this is what's created by the program, and it's essentially the trainable parameter of a convolutional neural network. So the values of the filters will change as training goes on, as we learn more and figure out what features make up a specific image. I'm going to get rid of this stuff right now, just so we can draw and do a basic example, but I want to show you how we look for a filter in the image. So we have filters; they start out completely random, but they change as we go. Let's say the filter we're looking for is the one I drew before, I'm just going to redraw it at the top here a little bit smaller, and we'll say it's a diagonal line. Another filter we could look for might be a straight horizontal line all the way across. In fact we'll have 32 of them, and when we're using just three-by-three grids of filters, there aren't that many combinations, at least grayscale-wise. So we'll define the sample size, which is how big our filter is, and it's going to be three by three, which means we're going to look at three-by-three spots in our image, look at those pixels, and try to find how closely these filters match the pixels we're looking at in each sample.
So what this convolutional layer is going to output is what we call a feature map, which can be a little bit smaller than the original image, and you'll see why in a second, but which tells us about the presence of specific features in areas of the image. Since we're looking at two filters here, that means we're actually going to have a depth-two feature map returned to us, because for two filters we need two maps, each quantifying the presence of one of those filters. So for this green box on the left side here, we'll look for the first filter. And what do we get? Well, the way we actually compare against this filter is by taking the dot product, not the cross product, the dot product, between this little green box and this filter, because they're both just grids of numeric values underneath. Taking that dot product essentially means we element-wise multiply all of these pixels with each other and sum the result. So if this pixel value is zero, because it's white, or it could be the other way around, we could say white is one and black is zero, it doesn't really matter, and the filter pixel is a one, these are obviously very different, and when we multiply those two together, the contribution to our output is zero; that's the way it works. So we take the dot product of the whole patch, and that gives us a value essentially telling us how similar these two blocks are, how similar the sample we're taking from the image is to the filter we're looking for. If they're very similar, we'll get something close to one, or some large value telling us they're very close; if they're not similar at all, we'll get something close to zero. So in this case, for our first filter, since only the middle pixel matches, we're probably going to get a value of something like 0.12. All the other values are different, so it's not going to be very similar at all. Then we'll look at the second filter, which is this horizontal line, and in fact we're going to get a very similar response, probably something like 0.12 again, and that goes in the top left. And again, these are two maps, one for each filter. Now we'll move our green box over by one, just shift it over, and start looking at the next section. In fact, I'm going to see if I can erase this, just to make it a little cleaner, get rid of the green, there we go. Okay, so we move the box over like this and start looking at this patch, and we do the exact same thing we did before. So we ask, alright, how similar are these? Well, they're not similar at all, so we get roughly zero for the first filter. How about the other one? Oh, actually, they're a little bit similar, there's a lot of white that's in the same place, so we'll say maybe this is 0.7. I'm just randomly picking these numbers, the real ones would be different from what I'm putting in here.
But I'm just trying to get you to understand what's happening, and the way I'm making up the numbers is completely arbitrary, so keep in mind this is not exactly what it would look like. Okay, so then we move the box over one more time, and let's erase this to keep things clean; this will be the last time we do this for this example. And now what we have is, wow, a perfect match for the first filter, so we put one. For the other one, it's kind of similar, there are a few things that are different, so maybe it gets 0.4 or something. Whatever they are, we end up getting some value, so we'll fill in all of these values; let's just put some arbitrary numbers here for now so we can do something with the example: 0.7, 0.12, 0.42, 0.3, 0.9, 0.1, again completely random, 0.4, 0.6. Alright, so this is now the response map we get from looking at two filters on our original five-by-five image. Notice that the size of each map is three by three, and the reason for that is that in a five-by-five image, when we're taking three-by-three samples, we can only take nine distinct three-by-three samples: when we go down a row, we move down one and do the same thing we did before with these three-by-three samples, and if we count the number of positions, we just get three by three. So this map is now telling us about the presence of features in the original image. The thing is, though, we're going to do this 32 or 64 times, for however many filters we have, so we're going to have a lot of these maps, a ton of different layers, which means we're constantly expanding the depth of this output feature map as we go through the convolutional layers. And that means there are a lot of computations that need to be done, so this can be very slow, and that's why we'll need to talk about an operation called pooling. I'll backtrack a little bit, and we will talk about pooling in a second, but what happens is that once we have all of these layers generated, this is called the output feature map of the original image, the next convolutional layer in the network does the process we just talked about, except on this output feature map. That means that since this layer was picking up things like lines and edges, the next convolutional layer will pick up combinations of lines and edges, and maybe find what a curve is. We slowly work our way up from very small groups of pixels to finding more and more, I want to say abstract, features that exist in the image, and this is what really allows us to do some amazing things with a convolutional neural network. When we have a ton of different layers stacked on each other, we can pick out all the small little edges, which are pretty easy to find, and with all these combinations of layers working together, we can even find things like eyes, or feet, or heads, or faces. We can find very complicated structures, because we slowly work our way up, starting by solving very easy problems like finding lines, and then finding combinations of lines, combinations of edges, shapes, and more abstract things. That's how this convolutional network works.
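To make the sliding-filter idea concrete, here is a toy sketch in plain NumPy: one three-by-three filter dotted against every three-by-three patch of a five-by-five image, producing a three-by-three response map. The values are made up for illustration, and a real convolutional layer also learns a bias and applies an activation function.

```python
import numpy as np

# A tiny 5x5 grayscale "X" image, where 1 = white stroke, 0 = background
image = np.array([
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
], dtype=float)

# One 3x3 filter: a diagonal line (in a real network, these values are learned)
diagonal_filter = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
], dtype=float)

# Slide the filter over every 3x3 patch and take the dot product
feature_map = np.zeros((3, 3))
for row in range(3):
    for col in range(3):
        patch = image[row:row + 3, col:col + 3]
        feature_map[row, col] = np.sum(patch * diagonal_filter)  # element-wise multiply, then sum

print(feature_map)  # larger values where the patch looks like the diagonal filter
```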
So we've done that, and now it's time to talk about pooling, and we'll also talk about padding. Actually, we'll cover padding first, before pooling; it doesn't really matter what order we cover these in, but I just think padding makes sense given where we are right now. Sometimes we want to make sure that the output feature map from our original image is the same dimensions, the same size, as the input. This input is five by five, obviously, and the output is three by three. If we want the output to be five by five, what we need to do is add something called padding to our original image. Padding is essentially just adding an extra row and column on each side of our image, and we fill in all of these padded pixels with blank values; they don't mean anything. The reason we do that is so that when we take our three-by-three samples, every single original pixel gets to be in the center of a sample, because right now, this corner pixel can never be in the center, this edge pixel can never be in the center, only a few pixels get to be in the center. What this allows us to do is generate an output map that is the same size as our original input, and look at features that are right on the edges of the image, which we might not have been able to see before. Now, this isn't super important when you go to very large images, but it is something to consider: you can add padding, and we may do this as we get through our examples. There's also something called stride, which I want to talk about as well. The stride is essentially how much we move the sample box each time we move it. So, say we're doing an example with padding here, we take our first sample here, and again, these extra pixels were just added to make this work better for us; you would assume that the next time we move the box, we move it one pixel over, and that's called a stride of one. We can do that, but we can also use a stride of two, which means we move over by two. Obviously, the larger your stride, the smaller your output feature map is going to be, so you might want to add more padding, though you don't want to add too much; it's just something to consider, and we will use strides in different instances. Okay, so hopefully that makes sense. Let's erase this now, we don't need it anymore; we've talked about padding and the stride, and now we're going to talk about the pooling operation, which is very important. The idea is that we're going to have a ton of these layers, one for each filter, so we're going to have a lot of numbers and a lot of computations, and there should be some way to make these a little bit simpler, a little bit easier to work with. Well, yes, there is a way to do that, and it's called pooling. There are three basic types of pooling, well, there are more, but the basic ones are min, max, and average, and essentially a pooling operation just takes specific values from samples of the output feature map.
So once we generate this output feature map, what we do to reduce its dimensionality and make it a little easier to work with is sample, typically, two-by-two areas of the feature map, take either the min, max, or average of the values inside each area, and map those into a new feature map that's roughly half the size of the original in each dimension. It's kind of hard to really show this with a three-by-three map, but essentially what happens is something like this. We take the sample here, and we ask, okay, are we doing min, max, or average pooling? If we're doing min pooling, we take the smallest value, which means we take zero. If we're doing max pooling, we take the maximum value, which means we take 0.3. If we're doing average pooling, we probably get an average value of something close to 0.2, so let's say 0.2 goes there. That's how pooling works, again, just to make the feature map smaller, and I'll include a tiny code sketch of this idea a little further down, once we're off the whiteboard. So we'd do that for both of the filter maps, but let's just say this sample is 0.2, and this area I'm blocking off here, I don't know, let's say 0.6, it's hard to average four numbers on the fly, and this one down here, let's just say 0.21 or something, and then this last one here, okay, we got some bigger values, so maybe 0.4. Okay, so that's one map; the one down here will have some values of its own, and we'll just do squiggles to represent that it has something. We've effectively done a pooling operation on this and reduced its size by about half. Now, typically what we do is use a two-by-two pooling sample with a stride of two, which actually means we would stride over like this, but since we're not doing padding on this layer right now, we'll just do a stride of one, and this is how we pool it. Now, the different kinds of pooling are used for different things. The reason we would use a max pooling operation is to tell us about the strongest presence of a feature in that local area; we really only care whether the feature exists there or not. Average pooling is not used very often, although in this example we did use it; it tells you about the average presence of the feature in that area. Max tells you whether the feature is present in that area at all, and min tells you whether it's absent: if there's even one zero in that area, min pooling gives you a zero. So that's the point of pooling in support of convolutional layers. I think I'm done with the whiteboarding for now; we're actually going to start getting into a little bit of code and talk about creating our own convolutional networks, which hopefully will make all of this a lot clearer. So let's go ahead and get into that. Alright, so now it is time to create our first convolutional neural network. We're going to be using Keras to do this, and we're also going to be using the CIFAR-10 image data set, which contains 60,000 images of 10 different classes of everyday objects. Now, these images are 32 by 32, which essentially means they are blurry, and they are in color.
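As promised, here is that toy pooling sketch before we jump into the Keras code: two-by-two max pooling with a stride of two, on an arbitrary feature map whose values I've just made up.

```python
import numpy as np

# An arbitrary 4x4 feature map, like the output of a convolutional layer
feature_map = np.array([
    [0.1, 0.7, 0.2, 0.4],
    [0.3, 0.9, 0.1, 0.6],
    [0.2, 0.1, 0.8, 0.3],
    [0.4, 0.5, 0.2, 0.7],
])

# 2x2 max pooling with a stride of 2: keep only the strongest response in each 2x2 area
pooled = np.zeros((2, 2))
for row in range(2):
    for col in range(2):
        window = feature_map[row * 2:row * 2 + 2, col * 2:col * 2 + 2]
        pooled[row, col] = window.max()   # swap in .min() or .mean() for min/average pooling

print(pooled)  # [[0.9 0.6]
               #  [0.5 0.8]]
```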
Now, I just want to emphasize as we get into this that the reason I'm not typing all of these lines out, and I just have them in here already, is because this is likely what you will be doing when you make your own models. Chances are, unless you're a pro at TensorFlow, and I am not even there yet with my own knowledge of it, you are not going to have all of these lines memorized and never have to go reference the syntax. The point is, so long as you understand why this works and what these lines are doing, you're going to be fine. You don't need to memorize them; I haven't memorized them, I look up the documentation, I copy and paste what I need, I alter it, and I write a little bit of my own code, and that's what you're going to end up doing too. So that's what I'm doing here. So this is the image data set: we have truck, car, ship, airplane, just everyday objects. There are 60,000 images, as we said, and 6,000 images of each class, so we don't have too many images of any one specific class. We'll start by importing our modules, so TensorFlow and Keras, and we're going to use the data set built into Keras for this, the CIFAR-10 image data set. If you click on this, it'll bring you to information about the data set, although we don't need that right now, because I already know it. And now we're just going to load our images in. Again, the way this works is you say datasets.cifar10.load_data(). Now, what load_data gives back is organized a little differently from some of the objects we've looked at more closely before, so that's just something to keep in mind. We're going to normalize this data, the train images and test images, by dividing both of them by 255. Again, we're doing that because we want our values between zero and one, which is a lot better to work with in our neural networks than large integer values; large values just cause things to go wrong sometimes. For class names, we're just defining a list here with all of the class names, so that zero represents airplane, one represents automobile, and so on up to truck. We run that block of code here, and it will download the data set, although I don't think that takes very long. Okay, so wait, I guess, yeah, I think we're okay there. Now let's have a look at some of the images by running this script. We can see this is a truck; we can change the image index to two, and we see this is another truck; let's go to, say, six, and we get a bird. You can see these are really blurry, but that's fine for this example, we're just trying to get something that works. Alright, that's a horse, you get the point. Alright, so now the CNN architecture. We've already talked about how a convolutional neural network works, but we haven't talked about the architecture and how we actually make one. Essentially, what we do is stack a bunch of convolutional layers and max pooling (or min or average pooling) layers together, into something like the model sketched below.
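Here is a sketch of what that stack looks like in Keras, together with the loading and normalizing steps described above:

```python
import tensorflow as tf
from tensorflow.keras import datasets, layers, models

# Load CIFAR-10 and scale the pixel values into the 0-1 range
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
train_images, test_images = train_images / 255.0, test_images / 255.0

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# The convolutional base: a stack of Conv2D and MaxPooling2D layers
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),  # 32 filters, 3x3 samples
    layers.MaxPooling2D((2, 2)),                                            # 2x2 pooling, stride 2
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
])
model.summary()
```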
So after each convolutional layer, we have a max pooling layer, some kind of pooling layer, typically to reduce the dimensionality, although you don't need that, you could go straight into three convolutional layers. On our first layer, we define the number of filters, just like here, we define the sample size, so how big those filters are, and an activation function, which means that after we apply that dot product operation we talked about, we apply rectified linear unit to the result and then put that into the output feature map. Again, we've worked with activation functions before, so I won't go too deep into them. And then we define the input shape, which tells the first layer what to expect: 32 by 32 by 3. The later layers don't need that, because they figure it out from the output of the previous layer. Alright, so that's the breakdown of the layers. The max pooling layers here, with two by two, essentially mean we're going to use a two-by-two sample size with a stride of two. Again, the whole point of this is to shrink each of these feature maps by a factor of two. Alright, so now let's look at a summary. It's already printed out here, and we can see that we have, wait, is this correct? MobileNetV2? I don't think that's correct; that's because I haven't run this one, my apologies guys, that output is from something later in the tutorial. We can see that we have Conv2D as our first layer, and this is the output shape of that layer. Notice that it is not 32 by 32 by 32, it is 30 by 30 by 32, because when we do that sampling without padding, that's what we get: two pixels less in each direction, because of the number of samples we can take. Next, we have the MaxPooling2D layer, and its output shape is 15 by 15 by 32, which means we've shrunk the spatial dimensions by a factor of two. We do a convolution on this, which means we now get 13 by 13 by 64, because we're taking 64 filters this time, and then max pooling again gives us 6 by 6 by 64, because we divide by a factor of two again, and notice that it just rounds down. Then another Conv2D, and we get 4 by 4 by 64, again because of the number of samples we can take. So this is what we've defined so far, but this is not the end of our convolutional neural network. In fact, this output doesn't really mean much to us yet: it just tells us about the presence of specific features after going through this convolutional base, which is what this stack of convolution and max pooling layers is called. What we need to do now is pass this information into some kind of dense layer classifier, which takes this data we've calculated, this extraction of the features that exist in the image, and tells us which combination of those features maps to which of our 10 classes. So that's the point: you use the convolutional base to extract all of the features out of your image, and then you use the dense network to say, okay, if this combination of features exists, then this image is this class, otherwise it's that class, and so on. So that's what we're doing here. Alright, so next up is adding the dense layers, sketched just below.
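A sketch of the classifier head we're about to add on top of that convolutional base:

```python
# Add the dense classifier on top of the convolutional base from above
model.add(layers.Flatten())                     # 4x4x64 feature maps -> 1024 values in a row
model.add(layers.Dense(64, activation='relu'))  # hidden dense layer
model.add(layers.Dense(10))                     # one output per CIFAR-10 class
model.summary()
```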
So to add the dense layers is pretty easy, model.add is just how we add them. We're going to flatten all of those values, which essentially means take the 4 by 4 by 64 output and put it all into a single straight line, one-dimensional, like we've done before. Then we have a 64-neuron dense layer with a rectified linear unit activation that connects to all of those values, and then our output layer, a dense layer with 10 neurons, obviously 10 because that's the number of classes we have for this problem. So let's run this here, we'll add those layers, and let's look at a summary and see. Oh, things have changed now: we go from 4 by 4 by 64 to 1024, and notice that that is precisely the calculation of four times four times 64, that's how we get that number. Then we have a dense layer and another dense layer, and this last one is our output layer, so finally we get 10 neurons out, essentially just a list of values, and that's how we can determine which class is predicted. So everything up to here is the convolutional base, and this part is what we call the classifier, and they work together to extract the features, and then look at those features and predict the actual class of the object. Alright, so that's how that works. Now it's time to train; again, we'll go through this quickly. I believe I've already trained this, and it takes a long time to train, so I'm going to reduce the epochs here to just four. I'd recommend you guys train it on more, say 10, if you're going to do it yourself, but it does take a while, so for our purposes, and for my time, we'll leave it a little shorter right now. You should get about a 70% accuracy if you train it on 10 epochs, and you can see I've trained it previously, but I'm just going to train up to four, we'll get around 67 or 68%, and that should be fine. So we'll be back once this is done training, and we'll talk about how some of this works. Okay, so the model has finally finished training, we did about four epochs, and you can see we got an accuracy of about 67% on the evaluation data. To quickly go over the settings: the optimizer is Adam, which we talked about before, and the loss function is sparse categorical crossentropy. You can read about that one if you want, it computes the crossentropy loss between the labels and the predictions, and I'm not going to go into it, but these are the kinds of things you can look up if you really want to understand why they work. For most problems, if you want to figure out which loss function and optimizer to use, just use the standard ones: use Adam, and use a categorical crossentropy loss if you're doing a classification task like this; you can go look up all of the different loss functions, and the documentation will tell you when to use which one, and you can mess with them and tweak them if you want. Now, history = model.fit(...) is just so we can access some of the statistics from the training run; obviously it's training the model on the train images and train labels, with the test images and test labels as the validation data. So, evaluating the model: we can evaluate it now on the test images and test labels, and we're obviously going to get the same numbers, because the validation data was the test images and test labels, so we should get the same accuracy of around 0.6735, which we do right here.
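For reference, here is a sketch of the compile, fit, and evaluate calls just described; since the final Dense(10) layer in the sketch above has no softmax, the loss is told to expect raw logits:

```python
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# history keeps the per-epoch statistics; the test split doubles as validation data here
history = model.fit(train_images, train_labels, epochs=4,
                    validation_data=(test_images, test_labels))

test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print(test_acc)   # roughly 0.67-0.70 after a handful of epochs
```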
Alright, so there we go: we get about 70% if you train this on 10 epochs; I'm a little lower just because I didn't want to go that high. And that is now the model. We could use it if we wanted: call predict, pass in some image, and see the prediction for it. I'm not going to do that here, because we've already covered that enough, and I want to get into some of the cooler stuff: working with smaller data sets. The basic idea is that this is actually a pretty small data set; we used about 60,000 images, and if you think about the number of different patterns we need to pick up to classify things like horses versus trucks, that's a pretty difficult task, which means we need a lot of data. In fact, some of the best convolutional neural networks out there are trained on millions of sample images. Obviously we don't have that kind of data, so how can we work with, say, a few thousand images and still get a decent model? Well, the thing is, you can't, unless you use the techniques I'm about to show you. So, working with small data sets: just like I mentioned, it's difficult to create a very good convolutional neural network from scratch with a small amount of data. That is why we employ these techniques, the first being data augmentation, and also using pre-trained models, to accomplish what we need. That's what we're going to talk about now in this second part, where we'll create another convolutional neural network. Just to clarify, the model above is already created; that was the architecture, and it was mainly to get you familiar with the idea. So, data augmentation. This is basically the idea: if you have one image, we can turn that image into several different images and pass all of them to our model. If we can rotate the image, flip it, stretch it, compress it, shift it, zoom it, whatever it is, and pass all of those variants to our model, it should become better at generalizing, because it will see the same image modified and augmented multiple times. That means we can turn a data set of, say, 10,000 images into 40,000 images by doing four augmentations on every single image. Obviously you still want a lot of unique images, but this technique helps a lot and is used quite a bit, because it allows the model to pick up images that are oriented differently, zoomed in a bit, or stretched; it's just better at generalizing, which is the whole point. I'm not going to go through this in too much depth, but this is essentially a script that does data augmentation for you. We're going to use the ImageDataGenerator from the keras.preprocessing.image module and create an ImageDataGenerator object, which lets us specify some parameters for how we want to modify our images. In this case we have a rotation range, some shifts, shear, zoom, a horizontal flip, and the fill mode. I'm not going to go into exactly how each of those works; you can look at the documentation if you'd like, but essentially this will just allow us to augment our images.
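Here's a minimal sketch of that cell, together with the display loop I walk through next; test_images is assumed to come from the CIFAR-10 load earlier in the notebook, and the specific parameter values are illustrative:

```python
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.image import ImageDataGenerator, img_to_array

# generator that produces randomly modified copies of an image
datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

test_img = test_images[20]                  # one arbitrary image from the data set
img = img_to_array(test_img)                # convert to a NumPy array
img = img.reshape((1,) + img.shape)         # add a batch dimension: (1, h, w, 3)

i = 0
for batch in datagen.flow(img, save_prefix='test', save_format='jpeg'):
    plt.figure(i)
    plt.imshow(batch[0].astype('uint8'))    # show the augmented copy
    i += 1
    if i > 4:                               # stop after a few augmentations
        break
plt.show()
```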
Now what I'm doing is picking one arbitrary image from the test images, whatever you want to call that group of photos, converting it to an image array, which essentially takes it from the dataset object it is and turns it into a NumPy array, and then reshaping it to (1,) plus the image shape, so it has a leading batch dimension of one. Then we say: for batch in datagen.flow(...), and I'll explain how that works. Essentially, this augments the image for us and can save the results to our drive. In this instance, datagen.flow takes the image we've created and formatted with those two steps, uses 'test' as the filename prefix (so there will be some extra information after it), saves in JPEG format, and keeps producing new augmentations as many times as we like until we break. So given one image, it will produce test_1, test_2, test_3, test_4, test_5, each with random augmentations, until we decide to break out of the loop. What I'm doing is showing each one by plotting batch[0], which is just the first image in each generated batch. You can mess with this script and find a way to use it, but if you want to do data augmentation, I'd recommend just looking into ImageDataGenerator; this is something I wanted you to be aware of, and I'll run it so you can see exactly how it works. So, given an image of a truck, it augments it in these different ways: you can see the shifts, the translations, the rotations, all of that. Let's do a different image, say index 22, and we get something different; in this case I believe it's maybe a deer, a rabbit, or a dog, I don't really know because it's so blurry, but you can see the kinds of shifts we're getting. And it makes sense, because you want images in different positions so that we get better generalization. Alright, let's close that. Okay, so now we're going to talk about pre-trained models. We covered data augmentation, which is a great technique for increasing the size of your data set, but what if, even after that, we still don't have enough images? Well, we can use something called a pre-trained model. Companies like Google, which maintains TensorFlow, make their own amazing convolutional neural networks that are completely open source and that we can use. So what we're going to do is take part of a convolutional neural network that they've already trained on, I believe, 1.4 million images, and use that part as the base of our model, so we have a really good starting point. All we need to do then is what's called fine-tuning the last few layers of that network, so that they work a little bit better for our purposes.
So what we're going to do, essentially, is say: alright, we have this model that Google has trained on 1.4 million images, and it's capable of classifying, let's say, 1,000 different classes, which is actually the example we'll look at. The beginning of that model is what picks up on the smaller edges and the very general features that appear in all of our images. So if we can use the base of that model, the beginning part that does a really good job picking up edges and general features that apply to any image, then all we have to do is change the top layers a tiny bit, or add our own layers on top, to classify for the problem we actually care about. That should be a very effective way to use the pre-trained model: we use the beginning part that's really good at the generalization step, then pass its output into our own layers that do whatever we need for our specific problem; that's the fine-tuning step, and then we should have a model that works pretty well. In fact, that's what we're going to do in this example. So that's the point of what I'm describing here: use part of a model that already exists, that's very good at generalizing because it's been trained on so many different images, and then pass our own training data in. We won't modify the beginning of the network, because it already works really well; we'll just modify the last few layers so they're good at classifying, for example, just cats and dogs, which is exactly the example we're going to do here. I hope that makes sense; it should get clearer as we go through it. So, using a pre-trained model is the section we're getting into now. This is based on the TensorFlow documentation; as always, I'm referencing everything so you can go see it if you'd like. We do our imports, and then we load a data set, which can take a second. The problem we're tackling is classifying dogs versus cats with a fair degree of accuracy; in fact, we'd like to get above 90%. This is the data set we're loading in from TensorFlow Datasets, imported as tfds. It's a slightly unusual way to load data; for things like this you just have to reference the documentation. I could explain it in detail, but it won't help much, because the next example will load data a different way. So as long as you know how to get the data into the correct form, into some kind of NumPy array, split into training, testing, and validation data, you should be okay, and if you're using a TensorFlow dataset, the documentation will tell you how to load it properly. So we've loaded it here, with 80% as training data, 10% as raw validation, and 10% as testing data. Next we look at a few images: this creates a function, which I know is a bit unusual and pretty specific to this example, that lets us pass in an integer label and get back the string representation of that label. Then I take two images from our raw training dataset and display them, and you can see what we get here: dog and dog.
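Here's a sketch of that loading cell using the current tensorflow_datasets split syntax (the notebook used an older subsplit API, but the 80/10/10 split is the same):

```python
import tensorflow as tf
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt

# 80% train, 10% validation, 10% test
(raw_train, raw_validation, raw_test), metadata = tfds.load(
    'cats_vs_dogs',
    split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],
    with_info=True,
    as_supervised=True)

# function that turns an integer label into its string name ('cat' or 'dog')
get_label_name = metadata.features['label'].int2str

for image, label in raw_train.take(2):
    plt.figure()
    plt.imshow(image)
    plt.title(get_label_name(label))
```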
If I go ahead and take five instead, we see what our images look like: here's an example of a dog, here's a cat, and so on; you get the point. Notice, though, that these images have different dimensions; in fact, hardly any of them are the same size at all. So obviously there's a step we need to do, which is to scale all of these images to the same size. To do that, we write a little function that returns the image reshaped to the image size, which I'm going to set to 160 by 160. We could make this bigger if we wanted, but the problem is that if you make the target size bigger than most of your dataset examples, you end up really stretching a lot of them out. You might say, well, if you make it smaller you lose detail too, but it's generally better to compress images down than to blow them up, even just in terms of training time and how complex the network has to be. That's something to consider when you're making your own networks; you can experiment with it, but smaller is typically better in my opinion. You don't want to go too small, but something like half the size of an average image is reasonable. So we have format_example, which takes an image and a label and returns the reshaped image and the label. We cast, which means convert, every single pixel of the image to a float32 value, because they could be integers; then we divide by 127.5, which is exactly half of 255, and subtract 1; then we resize the image to the image size, so 160 by 160, and return the new image and the label. Now we can apply this function to all of our images using map. If you don't know what map is, it essentially takes every single example in, in this case, raw_train, and applies the function to it, which means raw_train becomes a set of images that are all resized to 160 by 160; we do the same thing for validation and test. So we run that, no issue there, and now let's have a look at our images. There we are. I've slightly messed up the colours because I didn't set a cmap, which I think I needed, but that's fine for now; this is what our images look like after the resize, all 160 by 160, and we're good to go. Alright, now let's look at the shape of an original image versus a new image. This was just to prove that our original shapes were things like 262 by 409 by some random values, and they've all been reshaped to 160 by 160 by 3, the 3 obviously being the colour channels of the images.
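Here's that pre-processing in one place, including the batching step the later cells rely on (the batch size and shuffle buffer follow the TensorFlow transfer-learning guide, so treat those values as assumptions):

```python
IMG_SIZE = 160  # every image will be resized to 160x160

def format_example(image, label):
    """Cast to float, scale pixels to [-1, 1], and resize to IMG_SIZE x IMG_SIZE."""
    image = tf.cast(image, tf.float32)
    image = (image / 127.5) - 1
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    return image, label

# apply the function to every example in each split
train = raw_train.map(format_example)
validation = raw_validation.map(format_example)
test = raw_test.map(format_example)

# batch (and shuffle the training data) so the model can consume the datasets
BATCH_SIZE = 32
SHUFFLE_BUFFER_SIZE = 1000
train_batches = train.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
validation_batches = validation.batch(BATCH_SIZE)
test_batches = test.batch(BATCH_SIZE)
```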
Alright, so picking a pre-trained model. This next step is probably one of the hardest: choosing the model you'd actually like to use as your base. We're going to use one called MobileNetV2, which is from Google and built right into TensorFlow itself; that's partly why I've picked it. All we're going to do is set this up: we say the base model in our code is equal to tf.keras.applications.MobileNetV2, which specifies the architecture of the model we want, and we'll have a look at it below in a second. We define the input shape, which is important because this model can take any input shape we want, so we set it to 160 by 160 by 3, which we defined above. include_top is very important: it controls whether we include the classifier that comes with this network or not. In our case, we're going to be retraining parts of this network so that it works specifically for dogs and cats, and not for 1,000 different classes, which is what this model was actually built to do: it was trained on 1.4 million images across 1,000 classes of everyday objects. So we do not include the top, which means we drop that 1,000-class classifier. And we load the weights from what's called ImageNet, which is a specific saved set of weights; so the architecture is the model definition, and the ImageNet weights are the data we fill in for that architecture. We load that in, and we have our base model. Now let's look at the summary. You can see this is a pretty crazy model; we would never be expected to create something like this ourselves. This is teams of data scientists, PhD students, engineers, the experts in the field, who have created a network like this, and that's why we're going to use it: it works so effectively for the generalization at the beginning, which is what we want. Then we can take the features it extracts, and what I want us to focus on is the output of this network, 5 by 5 by 1280, the last layer here. We're going to take that and pass it into our own classifier, and use it to predict dogs versus cats. So at this point, the base model will output a tensor of shape (32, 5, 5, 1280) for a batch of 32 images; we can check that shape directly, which is what I wanted to do here, and the summary shows None for the batch dimension until it knows what the input is. Now, let's talk about freezing the base. The point is that we want to use this as the base of our network, which means we don't want to change it. If we just dropped this network in right now as the base and started training, it would start retraining all of its weights and biases, and in fact it would be training about 2.257 million additional parameters, when we don't want to change them at all: they've already been defined and set, and we know they work well, since they worked for classifying 1,000 classes. Why would we touch them now? And if we were going to retrain them, what would be the point of using this base at all?
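For reference, the cell that loads and inspects the base model looks roughly like this (variable names follow the TensorFlow transfer-learning guide the notebook is based on):

```python
IMG_SHAPE = (IMG_SIZE, IMG_SIZE, 3)

# MobileNetV2 pre-trained on ImageNet, with its 1000-class classifier removed
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
base_model.summary()

# the base turns a batch of 160x160x3 images into a batch of 5x5x1280 feature maps
for image_batch, label_batch in train_batches.take(1):
    feature_batch = base_model(image_batch)
    print(feature_batch.shape)   # (32, 5, 5, 1280)
```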
So we don't want to train it; we want to leave it the same, and to do that we freeze it. Freezing essentially just means turning the trainable attribute of a layer, or of the whole model, off. So what we do is say base_model.trainable = False, which means we're no longer going to train any part of that base model. Now, if we look at the summary and scroll all the way down to the bottom, we can see that the number of trainable parameters is zero instead of the roughly 2.257 million it was before. And now it's time to add our own classifier on top. Essentially, we've got a pretty good network whose last output is 5 by 5 by 1280, and what we want to do now is use that to classify either cat or dog. So we add a global average pooling layer, which takes the average over each of the 1,280 feature maps that are 5 by 5, giving us a 1D tensor, essentially flattening that output for us. Then we add the prediction layer, which is just one dense node; since we're only classifying two classes, dogs and cats, we only need one output. Then we put all of these together, the base model, the global average pooling layer we defined, and the prediction layer, to create our final model. So we define the global average layer, the prediction layer, and the model, give that a second to run, and when we look at the summary we can see MobileNetV2 listed; it's actually a whole model, but it acts as our base layer, and that's fine because its output shape is what we expect. Then the global average pooling, which flattens things out by taking the average, and finally our dense layer, which has a single neuron as our output. Notice that we have about 2.26 million parameters in total, and only 1,281 of them are trainable. That's because there are 1,280 connections from the pooling layer to the dense layer, which means 1,280 weights and one bias. So that is what we've created: the base, the majority of the network, has been done for us, and we just add our own little classifier on top. Now we're going to feed some training samples and data to it. Remember, we're not training the base layer whatsoever, so the only things that need to be learned are the weights and biases of these last two layers; once we have those, we should have a decent model ready to go. So let's actually train this now. I'm going to compile it here, and I'm picking a learning rate that's very small. What the learning rate essentially controls is how much we're allowed to modify the weights and biases of the network, and I've made it very low because we don't want to make any major changes if we don't have to; we're already starting from a base model that works. I'm not going to talk about what the optimizer settings do specifically; you can look that up if you'd like. For the loss function, we'll use binary cross entropy, just because we have two classes; if you have more than two classes, you'd use categorical cross entropy or some other variant. And then what we're going to do is actually evaluate the model right now, before we even train it. So I've compiled it and set up what we'll end up using.
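Here's that whole assembly, up to the compile step, in one sketch; the RMSprop optimizer and the exact learning rate follow the TensorFlow transfer-learning guide rather than being the only valid choices:

```python
base_model.trainable = False   # freeze every pre-trained weight in the base

# average each of the 1280 5x5 feature maps -> a 1280-long vector per image
global_average_layer = tf.keras.layers.GlobalAveragePooling2D()

# a single output neuron: dog vs. cat
prediction_layer = tf.keras.layers.Dense(1)

model = tf.keras.Sequential([
    base_model,
    global_average_layer,
    prediction_layer
])
model.summary()   # ~2.26M parameters in total, only 1,281 of them trainable

base_learning_rate = 0.0001   # small, so we only nudge the new layers
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=base_learning_rate),
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])
```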
But I want to evaluate the model as it currently is, without training it whatsoever, on our validation batches, and see what we actually get right now, with the base model the way it is and the completely random weights and biases in the global average pooling and dense layers. So let's evaluate and see what accuracy we get. Okay, we can see that with the random weights and biases in those last layers we're getting an accuracy of 56%, which pretty much means it's guessing: with two classes, 50% is what random guessing would give you, and we're only just above that. So now what we're going to do, and I've actually trained this already, so I might not have to do it again, is train this model on all of our images of cats and dogs that we loaded in before. That lets it adjust the weights and biases of those last layers, so hopefully it can determine what features need to be present for a dog to be a dog and for a cat to be a cat, and then make a pretty good prediction. In fact, I'm not going to train it in front of you right now, because it takes close to an hour: there are a lot of images it needs to look at, and a lot of calculations that need to happen. But when you do train it, you end up with an accuracy of close to 92 or 93%, which is pretty good considering that all we did was take that very general base, which was built to classify 1,000 different classes, and apply it to just cats and dogs by adding our dense layer classifier on top. You can see this was the accuracy I had from training it previously; I don't want to train it again because it takes so long, but I did want to show that you can save and load a model with this syntax. Essentially, on your model object you can call model.save and save it under whatever name you'd like, ending in .h5, which is a format for saving models; it's specific to Keras rather than TensorFlow in general. Then you can load the model back with load_model. This is useful because after you've trained for an hour, you obviously don't want to retrain the model every time you want to use it to make predictions; you can just load it. Now, I'm not going to go into using the model in detail; you can look up the documentation, and at this point I've shown you plenty of syntax on predicting and how we actually use models. But the basic idea is model.predict, and you can see it's even showing me the signature here: model.predict takes some x, a batch size, a verbose flag, because it will predict on multiple inputs at once. That spits back a value from which you can figure out, okay, this is a cat or this is a dog; you pass it the same kind of input we had before, 160 by 160 by 3, and it makes the prediction for you. I was getting an error just because I hadn't saved the model previously, but that's how you save and load models, which I think is important when you're working with very large models. When you fit this yourself, feel free to change the epochs to something lower if you'd like.
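A sketch of those cells, assuming the batched datasets and the compiled model from earlier (the epoch count and file name are illustrative):

```python
# evaluate before training: with random weights on the new layers, ~50-56% accuracy
loss0, accuracy0 = model.evaluate(validation_batches, steps=20)
print(accuracy0)

# train only the new layers (the base is frozen); this is the slow part
history = model.fit(train_batches,
                    epochs=3,
                    validation_data=validation_batches)

# save the trained model so we never have to repeat the hour of training
model.save('dogs_vs_cats.h5')
new_model = tf.keras.models.load_model('dogs_vs_cats.h5')
```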
This takes a long time to actually run, but you can see that the accuracy increases dramatically from when we didn't have that classifier trained at all. Now, the last thing I want to talk about is object detection. I'm just going to load up a page; we're not going to do any examples, I'm just going to give you a brief introduction, because we're running out of time for this module, but you can use TensorFlow to do object detection and recognition, which is pretty cool. So, right now I'm on a GitHub page built by TensorFlow; I'll leave that link in the notebook where it says object detection, so you can look at it. Essentially, there is an API for TensorFlow that does object detection for you, and it works very well; it even gives you confidence scores, and you can see the kind of output you'll actually get if you use this API. Unfortunately, we don't have time to go through it, because the setup and how to use the project properly would take a good amount of time to cover, but if you go through the documentation you should be able to figure it out, now that you're familiar with TensorFlow and understand some of the concepts here. It runs a very different model from what we've discussed, but something I wanted to make clear is that you can do this kind of thing with TensorFlow, and I'll leave the resource there if you'd like to check it out. There's also a great Python module for face recognition; it's not part of TensorFlow, but it does use a convolutional neural network to do facial detection and recognition, which is pretty cool as well, so I'll put that link in here too. For now, that wraps up our "what is a convolutional neural network" module. I hope this has cleared some things up about how computer vision works and how convolutional neural networks work. I know I haven't gone into crazy examples, but I've shown you some different techniques that you can go look up on your own and really dive into, because now you have that base domain knowledge where you'll be able to follow along with a tutorial and understand exactly what to do. And if you want to create your own model, as long as you can get enough sufficient training data, you can load that training data onto your computer, put it in a NumPy array, and then create a model like we've just done, even using something like the MobileNetV2 base we talked about in the pre-trained model section: use the base of that, add your own classifier on top, do a similar thing to what I've done with that single dense neuron and the global average pooling layer, and hopefully you'll get a decent result. This is just showing you what you can do; obviously, you can pick a different base depending on the kind of problem you're trying to solve. So anyway, that has been convolutional neural networks. I hope you enjoyed the module. Now we're on to recurrent neural networks, which is actually going to be pretty interesting, so see you in that module.
Hello, everyone, and welcome to the next module in this course, which covers natural language processing with recurrent neural networks. The first thing we're going to do in this module is discuss what natural language processing is. For those of you who don't know, natural language processing, or NLP for short, is the field or discipline in computing and machine learning that deals with trying to understand natural, that is, human, languages. The reason we call them natural languages is that they're not computer or programming languages, and computers are actually quite bad at understanding textual information and human language, which is why this entire discipline exists. We're going to approach it using something called recurrent neural networks. Some examples of natural language processing would be spellcheck, autocomplete, voice assistants, translation between languages, chatbots, all kinds of different things; essentially, anything that deals with textual data, so paragraphs, sentences, even individual words, is probably going to fall under natural language processing in terms of doing some kind of machine learning with it. Now, we're going to be talking about a different kind of neural network in this section, called a recurrent neural network. These are very good at classifying and understanding textual data, and that's why we'll be using them, but they are fairly complex, and there's a lot of stuff that goes into them. In the interest of time, and not knowing everyone's math background, I'm not going to get into the exact details of how they work on a lower level, like I did when I explained our fundamental learning algorithms, which are a bit easier to grasp, or even regular neural networks in general. We're going to skip over most of the math and focus on why this works the way it does, when you should use it, and a few of the different kinds of layers that recurrent neural networks use. Again, if you'd like to learn the math, there will be some sources at the bottom of the guide, and you can also just look up recurrent neural networks and find lots of resources explaining all the fancy math that goes on behind them. Now, the applications we'll be working towards here are, first, sentiment analysis: we're going to take movie reviews and try to determine whether they're positive or negative by performing sentiment analysis on them. If you're unfamiliar with sentiment analysis, we'll talk about it more later, but it essentially means trying to determine how positive or negative a sentence or piece of text is, and you can see why that would be useful for movie reviews. Next, we're going to do character, or text, generation: we'll use a natural language processing model, if you want to call it that, to generate the next character in a sequence of text, and then use that model a bunch of times to generate an entire play.
Now, I know this seems a little bit ridiculous compared to some of the trivial examples we've done before, and it will be quite a bit more code than anything we've really looked at yet, but it's very cool, because we're actually going to train a model to learn how to write a play. That's literally what it's going to do: it will read through a play, I believe Romeo and Juliet, and then when we actually use the model we'll give it a little prompt and say, okay, this is the first part of the play, write the rest of it, and it will go and generate the rest of the characters in the play. We'll see that we can get something pretty good using the techniques we'll talk about. So the first thing I want to do is talk about data. I'm going to hop onto my drawing tablet, and we're going to compare textual data with the numeric data we've seen before, and look at why we're going to have to employ some pretty involved and different steps to turn something like a block of text into meaningful information that our neural network is actually going to be able to understand and process. So let's go ahead and get over to that. Okay, so now we get to the problem of how we can turn textual data into numeric data that we can feed to our neural network. This is a pretty interesting problem, and as we start going through it, you should see why it's interesting and what the difficulties are with the different methods we might pick. The first method I want to talk about is something called bag of words; it's one way we can encode and pre-process text into integers. Now, obviously, I'm not the first person to come up with this; bag of words is a very well-known method of converting textual data to numeric data, although it is pretty flawed and only really works for simple tasks, and we're going to see why in a second. So, essentially, what bag of words says is this: we look at our entire training data set (remember, we're turning our training data into a form the network can understand), and we create a dictionary lookup of the vocabulary. What I mean by that is that every single unique word in our data set forms the vocabulary; those are the words the model is expected to understand, because we're going to show all of those words to the model. Every single one of those words in the vocabulary gets placed in a dictionary, and beside it we store some integer that represents it. So, for example, maybe the vocabulary of our data set contains words like I, a, Tim, day, me, and so on, down the full length of the vocabulary, and every single one of these words is placed in what we'll call our lookup table, or word index table, with a number representing each one. You can imagine that in very large data sets we're going to have tens of thousands, hundreds of thousands, sometimes even millions of different words, all encoded by different integers.
Now, the reason we call this bag of words is that when we look at a sentence, we only keep track of the words that are present and the frequency of those words. What we do is create what we call a bag, and whenever a word appears, we simply add its number into the bag. So if I have a sentence like "I am Tim day day", just a random sentence, then every time we see a word we take its number and throw it into the bag: I is 0, am is 1, Tim is 2, day is 3, and day is 3 again. Notice what's happening here: we're losing the ordering of these words and just keeping track of their frequency. Now, there are lots of different ways to format bag of words, but this is the basic idea, and I'm not going to go too far into it, because we're not actually going to use this technique. Essentially, you lose the order in which words appear, but you keep track of which words appear and how often. This can be very useful when you're doing very simple tasks, where the mere presence of a certain word really influences the type of sentence it is, or the meaning you'll take from it. But when we're looking at more complex input, where different words mean different things depending on where they are in a sentence, this is a pretty flawed way to encode the data. I won't go much further into this; it's not the exact way every bag-of-words implementation works, but I wanted to show you the idea: we encode every single unique word with an integer, and then we don't really care where those words are, we just throw them into a bag. We throw in 3 as many times as the word day appears, we throw in 1 as many times as the word am appears, and so on. Then we feed this bag to our neural network in some form, depending on the network we're using, and it looks at it and says, okay, I have all these different numbers, which means these words are present, and tries to do something with that. Now, I'm going to show you a few examples of where this breaks down, but just understand that this is how it works; this is the first technique, called bag of words, which again we will not be using. So what happens when we have a sentence where the same words convey a very different meaning? I think I have an example on the slides here that I'll go to. Yes, like this. For our bag-of-words technique, consider these two sentences: "I thought the movie was going to be bad, but it was actually amazing" and "I thought the movie was going to be amazing, but it was actually bad." Now, I know you can already see what I'm getting at, but these sentences use exactly the same words, in fact exactly the same number of each word, and yet they have very different meanings.
Yet with our bag-of-words technique, we're going to encode these two sentences with exactly the same representation, because remember, all we care about is the frequency of the words that appear, not where they appear, so we end up losing that meaning from the sentence. "I thought the movie was going to be bad, but it was actually amazing" is encoded and represented by the same thing as the opposite sentence, and that, obviously, is an issue. It's a flaw, and it's one of the reasons bag of words is not very good to use: we lose the context of the words within the sentence and just pick up their frequency and the fact that they exist. So that's the first technique, called bag of words. I've actually written a little function here that does this for us. This isn't exactly how you'd write a real bag-of-words function, but you get the idea: given a text like "this is a test to see if this test will work is test a" with a bunch of repeated words, I print out the bag I get from this function, and you can look at the code if you want to see how it works. Essentially, it tells us that word 1 appears two times, word 2 appears three times, word 3 appears three times, word 4 appears three times, and words 5, 6, and 7 appear once each; that's the information we get from our bag, from that encoding. And if we look up here, this is our vocabulary: "this" maps to 1, "is" maps to 2, "a" maps to 3, and so on; you get the idea. So that's how we would use bag of words; if we did an encoding like this, that's what it does, and that's one way of encoding text.
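Here's a minimal sketch of a bag-of-words encoder along the lines of the little function in the notebook (the variable names are my own; the specific integers depend on the order in which words are first seen):

```python
vocab = {}          # word -> integer lookup, built up as we see new words
word_encoding = 1   # start at 1; 0 is often reserved for padding

def bag_of_words(text):
    """Encode a string as {word_integer: frequency}, ignoring word order."""
    global word_encoding
    words = text.lower().split(" ")   # very crude tokenization on spaces
    bag = {}
    for word in words:
        if word in vocab:
            encoding = vocab[word]
        else:
            vocab[word] = word_encoding
            encoding = word_encoding
            word_encoding += 1
        bag[encoding] = bag.get(encoding, 0) + 1
    return bag

text = "this is a test to see if this test will work is is test a a"
print(bag_of_words(text))  # frequencies only: the word order is gone
print(vocab)               # e.g. {'this': 1, 'is': 2, 'a': 3, 'test': 4, ...}
```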
Now let's go back and talk about another method before we go any further. I'm sure a lot of you were looking at that previous example and saw that what I did was completely throw away the idea of sequence, or the ordering of words: I just threw everything into a bag and said, alright, we'll keep track of the fact that we have three a's or seven Tims, and we'll lose the fact that words come one after another; we lose their ordering in the sentence. And I'm sure a lot of you are thinking, well, why don't we just keep the ordering: encode every single word with an integer and leave it in the position it had in the original string. Okay, good idea. So you're telling me to do something like this: "Tim is here" will be our sentence. Say we encode the word Tim as 0, is as 1, here as 2; then the translation is 0 1 2. And if we have the translation 2 1 0, then even though it uses the exact same words with the exact same representations, it's a different sentence, and our model should be able to tell, because the words come in a different order. Good point, if you made that point. But I'm going to discuss where this falls apart as well, and why we're not going to use this method either. Although it does solve the problem I talked about previously, where we lose the context of a word, there are still a lot of issues with it, and they show up especially when you're dealing with very large vocabularies. Let's take an example where we have a vocabulary of, say, 100,000 words, which means we need 100,000 unique mappings from words to integers. Say our mappings are something like this: 1 maps to the word happy, 2 maps to sad, and the number 100,000 maps to, let's say, good. Now, thinking about classifying sentences as positive or negative, so sentiment analysis, we know as humans that happy and good are pretty similar words in that regard: if we were going to group words, we'd probably put them in the same group and classify them as similar; we could probably interchange them in a sentence and it wouldn't change the meaning a whole lot. But our encoder, whatever we're using to translate our text into integers, has decided that 100,000 is going to represent good and 1 is going to represent happy, and there's an issue with that: when we pass something like 1 or 100,000 to our model, it's going to have a very difficult time determining that 1 and 100,000, although they're 99,999 units apart, are actually very similar words. That's the issue we get into with this approach: the numbers we pick to represent each word matter a great deal, and we don't really have a way of looking at words, grouping them, and saying, okay, all of the happy words go in the range 0 to 100, all of the adjectives go in this other range; we just don't have a way to do that. And it gets even harder for the model with these arbitrary mappings: we have something like 2 sitting right next to 1, yet happy and sad are complete opposites, polar opposites even. A model trying to learn that the difference between 1 and 2 is actually far larger than the difference between 1 and 100,000 is going to have a very difficult time, and even if it manages that, as soon as we throw in a mapping like 99,999 for the word bad, it gets harder again: if a big numeric gap is supposed to mean the words are different, but 99,999 and 100,000 are opposites, the whole system breaks down. That's what I wanted to show: this is where integer encoding falls apart on large vocabularies, and it's why I'm going to introduce another concept called word embeddings. What word embeddings try to do is find a way to represent words that are similar using similar numbers. In fact, what a word embedding actually does, and I'll talk about this more in detail as we go on, is translate every single one of our words into a vector, and that vector has some number of dimensions; usually we use something like 64 or maybe 128 dimensions for each vector.
And every single component of that vector tells us something about which group the word belongs to, or how similar it is to other words. So let me give you an idea of what I mean. We're going to create something called a word embedding. Don't ask exactly why it's called an embedding; I believe it has to do with the fact that the words are embedded as vectors. Let's say we have a 3D space like this. We've already looked at what vectors are, so I'll skip over explaining them. What we're going to do is take some word, say the word good, and instead of picking some integer to represent it, we pick some vector, which means we draw a vector in this 3D space; let's make it red. That vector represents the word good. In this case we'll say our dimensions are x1, x2, and x3, which means every single word in our data set is represented by three coordinates: one vector with three components. Our hope is that by using this word embedding layer, and we'll talk about how it accomplishes this in a second, vectors that represent very similar words end up being very similar. So if we have the vector for good here, we would hope that the vector for happy, from our previous example, points in a similar direction, so the angle between those two vectors, which I'll draw here so we can see it, is small, telling us that these words are similar. And we would hope that a very different word, maybe the word bad, gets a vector pointing in a different direction, so that the large angle between those two vectors tells our model that these are very different words. Now, in practice, does the embedding layer always work out this neatly? Not always, but this is what it's trying to do: pick some vector representation for each word such that similar words end up with vectors pointing in similar directions. That's the best intuitive explanation of a word embedding layer I can give you. Now, how do we actually go from word to vector in a way that's meaningful? Well, this is actually what we call a layer: the word embedding is a layer we add to our model, which means it learns the embeddings for our words. The way it does that is by picking up on the context in which a word appears and determining, based on where a word sits in a sentence, roughly what it means, and then encoding that. I know that's a rough explanation; I don't want to go too far into the math of word embeddings, because I don't want to waste our time or get too complicated if we don't need to, but just understand that our word embeddings are actually trained, that the model learns them as it goes, and that we hope that by the time it has looked at enough training data, it has determined really good ways to represent all of our different words, so that they make sense to the model and to the layers that follow.
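For a concrete picture of that layer, here's a tiny sketch in Keras; the vocabulary size and dimensions are made-up numbers for illustration:

```python
import tensorflow as tf

VOCAB_SIZE = 10000     # assumed vocabulary size
EMBEDDING_DIM = 64     # each word becomes a 64-dimensional trainable vector

embedding_layer = tf.keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM)

# one "sentence" of four integer-encoded words
example = tf.constant([[12, 3, 871, 5]])
vectors = embedding_layer(example)
print(vectors.shape)   # (1, 4, 64): one vector per word, learned during training
```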
And we can use pre-trained word embedding layers if we'd like, just like we used the pre-trained convolutional base in the previous section. We might end up doing that at some point, probably not in this tutorial, but it's something to keep in mind. So that's how word embeddings work, that's how we encode textual data, and it's why it matters so much how we pass information to our neural network: it makes a huge difference. Okay, so now that we've talked about the form we need our data in before it can go any further into the network (before it can get past the embedding layer, before it can be put into any dense neurons, before we can really do any math with it, we need to turn our text into numbers), it's time to talk about recurrent neural networks. Recurrent neural networks are the type of network we use when we process textual data. You don't always have to use them, but they are simply the best choice for natural language processing, and that's why they're treated as their own class of network. Now, the fundamental difference between a recurrent neural network and something like a dense neural network or a convolutional neural network is that it contains an internal loop. What this really means is that the recurrent neural network does not process our entire data at once; it doesn't process the entire training example, or the entire input to the model, in one shot. Instead, it processes it at different time steps, and it maintains what we call an internal memory, an internal state, so that when it looks at a new input, it remembers what it has seen previously and treats that input in the context of the understanding it has already developed. Now, I understand this might not make much sense yet. The dense networks and the other networks we've looked at so far are called feed-forward neural networks. What that means is we give all of our data to the network at once, and that data gets passed from left to right: through the convolutional layers first, maybe then through dense neurons, but every layer gets given all of the information, and that information gets carried through the network to the very end, from left to right. Whereas here, with recurrent neural networks, we actually have a loop, which means we don't feed the entire textual data at once; we feed one word at a time. The network processes that word, generates some output based on it, and uses the internal memory state it's keeping track of as part of that calculation. Essentially, the reason we do this is that it's just like how humans read: when we look at text, we don't take a photo of it and process it all at once, we read it left to right, word by word, and based on the words we've already read we slowly develop an understanding of what we're reading. If I just read the word "now", that doesn't mean much to me; if I just read the word "encode", that doesn't mean much either. Whereas if I read the entire sentence, "now that we've learned a little bit about how we can encode text", I start to develop an understanding of what each next word means based on the words that came before it.
And that's the point: this is what a recurrent neural network does for us. It reads one word at a time and slowly builds up its understanding of the entire text. It works in a somewhat more involved way than that, and we'll draw it out a little, but this is what we'd get if we unravelled a recurrent layer. A recurrent neural network does have a loop in it, but really the recurrent aspect is a layer that implements this recurrent functionality with a loop. Essentially, what we see here is that x is our input and h is our output: x_t is the input at time t, and h_t is the output at time t. If we had a text of, say, length four, so four words we've already encoded as integers, then the first input, at time 0, is the first word going into the network, or rather the first word this layer sees, and the output at that time is our current understanding of the entire text after looking at just that one word. Next, we process input 1, which is the next word in the sentence, but we use the output from the previous computation, the previous iteration, to do it: we process this word in combination with what we've already seen, and produce a new output, which hopefully now captures what those two words mean together. Then we go to the third word, and so on, slowly building up our understanding of what the entire text means, one word at a time. The reason we don't pass the entire sequence at once is that it's very, very difficult to just look at one huge blob of integers and figure out what the whole thing means. If we can do it one by one, understanding the meaning of specific words based on the words that came before them, and learning those patterns, that's a lot easier for a neural network to deal with than passing everything at once and trying to get some output. That's why we have these recurrent layers. There are a few different types of them, and I'm going to go through them, and then we'll talk a little more in depth about how they work. The first one is called long short-term memory, but actually, before we get into that, let's talk about a simple layer first, so we have a reference point. Okay, so this is the example I want to use to illustrate how a recurrent neural network works, in more of a teaching style than what I was doing before. Essentially, the way this works is that this whole thing I'm drawing here, all of these circles, is really one layer, and what I'm doing right now is breaking this layer apart and showing you how it operates as a series of steps. Rather than passing all the information at once, we pass it as a sequence, which means we have all these different words, and we pass them one at a time to this recurrent layer. We start from the left side here, at time step 0; the time step is just the order, so in this case this is the first word.
So let's say we have the sentence "Hi I am Tim". These words have already been turned into their numbers, their vectors; I'm just writing them here so we can see what I mean in natural language. They are the input to this recurrent layer, and the number of little cells I've drawn is the number of words in the sequence we're talking about, so in this case four: four words, four cells. Now, at time step 0, the internal state of this layer is nothing: there is no previous output, we haven't seen anything yet. Which means this first cell, the one I'm drawing right here, only looks at and considers this first word, makes some prediction about it, does something with it: we pass "Hi" into the cell, some math goes on in there, and then it outputs some value that tells us something about the word "Hi", some numeric value. We're not going to worry about exactly what it is, but there will be some output. Now, once the cell has finished processing, so this one is done, its output h0 is there (I'll do a check mark to say it's finished), that output gets fed into the next step; we're keeping track of it. And now we process the next input, which is "I", and we use the output from the previous cell to process it and understand what it means. So now we have some output from the previous cell, from whatever "Hi" gave us; we do some analysis on the word "I"; we combine those things together, and the output of this cell is our understanding of not only the current input but the previous input together with the current one. We're slowly building up our understanding of what this word "I" means, based on the words we saw before it. And that's the point I'm trying to get at: this network uses what it has seen previously to understand the next thing it sees; it builds a context, trying to understand not only the word but what the word means in relation to what came before it. So that output, h1, gets passed into the next cell, and now we have an understanding of what "Hi I" means; we add "am", do some computations, build an understanding of the sentence so far, and get the output h2, which passes to the next cell, and finally we have the final output h3, which hopefully captures what this entire thing means. Now, this works fairly well, and it's called a SimpleRNN layer, which means that all we do is take the output from the previous cell, the previous iteration (because really each of these cells is just one iteration, almost like a for loop over all the words in our sequence), and slowly build up that understanding as we go through the entire sequence.
Now, the only issue with this is that with a very long sequence, say of length 100 or 150, the beginning of the sequence starts to get lost as we go through it, because remember, the output h2 is really a combination of the outputs from h0 and h1 plus the new word we've looked at, and h3 is a combination of everything before it and the next new word. So it becomes increasingly difficult for our model to build a really good understanding of the text when the sequence gets long, because it's hard for it to remember what it saw at the very beginning: that contribution has become so insignificant, with so many outputs tacked on after it, that it's hard to go back and recover it, if that makes sense. Okay, so now I'm going to try to explain the next layer we're going to look at, which is called LSTM. The previous recurrent layer we just looked at was called a SimpleRNN layer, a simple recurrent neural network layer, a simple recurrent layer, whatever you want to call it. Now we're going to talk about the LSTM layer, which stands for long short-term memory; "long" and "short-term" go together. Essentially, it gets a little more complex, although I won't go into the math: we add another component that keeps track of the internal state. Right now, the only thing we were tracking as our internal state, as the memory for this model, was the previous output. So, for example, at time zero there was no previous output, so nothing was kept in the model; at time one, the output from the first cell was what we were storing; and at cell two, the only thing we were storing was the output at time one, so we've already lost the output from time zero. What we add with long short-term memory is the ability to access the output from any previous state at any point in the future, when we want it. What this means is that rather than just keeping track of the previous output, we add all of the outputs we've seen so far onto what I'm going to call a little conveyor belt running along the top up here; I know it's hard to see, but it's what I'm highlighting. It's almost like a lookup table that can tell us the output at any previous cell we want. We can add things to this conveyor belt, pull things off, and look at them. It adds a little complexity to the model, but it allows us not just to remember the last state, but to look anywhere at any point in time, which can be useful. Now, I don't want to go into much more depth about exactly how this works, but think about the idea that as the sequence gets very long, it's easy to forget the things we saw at the beginning. If we can keep track of some of those early things, and some of the things in between, on this little conveyor belt, and access them whenever we want, that makes this a much more useful layer: we can look at the first sentence and the last sentence of a big piece of text at any point we want and say, okay, this tells us something about the meaning of the text. So that's what the LSTM does.
Again, I don't want to go too far; we've already spent a lot of time covering recurrent layers and how all of this works. If you do want to look up some good mathematical definitions, I'll source everything at the bottom of this document, so you can go there. But that's the LSTM, long short-term memory, and that's what we're going to use for some of our examples, although SimpleRNN does work fairly well for shorter sequences. And again, remember we're treating our text as a sequence: we're going to feed each word into the recurrent layer, and it's going to slowly develop an understanding as it reads through and processes each word. Okay, so now we're on to our first example, where we're going to be performing sentiment analysis on movie reviews to determine whether they are positive or negative. We already know what sentiment means; it's essentially what I just described, picking up whether a block of text is positive or negative. For this example we're going to be using the movie review dataset. As per usual, this is based off of a TensorFlow tutorial/guide; I found that one a bit confusing to follow on the TensorFlow website, but you can obviously follow along with it if you prefer that version over mine. Anyway, the movie review dataset comes straight from Keras and contains 25,000 reviews which are already preprocessed and labeled. What that means for us is that every single word is already encoded by an integer, and in fact they've used a clever encoding system: the integer that encodes a word represents how common that word is in the entire dataset. So if a word was encoded by the integer 3, that would mean it is the third most common word in the dataset. In this specific dataset we have a vocabulary size of 88,584 unique words, which means a word encoded as 88,584 would be the least common word in the dataset. So keep that in mind. We're going to load in the dataset and do our imports just by hitting run here, and as I've mentioned previously, I'm not going to type this stuff out; it's a waste of time, I don't have all the syntax memorized, and I'd never expect you to memorize it either. But what I will do is walk through the code step by step and make sure you understand why we have what we have here. Okay, so we've defined the vocabulary size, the max length of a review, and the batch size, and we've loaded in our dataset by passing the vocabulary size, which is just the number of words to include, in this case all of them. That gives us train data, train labels, test data, and test labels, and we can look at a review to see what it looks like. This is an example of our first review; we can see the different encodings for all of these words, already in integer form. One thing to note here is that the lengths of our reviews are not all the same. So if I take the len of the training data, I guess I wouldn't say unique, but they're just all different: the len of train_data[0] is different from the len of train_data[1].
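For reference, here's a sketch of what that loading step likely looks like, using the standard Keras IMDB dataset; the exact constant names are my assumption:

```python
from tensorflow.keras.datasets import imdb

VOCAB_SIZE = 88584   # number of unique words in the dataset
MAXLEN = 250         # we'll pad or trim every review to this length later
BATCH_SIZE = 64

# Words are already integer-encoded by frequency rank (3 = third most common word).
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=VOCAB_SIZE)

print(train_data[0])                              # one review as a list of integers
print(len(train_data[0]), len(train_data[1]))     # review lengths differ from one another
```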
So that's something to consider as we go through this, and something we're actually going to have to handle. Okay, so, more preprocessing. This is what I was talking about: if you have a look at our loaded reviews, you'll notice they're of different lengths. That's an issue; we cannot pass differently sized data into our neural network, so we must make each review the same length. What we're going to do is pad our sequences, following the steps I've described here: if a review is greater than 250 words, we trim off the extra words; if a review is less than 250 words, we add the necessary number of zeros (this should actually say zeros in here, let's fix that) to make it equal to 250. So essentially we're adding padding to a review. In this case, I believe we're padding the left side, which means that if we have a review of length 200, we add 50 blank words, represented by the index zero, to the left side of the review to make it the necessary length. So that's what we'll do. If we look at train data and test data, we're just using something from Keras which we imported above: from keras.preprocessing we import sequence, because again, we're treating our text data as a sequence, as we've talked about, and we say sequence.pad_sequences on the train data and then define the length we want to pad it to. That performs the steps we've just talked about, and again, we just assign test data and train data to whatever this returns; we pass the entire thing and it pads all of them for us at once. Okay, so let's run that, and then let's have a look at, say, train_data[1] now, because remember, that one was something like length 189. If we look at train_data[1], we can see it's an array with a bunch of zeros at the front, because that's the padding we've added to make it the correct length. Okay, so that's padding; that's something we'll probably have to do most of the time when we feed text to our neural networks. Alright, so the next step is to actually create the model. This model is pretty straightforward: we have an embedding layer, an LSTM, and a dense layer at the end. The reason we use a dense layer with a sigmoid activation function at the end is because we're trying to predict the sentiment of the review, which means that if the output is between zero and one, then a number greater than 0.5 can be classified as a positive review, and a number less than or equal to 0.5 (or wherever you want to set the bound) can be classified as a negative review. Sigmoid, as you might recall, squishes our values between zero and one, so whatever the value is at the end of the network will be between zero and one, which means we can make that prediction.
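The padding step being described is likely just a call to pad_sequences, which pads with zeros on the left by default; a minimal sketch, reusing the names from the loading snippet above:

```python
from tensorflow.keras.preprocessing import sequence

# Pad (with zeros, on the left by default) or trim every review to MAXLEN words.
train_data = sequence.pad_sequences(train_data, maxlen=MAXLEN)
test_data = sequence.pad_sequences(test_data, maxlen=MAXLEN)

print(train_data[1])   # the leading zeros are the padding
```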
Now, the reason we have the embedding layer, even though we've already preprocessed our reviews, is this: even though we've preprocessed the text into integers, and those integers are a bit more meaningful than the random lookup table we talked about before, we still want to pass them to an embedding layer, which will find a much more meaningful representation for those numbers than just their integer values. It's going to create those word vectors for us, and the 32 denotes that the output of every one of our embeddings, the vectors that are created, will have 32 dimensions. That means when we pass them to the LSTM layer, we tell the LSTM layer it's going to receive 32 dimensions for every single word, which is what we're doing. The LSTM implements that long short-term memory process we talked about before, and passes its final output to tf.keras.layers.Dense, which makes the prediction. So that's the model. If we give it a second to run and look at the model summary that's printed out, we can see that the embedding layer actually has the largest number of parameters, because essentially it's trying to figure out how to convert all these different numbers into a tensor of 32 dimensions, which is not that easy to do, and that's the major aspect being trained. Then we have our LSTM layer, and you can see its parameters there, and our final dense layer, which gets 33 parameters: that's the 32 outputs from those dimensions plus the one bias node we need. You can see in the model summary that we get a sequential model. Okay, so training. Alright, so now it's time to compile and train the model; you can see I've already trained mine. One thing I'll say here: if you want to speed up your training, because this will actually take a while (and we'll talk about why we pick these settings in a minute), go to Runtime, Change runtime type, and add a hardware accelerator of GPU. That lets you use a GPU while training, which should speed things up by about 10 to 20 times. I probably should have mentioned that beforehand, but please do that for these examples. So, model.compile. We're picking binary cross-entropy as the loss function, because it essentially tells us how far away we are from the correct probability: we only have two things we could be predicting, zero or one, positive or negative, so this gives us an appropriate loss for that kind of problem, which we've talked about before. For the optimizer we're going to use RMSprop; again, I'm not going to discuss all the different optimizers, you can look them up if you care that much about what they do, and for metrics we'll use 'acc'. One thing I will say is that the optimizer isn't crucial for this one; you could use Adam if you wanted and it would still work fine. My usual go-to is the Adam optimizer unless I think there's a better one to use. Anyway, that's something to mention. Okay, so finally we fit the model; we've seen this syntax plenty of times before.
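Putting those pieces together, the model and compile step described above likely look roughly like this (layer sizes taken from the walkthrough; exact variable names are my assumption):

```python
import tensorflow as tf

# Embedding -> LSTM -> sigmoid output, as described above.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 32),         # 32-dimensional word vectors
    tf.keras.layers.LSTM(32),                           # long short-term memory layer
    tf.keras.layers.Dense(1, activation='sigmoid')      # 0..1 sentiment score (32 weights + 1 bias = 33 params)
])
model.summary()

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['acc'])
```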
So model.fit: we give it the training data, the training labels, the number of epochs, and a validation split of 20%, which is what the 0.2 stands for. That means we're going to use 20% of the training data to evaluate and validate the model as we go through training. We can see that after training, which I've already done and you're welcome to do on your own machine, we stall out at a validation accuracy of about 88%, whereas the model actually overfits the training data to about 97-98%. Essentially, this is telling us that we don't have enough training data, that even after just one epoch we're pretty much stuck at the same validation accuracy, and that something would need to change in the model to make it better. But for now that's fine; we'll leave it the way it is. Okay, so now we can look at the results. I've already run the evaluation here, again just to save some time, but we evaluate on our test data and test labels to get a more accurate result, and that tells us we have an accuracy of about 85.5%, which isn't great but is decent, considering we didn't really write that much code to get where we are right now. Okay, so the model's been trained; again, it's not too complicated, and now we're on to making predictions. The idea is that now that we've trained our model, we want to actually use it to make a prediction on some movie review. Since our data was preprocessed when we gave it to the model, that means we need to process anything we want to predict on in exactly the same way: we need to use the same lookup table and encode it precisely the same. Otherwise, when we give it to the model, it's going to think the words are different, and it won't make an accurate prediction. So what I've done here is write a function that will encode any text into the properly preprocessed integers, just like our training data was preprocessed; that's what this function is going to do for us, preprocess some line of text. To do that I've gotten the lookup table, the word-to-integer mappings, from IMDB, from the dataset we loaded earlier. Let me see where I defined that; you can see up here, from keras.datasets we import imdb. Just like we loaded the data, we can also get all of the word indices in that mapping, and we can print it out if we want to see what it is. Anyway, we have that mapping, which means all we need to do is call keras.preprocessing.text.text_to_word_sequence, which, given some text, converts it into what we call tokens, which are just the individual words themselves. Then we use a little comprehension inside the function that says: word_index[word] if word is in word_index, else 0, for each word in the tokens. What this means is that if a token is in our mapping, that vocabulary of roughly 88,000 words, we replace its location in the list with the integer that represents it; otherwise we put zero, to stand for "we don't know what this word is". Then we return sequence.pad_sequences on that token sequence.
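The fit and evaluate calls likely look something like the following; the exact epoch count isn't stated in the walkthrough, so the value here is my assumption:

```python
# Hold back 20% of the training data for validation, then evaluate on the test set.
history = model.fit(train_data, train_labels, epochs=10, validation_split=0.2)

results = model.evaluate(test_data, test_labels)
print(results)   # [loss, accuracy] -- roughly 85% accuracy in the walkthrough above
```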
And we just return the first index of the result. The reason we do that is because pad_sequences works on a list of sequences, so multiple sequences; we need to put our single sequence inside a list, which means it returns a list of lists, and we obviously just want the first entry, the one sequence we padded. So that's how this works; sorry, it's a bit of a mouthful to explain, but you can run through and print this stuff out if you want to see how all of it works specifically. So we can run this cell and look at what it does for some sample text: "that movie was just amazing, so amazing". We can see we get the output we were expecting: the integer-encoded words down here, and then a bunch of zeros for all the padding. Now, while we're at it, I figured why not make a decode function, so that if we have any movie review like this in integer form, we can decode it back into its text value. The way we do that is to start by reversing the word index we just used. The reason is that the word index we looked at goes from word to integer, but we now want to go from integer to word so we can actually translate a sentence. So I've made this decode_integers function: we set the padding key to 0, which means that if we see a zero, it just means nothing is there. We create a text string that we'll add to, and we say: for num in integers, where integers is our input, a list (or array, whatever you want to call it) that looks something like this; if the number is not the padding value, so not zero, we add the lookup of reverse_word_index[num], whatever word that number represents, onto this string, plus a space, and then we return text[:-1], which means everything except the last space we would have added. And then if I print decode_integers of the encoded thing we had before, which looks like this, we can see it gets decoded back into the string "that movie was just amazing so amazing"; I said encoded there, but I mean decoded, since that was the encoded form. So that's how that works. Okay, so now it's time to actually make a prediction. I've written a function here that will make a prediction on some piece of text as the movie review, and I'll walk quickly through how it works, then show the actual output from our model making predictions like this. We take some parameter text, which is our movie review, and we encode that text using the encode_text function we created above, the one that takes our sequence of words, does the preprocessing (so turns it into a sequence, strips the spaces and whatnot, gets the words), and turns them into integers, and returns that. So here we have our properly preprocessed text. Then we create a blank NumPy array of zeros with shape (1, 250). The reason it's that shape is because the shape our model expects is (some number of entries, 250): some number of entries, each with 250 integers representing the words, because 250 is the review length we've told the model to expect.
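Based on that description, the encode and decode helpers probably look roughly like this; the function names, and the use of imdb.get_word_index as the lookup table, are my reading of the walkthrough rather than the notebook itself:

```python
from tensorflow.keras.preprocessing.text import text_to_word_sequence

word_index = imdb.get_word_index()   # word -> integer mapping for the dataset

def encode_text(text):
    tokens = text_to_word_sequence(text)                        # split into lowercase word tokens
    tokens = [word_index[word] if word in word_index else 0 for word in tokens]
    return sequence.pad_sequences([tokens], maxlen=MAXLEN)[0]   # pad one sequence, return just it

reverse_word_index = {value: key for (key, value) in word_index.items()}

def decode_integers(integers):
    PAD = 0
    text = ""
    for num in integers:
        if num != PAD:                         # skip the padding zeros
            text += reverse_word_index[num] + " "
    return text[:-1]                           # drop the trailing space

encoded = encode_text("that movie was just amazing, so amazing")
print(encoded)
print(decode_integers(encoded))
```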
So that's the length of the review. Then we set pred[0], which is what's up here, equal to the encoded text, so we essentially insert our one entry into this array we've created. Then we call model.predict on that array and just return and print result[0]. That's pretty much all there is to it; that's how it works. The reason we take result[0] is because, again, the model is set up to predict on multiple things, so it's as if I had passed it a list of encoded texts, which is kind of what I've done with this prediction line here; that means it returns an array of arrays, so if I want the first prediction I need to index zero, which gives me the prediction for our first and only entry. Alright, I hope that makes sense. Now I have a positive review and a negative review written in, and we're just going to compare the analysis on both of them. "That movie was so awesome, I really loved it and would watch it again because it was amazingly great", and then "that movie sucked, I hated it and wouldn't watch it again; it was one of the worst things I've ever watched". So let's look at this. We can see the first one gets predicted at about 72% positive, whereas the other one is about 23% positive. Essentially, the lower the number, the more negative we predict it is, and the higher the number, the more positive. If we didn't want to just print out this value, and instead wanted to print "positive" or "negative", we could add a little if statement that says: if this number is greater than 0.5, say positive, otherwise say negative. And I just want to show you that changing these reviews ever so slightly actually makes a big difference. If I remove the word "awesome", so "that movie was so", and I run this, you can see that, oh wow, this actually increases and goes up to 84%. So the presence of certain words in certain locations really does make a difference, especially when we have a shorter review; with a longer review it won't make as big a difference, but even removing a few words here, removing the word "awesome", changed it by almost 10%. Now, if I remove "so", let's see if that makes a bigger difference: it makes very little difference, because the model has learned that the word "so" doesn't really have a big impact on the type of review. Whereas if I remove the word "I", let's see if that makes a big impact: probably not, right, it goes back up to about 84%. So that's cool, and that's something to play with: removing certain words and seeing how much impact they actually carry. Even if I just add the word "great", like "would great watch it again", just in the middle of the sentence, it doesn't have to make any sense, we can see that we increase a little bit. And if I add "this movie really sucked", let's see if that makes a difference: no, it just reduces it a tiny bit. So, pretty cool, something to play with. Now let's move on to the next example. So now we're on to our last and final example, which is going to be creating a recurrent neural network play generator. This is going to be the first neural network we've built that's actually going to be creating something for us.
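A sketch of the prediction helper being described; the model expects a batch, so the single encoded review is wrapped in a (1, 250) array (the function name is my assumption):

```python
import numpy as np

def predict(text):
    encoded_text = encode_text(text)
    pred = np.zeros((1, MAXLEN))      # one entry of 250 integers
    pred[0] = encoded_text
    result = model.predict(pred)
    print(result[0])                  # value near 1 = positive, near 0 = negative

predict("That movie was so awesome! I really loved it and would watch it again because it was amazingly great")
predict("That movie sucked. I hated it and wouldn't watch it again. Was one of the worst things I've ever watched")
```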
But essentially, what we're going to do is make a model that's capable of predicting the next character in a sequence. We're going to give it some sequence as an input, and it will simply predict the most likely next character. Now, there's quite a bit that goes into this, but the way we're going to use it to generate a play is this: we'll train the model on a bunch of sequences of text from the play Romeo and Juliet, and then we'll give the model some starting prompt, some string to start with. That will be the first thing we pass to it; it will predict the most likely next character for that sequence, and then we'll take the output from the model and feed it back in as the input, and keep predicting the next character from the previous output as many times as we want, to generate an entire play. So we'll have a neural network that's capable of predicting one letter at a time actually end up generating an entire play for us, by running it multiple times on the previous output from the last iteration. That's the problem we're trying to solve, so let's get into it and talk about what's involved. The first thing we do, obviously, is our imports: from keras.preprocessing import sequence, import keras, and we need TensorFlow, NumPy, and os. We'll load that in, and now we're going to download the file, the dataset for Romeo and Juliet, which we can get using this line here. Keras has this utils function that allows us to get a file and save it as whatever we want; in this case we're saving it as shakespeare.txt, and we're getting it from this link. I believe it's just some shared storage we have access to through Keras. So we load that in, and this simply gives us the path on this machine (remember, this is Google Colaboratory) to the text file. Now, if you want, you can actually load in your own text data; we don't necessarily need to use the Shakespeare play, we can use anything we want. In fact, an example I'll show later uses the Bee Movie script. The way you do that is to run this block of code here, and you'll see it pops up a "choose files" prompt: just choose a file from your local computer, and it will save it on Google Colaboratory, and then that will allow you to use it. Make sure it's a .txt file you're loading, but otherwise that should work, and from there you'll be good to go. You don't need to do that, though; you can just run this block of code here to load shakespeare.txt, or you can load in your own file. After we do that, we want to actually open the file; remember, we were just saving the path to it. We open that file in 'rb' mode, which is read-bytes mode, I believe, then we call .read() to read it in as one entire string and decode it into UTF-8 format, and then we just print the length of the text, the number of characters in it. If we do that, we can see the length of the text is approximately 1.1 million characters, and we can have a look at the first 250 characters by doing this.
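The download-and-read step likely looks like this; the URL shown is the one the standard TensorFlow text-generation tutorial uses, which is my assumption about the link being referenced:

```python
import os
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing import sequence

# Download the Shakespeare text file and get its path on the Colab machine.
path_to_file = tf.keras.utils.get_file(
    'shakespeare.txt',
    'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

# Read the whole file as one string.
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
print('Length of text: {} characters'.format(len(text)))
print(text[:250])   # peek at the first 250 characters
```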
So we can see this is roughly what the play looks like: whoever is speaking, a colon, then a line, whoever is speaking, a colon, another line, and there are all these line breaks, the backslash-n characters, which tell us to go to the next line. That's going to be important, because we're hoping our neural network will be able to predict things like line breaks and spaces, and even this format, as we train it more and get further in. But now it's time to talk about encoding. Obviously, all of this text is in text form; it's not preprocessed for us, which means we need to preprocess it and encode it as integers before we can move forward. Now, fortunately for us, this problem is actually a little easier than the word-encoding problem we discussed earlier, because what we're going to do is simply encode each character in the text with an integer. You can imagine why this makes things easier: there really is a finite set of characters, whereas there's an essentially infinite number of words that could be created. We're not really going to run into the problem where two characters are encoded with such different integers that it's difficult for the model to understand, because, I mean, we can look at what the vocabulary is here; we're only going to have so many characters in the text, and for characters it just doesn't matter as much, because an "r" isn't hugely more meaningful than an "a". So we can encode in a simple format, which is what we're going to do. Essentially, we need to figure out how many unique characters are in our vocabulary. To do that, we say vocab = sorted(set(text)), which gives us the sorted set of unique characters in the text. Then we create a mapping from unique characters to indices: essentially we say {u: i for i, u in enumerate(vocab)}. What this gives us is 0 mapped to one character, 1 mapped to the next, 2 to the next, and so on, for every single character in our vocabulary, which allows us to create this mapping. Then we just turn that vocabulary into an array, so we can use the index at which a character appears as the reverse mapping, going from index to character rather than character to index, which is what the first one does. Next, I've written a function that takes some text and converts it to its integer representation, just to make things a bit easier for us later in the tutorial. We just build an np.array, converting every single character in the text into its integer representation by looking it up in the mapping and putting that in a list here, and then obviously converting that to a NumPy array. Then, if we want to see how this works, we can say text_as_int = text_to_int(text); remember, text is that entire loaded file from above, so we're converting the whole thing to its integer representation using this function, and we can look at how that works down here. We can see that the text "First Citizen", the first 13 characters, is encoded by the corresponding sequence of integers, and obviously each character has its own encoding.
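Written out, those mappings and the text_to_int helper being described probably look like this:

```python
# Build the character vocabulary and the two mappings described above.
vocab = sorted(set(text))                       # every unique character in the play

char2idx = {u: i for i, u in enumerate(vocab)}  # character -> integer
idx2char = np.array(vocab)                      # integer -> character (by array index)

def text_to_int(text):
    return np.array([char2idx[c] for c in text])

text_as_int = text_to_int(text)
print(text[:13])          # "First Citizen"
print(text_as_int[:13])   # the integer encoding of those 13 characters
```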
And you can go through and figure out which integers map to which characters based on the ones that are repeated. So that's how that works. Now, I figured while we're at it, we might as well write a function that goes the other way, int_to_text. The reason I'm converting the input to a NumPy array first is because we're potentially going to be passing in different kinds of objects here; if it's not already a NumPy array, it needs to become one, which is what this check is doing, and otherwise we skip that step. Then we can just join all of the characters from this list together; that's essentially what it's doing, joining them back into text. And we can see that if we call int_to_text(text_as_int[:13]), it translates that back to "First Citizen". You can look more into this function if you want, but it's not that complicated. Okay, so now that we have all this text encoded as integers, what we need to do is create some training examples. It's not really feasible to just pass the entire 1.1 million characters to our model at once for training; we need to split that up into something meaningful. What we're actually going to do is create training examples where the training input, the input value, is some sequence of some length (we'll pick a sequence length of 100, in this case), and the output, the expected output, so I guess the label for that training example, is the exact same sequence shifted right by one character. I put a good example here: our input might be "hell", and our output would be "ello". So what it's going to do is predict this last character, essentially, and these are what our training samples are going to look like: the input sequence, and then the output sequence, which is that input sequence minus the first letter, with the next letter tacked on at the end, so that we can look at some input sequence and predict the output sequence, the same thing shifted by one character. Okay, so that's how that works. Now we define a sequence length of 100, and we say the number of examples per epoch is the length of the text divided by the sequence length plus one. The reason we do this is that for every training example we need an input sequence that's 100 characters long and an output sequence that's 100 characters long, which means we need 101 characters for every training example; hopefully that makes sense. What this next line here does is convert our entire string dataset into characters: it gives us a stream of characters, which means it essentially contains the 1.1 million characters inside this tf.data Dataset object created from tensor slices. That's what that's doing. Next, let's run this and make sure it works. Then we say sequences = char_dataset.batch(sequence length plus one), which is the length of each chunk, so in this case 101, and drop_remainder means: say we have 105 characters in our text; since we need sequences of length 101, we'll just drop the last four characters of our text, because we can't even fit those into a full chunk.
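The dataset construction being described is likely just these few lines:

```python
# Turn the ~1.1M-character integer array into chunks of 101 characters each
# (100 input characters + the 1 extra character needed for the shifted target).
seq_length = 100
examples_per_epoch = len(text) // (seq_length + 1)

char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)          # a stream of single characters
sequences = char_dataset.batch(seq_length + 1, drop_remainder=True)     # chunks of 101, leftovers dropped
```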
So that's what this does for us: it takes the entire character dataset we've created, batches it into chunks of length 101, and drops the remainder; that's what sequences gives us. Now, split_input_target: what this does, essentially, is create those training examples we needed, taking these sequences of length 101 and converting them into the input text and the target text, and I'll show you how they look in a second. We convert the sequences by mapping them to this function: if we say sequences.map and pass this function, every single sequence has this operation applied to it, and the result is stored inside this dataset object, or I guess you'd say that's the variable it's stored in. If we want to look at an example of how this works, we can see that the input is "First Citizen: Before we proceed any further, hear me speak. All: Speak, speak. First Citizen: You", and in the output, notice the first character is gone, it starts at "irst", and the last character is actually a space here, whereas the input didn't have that space. There's no space at the end here, and there is a space there; that's what I'm trying to highlight. In the next example, we get "are all resolved rather to die than to famish", and so on, and you can see that here we omit the first character and the next letter, a "k", is added at the end. So that's how that works. Okay, so next we need to make training batches. We say the batch size is 64; the vocabulary size is the length of the vocabulary, which, if you remember all the way back up at the top of the code, was the sorted set of the text, which essentially tells us how many unique characters there are; the embedding dimension is 256; the RNN units are 1024; and the buffer size is 10,000. What we do now is create a dataset that shuffles: we're going to switch around all these sequences so they don't get shown in the same order, which we don't want, and then we batch them by the batch size. If we haven't gone over what batching does before, you can read these comments, which are straight from the TensorFlow documentation. We want to feed our model 64 batches of data at a time, so we shuffle all the data, batch it into that size, and again drop the remainder if there aren't enough sequences to fill a batch, which is what we'll do. We've also defined the embedding dimension, which is essentially how big we want every vector representing our characters to be in the embedding layer, and then the RNN units; I won't really discuss what that is right now, I'm just going to omit describing it, because I don't want to butcher the explanation and it's not that important. Anyway, okay, so now we're going to go down to building the model. We've set these parameters up here, so remember what they are; we've batched and we've shuffled the dataset, and again, you can print it out if you want to see what a batch actually looks like, but essentially it's just 64 entries of those sequences, so 64 different training examples per batch.
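A sketch of that split, shuffle, and batch step, with the parameters quoted in the walkthrough (the name "data" for the final batched dataset is my own, to keep it distinct from the mapped dataset):

```python
# Split each 101-character chunk into (input, target), where the target is the
# same sequence shifted one character to the right, e.g. "hell" -> "ello".
def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)

BATCH_SIZE = 64
VOCAB_SIZE = len(vocab)     # number of unique characters
EMBEDDING_DIM = 256
RNN_UNITS = 1024
BUFFER_SIZE = 10000

# Shuffle the training examples and group them into batches of 64.
data = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
```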
Alright, so now we go down here and write build_model; we're actually making a function that returns a built model. The reason for this is that right now, we're going to pass the model batches of size 64 for training, but later we're going to save this model and then pass it batches of one piece of data, so we can make a prediction on just one input. Because right now, what it's going to do is take a batch size of 64, take 64 training examples and return 64 outputs; that's what the model will be built to do the way we build it to start. But later on, we're going to rebuild the model using the same parameters and weights that we've saved and trained, but change it to a batch size of one, so that we can get one prediction for one input sequence. That's why I'm creating this build_model function. Now, in here, the vocabulary size is the first argument, the embedding dimension (which, remember, was 256) is the second, and these are the parameters from up above. Then we define the batch_input_shape as [batch_size, None]. The None means we don't know how long the sequences in each batch will be; all we know is that we're going to have 64 entries in each batch, and of those 64 entries, those training examples, we don't know how long each one will be. In our case we're going to use ones of length 100, but when we actually use the model to make predictions, we don't know how long the input sequence will be, so we leave this as None. Next, we'll make an LSTM layer, long short-term memory, with the RNN units, which is 1024 (which, again, I don't really want to explain, but you can look it up if you want). return_sequences=True means return the intermediate output at every step. The reason we're doing this is because we want to look at what the model produces at the intermediate steps, not just the final stage. If you leave this as False and don't set it to True, the LSTM just returns one output, telling us what the model found at the very last time step, but we actually want the output at every single time step for this specific model, and that's why we set it to True. stateful, I'm not going to talk about right now; that's something you can look up if you want. And recurrent_initializer is just what these values start at in the LSTM; we're picking this one because it's what TensorFlow has said is a good default, and I won't go into more depth about that, again something you can look up more if you want. Finally, we have a dense layer, which contains vocabulary-size nodes. The reason is that we want the final layer to have as many nodes as there are characters in the vocabulary; that way, every one of those nodes gives a score for how likely it is that that character comes next, and together they give us a distribution over which character comes next once we convert them to probabilities. That's going to allow us to look at that last layer as a predictive layer; we've discussed how that works with previous neural networks. So let's run this now. "NameError: embedding_dim is not defined", which I believe means I have not run the cell above yet.
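A sketch of the build_model function being walked through, following the standard TensorFlow text-generation tutorial this section says it is based on:

```python
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                  batch_input_shape=[batch_size, None]),  # None = unknown sequence length
        tf.keras.layers.LSTM(rnn_units,
                             return_sequences=True,        # output at every time step, not just the last
                             stateful=True,
                             recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size)                   # one score per character in the vocabulary
    ])
    return model

model = build_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, BATCH_SIZE)
model.summary()
```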
So now we run that, and we should be good. If we look at the model summary, we can see we have our initial embedding layer, we have our LSTM, and then we have our dense layer at the end. Notice 64 is the batch size, that's the initial shape; None is the length of the sequence, which we don't know; and then this last number is the output dimension, or rather the number of values in each vector: we start with 256, we have 1024 units in the LSTM, and then 65, which is the number of nodes, because that's the length of the vocabulary. Combined, that's how many trainable parameters we get; you can see the count for each layer. And now it's time to move on to the next section. Okay, so now we're moving on to the next step of the tutorial, which is creating a loss function to compile our model with. I'll talk about why we need to do this in a second, but first I want to explore the output shape of our model. Remember, the input to our model is something of length 64, because we're going to have batches of 64 training examples: every time we feed our model, we give it 64 training examples. Those training examples are sequences of length 100; that's what I want you to remember. We're passing 64 entries, all of length 100, into the model as its training data. But when we make predictions with the model later on, we'll be passing it just one entry of some variable length, and that's why we've created this build_model function: so we can rebuild the model, using the parameters we've saved once we've trained it, to expect a different input shape, because when we're training it, it's given a different shape than when we're actually using it. Now, what I want to do is explore the output of this model at the current point in time. We've created a model that accepts a batch of 64 training examples of length 100, so let's just look at what the output is from the final layer. Give this a second to run: we get (64, 100, 65), and that represents the batch size, the sequence length, and the vocabulary size. The reason for this is that we have to remember that when we create a dense layer as our last layer with 65 nodes, every prediction will contain 65 numbers, one per character in the vocabulary, so obviously our last dimension is 65 for the vocabulary size, the middle one is the sequence length, and the first is the batch. I just want to make sure this is really clear before we keep going, otherwise this can get very confusing very quickly. So what I want to do now is look at the length of the example batch predictions, and just print them out and look at what they actually are. example_batch_predictions is what happens when I use my model, while it's not trained, on some example input; I actually pulled the first batch from my dataset. I can use my model before it's trained, with random weights, biases and parameters, by simply calling model with brackets like this and passing in some example I want a prediction for. So that's what I'm going to do: I'm going to give it the first batch (and it even shows me the shape of that batch, (64, 100)) and pass that to the model.
And it's going to give us a prediction for that, and in fact, it's actually going to give us a prediction for every single element in the batch; every single training example in the batch gets a prediction. So let's look at what those predictions are. This is what we get: a length-64 tensor, and inside of it we get a list inside of a list, an array inside of an array, with all these different predictions. We'll stop there for explaining this part, but you can see we're getting 64 different predictions, because there are 64 elements in the batch. Now, let's look at one prediction, say the very first prediction, for the first element in the batch. We see now that we get a length-100 tensor, and this is what it looks like: there's still another layer inside, and in fact we can see another nested array inside this array. The reason for this is that at every single time step, meaning at every position in the sequence (because remember, a recurrent neural network feeds in one element of the sequence at a time, and in this case our sequences are length 100), we're actually saving that output as a prediction and passing it back. So we can see that for one training example (not one batch, one training example) we get 100 outputs, and those outputs have some shape we'll talk about in a second. That's something to remember: for every single training example, we get as many outputs as the length of that training example, because that's the way this model works. And then finally, let's look at the prediction at just the very first time step. This is 100 different time steps, so let's look at the first one and see what that prediction is. We can see that now we get a tensor of length 65, and this is telling us the likelihood of every single character occurring next at the first time step. So that's what I wanted to walk through: showing you what's actually output from the model, the current way it works. And that's why we need to make our own loss function to determine how well our model is performing: we need a loss that can look at this three-dimensional nested array of scores over the vocabulary, compare it to the true next characters, and tell us how different the two things are, so we're going to define our own. Now, if we want to determine the predicted character from this array, what we can do, with this call, is sample the categorical distribution, and that will tell us the predicted character. What I mean is, let's just look at this, and then we'll explain it. Since our model is working on random weights and biases right now (we haven't trained it yet), these are all the predicted characters it came up with: at the first time step it predicted "h", then a hyphen, then "h", then "g", then "u", and so on, you get the point. What we're doing to get this value is sampling the prediction. So, this is just the first time step, actually, no, sorry, we're sampling every time step, my bad.
We're going to say sampled_indices = np.reshape(...), which is just changing the shape of it, and then predicted_chars = int_to_text(sampled_indices). It's a bit hard to explain all of this if you don't have a little bit of a statistics background, in terms of why we're sampling and not just taking the argmax of this array. You'd think we'd just take the character with the highest probability, and that would be the index of the next predicted character, but there are some issues with doing that for generation: if we do, we tend to get stuck in a loop where we just keep accepting the most likely character. So what we'll do instead is pick a character based on this probability distribution. It's called sampling the distribution, and you can look that up if you don't know what it means, but sampling is just picking a character according to the probabilities; it doesn't guarantee that the character with the highest probability gets picked, it just uses those probabilities to pick one. I hope that makes sense; I know that was a rambly definition, but it's the best I can do. So here, we reshape the array and convert all the integers back to characters to see the actual predicted characters; that's what these two lines are doing, and then I'm just showing the predicted characters here. The character here is what was predicted at time step zero to be the next character, and so on. Okay, so now we can create a loss function that actually handles this for us. This is the loss function we'll use; Keras has a built-in one we can utilize, which is what we're doing. It takes all the labels and all of the predicted distributions, the logits, and computes a loss on them, how different or how similar those two things are. Remember, the goal of our algorithm and the neural network is to reduce the loss. Okay, so next we compile the model, which we do here, with the Adam optimizer and the loss function we just defined. Then we set up some checkpoints; I'm not going to talk about how those work, you can read through this if you want. And then we train the model. Remember to turn on your GPU hardware accelerator under Runtime, Change runtime type, GPU, because if you do not, this will be very slow. Once you do that, you can train the model; I've already trained mine, but if we go through this training, we can see it's going to say it will train for 172 steps and take about 30 seconds per epoch, maybe a little less. The more epochs you run this for, the better it will get; this is a different situation, we're not likely to overfit here, so we could run this for, say, 100 epochs if we wanted to. For our case, let's actually start by training it on just two epochs to see how it does, and then we'll train it on something like 10, 20, 40, 50 and compare the results. You'll notice that the more epochs, the better it gets, but for our case we'll start with two and work our way up.
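A sketch of the sampling, loss, and checkpoint setup described above, again patterned on the standard TensorFlow text-generation tutorial; the checkpoint paths and the two-epoch fit are taken from the walkthrough, while the variable names are my assumption:

```python
# Grab one (input, target) batch and run the untrained model on it,
# as in the output-shape exploration above.
for input_example_batch, target_example_batch in data.take(1):
    example_batch_predictions = model(input_example_batch)

# Sample one character index per time step from the first example's predictions.
sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
sampled_indices = np.reshape(sampled_indices, (1, -1))[0]
predicted_chars = int_to_text(sampled_indices)
print(predicted_chars)   # gibberish for now, since the weights are still random

# The loss wraps Keras' built-in sparse categorical cross-entropy; from_logits=True
# because the final Dense layer outputs raw scores rather than probabilities.
def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

model.compile(optimizer='adam', loss=loss)

# Save the weights after every epoch so the model can be rebuilt later.
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix, save_weights_only=True)

history = model.fit(data, epochs=2, callbacks=[checkpoint_callback])
```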
So while that trains, I'll explain the next aspect of this without running the code. Essentially, after we've trained the model and initialized the weights and biases, we need to rebuild it using a new batch size of one. Remember, the initial batch size was 64, which means we'd have to pass it 64 input sequences for it to work properly, but now what I've done is rebuild the model and change it to a batch size of one, so that we can just pass it some sequence of whatever length we want and it will work. So if we run this, we've rebuilt the model with batch size one; that's the only thing we've changed. And now what I can do is load the weights by saying model.load_weights(tf.train.latest_checkpoint(checkpoint_dir)) and then build the model using the tensor shape (1, None). I know it sounds strange, but this is how we rebuild the model: the 1 is just saying expect one input entry, and None means we don't know what the next dimension's length will be. The checkpoint directory is just where on the machine we're saving these TensorFlow checkpoints, and this is the prefix we save each checkpoint with: the checkpoint directory, and then a checkpoint name with the epoch in it, where epoch obviously stands for whatever epoch we're on. So we'll save a checkpoint at epoch one, a checkpoint at epoch two, and so on. To get the latest checkpoint, we do this, and if we want to load any intermediate checkpoint, say checkpoint 10, which is what I've defined here, we can use this block of code down here; I've just hard-coded the checkpoint I'm loading there, whereas this one gets the most recent, which should be checkpoint two for me. And then what we're going to do is generate the text. I'll dig into this function in a second, but I just want to run it and show you how it works, because I feel like we've done a lot of work for not very many results so far. I'm just going to type in the string "romeo" and show you that when I do this, give it a second, it will actually generate an output sequence like this. We get "romeo" followed by "Lady Capulet" and then a run of mostly made-up words, so it's like pseudo-English; most of it is close to proper words, but again, that's because we trained it on just two epochs. I'll talk about how we build this in a second, but if you wanted a better output for this part, you would train on more epochs. So now let's talk about how I actually generated that output. We rebuilt the model to accept a batch size of one, which means I can pass it a sequence of any length, and in fact, what I start by doing is passing the sequence I typed in, which was "romeo". Then we run this generate_text function; I basically took it from TensorFlow's website, like I'm taking almost all of this code, and we say the number of characters to generate is 800.
For the input evaluation, we need to preprocess this text again so that this works properly. We could use my little function, or we can just write this line of code here, which does what that function does for us: [char2idx[s] for s in start_string], where the start string is what we typed, in this case "romeo". Then we expand the dimensions, so we essentially turn the plain list of numbers into a nested list, because that's what the model expects as the input: one batch, one entry. Then we create the list where we'll store the generated text, because we want to print it out at the end; we'll put it in this text_generated list. temperature = 1.0: what this allows us to do is, if we change this value, well, you can read the comment here, a low temperature results in more predictable text and a higher temperature results in more surprising text. It's just a parameter to play with if you want; you don't necessarily need it, and I've left mine at one for now. We start by resetting the states of the model. This is because when we rebuilt the model, it will have stored the last state it remembered from training, so we need to clear that before we pass new input text to it. Then we say for i in range(num_generate), which is how many characters we want to generate, 800 here. We say predictions = model(input_eval), where input_eval starts as the encoded start string, and then predictions = tf.squeeze(predictions, 0): what this does is take our predictions, which come back in a nested list, and remove that outer dimension, so we just have the predictions we want without that extra dimension we'd otherwise need to index into. Then, as the comment says, we use a categorical distribution to predict the character returned by the model: we divide by the temperature (if it's one, that doesn't do anything), and we set the predicted id by sampling whatever the output was from the model, which is what this does. Then we take that output, the predicted id, and make it the new input evaluation, and we say text_generated.append of that integer converted back into a character, and at the end we join and return all of this. Now, I know this seems like a lot; again, this is basically given to us by TensorFlow, and you can read through the comments yourself if you want to understand it more, but I think that was a decent explanation of what it's doing. So yeah, that is how we can generate sequences using a recurrent neural network. Now, what I'm going to do is go to my other window, where I've typed all the code out in full, and do a quick summary of everything we've done, just because there was a lot going on, and then from there I'm actually going to train this on the Bee Movie script and show you how that works in comparison to Romeo and Juliet. Okay, so what I'm in now is just the exact same notebook we had before, but with all the code copied in here; it's the exact same code, just without all the explanatory text in between.
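Here's a sketch of the rebuild-and-generate step being walked through, adapted from the TensorFlow text-generation tutorial that the transcript says this code comes from; defaults like num_generate=800 and temperature=1.0 match the walkthrough, everything else reuses names from the earlier snippets:

```python
# Rebuild the trained model with a batch size of 1 and load the saved weights.
model = build_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, batch_size=1)
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))

def generate_text(model, start_string, num_generate=800, temperature=1.0):
    # Encode the starting prompt and wrap it in a batch of one.
    input_eval = [char2idx[s] for s in start_string]
    input_eval = tf.expand_dims(input_eval, 0)

    text_generated = []
    model.reset_states()   # clear any leftover state from training

    for i in range(num_generate):
        predictions = model(input_eval)
        predictions = tf.squeeze(predictions, 0)      # drop the batch dimension
        predictions = predictions / temperature
        # Sample (rather than argmax) the next character id from the distribution.
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()
        input_eval = tf.expand_dims([predicted_id], 0)  # feed the prediction back in as the next input
        text_generated.append(idx2char[predicted_id])

    return start_string + ''.join(text_generated)

print(generate_text(model, "romeo"))
```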
So I can do a short summary of what we did, as well as show you how this worked when I trained it on the Bee Movie script. I did say I'd show you that, and I'm not lying: you can see I've got bee_movie.txt loaded in here. In fact, let me show you the script first so you can see what it looks like. This is the Bee Movie script: just a long script of text that I downloaded for free off the internet. It's actually not as long as the Romeo and Juliet play, so we're not going to get results that are as good from our model, but it should hopefully be okay. So let's start the brief summary, and then I'll show you the results from the Bee Movie script, just so anyone who's confused has something that wraps it all up. Up here we're doing our imports, which I don't think I need to explain, and this part up here just loads in your file, which I also don't think I need to explain. Then we actually read the file: open it from our directory and decode it as UTF-8. We create a vocabulary and encode all of the text inside the file, so we turn the whole text into its encoded integer version, and we also write a function that goes the other way around, from int back to text, not just from text to int. Then we define the sequence length we want to train with, which is 100. You can decrease this value if you want, go 50, go 20, it doesn't really matter, it's up to you; it just determines how many training examples you'll have. Next we create a character dataset using from_tensor_slices on text_as_int. What this does is take our entire text, which is now an integer array, and split it up into individual characters, so each element of the dataset is one character. Then we say sequences = char_dataset.batch, which takes all those characters and batches them into chunks of length 101. Then we split each of those chunks into an input and a target, like "hell" and "ello" for the word "hello". We map that function onto sequences, which means we apply it to every single sequence, and store the result in dataset. Then we define the parameters for our network, shuffle the dataset, and batch it into batches of 64 training examples. Then we write the function that builds the model, which we've already discussed, and actually build the model starting with a batch size of 64. We create our loss function, compile the model, set up our checkpoints for saving, and then train the model, making sure we pass checkpoint_callback as a callback, which means it will save the weights the model has computed at every epoch. After that, our model is trained; you can see I trained this for 50 epochs on the Bee Movie script. Then we rebuild the model with a batch size of one, so we can pass it a single example and get a prediction, and we load the most recent weights into our model from the checkpoint directory we defined above.
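As a quick reference for the data-pipeline part of that summary, here's roughly what it looks like in code. It assumes text_as_int, the encoded script, from earlier, and the shuffle buffer size of 10,000 is just a typical placeholder value:

```python
seq_length = 100

# Turn the encoded text into a stream of single characters.
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

# Batch the characters into chunks of 101: 100 input characters plus 1 extra for the target.
sequences = char_dataset.batch(seq_length + 1, drop_remainder=True)

def split_input_target(chunk):
    # "hello" -> input "hell", target "ello"
    return chunk[:-1], chunk[1:]

dataset = sequences.map(split_input_target)

BATCH_SIZE = 64
BUFFER_SIZE = 10000   # placeholder shuffle buffer
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
```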
Then we build the model and tell it to expect the shape [1, None] as its initial input; None just means we don't know what that dimension is going to be, but we know we're going to have one entry. Then we have this generate_text function, which I've already walked through. And we can see that if I type in an input string, say "Hello", and hit enter, the model trained on the Bee Movie script comes up with its output here. Unfortunately, the Bee Movie script does not work as well as Romeo and Juliet. That's just because Romeo and Juliet is a much longer piece of text, and it's also formatted a lot more nicely and a lot more predictably. But you get the idea, and it's cool to see how this performs on different data. So I'd highly recommend you find some training data of your own to give this, other than just Romeo and Juliet, or maybe even try another play or something, and see what you can get out of it. Also, a quick side note: to make your model better, increase the number of epochs here. Ideally you want this loss to be as low as possible; you can see mine was still going down at epoch 50. You will reach a point where more epochs won't make a difference, although with models like this, more epochs is typically better, because it's difficult for it to overfit: all you really want it to do is learn how the language works and then be able to replicate that back to you. So that's the idea here, and with that being said, this section is done. Now, I know this was a long and probably confusing section for a lot of you, but that's what happens when you start getting into the more complex parts of machine learning; it's very difficult to grasp and understand all of these concepts in an hour of me explaining them. What I try to do in these videos is introduce you to the syntax, show you how to get a working prototype, and hopefully give you enough knowledge that if you're confused by something I said, you can go look it up and figure out the more important details for yourself, because I just can't go into every extreme in these videos. So anyways, that has been this section, I hope you guys enjoyed it, I thought this was pretty cool, and in the next section we're going to be talking about reinforcement learning. Hello, everyone, and welcome to the next module in this course, on reinforcement learning. What we're going to be doing in this module is talking about another technique in machine learning called reinforcement learning. Now, if you remember, at the very beginning of this course, which for you is probably about six hours ago at this point, we briefly discussed what reinforcement learning was, and I'll go through a recap here just to make sure everyone's clear on it. Essentially, reinforcement learning is the strategy in machine learning where, rather than feeding a ton of data and a ton of examples to our model, we let the model, or in this case what we're going to call the agent, actually come up with those examples itself. And we do this by letting the agent explore an environment.
Now, the concept here is that, just like humans, the way we learn to do something, say play a game, is by actually doing it. We get put in the environment, we try it, we make mistakes, we encounter different things, we see what goes well, and based on those experiences we learn and figure out the correct things to do. A very basic example: say we're playing a game, and when we go left we fall off a cliff. The next time we play that game and get to that point, we're probably not going to go left, because we remember that it was bad, and hence we've learned from our mistake. So that's the idea with reinforcement learning. I'm going to go through exactly how this works, give some better examples, and cover some of the math behind the implementation we're going to use. But I just want to make it clear that there are a lot of different types of reinforcement learning; in this example we're just going to be talking about something called Q-learning, and I'm going to keep this module shorter than the other ones, because this field of AI and machine learning is pretty complex and can get pretty difficult pretty quickly, so it may be a more advanced topic for some of you. Alright, so now we need to define some terminology before I can really start explaining the technique we're going to use and how it works. We have something called the environment, the agent, the state, the action and the reward. Now, I'm hoping some of you will remember these from the very beginning of the course. The environment is essentially what we're trying to solve, or the thing we're operating in. In reinforcement learning we have this notion of an agent, and the agent is what explores the environment. So if we're thinking about reinforcement learning for training an AI to play a game, say Mario, the agent would be Mario: it's the thing that moves around and explores the environment, and the environment would be the level we're playing in. In another example, the one we're actually going to use below, we're going to be in something almost like a maze, so the environment is the maze, and the agent is the character, or the entity, or whatever you want to call it, that's exploring that maze. It's usually pretty intuitive to figure out what the environment and the agent are, although in some more complex examples it might not always be clear. But just understand that reinforcement learning deals with an agent, something exploring an environment, and a very common application of reinforcement learning is training AIs to play games; it's actually very interesting what people have been able to do in that field recently. Okay, so we have the environment and the agent; hopefully that makes sense. The next thing to talk about is the state. Essentially, the state is where you are in the environment. Obviously, inside the environment we can have many different states, and a state is associated with the agent itself: we say the agent is in a specific state whenever it's in some particular part of the environment. Now, in the case of our game, the state the agent is in would be its position in the level; say it's at the coordinates (10, 20), then it would be in state (10, 20).
That's how we think about states. Now, state can apply in other ways as well; say you're playing a turn-based game, or a game where you have health or something like that, then part of the state might be the health of the character. This can get complicated depending on what you're trying to do, but just understand that for most of our examples the state is simply going to be a location, although really it's just telling us information about where the agent is and its status in the environment. So next we have this notion of an action. In reinforcement learning, our agent is exploring the environment, trying to figure out the best way to accomplish some kind of goal, and the way it interacts with the environment is through something called actions. Now, an action could be pressing the left arrow key, so moving to the left in the environment, or moving to the right, or something like jumping, and an action can actually be not doing anything at all. So when we say the agent performed an action, that could mean that its action at that time step was to do nothing; that was its action. In the example of our Mario game, which I keep going back to, an action would be something like jumping, and typically actions will change the state of our agent, although they don't necessarily have to; in fact, we'll observe that with a lot of actions we can end up in the same state after performing them. Alright, so now we're on to the last piece, which is actually the most important one to understand, and that is the reward. The reward is what our agent is trying to maximize while it is in the environment. So the goal of reinforcement learning is to have this agent navigate the environment, go through a bunch of its different states, and determine which actions maximize the reward in every given state. Essentially, the goal of the agent is to maximize its reward. But what is a reward? Well, after every action it takes, the agent receives a reward, and this reward is something that we as the programmers need to come up with. The reason we need to define it is that we need to tell the agent when it's performing well and when it's performing poorly. Just like we had a loss function when we were working with neural networks, this is almost like our loss function, except the higher the number, the more reward the agent gets and the better it's doing, and the lower the reward, the worse it's doing. That's how we monitor and assess the performance of our agents: by the average amount of reward they're able to achieve. And their goal is really an optimization problem, where they're trying to maximize this reward. So what we're going to do in reinforcement learning is have this agent exploring the environment, going through these different states and performing these different actions, trying to maximize its reward.
And obviously, if we're trying to get the agent to, say, finish a level or complete the game, then the maximum reward will be achieved once it has completed the level or completed the game. And if it does things we don't like, say dying or jumping in the wrong spot, we can give it a negative reward to try to influence it not to do that. Our goal when we train these agents is for them to get the most reward, and we hope they'll learn the optimal route through a level, or through some environment, that maximizes that reward for them. Okay, so now I'm going to talk about a technique called Q-learning, which is just an algorithm we're going to use to implement this idea of reinforcement learning. We're not going to get into anything too crazy in this last module, because it's meant to be more of an introduction to the field of reinforcement learning than anything else, but Q-learning is the most basic way to implement reinforcement learning, at least that I have found. Essentially, what Q-learning does, and I don't actually know why it's called Q, although I probably should, is create a table or matrix-like data structure that contains, as the rows, every single state, and as the columns, every single action that could be taken in any of those states. So as an example here, and we'll do one on the whiteboard later on if we get there, you can see this is my Q-table. We have a1, a2, a3 and a4 as all the possible actions that can be performed in any given state, and we have three states, denoted by the fact that there are three rows. The numbers in this table, this Q-matrix, Q-table, whatever you want to call it, represent the predicted reward, given that we take a particular action in a particular state. I'm not sure if this is making sense to you yet, but essentially, if we say that row zero is state zero, then the value in the a2 column tells us what reward we should expect to get if we take action a2 while we're in state zero. That's what that number is trying to tell us; that's what it means. Same thing here: in state two we can see that the optimal action to take would be action two, because that has the highest predicted reward for this state. And that's the table we're going to try to generate with this technique called Q-learning: a table that can tell us, given any state, what the predicted reward will be for any action we take. We're going to generate this table by exploring the environment many different times and updating these values according to what the agent sees in the environment and the rewards it receives for each action in each state. We'll talk about exactly how we update those values later, but this is the basic premise. So that is Q-learning. We're going to hop over to the whiteboard now and do a more in-depth example, and then we'll talk about how we actually learn this Q-table that I just discussed. Okay, so I've drawn a pretty basic example that I'm going to use to illustrate the idea of Q-learning, talk about some problems with it, and show how we can combat those as we learn more about how Q-learning works.
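Before we get to the whiteboard example, here's a tiny code sketch of the lookup step, with completely made-up numbers, just to show what "take the action with the highest predicted reward for the current state" means when the Q-table is stored as a matrix:

```python
import numpy as np

# A made-up Q-table: 3 states (rows) x 4 actions (columns). The values are invented.
Q = np.array([
    [0.0, 1.2, 0.4, 0.0],   # state 0
    [0.5, 0.0, 0.0, 2.1],   # state 1
    [0.0, 3.0, 0.7, 0.2],   # state 2
])

state = 2
best_action = np.argmax(Q[state, :])   # column with the highest predicted reward in this row
print(best_action)                     # -> 1, i.e. the second action
```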
Okay, so the idea with the drawing is that we have three states, s1, s2 and s3, and at each state we have two possible actions that can be taken: we can either stay in the current state, or we can move. Now, what I've done is write some integers here that represent the reward the agent is going to get if it takes that action in a given state. So if we take the move action in s1, we receive a reward of 1, because that's what's written here as the reward for moving, whereas if we stay, we get a reward of 3. Same concept here: if we stay we get 2, if we move we get 1, and I think you get the point. So remember, the goal of our agent is to maximize its reward in the environment, and what we're calling the environment is everything drawn right here: the environment essentially defines the number of states, the number of actions, and the way the agent can interact with those states and actions. In this case the agent interacts with the environment by taking actions that change its state; that's what we're getting at with this. Now, what I want to do is show you how we use this Q-table, or rather learn this Q-table, to come up with the model, the machine-learning model, that we're going to use. Essentially, what we want is a pattern of values in this table that allows our agent to receive the maximum reward. So in this case, we're going to say our agent starts in state s1; whenever we're doing reinforcement learning we need some kind of start state for the agent, which could be random and could change, but it has to start in some state, and here that's s1. Now, when we're in s1, the agent has two things it can do: it can stay in the current state and receive a reward of 3, or it can move and receive a reward of 1, ending up in s2. If we get to s2, what can we do? We can stay, which gives a reward of 2, or we can move, which gives a reward of 1. And same thing for s3: we can stay and get a reward of 4, or we can move and get a reward of 1. Now, suppose we had the agent start in each state and try each action, so it starts in s1 twice, s2 twice and s3 twice, and every time it starts there it tries one of the two actions: when it starts in s1 it tries moving once and staying once, and so on. Then the Q-table we'd get would look like this, because what would happen is we'd update the values in our Q-table to represent the reward we received when we took that action from that state. So you can see that when we were in state s1 and we decided to stay, we wrote a 3 in the stay column, because that's how much reward we received when we stayed.
Same thing for state two: when we stayed in state two, we received a reward of 2, and the same idea gives us the 4 for state three. Now, this is okay: it tells us the move that gives the maximum immediate reward in any state. But what if we introduce the idea that we want the agent to receive the maximum total reward possible? If it's in state one, ideally we'd like it to move to state two, then move to state three, and then just stay in state three, because that's where it will receive the most reward over time. Well, with the current table we've developed, if we just follow it, we say, okay, what state is the agent in, and if it's in state one or two, we pick stay, because that's the highest value in that row. If that's the approach we use, then an agent starting in state one or state two gets stuck in what we can call a local maximum, because from that state it's never able to realize that it could move further and receive a much greater reward. Right, and that's the concept we're going to keep coming back to as we implement and discuss how Q-learning works. But hopefully this gives you a little bit of insight into what we do with this table. Essentially, when we're updating these table values, we're exploring the environment: when we're in a state and we take an action that brings us to another state, we observe the reward we got from going there and we observe the state we changed to. So we observe the fact that in state one, when we move to state two, we receive a reward of 1, and we take that observation and use it to update the Q-table. The goal is that at the end of all these observations, and there could be millions of them, we have a Q-table that tells us the optimal action to take in every single state. We're essentially hard-coding a mapping that says: given any state, just look it up in this table, look at all the actions that could be taken, and take the action that's predicted to give the maximum reward. And if we were to follow that on this table, we can see we get stuck in the local maximum, which is why we're going to introduce some other concepts. For our reinforcement learning model with Q-learning, we have to implement the concept of being able to explore the environment without always relying on previous experience. Because if we just tell our model, okay, start in the start state and begin navigating around, and we update the Q-table immediately and put this 3 in for the stay action, then we can almost guarantee that while the model is training, if it's using this Q-table to determine what to do next, it's just always going to stay, which means we'll never get a chance to even see what we could have reached at s3.
So we need to introduce some concept of taking random actions and being able to explore the environment more freely, before we start relying on these Q values for training. I'm actually going to go back to my slides now to make sure I don't get lost, because I think I was starting to ramble a little bit there. So now we're going to talk about learning the Q-table. Essentially, I showed you how we use the Q-table: given some state, we look that state up in the table, determine the maximum reward we could get by taking each of the actions, and then take that action. That's how we'd use the Q-table later on, when we're actually using the model. But when we're learning the Q-table, that's not necessarily what we want to do. We don't want to explore the environment by just taking the action with the best reward we've seen so far and always going in that direction; we need to make sure we explore in a different way so that we learn the correct values for the Q-table. So, our agent learns by exploring the environment and observing the outcome, the reward, of each action it takes in a given state, which we've already covered. But how does it know what action to take in each state while it's learning? That's the question I need to answer for you now. Well, there are two ways of doing this: the agent can use the current Q-table to find the best action, which is what I just discussed, by looking at the row for its state and taking the action with the highest value, or it can randomly pick a valid action. And our goal, when we create this Q-learning algorithm, is to have a really good balance of these two, where sometimes we use the Q-table to find the best action, and sometimes we take a random action. So that is one piece. But now I want to talk about the formula for how we actually update Q values. Obviously, what's going to happen in our Q-learning is that we'll have an agent in the learning stage, exploring the environment, taking all these actions, receiving all these rewards, making all these observations, and moving around the environment by following one of those two principles: randomly picking a valid action, or using the current Q-table to find the best action. And as it moves from state to state, it keeps updating this Q-table, effectively saying, this is what I've learned about the environment, I think this is a better move, let's update this value. But how does it do that in a way that makes sense? Because we can't just write in the maximum reward we got from moving, otherwise we run into the issue I just talked about, where we get stuck at that local maximum: we see a high immediate reward, and if we keep taking that action, it prevents us from reaching a potentially higher reward in a different state. So the formula we actually use to update the Q-table is this: Q[state, action] = Q[state, action] + alpha * (reward + gamma * max(Q[new_state]) - Q[state, action]), where Q[state, action] is just referencing first the row for the state and then the action as the column. So what on earth does this mean?
What are these constants? What is all this? Well, we'll go through the constants first, because it's hard to explain a complicated formula without them. Alpha stands for the learning rate, and gamma stands for the discount factor. So, what is the learning rate? There's a little blurb here about it, but essentially the learning rate ensures that we don't update our Q-table too much on every single observation. Before, in the whiteboard example, all I did when I took an action was look at the reward I got from taking it and write that straight into the Q-table. Now, obviously that's not an optimal approach, because it means that in state one I'm never going to be able to get to that reward of 4: I'll throw the 3 in there and just keep taking that action. We want the move action to eventually end up with a higher value than stay, so that next time we're in state one we consider the fact that we can move to state two, and then to state three, to optimize the reward. So how do we do that? Well, the learning rate is one of the things that helps us accomplish this behaviour. It's usually a decimal value, and it's telling us how much we're allowed to change every single Q value by on every single observation. If we just used the earlier approach, then the number of observations we'd need is just the number of states times the number of actions and the table would be completely filled in; in our case, with three states and three actions, nine iterations would fill the entire table. The learning rate means the table updates more slowly, changing each value only slightly. You can see that what we're doing is taking the current value in the Q-table, whatever is already there, and adding some value to it, and the value we add can be positive or negative, essentially telling us whether this action turned out better or worse than we thought. Now, the way that added value is calculated: alpha multiplies the whole bracket, and inside the bracket we have the reward, plus gamma, the discount factor, times the maximum Q value of the new state we moved into. What that means is: find the maximum reward that we could receive in the new state by taking any action, and multiply that by what we call the discount factor. That part of the formula is trying to do exactly what I've been talking about: look forward and say, okay, I know that if I take this action in this state I receive this amount of reward, but I also need to factor in the reward I could receive in the next state, so that I can determine the best place to move to. That's what the gamma and the max are trying to do for us.
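Written out cleanly, the update rule being described here is the standard Q-learning update, with s the current state, a the action taken, r the reward received, s' the new state, α the learning rate and γ the discount factor:

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right)
```

The bracketed term is the difference between the new estimate of the value of taking a in s and the value currently stored in the table, so each observation moves the stored value a fraction α of the way toward that new estimate.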
So this discount factor, whatever you want to call it, is trying to factor a little bit of what we could get from the next state into the equation, so that our agent can hopefully learn something about transition states: states, or really actions, that don't give us an immediate reward but lead to a larger reward in the future. That's what the gamma and the max are trying to do. Then we subtract Q[state, action] from all of this, and that's just to make sure we're adding the difference between the new estimate and the current value, rather than piling values on top of each other. You can look into more of the math here and plug in some values later, and you'll see how this works, but this is the basic format, and I feel like I've explained it in enough depth. Okay, so now that we've covered how we update the cells and how this works, I could go back to the whiteboard and draw it out, but I feel like it makes enough sense: we look at what the next state is and factor that into our calculation; we have the learning rate, which tells us how much we can update each cell value by; and we have the discount factor, which essentially defines the balance between finding really good rewards in our current state and finding rewards in future states. The higher that value is, the more we're going to look towards the future; the lower it is, the more we're going to focus on our current reward. And that makes sense, because we're adding the maximum future value, and if we multiply it by a lower number, that means we consider it less than if we multiplied it by a greater one. Awesome. Okay, so now that we understand that, I want to move on to a Q-learning example, and what we're going to use for this example is something called OpenAI Gym; I just need to put my drawing tablet away so we can get started. OpenAI Gym is a really interesting tool, I don't even quite know the right word to describe it, developed by OpenAI, which was co-founded by Elon Musk among others. It gives programmers a collection of really nice environments to work with and train reinforcement learning models on. You'll see how this works in a second, but essentially there are a ton of graphical environments with very easy interfaces, things like moving characters around, that you're allowed to experiment with completely for free as a programmer to try to make some cool reinforcement learning models. That's what OpenAI Gym is, and we can click on the site here to have a look: you can see there are all these different Atari environments, and it's just a way to train reinforcement learning models. Alright, so we're going to start by just importing gym. If you're in Colaboratory, there's nothing you need to do here; if you're in your own environment, you're going to have to pip install gym. And then what we're going to do is make the FrozenLake-v0 gym environment.
So essentially, that line just sets up the environment that we're going to use. Now, I'll talk more about what this particular environment is later, but first I want to talk about how gym works, because we are going to be using it throughout this section. OpenAI Gym is built for reinforcement learning, and essentially every environment has an observation space and an action space. The observation space describes the environment, and env.observation_space.n tells us the number of states that exist in this environment. Now, in our case we're going to be using a maze-like environment, which I'll show you in a second, so you understand why we get the values we do. env.action_space.n tells us how many actions we can take in any given state. So if we print these out, we get 16 and 4: in other words, the number of states is 16, and the number of actions we can take in every single state is 4. In this case those actions are left, down, right and up. Next is env.reset(). Essentially, we have some commands that allow us to move around the environment, which are actually down here. If we want to reset the environment and start back in the beginning state, we call env.reset(), and you can see this actually returns the starting state, which obviously is going to be 0. We also have the ability to take a random action, or rather select a random action from the action space, which is what this line does: out of all the valid actions in the action space, pick a random one and return it. If we print that action, you'll see we get some value between 0 and 3; it just gives us a random action that is valid in the action space. Alright, next we have env.step(action). What this does is take whatever action we have, which in this case is 3, and perform it in the environment: it tells our agent to take this action, and it returns a bunch of information. The first thing is the observation, which essentially means the state we move into next, so I could call it new_state. reward is the reward we receive for taking that action; in this environment the reward is either 1 or 0, but that's not that important to understand yet. Then we have a boolean, done, which tells us whether the episode is over, whether we won or lost the game. If this is true, it means we need to reset the environment, because our agent has either lost or won and is no longer in a valid state in the environment. info gives us a little bit of extra information; it's not showing me anything useful here, and we won't use it throughout this section, but I figured I'd let you know it exists. Finally, env.render(), which I'll actually run for you, renders a graphical user interface that shows you the environment. Now, if you use this while you're training, so you can actually watch the agent do the training, it slows things down drastically, probably by 10 or 20 times, because it actually needs to draw everything on the screen, but you can use it if you want. So this is what our frozen lake example looks like: the highlighted square is where our agent is, and we have four different kinds of squares, S, F, H and G.
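Here's a small sketch of those gym basics in one place. It assumes the same gym version used in the video, where FrozenLake-v0 exists and env.step returns four values; newer gym releases have changed both of those, so adjust accordingly:

```python
import gym

env = gym.make('FrozenLake-v0')   # the maze-like environment used in this section

print(env.observation_space.n)    # 16 states
print(env.action_space.n)         # 4 actions per state

state = env.reset()               # start a new episode; returns the starting state (0)

action = env.action_space.sample()                 # pick a random valid action
new_state, reward, done, info = env.step(action)   # perform it and observe the result

env.render()                      # draw the current state of the environment
```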
So S stands for start, F stands for frozen, because this is a frozen lake, H stands for hole, and G stands for goal; the goal is to navigate from the start to the goal without falling into one of the holes. The text above the grid tells us the action that was just taken. If we ran this a bunch of times, we'd see this updating; unfortunately the GUIs don't work very well in Google Colaboratory, but if you did this from your own command line, took some different steps and rendered as you went, you'd see it working properly. Okay, so now we're on to the frozen lake environment itself, which is what I was just showing, and we're going to move to the example where we actually implement Q-learning to solve the problem: how can we train an AI to navigate this environment and get from the start to the goal? Well, we're going to use Q-learning, so let's start. The first thing we need to do is import gym, import NumPy, and then create some constants. We say the number of states, STATES, is equal to the line I showed you before, env.observation_space.n, ACTIONS is equal to env.action_space.n, and then Q = np.zeros((STATES, ACTIONS)). Something I guess I forgot to mention is that when we initialize the Q-table, we just fill it with zeros, because obviously, at the beginning of learning, our model, or our agent, doesn't know anything about the environment yet, so we leave all those values blank. That means we're more likely to be taking random actions at the beginning of our training, exploring the environment space more, and then as we get further along and learn more about the environment, those actions will likely be more calculated, based on the Q-table values. If we print this out, we can see the matrix we get: 16 by 4, where every single row represents a state, and every single column represents an action that could be taken in that state. Alright, so next we define some constants, which we talked about before: gamma, the learning rate, the maximum number of steps, and the number of episodes. The number of episodes is how many episodes you want to train your agent for, in other words how many times you want it to run around and explore the environment. MAX_STEPS essentially says: okay, if we're in the environment, navigating and moving around, and we haven't died yet, how many steps are we going to let the agent take before we cut the episode off? Because what could happen otherwise is that the agent just bounces between two different states indefinitely, or goes in circles, so we need a maximum number of steps so that at some point we can end the episode and start again. Alright, so episodes we've covered, the learning rate we know, gamma we know; mess with these values as we go through and you'll see the difference that makes in our training. I've actually included a graph down below, so we'll talk about that when we look at the outcome of the training. As for the learning rate: the higher it is, the faster the agent learns.
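Here's what that setup block looks like put together. The specific constant values below are just reasonable placeholders rather than necessarily the exact numbers from the notebook, so feel free to swap in your own:

```python
import gym
import numpy as np

env = gym.make('FrozenLake-v0')

STATES = env.observation_space.n    # 16
ACTIONS = env.action_space.n        # 4

# All-zero Q-table: the agent knows nothing about the environment yet.
Q = np.zeros((STATES, ACTIONS))

EPISODES = 1500       # how many runs through the environment the agent gets
MAX_STEPS = 100       # cut an episode off after this many steps
LEARNING_RATE = 0.81  # alpha in the update formula (placeholder value)
GAMMA = 0.96          # discount factor (placeholder value)
```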
So a high learning rate means that each update introduces a larger change to the current value, which makes sense based on the equation as well; I just wanted to make sure I wasn't going crazy there. So let's run this constants block. And now we're going to talk about picking an action. Remember how I said, and I actually wrote them down here, there are essentially two things we can do at every step: randomly pick a valid action, or use the current Q-table to find the best action. So how do we actually implement that with our OpenAI Gym environment? Well, I just wanted to write a little code block here to show you the exact code that will do this for us. We're going to introduce a new value, you could almost call it a constant, called epsilon, and we'll start it at 0.9. The epsilon value essentially tells us the probability that we pick a random action. So with an epsilon of 0.9, every time we take an action there's a 90% chance it's random and a 10% chance that we look at the Q-table to choose it. Now, we'll reduce this epsilon value as we train, so that our model starts out exploring as much as it possibly can in the environment by just taking random actions, and then, after we have enough observations and we've explored the environment enough, we'll start to slowly decrease the epsilon, so that it hopefully finds a more optimal route. The way we do this is: if np.random.uniform(0, 1), which essentially means pick a random value between zero and one, is less than epsilon, then action = env.action_space.sample(), meaning take a random action and store what that action is in here. Otherwise, we're going to take np.argmax of that state's row in the Q-table. What this means is: find the maximum value in the row for our state and tell us which column it's in, so that we know what action to take; if the maximum value is in column one, we take action one. That's using the Q-table to pick the best action. Alright, so we don't need to run this block, I just wrote it to show you. Now, how do we update the Q values? Well, this is just following the equation that I showed above, and this is the line of code that does it; I just wanted to write it out so you guys could see exactly what each part is doing and explore it for yourself. But essentially you get the point: you have your learning rate, the reward, gamma, and then np.max, which, like the built-in max function in Python, takes the maximum value, not the index, from the row of the new state that we moved into, and then we subtract, obviously, Q[state, action].
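As a self-contained toy version of those two pieces, the epsilon-greedy action selection and the update line, here's a sketch. The state, new_state and reward values are just stand-ins so the fragment runs on its own; in the real loop they come from the gym environment:

```python
import numpy as np

# Stand-in setup so this runs on its own; in the real code these come from the env.
Q = np.zeros((16, 4))
state, new_state, reward = 0, 4, 0.0
LEARNING_RATE, GAMMA, epsilon = 0.81, 0.96, 0.9

# Epsilon-greedy: explore with probability epsilon, otherwise exploit the Q-table.
if np.random.uniform(0, 1) < epsilon:
    action = np.random.randint(0, 4)     # stand-in for env.action_space.sample()
else:
    action = np.argmax(Q[state, :])

# Q-learning update after observing the reward and the new state.
Q[state, action] = Q[state, action] + LEARNING_RATE * (
    reward + GAMMA * np.max(Q[new_state, :]) - Q[state, action]
)
```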
So that's all the constants; I've also included this RENDER constant to tell us whether we want to draw the environment or not. In this case I'm going to leave it False, but you can make it True if you want. EPISODES I've left at 1500 for this; if you want to make your model better, you typically train it on more episodes, but that's up to you. And now we're going to get into the big chunk of code, which I'm going to talk through. So what this is going to do: we're going to have a rewards list, which just stores all of the rewards we see, so that I can graph that later for you guys. Then we say for episode in range(EPISODES), which is just telling us, for every episode, do the steps I'm about to describe. We reset the state, obviously, so state = env.reset(), which gives us the starting state. Then we say for _ in range(MAX_STEPS), which means, okay, we're going to let the agent explore the environment for up to the maximum number of steps; we do have a done check in here that will break the loop if the episode ends, which we'll get to. So the first thing inside the loop is: if RENDER, render the environment, which is pretty straightforward. Otherwise, let's take an action: for each time step we take an action using the epsilon code we've already looked at, either a random action from env.action_space or the best action from the Q-table. Then we say next_state, reward, done, _ = env.step(action); we've put an underscore there because we don't really care about the info value, so I'm not going to store it, but we do care about what the next state will be, the reward from that action, and whether we're done or not. So we take that action with env.step, and then what we do is update Q[state, action] using the formula we've talked about. This is the formula, and you can look at it in more depth if you want, but based on whatever the reward is, that's how the Q values get updated, and after a lot of training we should have some decent Q values in there. Alright, so then we set the current state to be the next state, so that on the next time step the agent is in its new state and can keep exploring the environment in this episode, if that makes sense. Then we say if done, so essentially if the agent fell in a hole or reached the goal, we append whatever reward it got from its last step to the rewards list up here. It's worth noting how the rewards work here: you get a reward of 1 when you reach the goal and a reward of 0 otherwise, so the rewards list ends up telling us how often the agent actually made it. Then we reduce the epsilon by just a fraction of an amount, 0.001, so that we slowly take fewer random actions as training goes on, and we break out of the loop because the episode is over. Then we print the Q-table and print the average reward. Now, this takes a second to train, really just a few seconds; it's pretty fast because I've set the episodes at 1500.
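Put together, that training loop looks roughly like this. It continues from the setup sketch above (env, Q and the constants), and again it's a sketch of the idea rather than the exact notebook code:

```python
rewards = []
epsilon = 0.9
RENDER = False

for episode in range(EPISODES):
    state = env.reset()
    for _ in range(MAX_STEPS):
        if RENDER:
            env.render()

        # Epsilon-greedy action selection.
        if np.random.uniform(0, 1) < epsilon:
            action = env.action_space.sample()
        else:
            action = np.argmax(Q[state, :])

        next_state, reward, done, _ = env.step(action)

        # Update the Q-table with what we just observed.
        Q[state, action] = Q[state, action] + LEARNING_RATE * (
            reward + GAMMA * np.max(Q[next_state, :]) - Q[state, action]
        )

        state = next_state

        if done:
            rewards.append(reward)
            epsilon -= 0.001   # explore a little less as training goes on
            break

print(Q)
print(f"Average reward: {sum(rewards) / len(rewards)}")
```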
But if you want, you can set the episodes to, say, 10,000, wait another few minutes, and see how much better you can do. So we can see that after training I received an average reward of about 0.2888, and these decimal values are what the Q-table looks like after all those updates; I just decided to print them out, along with the average reward, so that we can compare against the graph. So now I'm just going to graph this. You don't have to really understand this code if you don't want to, but it graphs the average reward over every 100 episodes from the beginning to the end: essentially I've calculated the average of each block of 100 episodes and plotted those values. We can see that we start off very poorly in terms of reward, because the epsilon value is quite high, which means we're taking random actions pretty much all the time, and if we're taking a bunch of random actions, chances are we're going to fall into holes a lot and get rewards of zero quite frequently. Then, after we get to about 600 episodes, and the 6 on the axis actually represents 600 because this is in hundreds, we start to slowly improve, and then there's a steep increase once we start using the Q-table values more frequently as the epsilon decreases. After that we level off, and the graph does show a slight decline at the end, but I guarantee you that if we ran this for, say, 15,000 episodes, it would just bob up and down, because even though we've decreased the epsilon, there's still a chance that we take a random action and get zero reward. So that's pretty much it for this Q-learning example, and I mean, it's fairly straightforward. To actually use the Q-table, if you wanted to, say, watch the agent move around the environment, I'm going to leave that to you guys, because if you can follow what we've just done here and understand it, it's actually quite easy, and I think of it as a final little test for you to figure out how to do that. The hint is: do exactly what we've done in this loop, except don't update the Q-table values, just use the Q-table values that are already there. And that's pretty much all there is to Q-learning. So this has been the reinforcement learning module for this TensorFlow course, which is actually the last module in this series. I hope you guys have enjoyed it up until this point. Just to emphasize again, this was really just an introduction to reinforcement learning; this technique and this problem itself are not very complex and certainly not the most powerful way to do things, they're just meant to get you thinking about how reinforcement learning works. And potentially, if you'd like to look into this more, there are a ton of different resources and things you can explore in terms of reinforcement learning. So with that being said, that has been this module, and now we're going to move into the conclusion, where we'll talk about some next steps and some more things that you guys can look at to improve your machine learning skills. So finally, after about seven hours of course content, we have reached the conclusion of this course.
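If you want to reproduce that graph, a minimal version looks something like this; it assumes the rewards list filled in by the training loop above:

```python
import matplotlib.pyplot as plt

# Average the reward over every 100 episodes so the trend is easier to see.
avg_rewards = []
for i in range(0, len(rewards), 100):
    chunk = rewards[i:i + 100]
    avg_rewards.append(sum(chunk) / len(chunk))

plt.plot(avg_rewards)
plt.xlabel('episodes (hundreds)')
plt.ylabel('average reward')
plt.show()
```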
Now, what I'm going to do in this last brief section is explain where you can go for some next steps and some further learning with TensorFlow and with machine learning and artificial intelligence in general. What I'm going to recommend to you guys is the TensorFlow website, because they have some amazing guides and resources on there, and in fact, a lot of the examples we used in our notebooks were based off of, or exactly the same as, the original TensorFlow guides. That's because the code they have is just very good, with clear and easy-to-understand examples, and in terms of learning, I find these guides are great for people who want to get in quickly, see the examples, and then go do some research on their own time to understand why they work. So if you're looking for next steps: at this point in time you've gained a very general and broad knowledge of machine learning and AI, you have some basic skills in a lot of different areas, and hopefully this has introduced you to a bunch of different concepts and the possibilities of what you're able to do using modules like TensorFlow. Now, what I'm going to suggest to all of you is that if you find a specific area of machine learning or AI that you're very interested in, dial in on that area and focus most of your time on learning it. That's because when you get to a point in machine learning and AI where you really get specific and pick one strand, one area, it gets very interesting very quickly, and you can devote most of your time to going as deep as possible into that specific topic. That's something that's really cool, and most people who are experts in the AI or machine learning field typically have one area of specialization. Now, if you're someone who doesn't care to specialize in an area, or you just want to play around and see some different things, the TensorFlow website is great for getting a general introduction to a lot of different areas and for being able to take their code, tweak it a little bit on your own, and implement it into your own projects. And in fact, the next steps and resources I'm going to show you here simply involve going to the TensorFlow website, going to the tutorials page, which is very easy to find, I don't even need to link it, you can just search TensorFlow and you'll find it, and looking at some of the more advanced topics that we haven't covered. We've covered a few of the topics and tutorials that are on there; I've just modified their versions, put them in the notebook, and explained them in words and video content. But if you'd like to move on to a next step, something very cool that I would recommend is the DeepDream tutorial in the generative section of the TensorFlow website. Being able to make something like that is, I think, very cool, and it's an example you can tweak a ton by yourself and get some really interesting results. So things like that are definitely good next steps; there are tons and tons of guides and tutorials on that website, and they make it very easy for anyone to get started.
And with these guides, what I will say is that typically they just give you the code and brief explanations of why things work, so you should really be researching and looking up some deeper explanations of why some of these things work as you go through, if you want a firm and thorough understanding of why the model performs the way it does. So with that being said, I believe I'm going to wrap up the course now. I think you guys can imagine how much work I put into this, so please do leave a like, subscribe to the channel, leave a comment and show your support. This, I believe, is the largest open-source machine learning course in the world that deals entirely with TensorFlow and Python, and I hope that it gave you a lot of knowledge. So please do give me your feedback down below in the comments. With that being said, again, I hope you enjoyed it, and hopefully I will see you again in another tutorial, guide or series.