Hello and welcome to the video. It's quite a big one, but if you've come here to learn machine learning, deep learning and PyTorch code, you're in the right place. This video and tutorial is focused on beginners who've got about three to six months of Python coding experience, and we're going to cover a whole bunch of important machine learning concepts by writing PyTorch code. If you get stuck, you can leave a comment below or post on the course GitHub Discussions page, and GitHub is where you'll find all the materials we cover. There's also an online, readable book version of this course at learnpytorch.io. And if you finish this video and find you'd still like to learn more PyTorch (you can't really cover all of PyTorch in a day; the video title is just a play on words about its length), there are five more chapters available at learnpytorch.io, covering everything from transfer learning to model deployment to experiment tracking, and all the videos to go with those are available at zerotomastery.io. But that's enough from me. Happy machine learning, and I'll see you inside.

Hello, my name is Daniel, and welcome to the Deep Learning with PyTorch course. That intro animation was too good not to watch twice; you're going to see it quite a bit, because it's fun, and PyTorch's symbol is a flame (because of "torch"). Let's get into it. If you've come to this course you might have already researched what deep learning is, but we're going to cover it quite briefly, and only as much as you need for this course, because rather than just definitions we're going to be focused on getting practical and seeing things happen.

Let's define machine learning first, because as we'll see in a second, deep learning is a subset of machine learning. Machine learning is turning data, which can be almost anything (images, text, tables of numbers, video, audio files), into numbers, because computers love numbers, and then finding patterns in those numbers. How do we find those patterns? The computer does that part, specifically a machine learning algorithm or a deep learning algorithm, the things we're going to be building in this course, using code and math. This course is code-focused; I want to stress that before you get into it. Behind the scenes, that code is going to trigger some math that finds patterns in those numbers. If you'd like to deep dive into the math behind the code, I'll be linking extra resources for that, but we're going to be getting hands-on and writing lots of code.

If we break things down a little more, machine learning versus deep learning: imagine a giant bubble of artificial intelligence. You might have seen a diagram like this on the internet; I've just copied it and put it into pretty colours for this course. You've got the overarching bubble of artificial intelligence, which you could define almost however you want. Within artificial intelligence there's typically a subset known as machine learning, which is quite a broad topic, and within machine learning there's another topic called deep learning. And that's what we're going to focus on: writing deep learning code with PyTorch.
But again, you could use PyTorch for a lot of different machine learning things, and truth be told I use these two terms somewhat interchangeably. Yes, machine learning is the broader topic and deep learning is a bit more nuanced, but if you want to form your own definitions of these I'd highly encourage you to do so. This course is focused less on defining what things are and more on seeing how they work.

Just to break things down: if you're familiar with the fundamentals of machine learning you'll probably recognise this paradigm, but let's rehash it anyway. Consider traditional programming. Say you'd like to write a computer program that can reproduce your grandmother's famous roast chicken dish. You might have some inputs (some beautiful vegetables, a chicken you've raised on the farm) and you might write down some rules, which could be your program: cut the vegetables, season the chicken, preheat the oven, cook the chicken for 30 minutes, add the vegetables. It might not be that simple, or it might be, because your Sicilian grandmother is a great cook who has turned it into an art and can do it step by step. Those inputs combined with those rules make the beautiful roast chicken dish. That's traditional programming.

A machine learning algorithm, on the other hand, typically takes some inputs and some desired outputs and then figures out the rules, the patterns, between the inputs and the outputs. Where in traditional programming we had to hand-write all of the rules, the ideal machine learning algorithm will figure out this bridge between our inputs and our idealised outputs. In machine learning this is typically described as supervised learning, because you have some kind of inputs (also known as features) paired with some kind of outputs (also known as labels), and the machine learning algorithm's job is to figure out the relationships between the features and the labels. So if we wanted a machine learning algorithm to figure out our Sicilian grandmother's famous roast chicken dish, we'd gather a bunch of inputs (ingredients such as those delicious vegetables and the chicken) and a whole bunch of outputs of the finished product, and see if our algorithm can figure out what to do to go from inputs to outputs.

That's almost enough on the difference between traditional programming and machine learning as far as definitions go; we're going to get hands-on coding these sorts of algorithms throughout the course. For now, let's go to the next video and ask a question: why use machine learning or deep learning at all? Before we get there, I'd like you to think about it. Going back to the paradigm we just saw, if you had to write all those rules by hand, could that get cumbersome? Have a think, and we'll cover it in the next video.

Welcome back. In the last video we briefly covered the difference between traditional programming and machine learning, and again, I don't want to spend too much time on definitions; I'd rather you see this in practice. I left you with the question: why would you want to use machine learning or deep learning?
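Before we answer that, here's the paradigm from the last couple of minutes as a tiny, illustrative code sketch. This isn't course code: the spam example and the scikit-learn model are stand-ins I've chosen purely to make the contrast concrete (the course itself uses PyTorch).

```python
# Traditional programming: inputs + hand-written rules -> outputs.
def is_spam_by_rules(email: str) -> bool:
    # We write every rule ourselves.
    return "you win" in email.lower() or "congratulations" in email.lower()

# Machine learning (supervised learning): inputs (features) + desired outputs (labels) -> learned rules.
# scikit-learn is used here purely as a stand-in "learning algorithm" for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

emails = ["hey congratulations you win a lot of money",
          "this deep learning course is incredible, thank you"]
labels = [1, 0]  # 1 = spam, 0 = not spam

X = CountVectorizer().fit_transform(emails)   # turn the text into numbers
model = LogisticRegression().fit(X, labels)   # the algorithm figures out the "rules"
```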
Well, let's first think of a reason why not: if we had to write all those handwritten rules to reproduce our Sicilian grandmother's roast chicken dish every single time, that would be quite cumbersome, right? But let's draw a line through that "why not". What's a better reason? It's what we just said: for a complex problem, can you think of all the rules? Imagine we're trying to build a self-driving car. If you've learned to drive, you probably did so in maybe 20 to 100 hours, but now I'll give you the task of writing down every single rule about driving. How do you back out of your driveway? How do you turn left and go down the street? How do you reverse park? How do you stop at an intersection? How do you know how fast to go? We've just listed half a dozen rules, but you could keep going; you might get into the thousands. So for a complex problem like driving, can you think of all the rules? Probably not, and that's where machine learning and deep learning come in to help.

There's a beautiful comment I'd like to share from one of my YouTube videos, my 2020 machine learning roadmap, from a commenter whose name I'd probably mispronounce if I tried. They wrote: "I think you can use ML" (ML is machine learning; I'm going to use that abbreviation a lot throughout the course) "for literally anything, as long as you can convert it into numbers and program it to find patterns." That's what we said before: machine learning is turning something into computer-readable numbers and then finding patterns, except with a machine learning algorithm we typically write the algorithm and it finds the patterns, not us. "Literally, it could be anything, any input or output from the universe." That's pretty darn cool about machine learning.

But should you always use it just because it could be used for anything? I'd like to introduce you to Google's number one rule of machine learning: if you can build a simple rule-based system, like the five-ish steps we used to map the ingredients to our Sicilian grandmother's roast chicken dish, and it's going to work every time, you should probably do that. In other words, if you can build a simple rule-based system that doesn't require machine learning, do that. Maybe it's not quite so simple, but maybe you can just write some rules to solve the problem you're working on. This comes from a wise software engineer; it's rule one of Google's machine learning handbook, which I highly recommend you read through. We're not going to go through it in this video, so check it out; you can Google it, and the link will be in the course resources. Keep that in mind: although machine learning is very powerful and very fun and very exciting, it doesn't mean you should always use it. I know that's quite a thing to say at the start of a deep learning and machine learning course, but simple rule-based systems are still good; machine learning isn't a cure-all for everything.

Now let's have a look at what deep learning is good for, but I'm going to leave you on a cliffhanger, because we're going to check that out in the next video. See you soon.

In the last video we familiarised ourselves with Google's number one rule of machine learning, which is basically: if you don't need it, don't use it. With that in mind, what should we actually be looking to use machine learning or deep learning for?
Well, firstly, problems with long lists of rules, when the traditional approach fails. Remember, the traditional approach is: you have some data input, you write a list of rules for that data to be manipulated in some way, shape or form, and then you get the outputs. But if you have a long, long list of rules, like the rules of driving a car, which could be hundreds, thousands, millions, who knows, that's where machine learning and deep learning may help. And it kind of is what's happening right now: in the world of self-driving cars, machine learning and deep learning are the state-of-the-art approach.

Secondly, continually changing environments. One of the benefits of deep learning is that it can keep learning if it needs to, so it can adapt to new scenarios: if you update the data your model was trained on, it can adjust to new, different kinds of data in the future. It's similar to driving a car: you might know your own neighbourhood very well, but when you go somewhere you haven't been before, sure, you can draw on the foundations of what you know, but you're going to have to adapt. How fast should you go? Where should you stop? Where should you park?

And thirdly, discovering insights within large collections of data; this is where deep learning is flourishing in the world of technology. Let's give an example. One of my favourites is the Food101 dataset, which you can search for online: images of 101 different kinds of food. We briefly looked at what a rule list might look like for cooking your grandmother's famous Sicilian roast chicken dish, but can you imagine how long your list of rules would be if you wanted to build an app that could recognise photos of 101 different foods? It would be so long; you'd need rule sets for every single one. Take just one food: how do you write a program to tell what a banana looks like? You'd have to code what a banana looks like, and not only a banana, but everything that isn't a banana. So keep this in mind: deep learning is good for problems with long lists of rules, continually changing environments, and discovering insights within large collections of data.

Now, what is deep learning typically not good for? I write "typically" because this is problem-specific; deep learning is quite powerful these days and things might change in the future, so keep an open mind. If this course is about anything, it's not for me to tell you exactly what's what; it's to spark a curiosity in you to figure out what's what, or better yet, what's not what. First: when you need explainability. As we'll see, the patterns learned by a deep learning model, which are lots of numbers called weights and biases (we'll look at those later), are typically uninterpretable by a human. Deep learning models can have a million, 10 million, 100 million, a billion parameters, and some models are getting into the trillions; by parameters I mean numbers, patterns in data. Remember, machine learning is turning things into numbers and then writing a machine learning model to find patterns in those numbers, and sometimes those patterns are themselves lists of numbers in the millions. Can you imagine looking at a list of a million numbers and making sense of what's going on? It's going to be quite hard. I find it hard to understand three or four numbers, let alone a million.
Second: when the traditional approach is a better option. Again, this is Google's rule number one of machine learning: if you can do what you need with a simple rule-based system, maybe you don't need machine learning or deep learning. (And again, I'm going to use the deep learning and machine learning terms interchangeably. I'm not too concerned with definitions, and you can form your own, but from my perspective ML and deep learning are quite similar.) Third: when errors are unacceptable. The outputs of a deep learning model aren't always predictable; we'll see that deep learning models are probabilistic, meaning that when they predict something they're making a probabilistic bet on it, whereas with a rule-based system you pretty much know what the outputs are going to be every single time. So if you can't tolerate probabilistic errors, you probably shouldn't use deep learning and should go back to a simple rule-based system. And finally: when you don't have much data. Deep learning models usually require a fairly large amount of data to produce great results, though there's a caveat here, because as I said at the start, we're going to see some techniques for getting great results without huge amounts of data.

I wrote "typically" on all of these because there are counter-examples and techniques: you can research deep learning explainability and find a whole bunch of material, you can look up examples of when to use machine learning versus deep learning, and for the errors point there are ways to make your model reproducible, so you know what's going to come out, and we do a lot of testing to verify this. So what's next? Machine learning versus deep learning. We're going to look at some different problem spaces in a second, mainly broken down in terms of what kind of data you have. I won't do that now, to prevent this video from getting too long; we'll cover those colourful pictures in the next video.

Welcome back. In the last video we covered what deep learning is good for and what it's typically not good for, so let's dive into a bit more of a comparison of machine learning versus deep learning. Again, I'll use these terms quite interchangeably, but there are some situations where you typically want a traditional style of machine learning technique rather than deep learning. This is constantly changing, so I'm not talking in absolutes; I'm talking in general, and I'll leave it to you to use your own curiosity to research the specific differences between the two. Typically, you want traditional machine learning algorithms (which are still machine learning algorithms, which is where the naming gets a little confusing) for structured data: tables of numbers, rows and columns. Possibly one of the best families of algorithms for this type of data is a gradient boosted machine such as XGBoost. It's an algorithm you'll see in a lot of data science competitions and also used in production settings; by production I mean applications that you may interact with on the internet or use day-to-day. XGBoost is typically the favourite algorithm for these kinds of situations.
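If you're curious what that looks like, here's a minimal, hypothetical sketch of fitting a gradient boosted machine to structured, rows-and-columns data. It assumes you have the xgboost package installed, and the data is made up purely for illustration.

```python
import numpy as np
from xgboost import XGBClassifier  # a popular gradient boosted machine implementation

# Made-up structured data: 100 rows, 5 feature columns, and a binary label per row.
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)

model = XGBClassifier(n_estimators=50, max_depth=3)
model.fit(X, y)                 # learn patterns in the rows and columns
preds = model.predict(X[:5])    # predict labels for the first five rows
```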
So if you have structured data, you might look into XGBoost rather than building a deep learning algorithm. But again, the rules aren't set in stone; that's where deep learning and machine learning are part art, part science. Sometimes XGBoost is best for structured data, but there are exceptions to the rule.

Deep learning, on the other hand, is typically better for unstructured data: data that's kind of all over the place, not in nice standardised rows and columns. Say you had natural language, such as this tweet by a person whose name is suspiciously similar to mine (maybe I wrote it): "How do I learn machine learning? What you need to hear: learn Python, learn math, stats, probability, software engineering, build, build, build. Google it, go down the rabbit hole, resurface in six to nine months and reassess." I like that. Or you might have a whole bunch of text, such as the definition of deep learning on Wikipedia. By the way, that's part of why I'm not covering that many definitions in this course: you can look these things up, and Wikipedia will define deep learning far better than I can. I'm more focused on getting involved and working hands-on with this stuff than on defining what it is. Then we have images: if we wanted to build a take-a-photo-of-a-burger kind of app, we'd work with image data, which doesn't really have much structure, although we'll see there are ways, through the beauty of a tensor, to give this kind of data some structure. And we might have audio files, such as when you talk to your voice assistant; I'm not going to say its name, because a whole bunch of my devices might go crazy.

So, typically: for unstructured data you'll want a neural network of some kind, and for structured data a gradient boosted machine, a random forest, or another tree-based algorithm such as XGBoost. Let's look at some common algorithms for structured data (machine learning) versus unstructured data (deep learning). For structured data: random forests (one of my favourites), gradient boosted models, naive Bayes, nearest neighbour, support vector machines (SVMs), and many more. Since the advent of deep learning these are often referred to as shallow algorithms. Why is deep learning called deep learning? Because, as we'll see, it can have many different layers: you might have an input layer, a hundred layers in the middle, and an output layer. We'll get hands-on with this later. Common algorithms for deep learning and neural networks: fully connected neural networks, convolutional neural networks, recurrent neural networks, transformers (which have taken over during the past couple of years), and of course many more. The beautiful thing about deep learning and neural networks is that there are almost as many ways to construct them as there are problems they can be applied to; that's why there are so many dot points on the page, and I understand that if you haven't had much experience with machine learning or deep learning this can feel like information overload. The good news is that what we're going to focus on building with PyTorch is fully connected neural networks and convolutional neural networks, the foundations of deep learning.
The exciting thing is that if we learn these foundational building blocks, we can get into the other styles of architecture as well. And, part art and part science again, depending on how you represent your problem and what your problem is, many of the algorithms in both columns can be used for both kinds of data. I know I've just told you "use these for deep learning, use those for traditional machine learning" and am now saying you can mix them, which adds a little confusion, but that's the fun part too: use your curiosity to figure out what's best for whatever you're working on.

With all this talk about neural networks, how about in the next video we cover what neural networks actually are? I'd like you to Google this before you watch the next video, because there are going to be hundreds of definitions out there, and I'd like you to start forming your own definition of what a neural network is. I'll see you in the next video.

Welcome back. In the last video I left you with the cliffhanger question "what are neural networks?" and gave you the challenge of Googling it. You might have already done that, but let's do it together. If I type in "what are neural networks", there are hundreds of definitions online: "explain neural networks", "neural network definition", "neural networks in five minutes". There's 3Blue1Brown (I'd highly recommend that channel's series on neural networks; it's going to be in the extracurricular) and StatQuest is also amazing. Read ten of them, five of them, three of them, and make your own definition. For the sake of this course, though, here's how I define neural networks.

We have some data, whatever it is: we might have images of food, we might have tweets or natural language, and we might have speech. These are examples of inputs of unstructured data, because they're not rows and columns. So we have this input data; how do we use it with a neural network? Before data can be used in a neural network, it needs to be turned into numbers. As humans, we like looking at images of ramen and spaghetti (we know which is which after we've seen them once or twice), we like reading good tweets, and we like listening to amazing music or hearing a friend talk on the phone. But before a computer understands what's going on in these inputs, they need to be turned into numbers. This is what I call a numerical encoding, or a representation, and the square brackets in the graphic indicate that it's part of a matrix or a tensor, which we're going to get very hands-on with throughout this course. So: we take our inputs, we turn them into numbers, and then we pass them through a neural network. The graphic here is one picture of a neural network; as we'll see, the graphics can get quite involved, but they all represent the same fundamentals. There's an input layer, then multiple hidden layers (which you can design however you want), and then an output layer. Our inputs, some kind of data, go in, the hidden layers perform mathematical operations on the inputs (the numbers), and then we'll have an output.
Oh, and there's 3Blue1Brown's neural networks series, neural networks from the ground up. Great videos, highly recommend you check those out. But coming back to our graphic: we've got our inputs, we've turned them into numbers, and we've passed them to our neural network, typically an input layer, then hidden layers (as many as you want; each of those little dots is called a node), and then we have some kind of output. Now, which neural network should you use? You can choose the appropriate neural network for your problem, which could involve hand-coding each one of these steps, or you could find an architecture that has worked on problems similar to your own: for images you might use a CNN (a convolutional neural network), for natural language you might use a transformer, and for speech you might also use a transformer. Fundamentally, they all follow the same principle: inputs, manipulation, outputs.

The neural network will learn a representation on its own; we don't define what it learns, and it manipulates these patterns in some way, shape or form. When I say it learns a representation, I'll also refer to that as learning patterns in the data, and a lot of people call them features. A feature might be the fact that the word "do" usually comes after "how" across a whole bunch of different sentences; a feature can be almost anything, and again, we don't define it. The neural network learns these representations, patterns, features (also called weights) on its own. Where do we go from there? We've got our numerical encoding (our data turned into numbers), our neural network has learned a representation that it thinks best captures the patterns in our data, and then it outputs those representations, which we can use. You'll often hear these referred to as features, a weight matrix, a weight tensor, or a learned representation; there are a lot of different terms for the same thing. We then convert these outputs into human-understandable outputs. The representations a neural network learns can be millions of numbers; the graphic only shows nine, and I can barely make sense of those nine, so we need a way to convert them into human-understandable terms.

For example, we might have input data that is images of food, and we want our neural network to learn the representations that distinguish an image of ramen from an image of spaghetti; eventually we take the patterns it has learned and convert them into whether it thinks a given image is ramen or spaghetti. Or, in the case of the tweet example, whether a tweet is about a natural disaster or not: we write code to turn the tweet into numbers, pass it through our neural network, the network learns some patterns, and ideally it ends up representing this particular tweet as "not a disaster", with us writing code for each of those steps. The same goes for the speech inputs: turning audio into numbers, passing it through a network, and getting out something like the command you might say to your smart speaker (which I'm still not going to say out loud, because a bunch of my devices might go off).
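To make "numerical encoding" a bit more concrete, here's a deliberately toy sketch, my own simplification rather than how production tokenizers or image pipelines actually work, of turning a sentence into numbers and wrapping them in a tensor:

```python
import torch

# A toy numerical encoding: map each unique word in a sentence to an integer.
sentence = "how do i learn machine learning"
vocab = {word: idx for idx, word in enumerate(sorted(set(sentence.split())))}
encoded = [vocab[word] for word in sentence.split()]

text_tensor = torch.tensor(encoded)   # numbers a neural network could actually consume
print(vocab)        # e.g. {'do': 0, 'how': 1, 'i': 2, ...}
print(text_tensor)  # the sentence as a tensor of integers
```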
So let's cover the anatomy of neural networks. We've hinted at this a little bit already, but this is neural network anatomy 101. Again, this is highly customisable, and we're going to see it in PyTorch code later on. The data goes into the input layer; in this case the number of units (also called neurons or nodes) is two. Then come the hidden layer(s). I put an "s" there because you can have one hidden layer or many; the "deep" in deep learning comes from having lots of layers. This diagram only shows a few layers, but neural networks can be very deep: ResNet, a popular computer vision architecture, comes in versions like ResNet-34 with 34 layers and ResNet-152 with 152 different layers. (Lots of terms being thrown around here; with time you'll become familiar with them.) So hidden layers can be almost as many as you want; we've only pictured one here, with three hidden units (neurons) in it. Then we have an output layer, whose outputs are a learned representation or prediction probabilities depending on how we set it up (we'll see what these are later on), and in this case it has one unit. So: two in the input, three in the hidden layer, one in the output. You can customise the number of units, how many layers there are, what goes in, and what comes out. When we talk about the overall architecture, we mean all of the layers combined; when you hear "neural network architecture", it's describing the input layer, the hidden layers (which may be more than one) and the output layer together.

I also say "patterns" is an arbitrary term: you'll hear "embedding" (embeddings might come from hidden layers), "weights", "feature representation", "feature vectors", all referring to similar things. So again: turn your data into some numerical form, build a neural network to figure out the patterns, and output some desired result. To get a bit more technical, each layer is usually a combination of linear (straight-line) and non-linear (non-straight-line) functions. What I mean is: if I asked you to draw whatever you want using unlimited straight lines and curved lines, what kinds of patterns could you draw? At a fundamental level, that's basically what a neural network is doing: using a combination of linear and non-linear functions to draw patterns in our data. We'll see what this looks like later on.
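As a preview, here's a minimal PyTorch sketch of the exact anatomy just described: two input units, three hidden units, one output unit. The choice of ReLU as the non-linear function is my assumption for the sake of the example; the course introduces activation functions properly later on.

```python
import torch
from torch import nn

# 2 input units -> 3 hidden units -> 1 output unit
model = nn.Sequential(
    nn.Linear(in_features=2, out_features=3),  # input layer -> hidden layer (a linear function)
    nn.ReLU(),                                 # a non-linear function (assumed here)
    nn.Linear(in_features=3, out_features=1),  # hidden layer -> output layer
)

x = torch.rand(1, 2)   # one sample with two features (numbers)
output = model(x)      # forward pass -> a tensor of shape [1, 1]
print(output)
```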
Now, for the next video, let's dive briefly into different kinds of learning. We've looked at what a neural network is, the overall algorithm, but there are also different paradigms for how a neural network learns. I'll see you in the next video.

Welcome back. We've discussed a brief overview of the anatomy of a neural network; let's now discuss some learning paradigms. The first is supervised learning; then we have unsupervised and self-supervised learning, and transfer learning. Supervised learning is when you have data and labels, as in the example from the start where you'd build a neural network or machine learning algorithm to figure out the rules for your Sicilian grandmother's famous roast chicken dish: you'd have a lot of data (inputs such as raw ingredients, the vegetables and chicken) and a lot of examples of what those inputs should ideally turn into. Or, in the case of discerning photos of cats from photos of dogs, you might have a thousand photos of cats and a thousand photos of dogs where you know which is which, and you pass those photos and labels to a machine learning algorithm. So that's supervised learning: data and labels.

Unsupervised and self-supervised learning is when you just have the data itself, without labels; in the cat-and-dog case, only the photos. With self-supervised learning you could get an algorithm to learn an inherent representation of the data (and when I say representation I mean patterns, numbers, weights, features, a whole bunch of names for the same thing), so it could figure out the fundamental patterns that distinguish dog images from cat images, but it wouldn't necessarily know which is which. That's where you could come in later, look at the patterns it has learned, and say "the ones that look like this are dog, and the ones that look like that are cat". So self-supervised and unsupervised learning learn solely from the data itself.

Finally, transfer learning is a very, very important paradigm in deep learning: it's taking the patterns one model has learned on one dataset and transferring them to another model. If we were building a supervised learning algorithm to discern cat and dog photos, we might start with a model that has already learned patterns in images and transfer those foundational patterns to our own model, so our model gets a head start. Transfer learning is a very powerful technique. For this course, we're going to be writing code focused on supervised learning and transfer learning, two of the most common paradigms in machine learning and deep learning, though the style of code we write can be adapted across the other paradigms.

I'll also mention one paradigm that's kind of in its own bucket: reinforcement learning. I'll leave this as an extension for you to look up, but essentially (and shout-out to KDnuggets for a good illustration of this) the idea is that you have some kind of environment and an agent that takes actions in that environment, and you give rewards and observations back to that agent. Say you wanted to teach your dog to urinate outside: you'd reward its actions of urinating outside, and possibly not reward its actions of urinating all over your couch. That picture gives a nice comparison of unsupervised learning, supervised learning and reinforcement learning, but I'll let you research the different learning paradigms a bit more in your own time. As I said, we're going to be focused on writing code, specifically PyTorch code, for supervised learning and transfer learning.

With that covered, let's get a few examples of what deep learning is actually used for.
Before we get into the next video, I'm going to issue you a challenge: search that question yourself and come up with some of your own ideas for what deep learning is currently used for. Give that a shot, and I'll see you in the next video.

How'd you go? Did you do some research? Did you find out what deep learning is actually used for? I bet you found a treasure trove of things, and hey, if you're in this course, chances are you already know some use cases for deep learning and you're thinking "Daniel, hurry up and get to the code." We're going to get there, don't you worry, but let's look at some things deep learning can be used for. First, I want to remind you of that comment from the 2020 machine learning roadmap video: "I think you can use ML" (and remember, ML is machine learning, and deep learning is a part of ML) "for literally anything, as long as you can convert it into numbers and program it to find patterns. Literally, it could be anything, any input or output from the universe." That's the beautiful thing about machine learning: if you can encode something as numbers, chances are you can build a machine learning algorithm to find patterns in those numbers. Will it work well? Again, that's why machine learning and deep learning are part art, part science: a scientist would love to know their experiments will work, while an artist is kind of excited by the fact that it might work and it might not. Keep that in mind, along with rule number one of machine learning: if you don't need it, don't use it. But if you do use it, it can be used for almost anything.

Let's get a bit specific with some deep learning use cases. I've put a few on the slide because there are lots; these are just ones I interact with in my day-to-day life. Recommendation: on my YouTube homepage we've got a programming video, a programming podcast, some jiu-jitsu videos, some RuneScape videos, a soundtrack from my favourite movie. Have you noticed that you don't really search on YouTube much any more? Sometimes you might, but the recommendation page is pretty damn good, and that's all powered by deep learning. Translation: have you noticed that in the last ten years translation has got pretty good too? Also powered by deep learning. I don't have much hands-on experience here (I did use it in Japan; I speak a very small amount of Japanese and an even smaller amount of Mandarin), but if I wanted to translate "deep learning is epic" to Spanish, it might come out as "el aprendizaje profundo es épico". All the native Spanish speakers watching can laugh at me, because that was a very Australian way of saying it, but it's so cool that Google Translate is now powered by deep learning, and if I can't say it myself I can click the speaker icon and it will say it for me. That's speech recognition, also powered by deep learning: if you ask your voice assistant "who's the biggest big dog of them all?", of course it's going to say "you", which is what I've set mine up to say.

And computer vision. See this photo? Where is it from? The person driving this car did a hit-and-run on my car out the front of my apartment building. My car was parked on the street; this car's trailer came off and ran into the back of my car.
It basically destroyed it, and then they drove off. However, my next-door neighbour's security camera picked up the car. I became a detective for a week, and I thought: if there had been a computer vision algorithm built into that camera, it could have detected the moment the car was hit. It took a lot of searching to find it; it turns out the car was hit at about 3:30 a.m., so it's pitch black, and of course we didn't get the licence plate. So this person is out there somewhere in the world after doing a hit-and-run; if you're watching this video, just remember computer vision might catch you one day. This is called object detection: you place a box around the area where the pixels most represent the object you're looking for. For computer vision, we could train an object detector to capture cars that drive past a certain camera, and then if someone does a hit-and-run on you, you could capture it and, fingers crossed it's not too dark, read the licence plate and say "excuse me, this person has hit my car and wrecked it." So that's a very close-to-home story of where computer vision could be used.

Finally, natural language processing. Have you noticed the spam detector on your email inbox is pretty darn good as well? Some are powered by deep learning, some aren't (it's hard to tell these days what is and what isn't), but natural language processing is the process of looking at natural language text (unstructured text: whatever you'd write in an email, a story, a Wikipedia article) and getting your algorithm to find patterns in it. In this example, one email is not spam: "this deep learning course is incredible, I can't wait to use what I've learned, thank you so much" (and by the way, that is my real email address, so if you want to email me you can). The other is spam: "hey daniel, congratulations, you win a lot of money". Wow, I'd really like a lot of money, but for some reason I don't think that one is real, so it would probably go to my spam inbox.

Now, to classify these problems a little: translation and speech recognition are known as sequence-to-sequence (seq2seq) problems, because you put one sequence in and get another sequence out, such as a sequence of audio waves in and text out. Object detection is classification plus regression: the regression part predicts numbers, the coordinates of where the box corners should be (so many pixels in along the x-axis, so many pixels down along the y-axis for each corner, then you draw between the corners), and the classification part says "hey, this is the car that did a hit-and-run on us." And spam detection is classification: predicting whether something is one thing or another (or one of several things, in the case of multi-class classification). "This email is not spam" is one class, "this email is spam" is another.

So I think there's only one direction to go now that we've laid the foundations for the course: let's start talking about PyTorch. I'll see you in the next video.

Let's now cover some of the foundations of PyTorch. But first, you might be asking, what is PyTorch? Of course, we could just go to our friend the internet and look up pytorch.org, the home page for PyTorch.
This course is not a replacement for everything on that homepage; it should be your ground truth for everything PyTorch. There's a get-started section, a big ecosystem, instructions for setting up on your local computer, resources, the docs, the PyTorch GitHub, search, the blog, everything. That website should be the place you visit most throughout this course: as we write PyTorch code, you come here, read about it, check things out, look at examples. But for the sake of this course, let's break PyTorch down. (Oh, there's the little flame animation; I didn't quite sync it up, but that's all right.)

So, what is PyTorch? PyTorch is the most popular research deep learning framework (I'll back that claim up in a second). It allows you to write fast deep learning code in Python, a very user-friendly programming language, so if you know Python, PyTorch lets you write state-of-the-art deep learning code accelerated by GPUs. It gives you access to many pre-built deep learning models through Torch Hub and torchvision.models; remember I said transfer learning is a way we can use other deep learning models to power our own, and Torch Hub is a resource for that, as is torchvision.models. We'll be looking at these throughout the course. It also provides an ecosystem for the whole stack of machine learning: from preprocessing data (getting your data into tensors; if you started with some images, how do you represent them as numbers?), to building models such as neural networks to model that data, to deploying your model in your application or in the cloud. Exactly how you deploy will depend on what sort of application or cloud you're using, but generally it will be running some kind of PyTorch model. PyTorch was originally designed and used in-house by Facebook/Meta (I'm pretty sure Facebook have renamed themselves Meta now), but it's now open source and used by companies such as Tesla, Microsoft and OpenAI.

And when I say it's the most popular deep learning research framework, don't take my word for it: have a look at the trends page on paperswithcode.com. If you're not sure what Papers with Code is, it's a website that tracks the latest and greatest machine learning papers and whether or not they have code. On the trends page you can see the different frameworks: PyTorch, TensorFlow, JAX, MXNet, PaddlePaddle, the original Torch (PyTorch is an evolution of Torch, written in Python), Caffe2, MindSpore. As of December 2021, PyTorch sits at around 58%, by far and away the most popular research framework used to write the code for state-of-the-art machine learning algorithms. The site also has a browse-state-of-the-art section: semantic segmentation, image classification, object detection, image generation, computer vision, natural language processing, medical. I'll let you explore it; it's one of my favourite resources for staying up to date with the field. But as you can see, out of the roughly 65,000 papers with code that the website tracks, 58% of them are implemented with PyTorch. How cool is that? And this is what we're learning.

So, why PyTorch? Well, other than the reasons we just spoke about, it's a research favourite.
I've highlighted it here: PyTorch at 58%, with nearly two and a half thousand repos. If you're not sure what a repo is, it's a place where you store your code online, and generally, if a fantastic piece of machine learning research gets published, it will come with code you can access and use for your own applications or research.

Again, why PyTorch? Here's a tweet from François Chollet, the author of Keras (another popular deep learning framework): with tools like Colab, Keras and TensorFlow, virtually anyone can solve in a day, with no initial investment, problems that would have required an engineering team working for a quarter and $20,000 in hardware in 2014. (We'll see what Colab is shortly, and I've added PyTorch to that list.) This just highlights how good the tooling in deep learning and machine learning has become: Colab, Keras and TensorFlow are all fantastic, and PyTorch belongs on that list too. François Chollet is a very prominent voice in the machine learning field if you want to check him out on Twitter.

Want more reasons? Look at all the places using PyTorch; it just comes up everywhere. There's Andrej Karpathy, the director of AI at Tesla: if you search "PyTorch at Tesla" you'll find a YouTube talk by him, and Tesla use PyTorch for the computer vision models of Autopilot, a car detecting what's going on in the scene (there'll be other code for planning, but I'll let you research that). OpenAI, one of the biggest AI research entities in the world (open in the sense that they publish a lot of their research methodologies, although recently there's been some debate about that), have standardised on PyTorch: a blog post from January 2020 announced it, they have a great blog and great research, and now the OpenAI API lets you access some of the models they've trained, presumably with PyTorch. There's a repo called "the incredible PyTorch" which collects a whole bunch of projects built on top of PyTorch; that's the beauty of PyTorch, you can build on top of it and with it. AI for agriculture: search "PyTorch in agriculture" and you'll find a Medium article about agricultural robots using PyTorch, with object detection used to work out which weeds should be sprayed, running on a big tractor. It can be used almost anywhere. "PyTorch builds the future of AI and machine learning at Facebook": Facebook, which is also Meta AI (a little confusing, because even though it says Meta AI it lives at ai.facebook.com; that may have changed by the time you watch this), uses PyTorch in-house for all of their machine learning applications. And Microsoft is huge in the PyTorch game. It's absolutely everywhere. So if that's not enough reason to use PyTorch, well, then maybe you're in the wrong course.
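Since I mentioned pre-built models from Torch Hub and torchvision.models a moment ago, here's a minimal, hedged sketch of what loading one looks like. Note that the exact argument for pre-trained weights has changed between torchvision versions (older releases use pretrained=True, newer ones use a weights argument), so check the docs for the version you have.

```python
import torch
import torchvision

# Download a pre-trained ResNet-34 (the 34-layer ResNet mentioned earlier).
model = torchvision.models.resnet34(pretrained=True)  # newer torchvision: weights="IMAGENET1K_V1"
model.eval()

# A random "image" tensor: [batch_size, colour_channels, height, width]
fake_image = torch.rand(1, 3, 224, 224)
with torch.inference_mode():
    preds = model(fake_image)   # raw outputs (logits) over ImageNet's 1,000 classes
print(preds.shape)              # torch.Size([1, 1000])
```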
So you've seen enough reasons to use PyTorch, but I'm going to give you one more: it helps you run your machine learning code accelerated on a GPU. We've touched on this briefly, but what is a GPU (or a TPU, a newer chip these days)? A GPU is a graphics processing unit, essentially a chip that's very fast at crunching numbers, originally designed for video games. If you've ever played or designed a video game, you know that the graphics, especially these days, are quite intense, and rendering them requires a lot of numerical calculations. The beautiful thing about PyTorch is that it lets you leverage a GPU through an interface called CUDA (there are a lot of acronyms in the deep learning space, so let's just search it): CUDA is a parallel computing platform and application programming interface (an API) that allows software to use certain types of graphics processing units for general-purpose computing. That's what we want: PyTorch leverages CUDA to let you run your machine learning code on NVIDIA GPUs. There's also the ability to run PyTorch code on TPUs (tensor processing units), and there are "getting started with PyTorch on cloud TPUs" guides out there, but GPUs are far more popular for running PyTorch code, so that's what we're going to focus on.
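As a small taste of what that looks like in practice (we'll set this up properly later in the course), here's the usual device-agnostic pattern: check whether a CUDA GPU is available and fall back to the CPU if it isn't.

```python
import torch

# Use a CUDA-enabled GPU if one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

x = torch.rand(3, 3)
x = x.to(device)   # move the tensor onto the chosen device
print(x.device)
```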
Now, with that said, we've mentioned tensor processing units, and the reason they're called that is because machine learning and deep learning deal a lot with tensors. So in the next video, let's answer the question: what is a tensor? Before I answer it from my perspective, I'd like you to research it: open up Google or your favourite search engine, type in "what is a tensor", and see what you find. I'll see you in the next video.

Welcome back. In the last video I left you on the cliffhanger question of what a tensor is, and I issued you the challenge of researching it, because, as I said, this course isn't about telling you exactly what things are; it's about sparking a curiosity so you can stumble upon the answers yourself. But let's have a look. If you remember this graphic, there's a lot going on, but this is our neural network: we have some kind of input and some kind of numerical encoding. In our case the input is unstructured data: some images, some text, an audio file. These don't necessarily all go in at the same time; the images could go to a neural network built specifically for images, the text to one built for text, the speech to one built for speech, although the field is also moving towards networks capable of handling all three types of input. For now we'll start small and build up, with algorithms focused on one type of data, but the premise is the same: you have some kind of input, you numerically encode it in some form, you pass it to a neural network to learn representations (patterns) within that numerical encoding, it outputs some form of representation, and then we convert that representation into things humans can understand.

You might have guessed, and I may have already let it slip, that these numerical encodings are tensors. So when the question comes up, what is a tensor? A tensor could be almost any representation of numbers. We're going to get very hands-on with tensors, and the fundamental building block of PyTorch, aside from the neural network components, is torch.Tensor; we'll see that very shortly. The important takeaway is this: you have some input data, you numerically encode it and turn it into a tensor of some kind (whatever kind will depend on the problem you're working on), and then you pass it to a neural network, which performs mathematical operations on that tensor. A lot of those mathematical operations are taken care of by PyTorch behind the scenes; we'll be writing code that triggers them. The neural network we create (or a pre-built one we use for our problem) will output another tensor, similar to the input but manipulated in a way we've programmed, and then we take that output tensor and change it into something a human can understand.

To strip away some of the text and make it clearer: if we were building an image classification model to classify whether a photo is of ramen or spaghetti, we'd have images as input, we'd turn those images into numbers represented by tensors, and we'd pass those tensors of numbers to a neural network. There might be lots of tensors: ten thousand images, a million images, or, if you're Google or Facebook, maybe 300 million or a billion images at a time. The principle still stands: encode your data in some numerical representation, a tensor; pass that tensor (or lots of tensors) to a neural network; the neural network performs mathematical operations on those tensors and outputs a tensor; and we convert that tensor into something we can understand as humans.

So we've covered a lot of the fundamentals: what is machine learning, what is deep learning, what is a neural network (well, we've touched the surface; you can go as deep as you like), why use PyTorch, what is PyTorch, and tensors, the fundamental building block of deep learning. Let's get a bit more specific in the next video about what we're going to cover code-wise in this first module. I'm so excited; we're going to start coding soon. I'll see you there.

Now it's time to get specific about what we're going to cover code-wise in this fundamentals module, but first I want to reiterate something, going back to the last video where I challenged you to look up "what is a tensor". Here's exactly what I would do: come to Google and type in the question. It autocompletes to "what is a tensor in PyTorch" (Google knows we want the PyTorch context), but a tensor is a very general idea, not something tied to PyTorch. There's the tensor page on Wikipedia, and there's probably my favourite video on the topic, by Dan Fleisch (I'm probably mispronouncing that, but it's a great one).
That video is going to be your extracurricular for this video and the previous one: watch Dan Fleisch's explanation of what a tensor is. Now, you might be saying: what gives? I came to this course to learn PyTorch, and all you're doing, Daniel, is Googling things when a question comes up; why don't you just tell me what it is? Well, if I were to tell you everything about deep learning and machine learning and PyTorch, what it is and what it's not, this course would be far too long. I'm doing this on purpose: searching questions like this is exactly what I do day-to-day as a machine learning engineer. I write code, like we're about to, and when I don't know something I literally go to whatever search engine I'm using (Google, most of the time) and type in the error I'm getting, or "pytorch what is a tensor", or something like that. So I want you to know that it's not only okay to search questions like that, it's encouraged. Keep that in mind throughout the whole course; you're going to see me do it a lot.

Now, let's get into what we're going to cover. There's a tweet from Elon Musk, so I've decided, you know what, let's base the whole course on this meme. Learning ML/DL from university: a small brain. From online courses (well, like this one): the brain starts to explode with little fireworks. From YouTube (oh, if you're watching this on YouTube, look at that shiny brain). From articles: lucky this course comes in article format too, because if you go to learnpytorch.io, all of the course materials are in online book format (we'll get into the fundamentals section shortly, and by the time you watch this there will be more chapters there). And finally, from memes: you ascend to some god-like creature hovering underwater. So that, apparently, is the best way to learn machine learning, and we're covering all the bases.

Here's what we're going to cover, broadly (there's a small code preview of the first item just after this overview). In this module we cover the PyTorch basics and fundamentals, mainly dealing with tensors and tensor operations; remember, a neural network is all about input tensors, performing operations on those tensors, and creating output tensors. Later, we'll focus on preprocessing data, getting it into tensors: turning data from its raw form (images, whatever) into a numerical encoding. Then we'll look at building and using pre-trained deep learning models, specifically neural networks; fitting a model to the data, which means showing our model (or rather writing code for our model) to learn patterns in the data we've preprocessed; making predictions with our model, because that's what deep learning and machine learning are all about, using patterns from the past to predict the future; evaluating our model's predictions; saving and loading our models (for example, if you wanted to export your model from where we're working into an application); and finally, using a trained model to make predictions on our own custom data, which is very fun.
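Here's that small preview of the first item: creating tensors and doing a couple of tensor operations with torch.Tensor. Nothing fancy; the fundamentals module walks through all of this step by step.

```python
import torch

# Creating tensors
scalar = torch.tensor(7)                        # 0-dimensional tensor
vector = torch.tensor([1.0, 2.0, 3.0])          # 1-dimensional tensor
matrix = torch.tensor([[1.0, 2.0],
                       [3.0, 4.0]])             # 2-dimensional tensor

# A couple of tensor operations
print(matrix.shape)                  # torch.Size([2, 2])
print(matrix + 10)                   # element-wise addition
print(torch.matmul(matrix, matrix))  # matrix multiplication
```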
do it like cooks not chemists so chemists are quite precise everything has to be exactly how it is but cooks are more like oh you know what a little bit of salt a little bit of butter does it taste good okay well then we're on but machine learning is a little bit of both it's a little bit of science a little bit of art so that's how we're going to do it but i like the idea of this being a machine learning cooking show so welcome to cooking with machine learning cooking with pytorch with daniel and finally we've got a workflow here which ta-da is a pytorch workflow one of many we're going to kind of use this throughout the entire course step one we're going to get our data ready step two we're going to build or pick a pre-trained model to suit whatever problem we're working on step 2.1 pick a loss function and optimizer don't worry about what they are we're going to cover them soon step 2.2 build a training loop now this is kind of all part and parcel of step two hence why we've got 2.1 and 2.2 you'll see what that means later on number three we're going to fit the model to the data and make a prediction so say we're working on image classification for ramen or spaghetti how do we build a neural network or put our images through that neural network to get some sort of idea of what's in an image we'll see how to do that number four we'll evaluate our model to see if it's predicting bs or it's actually going alright number five we're going to improve through experimentation that's another big thing that you'll notice throughout machine learning and throughout this course it's very experimental part art part science number six save and reload your trained model again i've put these in numerical order but they can kind of be mixed and matched depending on where you are in the journey numerical order is just easy to understand for now now we've got one more video maybe another one before we get into code but in the next video i'm going to cover some very very important points on how you should approach this course i'll see you there now you might be asking how should i approach this course you might not be asking but we're going to answer it anyway how to approach this course this is how i would recommend approaching it so i'm a machine learning engineer day to day and learning machine learning and coding machine learning are kind of two different things i remember when i first learned it you learned a lot of theory rather than writing code so not to take away from the theory being important this course is going to be focused on writing machine learning specifically pytorch code so the number one step to approaching this course is to code along now because this course is focused purely on writing code i will be linking extracurricular resources for you to learn more about what's going on behind the scenes of the code my idea of teaching is that if we can code together write some code see how it's working that's going to spark your curiosity to figure out what's going on behind the scenes so motto number one is if in doubt run the code write it run the code see what happens number two oh sure i love that explore and experiment again approach this with the mind of a scientist and a chef or science and art experiment experiment experiment try things with rigor like a scientist would and then just try things for the fun of it like a chef would number three visualize what you don't understand i can't emphasize this one enough we have three mottos so
far if in doubt run the code you're going to hear me say this a lot experiment experiment experiment and number three visualize visualize visualize why is this well because as we've spoken about machine learning and deep learning deal with a lot of data a lot of numbers and i find that if i visualize some numbers in a form that isn't just numbers all over a page i tend to understand them better and there are some great extra resources that i'm going to link that also turn what we're doing so writing code into fantastic visualizations number four ask questions including the dumb questions really there's no such thing as a dumb question everyone is just on a different part of their learning journey and in fact if you do have a quote-unquote dumb question it turns out that a lot of people probably have that one as well so be sure to ask questions i'm going to link a resource in a minute for where you can ask those questions but please please please ask questions not only to the community but to google to the internet to wherever you can or just to yourself ask questions of the code and write code to figure out the answer to those questions number five do the exercises there are some great exercises that i've created for each of the modules if we go have we got the book version of the course up here we do within all of these chapters down the bottom is going to be exercises and extra curriculum so we've got some exercises there i'm not going to jump into them but i would highly recommend don't just follow along with the course and code after i code please please please give the exercises a go because that's going to stretch your knowledge we're going to have a lot of practice writing code together doing all of this stuff here but the exercises are going to give you a chance to practice what you've learned and then of course extra curriculum well hey if you want to learn more there's plenty of opportunities to do so there and then finally number six share your work i can't emphasize enough how much writing about learning deep learning or sharing my work through github or different code resources or with the community has helped with my learning so if you learn something cool about pytorch i'd love to see it link it to me somehow in the discord chat or on github or wherever there'll be links of where you can find me i'd love to see it please do share your work it's a great way to not only learn something because when you share it when you write about it well it's like how would someone else understand it but it's also a great way to help others learn too and so we've said how to approach this course now let's go over how not to approach this course i would love for you to avoid overthinking the process and this is your brain and this is your brain on fire so avoid having your brain on fire that's not a good place to be we are working with pytorch so it's going to be quite hot just playing on words with the name torch but avoid your brain catching on fire and avoid saying i can't learn i've said this to myself lots of times and then i've practiced it and it turns out i can actually learn those things so let's just draw a red line through that oh a thicker red line yeah there we go nice and thick red line we'll get that out there it doesn't really make sense now that this is under avoid and crossed out but don't say i can't learn and prevent your brain from catching on fire finally we've got one more video that i'm going to cover before this one gets too long on the resources for the course before we get
into coding i'll see you there now there are some fundamental resources that i would like you to be aware of before we go any further in this course these are going to be paramount to what we're working with so for this course there are three things there is the github repo if we click this link i've got it pinned on my browser so you might want to do the same while you're going through the course but this is mrdbourke my github slash pytorch-deep-learning it is still a work in progress at the time of recording this video but by the time you go through it it won't look too much different there'll just be more materials you'll have the materials outline a section on what the course covers as you can see some more are coming soon at the time of recording so these will probably be done by the time you watch this exercises and extra curriculum there'll be links here basically everything you need for the course will be in the github repo and then if we come back also on the github repo the same repo so mrdbourke slash pytorch-deep-learning if you click on discussions this is going to be the q&a for the course so if you have a question you can click new discussion you can choose q&a and then type in video and then the title pytorch fundamentals and then go in here or you could type in your error as well what is ndim for a tensor and then in here you can type in some stuff hello i'm having trouble on video xyz put in the name of the video so that way i or someone else can help you out and then for code you can go three backticks write python and then you could go import torch torch dot rand which is going to create a tensor we're going to see this in a second and if you post that question the formatting of the code is very helpful so we can understand what's going on so this is basically the outline of how i would ask a question video this is going on what is such and such for whatever's going on hello this is what i'm having trouble with here's the code and here's what's happening you could even include the error message and then you can just click start discussion and then someone either myself or someone else from the course will be able to help out there and the beautiful thing about this is that it's all in one place you can start to search it there's nothing here yet because the course isn't out yet but as you go through it there will probably be more and more stuff here then if you have any issues with the code that you think need fixing you can also open a new issue there i'll let you read more into what's going on i've just got some issues here already about the fact that i need to record videos for the course i need to create some stuff but if you think there's something that could be improved make an issue if you have a question about the course ask a discussion and then if we come back to the keynote we have one more resource so that was the course materials which all live on the github the course q&a is on the course github's discussions tab and then the course online book now this is a work of art this is quite beautiful it is some code to automatically turn all of the materials from the github so if we come into here code if we click on notebook zero zero sometimes if you've ever worked with jupyter notebooks on github they can take a while to load so all of the materials here automatically get converted into this book and the beautiful thing about the book is that it's got
different headings here it's all readable it's all online it's going to have all the images there and you can also search some stuff here pytorch training steps creating a training loop in pytorch beautiful we're going to see this later on so they're the three big materials that you need to be aware of the three big resources for this specific course materials on github course q&a course online book which is learnpytorch.io a simple url to remember all the materials will be there and then specifically for pytorch all things pytorch there's the pytorch website and the pytorch forums so if you have a question that's not course related but more pytorch related i'd highly recommend you go to the pytorch forums which are available at discuss.pytorch.org we've put a link there then the pytorch website pytorch.org this is going to be your home ground for everything pytorch of course we have the documentation here and as i said this course is not a replacement for getting familiar with the pytorch documentation the course actually is built off all of the pytorch documentation it's just organized in a slightly different way so there's plenty of amazing resources here on everything to do with pytorch this is your home ground and you're going to see me referring to this a lot throughout the course so just keep these in mind course materials on github course discussions learnpytorch.io this is all for the course and then all things pytorch specific so not necessarily this course but just pytorch in general the pytorch website and the pytorch forums with that all being said we've come so far we've covered a lot already but guess what time it is let's write some code i'll see you in the next video we've covered enough of the fundamentals so far well from a theory point of view let's get into coding so i'm going to go over to google chrome i'm going to introduce you to one of the main tools we're going to be using for the entire course and that is google colab so the way i would suggest following along with this course is remember one of the major mottos is to code along so we're going to go to colab.research.google i've got a typo here classic you're going to see me do lots of typos throughout this course colab.research.google.com this is going to load up google colab now you can follow along with what i'm going to do but if you'd like to find out how to use google colab from a top down perspective you can go through some of these guides i'd probably recommend going through overview of colaboratory features but essentially what google colab is going to enable us to do is create a new notebook and this is how we're going to practice writing pytorch code so if you refer to the reference document at learnpytorch.io these are actually colab notebooks just in online book format so these are the base materials for what the course is going to be there's going to be more here but every new module we're going to start a new notebook and i'm going to just zoom in here so this one the first module is going to be 0 0 because python counting starts at 0 and we're going to call this pytorch fundamentals i'm going to call mine video just so we know that this is the notebook that i wrote through the video and what this is going to do is if we click connect it's going to give us a space to write python code so here we can go print hello i'm excited to learn pytorch and then if we hit shift and enter it comes out like that but another beautiful benefit of google colab or ps i'm using the
pro version which costs about 10 dollars a month or so that price may be different depending on where you're from the reason i'm doing that is because i use colab all the time however you do not have to use the paid version for this course google colab comes with a free version which you'll be able to use to complete this course if you see it worthwhile i find the pro version is worthwhile another benefit of google colab is if we go here we can go to runtime let me just show you that again runtime change runtime type hardware accelerator and we can choose to run our code on an accelerator here now we've got gpu and tpu we're going to be focused on using gpu if you'd like to look into tpu i'll leave that to you but we can click gpu click save and now our code if we write it in such a way will run on the gpu now we're going to see this later on code that runs on the gpu is a lot faster in terms of compute time especially for deep learning so if we write here nvidia-smi we now have access to a gpu in my case i have a tesla p100 it's quite a good gpu you tend to get the better gpus if you pay for google colab if you don't pay for it you get the free version with a free gpu it just won't be as fast as the gpus you typically get with the paid version so just keep that in mind there's a whole bunch of other stuff that we can do here i'm not going to go through it all because there's too much but we've covered basically what we need to cover so if we just come up here i'm going to write a text cell so 00. pytorch fundamentals and i'm going to link in here resource notebook now you can come to learnpytorch.io and all the notebooks are going to be in sync so 0 0 we can put this in here resource notebook is there that's what this notebook is going to be based off this one here and then if you have a question about what's going on in this notebook you can come to the course github and then we go back back this is where you can see what's going on this is pytorch deep learning projects as you can see what's happening at the moment i've got pytorch course creation because i'm in the middle of creating it but if you have a question you can come to mrdbourke slash pytorch-deep-learning slash discussions which is this tab here and then ask a question by clicking new discussion so any discussions related to this notebook you can ask there and i'm going to turn this right now this is a code cell colab is basically comprised of code and text cells i'm going to turn this into a text cell by pressing command m m then shift and enter now we have a text cell and then if we wanted another code cell we could go like that text code text code and so on but i'm going to delete this and to finish off this video we're going to import pytorch so we're going to import torch and then we're going to print torch.__version__ so that's another beautiful thing about google colab it comes with pytorch pre-installed and a lot of other common python data science packages such as we could also go import pandas as pd import numpy as np import matplotlib.pyplot as plt google colab is by far the easiest way to get started with this course you can run things locally if you'd like to do that i'd refer you to the pytorch deep learning repo there's going to be a setup.md getting set up to code pytorch we've just gone through number one setting up with google colab there is also another option there for getting started locally right now this document's a work in progress but it'll be finished by the time you watch this video.
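here's a minimal sketch of roughly what those first colab cells look like (the nvidia-smi line is a notebook shell command that only shows something useful when a gpu accelerator is selected, and your version string will differ from mine):

```python
# first cells of the 00 notebook (colab comes with these packages pre-installed)
print("hello i'm excited to learn pytorch")

# check which gpu the runtime has been given (notebook shell command, needs a gpu accelerator)
!nvidia-smi

import torch
print(torch.__version__)  # e.g. something like 1.10.0+cu111 at the time of recording

# other common data science packages that ship with colab
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```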
it's not a replacement though for the pytorch documentation for getting set up locally so if you'd like to run locally on your machine rather than on google colab please refer to that documentation or setup.md here but if you'd like to get started as soon as possible i'd highly recommend using google colab in fact the entire course is going to be able to be run through google colab so let's finish off this video make sure we've got pytorch ready to go and of course some fundamental data science packages here wonderful this means that we have pytorch 1.10.0 so if your version number is far greater than this maybe you're watching this video a couple of years in the future and pytorch is up to 2.11 maybe some of the code in this notebook won't work but 1.10.0 should be more than enough for what we're going to do and plus cu111 stands for cuda version 11.1 i believe and what that would mean is if we came in here and we wanted to install it on linux which is what colab runs on there's mac and windows as well we've got cuda yeah so right now as of recording this video the latest pytorch build is 1.10.2 so you'll need at least pytorch 1.10 to complete this course and cuda 11.3 so that's the cuda toolkit if you remember the cuda toolkit is nvidia's programming toolkit there we go nvidia developer cuda is what enables us to run our pytorch code on nvidia gpus which we have access to in google colab beautiful so we're set up ready to write code let's get started in the next video writing some pytorch code this is so exciting i'll see you there so we're set up we've got access to pytorch we've got a google colab instance running here we've got a gpu because we've gone up to runtime change runtime type hardware accelerator you won't necessarily need a gpu for this entire notebook but i just wanted to show you how to get access to a gpu because we're going to be using them later on so let's get rid of this and one last thing how i'd recommend going through this course is in a split window fashion so for example you might have the video where i'm talking right now and writing code on the left side and then you might have another window over on the other side with your own colab window and you can go new notebook call it whatever you want my notebook you could call it very similar to what we're writing here and then if i write code over on this side in the video you can't copy it of course but you'll write the same code there and go on and go on and go on and if you get stuck of course you have the reference notebook and you have an opportunity to ask a question so with that being said let's get started the first thing we're going to have a look at in pytorch is an introduction to tensors so tensors are the main building block of deep learning data representation and you may have watched the video what is a tensor for the sake of this course tensors are a way to represent data especially multi-dimensional data numeric data that is but that numeric data represents something else so let's go in here creating tensors so the first kind of tensor we're going to create is actually called a scalar i know i'm going to throw a lot of different names of things at you but it's important that you're aware of such nomenclature even though in pytorch almost everything is referred to as a tensor there are different kinds of tensors and just to exemplify the fact that we're using a reference notebook if we go up here we can see we have importing pytorch we've done that now we're up to introduction to tensors we've
got creating tensors and we've got scalar etc etc so this is what we're going to be working through let's do it together so scalar the way to create a tensor in pytorch we're going to call this scalar equals torch dot tensor and we're going to fill it with the number seven and then if we type in scalar what do we get back seven wonderful and it's got the tensor data type here so how would we find out what torch.tensor actually is well let me show you how i would we go to torch.tensor there we go we've got the documentation this is possibly the most common class in pytorch other than one we're going to see later on which is torch.nn basically everything in pytorch works off torch.tensor and if you'd like to learn more you can read through here in fact i would encourage you to read through this documentation for at least 10 minutes after you finish some of these videos so with that being said i'm going to link that in here so pytorch tensors are created using torch dot tensor and then we've got that link there oops typos galore daniel come on you're better than this no i'm kidding there's going to be typos galore through the whole course okay now what are some attributes of a scalar some details about scalars let's find out how many dimensions there are oh and by the way this warning perfect timing google colab will give you some warnings here depending on whether you're using a gpu or not now the reason being is that google colab provides gpus to you and i for free however gpus aren't free for google to provide so if we're not using a gpu we can save some resources and allow someone else to use a gpu by going to none and of course we can always switch this back so i'm going to turn my gpu off so that someone else out there can use it since i'm not using it at the moment what you're also going to see is if your google colab instance ever restarts up here we're going to have to re-run these cells so if you stop coding for a while go have a break and then come back and start your notebook again that's one downside of google colab is that it resets after a few hours how many hours i don't know exactly the reset time is longer if you have the pro subscription but because it's a free service and the way google calculates usage and all that sort of stuff i can't give a conclusive answer on how long until it resets but just know if you come back you might have to rerun some of your cells and you can do that with shift and enter so a scalar has no dimensions right it's just a single number but then we move on to the next thing or actually if we wanted to get this number out of a tensor type we can use scalar dot item this is going to give it back as just a regular python integer wonderful there we got the number seven back get tensor back as python int now the next thing that we have is a vector so let's write in here vector which again is going to be created with torch.tensor but you will also hear the word vector used a lot too now what is the deal oops 7.7 google colab's auto complete is a bit funny it doesn't always do the thing you want it to so if we see a vector we've got two numbers here and if we wanted to find out what a vector is well a vector usually has magnitude and direction so there we go magnitude how far it's going and direction which way it's going and if we plotted it the magnitude would be the length and the direction would be where it's pointing.
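as a quick sketch here's the scalar and vector code described above with the same example numbers (purely illustrative):

```python
import torch

# scalar: a single number, but still a torch.Tensor under the hood
scalar = torch.tensor(7)
scalar.ndim    # 0 (no square brackets)
scalar.item()  # 7, back as a plain python int

# vector: more than one number, one pair of square brackets
vector = torch.tensor([7, 7])
```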
and oh here we go scalar vector matrix tensor so this is what we're working towards as well so the thing about vectors and how they differ from scalars the way i just remember them rather than magnitude and direction is that a vector typically has more than one number so if we go vector dot ndim how many dimensions does it have it has one dimension which is kind of confusing but when we see tensors with more than one dimension it'll make sense and another way that i remember how many dimensions something has is by the number of square brackets so let's check out something else maybe we go vector dot shape the shape is 2 so the difference between dimension and shape is that dimension is like the number of square brackets and when i say that even though there's two brackets here i mean the number of pairs of closing square brackets so there's one pair of closing square brackets here but the shape of the vector is two so we have a total of two elements now if we wanted to step things up a notch let's create a matrix so this is another term you're going to hear and you might be wondering why i'm capitalizing matrix well i'll explain that in a second matrix equals torch dot tensor and we're going to put two square brackets here you might be thinking what could the two square brackets mean or actually that's a little bit of a challenge if one pair of square brackets had an ndim of one what will the ndim be for two square brackets so let's create this matrix beautiful so we've got another tensor here again as i said these things have different names like the traditional names of scalar vector matrix but they're all still a torch dot tensor that's a little bit confusing but the thing you should remember in pytorch is basically any time you encode data into numbers it's of a tensor data type and so now how many dimensions do you think a matrix has it has two so there we go we have two square brackets so if we index our matrix on the zeroth axis let's see what happens there ah so we get seven and eight and then if we index on the first axis ah nine and ten so this is where the square bracket pairings come into play we've got two square bracket pairings on the outside here so we have an ndim of two now if we get the shape of the matrix what do you think the shape will be ah two by two so we've got two numbers here by two so we have a total of four elements in there so we're covering a fair bit of ground here nice and quick but that's going to be the teaching style of this course we're going to get quite hands-on write a lot of code and just interact with it rather than continually going back over and discussing what's going on the best way to find out what's happening within a matrix is to write more code that's similar to these matrices here but let's not stop at matrix let's upgrade to a tensor now so i might put this in capitals as well i haven't explained what the capitals mean yet but we'll see that in a second so let's go torch dot tensor and what we're going to do this time we've done one square bracket pairing we've done two square bracket pairings let's do three square bracket pairings and get a little bit adventurous all right and so you might be thinking at the moment this is quite tedious i'm just going to write a bunch of random numbers here one two three three six nine two five four.
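before we dig into that three-dimensional tensor here's a recap sketch of the vector and matrix code from above (the numbers are just the examples used in the video):

```python
import torch

vector = torch.tensor([7, 7])
vector.ndim   # 1 (one pair of square brackets)
vector.shape  # torch.Size([2])

# MATRIX: two pairs of square brackets -> two dimensions
MATRIX = torch.tensor([[7, 8],
                       [9, 10]])
MATRIX.ndim   # 2
MATRIX[0]     # tensor([7, 8])
MATRIX[1]     # tensor([9, 10])
MATRIX.shape  # torch.Size([2, 2])
```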
now you might be thinking daniel you've said tensors could have millions of numbers and if we had to write them all by hand that would be quite tedious and yes you're completely right the fact is though that most of the time you won't be crafting tensors by hand pytorch will do a lot of that behind the scenes however it's important to know that these are the fundamental building blocks of the models and the deep learning neural networks that we're going to be building so tensor in capitals as well we have three square brackets or three square bracket pairings i'm just going to refer to the three square brackets at the very start because they're going to be paired down here how many ndim or number of dimensions do you think our tensor will have three wonderful and what do you think the shape of our tensor is we have three elements here we have three elements here three elements here and we have one two three so maybe our tensor has a shape of one by three by three hmm what does that mean well we've got three by one two three that's the second square bracket there by one ah so that's the first dimension there or the zeroth dimension because remember pytorch is zero indexed so instead of talking about it let's just index on the zeroth dimension and see what happens there we go okay so this is the far left zero which is very confusing because we've got a one here but what this is saying is we've got one three by three shaped tensor so the very outer bracket matches up with this number one here and then this three matches up with the next one here which is one two three and then this three matches up with this one one two three now if you'd like to see this with a pretty picture we can see it here so dim zero lines up so the blue bracket the very outer one lines up with the one then dim equals one this one here the middle bracket lines up with the middle dimension here and then dim equals two the very inner one lines up with these three here so again this is going to take a lot of practice it's taken me a lot of practice to understand the dimensions of tensors but to practice i would like you to write out your own tensor you can put however many square brackets you want and then just interact with the ndim shape and indexing just as i've done here and you can put any combination of numbers inside this tensor that's a little bit of practice before the next video so give that a shot and then we'll move on to the next topic i'll see you there
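one way the practice tensor from that challenge might look, roughly matching the example above (any numbers and bracket depth would do):

```python
import torch

# TENSOR: three pairs of square brackets -> three dimensions
TENSOR = torch.tensor([[[1, 2, 3],
                        [3, 6, 9],
                        [2, 5, 4]]])
TENSOR.ndim   # 3
TENSOR.shape  # torch.Size([1, 3, 3])
TENSOR[0]     # the single 3x3 matrix inside the outer bracket
```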
welcome back in the last video we covered the basic building block of data representation in deep learning which is the tensor or in pytorch specifically torch.tensor but within that we had a look at what a scalar is what a vector is what a matrix is and what a tensor is and i issued you the challenge to get as creative as you like with creating your own tensor so i hope you gave that a shot because as you'll see throughout the course and your deep learning journey a tensor can be of almost any shape and size and have almost any combination of numbers within it so it's very important to be able to interact with different tensors and to understand what the different names of things are so when you hear matrix you go oh maybe that's a two dimensional tensor when you hear vector maybe that's a one dimensional tensor when you hear tensor that could be any number of dimensions and just for reference if we come back to the course reference we've got a scalar what is it a single number number of dimensions zero we've got a vector a number with direction number of dimensions one then a matrix and a tensor and now here's another little tidbit of the nomenclature of things the naming of things typically you'll see the variable name for a scalar or a vector as lowercase so for a vector you might have a lowercase y storing that data but for a matrix or a tensor you'll often see an uppercase letter or variable in python in our case because we're writing code i'm not exactly sure why this is but this is just what you're going to see in machine learning and deep learning code and research papers across the board this is the typical nomenclature scalars and vectors lowercase matrices and tensors uppercase that's where that naming comes from and that's why i've given tensor an uppercase variable here now with that being said let's jump into another very important concept with tensors and that is random tensors why random tensors i'm just writing this in a code cell now i could go here this is a comment in python random tensors but we'll get rid of that we could just start another text cell here and then three hashes is going to give us a heading random tensors there or i could turn this again into a markdown cell with command m m when i'm using google colab so random tensors let's write down here why random tensors so we've done the tedious thing of creating our own tensors with some numbers that we've defined whatever these are again you could define these as almost anything but random tensors are a big part of pytorch because let's write this down random tensors are important because the way many neural networks learn is that they start with tensors full of random numbers and then adjust those random numbers to better represent the data so seriously this is one of the big concepts of neural networks i'm going to write this in code formatting that's what the backticks are for start with random numbers look at data update random numbers look at data update random numbers that is the crux of neural networks so let's create a random tensor with pytorch remember how i said that pytorch is going to create tensors for you behind the scenes well this is one of the ways that it does so so we create a random tensor and we'll give it a size of random tensor of size or shape pytorch uses these interchangeably so size and shape mean the same thing so random tensor equals torch dot rand and we're going to type in here three four and the beautiful thing about google colab as well is that if we wait long enough it's going to pop up with the docstring of what's going on i personally find this a little hard to read in google colab you can keep going down there you might be able to read that but what can we do well we can go to torch.rand and then we go to the documentation beautiful now there's a whole bunch of stuff here that you're more than welcome to read we're not going to go through it all we're just going to see what happens hands on so copy that in here and write this in the notes torch random tensors done i'm just going to make some code cells down here so i've got some space let's see what our random tensor looks like there we go beautiful of size 3 4 so we've got four elements across here and we've got three deep here so again there's the two bracket pairs so what do you think the number of dimensions will be for random tensor ndim is 2
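a minimal sketch of that random tensor creation (the actual numbers inside will differ every time you run it):

```python
import torch

# create a random tensor of size/shape (3, 4)
random_tensor = torch.rand(3, 4)
random_tensor        # 12 random numbers between 0 and 1
random_tensor.ndim   # 2
random_tensor.shape  # torch.Size([3, 4])
```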
beautiful and so we have some random numbers here now the beautiful thing about pytorch again is that it's going to do a lot of this behind the scenes so if we wanted to create a size of 10 10 and in some cases we might want a one dimension at the front here then it's going to go 1 10 10 and then if we check the number of dimensions how many do you think it will be now 3 why is that because we've got 1 10 10 and then if we wanted to create 10 10 10 what's the number of dimensions going to be it's not going to change why is that we haven't run that cell yet but we've got a lot of numbers here we can find out what 10 times 10 times 10 is and i know we can do that in our heads but the beauty of colab is we've got a calculator right here 10 times 10 times 10 we've got a thousand elements in there but sometimes tensors can be hundreds of thousands of elements or millions of elements and pytorch is going to take care of a lot of this behind the scenes so let's clean up a bit of space here this is a random tensor of random numbers beautiful now it's got two dimensions because we've got three by four and if we put another one in the front there we're going to have how many dimensions three dimensions there but again this number of dimensions could be any number and what's inside here could be any number let's get rid of that and let's get a bit specific because right now this is just a random tensor of whatever dimension how about we create a random tensor with a similar shape to an image tensor so a lot of the time when we turn images into tensors they're going to have let me just write it in code for you first size equals a height a width and a number of color channels so in this case it's going to be height width color channels and the color channels are red green blue and so let's create a random image size tensor let's view the size of it or the shape and then random image size tensor we'll view the ndim beautiful okay so we've got torch size 224 224 3 height width color channels and we've got three dimensions one each for height width color channels let's go and see an example of this this is the pytorch fundamentals notebook if we go up to here so say we wanted to encode this image of my dad eating pizza with thumbs up a square image of 224 by 224 this is an input and if we wanted to encode this into tensor format well one of the ways of representing an image tensor a very common way is to split it into color channels because with red green and blue you can create almost any color you want and then we have a tensor representation so sometimes you're going to see color channels come first we can switch this around in our code quite easily by putting color channels here but you'll also see color channels come at the end i know i'm saying a lot that we kind of haven't covered yet the main takeaway from here is that almost any data can be represented as a tensor and one of the common ways to represent images is in the format color channels height width and what these values are will depend on what's in the image but we've done this in a random way so the takeaway from this video is that pytorch enables you to create tensors quite easily with the rand method however it is going to do a lot of this creating tensors for you behind the scenes and why are random tensors so valuable because neural networks start with random numbers look at data such as image tensors and then adjust those random numbers to better represent that data and they repeat those steps over and over and over
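and here's a sketch of the image-shaped random tensor described above, assuming the height width colour channels ordering:

```python
import torch

# random tensor with a similar shape to an image tensor: height, width, colour channels
random_image_size_tensor = torch.rand(size=(224, 224, 3))
random_image_size_tensor.shape  # torch.Size([224, 224, 3])
random_image_size_tensor.ndim   # 3
```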
let's finish this video here my challenge for you is to create your own random tensor of whatever size and shape you want so you could have 5 10 10 here and see what that looks like and then we'll keep coding in the next video i hope you took on the challenge of creating a random tensor of your own size and just a little tidbit here you might have seen that in the previous video i didn't use the size parameter but in this case i did here you can go either way so if we go torch dot rand size equals and we put in a tuple here of 3 3 we've got that tensor there 3 3 but then also if we don't write size in there it's the default so it's going to create a very similar tensor so whether you write size or not it's going to have quite a similar output depending on the shape that you put in there but now let's get on to another kind of tensor that you might see zeros and ones so say you wanted to create a tensor that wasn't just full of random numbers you wanted to create a tensor of all zeros this is helpful if you're creating some form of mask now we haven't covered what a mask is but essentially if we create a tensor of all zeros what happens when you multiply a number by zero all zeros so if we wanted to multiply these two together let's do zeros times random tensor there we go all zeros so maybe if you're working with this random tensor and you wanted to mask out say all of the numbers in this column for some reason you could create a tensor of zeros in that column multiply it by your target tensor and you would zero all those numbers that's telling your model hey ignore all of the numbers that are in here because i've zeroed them out and then if you wanted to create a tensor of all ones we can go ones equals torch dot ones size equals three four and then if we have a look there's another parameter i haven't shown you yet but this is another important one the dtype so the default data type that's what dtype stands for is torch.float32 we've actually been using torch.float32 the whole time because whenever you create a tensor with a pytorch method unless you explicitly define what the data type is we'll see how to define the data type later on it starts off as torch.float32 so these are float numbers so that is how you create zeros and ones zeros is probably more common than ones from what i've seen but just keep these in mind you might come across them there are lots of different methods for creating tensors and truth be told rand is probably one of the most common but you might see zeros and ones out in the field so now we've covered that let's move on into the next video where we're going to create a range so have a go at creating a tensor full of zeros in whatever size you want and a tensor full of ones in whatever size you want and i'll see you in the next video
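a quick sketch of the zeros and ones creation covered above before moving on:

```python
import torch

# a tensor of all zeros (useful for masking values out)
zeros = torch.zeros(size=(3, 4))
zeros * torch.rand(3, 4)  # still all zeros

# a tensor of all ones
ones = torch.ones(size=(3, 4))
ones.dtype  # torch.float32, the default data type
```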
welcome back i hope you took on the challenge of creating a torch tensor of zeros of your own size and ones of your own size but now let's investigate how we might create a range of tensors and tensors like these are two other very common methods of creating tensors so let's start by creating a range we'll first use torch.range because depending on when you're watching this video torch.range may still be around or it may be deprecated if we write in torch.range right now with the pytorch version that i'm using which is 1.10.0 we see that torch.range is deprecated and will be removed in a future release so just keep that in mind if you come across some code that's using torch.range it may be a bit out of whack so the way to fix that is to use arange instead and if we just write in torch.arange we've got a tensor of 0 to 9 because of course it starts at the 0 index if we wanted one to ten we could go like this one two three four five six seven eight nine ten so we can go one to ten equals torch.arange wonderful and we can also define the step so let's type in a start an end and a step and where can we find the documentation on arange sometimes in google colab you can press shift tab but i find that it doesn't always work for me yeah you could hover over it but we could also just go torch arange and look for the documentation torch.arange so we've got start end and step let's see what all of these three do maybe we start at zero and maybe we want to go to a thousand and then we want a step of what should our step be what's a fun number 77 so it's not one to ten anymore but here we go we've got start at zero then 77 plus 77 plus 77 all the way up until it finishes before a thousand so if we wanted to take it back to one to ten we can go up here one ten and the default step is going to be one oops we needed the end to be eleven because it's going to finish at end minus one there we go beautiful now we can also create tensors like so creating tensors like tensors like is for when you have a particular shape of a tensor you want to replicate somewhere else but you don't want to explicitly define what that shape should be so what's the shape of one to ten now if we wanted to create a tensor of zeros that had the same shape as this we can use zeros like so ten zeros equals i'm not even sure if i'm spelling zeros right i might have a typo spelling zeros here but you get what i'm saying torch spells it like that that's why i'm spelling it like that torch dot zeros like one to ten and the input is going to be one to ten and we have a look at ten zeros my goodness this is taking quite a while to run this is troubleshooting on the fly if something's happening like this you can try to stop it you can click run and then stop well it's running so fast that i can't click stop if you do also run into trouble you can go runtime restart runtime we might just do that now just to show you restart and run all is going to restart the compute engine behind the colab notebook and run all the cells to where we are so let's just see that restart and run all if you're getting errors sometimes this helps there is no set in stone way to troubleshoot errors it's guess and check with this so there we go we've created ten zeros which is torch zeros like our one to ten tensor so we've got zeros in the same shape as one to ten so if you'd like to create a range of tensors use torch.arange torch.range gives a deprecated message so use torch.arange instead with a start an end and a step and then if you wanted to create a tensor like something else you want to look for the like methods where you put in an input which is another tensor and it'll create a tensor in whatever fashion the method describes zeros in this case in the same shape as your input so with that being said give that a try create a range of tensors and then try to replicate that range shape that you've made with zeros i'll see you in the next video
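here's roughly what that arange and zeros_like code looks like, using the same one-to-ten example:

```python
import torch

# torch.range is deprecated, use torch.arange(start, end, step) instead
one_to_ten = torch.arange(start=1, end=11, step=1)  # tensor([ 1,  2, ..., 10]), stops at end - 1
torch.arange(start=0, end=1000, step=77)            # 0, 77, 154, ... up to (but not including) 1000

# create a tensor of zeros with the same shape as another tensor
ten_zeros = torch.zeros_like(input=one_to_ten)      # tensor([0, 0, ..., 0])
```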
welcome back let's now get into a very important topic tensor data types we've briefly hinted at this before so let's create a tensor to begin with a float 32 tensor so float 32 tensor equals torch.tensor and let's just put in the numbers three six nine if you've ever played need for speed underground you'll know where three six nine comes from and then we're going to go dtype equals let's just put none and see what happens hey float 32 tensor oh what is the data type float 32 tensor dot dtype float 32 even though we put none this is because the default data type in pytorch even if it's specified as none is going to come out as float 32 what if we wanted to change that to something else well let's type in here float 16 and now we've got float 32 tensor this variable name is a lie now because it's a float 16 tensor so we'll leave that as none and there's another parameter when creating tensors that's very important which is device we'll see what that is later on and then there's a final one which is also very important which is requires grad equals false now this could be true of course but we're going to set this as false so these are three of the most important parameters when you're creating tensors now again you won't necessarily always have to enter these when you're creating tensors because pytorch does a lot of tensor creation behind the scenes for you so let's just write out what these are dtype is what data type is the tensor eg float 32 or float 16 now if you'd like to look at what data types are available for pytorch tensors we can go to torch.tensor and right at the top unless the documentation changes we have data types it's so important that data types is the first thing that comes up when you're creating a tensor so we have 32-bit floating point 64-bit floating point 16-bit floating point 16-bit brain floating point 32-bit complex and so on now the most common ones that you will likely interact with are 32-bit floating point and 16-bit floating point now what does this mean what do these numbers actually mean well they have to do with precision in computing so let's look that up precision in computing precision computer science so in computer science the precision of a numerical quantity we're dealing with numbers right is a measure of the detail in which the quantity is expressed this is usually measured in bits but sometimes in decimal digits it is related to precision in mathematics which describes the number of digits that are used to express a value so for us precision is a measure of how much detail the numerical quantity is expressed in so i'm not going to dive into the background of computer science and how computers represent numbers the important takeaway for you from this is that single precision floating point is usually called float 32 which means a number contains 32 bits in computer memory so if you imagine we have a tensor that is using 32-bit floating point the computer stores each number as 32 bits or if it's 16-bit floating point it stores each number as 16 bits i'm not sure if a bit equates exactly to a single number in computer memory but what this means is that a 32-bit tensor is single precision and a 16-bit tensor is half precision the default is 32 so float 32 torch.float32 as we've seen in code which means it's going to take up a certain amount of space in computer memory now you might be thinking why would i do anything other than the default well if you'd like to sacrifice some detail in how your
number is represented so instead of 32 bits it's represented by 16 bits you can calculate faster on numbers that take up less memory so that is the main differentiator between 32 bit and 16 bit but if you need more precision you might go up to 64 bit so just keep that in mind as you go forward single precision is 32 half precision is 16 what do these numbers represent they represent how much detail a single number is stored with in memory that was a lot to take in but we're talking about tensor data types and i'm spending a lot of time here because i'm going to put a note here note tensor data types is one of the three big issues with pytorch and deep learning or not issues exactly they're errors that you'll run into in deep learning three big errors you'll run into with pytorch and deep learning so one is tensors not right data type two tensors not right shape we've seen a few shapes before and three tensors not on the right device and so in this case if we had a tensor that was float 16 and we were trying to do computations with a tensor that was float 32 we might run into some errors and that's the tensors not being the right data type so it's important to know about the dtype parameter here and then tensors not being the right shape well once we get on to matrix multiplication we'll see that if one tensor is a certain shape and another tensor is another shape and those shapes don't line up we're going to run into shape errors and this is a perfect segue to the device parameter device equals none by default this is going to be cpu and this is why we are using google colab because it enables us oh we don't want to restart it enables us to have access to a gpu as i've said before we could change this to cuda we'll see how to write device agnostic code later on but with this device parameter if you try to do operations between two tensors that are not on the same device so for example you have one tensor that lives on a gpu for fast computing and you have another tensor that lives on a cpu and you try to do something with them well pytorch is going to throw you an error and then finally this last one requires grad is if you want pytorch to track the gradients we haven't covered what that is of a tensor when it goes through certain numerical calculations this is a bit of a bombardment but i thought i'd throw these in as important parameters to be aware of since we're discussing data types and really it would be remiss of me to discuss data type without discussing not the right shape or not the right device so with that being said let's write down here device is what device is your tensor on and requires grad is whether or not to track gradients with this tensor's operations so we have a float 32 tensor now how might we change the tensor data type of this let's create a float 16 tensor and we saw that we could explicitly write in float 16 when creating it or we can just type in here float 16 tensor equals float 32 tensor dot type and we're going to type in torch dot float 16 why float 16 because well that's how we define float 16 or we could use half these are the same thing let's just do half or actually float 16 is more explicit for me and then let's check out float 16 tensor beautiful we've converted our float 32 tensor into float 16
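as a sketch here are those three parameters plus the type conversion from above (the defaults shown in the comments are the usual pytorch defaults):

```python
import torch

# three important parameters when creating tensors
float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None,           # defaults to torch.float32
                               device=None,          # defaults to the cpu
                               requires_grad=False)  # whether to track gradients
float_32_tensor.dtype  # torch.float32

# convert to half precision (float16) to tackle "tensors not right datatype"
float_16_tensor = float_32_tensor.type(torch.float16)
float_16_tensor.dtype  # torch.float16
```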
so that is one of the ways that you'll be able to tackle the tensors not in the right data type issue when you run into it and just a little note on precision in computing if you'd like to read more on that i'm going to link this in here and this is all about how computers store numbers so precision in computing there we go i'll just get rid of that wonderful so give that a try create some tensors research or go to the documentation of torch.tensor and see if you can find out a little bit more about dtype device and requires grad and create some tensors of different data types play around with whichever ones you want here and see if you can run into some errors maybe try to multiply two tensors together so if you go float 16 tensor times float 32 tensor give that a try and see what happens i'll see you in the next video welcome back in the last video we covered a little bit about tensor data types as well as some of the most common parameters you'll see passed to the torch.tensor method and i issued you the challenge at the end of the last video to create some of your own tensors of different data types and then to see what happens when you multiply a float 16 tensor by a float 32 tensor oh it works and you're like but daniel you said that we're going to have tensors not in the right data type well this is another kind of gotcha or caveat of pytorch and deep learning in general sometimes you'll find that even if you think something may error because these two tensors are different data types it actually results in no error but then sometimes you'll have other operations that you do especially training large neural networks where you'll get data type issues the important thing is just to be aware of the fact that some operations will throw an error when your tensors are not in the right data type so let's try another type maybe we try a 32-bit integer so torch dot int 32 and we try to multiply that by a float i wonder what will happen then so let's go int 32 tensor equals torch dot tensor and we'll just make it three six nine notice that there are no decimal points to make them floats and dtype can be torch dot int 32 and then int 32 tensor what does this look like typo of course one of many int 32 tensor so now let's multiply it by float 32 tensor and see what happens can we get pytorch to throw an error huh it worked as well
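a sketch of the data type mixing experiments from above; as found in the video these particular combinations happen to work without an error (the results shown assume the same 3 6 9 values):

```python
import torch

float_32_tensor = torch.tensor([3.0, 6.0, 9.0])
float_16_tensor = float_32_tensor.type(torch.float16)

# some mixed-dtype operations work without complaint...
float_16_tensor * float_32_tensor  # tensor([ 9., 36., 81.])

int_32_tensor = torch.tensor([3, 6, 9], dtype=torch.int32)
float_32_tensor * int_32_tensor    # also works: tensor([ 9., 36., 81.])

int_64_tensor = torch.tensor([3, 6, 9], dtype=torch.long)  # torch.long is a 64-bit integer
float_32_tensor * int_64_tensor    # still works
```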
well maybe we go to int 64 what happens here it still works now see this is again one of the confusing parts of doing tensor operations what if we do a long tensor torch dot long is this going to still work ah torch has no attribute called that hmm that's not a data type issue i think it's long tensor does long tensor work dtype must be a torch dtype not torch long tensor i could have sworn that this was in torch oh there we go torch dot long that's another word for 64 bit so what is this saying cpu tensor okay let's see this is some troubleshooting on the fly here then we multiply it this is a float 32 times a long it works okay so it's actually a bit more robust than i thought it was but just keep this in mind when we're training models we're probably going to run into some errors at some point with our tensors not being the right data type and if pytorch throws us an error saying your tensors are in the wrong data type well at least we know now how to change that data type or how to set the data type if we need to and so with that being said let's just formalize what we've been doing a fair bit already and that's getting information from tensors the three big things we'll want to get from our tensors in line with the three big errors that we're going to face in neural networks and deep learning so let's copy these down i'm just going to copy this down below so if we want to get some information from tensors how do we check the shape how do we check the data type how do we check the device so let's write that down to get the data type from a tensor you can use tensor dot dtype to get the shape from a tensor you can use tensor dot shape and to get the device from a tensor which device it's on cpu or gpu you can use tensor dot device let's see these three in action so if we run into one of the three big problems in deep learning and neural networks in general especially with pytorch tensors not the right data type tensors not the right shape or tensors not on the right device let's create a tensor and try these three out we've got some tensor equals torch dot rand and we'll create a 3 4 let's have a look at what it looks like there we go random numbers of shape 3 and 4
now let's find out some details about it find out details about some tensor so print we'll print some tensor and oops didn't want that print and let's format it or make an f string of shape of tensor oh let's do data type first we'll follow that order data type of tensor and we're going to go how do we do this some tensor dot what dot d type beautiful and then we're going to print tensors not in the right shape so let's go shape of tensor equals some tensor dot shape oh i went a bit too fast but we could also use size let's just confirm that actually we'll code that out together from my experience some tensor size and some tensor dot shape result in the same thing is that true oh function oh that's what it is some tensor dot size is a function not an attribute there we go which one should you use for me i'm probably more used to using shape you may come across dot size as well but just realize that they do quite the same thing except one's a function and one's an attribute an attribute is written dot shape without the curly brackets a function or a method is with the brackets at the end so that's the difference between these are attributes here d type size we're going to change this to shape tensor attributes this is what we're getting i should probably write that down this is tensor attributes that's the formal name for these things and then finally what else do we want tensors what device are we looking for let's get rid of this get rid of this and then print if device tensor is on by default our tensor is on the cpu so some tensor dot device there we go so now we've got our tensor here some tensor the data type is a torch float 32 because we didn't change it to anything else and torch float32 is the default the shape is 3 4 which makes a lot of sense because we passed in 3 4 here and the device tensor is on is the cpu which is of course the default unless we explicitly say to put it on another device all of the tensors that we create will default to being on the cpu rather than the gpu and we'll see later on how to put tensors and other things in torch onto a gpu but with that being said give it a shot create your own tensor get some information from that tensor and see if you can change these around so see if you could create a random tensor but instead of float 32 it's a float 16. 
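for reference, here's roughly what that attribute-checking cell looks like in one place. this is a sketch, and because the tensor is random the printed values will differ from run to run:

```python
import torch

some_tensor = torch.rand(3, 4)  # random tensor with the default dtype and device

print(some_tensor)
print(f"Datatype of tensor: {some_tensor.dtype}")    # torch.float32 by default
print(f"Shape of tensor: {some_tensor.shape}")       # torch.Size([3, 4]); some_tensor.size() gives the same
print(f"Device tensor is on: {some_tensor.device}")  # cpu, unless it has been explicitly moved
```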
and then probably another extracurricular, we haven't covered this yet, but see how to change the device a pytorch tensor is on. give that a crack and i'll see you in the next video. welcome back. so in the last video we had a look at a few tensor attributes, namely the data type of a tensor, the shape of a tensor and the device that a tensor lives on, and i alluded to the fact that these will help resolve three of the most common issues in building neural networks and deep learning models, specifically with pytorch: tensors not being the right data type, tensors not being the right shape, and tensors not being on the right device. so now let's get into manipulating tensors, and what i mean by that, so let's just write the title here, manipulating tensors, is tensor operations. when we're building neural networks, neural networks are comprised of lots of mathematical functions that pytorch code is going to run behind the scenes for us. so tensor operations include addition, subtraction, multiplication, division and matrix multiplication. addition, subtraction, multiplication and division are your typical operations that you're probably familiar with; there are two types of multiplication you'll typically see referenced in deep learning and neural networks, and the only different one here is matrix multiplication, which we're going to have a look at in a minute. but to find patterns in the numbers of a dataset, a neural network will combine these functions in some way, shape or form: it takes a tensor full of random numbers and performs some kind of combination of addition, subtraction, multiplication, division and matrix multiplication (it doesn't have to be all of these, it could be any combination) to manipulate those numbers in some way so they represent a dataset. so that's how a neural network learns: it composes these functions, looks at some data to adjust the numbers of a random tensor, and then goes from there. but with that being said, let's look at a few of these. we'll begin with addition. the first thing we need to do is create a tensor, so torch.tensor([1, 2, 3]). to add something to a tensor we can use + as the addition operator, just like in python: tensor + 10 is going to be tensor([11, 12, 13]), and tensor + 100 is going to be, as you'd expect, plus a hundred, but let's leave that as plus ten and add ten to it. and so you might be able to guess how we would multiply it by ten: multiply tensor by 10, which is tensor * 10 (star is shift-8 on my keyboard), and we get 10, 20, 30, and because we didn't reassign it, our tensor is still 1, 2, 3. if we do reassign it, tensor = tensor * 10, and then check out tensor, we've now got 10, 20, 30, and the same thing here, we'll have 10, 20, 30.
but then if we go back from the top and delete this reassignment, what do we get? tensor * 10 gives us the multiplied values, but tensor is still 1, 2, 3. what should we try now? how about subtraction: tensor - 10, and there we go, 1 minus 10, 2 minus 10, 3 minus 10. you can also try out pytorch's inbuilt functions, so torch.mul, which is short for multiply: we can pass in our tensor here and pass in 10, and that's going to multiply each element of tensor by 10, just taking the original tensor that we created, which is 1, 2, 3, and performing the same thing as this. i would recommend using the python operators where you can; if for some reason you see torch.mul, maybe there's a reason for that, but generally the operators are more understandable if you just need to do a straight-up multiplication, addition or subtraction. torch also has torch.add... is it torch.add? oh, there we go, yeah, torch.add. so as i alluded to before, there's two different types of multiplication that you'll hear about: element-wise and matrix multiplication. we're going to cover matrix multiplication in the next video. as a challenge, though, i would like you to search "what is matrix multiplication", and i think the first website that comes up is the matrix multiplication wikipedia page, but mathsisfun has a great guide. so before we get into matrix multiplication, jump into mathsisfun to have a look at multiplying matrices and have a think about how we might be able to replicate that in pytorch, even if you're not sure, just have a think about it. i'll see you in the next video. welcome back. in the last video we discussed some basic tensor operations such as addition, subtraction, element-wise multiplication, division and matrix multiplication, but we didn't actually go through what matrix multiplication is, so now let's start on that, more particularly discussing the difference between element-wise and matrix multiplication. so we'll come down here and write another heading: matrix multiplication. there are two main ways of performing multiplication in neural networks and deep learning: number one is the simple version, which is what we've seen, element-wise multiplication, and number two is matrix multiplication. matrix multiplication is actually possibly the most common tensor operation you will find inside neural networks. in the last video i issued the extra curriculum of having a look at the mathsisfun.com page for how to multiply matrices, and the first example they go through is element-wise multiplication, which just means multiplying each element by a specific number: in this case we have 2 times 4 equals 8, 2 times 0 equals 0, 2 times 1 equals 2, and 2 times negative 9 equals negative 18.
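to recap those element-wise operations in one place before we move on, here's a quick sketch (the tensor values are the same as above):

```python
import torch

tensor = torch.tensor([1, 2, 3])

# all of these are element-wise and none of them reassign `tensor`
print(tensor + 10)            # tensor([11, 12, 13])
print(tensor - 10)            # tensor([-9, -8, -7])
print(tensor * 10)            # tensor([10, 20, 30])
print(torch.mul(tensor, 10))  # same as tensor * 10
print(torch.add(tensor, 10))  # same as tensor + 10
print(tensor)                 # still tensor([1, 2, 3])
```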
but then if we move on to matrix multiplication, which is multiplying a matrix by another matrix, we need to do the dot product. so that's something you'll also hear matrix multiplication referred to as, the dot product, and the two terms are used interchangeably. and if we look up the symbol for the dot product, you'll find that it's just a dot, a · b, vector a dot product vector b, a few different options there, but let's look at what it looks like in pytorch code. first, though, there's a little bit of a difference here: how did we get from multiplying this matrix of 1, 2, 3, 4, 5, 6 by 7, 8, 9, 10, 11, 12 to the 58 there? well (and this is the difference between element-wise and the dot product, by the way) we take the first row and the first column: 1 times 7, record that down, so that's 7, then 2 times 9, which is 18, then 3 times 11, which is 33, and if we add those up, 7 plus 18 plus 33, we get 58. and if we were to do that for every other element throughout these two matrices, we end up with something like this. so that's what i'd encourage you to go through step by step and reproduce; a good challenge would be to reproduce this by hand with pytorch code. but now let's go back and write some pytorch code to do both of these. i just want to link here as well, more information on multiplying matrices, and i'm going to turn this into markdown. let's first see element-wise multiplication, starting with just a rudimentary example. if we have our tensor, which at the moment is 1, 2, 3, and we multiply that by itself, we get 1, 4, 9, but let's print something out so it looks a bit prettier: print an f-string of tensor * tensor equals, and then tensor * tensor. wonderful, so we get 1 times 1 equals 1, 2 times 2 equals 4, 3 times 3 equals 9. now for matrix multiplication, pytorch stores matrix multiplication, similar to torch.mul, in torch.matmul, which stands for matrix multiplication. so let's just test it out: we'll do the exact same thing as we did here, but instead of element-wise we'll do matrix multiplication on our 1, 2, 3 tensor. what happens here? oh my goodness, 14. now why did we get 14 instead of 1, 4, 9? can you guess, or think about, how we got to 14 from these numbers? we're only multiplying two smaller tensors here, 1, 2, 3, compared with these examples that use a larger one, but the same principle applies across different sizes of tensors or matrices, and when i say matrix multiplication, you can also do matrix multiplication between tensors, and in our case we're actually using vectors, just to add to the confusion. so what is the difference here between element-wise and the dot product? we've got one main addition, and that is exactly that: addition. so if we were to code this out by hand, matrix multiplication by hand, recall that the elements of our tensor are 1, 2, 3, so if we wanted to matrix multiply that by itself we'd have 1 times 1, which is the equivalent of doing 1 times 7 in this visual example, plus 2 times 2, plus 3 times 3, and that gives us 14.
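here's that comparison as a sketch, using the same small tensor as above:

```python
import torch

tensor = torch.tensor([1, 2, 3])

# element-wise multiplication: multiply matching elements, no summing
print(tensor * tensor)               # tensor([1, 4, 9])

# matrix multiplication (the dot product for two vectors): multiply matching
# elements, then add the results together
print(torch.matmul(tensor, tensor))  # tensor(14) -> 1*1 + 2*2 + 3*3
```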
so that's how we got to that number there. now we could do this with a for loop, so let's have a geez (when i say geez it means have a look, that's an australian colloquialism), because i want to show you the time difference, and it might not actually be that big a difference if we do it by hand versus using something like matmul. and that's another thing to note: if pytorch has a method already implemented, chances are it's a fast, optimised version of that method. i know for basic operators i said it's usually best to just use the straight-up operator, but for something like matrix multiplication or other advanced operations you probably want to use the torch version rather than writing a for loop, which is what we're about to do. so let's go value = 0, this is matrix multiplication by hand, and then for i in range(len(tensor)), so for each element in the length of our tensor, which is 1, 2, 3, we update our value with plus-equals (which is doing the plus and the reassignment here), the i-th element of the tensor times itself. how long is this going to take? we return the value, we should get 14, and there we go. so about 1.9 milliseconds on whatever cpu google colab is using behind the scenes. but now if we time it using the torch method, torch.matmul(tensor, tensor), and again we're using a very small tensor... okay, there we go, it actually showed how much quicker it is even with such a small tensor: this one is 1.9 milliseconds, this one is 252 microseconds, so the for loop is many times slower than pytorch's vectorised version. i'll let you look into what vectorisation means if you want; it's a style of programming that avoids writing for loops, because as you could imagine, if this tensor had, let's say, a million elements instead of just three, looping through each of those elements one by one would be quite cumbersome. so a lot of pytorch's functions behind the scenes implement optimised code to perform mathematical operations, such as matrix multiplication, in a far faster manner, as we can see here, and that's only with a tensor of three elements, so you can imagine the speed-ups on something like a tensor with a million elements. but with that being said, that is the crux of matrix multiplication. for a little bit more i encourage you to read through the mathsisfun.com page, otherwise let's look at a couple of rules we have to satisfy for larger versions of matrix multiplication, because right now we've only done it with a simple tensor, just 1, 2, 3. let's step things up a notch in the next video. welcome back. in the last video we were introduced to matrix multiplication, which, although we haven't really seen it in action yet, is one of the most common operations in neural networks, and we saw that you should always try to use torch's implementation of certain operations, except if they're basic operations like plus and multiplication and whatnot, because chances are it's a lot faster than doing things by hand, and it's also a lot less code: compared to this pretty verbose for loop it's just "hey, matrix multiply these two tensors". but there's something we didn't allude to in the last video: there are a couple of rules that need to be satisfied when performing matrix multiplication. it worked for us because we have a rather simple tensor, but once you start to build larger tensors you might
run into one of the most common errors in deep learning i'm going to write this down actually here this is one to be very familiar with one of the most common errors in deep learning we've already alluded to this as well is shape errors so let's jump back to this in a minute i just want to write up here so there are two rules that performing or two main rules that performing matrix multiplication needs to satisfy otherwise we're going to get an error so number one is the inner dimensions must match let's see what this means so if we want to have two tenses of shape three by two and then we're going to use the at symbol now we might be asking why the at symbol well the at symbol is another is a like a operator symbol for matrix multiplication so i just want to give you an example if we go tensor at at stands for matrix multiplication we get tensor 14 which is exactly the same as what we got there should you use at or should you use matt mull i would personally recommend to use matt mall it's a little bit clearer at sometimes can get confusing because it's not as common as seeing something like matt mull so we'll get rid of that but i'm just using it up here for brevity and then we're going to go 3 2 now this won't work we'll see why in a second but if we go 2 3 at and then we have 3 2 this will work or and then if we go the reverse say 3's on the outside 2's here and then we have 2s on the inside 3s on the outside this will work now why is this well this is the rule number one the inner dimensions must match so the inner dimensions uh what i mean by this is let's create torch rand we'll create of size 3 2 and then we'll get its shape so we have so if we created a tensor like this three two and then if we created another tensor well let me just show you straight up torch dot matt matmul torch dot rand watch this won't work we'll get an error there we go so this is one of the most common errors that you're going to face in deep learning is that matrix one in matrix two shapes cannot be multiplied because it doesn't satisfy rule number one the inner dimensions must match and so what i mean by inner dimensions is this dimension multiplied by this dimension so say we were trying to multiply 3 2 by 3 2 these are the inner dimensions now this will work because why the inner dimensions match 2 3 by three two two three by three two now notice how the inner dimensions inner inner match let's see what comes out here oh look at that and now this is where rule two comes into play two the resulting matrix has the shape of the outer dimensions so we've just seen this one 2 3 at 3 2 which is at remember is matrix multiply so we have a matrix of shape 2 3 matrix multiply a matrix of 3 2 the inner dimensions match so it works the resulting shape is what two two just as we've seen here we've got a shape of two two now what if we did the reverse what if we did this one that also will work three on the outside what do you think is going to happen here in fact i encourage you to pause the video and give it a go so this is going to result in a 3 3 matrix but don't take my word for it let's have a look 3 put 2 on the inside and we'll put two on the inside here and then three on the outside what does it give us oh look at that a three three one two three one two three now what if we were to change this 2 and 2. 
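before changing those inner values around, here's a compact sketch of the two rules we've just seen. the shapes follow the examples above, and since torch.rand gives random values only the shapes matter here:

```python
import torch

# rule 1: the inner dimensions must match
# (3, 2) @ (3, 2) -> error, the inner dimensions are 2 and 3
# (2, 3) @ (3, 2) -> works, the inner dimensions are both 3

# rule 2: the resulting matrix has the shape of the outer dimensions
print(torch.matmul(torch.rand(2, 3), torch.rand(3, 2)).shape)  # torch.Size([2, 2])
print(torch.matmul(torch.rand(3, 2), torch.rand(2, 3)).shape)  # torch.Size([3, 3])

# uncommenting this raises: mat1 and mat2 shapes cannot be multiplied
# torch.matmul(torch.rand(3, 2), torch.rand(3, 2))
```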
this can be almost any number you want let's change them both to 10 what's going to happen will this work what's the resulting shape going to be so the inner dimensions match what's rule number two the resulting matrix has the shape of the outer dimension so what do you think is going to be the shape of this resulting matrix multiplication well let's have a look it's still 3 3. wow now what if we go 10 10 on the outside and 10 and 10 on the inside what do we get well we get i'm not going to count all of those but if we just go shape we get 10 by 10 because these are the two main rules of matrix multiplication is if you're running into an error that the matrix multiplication can't work so let's say this was 10 and this was 7 watch what's going to happen we can't multiply them because the inner dimensions do not match we don't have 10 and 10 we have 10 and 7. but then when we change this so that they match we get 10 and 10. beautiful so now let's create a little bit more of a specific example we'll create two tenses we'll come down actually to prevent this video from being too long i've got an error in the word error that's funny we'll go on with one of those common errors in deep learning shape errors we've just seen it but i'm going to get a little bit more specific with that shape error in the next video before we do that have a look at matrix multiplication there's a website my other favorite website i told you i've got two this is my other one matrix multiplication.xyz this is your challenge before the next video put in some random numbers here whatever you want 2 10 5 6 7 8 whatever you want change these around a bit three four oh that's a five not a four and then multiply and just watch what happens that's all i'd like you to do just watch what happens and we're going to replicate something like this in pi torch code in the next video i'll see you there welcome back in the last video we discussed a little bit more about matrix multiplication but we're not done there we looked at two of the main rules of matrix multiplication and we saw a few errors of what happens if those rules aren't satisfied particularly if the inner dimensions don't match so this is what i've been alluding to as one of the most common errors in deep learning and that is shape errors because neural networks are comprised of lots of matrix multiplication operations if you have some sort of tensor shape error somewhere in your neural network chances are you're going to get a shape error so now let's investigate how we can deal with those so let's create some tenses shapes for matrix multiplication and i also showed you the website sorry matrix multiplication.xyz i hope you had a go at typing in some numbers here and visualizing what happens because we're going to reproduce something very similar to what happens here but with pytorch code shapes for matrix multiplication we have tensor a let's create this as torch.tensor we're going to create a tensor with just the elements one two all the way up to let's just go to six hey that'll be enough six wonderful and then tensor b can be equal to a torch tensor of where we're going to go for this one let's go seven ten this would be a little bit confusing this one but then we'll go eight eleven and this will go up to twelve nine twelve so it's the same sort of sequence as what's going on here but they've been swapped around so we've got the vertical axis here instead of one two three four this is just seven eight nine ten eleven twelve but let's now try and perform a matrix 
multiplication. how do we do that? torch.matmul for matrix multiplication. p.s. torch also has torch.mm, which stands for matrix multiplication, a shortened version, so i'll just write that down here so that you know: torch.mm(tensor_A, tensor_B) is the same as torch.matmul, it's an alias for writing less code. that's literally how common matrix multiplications are in pytorch, they've made torch.mm an alias so you have to type four less characters than torch.matmul. but i like to write matmul because it explains what it does a little bit more than mm does. so what do you think is going to happen here? it's okay if you're not sure, but what you could probably do to find out is check the shapes of these: does this matrix multiplication satisfy the rules we just discussed, especially the main one, the inner dimensions must match? well, let's have a look. oh no, mat1 and mat2 shapes cannot be multiplied, 3x2 and 3x2. this is very similar to what we went through in the last video, but now we've got some actual numbers there. let's check the shapes: torch.Size([3, 2]) and torch.Size([3, 2]). now, in the last video we created random tensors and could adjust the shape on the fly, but these tensors already exist, so how might we adjust the shape of these? well, now i'm going to introduce you to another very common operation, or tensor manipulation, that you'll see, and that is the transpose. to fix our tensor shape issues, we can manipulate the shape of one of our tensors using a transpose, and i'll define it in words (we're going to see it in action anyway): a transpose switches the axes, or dimensions, of a given tensor. so let's see this. the way to do it is tensor_B.T, where .T stands for transpose, and let's look at the original tensor_B as well; that's a little bit hard to read, so we might put these on different lines with the original on top. so you see what's happened here: instead of the original tensor_B having 7, 8, 9, 10, 11, 12 down the vertical, the transpose has 7, 8, 9 across the horizontal and 10, 11, 12 down here. now if we get the shape of this, tensor_B.shape, and have a look... what's happened? oh no, we've still got 3, 2. ah, that's because i've missed the .T here, i've got a typo, excuse me. you think the code you've written is working, but then you realise you've got something as small as a .T missing and it throws off your whole train of thought, so you're seeing these errors get fixed on the fly here. now tensor_B with the transpose is this, but the original tensor_B's shape is still torch.Size([3, 2]).
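to recap where we're up to, here's a sketch of the setup and the error, using the same tensor names and values as above (the failing calls are commented out so the cell still runs):

```python
import torch

tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]])
tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]])

print(tensor_A.shape, tensor_B.shape)  # torch.Size([3, 2]) torch.Size([3, 2])

# both of these raise "mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)"
# because the inner dimensions (2 and 3) don't match
# torch.mm(tensor_A, tensor_B)      # torch.mm is the shorter spelling mentioned above
# torch.matmul(tensor_A, tensor_B)
```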
and if we try to matrix multiply 3, 2 and 3, 2, that is tensor_A and tensor_B, we get an error. why? because the inner dimensions do not match. but if we perform a transpose on tensor_B, we switch the dimensions around, so now we perform a transpose with tensor_B.T, t for transpose, and (this is the important point as well) we still have the same elements, it's just that they've been rearranged, they've been transposed. so tensor_B still has the same information encoded, just rearranged, and now we have torch.Size([2, 3]). so when we try to matrix multiply these, we satisfy the first criterion, and now look at the output of the matrix multiplication of tensor_A and tensor_B.T: it's 3 by 3, and that is because of the second rule of matrix multiplication, the resulting matrix has the shape of the outer dimensions, so a 3, 2 matrix multiplied with a 2, 3 results in a shape of 3, 3. so let's prettify some of this and print out what's going on, just so we can step through it, because right now we've got code all over the place a bit. let's see: the matrix multiplication operation works when tensor_B is transposed, and in a second i'm going to show you what this looks like visually, but right now we've done it with pytorch code, which might be a little confusing, and that's perfectly fine, matrix multiplication takes a little while and a little practice. so, original shapes: tensor_A.shape, and tensor_B is tensor_B.shape. and the reason why we're spending so much time on this is because, as you'll see as you get more and more into neural networks and deep learning, the matrix multiplication operation is one of the most, if not the most, common. then the new shapes: tensor_A is the same shape as above, because we haven't changed tensor_A, we've only changed tensor_B, or rather we've transposed it, and tensor_B.T is tensor_B.T.shape. wonderful. and then we print what we're multiplying. remember our motto of visualize, visualize, visualize? well, this is how i visualize things: the shapes, using the @ symbol for brevity, tensor_A.shape @ tensor_B.T.shape, and we'll put in our little rule here, inner dimensions must match. then we print the output on a new line: the output is going to equal (well, our output's already here, but we're going to rewrite it for a little bit of practice) torch.matmul(tensor_A, tensor_B.T), then print the output, and then finally, on a new line as well, the output shape. there's a fair bit going on here, but we're going to step through it and it's going to help us understand a little bit more about what's going on; that's the data visualizer's motto. there we go. okay, so the original shapes are what? torch.Size([3, 2]) and torch.Size([3, 2]).
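that cell boils down to something like the following sketch, using the same tensor names as above (the exact print formatting is my own):

```python
import torch

tensor_A = torch.tensor([[1, 2], [3, 4], [5, 6]])
tensor_B = torch.tensor([[7, 10], [8, 11], [9, 12]])

# .T rearranges the same elements into shape (2, 3), so the inner dimensions now match
print(f"Original shapes: tensor_A = {tensor_A.shape}, tensor_B = {tensor_B.shape}")
print(f"New shapes: tensor_A = {tensor_A.shape}, tensor_B.T = {tensor_B.T.shape}")
print(f"Multiplying: {tensor_A.shape} @ {tensor_B.T.shape} <- inner dimensions must match")

output = torch.matmul(tensor_A, tensor_B.T)
print(output)        # a 3x3 tensor whose first row is 27, 30, 33
print(output.shape)  # torch.Size([3, 3]) <- the shape of the outer dimensions
```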
the new shapes tensor a stays the same we haven't changed tensor a and then we have tensor b dot t is torch size 2 3 then we multiply a three by two by a two by three so the inner dimensions must match which is correct they do match two and two then we have an output of tensor uh 27 30 33 61 68 75 etc and the output shape is what the output shape is the outer dimensions three three now of course you could rearrange this maybe transpose tensor a instead of tensor b have a play around with it see if you can create some more errors trying to multiply these two and see what happens if you transpose tensor a instead of tensor b that's my challenge but before we finish this video how about we just recreate what we've done here with this cool website matrix multiplication so what did we have we had tensor a which is one to six let's recreate this remove that this is going to be one two three four five six and then we want to increase this and this is going to be 7 8 9 10 11 12. is that the right way of doing things so this is already transposed just to let you know so this is the equivalent of tensor b on the right here tensor b dot t so let me just show you if we go tensor b dot transpose which original version was that but we're just passing in the transpose version to our matrix multiplication website and then if we click multiply this is what's happening behind the scenes with our pie touch code of matmal we have 1 times 7 plus 2 times 10 did you see that little flippy thing that it did that's where the 27 comes from and then if we come down here what's our first element 27 when we matrix multiply them then if we do the same thing the next step we get 30 and 61 from a combination of these numbers do it again 33 68 95 from a combination of these numbers again and again and finally we end up with exactly what we have here so that's a little bit of practice for you to go through is to create some of your own tenses can be almost whatever you want and then try to matrix multiply them with different shapes see what happens when you transpose and what different values you get and if you'd like to visualize it you could write out something like this that really helps me understand matrix multiplication and then if you really want to visualize it you can go through this website and recreate your target tenses in something like this i'm not sure how long you can go but yeah that should be enough to get started so give that a try and i'll see you in the next video welcome back in the last few videos we've covered one of the most fundamental operations in neural networks and that is matrix multiplication but now it's time to move on and let's cover tensor aggregation and what i mean by that is finding the min max mean sum etc tensor aggregation of certain tensor values so for whatever reason you may want to find the minimum value of a tensor the maximum value the mean the sum what's going on there so let's have a look at some few pi torch methods that are inbuilt to do all of these and again if you're finding one of these values it's called tensor aggregation because you're going from what's typically a large amount of numbers to a small amount of numbers so the min of this tensor would be 27 so you're turning it from nine elements to one element hence aggregation so let's create a tensor create a tensor x equals torch dot let's use a range we'll create maybe uh 0 to 100 with a step of 10. 
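as a quick sketch of that setup (the variable name follows the cell just described), note the default integer data type, which is going to matter in a moment:

```python
import torch

x = torch.arange(0, 100, 10)  # start, end (exclusive), step
print(x)        # tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])
print(x.dtype)  # torch.int64 -> whole-number inputs default to a long tensor
```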
sounds good to me. and we can find the min by going, can we do torch.min? maybe we can, ah, yes, or we could also go x.min(). and then we can do the same to find the max, torch.max and x.max(). now how do you think we might get the average? so let's try it out, find the mean, torch.mean(x)... oops, is this going to work? what's happened? mean(): input data type should be either floating point or complex dtypes, got Long instead. haha, finally, i knew that error would show its face eventually. remember how i said it right up here, that some of the most common errors you're going to run into are tensors not in the right data type, not in the right shape (we've seen that with matrix multiplication) and not on the right device (we haven't seen that yet)? well, not the right data type, this is one of those times. so it turns out that the tensor we created, x, is of data type int64 (check x.dtype), which is long. and if we go to the torch.tensor documentation, that's where they're getting long from; we've seen long before, it's int64, a.k.a. LongTensor, that's what it's saying. and it turns out that the torch.mean function can't work on tensors with the long data type. so what can we do here? well, we can change the data type of x, so let's go torch.mean(x.type(torch.float32)). or, before we do that, if we go to the torch.mean documentation, is it going to tell us that it needs a certain dtype? it has "dtype (optional): the desired data type", but it doesn't say it has to be float32. ah, so this is another one of those little hidden things that you only really come across by writing code: sometimes the documentation doesn't explicitly tell you what dtype the input tensor should be, however we find out through this error message that it should be either a floating point or a complex dtype, not a long. so we can convert it to torch.float32, and all we've done is gone x.type(torch.float32). let's see what happens here: 45.
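pulling the aggregation calls from this section together, a sketch:

```python
import torch

x = torch.arange(0, 100, 10)

print(torch.min(x), x.min())  # tensor(0) tensor(0)
print(torch.max(x), x.max())  # tensor(90) tensor(90)

# torch.mean() needs a floating point (or complex) dtype, so the int64 tensor errors:
# torch.mean(x)  # RuntimeError mentioning Long
print(torch.mean(x.type(torch.float32)))  # tensor(45.)
```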
beautiful, we get 45. and then the same thing, can we do x.mean()? is that going to work as well? oh, same error, so if we go x.type(torch.float32) and get the mean of that, there we go. so that is (i knew it would come up eventually) a beautiful example of finding the right data type. let me just put a note here: note, the torch.mean() function requires a tensor of float32. so so far we've seen two of the major errors in pytorch, data type and shape issues. what's another one that we said? ah, sum. so find the sum: we want x.sum(), or maybe we just do torch.sum(x) first to keep it in line with what's going on above, and then x.sum(). which one of these should you use, torch.sum(x) or x.sum()? personally i tend to prefer the torch.method(x) style, like torch.max(x), but you'll probably also see me write it the other way at points; it really depends on what's going on. i would say pick whichever style you prefer, because behind the scenes they're calling the same methods, and stick with that throughout your code. for now let's leave it at that: tensor aggregation, finding the min, max, mean and sum. in the next video we're going to look at finding the positional min and max, which is also known as argmax and argmin, or vice versa. so actually, that's a little bit of a challenge for the next video: see how you can find the positional min and max of this tensor, and what i mean by that is, at which index does the max value occur, and at which index of this tensor does the min occur? you'll probably want to look into the methods torch.argmin and torch.argmax for that, but we'll cover it in the next video, i'll see you there. welcome back. in the last video we learned all about tensor aggregation: we found the min, the max, the mean and the sum, and we also ran into one of the most common issues in pytorch, and in deep learning and neural networks in general, and that was wrong data types. we solved that issue by converting, because some functions such as torch.mean require a specific data type as input, and we created our tensor here, which was by default torch.int64.
however torch.main requires torch.float32 we saw that in an error we fixed that by changing the type of the inputs i also issued you the challenge of finding finding the positional min and max and you might have found that you can use the arg min for the minimum let's remind ourselves of what x is x so this means at tensor index of tensor x if we find the argument that is the minimum value which is zero so at index zero we get the value zero so that's that zero there zero there this is an index value so this is what arg min stands for find the position in tensor that has the minimum value with argmin and then returns index position of target tensor where the minimum value occurs now let's just change x to start from one just so there we go so the argument is still position zero position zero so this is an index value and then if we index on x at the zeroth index we get one so the minimum value in x is one and then the maximum you might guess is find the position in tensor that has the maximum value with arg max and it's going to be the same thing except it'll be the maximum which is which position index nine so if we go zero one two three four five six seven eight nine and then if we index on x for the ninth element we get 91 beautiful now these two are useful for if yes you wanted to find the minimum of a tensor you can just use min but if you sometimes you don't want the actual minimum value you just want to know where it appears particularly with the arg max value this is helpful for when we use the softmax activation function later on now we haven't covered that yet so i'm not going to allude too much to it but just remember to find the positional min and max you can use argmin and arg max so that's all we need to cover with that let's keep going in the next video i'll see you then welcome back so we've covered a fair bit of ground and just to let you know i took a little break after going through all of these and i'd just like to show you how i get back to where i'm at because if we tried to just write x here and press shift and enter because our colab was disconnected it's now connecting because as soon as you press any button in colab it's going to reconnect it's going to try to connect initialize and then x is probably not going to be stored in memory anymore so there we go name x is not defined that's because the colab state gets reset if you take a break for a couple of hours this is to ensure google can keep providing resources for free and it deletes everything to ensure that there's no compute resources that are being wasted so to get back to here i'm just going to go restart and run all you don't necessarily have to restart the notebook you could also go do we have run all yeah we could do run before that'll run every cell before this we could run after we could run the selection which is this cell here i'm going to click run all which is just going to go through every single cell that we've coded above and run them all however it will also stop at the errors where i've left in on purpose so remember when we ran into a shape error well because this error we didn't fix it i left it there on purpose so that we could keep seeing a shape error it's going to stop at this cell so we're going to have to run every cell after the error cell so see how it's going to run these now they run fine and then we get right back to where we were which was x so that's just a little tidbit of how i get back into coding let's now cover reshaping stacking squeezing and unsqueezing you might be 
thinking squeezing and unsqueezing what are you talking about daniel well it's all to do with tenses and you're like are we gonna squeeze our tenses give them a hug are we gonna let them go by unsqueezing them well let's quickly define what these are so reshaping as we saw before one of the most common errors in machine learning and deep learning is shape mismatches with matrices because they have to satisfy certain rules so reshape reshapes and input tensor to a defined shape now we're just defining these things in words right now but we're going to see it in code in just a minute there's also view which is return a view of an input tensor of certain shape but keep the same memory as the original tensor so we'll see what view is in a second reshaping and view are quite similar but a view always shares the same memory as the original tensor it just shows you the same tensor but from a different perspective a different shape and then we have stacking which is combine multiple tensors on top of each other this is a v stack for vertical stack or side by side h stack let's see what different types of torch stacks there are again this is how i research different things if i wanted to learn something new i would search torch something stack concatenate a sequence of tenses along a new dimension okay so maybe not h stack or v stack we can just define what dimension we'd like to combine them on i wonder if there is a torch v stack torch v stack oh there is and is there a torch h stack for horizontal stack there is a h stack beautiful so we'll focus on just the plain stack if you want to have a look at v stack it'll be quite similar to what we're going to do with stack and same with h stack again this is just words for now we're going to see the code in a minute so there's also squeeze which removes all one dimensions i'm going to put one in code dimensions from a tensor we'll see what that looks like and then there's unsqueeze which adds a one dimension to our target tensor and then finally there's permute which is return a view of the input with dimensions permuted so swapped in a certain way so a fair few methods here but essentially the the crust of all of these the main point of all of these is to manipulate our tensors in some way to change their shape or change their dimension because again one of the number one issues in machine learning and deep learning is tensor shape issues so let's start off by creating a tensor and have a look at each of these let's create a tensor and then we're going to just import torch we don't have to but this will just enable us to run the notebook directly from this cell if we wanted to instead of having to run everything above here so let's create another x torch dot a range because range is deprecated i'm just going to add a few code cells here so that i can scroll and that's in the middle of the screen there beautiful so let's just make it between 1 and 10 nice and simple and then let's have a look at x and x dot shape what does this give us okay beautiful so we've got the numbers from one to nine our tensor is of shape torch size nine let's start with reshape so how about we add an extra dimension so then we have x reshaped equals x dot reshape now a key thing to keep in mind about the reshape is that the dimensions have to be compatible with the original dimensions so if we're going to change the shape of our original tensor with a reshape and we try to change it into the shape one seven does that work with the number nine well let's find out hey let's 
check out x reshaped and then we'll look at x reshaped dot shape what's this going to do oh why do we get an error there well it's telling us here this is what part twitch is actually really good at is giving us errors for what's going wrong we have one seven is invalid for input size of nine and why is that well we're trying to squeeze nine elements into a tensor of one times seven into seven elements but if we change this to nine what do we get ah so do you notice what just happened here we just added a single dimension see the single square bracket with the extra shape here what if we wanted to add two can we do that no we can't why is that well because 2 9 is invalid for input size 9 because 2 times 9 is what 18 so we're trying to double the amount of elements without having double the amount of elements so if we change this back to one what happens if we change these around nine one what does this do ah a little bit different there so now instead of adding one on the first dimension or the zeroth dimension because python is zero indexed we added it on the first dimension which is giving us a square bracket here if we go back so we add it to the outside here because we've put the one there and then if we wanted to add it on the inside we put the one on the outside there so then we've got the torch size nine one now let's try change the view change the view so just to reiterate the reshape has to be compatible with the original size so how about we change this to one to ten so we have a size of ten and then we can go five two what happens there ah it's compatible because five times two equals ten and then what's another way we could do this how about we make it up to 12. so we've got 12 elements and then we can go three four my code cell's taking a little while to run here then we'll go back to nine just so we've got the original there oops they're going to be incompatible ah so this is another thing this is good we're getting some errors on the fly here sometimes you'll get save failed with google colab and automatic saving failed what you can do to fix this is just either keep coding keep running some cells and collab will fix itself in the background or restart the notebook close it and open again so we've got size 9 of size 8 sorry incompatible but this is good you're seeing the errors that come up on the fly rather than me sort of just telling you what the errors are you're seeing them as they come up for me i'm trying to live code this and this is what's going to happen when you start to use google colab and subsequently other forms of jupiter notebooks but now let's get into the view so we can go zed equals let's change the view of x view we'll change it to 1 9 and then we'll go zed and then z dot shape ah we get the same thing here so view is quite similar to reshape remember though that a view shares the memory with the original tensor so z is just a different view of x so z shares the same memory as what x does so let's exemplify this so changing z changes x because a view of a tensor shares the same memory as the original input so let's just change z we'll change the first element by using indexing here so we're targeting one we'll set this to equal five and then we'll see what z and x equal yeah so see we've got z the first one here we change the first element the zeroth element to five and the same thing happens with x we change the first element of z so because z is a view of x the first element of x changes as well but let's keep going how about we stack some tensors on 
top of each other and we'll see what the stack function does in torch so stack tensors on top of each other and i'll just see if i press command s to save maybe we'll get this fixed or maybe it just will fix itself oh notebook is saved unless you've made some extensive changes that you're worried about losing you could just download this notebook so file download and upload it to colab but usually if you click yes it sort of resolves itself yeah there we go all changes saved so that's beautiful troubleshooting on the fly i like that so x stacked let's stack some tensors together equals torch stack let's go x x x because if we look at what the doc string of stack is will we get this in collab or we just go to the documentations yeah so list it takes a list of tenses and concatenates a sequence of tenses along a new dimension and we define the dimension the dimension by default is zero that's a little bit hard to read for me so tenses dm equals zero if we come into here the default dimension is zero let's see what happens when we play around with the dimension here so we've got four x's and the first one we'll just do it by default x stacked okay wonderful so they're stacked vertically let's see what happens if we change this to one ah they're rearranged a little and stacked like that what happens if we change it to two does it have a dimension two oh we can't do that well that's because the original shape of x is incompatible with using dimension two so the only real way to get used to what happens here by stacking them on top of each other is to play around with the different values for the dimension dim 0 dim1 they look a little bit different there now they're on top of each other and so the first 0th index is now the 0th tensor and then same with 2 being there 3 and so on but we'll leave it at the default and there's also v stack and h stack i'll leave that to you too to practice those but i think from memory v stack is using dimension equals zero or h stack is like using dimension equals one i may have those back the front you can correct me if i'm wrong there now let's move on we're going to now have a look at squeeze and unsqueeze so actually i'm going to get you to practice this so see if you can look up torch squeeze and torch unsqueeze and see if you can try them out we've created a tensor here we've used reshape and view and we've used stack the usage of squeeze and unsqueeze is quite similar so give that a go and to prevent this video from getting too long we'll do them together in the next video welcome back in the last video i issued the challenge of trying out torch dot squeeze which removes all single dimensions from a target tensor and how would you try that out well here's what i would have done i'd go to torch.squeeze and see what happens open up the documentation squeeze input dimension returns a tensor with all the dimensions of input size 1 removed and does it have some demonstrations yes it does wow ok so you could copy this in straight into a notebook copy it here but what i'd actually encourage you to do quite often is if you're looking up a new torch method you haven't used code all of the example by hand and then just practice what the inputs and outputs look like so x is the input here check the size of x squeeze x well set the squeeze of x to y check the size of y so let's replicate something similar to this we'll go into here we'll look at x reshaped and we'll remind ourselves of x reshaped dot shape and then how about we see what x reshaped dot squeeze looks 
like okay what happened here well we started with two square brackets and we started with a shape of 1 9 and removes all single dimensions from a target tensor and now if we call the squeeze method on x reshaped we only have one square bracket here so what do you think the shape of x reshaped dot squeeze is going to be and check the shape here it's just nine so that's the squeeze method removes all single dimensions if we had one one nine it would remove all of the ones so it would just end up being nine as well now let's write some print statements so we can have a little pretty output so previous tensor this is what i like to do this is a form of visualize visualize visualize if i'm trying to get my head around something i print out each successive change to see what's happening that way i can go oh okay so that's what it was there and then i called that line of code there yes it's a bit tedious but you do this half a dozen times a fair few times i mean i still do it a lot of the time even though i've written thousands of lines of machine learning code but it starts to become instinct after a while you start to go oh okay i've got a dimension mismatch on my tensors so i need to squeeze them before i put them into a certain function for a little while but with practice just like riding a bike right that that trite saying it's like when you first start you're all wobbly all over the place having to look up the documentation not that there's much documentation for riding a bike you just kind of keep trying but that's the style of coding i'd like you to adopt is to just try it first then if you're stuck go to the documentation look something up print it out like this what we're doing quite cumbersome but this is going to give us a good explanation for what's happening here's our previous tensor x reshaped and then if we look at the shape of x reshaped it's one nine and then if we call the squeeze method which removes all single dimensions from a target tensor we have the new tensor which is has one square bracket removed and the new shape is all single dimensions removed so it's still the original values but just a different dimension now let's do the same as what we've done here with unsqueeze so we've given our tensors a hug and squeezed out all the single dimensions of them now we're going to unsqueeze them we're going to take a step back and let them grow a bit so torch unsqueeze adds a single dimension to a target tensor at a specific dim dimension now that's another thing to note in pi torch whenever it says dim that's dimension as in this is the zeroth dimension first dimension and if there was more here we'd go two three four five six etc because why tensors can have unlimited dimensions so let's go previous target can be x squeezed so we'll get this squeezed version of our tensor which is x squeezed up here and then we'll go print the previous shape is going to be x squeezed dot shape and then we're going to add an extra dimension with unsqueeze then we go x unsqueezed equals x squeezed so our tensor before that we remove the single dimension and we're going to put in unsqueeze dim we'll do it on the zeroth dimension and i want you to have a think about what this is going to output even before we run the code just think about because we've added an extra dimension on the zeroth dimension what's the new shape of the unsqueezed tensor going to be so we're going to go x unsqueezed and then we're going to go print we'll get our new tensor shape which is going to be x unsqueezed dot 
shape all right let's have a look there we go so there's our previous tensor which is the squeezed version just as a single dimension here and then we have our new tensor which with the unsqueezed method on dimension zero we've added a square bracket on the zeroth dimension which is this one here now what do you think is going to happen if i change this to one where's the single dimension going to be added let's have a look ah so instead of adding the single dimension on the zeroth dimension we've added it on the first dimension here it's quite confusing because python is zeroth index so i kind of want to my brain's telling me to say first but it's really the zeroth index here or the zeroth dimension now let's change this back to zero but that's just another way of exploring things every time there's like a parameter that we have here dim equals something like that could be shape could be size whatever try changing the values that's what i'd encourage you to do and even write some print code like we've done here now there's one more we want to try out and that's permute so torch dot permute rearranges the dimensions of a target tensor in a specified order so if we wanted to check out let's get rid of some of these extra tabs torch dot permute let's have a look this one took me a little bit of practice to get used to because again working with zeroth dimensions even though it seems like the first one so returns a view okay so we know that a view shares the memory of the original input tensor with its dimensions permuted so permuted for me i didn't really know what that word meant i just have mapped in my own memory that permute means rearrange dimensions so the example here is we start with a random tensor we check the size and then we have torch permute we're going to swap the order of the dimensions so the second dimension is first the zeroth dimension is in the middle and the first dimension is here so these are dimension values so if we have torch rand n 2 3 5 2 0 1 has changed this one to be over here and then 0 1 is 2 3 and now 2 3 there so let's try something similar to this so one of the common places you'll be using permute or you might see permute being used is with images so there's a data specific data format we've kind of seen a little bit before not too much original equals torch dot rand size equals so an image tensor we could go height width color channels on the end so i'll just write this down so this is height width color channels remember much of and i'm going to spell color australian style much of deep learning is turning your data into numerical representations and this is quite common numerical representation of image data you have a tensor dimension for the height attentions dimension for the width and the tensor dimension for the color channels which is red green and blue because a certain number of red green and blue creates almost any color now if we want to permute this so permute the original tensor to re-arrange the axis or dimension axis or dimension are kind of used in the same light for tensors or dim order so let's switch the color channels to be the first or the zeroth dimension so instead of height with color channels it'll be color channel's height width how would we do that with permute let's give it a shot x permuted equals x original dot permute and we're going to take the second dimension because this takes a series of dims here so the second dimension is color channels remember zero one two so two we want two first then we want the height which is 
But that's enough talking about it, let's see what it looks like: we'll print the previous shape, x_original.shape, and the new shape, the permuted version, x_permuted.shape. Wonderful, this is exactly what we wanted, so let's write a little note here: this is now colour channels, height, width. The same data lives in both of these tensors, x_original and x_permuted; it's just viewed from a different point of view, because remember, permute returns a view, and what did we discuss? A view shares the same memory as the original tensor, so x_permuted shares the same place in memory as x_original even though it has a different shape. So a little challenge before you move on to the next video: try changing one of the values in x_original and see if that same value changes in x_permuted. We can index in to get a single value, say x_original[0, 0, 0] (getting some practice with indexing here), set that to some value of your choosing, and see if it changes in x_permuted. Give that a shot and I'll see you in the next video.

Welcome back. In the last video we covered squeezing, unsqueezing and permuting, which, I'm not going to lie, are quite a lot to take in, but just be aware of them and remember what they're working towards: they help us fix shape and dimension issues with our tensors, which is one of the most common issues in deep learning and neural networks. I also issued you the little challenge of changing a value of x_original, to highlight the fact that permute returns a different view of the original tensor, and a view in PyTorch shares memory with that original tensor. So if we change the value at [0, 0, 0] of x_original, in my case to 728218, the same value shows up in x_permuted. With that being said, we previously looked at selecting data from tensors, using a technique called indexing, so let's rehash that, because it's another thing that can be a bit of a hurdle when first working with multi-dimensional tensors. Indexing with PyTorch is similar to indexing with NumPy, if you've ever worked with NumPy and selected data from its arrays (NumPy uses the array as its main data type, PyTorch uses tensors, but it's very similar). So let's again start by creating a tensor. I'll add a few code cells so my screen lines up nicely, and we'll import torch again; we don't need to import it every time, it just means you can run the notebook from here later on. We'll create x = torch.arange(1, 10), nice and simple (that's how I like to work out the fundamentals too, create a small range and reshape it), and the reshape has to be compatible with the original number of elements, so we'll go reshape(1, 3, 3). Why does that work? Because torch.arange(1, 10) returns nine values, since it runs from the start up to the end minus one, and 1 times 3 times 3 is nine. Let's have a look at x and x.shape.
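As a tiny sketch, the tensor we just described would be created like this:

import torch

x = torch.arange(1, 10).reshape(1, 3, 3)
print(x)        # tensor([[[1, 2, 3], [4, 5, 6], [7, 8, 9]]])
print(x.shape)  # torch.Size([1, 3, 3])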
Beautiful. So we have one, two, three, four, five, six, seven, eight, nine in a tensor of shape (1, 3, 3): the outer bracket is the dimension of size 1 containing everything, and inside it there are three rows of three values each. Now let's index on our new tensor. x[0] indexes on the zeroth dimension, the outer bracket, so we get back everything that's inside it. Then let's index on the middle bracket, dimension one: we can go x[0][0], and is that the same as going x[0, 0]? It is, there we go, so it depends on which style you want to use; sometimes I prefer chaining the brackets so I know I'm getting the first bracket and then the zeroth element of it. Either way we get those three values, 1, 2, 3. Now what do you think happens if we index on the last dimension as well, the most inner bracket? We have x[0][0][0]. What number is this going to give us back? Well, if x[0] gives us back the whole inner tensor, and x[0][0] gives us back the zeroth row of that inner tensor, then x[0][0][0] gives us the zeroth element of that row, which is 1. A lot to take in, but we've just broken it down step by step: the first zero targets the outer bracket, the next zero targets the first row, and the last zero targets the first element. If we change the last index to 1 we get back 2, and if we change them all to 1, what do we get? This is a bit of trivia, or a challenge, so let's see what happens... oh no, did you catch that? We get "index 1 is out of bounds". Why is that? Because the zeroth dimension only has size 1, so we can only index it with 0. That's where it gets a little confusing: the size says 1, but the only valid index on that dimension is 0. But what if we do x[0][1][1]? That gives us 5, beautiful. So I'd like to issue you the challenge: how would you rearrange this code to get the number 9? Now I also just want to show you that you can use the colon to select all of a target dimension. Say we want all of the zeroth dimension but only the zeroth element of the next one: x[:, 0] gives us 1, 2, 3. And say we want all values of the zeroth and first dimensions but only index 1 of the second dimension: x[:, :, 1]. That was a mouthful, so let's break it down step by step and press shift-enter... we get 2, 5, 8. So we've kept all elements of the zeroth and first dimensions, and from each of the three inner rows we've taken the element at index 1 of the second dimension: 2 from the first row, 5 from the second and 8 from the third.
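Continuing the little sketch from above, the indexing we just walked through looks like this:

print(x[0])        # tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(x[0][0])     # tensor([1, 2, 3]) -- equivalent to x[0, 0]
print(x[0][0][0])  # tensor(1)
print(x[0][1][1])  # tensor(5)
print(x[:, 0])     # tensor([[1, 2, 3]]) -- ":" selects all of dimension 0
print(x[:, :, 1])  # tensor([[2, 5, 8]])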
It's quite confusing at first, but with some practice you can figure out how to select almost any numbers you want from any kind of tensor you have. So now let's try again: get all values of the zeroth dimension, but only the index-1 value of the first and second dimensions. What might this look like? We go x[:, 1, 1], because the colon comes first for all values of the zeroth dimension, then we want only index 1 of the first dimension and only index 1 of the second. What does that give us? 5, but we've selected it wrapped in square brackets. So really this line of code returns the same value as the line above, except we get the square brackets on the outside because the colon keeps that zeroth dimension; if we change the colon to a 0, the extra bracket goes away. Something to keep in mind. Finally, let's do one more: get index 0 of the zeroth and first dimensions and all values of the second dimension, so x[0, 0, :]. What have we just done? We get tensor([1, 2, 3]), lovely. This code is equivalent to what we did up above with x[0][0], but this line explicitly says, hey, give us all the values on the remaining dimension. So my challenge for you is to take this tensor and index on it to return 9, and also to index on it to return 3, 6, 9. Give those both a go and I'll see you in the next video.

Welcome back. How'd you go, did you give the challenge a go? I finished the last video by asking you to index on x to return 9, and to index on x to return 3, 6, 9. Here's what I came up with; there are a few different ways you could approach both of these, but this is just what I found. Because x has shape (1, 3, 3), if we want to select 9 we need a 0 to get inside the outer bracket, then a 2 to select the bottom row, and then a final 2 to select the last element of that row. And for 3, 6, 9 we need all of the elements in the zeroth dimension, all of the elements in the first dimension, and then index 2 on the last dimension, which picks out 3, 6 and 9. So that's how I'd practice indexing: start with whatever shape tensor you like, create it something like this, and then see how you can write different indexing to select whatever numbers you pick.
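Those remaining examples and the two challenge answers, as a sketch:

print(x[:, 1, 1])  # tensor([5]) -- same value as x[0, 1, 1], just with the extra dimension kept
print(x[0, 0, :])  # tensor([1, 2, 3]) -- same as x[0][0]
print(x[0, 2, 2])  # tensor(9) -- one way to return 9
print(x[:, :, 2])  # tensor([[3, 6, 9]]) -- one way to return 3, 6, 9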
Now let's move on to the next part, which is PyTorch tensors and NumPy. NumPy is a very popular scientific Python numerical computing library (that's a bit of a mouthful), and because of this, PyTorch has functionality to interact with it. Quite often your data might start out represented in a NumPy array, but you want it in a PyTorch tensor, because you want to leverage PyTorch's deep learning capabilities. The method to do this is torch.from_numpy, which takes in an ndarray (NumPy's main data type) and changes it into a torch tensor; we'll see this in a second. And if you want to go the other way, from a PyTorch tensor to NumPy, say because you want to use some NumPy method, you can call .numpy() on the tensor. But this is all just talking in words, let's see it in action, starting with NumPy array to tensor. We'll import torch, so we can run this cell on its own, and import numpy as np, the common naming convention for NumPy. We'll create an array, np.arange(1.0, 8.0), and then go tensor = torch.from_numpy(array), because we want to go from a NumPy array to a torch tensor. Then we have our array and our tensor with the same data, wonderful. But what you might notice is that the dtype of the tensor is torch.float64. Why is this? It's because NumPy's default data type is float64, whereas PyTorch's default data type is float32: if we were to create torch.arange(1.0, 8.0), by default PyTorch would create it in float32. So just be aware that when you go from NumPy to PyTorch, the tensor reflects NumPy's float64 data type unless you specify otherwise. I wonder if from_numpy takes a dtype argument, dtype=torch.float32... no, it takes no keyword like that. So how could we change the data type? We can go .type(torch.float32), and yes, that gives us a tensor with dtype float32 instead of float64, beautiful. I'll keep that there as a warning: when converting from NumPy to PyTorch, PyTorch reflects NumPy's default data type of float64 unless specified otherwise, because, as we've discussed, when you're trying to perform certain calculations you might run into a data type issue, so you might need to convert from float64 to float32. Now, what do you think will happen if we change the value of the array? The question is, what will this do to the tensor, because we've used the from_numpy method? Let's try array = array + 1, so we're adding one to every value in the array. Aha, the array now goes 2 through 8, but the tensor still holds 1 through 7. That's because array + 1 creates a brand new array in memory and reassigns the name array to it, so the tensor keeps pointing at the original values (worth knowing: from_numpy actually shares memory with the original array, so modifying the array in place would show up in the tensor, but reassigning like this doesn't).
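A compact sketch of that NumPy-to-tensor conversion (nothing here beyond what was just described):

import torch
import numpy as np

array = np.arange(1.0, 8.0)               # NumPy defaults to float64
tensor = torch.from_numpy(array)          # dtype stays float64
tensor_float32 = torch.from_numpy(array).type(torch.float32)
print(array.dtype, tensor.dtype, tensor_float32.dtype)  # float64 torch.float64 torch.float32

array = array + 1                         # builds a new array, so tensor keeps the old values
print(array, tensor)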
So now let's go the other way, from a PyTorch tensor to a NumPy array. We'll start with a tensor; we could use the one we have right now, but let's create one of ones, just for fun (one rhymes with fun): tensor = torch.ones(7). How do we get to NumPy? We simply call .numpy() on it: numpy_tensor = tensor.numpy(). Now, what data type do you think numpy_tensor will have? PyTorch's default data type is float32, so if we check numpy_tensor.dtype, it's float32: the NumPy array reflects whatever dtype you set the tensor as. So just keep that in mind when going between PyTorch and NumPy: the default data type of NumPy is float64 whereas the default data type of PyTorch is float32, and that mismatch can cause errors in certain kinds of calculations. And what do you think happens if we change the tensor? tensor = tensor + 1, and then we look at numpy_tensor (oh, we'll print the tensor as well). Our tensor is now all twos, because we added one to the ones, but the NumPy array remains unchanged, because tensor + 1 creates a new tensor rather than modifying the one the array was created from. So that's how we go between PyTorch and NumPy. If you'd like to look up more, search "pytorch and numpy": there's a warm-up NumPy tutorial for beginners on the PyTorch site, and because NumPy is so prevalent, the two work pretty well together, so have a look at that. There are a few more links I'd encourage you to check out, but we've covered the main things you'll see in practice. With that being said, let's now jump into the next video, where we're going to look at the concept of reproducibility. If you'd like to look that up ahead of time, search "pytorch reproducibility" and see what you can find; otherwise I'll see you in the next video.
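And the tensor-to-NumPy direction, sketched:

tensor = torch.ones(7)            # PyTorch default dtype: float32
numpy_tensor = tensor.numpy()     # the NumPy array reflects that dtype
print(numpy_tensor.dtype)         # float32

tensor = tensor + 1               # creates a new tensor, so numpy_tensor is unchanged
print(tensor, numpy_tensor)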
Welcome back. It's now time for us to cover the topic of reproducibility (if I could even spell it, that would be fantastic): trying to take the random out of random. We've touched on the concept of neural networks harnessing the power of randomness; what I mean by that is, we haven't actually built our own neural network yet, but we will be, and we've created tensors full of random values. In short, how a neural network learns is: start with random numbers, perform tensor operations, update those random numbers to try and make them better representations of the data, and do that again and again and again. However, if you're trying to do reproducible experiments, sometimes you don't want so much randomness. From what we've seen so far, every time we create a random tensor, say torch.rand(3, 3), and run the cell, it gives us a whole bunch of new numbers every single time. So what if you were trying to share this notebook with a friend? Say you clicked the share link and sent it to someone, saying hey, try out this machine learning experiment I did, and you wanted a little less randomness, because neural networks start with random numbers. How might you do that? Well, to reduce the randomness in neural networks and PyTorch comes the concept of a random seed. We're going to see this in action, but essentially what the random seed does is flavour the randomness. Because of how computers work, this isn't true randomness (there are arguments about that, and it's quite a big debate in computer science, but I'm not a computer scientist, I'm a machine learning engineer): computers are fundamentally deterministic, they run the same steps over and over again, so what we're generating here is referred to as pseudorandomness, generated randomness, and the random seed, which you'll see a lot in machine learning experiments, flavours that randomness. Let's see it in practice, and at the end of this video I'll give you two resources I'd recommend to learn a little more about pseudorandomness and reproducibility in PyTorch. We'll start by importing torch, so you could start this notebook right from here, and create two random tensors: random_tensor_A = torch.rand(3, 4) and random_tensor_B = torch.rand(3, 4), the same size. Then we'll print random_tensor_A, print random_tensor_B, and print whether they're equal anywhere: random_tensor_A == random_tensor_B. What do you think that's going to do? If we check 1 == 1 it returns True, so this is a comparison operator that compares the two tensors. We're creating two random tensors and we'd expect them to be full of random values, so do you think any of the values in each of these tensors will be equal to each other? There's a chance, but it's highly unlikely; I'd be quite surprised. (Oh, and my connection might be dropping out again: you might see an "automatic save failed" message if your internet connection is flaky. Usually this resolves itself if you try a few times or just keep coding, and if it really doesn't, you can go File, download the notebook or save a copy in Drive, save it to your local machine, re-upload it and start again in another Google Colab instance. There we go, it fixed itself, wonderful, troubleshooting on the fly.) So we have tensor A of shape (3, 4) with random numbers and tensor B of shape (3, 4) with random numbers, and if I were to share this notebook with my friend or colleague, or even you, and you ran this cell, you'd get your own random numbers too, with every chance of not replicating any of these.
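A sketch of the comparison we just ran (no seed set yet, so the two tensors will almost certainly differ everywhere):

import torch

random_tensor_A = torch.rand(3, 4)
random_tensor_B = torch.rand(3, 4)
print(random_tensor_A)
print(random_tensor_B)
print(random_tensor_A == random_tensor_B)  # expect a tensor of False values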
The way we make these reproducible is through the concept of a random seed, so let's have a look at that and make some random but reproducible tensors. We'll import torch and set the random seed by going torch.manual_seed. You set the random seed to some numerical value: 42 is a common one, you might also see 0, or 1, 2, 3, 4, or 77, or 100, and you can think of each of these as a different flavour of randomness; essentially you can set it to whatever you want, but I like to use 42 because it's the answer to the universe. So RANDOM_SEED = 42, torch.manual_seed(RANDOM_SEED), and now let's create some random tensors with the flavour of our random seed: random_tensor_C = torch.rand(3, 4) and random_tensor_D = torch.rand(3, 4). We'll print random_tensor_C, print random_tensor_D, and then print random_tensor_C == random_tensor_D to see if they're equal anywhere. Let's find out what happens... huh, what gives? We still got different random numbers, even though we set the random seed and told PyTorch, hey, flavour our randomness with 42. Hmm, let's try setting the manual seed each time we call a random method... ah, much better, now we've got some flavoured randomness and the two tensors match. So a thing to keep in mind is that torch.manual_seed generally only applies to the next block of random operations in a notebook: if you're creating random tensors one after the other with assignments like this, you should call torch.manual_seed each time before you call torch.rand, or whatever source of randomness you're using. Usually you'll see torch.manual_seed set once right at the start of a cell with a whole bunch of code after it, but because we're calling subsequent random methods here, we have to reset the seed; otherwise, if we comment out that second call, random_tensor_C gets the randomness flavoured by the manual seed, but random_tensor_D gets no flavour, no seed at all. I wonder, does torch.rand have a seed argument? Sometimes these methods do... no, it doesn't, okay, that's all right, the more you learn; there's the documentation for torch.rand. So the random seed (in torch it's called a manual seed) is a way to flavour the randomness: as you can see, the numbers still look quite random, but the seed makes them reproducible, so if I shared this with you and you ran this block of code, ideally you'd get the same numerical output.
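Here's what that looks like as a runnable sketch:

import torch

RANDOM_SEED = 42
torch.manual_seed(RANDOM_SEED)
random_tensor_C = torch.rand(3, 4)

torch.manual_seed(RANDOM_SEED)  # reset the seed before the next random call
random_tensor_D = torch.rand(3, 4)

print(random_tensor_C == random_tensor_D)  # a tensor of True values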
With that being said, I'd like to refer you to the PyTorch reproducibility documentation, because we've only scratched the surface here; we've covered one of the main techniques, but that page is a great guide on how to go about reproducibility in PyTorch, so that's your extra curriculum for this section. Even if you don't understand everything going on in the code there, just be aware of reproducibility, because it's an important topic in machine learning and deep learning. I'll note that down under extra resources for reproducibility, along with the Wikipedia page on random seeds: a random seed initialises a pseudorandom number generator (that's a big phrase), and it's quite a universal concept, not just for PyTorch; there's a random seed in NumPy as well, so if you'd like to learn more about random number generation in computing and what a random seed does, check out those two resources. Whoa, far out, we have covered a lot, but there are a couple more topics you should really be aware of to finish off the PyTorch fundamentals. You've got this, I'll see you in the next video.

Welcome back. Now let's talk about the important concept of running tensors and PyTorch objects on GPUs, and making faster computations. We've discussed that GPUs mean faster computation on numbers, thanks to CUDA plus NVIDIA hardware plus PyTorch working behind the scenes to make everything hunky-dory (that means good, by the way, if you've never heard that phrase before). So let's have a look at how we do this. First we need to talk about getting a GPU, and there are a few different ways. Number one, the easiest, is to use what we're using right now: Google Colab, which gives you a free GPU. There's also Colab Pro, which is what I use because I use Colab almost every day, so I pay for it: it gives access to faster GPUs, which means less time waiting while your code runs, more memory and longer runtimes (so it lasts a bit longer if you leave it running idle). There's also Colab Pro+, which is a step up again with more advantages, though I personally haven't had a need for it. You can complete this whole course on the free tier, but as you start to code more, run bigger models and want more compute, you might want to look into upgrading. Number two is to use your own GPU; this takes a little bit of setup and requires the investment of purchasing a GPU. There are lots of options, and one of my favourite posts for figuring out which to get is Tim Dettmers' "Which GPUs to get for deep learning" (I believe it's been updated since this video was recorded, so don't take my word for the specifics, but it's a fantastic blog post, so see that post for what option to get). Number three is to use cloud computing, such as GCP (Google Cloud Platform), AWS (Amazon Web Services) or Azure (Microsoft's offering); these services allow you to rent computers on the cloud and access them. The first option, Google Colab, is by far the easiest and it's free, so there are big advantages there; the downside is that you have to work through their website, you can't run it locally, and you don't get the full flexibility of cloud computing. My personal workflow is to run basically all of my small-scale experiments, and things like learning new stuff, in Google Colab, and then if I want to run bigger experiments I have my own dedicated deep learning PC, which I built with a big powerful GPU, and I also use cloud computing if necessary.
So that's my workflow: start with Google Colab, and then those other two options if I need to run larger experiments. Because this is a beginner course we can just stick with Google Colab for the time being, but I thought I'd make you aware of the other two options. For options two and three, setting up PyTorch plus the GPU drivers (CUDA) takes a little bit of work, so refer to the PyTorch setup documentation: if we go to pytorch.org there are some great setup guides under Get Started. "Start Locally" is what you want if you run on your own machine (for example my Linux setup uses the Linux, CUDA 11.3 option, which gives you a conda install command), and there's also a Cloud Partners section covering Alibaba Cloud, Amazon Web Services and Google Cloud Platform, so I'll just link that in here. But for this course we're focusing on Google Colab, so now let's see how we get a GPU in Colab. We've already covered this, but I'm going to recover it so you know: in any notebook you can go Runtime, Change runtime type, set Hardware accelerator to GPU and click Save. This is going to restart our runtime and connect us to a runtime (a Google Compute instance) with a GPU, and now if we run !nvidia-smi, I can see I have a Tesla P100 GPU (the GPU, not the Tesla car). That's quite a powerful GPU, and it's because I've upgraded to Colab Pro; if you're not using Colab Pro you might get something like a Tesla K80, which is a slightly less powerful GPU than a Tesla P100, but still a GPU nonetheless, and it will still run PyTorch code faster than the pure CPU, which is the default in Google Colab and the default in PyTorch. Now we can also check whether we have GPU access with PyTorch itself. By the way, another thing Colab sets up nicely is that all the connections between PyTorch and the NVIDIA GPU are configured for us, whereas when you set up your own GPU, or use cloud computing, there are a few steps you have to go through which we're not going to cover in this course; I'd highly recommend going through the Get Started Locally guide if you want to connect PyTorch to your own GPU. So let's check for GPU access with PyTorch, which is another advantage of using Google Colab: almost zero setup to get started. We import torch and run torch.cuda.is_available(), and remember, CUDA is NVIDIA's programming interface that allows us to use GPUs for numerical computing. There we go, True, beautiful: a big advantage of Google Colab is that we get access to a free GPU (in my case I'm paying for a faster one, but you're more than welcome to use the free version, it'll just be slightly slower). There's one more thing, known as device-agnostic code. This is an important concept in PyTorch, because wherever you run PyTorch you might not always have access to a GPU, but when a GPU is available, you'd like your code to use it. One of the ways this is done in PyTorch is to set up a device variable.
Really, you could call this variable anything you want, but you're going to see it called device quite often: device = "cuda" if torch.cuda.is_available() else "cpu". All this says (and we'll see where we use the device variable later on) is: set the device to use CUDA if it's available, which it is, and if it's not available, if we don't have access to a GPU that PyTorch can use, just default to the CPU. With that being said, there's one more thing: you can also count the number of GPUs with torch.cuda.device_count(). This won't really apply to us for now, because we're just going to stick with using one GPU, but as you upgrade your PyTorch and machine learning experiments you might have access to more than one; here we have access to one GPU. The reason you might want to count the number of devices is that if you're running huge models on large datasets, you might want to run one model on a certain GPU, another model on another GPU, and so on. One final thing before we finish this video: if you search "pytorch device agnostic code", the CUDA semantics page has a little section called best practices, which is basically what we just covered, setting the device. It shows it using args.device, torch.device("cuda") or torch.device("cpu"), which is one way to set it from Python arguments when you're running scripts, but we're using the notebook version, so check this out and I'll just link it here as "device agnostic code". It's okay if you're not sure what's going on there, we're going to cover it a bit more later on throughout the course, but the point is right here: since PyTorch is capable of running compute on the GPU or the CPU, it's best practice to set up device-agnostic code, e.g. run on the GPU if it's available, else default to the CPU. So check out the best practices for using CUDA, which are mainly about setting up device-agnostic code, and in the next video let's see what I mean about setting our PyTorch tensors and objects to the target device.
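As a small sketch, the device-agnostic setup and the device count check look like this:

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)                     # "cuda" on a GPU runtime, otherwise "cpu"
print(torch.cuda.device_count())  # number of GPUs PyTorch can see (1 on a standard Colab GPU runtime)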
Welcome back. In the last video we checked out a few different options for getting a GPU, and then getting PyTorch to run on the GPU. For now we're using Google Colab, which is the easiest way to get set up, because it gives us free access to a GPU (faster ones if you set up Colab Pro), and it comes with PyTorch automatically configured to use the GPU if it's available. So now let's see how we actually use the GPU, by putting tensors and models on it. The reason we want our tensors and models on the GPU is that using a GPU results in faster computations: if we're getting our machine learning models to find patterns in numbers, GPUs are great at numerical calculations, and the numerical calculations we're doing are tensor operations like the ones we saw above (if you scroll up: manipulating tensors, tensor operations, there we go). If we can run those computations faster, we can discover patterns in our data faster, do more experiments, and work towards finding the best possible model for whatever problem we're working on. So let's create a tensor as usual; the default is on the CPU: tensor = torch.tensor([1, 2, 3]), and I'll write "tensor not on GPU" and print it out, and here we can use that device parameter we saw before, device="cpu". If we print it, tensor([1, 2, 3]) is on the CPU, but even if we got rid of the device parameter, by default it's going to be on the CPU, wonderful. Now, PyTorch makes it quite easy to move things to (and I'm saying "to" for a reason) the GPU, or, even better, to the target device: if the GPU is available we use "cuda", if it's not we use "cpu", which is why we set up the device variable. So, move the tensor to the GPU if available: tensor_on_gpu = tensor.to(device). Let's have a look at this... look at that, our tensor([1, 2, 3]) is now on device='cuda:0'. The 0 is the index of the GPU we're using; because we only have one, it's at index zero. Later on, when you start to do bigger experiments and work with multiple GPUs, you might have different tensors stored on different GPUs, but for now we're sticking with one GPU, keeping it nice and simple. And actually, this is the reason we set up device-agnostic code: this line works regardless of whether we have a GPU or not, it won't error out, it just moves the tensor to whatever target device is available; since we have a GPU available, it goes there. You'll see this a lot: the .to() method moves tensors, and it can also be used for models, which we'll see later on, so just keep .to(device) in mind. Then, for some computations, such as using NumPy, you might want to move tensors back to the CPU, because NumPy only works with the CPU. Can you guess how we might do that? It's okay if you don't know, we haven't covered it, but I'm going to challenge you anyway, because that's the fun part of thinking about something. Let's write it down: if a tensor is on the GPU, we can't transform it to NumPy. So let's see what happens if we take our tensor_on_gpu and try to call .numpy() on it: we get an error, and this is another huge one. Remember the top three errors in deep learning and PyTorch (there are lots of them, but): number one, shape errors; number two, data type issues; and number three, device issues. The error says "can't convert cuda:0 device type tensor to numpy... use Tensor.cpu() to copy the tensor to host memory first", so NumPy doesn't work with the GPU, but if we call .cpu() it brings our target tensor back to the CPU, and then we should be able to use it with NumPy. So to fix the GPU-tensor-with-NumPy issue, we first put it back on the CPU: tensor_back_on_cpu = tensor_on_gpu.cpu().numpy(), just taking what the error message said (that's the beautiful thing about PyTorch: very helpful error messages). And... of course it didn't run, because I typed it wrong, and I've typed it wrong again, twice; third time's a charm, there we go. That works because we've put it back on the CPU first, before calling .numpy(), and if we refer back to our tensor on the GPU, because we assigned the result to tensor_back_on_cpu (typos galore, classic), tensor_on_gpu itself remains unchanged. So those are the main things about working with PyTorch on the GPU.
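Putting those pieces together as a sketch (this assumes the device variable from the previous cell):

tensor = torch.tensor([1, 2, 3])  # lives on the CPU by default
tensor_on_gpu = tensor.to(device)  # moves to "cuda" if it's available
print(tensor_on_gpu)               # tensor([1, 2, 3], device='cuda:0') on a GPU runtime

# NumPy only works with CPU memory, so copy the tensor back first
tensor_back_on_cpu = tensor_on_gpu.cpu().numpy()
print(tensor_back_on_cpu)          # array([1, 2, 3])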
There are a few more tidbits, such as working with multiple GPUs, but now you've got the fundamentals; we're going to stick with using one GPU, and if you'd like to research multiple-GPU setups later on, once you've learned a bit more, well, as you might have guessed, PyTorch has functionality for that too. So have a go at getting access to a GPU in Colab, check to see if it's available, set up device-agnostic code, create a few dummy tensors and set them to different devices, see what happens if you change the device parameter, trigger a few errors by trying to do some NumPy calculations with tensors on the GPU, and then bring those GPU tensors back to the CPU and over to NumPy and see what happens. So I think we've reached the end of the fundamentals. We've covered a fair bit: an introduction to tensors, the min, max and a whole bunch of other operations, finding the positional min and max, reshaping, indexing, working with tensors and NumPy, reproducibility, using a GPU, and moving tensors back to the CPU. Far out. Now you're probably wondering, Daniel, we've covered a whole bunch, what should I do to practice all this? Well, I'm glad you asked, let's cover that in the next video.

Welcome back, and you should be very proud of yourself right now: we've been through a lot, but we've covered a whole bunch of PyTorch fundamentals, and these are going to be the building blocks we use throughout the rest of the course. Before moving on to the next section, I'd encourage you to try out what you've learned through the exercises and extra curriculum. I've set up a few exercises based on everything we've covered: if you go to learnpytorch.io and go to the section we're currently working on (this is going to be the case for every section, by the way), which right now is the PyTorch Fundamentals notebook, and scroll down to the table of contents, at the bottom of each one there are some exercises and extra curriculum. For example, exercise number one is documentation reading, because you've seen me refer to the PyTorch documentation for almost everything and it's important to become familiar with it; exercise number two is to create a random tensor with shape (7, 7); exercise three is to perform a matrix multiplication on the tensor from two with another random tensor; and so on. These exercises are all based on what we've covered here, so I'd encourage you to reference whichever notebook you choose, whether that's learnpytorch.io or the one we've just coded together in the videos, and I'm going to link the exercises here, "see exercises for this notebook". So how should you approach them? One way would be to read them on one side of the screen, and then in Colab go File, New notebook, wait for the notebook to load, call it 00_pytorch_exercises or something like that, start by importing torch and away you go. For me, I'd probably set the exercises up on one side of the screen and the new notebook on the other. For number one I'm not going to write much code: you could have a heading for documentation reading and spend ten minutes or so reading through torch.Tensor. And then for the other ones, like create a random tensor with shape (7, 7), we can comment out the question and write torch.rand(7, 7), and there we go.
Some are as easy as that, and some are a little more complex; as we go through the course, the exercises will get a little more in-depth as we've learned more. If you'd like an exercise template, you can come back to the GitHub repo, which is the home for all of the course materials, and go into extras and then exercises: I've created a template for each set of exercises, so if you open the PyTorch fundamentals exercises notebook you'll see all of the questions as headings, such as create a random tensor with shape (7, 7). If you'd like to open it in Colab and work on it, how can you do that? Well, you can copy the link, come to Google Colab, go File, Open notebook, choose the GitHub tab, paste in the link and click search, and what's that going to do? Boom, the PyTorch fundamentals exercises notebook opens up, so now you can go through all of the exercises. This will be the same for every module of the course, so you can test your knowledge. Now, it is open book: you can use the notebooks we've coded together, but I'd encourage you to try these things on your own first; if you get stuck you can always reference back. And if you'd like to see example solutions, you can go back to extras, where there's a solutions folder as well, and that's where the fundamentals exercise solutions live, but again, I'd encourage you to at least give the exercises a go before having a look at the solutions. So keep that in mind: at the end of every module there are exercises and extra curriculum; the exercises are code-based, and the extra curriculum is usually reading-based, such as spending one hour going through the PyTorch basics tutorial (I'd recommend the quickstart and tensor sections), and finally, to learn more about how a tensor can represent data, watching the video "What's a tensor?", which we've referred to throughout. Massive effort on finishing the PyTorch fundamentals section, I'll see you in the next section.

Friends, welcome back, this is the PyTorch workflow module. Let's have a look at what we're going to get into. This is a PyTorch workflow, and I say "a" (how cool is that animation) because it's one of many: when you get into deep learning and machine learning you'll find there are a fair few ways to do things, but here's the rough outline of what we're going to do. We're going to get our data ready and turn it into tensors, because remember, a tensor can represent almost any kind of data; we're going to build a model, or pick a pre-trained model; we'll pick a loss function and an optimizer (don't worry if you don't know what they are, we're going to cover it); we're going to build a training loop and fit the model to the data we have; we'll learn how to evaluate our models; we'll see how we can improve them through experimentation; and we'll save and reload our trained model, so if you wanted to export your model from a notebook and use it somewhere else, this is what you'd want to be doing. And where can you get help? Probably the most important thing is to follow along with the code, because we'll be coding all of this together, and remember motto number one: if in doubt, run the code. Try it for yourself; that's how I learn best, I write code, I try it, I get it wrong, I try again, and keep going until I get it right. Read the docstrings, because that's going to show you documentation about the functions we're using; on a Mac you can use Shift + Command + Space in Google Colab, and if you're on a Windows PC it's likely Control instead of Command.
If you're still stuck, try searching for it: you'll probably come across resources such as Stack Overflow or the PyTorch documentation, which we've already seen a whole bunch and are probably going to see a lot more throughout this entire course, because it's the ground truth for everything PyTorch. Then try again, and finally, if you're still stuck, ask a question. The best place to ask is the Discussions tab of the course GitHub repo, mrdbourke/pytorch-deep-learning, which holds all the course materials and is your ground truth for the entire course: go to the Discussions tab, click new discussion, and you can ask a question there, and please don't forget to include the video and the code you're trying to run, so we can reference what's going on and help you out. Also don't forget there's the book version of the course at learnpytorch.io; by the time you watch this video it'll probably have all the chapters there, and it's what these videos are based on, we're going to go through all of it (how fun is that), but it's reference material you can read in your own time; we're going to focus on coding together. And speaking of coding, let's code, I'll see you over at Google Colab.

Alrighty, let's get hands-on with some code. I'm going to come over to colab.research.google.com (you may already have that bookmarked) and start a new notebook, because we're going to do everything from scratch here. We'll let this load up, I'm just going to zoom in a little bit, beautiful, and I'm going to title this 01 PyTorch Workflow, with "video" on the end of the name so you know this notebook is from the video. Why is that? Because in the course resources we have the original notebook, which is what this video notebook is based on; you can refer to that notebook as a reference for what we're going to go through, it's got a lot of pictures and beautiful text annotations, whereas in the videos we're focused on the code. And then of course there's the book version of the notebook as well, which is just a differently formatted version of the exact same notebook, so I'm going to link both of those up here. Let's write in here: PyTorch Workflow, let's explore an example PyTorch end-to-end workflow, and then I'm going to put the resources: the ground truth notebook, the book version of the notebook, and where to ask a question, which is the discussions page. We'll turn this cell into markdown, and let's get started, let's just jump right in. This is the trend I want to start moving towards: rather than spending a whole bunch of time going through keynotes and slides, I'd rather we just code together and explain different things as they need to be explained, because that's what you're going to be doing if you end up writing a lot of PyTorch: writing code and then looking things up as you go. I'll get out of these extra tabs, I don't think we need them, just these two will be the most important. So, what we're covering: let's create a little dictionary so we can check it later on if we want to, referring to our example PyTorch workflow.
We're going to go through all six of these steps, maybe just a little bit of each one, to see what it looks like going from data all the way through to a working model, and then through the rest of the course we'll really dig deep into each of them. So, what we're covering: number one is data, preparing and loading; number two is how to build a machine learning model (a deep learning model) in PyTorch; number three is fitting the model to the data, which is called training (fit is another word for it; as I said, in machine learning there are a lot of different names for similar things, which is kind of confusing, but you'll pick it up with time); number four, once we've trained a model, is making predictions and evaluating those predictions, in other words evaluating a model (making predictions is often referred to as inference; I typically say making predictions, but inference is another very common term); number five is how we can save and load a model; and number six is putting it all together. So it's a little different from the visual version of the PyTorch workflow: we'll focus on "improve through experimentation" more in-depth later on, and for now just get data ready, build a model, fit the model, evaluate it, save and reload it, though I'll hint at different things you can do for experimentation while we're working through this workflow. Let's put that in the dictionary, and then if we want to refer to it later we can just look up "what we're covering" (oh, this is going to connect, of course, beautiful). We're going to start by importing torch, getting PyTorch ready to go, and also import torch.nn as nn. I'll write a note here, because we haven't seen this one before (we're going to see a fair few things we haven't seen, but that's okay, we'll explain them as we go): nn contains all of PyTorch's building blocks for neural networks. And how would we learn more about torch.nn? If we just search torch.nn, here's how I'd learn about it: the PyTorch documentation, beautiful, look at all of these. They're described as the basic building blocks for graphs, and when you see the word graph it's referring to a computational graph, which, in the case of neural networks... well, let's look up a photo of a neural network in an image search. This is a graph: you start from the left and work towards the right, and there are many different pictures like this, but you'll see an input layer, a hidden layer or two, and an output layer. torch.nn comprises a whole bunch of different layers (you can see layers, layers, layers), and it's our job as data scientists and machine learning engineers to combine those torch.nn building blocks to build things such as these. It might not be exactly like that picture, but that's the beauty of PyTorch: you can combine these building blocks in almost any way to build any kind of neural network you can imagine. So let's keep going; that's torch.nn, and we're going to get hands-on with it rather than just talking about it. We're also going to need matplotlib, because what's our other motto, our data explorer's motto? Visualize, visualize, visualize. And let's check out the PyTorch version with torch.__version__, just to show you the minimum version you'll want for this course.
We've got something like 1.10.0+cu111: cu stands for CUDA, which means this build has CUDA support, though we don't have a GPU on this runtime yet because we haven't switched the runtime type over (we might do that later). So if you have a version lower than this, say 1.8.0, you'll want at least PyTorch 1.10; if you have a version higher than this, your code should still work. But that's about enough for this video: we've got our workflow, we've set up our video notebook, we've got the resources, what we're covering, and our dependencies. In the next one, let's get started on part one, data: preparing and loading. I'll see you in the next video.

Let's now get on to the first step of our PyTorch workflow, and that is data: preparing and loading. Now, I want to stress that data can be almost anything in machine learning. You could have an Excel spreadsheet, rows and columns of nicely formatted data; you could have images of any kind; you could have videos (YouTube has lots of data); you could have audio like songs or podcasts; you could even have DNA, since patterns in DNA are starting to be discovered by machine learning these days; and of course you could have text, like what we're writing here. What we're going to be focusing on throughout this entire course is the fact that machine learning is a game of two parts: one, get your data into a numerical representation, and two, build a model to learn patterns in that numerical representation. Of course there's more around it (yes, yes, yes, I understand you can get as complex as you like), but these are the main two concepts in machine learning, and when I say machine learning the same goes for deep learning: you need some kind of numerical representation, and then you build a model to learn patterns in that numerical representation. I've got a nice pretty picture that describes this game of two parts. We take our data, which, remember, can be almost anything, as our inputs. The first step is to create some form of numerical encoding, in the form of tensors, to represent those inputs; how this looks will depend on the data and on the numerical encoding you choose to use. Then we build some sort of neural network to learn a representation (also referred to as patterns, features or weights) within that numerical encoding; it outputs that representation, and then we do something with it. In the case of this picture we're doing image recognition, image classification: is it a photo of ramen or spaghetti? Is this tweet spam or not spam? What is this audio file saying? (I'm not going to say the example word out loud, because my voice assistant, which shares that name, is close by and I don't want it to go off.) So this is our game of two parts: one, convert our data into a numerical representation, and two, build a model (or use a pre-trained model) to find patterns in that numerical representation. There's a little stationary picture here: part one, turn data into numbers; part two, build a model to learn patterns in those numbers. So with that being said, let's create some data to showcase this. To showcase it, we'll create some known data using the linear regression formula, and if you're not sure what linear regression or its formula is, let's look it up. The first result has some fancy Greek letters, y written as a function of x and some parameters plus an error term epsilon, okay, well, there we go.
A linear regression line has an equation of the form y = a + bx; I like this one better, it's nice and simple, and we're going to start as simple as possible and work up from there. Here, x is the explanatory variable and y is the dependent variable; the slope of the line is b (the slope is also known as the gradient), and a is the intercept, the value of y when x equals 0. But this is just text on a page, a formula on a page, and you know how I like to learn things: let's code it out. So let's write it here: we'll use a linear regression formula to make a straight line with known parameters. I'm going to write this down because parameter is a common word you're going to hear in machine learning: a parameter is something that a model learns. For our dataset, if machine learning is a game of two parts, part one is going to be done for us, because we're starting with a known representation, a known dataset, and then we want our model to learn that representation. This is all just talk, Daniel, let's get into coding. Yes, you're right, you're right, let's do it. So, create the known parameters: weight is going to be 0.7 and bias is going to be 0.3 (I'm using slightly different names to that Google definition). Weight and bias are another two common terms you're going to hear around neural networks, so just keep that in mind, but for us the weight is the equivalent of b and the bias is the equivalent of a. Forget about the letters for the time being, though, and let's just focus on the code: we know these numbers, but we want to build a model that's able to estimate them. How? By looking at different examples. So let's create some data here: we're going to create a range of numbers, with start equal to 0 and end equal to 1.
we're going to create some numbers between 0 and 1, and they're going to have a gap — the step — of 0.02. we'll put them in an x variable. why is x a capital here? well, typically in machine learning you'll find that x is a matrix or a tensor, and if we remember back to the fundamentals, a capital represents a matrix or tensor and a lowercase represents a vector. in our case it's a little confusing because x is currently a vector, but later on x will start to be a tensor and a matrix, so for now we'll just keep the capital/not-capital notation. then we create the formula — remember how we said our weight is in this case the b and our bias is the a — so y = weight * x + bias. now let's have a look at these numbers: we'll view the first 10 of x and the first 10 of y, and we'll look at the length of x and the length of y. wonderful — we've got some values here, 50 numbers of each. that output is a little confusing, so let's just view the first 10 of x and y first and then check the lengths. so what we're going to be doing is building a model to look at the x values here and learn what the associated y value is, and the relationship between them. of course we know what the relationship between x and y is, because we've coded this formula, but you won't always know that in the wild — that is the whole premise of machine learning. this is our ideal output and this is our input; the whole premise of machine learning is to learn a representation of the input and how it maps to the output. so here are our input numbers, these are our output numbers, and we know that the parameters — the weight and bias — are 0.7 and 0.3. we could have set these to whatever we want, by the way, i just like the numbers 7 and 3.
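here's a minimal sketch of that data-creation cell, reconstructed from the description above:

```python
import torch

# known parameters (the values our model will later try to estimate)
weight = 0.7
bias = 0.3

# a range of values between 0 and 1 with a step of 0.02,
# unsqueezed to add an extra dimension (we'll need it for the model later)
start = 0
end = 1
step = 0.02
X = torch.arange(start, end, step).unsqueeze(dim=1)
y = weight * X + bias  # the linear regression formula: y = a + b*x

X[:10], y[:10], len(X), len(y)  # first 10 of each, and the lengths (50 and 50)
```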
you could set these to 0.9 or whatever — the premise would be the same. oh, and i kind of coded this without talking, but what i did was torch.arange: it starts at zero, ends at one and the step is 0.02, so we get 0, 0.02, 0.04 and so on, and i've unsqueezed it. what does unsqueeze do? it adds an extra dimension (not removes — i nearly got that backwards there). if we remove it we get no extra square bracket, and if we add unsqueeze back you'll see it appear; we'll need this later on when we're building models. so let's leave it at that — that's enough for this video, we've got some data to work with. don't worry if this is a little confusing for now; we're going to keep coding and see what we can do to build a model to infer patterns in this data. but right now i want you to have a think: this is tensor data, but it's just numbers on a page. what might be a better way to — hint, this is a hint — visualize it? what's our data explorer's motto? let's have a look at that in the next video. welcome back. in the last video we created some numbers on a page using the linear regression formula with some known parameters. there's a lot going on here, but that's all right — we're going to keep building upon what we've done and learn by doing. in this video we're going to cover one of the most important concepts in machine learning in general: splitting data into training and test sets. now i know i've said "one of the most important concepts" a few times already, but truly, in terms of data this is probably the number one thing you need to be aware of, and if you've come from a bit of a machine learning background you probably well and truly know all about this, but we're going to cover it anyway. so let's jump into some pretty pictures — the three data sets — and i've written down here "possibly the most important concept in machine learning", because from a data perspective it definitely is. imagine you're at university: the course materials are the training set, the practice exam is the validation set, and the final exam is the test set, and the goal of all of this is generalization. so say you're trying to learn something at university or through this course. you have all of the course materials, which is your training set — this is where our model learns patterns from. then to practice what you've learned you might have a practice exam, a mid-semester exam or something like that, to see if you're learning the course materials well; in the case of our model, we might tune our model on this practice exam — we might find that on the validation set our model doesn't do too well, so we adjust it a bit, retrain it, and it does better. finally, at the end of semester, the most important exam is your final exam: it's to see whether, having gone through the entire course materials and learned some things, you can adapt to unseen material. that's the big point, and we're going to see it in practice: when the model learns something on the course materials, it never sees the validation set or the test set. so say we started with 100 data points: you might use 70 percent of those data points for the training material.
you might use 15 percent of those data points for the practice exam — the validation set — and 15 percent for the final exam, the test set. this final exam is just like at university: it's to see whether you've learned any skills from the material at all, whether you're ready to go into the wild, into the quote-unquote real world. so the final exam tests your model's generalization, because it's never seen this data. let's define generalization: the ability of a machine learning model (or deep learning model) to perform well on data it hasn't seen before. because that's our whole goal, right? we want to build a machine learning model on some training data that we can deploy in an application or production setting, and then when more data comes in that it hasn't seen before, it can make decisions based on that new data because of the patterns it learned in the training set. so keep these three data sets in mind: training, validation, test. and if we jump into the learnpytorch book, we've got "split data": we're going to create three sets — or in our case only two, a training and a test set. why is that? because you don't always need a validation set. there's often a use case for one, but the two that are always used are the training set and the testing set. and how much should you split? usually the training set gets 60 to 80 percent of your data; if you create a validation set, it gets somewhere between 10 and 20 percent; and the test set is a similar split to the validation set, between 10 and 20 percent. so: training always, testing always, validation often but not always. with that being said, i'll let you refer to those materials if you want, but now let's create a training and test set with our data. we saw before that we have 50 points, we have x and y with a one-to-one ratio — one value of x relates to one value of y — and we know the split for the training set is 60 to 80 percent and for the test set 10 to 20 percent, so let's go with the upper bounds of each, 80 and 20, which is a very common split: 80/20.
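just to make those percentages concrete before we code the real split, here's a purely illustrative sketch of the 100-sample example above (we'll only use a training and test set in this notebook):

```python
# illustrative only: the 70/15/15 university-exam example as numbers
n_samples = 100
n_train = int(0.7 * n_samples)         # course materials (training set)
n_val = int(0.15 * n_samples)          # practice exam (validation set, optional)
n_test = n_samples - n_train - n_val   # final exam (test set)
n_train, n_val, n_test                 # 70, 15, 15
```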
so let's create a train/test split. we'll create a number here so we can see how much: train_split will be an integer of 0.8 times the length of x. what does that give us? train_split should be about 40 samples — wonderful. so we're going to take 40 samples of x and 40 samples of y; our model will train on those 40 samples to predict the other 10. let's see this in practice: X_train, y_train = everything in x and y up until the train split — that's what the colon indexing does: "x up to the train split, y up to the train split" — and then for the testing (oh, thanks for that autocomplete, colab, i didn't actually need that one), X_test, y_test = everything in x and y from the train split onwards, which is what that index-onwards notation means. now, there are many different ways to create a train and test split; ours is quite simple here, but that's because we're working with quite a simple data set. one of the most popular methods i like is scikit-learn's train_test_split — we're going to see that one later on; it adds a little randomness into splitting your data, but that's for another video, i just want to make you aware of it. so let's check the lengths: len(X_train) should give us 40 training samples, and how many testing samples? len(X_test) and len(y_test)... wonderful: 40, 40, 10, 10 — training features, training labels, testing features, testing labels. so what we've essentially created here is a training set — which could also be referred to as a training split, yet another example of machine learning having different names for the same thing: set, split, same thing — and a test split. remember the validation set is used often but not always; because our data set is quite simple, we're sticking with the necessities, training and test. but keep this in mind: one of your biggest hurdles in machine learning will be creating proper training and test sets, so it's a very important concept.
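here's a compact sketch of that split cell, assuming the X and y tensors from earlier:

```python
# create an 80/20 train/test split of our 50 samples
train_split = int(0.8 * len(X))                        # 80% of 50 = 40
X_train, y_train = X[:train_split], y[:train_split]    # first 40 samples for training
X_test, y_test = X[train_split:], y[train_split:]      # last 10 samples for testing

len(X_train), len(y_train), len(X_test), len(y_test)   # 40, 40, 10, 10
```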
with that being said, i did issue the challenge in the last video to visualize these numbers on a page, and we haven't done that yet, so let's move towards that next. i'd like you to think about how you could make these more visual — they're just numbers on a page right now; maybe matplotlib can help. let's find out. welcome back. in the last video we split our data into training and test sets, and later on we're going to build a model to learn patterns in the training data that relate to the testing data. but as i said, right now our data is just numbers on a page, which is kind of hard to understand — you might be able to understand it, but i prefer to get visual. so let's write this down: how might we better visualize our data? (with a capital letter, so we're grammatically correct.) and this is where the data explorer's motto comes in: visualize, visualize, visualize. if you ever don't understand a concept, one of the best ways to start understanding it, for me, is to visualize it. so let's write a function to do just that. we're going to call it plot_predictions — we'll see why later on; that's the benefit of making these videos, i've got a plan for the future even though it might seem like i'm winging it. it'll take train_data, which is our X_train; train_labels, which is our y_train; test_data (X_test) and test_labels (y_test — excuse me, i was looking at too many x's there); and then predictions, which we'll set to None because we don't have any predictions yet, but as you might have guessed we might have some later on. we'll put a little docstring in so we're being nice and pythonic: "plots training data, test data and compares predictions" — nice and simple, nothing too outlandish. then we create a figure — this is where matplotlib comes in — plt.figure with figsize=(10, 7), which is my favourite hand in poker and also happens to be a good dimension for a matplotlib plot. we plot the training data in blue with plt.scatter(train_data, train_labels, ...): a scatter plot (we'll see what it does in a second), with a color of "b" for blue (that's what c stands for in matplotlib's scatter), a size of 4 and a label of "training data". where could you find information about this scatter function? command+shift+space gives a little docstring, or you can hover over the brackets — but that's a bit hard for me to read, there's a lot going on (x, y, s, c, cmap...), so i just like to search "matplotlib scatter": a whole bunch of information that's a little easier to read, plus some examples. beautiful. back in our plot_predictions function: we've got the training data plotting in blue — what colour should we use for the testing data? how about green, i like that idea, green's my favourite colour: plt.scatter(test_data, test_labels, c="g", ...) and we'll call this "testing data" — just the exact same line as above but with a different set of data (you might be able to plot it in your favourite colour, just remember it'll look a little different from the videos). now let's check if there are predictions: if predictions is not None, plot the predictions if they exist: plt.scatter(test_data, predictions, ...). and why are we plotting against the test data? remember, our scatter function takes in x and y: our predictions are going to be compared to the testing data labels. that's the whole game we're playing here: we train our model on the training data, then to evaluate it we get the model to predict the y values given X_test as input, and we compare how good our model's predictions are — in other words, predictions versus the actual values of the test data set. we're going to see this in practice rather than just talk about it. so we plot our predictions in red, with the label "predictions". wonderful. let's also show the legend — because, i mean, we're legends, so we could just put a mirror here... no, i'm kidding — the legend shows our labels on the matplotlib plot: plt.legend(prop={"size": ...}); prop stands for properties (well, it may or may not, i just like to think it does — that's how i remember it). so we have a beautiful function here to plot our data.
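here's a sketch of that plotting function as described above — it assumes matplotlib is available in the notebook, and the legend font size of 14 is my guess, since only a "prop" size is mentioned:

```python
import matplotlib.pyplot as plt

def plot_predictions(train_data=X_train, train_labels=y_train,
                     test_data=X_test, test_labels=y_test,
                     predictions=None):
    """Plots training data, test data and compares predictions."""
    plt.figure(figsize=(10, 7))
    # training data in blue
    plt.scatter(train_data, train_labels, c="b", s=4, label="Training data")
    # testing data in green
    plt.scatter(test_data, test_labels, c="g", s=4, label="Testing data")
    # predictions in red (plotted against the test inputs), if they exist
    if predictions is not None:
        plt.scatter(test_data, predictions, c="r", s=4, label="Predictions")
    # show the legend
    plt.legend(prop={"size": 14})
```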
should we try it out? remember, we've got default inputs here, so we don't actually need to pass anything into our function — our train and test data are ready to go. if in doubt, run the code; let's check it out. did we make a mistake in our plot_predictions function? you might have caught it... hey, there we go, beautiful. because we don't have any predictions we get no red dots, but this is what we're trying to do: we've got a simple straight line — you can't get a much simpler data set than that — with our training data in blue and our testing data in green. so the whole idea of what we're going to do with our machine learning model (we don't actually really need a machine learning model for this, we could do other things, but machine learning is fun) is to take the blue dots — there's quite a clear pattern here, a relationship between the x value and the y value — and build a model to learn that pattern, so that if we fed it the x values of the green dots, it could predict the appropriate y values. because remember, those green dots are the test data set: pass our model X_test to predict y_test. so blue dots as input, green dots as the ideal output — a perfect model would have red dots sitting right on top of the green dots. that's what we'll work towards. now, we know the relationship between x and y. how? because we set it up above: this is our weight and bias, and we created the line y = weight * x + bias, which is the simple version of the linear regression formula — mx + c, as you might have heard in high school algebra, gradient plus intercept. with that being said, let's move on to the next video and build a model. oh, this is exciting, i'll see you there. welcome back. in the last video we saw how to get visual with our data — we followed the data explorer's motto of visualize, visualize, visualize — and we've got an idea of the training data we're working with and the testing data we're trying to predict; we want to build a model that learns the pattern in the training data (essentially this upwards trend) to be able to predict the testing data. i just want to give you another heads up: i took a little break after recording the last video, so now my colab notebook has disconnected. i'm going to click reconnect, and my variables here may not have been saved — this is what might happen on your end if you take a break from google colab and come back. if i try to run this function, the variables might have been saved (looks like they have), but if not you can go restart and run all. this is typically one of the most helpful troubleshooting steps when using google colab: if a cell down here isn't working, you can always rerun the cells above it — for example, if a function wasn't instantiated because its cell wasn't run and we couldn't run the cell that calls it, we just rerun the cell above so we can run this one. but now let's get into building our first pytorch model. we're going to jump straight into the code: "our first pytorch model". this is very exciting, let's do it. we'll turn this into markdown; we're going to create a linear regression model. so let's look at the linear regression formula again: we're going to create a model that's essentially going to run this computation, so we need a model that has a parameter for a, a parameter for b — in our case they're going to be weight and bias — and a way to do this forward computation. what i mean by that, we're going to see with code, so let's do it.
we'll do it with pure pytorch: create a linear regression model class. now, if you're not experienced with python classes, i'm going to be using them throughout the course — i'm going to call this one LinearRegressionModel — and if you haven't dealt with python classes before, that's okay, i'll explain what we're doing as we do it. but if you'd like a deeper dive, i'd recommend real python's "object-oriented programming (oop) in python 3" — that's a good rhyming. i'm just going to link it here, because we're going to be building classes throughout the course and i'd recommend getting familiar with oop — object-oriented programming, a bit of a mouthful, hence the "oop" — but we're not going to go through that now; i'd rather just code it out and talk it out while we do it. so we've got a class here, and the first thing you might notice is that the class inherits from nn.Module. now you might be wondering, well, what's nn.Module? let's write it down: almost everything in pytorch inherits from nn.Module — you can imagine nn.Module as the lego building bricks of pytorch models. nn.Module has a lot of helpful inbuilt things that are going to help us build our pytorch models, and of course, how could you learn more about it? search "nn.Module pytorch": here we go — "base class for all neural network modules", wonderful, "your models should also subclass this class". so that's what we're building: our own pytorch model. and another thing with pytorch — this is what might seem very confusing when you first begin — modules can contain other modules. what i mean by it being a lego brick is that you can stack these modules on top of each other and make progressively more complex neural networks as you go, but we'll leave that for later on; for now we're going to start nice and simple. let's clean up our web browser. we're going to create a constructor here with the __init__ function; it takes self as a parameter — if you're not sure what's going on here, just follow along with the code for now, and i'd encourage you to read that documentation after the video. then we have super().__init__() — i know when i first started learning this i was like, why do i have to write init twice, and what's super, and all that jazz, but for now just take it as required python syntax. then we have self.weights: that means we're going to create a weights parameter (we'll see why we do this in a second), and to create that parameter we're going to use nn.Parameter. a quick reminder that we imported nn from torch before, and nn — which stands for neural network — is the building-block layer for neural networks. we're going to start with random parameters, so torch.randn(1) — we'll talk through each of these in a second — and i'm also going to put requires_grad=True and dtype=torch.float. we haven't touched on any of these yet, but that's okay. so what does nn.Parameter tell us? "a kind of tensor that is to be considered a module parameter" — and we've just created a module using nn.Module. "parameters are torch.Tensor subclasses" — so this is a tensor in itself — "that have a very special property when used with modules: when they're assigned...
...as a module attribute, they are automatically added to the list of its parameters and will appear, e.g., in the module.parameters() iterator" — ah, we're going to see that later on — "assigning a tensor doesn't have such an effect". so we're creating a parameter here. now, requires_grad — what does that mean? rather than just trying to read the docstring in colab, let's look it up: nn.Parameter... what does it say? "requires_grad (optional): if the parameter requires gradient". hmm, what does "requires gradient" mean? we'll come back to that in a second; for now i just want you to think about it. then dtype=torch.float: the data type torch.float is, as we've discussed before, the default for pytorch — it could also be written torch.float32 — so we're just going to leave it as that, because pytorch likes to work with float32. do we get this by default? we do, so we don't necessarily have to set requires_grad=True either — just keep that in mind. so now we've created a parameter for the weights; we also have to create a parameter for the bias. let's finish creating this and then we'll talk about it: nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float)). there we go. and now we're going to write a forward method — a forward method to define the computation in the model. so def forward(self, x), where x is the input data and is expected to be of type torch.Tensor, and it returns a torch.Tensor; we don't necessarily need the comment, but i'll write it anyway: x is the input data, in our case it might be the training data. and from here we want it to return self.weights * x + self.bias. now, where have we seen this before? well, this is the linear regression formula. let's take a step back to how we created our data and then talk a little more about what's going on here. if we go back up to our data — where did we create it? here — you see how we've created known parameters weight and bias, and then created our y variable, our target, using the linear regression formula weight * x + bias, where x was a range of numbers. so what we've done with the linear regression model we've created from scratch is make a parameter called weights (it could just be weight if we wanted) and a parameter for the bias. when we created our data, we knew what the parameters weight and bias were; the whole goal of our model is to start with random numbers — these will be random parameters — and to look at the data, which in our case will be the training samples, and update those random numbers to represent the pattern here. so ideally, if it's learning correctly, our model will take its weight (a random value) and its bias (a random value), run them through this forward computation — the same formula we used to create our data — and adjust the weight and bias to represent, as closely as possible if not perfectly, the known parameters. that's the premise of machine learning. i'm going to write this down, because we've talked a lot about it and i'd like to tie it together: what our model does — start with random values (weight and bias), look at the training data, and adjust the random values to better represent — or get closer to — the ideal values, the weight and bias values we used to create the data.
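putting that all together, here's the model class roughly as described above, as a self-contained sketch:

```python
import torch
from torch import nn  # nn contains pytorch's building blocks for neural networks

class LinearRegressionModel(nn.Module):  # almost everything in pytorch inherits from nn.Module
    def __init__(self):
        super().__init__()
        # start with random values and let training adjust them towards the ideal values
        self.weights = nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float))
        self.bias = nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float))

    # forward() defines the computation the model performs on its input
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weights * x + self.bias  # the linear regression formula
```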
so that's what it's going to do: start with random values and then continually look at our training data to see if it can adjust those random values to represent this straight line here. how does it do that? through two main algorithms: one, gradient descent, and two, backpropagation. i'm going to leave it here for the time being, but we're going to continue talking about this. gradient descent is why we have requires_grad=True: when we run computations using this model, pytorch is going to keep track of the gradients of our weights parameter and our bias parameter, and then update them through a combination of gradient descent and backpropagation. now, i'm going to leave gradient descent and backpropagation as extracurricular for you — i'll add some resources here, and there will also be plenty of resources in the pytorch workflow fundamentals book chapter on how these algorithms work behind the scenes. we're going to be focused on the code — the pytorch code that triggers these algorithms behind the scenes — because, lucky for us, pytorch has implemented gradient descent and backpropagation for us; we're writing the higher-level code that triggers them. so in the next video we're going to step through this a little more and then further discuss some of the most useful and required modules of pytorch, particularly nn and a couple of others. let's leave it there and i'll see you in the next video. welcome back. in the last video we covered a whole bunch in creating our first pytorch model that inherits from nn.Module, and we talked about object-oriented programming and how a lot of pytorch uses it (i can't always say that, so i might just say oop for now). what i've done since the last video is add two resources here for gradient descent and backpropagation — these are two of my favourite videos on youtube, by the channel 3blue1brown. i would highly recommend watching that entire series, by the way; that's your extra curriculum for this video in particular and for the course overall: go through those two videos. even if you're not entirely sure what's happening, you'll gain an intuition for the code we're going to write with pytorch — just keep in mind, as we go forward, that a lot of what pytorch does behind the scenes is take care of these two algorithms for us. we also created two parameters in our model, instantiated as random values — one for each of the values we used for our data set, the weight and the bias. now i want you to keep in mind that we're working with a simple data set here, where we created the known parameters ourselves; with a data set you haven't created by yourself — maybe images you've gathered from the internet — you won't necessarily be defining these parameters yourself; instead, another module from nn will define the parameters for you and work out what they should end up being. but since we're working with a simple data set, we can define the two parameters we're trying to estimate. the key point is that our model is going to start with random values — that's the annotation i've added here: "start with a random weight value using torch.randn" — and we've told it that it can update via gradient descent, so pytorch is going to track the gradients of these parameters for us.
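to give a feel for what those two algorithms are doing, here's a tiny hand-rolled illustration of a single gradient descent step — this isn't the course's training code (we haven't written that yet), just a sketch under the assumption that one manual update looks roughly like this:

```python
import torch

# one parameter, starting away from the ideal value (our data used weight=0.7, bias=0.3)
w = torch.tensor([0.0], requires_grad=True)
x_sample, y_sample = torch.tensor([0.5]), torch.tensor([0.65])  # 0.7 * 0.5 + 0.3 = 0.65

loss = torch.mean(torch.abs(w * x_sample - y_sample))  # how wrong is the current guess?
loss.backward()                                         # backpropagation: compute the gradient of the loss w.r.t. w
with torch.no_grad():
    w -= 0.1 * w.grad                                   # gradient descent: nudge w in the direction that lowers the loss
```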
and then we've told it that the dtype we want is float32. we don't necessarily need to set these two explicitly — a lot of the time the default in pytorch is requires_grad=True and dtype=torch.float, it does that for us behind the scenes — but just to keep things as fundamental and as straightforward as possible, we've set all of this explicitly. so let's jump into the keynote; i'd just like to explain what's going on one more time in a visual sense. here's the exact code that we've just written — i've copied it from here and made it a little more colourful — and here's what's going on. when you build a model in pytorch, it subclasses the nn.Module class; this contains all the building blocks for neural networks, so our model class subclasses nn.Module. inside the constructor we initialise the model parameters. now, as we'll see later with bigger models, we won't always explicitly create the weights and biases — we might initialise whole layers (a concept we haven't touched on yet), or a list of layers, or whatever we need. basically, what happens in here is that we create whatever variables we need for our model to use, and these could be different layers from torch.nn, single parameters (which is what we've done in our case), hard-coded values, or even functions. now, we've explicitly set requires_grad=True for our model parameters, which in turn means pytorch will, behind the scenes, track all of the gradients of these parameters for use with torch.autograd — the part of pytorch that handles the gradient computations that power gradient descent. a lot of this happens behind the scenes when we write our pytorch training code, so if you'd like to know what's happening under the hood, i'd highly recommend checking out those two videos — hence why i've linked them here. oh, and for many torch.nn modules, requires_grad=True is set by default. finally, we've got a forward method: any subclass of nn.Module, which is what we've made, requires a forward method. we can see this in the documentation: go to torch.nn.Module, click on Module — do we have forward? yeah, there we go; there's a lot built into nn.Module. "forward defines the computation performed at every call" — so if we were to call LinearRegressionModel and put some data through it, the forward method is the operation this module, this model, performs, and in our case our forward method is the linear regression function. so keep this in mind: any subclass of nn.Module needs to override the forward method — you need to define a forward method if you're going to subclass nn.Module. we'll see this very hands-on, but for now i believe that's enough coverage of what we've done; if you have any questions, remember you can ask them in the discussions. we've got a fair bit going on here, but i think we've broken it down a fair bit. the next step — i know i mentioned this in a previous video — is to cover some pytorch model building essentials, and then the next way to really start to understand what's going on is to check the contents of our model, train it, and make some predictions with it. so let's get hands-on with that in the next few videos, i'll see you there.
welcome back. in the last couple of videos we stepped through creating our first pytorch model, and while it looks like there's a fair bit going on here, some of the main takeaways are: almost every model in pytorch inherits from nn.Module, and if you are going to inherit from nn.Module, you should override the forward method to define what computation happens in your model. and for later on, when our model is learning things — in other words, updating its weights and bias values from random values to values that better fit the data — it's going to do so via gradient descent and backpropagation. those two videos are some extra curriculum for what's happening behind the scenes; we haven't actually written any code yet to trigger these two, so i'll refer back to them when we do. for now we've just got a model that defines some forward computation. but speaking of models, let's have a look at a couple of pytorch model building essentials. we're not going to write too much code in this video and it'll be relatively short, but i want to introduce you to some of the main classes you're going to be interacting with in pytorch — we've seen some of these already. one of the first is torch.nn: it contains all of the building blocks for computational graphs. a computational graph is another word for a neural network — well actually, "computational graph" is quite general, so i'll just write here that a neural network can be considered a computational graph. then we have torch.nn.Parameter — we've seen this: what parameters should our model try to learn? and i'll write here: often a pytorch layer from torch.nn will set these for us. then we've got torch.nn.Module, which is what we've seen here: the base class for all neural network modules; if you subclass it, you should overwrite forward, which is what we've done by creating our own forward method. what else should we cover? we're going to see these later on, but i'm going to put them here. torch.optim: this is where the optimizers in pytorch live; they will help with gradient descent. an optimizer — as we've said before, our model starts with random values, looks at training data and adjusts the random values to better represent the ideal values — contains the algorithm that's going to optimise those values, from being random to being values that better represent our data; those algorithms live in torch.optim. and one more note for now (i'll link extra resources, and we're going to cover things as we go — that's how i like to do it, cover them as we need them): all nn.Module subclasses require you to overwrite forward; this method defines what happens in the forward computation. in our case, if we were to pass some data to our linear regression model, the forward method would take that data and perform this computation here, and as your models get bigger and bigger — ours is quite straightforward — this forward computation can be as simple or as complex as you like, depending on what you'd like your model to do. and so i've got a nice fancy slide here which basically reiterates what we've just discussed: pytorch's essential neural network building modules — torch.nn, torch.nn.Module, torch.optim, torch.utils.data.Dataset (we haven't actually talked about this yet) and, i believe, one more: DataLoader. we're going to see those two later on, but they're very helpful when you've got a more complicated data set.
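as a quick reference for where those pieces live — nothing here is new notebook code, just the imports and namespaces mentioned above:

```python
import torch
from torch import nn                              # building blocks: layers, nn.Parameter, nn.Module, loss functions
from torch.utils.data import Dataset, DataLoader  # for creating and loading more complex datasets later

# optimizers live in torch.optim, e.g. torch.optim.SGD or torch.optim.Adam
```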
in our case we've just got 50 numbers on a simple straight line, but when we need to create more complex data sets, we're going to use these. so: torch.nn will help us build models, torch.optim will help us optimise our model's parameters, and Dataset/DataLoader will help us load data. and if you'd like more, one of my favourite resources is the pytorch cheat sheet — again we're referring back to the documentation; as i've said, this course is not a replacement for the documentation, it's just my interpretation of how one should best become familiar with pytorch. on the cheat sheet we've got the imports — the general import torch, and from torch.utils.data we've got Dataset and DataLoader (oh, would you look at that, mentioned right there) — plus torchscript and jit, the neural network api, onnx... i'll let you go through it. we're covering some of the most fundamental ones here, but pytorch is of course quite a big library, so some extracurricular for this video would be to go through the cheat sheet for five to ten minutes and just read — you don't have to understand it all; we're going to get more familiar with a lot of these (well, not all of them, that would require making videos for the whole documentation) by writing code. so that's enough for this video — i'll link the pytorch cheat sheet here — and in the next video, how about we create an instance of our linear regression model? we haven't actually checked out what happens when we do; i think we should do that, i'll see you there. welcome back. in the last video we covered some of the pytorch model building essentials, and look, i linked a cheat sheet here — there's a lot of text going on on the page. of course the reference material for this is in the learnpytorch book, "pytorch model building essentials", under 0.1, which is the notebook we're working on, but i couldn't help myself: i wanted to add some colour. so before we inspect our model, let's add a little colour to the text on the page. here's our workflow — this is what we're covering in this video, or in this module 0.1. to get data ready, here are some of the most important pytorch modules: torchvision.transforms — we'll see that when we cover computer vision later on — and torch.utils.data.Dataset, if we want to create a data set that's a little more complicated, along with DataLoader (because our data set is so simple we haven't used either of these). then to build or pick a model, well, we can use torch.nn — we've seen that one — and torch.nn.Module. in our case we're building a model, but if we wanted a pre-trained model, there are some computer vision models that have already been built for us in torchvision.models.
now, torchvision stands for pytorch's computer vision module — we haven't covered that either, this is just a spoiler for what's coming later on. then, at the optimizer step, if we wanted to optimise our model's parameters to better represent a data set, we can go to torch.optim. then if we wanted to evaluate the model, well, we've got torchmetrics for that — we haven't seen it, but we're going to be hands-on with all of these later on. then if we wanted to improve through experimentation, we've got torch.utils.tensorboard. and again, if you want more, there's the pytorch cheat sheet. that's just adding a little colour and a little code to our pytorch workflow. with that being said, let's get a little deeper into what we've built, which is our first pytorch model: checking the contents of our pytorch model. so now we've created a model, let's see what's inside — you might already be able to guess from what we created in the constructor, the __init__ function. so what do you think we have inside our model, and how do you think we'd look at it? of course, these are questions you might not have the answer to — you're like, "daniel, i'm just starting to learn pytorch, i don't know these" — but i'm asking you just to start thinking about these different things. we can check our model's parameters — what's inside our model — using, wait for it, .parameters(). oh, don't you love it when things are nice and simple? well, let's check it out. first thing we're going to do is create a random seed. now, why are we creating a random seed? because, recall, we're creating these parameters with random values, and without a random seed we would get different values every time; for the sake of this video being reproducible, we're going to create a manual seed: torch.manual_seed(42) — 42 because i love 42, it's the answer to the universe. and we're going to create an instance of the model class — the subclass of nn.Module — that we created. so let's do it: model_0 = LinearRegressionModel() — model zero, because it's the zeroth model, the first model we've ever created in this whole course, how amazing. that's all i'm doing, just calling the class. and if we print model_0, what does it give us? LinearRegressionModel() — okay, it doesn't give us much, but we want to find out what's going on inside. so, check out the parameters: model_0.parameters() — what do we get? oh, a generator. well, let's turn it into a list, that'll be better to look at. there we go — oh, how exciting is that: "parameter containing:" and the tensor values, with requires_grad=True. so these are our model parameters. and why are they the values that they are? well, it's because we've used torch.randn.
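a sketch of those cells, assuming the LinearRegressionModel class from earlier is already defined:

```python
# set a manual seed so the "random" starting parameters are reproducible
torch.manual_seed(42)

# create an instance of the model (a subclass of nn.Module)
model_0 = LinearRegressionModel()

# check out the parameters (a generator, so wrap it in list())
list(model_0.parameters())  # two nn.Parameter tensors: weights and bias, both random for now
```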
let's see what happens if we just run torch.randn(1): we get a random value; run it again and we get a different value; keep running it and we get different values every single time (i did fluke what looked like a repeat while recording, which was pretty funny, but that's just chance — let me know if you get the same value). now, why is that? because there's no random seed here. watch what happens if we put torch.manual_seed(42) in the cell: we get 0.3367. run it again: 0.3367. again: 0.3367. and if we comment out the random seed and re-initialise, we get different values every single time again. you might be thinking, "daniel, you sound like a broken record", but i'm trying to really drive home the fact that we initialise our models with random parameters. this is the essence of what our machine learning and deep learning models are going to do: start with random values — weights and bias. we've only got two parameters here, but the future models we build might have thousands, and of course we're not going to set them all by hand (we'll see how that's handled later on); for now, we start with random values, and our ideal model will look at the training data and adjust those random values. just so we get reproducible results, i'll get rid of that demo cell and set the random seed here, so you should be getting similar values to these — if you're not, there may have been some sort of pytorch update to how the random seed is calculated and you might get slightly different values. for now we'll use torch.manual_seed(42), and i just want you to be aware of this. the plain list of parameters can be a little confusing; for me, i understand it better if i list the named parameters, and the way we do that is by calling model_0.state_dict(). this gives us a dictionary of the parameters of our model: as you can see, we've got weights and we've got bias, and they are random values. so where did weights and bias come from? well, of course, they came from the constructor — weights, bias — and of course, up here we've also got the known parameters. so now our whole goal is to write code that allows our model to look at those blue dots and adjust its weights and bias values to be as close as possible to the known weight and bias. how do we go from these random values to those ideal ones? we're going to see that in future videos, but the closer we get these values to those two, the better we'll be able to predict and model our data. now this principle, i cannot stress enough, is the fundamental foundation (wow, good description, daniel) — the entire foundation — of deep learning: we start with some random values, and we use gradient descent and backpropagation, plus whatever data we're working with, to move those random values as close as possible to the ideal values. in most cases you won't know what the ideal values are, but in our simple case we already do, so keep that in mind going forward. the premise of deep learning is to start with random values and make them more representative of — closer to — the ideal values.
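the reproducibility demo and the named-parameter check from above, as a small sketch:

```python
# without a seed, torch.randn gives a different value on every run;
# with a manual seed, the "random" value is the same every time
torch.manual_seed(42)
print(torch.randn(1))  # tensor([0.3367])

torch.manual_seed(42)
print(torch.randn(1))  # tensor([0.3367]) again

# named parameters are easier to read than list(model_0.parameters())
model_0.state_dict()   # OrderedDict with 'weights' and 'bias' (still random values)
```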
with that being said, let's try to make some predictions with our model as it is — i mean, it's got random values; how do you think the predictions will go? i think in the next video we'll make some predictions on the test data and see what they look like. i'll see you there. welcome back. in the last video we checked out the internals of our first pytorch model, and we found that because we create the parameters of our model with torch.randn, they begin as random values; we also discussed that the entire premise of deep learning is to start with random numbers and slowly progress them towards more ideal numbers — slightly less random numbers — based on the data. so before we start to improve these numbers, let's see what their predictive power is like right now. you might be able to guess how well these random numbers will predict on our data — and if you're not sure what predicting means, let's have a look. "making predictions using torch.inference_mode()" — hmm, something we haven't seen, but as always, we're going to discuss it while we use it. to check our model's predictive power, let's see how well it predicts y_test based on X_test — because remember, another premise of a machine learning model is to take some features as input and make predictions close to some sort of labels. when we pass data through our model, it runs it through the forward method. here's where it's a little confusing: we defined a forward method that takes a little x as input, but we're going to pass in a large X. the reason i've used a little x is that in pytorch code you'll find all over the internet, x is quite commonly used in the forward method as the input data, so i've just left it there because that's what you're going to find quite often. so let's test it out — we haven't discussed what inference mode does yet, but we will. make predictions with the model: with torch.inference_mode(): y_preds = model_0(X_test). that's all we're doing: passing the X_test data through our model. now, when we pass X_test in here, let's remind ourselves what it is: ten values, and our ideal model would predict the exact values of y_test. this is what our model would do if it were perfect: take these X_test values as input and return these y_test values as output — predictions exactly the same as the test data set. how do you think our model will go, considering it's starting with random values as its parameters? well, let's find out: we'll check out y_preds... oh, what's happened here? NotImplementedError. ah, this is an error i get quite often in google colab when i'm creating a pytorch model — i'm glad we've stumbled upon it, and i think i know the fix (if not, we might see a little bit of troubleshooting in this video). when you see this NotImplementedError saying the forward method isn't implemented — it's a bit of a rabbit hole, and i've come across it a fair few times — it took me a while to figure out that, for some reason, it's the spacing: in python, indentation defines the structure of a function.
if you look at this line in my notebook — and by the way, if you don't have these line numbers or indentation guides, you can go into tools → settings → editor and turn on "show line numbers", "show indentation guides" and all that sort of jazz; you can customise what's going on, and i have these two on because i've run into this error a fair few times — you'll see that the forward method is not in line with the constructor above it. so we need to highlight it and press shift+tab to move it over, so that it lines up. and if we re-run the class cell, it won't change any output — see, that's the hidden gotcha: when we ran this before it found no error, but then when we ran the prediction cell down here it failed; now it works. so just keep that in mind — i'm really glad we stumbled upon this, because indentation errors causing NotImplementedError are one of the most common errors you'll find when writing pytorch code in google colab; i'm not sure why, it just happens. so these are our model's predictions so far, made by running the test data through the forward method we defined. and if we look at y_test — are these close? oh my gosh, they are shocking. so why don't we visualise them: plot_predictions with predictions=y_preds. let's have a look... oh my goodness, all the way over here. remember how we discussed that an ideal model would have the red dots on top of the green dots, because our ideal model would be perfectly predicting the test data? right now, because our model is initialised with random parameters, it's basically making random predictions, so they're extremely far from where our ideal predictions are. so we'll have some training data, and our model's predictions when we first create it will be quite bad, but we want to write some code that will hopefully move these red dots closer to the green dots — we're going to see how we can do that in later videos. but we did one thing up here which we haven't discussed: with torch.inference_mode(). this is a context manager, used when we're making predictions — another word for predictions is inference; pytorch uses inference, so i'll try to use that a bit more, but i like to use predictions as well. we could also just write y_preds = model_0(X_test) without the context manager and get quite a similar output, but i've put on inference mode because i want to start making it a habit for later on: when we make predictions, put on inference mode. now, why do this? you might notice something different: what's the difference between the outputs? without the context manager there's a grad_fn attached to the output — and we don't need to discuss exactly what that's doing — but do you notice that the inference-mode output is lacking that grad_fn? remember how i said that, behind the scenes, pytorch does a few things with requires_grad=True: it keeps track of the gradients of the different parameters so they can be used in gradient descent and backpropagation. what inference mode does is turn off that gradient tracking — when we're doing inference we're not doing training, so we don't need to keep track of the gradients, of how we should update our model; inference mode disables those training-only mechanisms.
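here's a sketch of those prediction cells, continuing with the model, data and plotting function from above (torch.no_grad() is shown too, since it comes up next as the older alternative):

```python
# make predictions with gradient tracking turned off
with torch.inference_mode():
    y_preds = model_0(X_test)

y_preds                                 # ten predictions -- as good as random right now
plot_predictions(predictions=y_preds)   # red dots land far from the green dots

# torch.no_grad() does much the same thing and appears in older pytorch code,
# but inference_mode() is the preferred option for making predictions
with torch.no_grad():
    y_preds_no_grad = model_0(X_test)
```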
so what's the benefit of this? well, it means that pytorch, behind the scenes, is keeping track of less data. with our small data set it probably won't be too dramatic, but with a larger data set it means your predictions will potentially be a lot faster, because a whole bunch of numbers — things you don't need during prediction mode, or inference mode, that's why it's called inference mode — aren't being saved to memory. if you'd like to learn more about this, search "pytorch inference mode twitter" — i just remember to search twitter because the pytorch team did a big tweet thread about it; there's also a blog post about what's going on behind the scenes. long story short, it makes your code faster: "want to make your inference code in pytorch run faster? here's a quick thread on doing exactly that" — and that's what we're doing, so i'm going to write down here "see more on inference mode here". i also want to highlight that they reference torch.no_grad alongside the torch.inference_mode context manager: inference mode is fairly new in pytorch, so you might see a lot of existing pytorch code use with torch.no_grad(), and you can use that as well — it will do much the same as inference mode — but inference mode has a few things that are advantageous over no_grad, which are discussed in that thread; as you'll read there and in the pytorch documentation, inference mode is the favoured way of doing inference. for now i just wanted to highlight that you can do something similar with torch.no_grad(), however inference mode is preferred. alrighty, i'm just going to comment that out so we've only got one thing going on there. the main takeaway from this video is that when we're making predictions, we use the context manager torch.inference_mode(), and that right now, because our model's internal parameters are randomly initialised, our model's predictions are as good as random — they're actually not too far off in scale, at least the red dots aren't scattered all over the place, but in the upcoming videos we're going to write some pytorch training code to move these values closer to the green dots by looking at the training data. so that being said, i'll see you in the next video. friends, welcome back. in the last video we saw that our model performs pretty poorly — ideally those red dots should be in line with the green dots — and we know why: it's because our model is initialised with random parameters. i just want to put a little note here: you don't necessarily have to initialise your model with random parameters; these two values, weights and bias, could be zero and you could go from there, or you could use the parameters from another model — we're going to see that later on, it's called transfer learning, just a little spoiler for what's to come. and we've also discussed that an ideal model will replicate these known parameters: in other words, start with random, unknown parameters — these two values here — and then write some code for our model to move towards estimating the ideal parameters. now, i just want to be explicit here and write down some intuition before we jump into the training code.
this is very exciting — we're about to get into training our very first machine learning model. so let's write it here: the whole idea of training is for a model to move from some unknown parameters (these may be random) to some known parameters, or in other words, from a poor representation of the data to a better representation of the data. and so in our case, would you say that our model's representation of the green dots, via these red dots, is a good representation or a poor representation? i don't know about you, but i would say that to me this is a fairly poor representation. and one way to measure the difference between your model's outputs — in our case the red dots, the predictions — and the testing data is to use a loss function. i'm going to write this down, because this is what we're moving towards: we're moving towards training, but we need a way to measure how poorly our model's predictions are doing, and one way to measure how poor — how wrong — your model's predictions are is to use a loss function. so if we search "pytorch loss functions", we're going to see that pytorch has a fair few loss functions built in, but the essence of all of them is quite similar (just wait for this to load — my internet's going a little slow today, but that's okay, we're not in a rush, we're learning something fun). if i search here for "loss": loss functions, here we go — this is torch.nn, "the basic building blocks for graphs", a whole bunch of good stuff in here including loss functions, beautiful. this is another thing to note as well — another one of those scenarios where there are more words for the same thing: you might also see a loss function referred to as a criterion, and there's another term called cost function. there may be some formal definitions of what all of these are — maybe they're used in different fields — but since we're focused on machine learning, i'm just going to write: note — a loss function may also be called a cost function or criterion in different areas; for our case, we're going to refer to it as a loss function. and let's just formally define a loss function here, because we're going to go through a fair few steps in the upcoming videos — this is a warning, nothing we can't handle, but i want to put some formal definitions on things; we're going to see them in practice, which is what i prefer to do rather than just sitting here defining stuff (this lecture has already had enough text on the page, so hurry up and get into coding, daniel). a loss function is a function to measure how wrong your model's predictions are compared to the ideal outputs, so lower is better. think of a measurement: how could we measure the difference between the red dots and the green dots? one of the simplest ways would be to just measure the distance. so if we estimate this one goes from about 0.35 to 0.8, the difference is about 0.45; then we could do the same for all of the other dots and take the average. now, if you've worked with loss functions before, you might have realised i've just reproduced mean absolute error, but we're going to get to that in a minute. so, a little dot point here to set up intuition — things we need to train: we need a loss function (this goes for pytorch and for machine learning in general, actually).
we're focused on pi torch we need an optimizer what does the optimizer do takes into account the loss of a model and adjust the model's parameters so the parameters recall are our weight and bias values weight and biases we can check those or bias we can check those by going model dot parameter or parameters but i also like oh that's going to give us a generator isn't it oh do we not define the model yet what do we call our model oh model zero excuse me i forgot where gonna build a lot of models in this course so we're giving them numbers model dot parameters yeah we've got a generator so we'll turn that into a list but model 0 if we want to get them labeled we want state dict here there we go so our weights is this value that's a random value we've set and there's the bias and now we've only got two parameters for our model so it's quite simple however the principles that we're learning here are going to be the same principles taking a loss function trying to minimize it so getting it to lower so the ideal model will predict exactly what our test data is and an optimizer will take into account the loss and we'll adjust a model's parameter in our case weights and bias to be let's finish this definition takes into account the loss of a model and adjust the model's parameters eg weight and bias in our case to improve the loss function and specifically for pi torch we need a training loop and a testing loop now this is what we're going to work towards building throughout the next couple of videos probably going to focus on these two first the loss function and optimizer there's the formal definition of those you're going to find many different definitions that's how i'm going to find them loss function measures how wrong your model's predictions are lower is better optimizer takes into account the loss of your model so how wrong it is and starts to move these two values into a way that improves where these red dots end up but these again these principles of a loss function and an optimizer can be for models with two parameters or models with millions of parameters can be for computer vision models or could be for simple models like ours that predict the dots on a straight line so with that being said let's jump into the next video we'll start to look a little deeper into loss function brow problem and an optimizer i'll see you there welcome back whoa we're in an exciting streak of videos coming up here i mean the whole course is fun trust me but this is really exciting because training your first machine learning model seems a little bit like magic but it's even more fun when you're writing the code yourself what's going on behind the scenes so we discussed that the whole concept of training is from going unknown parameters random parameters such as what we've got so far two parameters that better represent the data and we spoke of the concept of a loss function we want to minimize the loss function that is the whole idea of a training loop in pi torch or an optimization loop in pi torch and an optimizer is one of those ways that can nudge the parameters of our model in our case weights or bias towards values rather than just being random values like they are now towards values that lower the loss function and if we lower the loss function what does the loss function do it measures how wrong our model's predictions are compared to the ideal outputs so if we lower that well hopefully we move these red dots towards the green dots and so as you might have guessed pi torch has some built-in 
functionality for implementing loss functions and optimizers and by the way what we're covering so far is in the train model section of the pi torch workflow fundamentals i've got a little nice table here which describes a loss function what does it do where does it live in pi torch common values we're going to see some of these hands-on if you'd like to read about it of course you have the book version of the course here so loss functions in pi torch i'm just in docs torch.nn look at this look at all these loss functions there's far too many for us to go through all in one hit so we're just going to focus on some of the most common ones look at that we got about what's that 15 loss functions something like that well truth be told is that which one should use you're not really going to know unless you start to work hands-on with different problems and so in our case we're going to be looking at l1 loss and this is an again once more another instance where different machine learning libraries have different names for the same thing this is mean absolute error which we kind of discussed in the last video which is if we took the distance from this red dot to this green dot and say it's 0.4 thereabouts 0.4 0.4 and then took the mean well we've got the mean absolute error but in pi torch they call it l1 loss which is a little bit confusing because then we go to mse loss which is mean squared error which is l2 so naming conventions just takes a little bit of getting used to this is a warning for you so let's have a look at the l1 loss function again i'm just making you aware of where the other loss functions are we'll do with some binary cross entropy loss later in the course and maybe even is there categorical cross entropy we'll see that later on but all the others will be problem specific for now a couple of loss functions like this l1 loss msc loss we use four regression problems so that's predicting a number cross entropy loss is a loss that you use with classification problems but we'll see those hands on later on let's have a look at l1 loss so l1 loss creates a criterion as i said you might hear the word criterion used in pi torch for a loss function i typically call them loss functions the literature typically calls it loss functions that measures the mean absolute error there we go l1 loss is the mean absolute error between each element in the input x and target y now your extracurricular as you might have guessed is to read through the documentation for the different loss functions especially l1 loss but for the sake of this video let's just implement it for ourselves oh and if you want a little bit of a graphic i've got one here this is where we're up to by the way picking a loss function optimizer for step two this is a fun part right we're getting into training a model so we've got mean absolute error here's that graph we've seen before oh look at this okay so we've got the difference here i've actually measured this before in the past so i kind of knew what it was mean absolute error is if we repeat for all samples in our set that we're working with and if we take the absolute difference between these two dots well then we take the mean we've got mean absolute error so m-a-e-loss equals torch mean we could write it out that's the beauty of pine torch right we could write this out or we could use the torch n version which is recommended so let's jump in there's a colorful slide describing what we're about to do so let's go set up a loss function and then we're also going to put in 
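to see the naming overlap in code, here's a small sketch with made-up numbers showing that nn.L1Loss matches the mean absolute error formula and nn.MSELoss matches the mean squared error (l2) formula:

```python
import torch
from torch import nn

# made-up predictions and targets purely to compare the built-ins with the formulas
y_pred = torch.tensor([0.3, 0.5, 0.7])
y_true = torch.tensor([0.7, 0.9, 1.1])

# nn.L1Loss == mean absolute error (MAE)
print(nn.L1Loss()(y_pred, y_true), torch.mean(torch.abs(y_pred - y_true)))

# nn.MSELoss == mean squared error (often called L2 loss)
print(nn.MSELoss()(y_pred, y_true), torch.mean((y_pred - y_true) ** 2))
```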
here set up an optimizer so let's call it los fn equals nn dot l1 loss simple as that and then if we have a look at what's our loss function what does this say my goodness my internet is going quite slow today it's raining outside so there might be some delay somewhere but that's right gives us a chance to sit here and be mindful about what we're doing look at that okay loss function l1 loss beautiful so we've got a loss function our objective for training a machine learning model will be to let's go back look at the colorful graphic will be to minimize these distances here and in turn minimize the overall value of mae that is our goal if our red dots line up with our green dots we will have a loss value of 0 the ideal point for a model to be and so let's go here we now need an optimizer as we discussed before the optimizer takes into account the loss of a model so these two work in tandem that's why i've put them as similar steps if we go back a few slides so this is why i put these as 2.1 often picking a loss function and optimizer and pi torch come as part of the same package because they work together the optimizer's objective is to give the model values so parameters like a weight and a bias that minimize the loss function they work in tandem and so let's see what an optimizer optimizers where might that be what if we search here i typically don't use this search because i prefer just using google search but does this give us optimizer hey there we go so again pi torch has torch dot opt in which is where the optimizers are torch dot optim let me put this link in here this is another bit of your extra curriculum to if you want to read more about different optimizers in pytorch as you might have guessed they have a few torch optim is a package implementing various optimization algorithms most commonly used methods are already supported and the interface is general enough so that more sophisticated ones can also be easily integrated into the future so if we have a look at what algorithms exist here again we're going to throw a lot of names at you but in the literature a lot of them that have made it into here are already good working algorithms so it's a matter of picking whichever one's best for your problem how do you find that out well sgd stochastic gradient descent is possibly the most popular however there are some iterations on sgd such as atom which is another one that's really popular so again this is one of those other machine learning is part art part science is trial and error of figuring out what works best for your problem for us we're going to start with sgd because it's the most popular and if you were paying attention to a previous video you might have seen that i said look up gradient descent where have we got this gradient descent there we go so this is one of the main algorithms that improves our models so gradient descent and back propagation so if we have a look at this stochastic gradient descent bit of a tongue twister is random gradient descent so that's what stochastic means so basically our model improves by taking random numbers let's go down here here and randomly adjusting them so that they minimize the loss and once our optimizer that's right here once our optimizer torch.optim let's implement sgd sgd stochastic gradient descent we're going to write this here stochastic gradient descent it starts by randomly adjusting these values and once it's found some random values or random steps that have minimized the loss value we're going to see this in action 
later on it's going to continue adjusting them in that direction so say it says oh wait if i increase the weight it reduces the loss so it's going to keep increasing the weights until the weights no longer reduce the loss maybe it gets to a point at say 0.65 if you increase the weights anymore the loss is going to go up so the optimize is like well i'm going to stop there and then for the bias the same thing happens if it decreases the bias and finds that loss increases well it's going to go well i'm going to try increasing the bias instead so again one last summary of what's going on here a loss function measures how wrong our model is and the optimizer adjusts our model parameters no matter whether there's two parameters or millions of them to reduce the loss there are a couple of things that an optimizer needs to take in it needs to take in as a an argument params so this is if we go to sgd i'm just going to link this as well sgd there's the formula of what sgd does i look at this and i go hmm there's a lot going on here it took me a while to understand that so i like to see it in code so we need params this is short for what parameters should i optimize as an optimizer and then we also need an lr which stands for i'm going to write this in a comment lr equals learning rate possibly the most oh i didn't even type rate did i possibly the most important hyper parameter you can set so let me just remind you i'm throwing lots of words out here but i'm kind of like trying to write notes about what we're doing again we're going to see these in action in a second so check out our models and parameters so a parameter is a value that the model sets itself so learning rate equals possibly the most important learning hyper parameter oh no i don't need learning there do i hyperparameter and a hyper parameter is a value that us as a data scientist or a machine learning engineer set ourselves you can set so the learning rate is in our case let's go 0.01 you're like daniel where did i get this value from well again these type of values come with experience i think it actually says it in here lr lr 0.1 yeah okay so the default is 0.1 but then if we go back to opt-in i think i saw it somewhere did i see it somewhere zero point something zero yeah there we go yeah so a lot of the default settings are pretty good in torch optimizers however the learning rate what does it actually do we could go 0.01 these are all common values here triple zero one i'm not sure exactly why oh model it's model 0. 
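putting that together, the loss function and optimizer set up in the notebook look like this (lr=0.01 is the value chosen above and it's a hyperparameter you can change):

```python
import torch
from torch import nn

# assumes model_0 from earlier in the notebook

# set up a loss function (L1Loss == mean absolute error in pytorch)
loss_fn = nn.L1Loss()

# set up an optimizer (stochastic gradient descent)
optimizer = torch.optim.SGD(params=model_0.parameters(),  # the parameters to optimize (weights and bias)
                            lr=0.01)                      # learning rate: how big each update step is
```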
the learning rate says to our optimizer yes it's going to optimize our parameters here but the higher the learning rate the more it adjusts each of these parameters in one hit so let's say it's 0.01 and it's going to optimize this value here so it's going to take that big of a step if we changed it to here it's going to take a big step on this three and if we changed it to all the way to the end zero zero one it's only going to change this value so the smaller the learning rate the smaller the change in the parameter the larger the learning rate the larger the change in the parameter so we've set up a loss function we've set up an optimizer let's now move on to the next step in our training workflow and that's by building a training loop far out this is exciting i'll see you in the next video welcome back in the last video we set up a loss function and we set up an optimizer and we discussed the roles of each so loss function measures how wrong our model is the optimizer talks to the loss function and goes well if i change these parameters a certain way does that reduce the loss function at all and if it does yes let's keep adjusting them in that direction if it doesn't let's adjust them in the opposite direction and i just want to show you i added a little bit of text here just to concretely put down what we were discussing inside the optimizer you'll often have to set two parameters params and lr where params is the model parameters you'd like to optimize for an example in our case params equals our model 0 parameters which were of course a weight and a bias and the learning rate which is lr in optimizer lr stands for learning rate and the learning rate is a hyper parameter remember a hyper parameter is a value that we the data scientist or machine learning engineer sets whereas a parameter is what the model sets itself defines how big or small the optimizer changes the model parameters so a small learning rate so the smaller this value results in small changes a large learning rate results in large changes so another question might be well very valid question hey i put this here already is which loss function and which optimizer should i use so this is another tough one because it's problem specific but with experience and machine learning i'm showing you one example here you'll get an idea of what works for your particular problem for a regression problem like ours a loss function of l1 loss which is mae in pi torch and an optimizer like torch.optim sgd like stochastic gradient descent will suffice but for a classification problem we're going to see this later on not this one specifically whether a photo is a cat of a dog that's just an example of a binary classification problem you might want to use a binary classification loss but with that being said we now are moving on to well here's our whole goal is to reduce the mae of our model let's get the workflow we've done these two steps now we want to build a training loop so let's get back into here there's going to be a fair few steps going on we've already covered a few but hey nothing we can't handle together so building a training loop in pie torch so i thought about just talking about what's going on in the training loop but we can talk about the steps after we've coded them how about we do that so we want to build a training loop and a testing loop how about we do that so a couple of things we need in a training loop so there's going to be a fair few steps here if you've never written a training loop before but that is completely 
fine because you'll find that the first couple of times that you write this you'll be like oh my gosh there's too much going on here but then when you have practice you'll go okay i see what's going on here and then eventually you'll write them with your eyes closed i've got a fun song for you to help you out remembering things it's called the unofficial pie torch optimization loop song we'll see that later on or actually i'll probably leave that as an extension but you'll see that you can also functionalize these things which we will do later in the course so that you can just write them once and then forget about them but we're going to write it all from scratch to begin with so we know what's happening so we want to or actually step 0 is loop through the data so we want to look at the data multiple times because our model is going to at first start with random predictions on the data make some predictions we're trying to improve those we're trying to minimize the loss to make those predictions we do a forward pass so forward pass why is it called a forward pass so this involves data moving through our models forward functions now there i say functions because there might be plural there might be more than one and the forward method recall we wrote in our model up here forward a forward pass is our data going through this function here and if you want to look at it visually let's look up a neural network graphic images a forward pass is just uh data moving from the input to the output layer so starting here input layout moving through the model so that's a forward pass also called forward propagation another time where more than one name is used for the same thing so we'll go back down here forward pass and i'll just write here also called forward propagation propagation wonderful and then we need to calculate the loss so forward pass let me write this to calculate or to make predictions make predictions on data so calculate the loss compare forward pass predictions oh there's some thunder going in the background here of my place we might be in for a storm perfect time to write code compare forward past predictions to ground truth labels we're going to see all this in code in a second calculate the loss and then we're going to go optimize the zero grad we haven't just spoken about what this is but that's okay we're going to see that in a second i'm not going to put too much there lost backward we haven't discussed this one either there's probably three steps that we haven't really discussed we've discussed the idea behind them but not too much in depth optimizer step so this one is lost backwards is move backwards if the forward passes forwards like through the network the forward pass is data goes into out the backward pass data goes our calculations happen backwards so we'll see what that is in a second where were we over here we've got too much going on i'm getting rid of these moves backwards through the network to calculate the gradients oh the gradients of each of the parameters of our model with respect to the loss oh my gosh that is an absolute mouthful but that'll do for now optimize a step this is going to use the optimizer to adjust our models parameters to try and improve the loss so remember how i said in a previous video that i'd love you to watch the two videos i linked above one on gradient descent and one on backpropagation if you did you might have seen like there's a fair bit of math going on in there well that's essentially how our model goes from random parameters to 
better parameters using math many people one of the main things i get asked from machine learning is how do i learn machine learning if i didn't do math well the beautiful thing about pi torch is that it implements a lot of the math of back propagation so this is back propagation i'm going to write this down here this is an algorithm put back back propagation hence the loss backward we're going to see this in code in a second don't you worry and this is gradient descent so these two algorithms drive the majority of our learning so back propagation calculate the gradients of the parameters of our model with respect to the loss function and optimize our step we'll trigger code to run gradient descent which is to minimize the gradients because what is a gradient let's look this up what is a gradient i know we haven't written the code yet but we're going to do that images gradient there we go change in y change in x gradient is from high school math gradient is a slope so if you were on a hill let's find a picture of a hill picture of a hill there we go this is a great big hill so if you're on the top of this hill and you wanted to get to the bottom how would you get to the bottom well of course you just walk down the hill but if you're a machine learning model what are you trying to do let's imagine your loss is the height of this hill and you start off with your losses really high and you want to take your loss down to zero which is the bottom right well if you measure the gradient of the hill the bottom of the hill is in the opposite direction to where the gradient is steep does that make sense so the gradient here is an incline we want our model to move towards the gradient being nothing which is down here and you could argue yeah the gradient's probably nothing at the top here but let's just for argument's sake say that we want to get to the bottom of the hill so we're measuring the gradient and one of the ways an optimization algorithm works is it moves our model parameters so that the gradient equals zero and then if the gradient of the loss equals zero while the loss equals zero two so now let's write some code so we're going to set up a parameter called or a variable called epochs and we're going to start with one even though this could be any value let me define these as we go so we're going to write code to do all of this so epochs and epoch is one loop through the data dot dot so epochs we're going to start with one so one time through all of the data we don't have much data and so for epoc so let's go this this is step zero zero loop through the data by the way when i say loop through the data i want you to do all of these steps within the loop and do dot dot loop through the data so for epoch in range epochs even though it's only going to be one we can adjust this later and because epochs we've set this ourselves it is a this is a hyperparameter because we've set it ourselves and i know you could argue that hey our machine learning parameters of model 0 or our model parameters model 0 aren't actually parameters because we've set them but in the models that you build in the future they will likely be set automatically rather than you setting them explicitly like we've done when we created model 0. 
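as a sketch, the outer loop starts out this simple, with the training steps from above going inside it:

```python
# an epoch is one full loop through the data, and epochs is a hyperparameter (we set it ourselves)
epochs = 1

# 0. loop through the data
for epoch in range(epochs):
    model_0.train()  # put the model in training mode (this is the default state)
    # ... the forward pass, loss calculation, optimizer.zero_grad(), loss.backward()
    #     and optimizer.step() all go inside this loop (written out over the next videos)
```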
and oh my gosh this is taking quite a while to run that's all right we don't need it to run fast we just we need to write some more code then you'll come on there's a step here i haven't discussed either set the model to training mode so pytorch models have a couple of different modes the default is training mode so we can set it to training mode by going like this train so what does train mode do in a pi torch model my goodness is there a reason my internet is going this slow that's all right i'm just going to discuss this with talking again list train mode train mode in pi torch sets oh there we go requires grad equals true now i wonder if we do with torch dot no grad remember no grad is similar to inference mode will this adjust see i just wanted to take note of requires grad equals true actually what i might do is we do this in a different cell watch this this is just going to be rather than me just spit words at you i reckon we might be able to get it working doing this oh that didn't list the model parameters why did that not come out model 0 dot eval so there's two modes eval mode and train mode model dot eval parameters okay we're experimenting together on the fly here and actually this is what i want you to do is i want you to experiment with different things it's not going to say requires grad equals false with torch dot no grad model 0 dot parameters i don't know if this will work but it definitely works behind the scenes and what i mean by works behind the scenes and not here it works behind the scenes when calculations have been made but not if we're trying to explicitly print things out well that's an experiment that i thought was going to work and it didn't work so train mode in pi torch sets all parameters that require gradients to require gradients so do you remember with the picture of the hill i spoke about how we're trying to minimize the gradient so the gradient is the steepness of the hill if the height of the hill is a loss function and we want to take that down to zero we want to take the gradient down to zero so same thing with the gradients of our model parameters which are here with respect to the loss function we want to try and minimize that gradient so that's gradient descent is take that gradient down to zero so model.train and then there's also model0.eval so turns off gradient tracking so we're going to see that later on but for now i feel like this video is getting far too long let's finish the training loop in the next video i'll see you there friends welcome back in the last video i promised a lot of code but we didn't get there we discussed some important steps i forgot how much behind the scenes there is to apply torch training loop and i think it's important to spend the time that we did discussing what's going on because there's a fair few steps but once you know what's going on i mean later on we don't have to write all the code that we're going to write in this video you can functionalize it we're going to see that later on in the course and it's going to run behind the scenes for us but we're spending a fair bit of time here because this is literally the crux of how our model learns so let's get into it so now we're going to implement the forward pass which involves our model's forward function which we defined up here when we built our model the forward pass runs through this code here so let's just write that so in our case because we're training i'm just going to write here this is training we're going to see dot eval later on we'll talk about 
that when it comes let's do the forward pass so the forward pass we want to pass data through our models forward method we can do this quite simply by going why pred so why predictions because remember we're trying to use our ideal model is using x test to predict y test on our test data set we make predictions on our test data set we learn on our training data set so we're passing i'm just going to get rid of that because we don't need that so we're passing our model x train and model 0 is going to be our current model there we go so we learn patterns on the training data to evaluate our model on the test data number two where we at so we have to calculate the loss now in a previous video we set up a loss function so this is going to help us calculate the what what kind of loss are we using we want to calculate the mae so the difference or the distance between our red dot and a green dot and the formula would be the same if we had 10 000 red dots and 10 000 green dots we're calculating how far they are apart and then we're taking the mean of that value so let's go back here so calculate the loss and in our case we're going to set loss equal to our loss function which is l1 loss in pi torch but it is mae y pred and y train so we're calculating the difference between our model's predictions on the training data set and the ideal training values and if you want to go into torch.nn loss functions that's going to show you the order because sometimes this confuses me too what order the values go in here but it goes prediction first then labels and i maybe be wrong there because i get confused here my dyslexia kicks in but i'm pretty sure it's predictions first then actual labels do we have an example of where it's used yeah input first target next so there we go and truth be told because it's mean absolute error it shouldn't actually matter too much but in the case of staying true to the documentation let's do inputs first and then targets next for the rest of the course then we're going to go optimizer xero grad i haven't discussed this one but that's okay i'm going to write the code and then i'm going to discuss what it does so what does this do actually before we discuss this i'm going to write these two steps because they kind of all work together and it's a lot easier to discuss what optimizer zero grad does in the context of having everything else perform back propagation on the loss with respect to the parameters of the model backpropagation is going to take the loss value so lost backward i always say backwards but it's just backward that's the code there and then number five is step the optimizer so perform gradient descent so optimizer dot step look at us we just wrote the five major steps of a training loop now let's discuss how all of these work together so it's kind of strange like the ordering of these you might think what should i do the order typically the forward pass and the loss come straight up then there's a little bit of ambiguity around what order these have to come in but the optimizer step should come after the back propagation so i i just like to keep this order how it is because this works let's just keep it that way but what happens here well it also is a little bit confusing in the first iteration of the loop because we've got zero grad but what happens here is that the optimizer makes some calculations in how it should adjust model parameters with regards to the back propagation of the loss and so by default these will by default how the optimizer changes will 
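collected in one place, the five steps we just wrote look like this inside the loop, using the same names from the notebook (model_0, X_train, y_train, loss_fn, optimizer):

```python
for epoch in range(epochs):
    model_0.train()

    # 1. forward pass: run the training data through the model's forward() method
    y_pred = model_0(X_train)

    # 2. calculate the loss (predictions first, then the ground-truth labels)
    loss = loss_fn(y_pred, y_train)

    # 3. zero the optimizer's gradients (they accumulate by default)
    optimizer.zero_grad()

    # 4. backpropagation: compute the gradient of the loss w.r.t. every parameter with requires_grad=True
    loss.backward()

    # 5. step the optimizer: perform gradient descent to update the parameters
    optimizer.step()
```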
accumulate through the loop so we have to zero them above in step three for the next iteration of the loop so a big long comment there but what this is saying is let's say we go through the loop and the optimizer chooses a value of one change it by one and then it goes through the loop again if we didn't zero it if we didn't take it back to zero because that's what zeroing is doing it would go okay next one two three four five six seven eight all through the loop right because we're looping here if this was ten it would accumulate the value that it's supposed to change ten times but we want to start fresh each iteration of the loop and now the reason why it accumulates that's pretty deep in the pytorch documentation from my understanding there's something to do with efficiency of computing if you find out what the exact reason is i'd love to know so we have to zero it then we perform back propagation if you recall back propagation is discussed in here and then with optimizer step we perform gradient descent so the beauty of pytorch this is the beauty of pytorch is that it will perform back propagation we're going to have a look at this in a second and gradient descent for us so to prevent this video from getting too long i know we've just written code but i would like you to practice writing a training loop yourself just write this code and then run it and see what happens actually you can comment this out we're going to write the testing loop in a second so your extra curriculum for this video is to one rewrite this training loop and two sing the pytorch optimization loop song let's go into here if you want to remember the steps well i've got a song for you this is the training loop song we haven't discussed the test step but maybe you could try this yourself so this is an old version of the song actually i've got a new one for you but let's sing this together it's train time so we do the forward pass calculate the loss optimizer zero grad loss backward optimizer step step step now you only have to call optimizer step once this is just for jingle purposes but for test time let's test with torch no grad do the forward pass calculate the loss watch it go down down down that's from my twitter but this is a way that i help myself remember the steps that are going on in the code here and if you want the video version of it well you're just going to have to search unofficial pytorch optimization loop song oh look at that who's that guy well he looks pretty cool so i'll let you check that out in your own time but for now go back through the training loop steps i've got a colorful graphic coming up in the next video we're going to write the testing steps and then we're going to go back one more time and talk about what's happening in each of them and again if you'd like some even more extra curriculum don't forget the videos i've shown you on backpropagation and gradient descent but for now let's leave this video here i'll see you in the next one friends welcome back in the last few videos we've been discussing the steps in a training loop in pytorch and there's a fair bit going on so in this video we're going to step back through what we've done just to recap and then we're going to get into testing and it's nice and early where i am right now sun's about to come up it's a very very beautiful morning to be writing code so let's jump in we've got a little song here for what we're doing in the training steps for an epoch in a range call model dot train do
the forward pass calculate the loss optimize the zero grad last backward optimizer step step step that's the little jingle i use to remember the steps in here because the first time you write it there's a fair bit going on but subsequent steps and subsequent times that you do write it you'll start to memorize this and even better later on we're going to put it into a function so that we can just call it over and over and over and over again with that being said let's jump in to a colorful slide because that's a lot of code on the page let's add some color to it understand what's happening that way you can refer to this and go hmm i see what's going on now so for the loop this is why it's called a training loop we step through a number of epochs one epoch is a single forward pass through the data so pass the data through the model for a number of epochs epochs is a hyper parameter which means you could set it to 100 you could set it to a thousand you could set it to one as we're going to see later on in this video we skipped this step with the colors but we put the model in we call model.train this is the default mode that the model is in essentially it sets up a whole bunch of settings behind the scenes in our model parameters so that they can track the gradients and do a whole bunch of learning behind the scenes with these functions down here pi torch does a lot of this for us so the next step is the forward pass we perform a forward pass on the training data in the training loop this is an important note in the training loop is where the model learns patterns on the training data whereas in the testing loop we haven't got to that yet is where we evaluate the patterns that our model has learned or the parameters that our model has learned on unseen data so we pass the data through the model this will perform the forward method located within the model object so because we created a model object you can actually call your models whatever you want but it's good practice to you'll often see it just called model and if you remember we'll go back to the code we created a forward method in our model up here which is this because our linear regression model class subclasses nn.module we need to create our own custom forward method so that's why it's called a forward pass is because not only does it well the technical term is forward propagation so if we have a look at a neural network picture forward propagation just means going through the network from the input to the output there's a thing called back propagation which we're going to discuss in a second which happens when we call lost.backward which is going backward through the model but let's return to our colorful slide we've done the forward pass call a forward method which performs some calculation on the data we pass it next is we calculate the loss value how wrong the model's predictions are and this will depend on what loss function you use what kind of predictions your model is outputting and what kind of true values you have but that's what we're doing here we're comparing our models predictions on the training data to what they should ideally be and these will be the training labels the next step we zero the optimizer gradients so why do we do this well it's a little confusing for the first epoch in the loop but as we get down to optimizer.step here the gradients that the optimizer calculates accumulate over time so that for each epoch for each loop step we want them to go back to zero and now the exact reason behind why the 
optimizer accumulates gradients is buried somewhere within the pi torch documentation i'm not sure the exact reason from memory it's because of compute optimization it just adds them up in case you wanted to know what they were but if you find out exactly i'd love to know next step is to perform back propagation on the loss function that's what we're calling last.backward now backpropagation is we compute the gradient of every parameter with requires grad equals true and if you recall we go back to our code we've set requires grad equals true for our parameters now the reason we've set requires grad equals true is not only so backpropagation can be performed on it but let me show you what the gradients look like so let's go loss function curve that's a good idea so we're looking for some sort of convex curve here there we go l2 loss we're using l1 loss at the moment is there a better one here all we need is just a nice looking curve here we go so this is why we keep track of the gradients behind the scenes pi torch is going to create some sort of curve for all of our parameters that looks like this now this is just a 2d plot so the reason why we're just using an example from google images is one because you're going to spend a lot of your time googling different things and two in practice when you have your own custom neural networks right now we only have two parameters so it's quite easy to visualize a loss function curve like this but when you have say 10 million parameters you basically can't visualize what's going on and so pytorch again will take care of these things behind the scenes but what it's doing is when we say requires grad python is going to track the gradients of each of our parameters and so what we're trying to do here with back propagation and subsequently gradient descent is calculate where the lowest point is because this is a loss function this is mse loss we could trade this out to be mae loss in our case or l1 loss for our specific problem but this is some sort of parameter and we calculate the gradients because what is the gradient let's have a look what is a gradient a gradient is an inclined part of a road or railway now we want it in machine learning what's it going to give us in machine learning a gradient is a derivative of a function that has more than one input variable okay let's dive in a little deeper see here's some beautiful lost landscapes we're trying to get to the bottom of here this is what gradient descent is all about so oh there we go so this is a cost function which is also a loss function we start with a random initial variable what have we done we've started with a random initial variable right okay and then we take a learning step beautiful this is w so this could be our weight parameter okay we're connecting the dots here this is exciting we've got a lot of tabs here but that's all right we'll bring this all together in a second and what we're trying to do is come to the minimum now why do we need to calculate the gradients well the gradient is what ah value of weight here we go this is even better i love google images so this is our loss this is a value of a weight so we calculate the gradients y because the gradient is the slope of a line or the steepness and so if we calculate the gradient here and we find that it's really steep right up the top of this this incline we might head in the opposite direction to that gradient that's what gradient descent is and so if we go down here now what are these step points there's a little thing that 
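if you'd like to see gradients, accumulation and a gradient descent step in a tiny standalone example, here's a sketch with one made-up parameter, the numbers are only for illustration:

```python
import torch

# one made-up parameter sitting on a simple convex "loss curve": loss = (w - 1)^2
w = torch.tensor(2.0, requires_grad=True)

loss = (w - 1.0) ** 2
loss.backward()               # backpropagation: d(loss)/dw = 2 * (w - 1) = 2.0 at w = 2.0
print(w.grad)                 # tensor(2.)

# gradients accumulate if you don't zero them, which is why training loops call optimizer.zero_grad()
((w - 1.0) ** 2).backward()
print(w.grad)                 # tensor(4.) -> the new gradient was added on top of the old one

# one gradient descent step by hand: move opposite to the gradient and the loss goes down
w.grad.zero_()
loss = (w - 1.0) ** 2
loss.backward()
with torch.no_grad():
    w -= 0.1 * w.grad         # learning rate of 0.1
print(w)                      # tensor(1.8000, ...) -> closer to the ideal value of 1.0
print((w - 1.0) ** 2)         # loss dropped from 1.0 to 0.64
```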
i wrote down in the last video at the end of the last video i haven't told you about yet but i was waiting for a moment like this and if you recall i said kind of all of these three steps optimize the zero grad loss backward optimizes stepper all altogether so we calculate the gradients because we want to head in the opposite direction of that gradient to get to a gradient value of zero and if we get to a gradient value of zero with a loss function well then the loss is also zero so that's why we keep track of the gradient with requires grad equals true and again pi torch does a lot of this behind the scenes and if you want to dig more into what's going on here i'm going to show you some extra resources for backpropagation which is calculating this gradient curve here and gradient descent which is finding the bottom of it towards the end of this video and again if we started over this side we would just go in the opposite direction of this so maybe this is a positive gradient here and we just go in the opposite direction here we want to get to the bottom that is the main point of gradient descent and so if we come back i said just keep this step size in mind here if we come back to where we created our loss function and optimizer i put a little tidbit here for the optimizer because we've written a lot of code and we haven't really discussed what's going on but i like to do things on the fly as we need them so inside our optimizer we'll have main two parameters which is params so the model parameters you'd like to optimize params equals model 0. parameters in our case and pytorch is going to create something similar to this curve not visually but just mathematically behind the scenes for every parameter now this is a value of weight so this would just be potentially the weight parameter of our network but again if you have 10 million parameters there's no way you could just create all of these curves yourself that's the beauty of pi torch it's doing this behind the scenes through a mechanism called torch auto grad which is auto gradient calculation and there's beautiful documentation on this if you'd like to read more on how it works please go through that but essentially behind the scenes it's doing a lot of this for us for each parameter that's the optimizer then within the optimizer once we've told it what parameters to optimize we have the learning rate so the learning rate is another hyper parameter that defines how big or small the optimizer changes the parameters with each step so a small learning rate results in small changes whereas a large learning rate is in large changes and so if we look at this curve here we might at the beginning start with large steps so we can get closer and closer to the bottom but then as we get closer and closer to the bottom to prevent stepping over to this side of the curve we might do smaller and smaller steps and the optimizer in pytorch there are optimizers that do that for us but there is also another concept called learning rate scheduling which is again something if you would like to look up and do more but learning rate scheduling essentially says hey maybe start with some big steps and then as we get closer and closer to the bottom reduce how big the steps are that we take because if you've ever seen a coin coin at back of couch this is my favorite analogy for this if you've ever tried to reach a coin at the back of a couch like this excited young chap if you're reaching towards the back of the couch you take quite big steps as you say your arm 
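learning rate scheduling, mentioned above, lives in torch.optim.lr_scheduler, here's a minimal sketch, the step_size and gamma values are just example numbers not from the course:

```python
import torch

# assumes the optimizer created earlier; StepLR multiplies the learning rate by gamma every step_size epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(epochs):
    # ... the usual training steps (forward pass, loss, zero_grad, backward, optimizer.step()) ...
    scheduler.step()  # bigger steps early on, smaller and smaller steps as training goes on
```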
was over here you would take quite big steps until you get to about here and the closer you get to the coin the smaller and smaller your steps are otherwise what's going to happen the coin is going to be lost or if you took two small steps you'd never get to the coin it would take forever to get there so that's the concept of the learning rate if you take two big steps you're going to just end up over here if it takes two small steps it's going to take you forever to get to the bottom here and this bottom point is called convergence that's another term you're going to come across i know i'm throwing a lot of different terms at you but that's the whole concept of the learning rate how big is your step down here in gradient descent gradient descent is this back propagation is calculating these derivative curves or the gradient curves for each of the parameters in our model so let's get out of here we'll go back to our training steps where were we i think we're up to back propagation have we done backwards yes so the back propagation is where we do the backwards step so the forward pass forward propagation go from input to output back propagation we take the gradients of the loss function with respect to each parameter in our model by going backwards that's what happens when we call lost dot backward pi torch does that for us behind the scenes and then finally step number five is step the optimizer we've kind of discussed that as i said if we take a step let's get our loss curve back up loss function curve doesn't really matter what curve we use the optimizer step is taking a step this way to try and optimize the parameters so that we can get down to the bottom here and i also just noted here that you can turn all of this into a function so we don't necessarily have to remember to write these every single time the ordering of this you'll want to do the forward pass first and then calculate the loss because you can't calculate the loss unless you do the forward pass i like this ordering here of these three as well but you also want to do the optimizer step after the loss backward so this is my favorite ordering it works if you like this ordering you can take that as well with that being said i think this video has gotten long enough in the next video i'd like to step through this training loop one epoch at a time so that we can see i know i've just thrown a lot of words at you that this optimizer is going to try and optimize our parameters each step but let's see that in action how our parameters of our model actually change every time we go through each one of these steps so i'll see you in the next video let's step through our model welcome back now we've spent a fair bit of time on the training loop and the testing loop well we haven't even got to that yet but there's a reason behind this because this is possibly one of the most important things aside from getting your data ready which we're going to see later on in pi torch deep learning is writing the training loop because this is literally like how your model learns patterns and data so that's why we're spending a fair bit of time on here and we'll get to the testing loop because that's how you evaluate the patterns that your model has learned from data which is just as important as learning the patterns themselves and following on from the last couple of videos i've just linked some youtube videos that i would recommend for extra curriculum for back propagation which is what happens when we call lost dot backward down here and for the 
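and as hinted above, once the ordering feels comfortable you can wrap the whole thing in a function, here's a rough sketch, the function name is mine, not something we've written in the course yet:

```python
def train_step(model, X, y, loss_fn, optimizer):
    """one training step: forward pass -> loss -> zero grad -> backward -> optimizer step."""
    model.train()
    y_pred = model(X)               # forward pass
    loss = loss_fn(y_pred, y)       # calculate the loss
    optimizer.zero_grad()           # zero accumulated gradients
    loss.backward()                 # backpropagation
    optimizer.step()                # gradient descent
    return loss

# loss = train_step(model_0, X_train, y_train, loss_fn, optimizer)
```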
optimizer step gradient descent is what's happening there so i've linked some extra resources for what's going on behind the scenes there from a mathematical point of view remember this course focuses on writing pytorch code but if you'd like to dive into what math pytorch is triggering behind the scenes i'd highly recommend these two videos and i've also added a note here as to which loss function and optimizer should i use which is a very valid question and again it's another one of those things that's going to be problem specific but with experience over time as you work with machine learning problems and write a lot of code you get an idea of what works and what doesn't with your particular problem set for example for a regression problem like ours regression is again predicting a number we use mae loss which pytorch calls l1 loss you could also use mse loss and an optimizer like torch optim stochastic gradient descent will suffice but for binary classification you might want to look into a binary cross entropy loss but we'll look at a classification problem later on in the course for now i'd like to demonstrate what's going on in the steps here so let's go model zero let's look up the state dict and see what the parameters are for now now they aren't the original ones i don't think let's reinstantiate our model so we get really new parameters yeah we recreated it here i might just get rid of that so we'll rerun our model code rerun model state dict and we will create an instance of our model and just make sure your parameters should be something similar to this if it's not exactly like that it doesn't matter but i'm just going to showcase what's going on you'll see it on my screen anyway the state dict shows 0.3367 for the weight and 0.1288 for the bias and again i can't stress enough we've only got two parameters for our model and we've set them ourselves future models that you build and later ones in the course will have much much more and we won't actually explicitly set any of them ourselves we'll check out some predictions they're going to be terrible because we're using random parameters to begin with but we'll set up a new loss function and an optimizer the optimizer is going to optimize our model 0 parameters the weight and bias the learning rate is 0.01 which is a relatively large step a smaller learning rate would take a smaller step remember the larger the learning rate the bigger the step the more the optimizer will try to change these parameters every step but let's stop talking about it let's see it in action i've set a manual seed here too by the way because the optimizer steps are going to be quite random as well depending on how the model's predictions go but this is just to try and make it as reproducible as possible so keep this in mind if you get different values to what we're going to output here from my screen to your screen don't worry too much what's more important is the direction they're going so ideally we're moving these values here this is from when we did one epoch before we're moving these values closer to the true values and in practice you won't necessarily know what the true values are but that's where evaluation of your model comes in we're going to cover that when we write a testing loop so let's run one epoch now i'm gonna keep that down there watch what happens we've done one epoch just a single epoch we've done the forward pass we've calculated the loss we've done optimizer zero grad we've performed back propagation and we've stepped the optimizer what does
stepping the optimizer do it updates our model parameters to try and get them further closer towards the weight and bias if it does that the loss will be closer to zero that's what it's trying to do how about we print out the loss at the same time print loss and the loss let's take another step so the loss is 0 301. now we check the weights and the bias we've changed again three three four four five one four eight eight we go again oh the loss is going down check it hey look at that the values are getting closer to where they should be if ever so slightly loss went down again oh my goodness this is so amazing look we're training our let's print this out in the same cell print out model state dict we're training our first machine learning model here people this is this is very exciting even if it's only step by step and it's only a small model this is very important loss is going down again values are getting closer to where they should be again we won't really know where they should be in real problems but for now we do so let's just get excited the real way to sort of measure your model's progress in practice is a lower loss value remember lower is better a loss value measures how wrong your model is we're going down we're going in the right direction so that's what i meant by as long as your values are going in the similar direction so down we're writing similar code here but if your values are slightly different in terms of the exact numbers don't worry too much because that's inherent to the randomness of machine learning because the steps that the optimizer are taking are inherently random but they're sort of pushed in a direction so we're doing gradient descent here this is beautiful how low can we get the loss how about we try to get to 0.1 look at that we're getting close to 0.1 and then i mean we don't have to do this and by hand oh the bias is getting close to where it exactly should be we're below 0.1 beautiful so that was only about say 10 passes through the data but now you're seeing it in practice you're seeing it happen you're seeing gradient descent let's go gradient descent work in action we've got images this is what's happening we've got our cost function j is another term for cost function which is also our loss function we start with an initial weight what have we done we started with an initial weight this value here and what are we doing we've measured the gradient pie torch has done that behind the scenes for us thank you pie torch and we're taking steps towards the minimum that's what we're trying to do if we minimize the gradient of our weight we minimize the cost function which is also a loss function we could keep going here for hours and get as low as we want but my challenge for you or actually how about we make some predictions with our model we've got right now let's make some predictions so with torch dot inference mode we'll make some predictions together and then i'm going to set you a challenge how about you run this code here for 100 epochs after this video and then you make some predictions and see how that goes so why preds remember how poor our predictions are why preds new equals we just do the forward pass here model 0 on the test data let's just remind ourselves quickly of how poor our previous predictions were plot predictions predictions equals y do we still have this saved why preds hopefully this is still saved there we go shocking predictions but we've just done 10 or so epochs so 10 or so training steps have our predictions do they look any 
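the epoch by epoch experiment above boils down to re-running a cell like this and watching the printed loss fall and the state dict values creep towards the true ones (same notebook names assumed):

```python
import torch

# same notebook names assumed: model_0, X_train, y_train, X_test, loss_fn, optimizer
torch.manual_seed(42)

for epoch in range(epochs):
    model_0.train()
    y_pred = model_0(X_train)
    loss = loss_fn(y_pred, y_train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"loss: {loss}")
    print(model_0.state_dict())   # watch the weight and bias creep towards the true values

# then check the predictions again (challenge: set epochs = 100 and re-run)
with torch.inference_mode():
    y_preds_new = model_0(X_test)
```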
better let's run this we'll copy this code you know my rule i don't really like to copy code but in this case i just want to exemplify a point i like to write all the code myself what do we got why preds new look at that look at that we are moving our predictions closer the red dots closer to the green dots this is what's happening we're reducing the loss in other words we're reducing the difference between our models predictions and our ideal outcomes through the power of back propagation and gradient descent so this is super exciting we're training our first machine learning model my challenge to you is to run this code here change epochs to a hundred see how low you can get this loss value and run some predictions plot them and i think it's time to start testing so give that a go yourself and then we'll write some testing code in the next video i'll see you there welcome back in the last video we did something super excited we saw our loss go down so the loss is remember how different our models predictions are to what we'd ideally like them and we saw our model update its parameters through the power of back propagation and gradient descent all taken care of behind the scenes for us by pytorch so thank you pie torch and again if you'd like some extra resources on what's actually happening from a math perspective for back propagation and gradient descent i would refer to you to these otherwise this is also how i learn about things gradient descent there we go how does gradient descent work and then we've got backpropagation and just to reiterate i am doing this just googling these things because that's what you're going to do in practice you're going to come across a lot of different things that aren't covered in this course and this is seriously what i do day to day as a machine learning engineer if i don't know what's going on i just go to google read watch a video write some code and then i build my own intuition for it but with that being said i also issued you the challenge of trying to run this training code for 100 epochs did you give that a go i hope you did and how low did your loss value did the weights and bias get anywhere close to where they should have been how do the predictions look now i'm going to save that for later on running this code for 100 epochs for now let's write some testing code and just a note you don't necessarily have to write the training and testing loop together you can functionalize them which we will be doing later on but for the sake of this intuition building and code practicing and first time where we're writing this code together i'm going to write them together so testing code we call model. 
what does that do? model.eval() turns off different settings in the model that aren't needed for evaluation or testing. this can be a little confusing to remember when you're writing testing code, so just make it a habit: if you're training your model, call model.train() to make sure it's in training mode; if you're testing or evaluating your model (that's what "eval" stands for, evaluate), call model.eval(). the settings it turns off are things like dropout and batch normalisation layers. we haven't seen what dropout is yet, but if you look in the torch.nn documentation you'll find dropout layers and batch norm layers, and i'm sure you'll come across them in your future machine learning endeavours; feel free to check out the documentation if you'd like to know what they are, but for now just take it from me that model.eval() turns off behaviour not needed for evaluation and testing. then we set up the with torch.inference_mode() context manager. what does this do? it turns off gradient tracking, plus a couple more things behind the scenes, again things not needed for testing. as we discussed, if parameters in our model have requires_grad=True, which is the default for many parameters in PyTorch, then PyTorch keeps track of their gradients behind the scenes and uses them in loss.backward() and optimizer.step() for backpropagation and gradient descent. but we only need backpropagation and gradient descent during training, because that's when our model is learning; when we're testing, we're just evaluating the patterns the model has learned on the training data, so we turn off the things we don't need. we still do the forward pass in testing mode. if you want to look up torch.inference_mode, there's a great thread PyTorch posted about it ("want to make your inference code in PyTorch run faster? here's a quick thread on doing exactly that"); a lot of optimisation happens behind the scenes, which is part of why we're using PyTorch, so our code runs nice and fast. you may also see with torch.no_grad() in older PyTorch code; it does a similar thing and is still valid, but inference mode is the faster, preferred way of doing things according to that thread and the blog post attached to it. so, inside inference mode we do the forward pass, creating test predictions by running model_0 on the test data, and then step two is to calculate the test loss.
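as a small aside, here's a sketch of the two context managers side by side; both skip gradient tracking for the forward pass, and the variable names just follow the ones used above:

```python
model_0.eval()                        # evaluation mode: turns off training-only behaviour

with torch.inference_mode():          # preferred: newer and faster
    test_pred = model_0(X_test)

with torch.no_grad():                 # older style you may still see in existing code
    test_pred_old_style = model_0(X_test)

model_0.train()                       # switch back to training mode before training again
```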
the test loss is our loss function applied to the test predictions and the test labels, and that's an important point: for testing we work with the test data, and for training we work with the training data. the model learns patterns on the training data and then evaluates those learned parameters on the testing data, data it has never seen before. it's just like a university course: you study the course materials (the training data) and you get evaluated on material you've hopefully never seen before, unless you were friends with your professor and they gave you the exam beforehand, which would be cheating. so that's a very important point about the test data set: don't let your model see it before you evaluate on it, otherwise you'll get misleading results. then we print out what's happening, starting with the epoch (later i'll introduce you to my little jingle to remember all of these steps, because there is a lot going on, but with practice we'll know this like the back of our hand). oh, and a quick fix: this value here is the training loss, not the test loss, so let's label it correctly. we also don't need to print every epoch; we can print every tenth epoch with if epoch % 10 == 0, rather than cluttering everything up. so let's step through what's happening. we're about to run 100 epochs, and our model has already trained for about 10, so let's get rid of that head start and create a new instance of our model, so it's back to randomly initialised parameters (flavoured, of course, with a random seed of 42). we've got our training code, which we've already discussed, and now our testing code: we call model.eval(), which turns off settings in the model not needed for evaluation or testing; we open the with torch.inference_mode() context manager, which turns off gradient tracking and a couple more things behind the scenes to make our code faster; we do the forward pass, passing the model the test features to calculate the test predictions; we calculate the test loss, using the same loss function we used for the training data (it's called the test loss because it's calculated on the test data set); and then we print out what's happening, because we want to see how our model is going while it trains. we don't have to do this, but the beauty of PyTorch is that you can use plain python print statements to see what's happening with your model, and you can customise them as much as you like; if you had other metrics, such as model accuracy (hint: we might see that later on), you could print those too. so, are you ready to run 100 epochs? how low do you think our loss can go? this loss value was after about 10 epochs, so let's keep it here for comparison. ready? three, two, one, let's run.
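here's the combined loop from this section written out in one place, a sketch assuming model_0 (freshly re-created), loss_fn, optimizer and the data splits exist as earlier:

```python
torch.manual_seed(42)
epochs = 100

for epoch in range(epochs):
    ### training
    model_0.train()
    y_pred = model_0(X_train)                 # 1. forward pass
    loss = loss_fn(y_pred, y_train)           # 2. calculate the loss
    optimizer.zero_grad()                     # 3. zero the optimizer's gradients
    loss.backward()                           # 4. backpropagation
    optimizer.step()                          # 5. step the optimizer (gradient descent)

    ### testing
    model_0.eval()                            # settings not needed for evaluation: off
    with torch.inference_mode():              # gradient tracking: off
        test_pred = model_0(X_test)           # forward pass on the test data
        test_loss = loss_fn(test_pred, y_test)

    if epoch % 10 == 0:                       # print a status update every 10th epoch
        print(f"epoch: {epoch} | loss: {loss:.4f} | test loss: {test_loss:.4f}")
```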
so at epoch zero we started with a loss of about 0.312, and look at it go down, yes, that's what we want. and are our weight and bias moving towards the ideal values of 0.7 and 0.3? yes, they're moving in the right direction. the loss keeps going down at epoch 20, wonderful, epoch 30, even better, 40, 50, down, down, down, exactly what we want, and remember we're predicting a straight line here. look how low the loss gets after 100 epochs, roughly three times lower than what we had before, and the parameter values, around 0.5629 and 0.3573, are getting quite close to where they should be. let's make some predictions. y_preds was the original set of predictions made with random values, and y_preds_new, after a hundred epochs, is far closer. hmm, do we print out the test loss? oh no, we're printing the training loss twice; there's a stray print statement, and our output would have been a little cleaner without it, but that's all right, life goes on. notice that our test loss (because these are test predictions we're making) is not as low as our training loss. i wonder how we could get that lower. what do you think we could do? we just trained for longer and look what happened. how do you think you could get these red dots to line up with the green dots? that's my challenge to you for the next video: think of something you could do, maybe train for longer, and give it a shot. we'll also review what our testing code is doing. i'll see you there.

welcome back. in the last video we trained our model for 100 epochs, and look how good the predictions got. i finished by challenging you to see if you could align the red dots with the green dots, and it's okay if you're not sure of the best way to do that; that's what we're here for, to learn the best ways to do these things together. but you might have had the idea of training the model for a little bit longer. how could we do that? well, we can just rerun the training cell: the model keeps the parameters it finished with, so rerunning starts from where it left off, which is already pretty good for our data set, and tries to improve them even more. i can't stress this enough: what we're doing here is going to look very similar throughout the rest of the course, because training a model and evaluating it are the fundamental steps of deep learning with PyTorch, even if for now we're just trying to line some red dots up with some green dots. so let's try to line them up. i reckon if we train for another 100 epochs we should get pretty darn close. ready, three, two, one, run the cell again (it runs really quickly because our data is nice and simple), and look at this: the training loss starts this run at about 0.0244 and gets down to roughly 0.008, so we've improved by another 3x or so, and the test loss has gone from about 0.0564 down to around 0.005, almost a 10x improvement. and so, as we make some more predictions, what are our model parameters now?
remember the ideal ones: we won't necessarily know them in practice, but because we're working with a simple data set, we know what the ideal parameters are. so, model_0.state_dict(): these are what the weights previously were, and what are they going to change to? oh, would you look at that, 0.6990. again, if yours are very slightly different to mine, don't worry too much; that's the inherent randomness of machine learning and deep learning, and even though we set a manual seed, the numbers may differ slightly. the direction is more important than the exact value, so if your weight isn't exactly what mine is, it should still be quite close to 0.7, and the same goes for the bias and for all of the loss values. so we're pretty darn close. how do the predictions look? remember, these started as random predictions, and now that we've trained a model they're so, so close to lining up exactly. a little bit off, and we could tweak a few things to improve it, but i think that's well and truly enough for this example. of course, we could have just created a model and set the parameters ourselves manually, but where would be the fun in that? we wrote some machine learning code to do it for us, through the power of backpropagation and gradient descent. now, in the last video we wrote the testing loop, and we discussed a few of the steps, but let's go over it with a colourful slide, because code on a page is nice, but colours are even nicer. step one: create empty lists for storing useful values, which is helpful for tracking model progress. oh, we haven't done this yet, so let's set it up now: epoch_count, loss_values and test_loss_values. why do we keep track of these? because if we want to monitor our model's progress, which is called tracking experiments, we track different values so we can compare future experiments to past experiments, like the brilliant scientists that we are. if we wanted to improve on our current results with another model, we might train it with a different setup, maybe a different learning rate or a whole bunch of different settings, and tracking the values lets us compare. and where do we use these lists? well, we calculate the training loss here and the test loss here, so each time we do a status update we append the current epoch to epoch_count, the current loss to loss_values and the current test loss to test_loss_values. then let's instantiate our model again so it starts from fresh; we're just creating another instance to re-initialise the parameters (if we wanted to, we could functionalise all of this so we don't have to scroll back to the top of the notebook, but for demo purposes we're doing it this way). and i'm going to run it for, let's say, 200 epochs, because that's effectively what we ended up doing before (100 epochs, twice), and i want to show you something beautiful, one of the most beautiful sights in machine learning.
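here's a sketch of that loop with the tracking lists added, assuming LinearRegressionModel, loss_fn and the data splits are defined as earlier; the learning rate shown is an assumption based on the optimizer set up earlier in the notebook:

```python
torch.manual_seed(42)
model_0 = LinearRegressionModel()                                  # fresh, randomly initialised model
optimizer = torch.optim.SGD(params=model_0.parameters(), lr=0.01)  # re-create the optimizer for the new parameters

epochs = 200
epoch_count, loss_values, test_loss_values = [], [], []            # empty lists for storing useful values

for epoch in range(epochs):
    model_0.train()
    y_pred = model_0(X_train)
    loss = loss_fn(y_pred, y_train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    model_0.eval()
    with torch.inference_mode():
        test_pred = model_0(X_test)
        test_loss = loss_fn(test_pred, y_test)

    if epoch % 10 == 0:
        epoch_count.append(epoch)
        loss_values.append(loss.detach())          # detach so the tensors can be converted for plotting later
        test_loss_values.append(test_loss.detach())
        print(f"epoch: {epoch} | loss: {loss:.4f} | test loss: {test_loss:.4f}")
```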
so we run it for 200 epochs, and we start with a fairly high training loss value and a fairly high test loss value. remember what our loss value is: it's MAE, the mean absolute error. so a test loss of 0.481 means that, on average, each of the red prediction dots is 0.481 away from where it should be, and that average distance is exactly what we're trying to minimise. we get it right down to about 0.05, and if we make predictions we get very close to the ideal weight and bias; there's only a very small distance between the red and green dots now. ideally they'd line up exactly, but we've got as close as we can for now. and now for one of the most beautiful sights in machine learning: let's plot the loss curves. we were tracking epoch_count, loss_values and test_loss_values, so let's have a look at them; as the epoch count goes up, the loss values should ideally go down. we'll plot the training loss values with the label "train loss", plot the test loss values with the label "test loss" (because the data explorer's motto is visualise, visualise, visualise), and title the figure "training and test loss curves". "loss curve" is another term you'll come across often, and you can probably guess what an ideal one looks like: starting from the beginning of training, we want the curve, our loss value, to go down. now, the first attempt at plotting errors out. why? ah, because the values are all tensors; matplotlib works with NumPy, so we need to convert them first (we're figuring this out on the fly together). something like torch.tensor(loss_values).cpu().numpy() does the trick; we don't strictly need the .cpu() call yet, because we're not working on the GPU, but we might need it later on. will this work? beautiful, there we go: a declining loss curve, one of the most beautiful sights in machine learning. this is how we can keep track of our experiments, or at least one way, quite a rudimentary one; we'd like to automate it later on, but i'm just showing you one way to keep track of what's happening.
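here's a sketch of that plot, assuming the tracking lists from the loop above and that matplotlib.pyplot is imported as plt (the .cpu() calls only matter once the values live on a GPU):

```python
plt.plot(epoch_count, torch.tensor(loss_values).cpu().numpy(), label="train loss")
plt.plot(epoch_count, torch.tensor(test_loss_values).cpu().numpy(), label="test loss")
plt.title("training and test loss curves")
plt.ylabel("loss")
plt.xlabel("epochs")
plt.legend()
plt.show()
```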
so the training loss curve is going down: it starts at about 0.3 and drops right down, and the beautiful thing is that the two curves match up. if there was too big a gap between the train loss and the test loss, we'd be running into some problems, but if they match up closely at some point, that means our model is converging and the loss is getting as close to zero as it possibly can. if we trained for longer, maybe the loss would go almost all the way to zero, but that's an experiment i'll leave for you to try. let's step back through our testing loop to finish off this video. one: we created empty lists for storing useful values. two: we told the model that we want to evaluate, by putting it in evaluation mode with model.eval(); this turns off functionality used for training but not evaluation, such as dropout and batch normalisation layers (if you want to learn more about those, look them up in the documentation). three: we turned on torch.inference_mode(); this disables functionality such as gradient tracking, which is needed only for training, not for inference, so our code runs faster. we don't strictly need it, but it's good practice. four: we passed the test data through the model; this calls the model's implemented forward method, and the forward pass is exactly the same as in the training loop, except it's on the test data. big note there: training loop, training data; testing loop, testing data. five: we calculated the test loss value, how wrong the model's predictions are on the test data set, and of course lower is better. and finally, we printed out what's happening, so we can keep track of things during training; you don't have to, and you can customise the print statement to show almost whatever you want, because PyTorch works beautifully with pure python. we also kept track of the epoch, train loss and test loss values in our lists (we could track other values too, but for now we're just recording the loss at particular epochs for the training and test sets). of course, all of this could be put into a function so we don't have to remember the steps off by heart, but the reason we've spent so much time on it is that we're going to use this training and testing functionality for every model we build throughout this course. so give yourself a pat on the back for getting through all of these videos; we've written a lot of code and discussed a lot of steps. and if you'd like a song to remember what's happening, let's finish this video off with my unofficial PyTorch optimisation loop song: for an epoch in a range, call model dot train, do the forward pass, calculate the loss, optimizer zero grad, loss backward, optimizer step, step, step (note: you only have to call it once). but now let's test: call model dot eval, with torch inference mode, do the forward pass, calculate the loss. and the real song then goes around for another epoch, because you keep going back through, but we finish off with: print out what's happening, and then of course we evaluate what's going on. with that being said, it's time to move on, but if you'd like to review, please, please, please try to run this code
yourself again, and also check out the slides and the extra curriculum. oh, by the way, if you want a link to all of the extra curriculum, just go to the book version of the course; everything i link as extra curriculum is in the extra-curriculum section of each chapter, ready to go. i'll see you in the next video.

welcome back. in the last video we saw how to train our model and evaluate it, not only by looking at the loss metrics and the loss curves, but also by plotting our predictions and comparing them. have a look at these random predictions, quite terrible, but then we trained a model using the power of backpropagation and gradient descent, and now our predictions are almost exactly where we want them to be. so you might be thinking: we've trained this model, and it took a while to write all this code to get good predictions, so how might we run that model again? well, i took a little break after the last video, and when i came back my Google Colab notebook had disconnected. what does that mean? if i run a cell now, will it work? i'm going to connect to a new Colab instance, but will we still have everything we ran above? you might have already experienced this if you took a break before coming back to the videos. ah, plot_predictions is no longer defined, and do you know what that means? our model is also no longer defined; we've lost our model and all of that training effort. luckily we didn't train for too long, so we can just go runtime, run all, and it will re-run all of the previous cells, quite quickly, because we're working with a small data set and a small model. oh, what have we got wrong here? model_0.state_dict(), that's all right, it's good to find errors; you can also use run after to run all the cells after the current one. and as we come back down, there's our model training again, we're getting very similar values to before, there are the loss curves, still looking okay, and our predictions are back because we've re-run all the cells and recreated our model. so what we'll cover in this video is saving a model in PyTorch, because if you train a model to a certain point, especially a larger model, you'll probably want to save it and reuse it, whether in this notebook, or to send to a friend so they can try it out, or to load back in a week's time after Colab has disconnected. so let's see how we can save our models in PyTorch. there are three main methods you should know about for saving and loading models in PyTorch, because of course with saving comes loading, and we'll cover both over the next two videos. number one is torch.save, which, as you might guess, allows you to save a PyTorch object in python's pickle format. you may or may not be aware of python's pickle module: it implements binary protocols for serialising and deserialising a python object, where serialising, as i understand it, means saving and deserialising means loading. that's what PyTorch uses behind the scenes, and it comes straight from pure python.
number two is torch.load, and you can probably guess what that does: it allows you to load a saved PyTorch object. and number three, also very important, is torch.nn.Module.load_state_dict. what does this allow you to do? it allows you to load a model's saved state dictionary. and what is the model's state dict? let's have a look: model_0.state_dict(). the beauty of PyTorch is that it stores a lot of your model's important parameters in a simple python dictionary. now, it might not always look that simple; our model only has two parameters, and in the future you may be working with models that have millions of parameters, so looking directly at the state dict may not be as easy as it is here, but the principle is the same: it's a dictionary that holds the state of your model. i want to show you where these three methods come from, because this is going to be your extra curriculum for this video: the "saving and loading models" page in the PyTorch documentation, a very, very important piece of documentation (or maybe even a tutorial). it covers torch.save, torch.load and torch.nn.Module.load_state_dict, the three things we've just written down, plus a fair few other pieces of information. so what is a state_dict? in PyTorch, the learnable parameters (i.e. the weights and biases) of a torch.nn.Module, which is what our model is (remember, it subclasses nn.Module), are contained in the model's parameters, accessed with model.parameters(); a state_dict is simply a python dictionary object that maps each layer to its parameter tensors. that's exactly what we've seen. the optimizer also has a state dict, which is something to be aware of: you can call optimizer.state_dict() and get a similar kind of output. then there's a section on saving and loading the model for inference (inference, again, means making a prediction). so far we've made predictions right within our notebook, but if we wanted to use our model outside of it, say in an application or in another notebook, we need to know how to save and load it. the recommended way of saving and loading a PyTorch model is by saving its state_dict. there is another method further down, saving and loading the entire model, and that's part of your extra curriculum: read all of the sections on that page and figure out the pros and cons of saving and loading the entire model versus saving and loading just the state_dict. so that's a challenge for you for this video; i'm going to link the page here ("PyTorch save and load code tutorial plus extra curriculum"), and now let's write some code to save our model. so, saving our PyTorch model: what do you think torch.save takes as its parameters? let's find out together. first we're going to import pathlib, python's module for dealing with file paths; we'll see why in a second.
we could save something to Google Colab's files section over here, but just remember: if we save it from within Google Colab, the file will disappear if our Colab instance disconnects. i'll show you how to download it from Colab if you want, and Colab also has a way to save straight to Google Drive, but i'll leave you to look into that on your own if you like. so first we're going to create a models directory, a folder over here called models. we could of course create it by hand by adding a new folder, but i like to do it with code. MODEL_PATH is going to be Path("models"), using the Path class from pathlib; we'll simply save all of our models into the models folder. then we make the directory with MODEL_PATH.mkdir(parents=True, exist_ok=True); setting parents and exist_ok to true means that if the directory already exists it won't throw an error, it'll just leave it there rather than erroring out on us. next we create a model save path, so we can give our model a name; right now it's just model zero, but we want to save it under some name into the models directory. so MODEL_NAME = "01_pytorch_workflow_model_0.pth"; i'm calling it 01 for this section of the course, so that when we have more models later on we know which ones came from where (you might create your own naming convention). and this is another important point: PyTorch objects usually have the extension .pt or .pth; if we look at the documentation, a common convention is to save models using either a .pt or .pth file extension. i'll let you choose which one you like (i like .pth); they both result in the same thing, you just have to remember to use the same extension in your saving path and your loading path. now we create MODEL_SAVE_PATH as MODEL_PATH / MODEL_NAME (because we're using pathlib, we can use this slash syntax), and if we print out MODEL_SAVE_PATH, we get a PosixPath of models/01_pytorch_workflow_model_0.pth. we haven't saved our model there yet; it's just the path we want to save our model to, and if we refresh the files pane, we've got a models folder with nothing in it. so now comes the step to save the model, and specifically the model's state_dict. why the state_dict? because that's the recommended way of doing things; if we scroll back up, the documentation says to save and load the state_dict for inference (recommended). we could also save the entire model, but that's part of your extra curriculum to look into. the syntax is torch.save, which we pass an object and a path to save it to; we already have a path, and the good thing is we already have a model, so we just have to call it. i also like to print an f-string, "saving model to: {MODEL_SAVE_PATH}", so we know what's going on.
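gathered together, the saving steps described above look something like this (a sketch; the directory and file names are just the ones chosen in this section):

```python
from pathlib import Path

# 1. create a models directory
MODEL_PATH = Path("models")
MODEL_PATH.mkdir(parents=True, exist_ok=True)

# 2. create a model save path
MODEL_NAME = "01_pytorch_workflow_model_0.pth"
MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME

# 3. save the model's state_dict (the recommended way)
print(f"saving model to: {MODEL_SAVE_PATH}")
torch.save(obj=model_0.state_dict(), f=MODEL_SAVE_PATH)
```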
now, torch.save: if we look it up, it takes an object, obj, and f. what's f? a file-like object, or a string or os.PathLike object containing a file name. beautiful, that's exactly what we've got: a path-like object. so back in our code, the object is model_0.state_dict(), because that's what we're saving, and the file path is MODEL_SAVE_PATH. ready? let's run this and see what happens. beautiful: "saving model to: models/01_pytorch_workflow_model_0.pth", and if we refresh the files pane, there it is; we've saved our trained model. that means we could, if we wanted to, download this file from Google Colab to our local machine, and there's also a guide on saving from Google Colab to Google Drive so you can use the model later on. there are many different ways; the beauty of PyTorch is its flexibility. let's just check by running ls on the models directory: wonderful, there's our 01_pytorch_workflow_model_0.pth. so now we've got a saved model; how about we try loading it back in and seeing how it works? if you want a challenge, read ahead in the documentation and try to use torch.load to bring our model back in, and see what happens. i'll see you in the next video.

welcome back. in the last video we wrote some code to save our PyTorch model (i'm just going to close a couple of things we don't need, to clear up the screen), and now we've got our .pth file, because remember, .pth or .pt is a common convention for saving a PyTorch model. we didn't strictly have to write all of that path-style code, but it's handy for later on if we wanted to functionalise it, say in a save.py file over here, so we could just call a save function, pass it a directory and a file name, and have it save exactly how we want. now that we've got a saved model, i issued the challenge of trying to load it back in. did you try torch.load? there are a few options in the documentation, but we're using the first one. if we check the docs for torch.load: it loads an object saved with torch.save from a file; torch.load uses python's unpickling facilities, but treats storages, which underlie tensors, specially: they are first deserialised on the CPU and are then moved to the device they were saved from. wonderful; that "moved to the device" part is just something to keep in mind for later, when we start using a GPU as well as the CPU. for now, let's practice using the torch.load method and see how it works. so, loading a PyTorch model: since we saved our model's state_dict, just the dictionary of parameters, rather than the entire model, we'll create a new instance of our model class and load the saved state_dict into it. but that's just words on a page, so let's see it in action.
to load in a state_dict, which is what we saved (we didn't save the entire model itself; that's the other option, and it's extra curriculum), we have to instantiate a new instance of our LinearRegressionModel class. if we remind ourselves what model_0.state_dict() looks like, that's exactly what we saved, so to load it in, we create a new instance and call it loaded_model_0. i like that name because it'll end up with the same parameters as model_0, but this way we know that this instance is the loaded version, not the one we've been training. so we create loaded_model_0 = LinearRegressionModel(), just the class we wrote above, and then we'll load the saved state_dict of model_0 into it, which will update the new instance with the trained parameters. but first, before we write any loading code, let's check: what does loaded_model_0.state_dict() look like right now? see how it's initialised with random parameters. essentially, all we're doing when we load a state dictionary into a new instance of our model is saying: hey, take the saved state_dict from that model and plug it into this one. so, loaded_model_0: remember how i said there's a method to be aware of, torch.nn.Module.load_state_dict? because our model is a subclass of torch.nn.Module, we can call load_state_dict on our instance directly. and this is where the torch.load method comes in: we pass it MODEL_SAVE_PATH, because torch.load takes f, a file-like object or a string or os.PathLike object, which is exactly why we created that path object earlier. so all we're doing is creating a new instance of LinearRegressionModel (a subclass of nn.Module) and calling load_state_dict(torch.load(f=MODEL_SAVE_PATH)) on it, because what's saved at the model save path is our previous model's state_dict. if we run this, what happens? "all keys matched successfully", that is beautiful. and if we check the state_dict of the loaded version, we now have exactly the same values as above. but there's an even better way to test this: how about we make some predictions with our loaded model, just to make sure? we put it in eval mode, because when you make predictions you want evaluation mode (so it goes a little bit faster), and we use inference mode as well: with torch.inference_mode(), loaded_model_preds is the loaded model's forward pass on the X_test data. we can have a look at loaded_model_preds, wonderful, and then, to see if the two models really are the same, we can compare the loaded model's predictions with the original model's predictions; they should be equivalent.
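for reference, here's a sketch of those loading steps together, assuming LinearRegressionModel, MODEL_SAVE_PATH and X_test are defined as above:

```python
# create a fresh instance of the model class (it starts with random parameters)...
loaded_model_0 = LinearRegressionModel()

# ...then load the saved state_dict into it (this updates the instance in place)
loaded_model_0.load_state_dict(torch.load(f=MODEL_SAVE_PATH))

# make some predictions with the loaded model
loaded_model_0.eval()
with torch.inference_mode():
    loaded_model_preds = loaded_model_0(X_test)
```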
so, y_preds == loaded_model_preds... false, false, false? what's going on here? how different are they? ah, where has that come from? have we actually made fresh predictions with model_0 yet? this is troubleshooting on the fly, team. so let's make some: model_0.eval(), and then with torch.inference_mode(), y_preds = model_0(X_test); i have a feeling the old y_preds were saved from somewhere else, so we'll recompute them and move that cell above this one. now, do we get the same output? yes, we should, and... wonderful, beautiful. so we've now covered saving and loading models, or specifically saving and loading the model's state_dict: we saved it with torch.save, we loaded it back in with load_state_dict plus torch.load, and then we checked it by testing the equivalence of the predictions from the original model we trained, model_0, and the loaded version of it. so that's saving and loading a model in PyTorch. there are a few more things we could cover, but i'm going to leave those as extra curriculum; we've covered the two or three main methods, and if you'd like to read more, i'd highly encourage you to go through that tutorial. with that said, we've covered a fair bit of ground over the last few videos, so how about we do a few videos where we put everything together, just to reiterate what we've done? i think that'll be good practice. i'll see you in the next video.

welcome back. over the past few videos we've covered a whole bunch of ground in a PyTorch workflow: starting with data, splitting it, then building a model, the model-building essentials, checking the contents of our model, making predictions with a very poor model (because it was based on random numbers), spending a whole bunch of time figuring out how to train a model, learning what a loss function is, seeing an optimizer, writing a training and test loop, and then learning how to save and load a model in PyTorch. now i'd like to spend the next few videos putting all of this together. we won't spend as much time on each step; this is just practice, so we can reiterate all the things we've done. so: putting it all together, let's go back through the steps above and see it all in one place. we'll start with 6.1, data, but we'll do one step before that (and i'm going to clear a bit of space here so we have more room). looking at the workflow: we got our data ready and turned it into tensors way back at the start, then we built a model, picked a loss function and an optimizer, built a training loop, trained our model, made some predictions, saw that they got better, and evaluated our model; we didn't use torchmetrics, but we got visual and saw our red dots start to line up with the green dots. we haven't really improved through experimentation, though you could argue we did a little, since we saw that training our model for more epochs gave better results, and there are other ways to experiment that we'll cover throughout the course. and then we saw how to save and reload a trained model. so we've been through this entire workflow, which is quite exciting. now let's go back through it, a bit quicker than before, because i believe you've now got the skills to do so.
so let's start by importing PyTorch and matplotlib (you could start the code from here if you wanted to). and actually, if you want, you can pause this video and try to re-code all of the steps we've done yourself, putting in some headers like data, build a model, train a model, save and load a model; if not, feel free to follow along with me and we'll do it together. so: import torch, and from torch import nn (it would help if i could spell torch), because we've seen that we use nn quite a bit; and we're also going to import matplotlib.pyplot as plt, because we like to make plots, because we like to visualise, visualise, visualise. we'll also check our PyTorch version; if you're on an older version, some of the code might not work, and if you're on a newer version it should work (if it doesn't, let me know). there we go, 1.10; i'm using 1.10 for this, and by the time you watch there may be a later version out. next, let's create some device-agnostic code, because i think we're up to that step now. this means that if we've got access to a GPU, our code will use it, for potentially faster computing, and if no GPU is available, the code will default to using the CPU. we don't necessarily need a GPU for the particular problem we're working on right now, because it's a small model and a small data set, but it's good practice to write device-agnostic code. so: set up device-agnostic code, device is "cuda" if torch.cuda.is_available() else "cpu" (remember, CUDA is NVIDIA's programming framework for their GPUs), and we'll print out what device we're using. if we ran this now it would just be the CPU; yours might be different if you've already enabled a GPU. let's change it over to use CUDA: in Google Colab we can change the runtime type and select GPU, then save, but what happens is it restarts the runtime, so we lose all of the state from the code we've written above. how do we get it back? we go runtime, run all; it runs all the cells above, they should all work, and it'll be quite quick because our model and data aren't too big. and if it all worked, we should now have cuda as the device we can use. wonderful. the beauty of Google Colab is that it gives us access to an NVIDIA GPU (thank you, Google Colab; i'm paying for the paid version, but you don't have to, the free version should also give you access to a GPU, albeit possibly an older one than the pro versions provide), and that will be more than enough for what we're about to recreate.
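for reference, here's that setup in one place (the version number printed will depend on your install):

```python
import torch
from torch import nn               # nn contains PyTorch's building blocks for neural networks
import matplotlib.pyplot as plt

print(torch.__version__)

# set up device-agnostic code: use the GPU if one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"using device: {device}")
```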
so i feel like that's enough for this video: we've got some device-agnostic code ready to go, and over the next few videos we're going to rebuild everything we did before, except using device-agnostic code. give it a shot yourself; there's nothing in here that we haven't covered before, and i'll see you in the next video, where we'll create some data.

welcome back. in the last video we set up some device-agnostic code and got ready to start putting everything we've learned together, so let's continue with that and recreate some data. we could just copy the earlier code, but we're going to write it out together so we get some practice creating a dummy data set, and by the end of this video we want some data we can plot, so we can once again build a model that learns on the blue dots to predict the green dots. so, under a data heading (i'll close a few panes so we have a bit more room), let's create some data using the linear regression formula y = weight * x + bias. you may have seen this as y = mx + c, or mx + b, or bx + a; when i learned it in high school it was y = mx + c, and yours might be slightly different, but they're all describing the same thing. let's see it in code rather than as a formula. we create a weight of 0.7 and a bias of 0.3, the same values we used previously; for a challenge you could change these, maybe to 0.1 and 0.2, they could be whatever you like, because the principle stays the same: we're going to try to build a model that estimates these values. then we set start = 0 and end = 1, so we can create a straight line, and we fill in the values between 0 and 1 with a step of 0.02. now we'll create X and y, our features and labels: X is our features and y is our labels. X = torch.arange(start, end, step), and X is a capital because it's typically a feature matrix, even though ours is just a vector. we're also going to unsqueeze it so we don't run into dimensionality issues later on (you can check this for yourself: without unsqueeze, errors will pop up). and y = weight * X + bias. you can see we're going a little faster now; this is the pace we'll use for things we've already covered, and we'll slow down when something is new. so we've got some x values that correlate to some y values, and we're going to use the training values of x to predict the training values of y, and likewise for the test values. speaking of which, let's split the data. train_split = int(0.8 * len(X)): we'll use 80% of our samples for training, which is a typical training-and-test split (80/20, thereabouts; you could use 70/30 or 90/10, and a lot of things in machine learning are quite flexible like this). then we index on our data to create the training split, and do the opposite indexing for the testing data. if we check the lengths, and my calculations are correct, we should have 40 training samples and 10 testing samples. again, this will change when you work with larger data sets; you might have a hundred thousand training samples and twenty thousand testing samples, but the ratio will often be similar. and then let's plot what's going on.
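here's a sketch of that data creation and split in one place:

```python
# create the data using the linear regression formula y = weight * x + bias
weight = 0.7
bias = 0.3

start, end, step = 0, 1, 0.02
X = torch.arange(start, end, step).unsqueeze(dim=1)  # unsqueeze to add a feature dimension
y = weight * X + bias

# split the data: 80% for training, 20% for testing
train_split = int(0.8 * len(X))
X_train, y_train = X[:train_split], y[:train_split]
X_test, y_test = X[train_split:], y[train_split:]

print(len(X_train), len(y_train), len(X_test), len(y_test))  # 40 40 10 10
```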
to plot the data we can call our plot_predictions function, passing in X_train, y_train, X_test and y_test, and note: if you don't have the plot_predictions function loaded, this will error. and there's our plot, wonderful: we've just recreated the data we've been using, blue dots to predict green dots. if the function errors out because you started the notebook from this cell and worked down from here, just remember to scroll up, copy that function definition into a cell here and run it; we don't have to, because we've run all the cells, but if you haven't, do that and you'll get the same outcome. so what's next? if we go back to our workflow, we've just created some data, and it's already tensors because we created it with PyTorch, so now we're up to building or picking a model. we built a model previously, back in the build-model section, so you could refer to that code and try to build a model to fit the data here; that's your challenge before the next video, where we'll build a PyTorch linear model. and why do we call it linear? because linear refers to a straight line (and what's non-linear? non-straight). so i'll see you in the next video; give it a shot before we get there.

welcome back. we're going through the steps to recreate everything we've done: in the last video we created some dummy data, a straight line, and now, by the workflow, we're up to building (or picking) a model; in our case we'll build one to suit our problem. so we've got some linear data, and i've titled this section building a PyTorch linear model. i issued you the challenge of giving it a go, and you could do exactly the same steps we did in the build-model section, but i'm going to be a little bit cheeky and introduce something new here, and that is the power of torch.nn. what we're going to do is create a linear model by subclassing nn.Module, because, as we've seen, a lot of PyTorch models subclass nn.Module. so: class LinearRegressionModelV2 (how about that for a name), subclassing nn.Module, with code very similar to what we wrote when we first created our LinearRegressionModel. we put in the standard constructor, def __init__(self), and call super().__init__(). but instead of initialising the parameters ourselves, as we did back in the build-model section, we're going to do what i've been hinting at in past videos: often you won't initialise the parameters yourself, you'll instead initialise layers that have those parameters built in. we still have to create a forward method, but what we're going to see is how we can use a torch linear layer to do these steps for us. so let's write the code and then step through it. we're going to use nn.Linear, because we're building a linear regression model, our data is linear, and our previous model implemented the linear regression formula by hand. so, for creating the model parameters, we set up a variable that this class can use.
self.linear_layer = nn.Linear(in_features=1, out_features=1); remember, nn in PyTorch stands for neural network. in_features and out_features are the two parameters here, and this means we want to take an input of size one and produce an output of size one. where does that come from? well, if we have a look at X_train and y_train, one value of x corresponds to one value of y, so within this linear layer we want to take in one feature, x, and output one feature, y, and we're only using one layer. the input and output shapes of your model, what data goes in and what data comes out, will be highly dependent on the data you're working with, and we'll see lots of different examples of input and output features throughout this course, but that's what's happening here: one in feature, one out feature. now, what's happening inside nn.Linear? let's look at the documentation: it applies a linear transformation to the incoming data, y = x * A^T + b. where have we seen this before? it's the same formula as our data, weight * x + bias, and if you look up the linear regression formula again, it's that same formula; the letters can be whatever you like, but this linear layer is implementing the linear regression formula we wrote by hand in our previous model, and behind the scenes the layer creates the weight and bias parameters for us. that's a big piece of the PyTorch puzzle: as i've said, you won't always initialise your model's parameters yourself, you'll generally initialise layers and then use those layers in some forward computation. so, we've got a linear layer that takes in_features=1 and out_features=1; what do we do now? because we've subclassed nn.Module, we need to override the forward method, to tell our model what it should do as the forward computation. it takes self, plus x, which is the conventional name for the input data, and we simply return self.linear_layer(x). we can also use python's type hints here to say that x should be a torch.Tensor and that the method returns a torch.Tensor. so we pass x through the linear layer, which expects one in feature and produces one out feature, and which performs a linear transform. that's another name for it; PyTorch and machine learning in general have many different names for the same thing: i'd call this a linear layer, but it's also called a linear transform, probing layer, fully connected layer, or dense layer in TensorFlow, and they're all implementing a version of linear regression, y = x * A^T + b, with in features and out features. wonderful. so let's see this in action: we'll set torch.manual_seed so we get reproducibility, and then create model_1 = LinearRegressionModelV2(); this is model one, because we've already got model zero.
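here's the finished class and model creation as a sketch (the video briefly hits a constructor typo next, then ends up with exactly this shape):

```python
class LinearRegressionModelV2(nn.Module):
    def __init__(self):
        super().__init__()
        # use nn.Linear() to create the weight and bias parameters for us
        self.linear_layer = nn.Linear(in_features=1, out_features=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # the forward computation: pass the data through the linear layer
        return self.linear_layer(x)

torch.manual_seed(42)
model_1 = LinearRegressionModelV2()
print(model_1.state_dict())  # contains linear_layer.weight and linear_layer.bias
```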
now let's check model_1 and its state dictionary with model_1.state_dict(). hmm, what do we have inside? an empty OrderedDict; has it not created anything for us? ideally this should be outputting a weight and a bias. let's dig through our code line by line and see what we've got wrong. ah, did you notice? the __init__ function, the constructor, had the wrong number of underscores, so it was never actually constructing the linear layer. troubleshooting on the fly, team. there we go, beautiful: we have a linear layer, and inside it a weight and a bias have been created for us. so effectively we've replaced the code we wrote above in the build-model section, where we initialised a weight and a bias parameter ourselves, with a linear layer. you might be wondering why the values are slightly different even though we used the same manual seed; that comes down to how PyTorch creates its different layers behind the scenes, probably using a different form of randomness for different types of variables, so just keep that in mind. to see the conversion side by side: our original LinearRegressionModel initialised its model parameters manually and wrote the formula out by hand in the forward method; LinearRegressionModelV2 swaps that for a linear layer, and its forward method just passes the data through that layer, which performs a predefined computation. this style is how you're going to see the majority of PyTorch deep learning models created: using pre-existing layers from the torch.nn module. if we go back into torch.nn, there are a lot of different layers: convolutional layers, pooling layers, padding layers, normalisation, recurrent, transformer, linear (the one we're using), dropout, and so on. for all of the common layers in deep learning (because that's what neural networks are, layers of different mathematical transformations), PyTorch has a pre-built implementation. so that was a little sneaky trick to alter our model, but we've still got basically the same model as before. what's next? training it, so let's do that in the next video.

welcome back. in the last video we built a PyTorch linear model, nice and simple, using a single nn.Linear layer with one in feature and one out feature, and we overrode the forward method of nn.Module using the linear layer we created. so when we do the forward pass on our model, we put some data in and it goes through the linear layer, which behind the scenes performs the linear regression formula, y = x * A^T + b, or in our case a weight and a bias. it's now time to write some training code, but before we do, let's set the model to use the target device. in our case the device is cuda, but because we've written device-agnostic code, if we didn't have access to a cuda device (a GPU), our default device would be the CPU. so first, let's check which device the model is currently on.
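a quick sketch of checking and setting the device, assuming device was set with the device-agnostic code earlier:

```python
# check which device the model's parameters currently live on
print(next(model_1.parameters()).device)   # cpu (the default)

# send the model to the target device ("cuda" if a GPU is available, otherwise "cpu")
model_1.to(device)
print(next(model_1.parameters()).device)   # cuda:0 when a GPU is available
```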
first up here check the model's current device because we're going to use the gpu here or rather we're going to write device agnostic code that's the proper terminology device what device are we currently using this is the cpu right so by default the model will end up on the cpu but if we set it to model one dot to device what do you think it's going to do now if our current target device is cuda we've seen what to does in the fundamentals section to is going to send the model to the gpu memory so now let's check where the parameters of our model live device if we send them to the device previously it was the cpu it's going to take a little bit longer while the gpu gets fired up and pytorch goes hey i'm about to send you this model you ready for it boom there we go wonderful so now our model is on the device or the target device which is cuda and if cuda wasn't available the target device would be cpu so this would just come out exactly how we've got it here but with that being said now let's get on to some training code this is the fun part what do we have to do we've already seen this for training i'm just going to clear up our workspace a little bit here for training we need this is part of the pytorch workflow we need a loss function what does a loss function do measures how wrong our model is we need an optimizer we need a training loop and a testing loop and the optimizer what does that do well it optimizes the parameters of our model so in our case model one dot state dict what do we have so we have some parameters here within the linear layer we have a weight and we have a bias the optimizer is going to optimize these random parameters so that they hopefully reduce the loss function which remember measures how wrong our model is so in our case because we're working with a regression problem let's set up the loss function and by the way all of these steps are part of the workflow we've got data ready we've built or picked a model we're using a linear model now we're up to here 2.1 pick a loss function and an optimizer we're going to build a training loop in the same session because you know what we're getting pretty damn good at this loss function equals what well we're going to use l1 loss so let's set that up nn dot l1 loss which is the same as mae and if we wanted to set up our optimizer what optimizer could we use well pytorch offers a lot of optimizers in torch dot optim sgd that's stochastic gradient descent because remember gradient descent is the algorithm that optimizes our model parameters adam is another popular option for now we're going to stick with sgd lr which stands for learning rate in other words how big of a step will our optimizer change our parameters with every iteration a smaller learning rate such as 0.001 will be a small step and a large learning rate such as 0.1 will be a larger step too big of a step our model learns too much and it explodes too small of a step our model never learns anything but oh we actually have to pass params first i forgot about that i got ahead of myself with the learning rate params is the parameters we'd like our optimizer to optimize so in our case it's model one dot parameters because model one is our current target model beautiful so we've got a loss function and an optimizer now let's write a training loop so i'm going to set torch manual seed so we can try and get results that are as reproducible as possible
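here's a sketch of the device, loss function and optimizer setup just described, assuming model_1 from above; the learning rate value is an assumption

```python
# send the model to the target device (device-agnostic code)
device = "cuda" if torch.cuda.is_available() else "cpu"
model_1.to(device)

# loss function: L1Loss is the same as mean absolute error (MAE)
loss_fn = nn.L1Loss()

# optimizer: stochastic gradient descent over the model's parameters
optimizer = torch.optim.SGD(params=model_1.parameters(),
                            lr=0.01)  # learning rate value is an assumption
```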
remember if you get different numbers to what i'm getting don't worry too much if they're not exactly the same the direction is more important so that means if my loss value is getting smaller yours should be getting smaller too don't worry too much if your fourth decimal place isn't the same as what my values are so we have a training loop ready to be written here epochs how many should we do well we did 200 last time and that worked pretty well so let's do 200 again did you go through the extra curriculum yet did you watch the video for the unofficial pytorch optimization loop song yet this one here listen to the unofficial pytorch optimization loop song if not it's okay let's sing it together so for an epoch in range epochs we're going to go through the song in a second we're going to set the model to train in our case it's model one model one dot train now step number one is what do the forward pass this is where we calculate the predictions so we calculate the predictions by passing the training data through our model and in our case because the forward method in model one implements the linear layer this data is going to go through the linear layer which is torch dot nn dot linear and go through the linear regression formula and then we calculate the loss which is how wrong our model's predictions are so the loss value equals loss fn and here we're going to pass in y pred and y train then what do we do we zero the optimizer optimizer dot zero grad because by default the optimizer is going to accumulate gradients behind the scenes so every epoch we want to reduce those back to zero so it starts fresh we're going to perform back propagation here by calling loss dot backward if the forward pass goes forward through the network the backward pass goes backwards through the network calculating the gradients of the loss function with respect to each parameter in the model then optimizer dot step this next part is going to look at those gradients and go you know what which way should i adjust the parameters so because the optimizer is optimizing the model parameters it's going to look at the loss and go you know what i'm going to adjust the weight to be increased and i'm going to lower the bias and see if that reduces the loss and then we can do testing we can do both of these in the same hit now we are moving quite fast through this because we spent a whole bunch of time discussing what's going on here so for testing what do we do we set the model into evaluation mode that's going to turn off things like dropout and batch normalization layers we don't have any of that in our model for now but it's good practice to always call eval whenever you're doing testing and same with inference mode we don't need to track gradients and a whole bunch of other things pytorch does behind the scenes when we're testing or making predictions so we use the inference mode context manager this is where we're going to create test pred which is going to be our test predictions because here we're going to pass the test data features forward through our model and then we can calculate the test loss which is our loss function comparing the test pred to y test wonderful and then we can print out what's happening so what should we print out how about if epoch modulo 10 equals zero so every 10 epochs let's print something out print we'll do an f string here epoch is epoch and then we'll go loss which is the training loss can just be equal to the loss and then we'll go test loss is equal to test loss
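here's a sketch of the training and testing loop just described, assuming X_train, y_train, X_test and y_test tensors exist on the same device as the model and a seed value of 42

```python
torch.manual_seed(42)
epochs = 200

for epoch in range(epochs):
    ### Training
    model_1.train()
    y_pred = model_1(X_train)        # 1. forward pass
    loss = loss_fn(y_pred, y_train)  # 2. calculate the loss
    optimizer.zero_grad()            # 3. zero the accumulated gradients
    loss.backward()                  # 4. backpropagation
    optimizer.step()                 # 5. update the parameters

    ### Testing
    model_1.eval()
    with torch.inference_mode():
        test_pred = model_1(X_test)
        test_loss = loss_fn(test_pred, y_test)

    if epoch % 10 == 0:
        print(f"Epoch: {epoch} | Loss: {loss} | Test loss: {test_loss}")
```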
so do you think this will work it's okay if you're not sure but let's find out together hey oh we need a bracket there oh my goodness what's going on runtime error expected all tensors to be on the same device ah of course do you know what's happening here we found at least two devices cuda and cpu yes of course that's what's happened so what have we done up here we put our model on the gpu but what's going on here our data is our data on the gpu no it's not by default it's on the cpu so we haven't written device agnostic code for our data so let's write it here put data on the target device device agnostic code for data so remember one of the biggest issues with pytorch aside from shape errors is that you should have your data or all of the things that you're computing with on the same device so that's why if we set up device agnostic code for our model we have to do the same for our data so now let's put x train equals x train dot to device y train equals y train dot to device this is going to create device agnostic code in our case it's going to use cuda because we have access to a cuda device but if we don't this code will still work it will still default to cpu so this is good i like that we got that error because that's some of the things you're going to come across in practice right so now let's run this what's happening here hey look at that wonderful so our loss starts up here nice and high and then it starts to go right down here for the training data and then the same for the testing data beautiful right up here and then all the way down okay so this looks pretty good on the test data set so how can we check this how can we evaluate our model well one way is to check its state dict so what do we got here what are our weight and bias oh my gosh so close so we set weight and bias before to be 0.7 and 0.3 so this is what our model has estimated our parameters to be based on the training data 0.6968 that's pretty close to 0.7 nearly perfect and the same thing with the bias 0.3025 versus the ideal value of 0.3 but remember in practice you won't necessarily know what the ideal parameters are this is just to exemplify what our model is doing behind the scenes it's moving towards some ideal representative parameters of whatever data we're working with so in the next video i'd like you to give it a go of or before we get to the next video make some predictions with our model and plot them on the original data how close do the red dots match up with the green dots and you can use this plot predictions function that we've been using in the past so give that a go and i'll see you in the next video but congratulations look how quickly we just trained a model using the steps that we've covered in a bunch of videos so far and device agnostic code so good i'll see you soon in the last video we did something very very exciting we worked through training an entire neural network some of these steps took us an hour or so worth of videos to go through before but we coded that in one video so you ready to sing the song just to remind ourselves what's going on for an epoch in a range call model dot train do the forward pass calculate the loss optimizer zero grad loss backward optimizer step step step let's test call model dot eval with torch inference mode do the forward pass calculate the loss print out what's happening and then we do it again again again for another epoch in a range no i'm kidding we'll just leave it there we'll just leave it there but that's the unofficial pytorch optimization loop song
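here's a sketch of the device fix described above, putting the data on the same target device as the model (the test split is included too since the testing step needs it)

```python
# put the data on the same device as the model (device-agnostic code)
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)
```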
we created some device agnostic code so that we could make the calculations on the same device as our model because the model's also using device agnostic code and so now we've got to evaluate our model so we've looked at the loss and the test loss here and we know that our model's loss is going down but what does this actually equate to when it makes predictions that's what we're most interested in right and we've looked at the parameters they're pretty close to the ideal parameters so at the end of the last video i issued you the challenge of making and evaluating predictions to make some predictions and plot them i hope you gave it a shot let's see what it looks like together hey so turn the model into evaluation mode why because every time we're making predictions or inference we want our model to be in eval mode and every time we're training we want our model to be in training mode and then we're going to make predictions on the test data because we train on the training data and we evaluate our model on the test data data that our model has never actually seen except for when it makes predictions with torch inference mode we turn on inference mode whenever we make inference or predictions so we're going to set y preds equal to model 1 and the test data goes in here let's have a look at what the y preds look like wonderful so we've got a tensor here it shows us that they're still on the device cuda why is that well that's because previously we sent model 1 to the device the target device the same with the test data so subsequently our predictions are also on the cuda device now let's bring in the plot predictions function here so check out our model predictions visually we're going to adhere to the data explorer's motto of visualize visualize visualize plot predictions and predictions are going to be set equal to y preds and let's have a look how good do these look oh no oh we got another error type error can't convert cuda device type tensor to numpy oh of course look what we've done so our plot predictions function if we go back up where did we define that what does our plot predictions function use it uses matplotlib of course and matplotlib works with numpy not pytorch and numpy is cpu based so of course we're running into another error down here because we just said that our predictions are on the cuda device they're not on the cpu they're on a gpu so it's giving us this helpful information here use tensor.cpu to copy the tensor to host memory first so this is our tensor let's call dot cpu and see what happens then is that going to go to cpu beautiful oh my goodness look at that look at that go the linear layer the red dots the predictions are basically on top of the testing data that is very exciting now again you may not get the exact same numbers here and that is perfectly fine but the direction should be quite similar so your red dots should be basically on top of the green dots if not very slightly off but that's okay we just want to focus on the direction here so thanks to the power of back propagation and gradient descent our model's random parameters have updated themselves to be as close as possible to the ideal parameters and now the predictions are looking pretty darn good for what we're trying to predict but we're not finished there we've just finished training this model what would happen if our notebook disconnected right now well that wouldn't be ideal would it
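here's a sketch of the prediction and plotting steps described above, assuming the plot_predictions helper function defined earlier in the course

```python
# make predictions with the trained model
model_1.eval()
with torch.inference_mode():
    y_preds = model_1(X_test)

# matplotlib works with numpy (cpu-based), so copy the predictions
# back to the cpu before plotting
plot_predictions(predictions=y_preds.cpu())
```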
so in the next part we're going to move on to 6.5 saving and loading a trained model so i'm going to give you a challenge here as well which is to go ahead and refer back to this code here saving a model in pytorch loading a pytorch model and see if you can save the state dictionary of model one and load it back in and get something similar to this give that a shot and i'll see you in the next video welcome back in the last video we saw the power of the torch dot nn dot linear layer and back propagation and gradient descent and we've got some pretty darn good predictions out of our model so that's very exciting congratulations you've now trained two machine learning models but it's not over yet we've got to save and load our trained model so i issued you the challenge in the last video to try and save and load the model yourself i hope you gave that a go but we're going to do that together in this video so we're going to start by importing path because we would like a file path to save our model to and the first step we're going to do is create a models directory we don't have to recreate this because i believe we already have one but i'm going to put the code here just for completeness and this is just so if you didn't have a models directory this would create one so model path is going to go to path models and then we'd like to model path dot mkdir we're going to call mkdir for make directory we'll set parents equal to true and exist ok that'll also be true so we won't get an error oh my gosh google colab i didn't want that we won't get an error if it already exists and two we're going to create a model save path so if you recall pytorch objects in general have the extension of what this is a little pop quiz before we get to the end of this sentence so this is going to be pytorch workflow for this module that we're going through this one here chapter one pytorch workflow model one and they usually have the extension .pt for pytorch or .pth for pytorch as well i like .pth but just remember sometimes you might come across slightly different versions of that pt or pth and we're going to create the model save name or the save path is probably a better way to put it which is going to be model path and then we can use because we're using the pathlib module from python we can save it under model name and so if we look at this what do we get model save path we should get oh path is not defined oh too many capitals here daniel the reason why i'm doing these in capitals is because oftentimes hyperparameters such as epochs in machine learning are written in capitals lr could be learning rate and then you could have as well model name equals yeah yeah but that's just a little bit of nomenclature trivia for later on and model save path we've done that now we're going to save the model state dictionary rather than the whole model save the model state dict which you will find the pros and cons of in where in the pytorch documentation for saving and loading a model which was a little bit of extra curriculum for a previous video but let's have a look at our model save path we'll print it out and we'll go torch save we'll set the object that we're trying to save to equal model one dot state dict which is going to contain our trained model parameters we can inspect what's going on in here state dict that'll show us our model parameters remember because we're only using a single linear layer we only have two parameters
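here's a sketch of the saving steps just described; the directory and file names are assumptions following the naming scheme mentioned above

```python
from pathlib import Path

# 1. create a models directory (no error if it already exists)
MODEL_PATH = Path("models")
MODEL_PATH.mkdir(parents=True, exist_ok=True)

# 2. create a model save path (file name follows the naming described above)
MODEL_NAME = "01_pytorch_workflow_model_1.pth"
MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME

# 3. save the model's state dict (the learned parameters) rather than the whole model
print(f"Saving model to: {MODEL_SAVE_PATH}")
torch.save(obj=model_1.state_dict(), f=MODEL_SAVE_PATH)
```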
but in practice when you use a model with maybe hundreds of layers or tens of millions of parameters viewing the state dict explicitly like we are now might not be too viable of an option but the principle still remains a state dict contains all of the model's trained or associated parameters and what state they're in and the file path we're going to use is of course the model save path which we've seen here is a posix path let's save our model wonderful saving model to this file path here and if we have a look at our folder we should have two saved models now beautiful two saved models this one is from the workflow we did before up here saving a model in pytorch loading a pytorch model and now the one we've got of course model one is the one that we've just saved beautiful so now let's load a model we're going to do both of these in one video load a pytorch model you know what because we've had a little bit of practice so far we're going to pick up the pace so let's go loaded let's call it we'll create a new instance of loaded model 1 which is of course our linear regression model v2 which is version 2 of our linear regression model class which subclasses what subclasses nn.module so if we go back up here to where we created it so linear regression model v2 uses a linear layer rather than the previous iteration of linear regression model which we created right up here which explicitly defined the parameters and then implemented the linear regression formula in the forward method the difference with what we've got now is we use pytorch's pre-built linear layer and then we call that linear layer in the forward method which is probably the far more popular way of building pytorch models stacking together pre-built nn layers and then calling them in some way in the forward method so let's load it in so we'll create a new instance of linear regression model v2 and now what do we do we've created a new instance i'm just going to get out of this make some space for us we want to load the saved model 1 state dict which is the state dict that we just saved beforehand so we can do this by going loaded model 1 calling the load state dict method and then passing it torch dot load and then the file path of where we saved that pytorch object before but the reason why we use pathlib is so that we can just call model save path in here wonderful and then let's check out what's going on or actually we need to put the target model or the loaded model to the device the reason being is because we're doing all of our computing with device agnostic code so let's send it to the device and i think that'll be about it let's see if this works oh there we go linear regression model v2 in features one out features one bias equals true wonderful let's check those parameters hey next loaded model one dot parameters are they on the right device let's have a look beautiful and let's just check the loaded state dictionary of loaded model one do we have the same values as we had previously yes we do okay so to conclusively make sure what's going on let's evaluate the loaded model evaluate loaded model loaded model one what do we do for making predictions or what do we do to evaluate we call dot eval and then if we're going to make some predictions we use torch inference mode with torch inference mode and then let's create loaded model one preds equals loaded model one and we'll pass it the test data and now let's check for equality between y preds which are our previous model one preds that we made up here
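here's a sketch of the loading and evaluation steps just described, assuming the model class, save path and prediction variables from the sketches above

```python
# create a new instance of the model class and load in the saved state dict
loaded_model_1 = LinearRegressionModelV2()
loaded_model_1.load_state_dict(torch.load(MODEL_SAVE_PATH))
loaded_model_1.to(device)  # keep the computation device-agnostic

# evaluate the loaded model by comparing its predictions to the originals
loaded_model_1.eval()
with torch.inference_mode():
    loaded_model_1_preds = loaded_model_1(X_test)
print(y_preds == loaded_model_1_preds)  # should be all True
```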
and we're going to compare them to the fresh loaded model one preds and should they be the same yes they are beautiful and we can see that they're both on the device cuda how amazing is that so i want to give you a big congratulations because you've come such a long way we've gone through the entire pytorch workflow from making data preparing and loading it to building a model all of the steps that come in building a model there's a whole bunch there making predictions training a model we spent a lot of time going through the training steps but trust me it's worth it because we're going to be using these exact steps all throughout the course and in fact you're going to be using these exact steps when you build pytorch models after this course and then we looked at how to save a model so we don't lose all our work we looked at loading a model and then we put it all together using the exact same problem but in far less time and as you'll see later on we can actually make this even quicker by functionizing some of the code we've already written but i'm going to save that for later i'll see you in the next video where i'm just going to show you where you can find some exercises and all of the extra curriculum i've been talking about throughout this section 01 pytorch workflow i'll see you there welcome back in the last video we finished putting things together by saving and loading our trained model which is super exciting because we've come to the end of the pytorch workflow section so now this section is going to be exercises and extra curriculum or better yet where you can find them so i'm going to turn this into markdown and i'm going to write here for exercises and extra curriculum refer to so within the book version of the course materials which is at learnpytorch.io we're in the 01 section pytorch workflow fundamentals there'll be more here by the time you watch this video likely and then if we go down here at the end of each of these sections we've got the table of contents over here we've got exercises and extra curriculum i listed a bunch of things throughout this series of 01 videos like what's gradient descent and what's back propagation so i've got plenty of resources to learn more on that there's the loading and saving pytorch documentation there's the pytorch cheat sheet there's a great article by jeremy howard for a deeper understanding of what's going on in torch.nn and there's of course the unofficial pytorch optimization loop song by yours truly which is a bit of fun and here's some exercises so the exercises here are all based on the code that we wrote throughout section one so there's nothing in the exercises that we haven't covered and if there is i'll be sure to put a note in the exercise itself but we've got create a straight line data set using the linear regression formula and then build a model by subclassing nn.module so for these exercises there's an exercise notebook template which is of course linked here and in the pytorch deep learning github if we go into here and then if we go into extras and if we go into exercises you'll find all of these templates here they're numbered by the same section that we're in this is pytorch workflow exercises so if you wanted to complete these exercises you could click this notebook here open in google colab i'll just wait for this to load there we go and you can start to write some code here you could save a copy of this in your own google drive and go through this it's got some notes here on what you should be doing
you can of course refer to the text-based version of them they're all here and then if you want an example of what some solutions look like now please i can't stress enough that i would highly highly recommend trying the exercises yourself you can use the book that we've got here this is just all the code from the videos you can use this i've got so many notebooks here now you can use all of the code that we've written here to try and complete the exercises but please give them a go yourself and then if you go back into the extras folder you'll also find solutions and this is just one example of solutions for section 01 but i'm going to get out of that so you can't cheat and look at the solutions first but there's a whole bunch of extra resources all contained within the pytorch deep learning repo extras exercises solutions and they're also in the book version of the course so i'm just going to link this in here i'm going to put this right at the bottom here wonderful but that is it that is the end of section 01 pytorch workflow so exciting we went through basically all of the steps in a pytorch workflow getting data ready turning it into tensors building or picking a model picking a loss function and optimizer we built a training loop we fit the model to the data we made a prediction we evaluated our model we improved through experimentation by training for more epochs we'll do more of this later on and we saved and reloaded our trained model but that's going to finish 01 i will see you in the next section friends welcome back we've got another very exciting module you ready neural network classification with pytorch now combining this module once we get to the end with the last one which was regression so remember classification is predicting a thing but we're going to see this in a second and regression is predicting a number once we've covered this we've covered two of the biggest problems in machine learning predicting a number or predicting a thing so let's start off with before we get into any ideas or code where can you get help first things first is follow along with the code if you can if in doubt run the code try it for yourself write the code i can't stress how important this is if you're still stuck press shift command and space to read the docstring of any of the functions that we're running if you are on windows it might be control i'm on a mac so i put command here if you're still stuck search for your problem if an error comes up just copy and paste that into google that's what i do you might come across resources like stack overflow or of course the pytorch documentation we'll be referring to this a lot throughout this section and then finally oh wait if you're still stuck try again if in doubt run the code and then finally if you're still stuck don't forget you can ask a question the best place to do so will be on the course github on the discussions page which is linked here if we load this up there's nothing here yet because as i record these videos the course hasn't launched yet but press new discussion say hey i've got a problem with xyz leave a video number here and a timestamp and that way we'll be able to help you out as best as possible so video number timestamp and then your question here and you can select q a finally don't forget that this notebook that we're about to go through is based on chapter two of the zero to mastery learn pytorch for deep learning book which is neural network classification with pytorch all of
the text based code that we're about to write is here that was a little spoiler and don't forget this is the home page so my github repo pytorch deep learning for all of the course materials everything you need will be here so that's very important how can you get help but this is the number one follow along with the code and try to write it yourself well with that being said when talking about classification what is a classification problem now as i said classification is one of the main problems of machine learning so you probably already deal with classification problems or machine learning powered classification problems every day so let's have a look at some examples is this email spam or not spam did you check your emails this morning or last night or whenever so chances are that there was some sort of machine learning model behind the scenes that may have been a neural network it may have not that decided whether some of your emails were spam so to daniel at mrdbourke.com hey daniel this deep learning course is incredible i can't wait to use what i've learned oh that's such a nice message if you want to send that email directly to me you can that's my actual email address but if you want to send me this email well hopefully my email which is hosted by some email service detects this as spam because although that is a lot of money and it would be very nice i think if someone can't spell too well are they really going to pay me this much money so thank you email provider for classifying this as spam and now because this is one thing or another not spam or spam this is binary classification so in this case spam might be a one here and not spam is a zero so zero or one one thing or another that's binary classification if you can split it into one thing or another binary classification and then we have an example of say we had the question we asked our photos app on our smartphone or whatever device you're using is this photo of sushi steak or pizza we wanted to search our photos for every time we've eaten sushi or every time we've eaten steak or every time we've eaten pizza far out and this looks delicious but this is multi-class classification now why is this because we've got more than two things we've got one two three and this could be ten different foods it could be a hundred different foods it could be a thousand different categories so the imagenet data set which is a popular data set for computer vision imagenet we go to here does it say a thousand anywhere 1k or a thousand no it doesn't but if we go imagenet 1k download imagenet data maybe it's here it won't say it but you'll just oh there we go a thousand object classes so this is multi-class classification because it has a thousand classes that's a lot right so that's multi-class classification more than one thing or another and finally we might have multi-label classification which is what tags should this article have when i first got into machine learning i got these two mixed up a whole bunch of times multi-class classification has multiple classes such as sushi steak pizza but assigns one label to each so this photo would be sushi in an ideal world this is steak and this is pizza so one label to each whereas multi-label classification means you could have multiple different classes but each of your target samples such as this wikipedia article what tags should this article have may have more than one label it might have three labels it might have 10 labels in fact what if we went to the wikipedia page for deep
learning on wikipedia and does it have any labels oh there we go where was that i mean you can try this yourself this is just the wikipedia page for deep learning there is a lot there we go categories deep learning artificial neural networks artificial intelligence and emerging technologies so that is an example if we wanted to build a machine learning model to say read all of the text in here and then go tell me what are the most relevant categories to this article it might come up with something like these in this case because it has one two three four it has multiple labels rather than just one label of deep learning it could be multi-label classification so we'll go back but there's a few more these will get you quite far in the world of classification and so let's dig a little deeper on binary versus multi-class classification you may have already experienced this so in my case if i search on my phone in the photos app for photos of a dog it might come up with this if i search for photos of a cat it might come up with this but if i wanted to train an algorithm to detect the difference between photos of these are my two dogs aren't they cute they're nice and tired and they're sleeping like a person this is seven number seven that's her name and this is bella this is a cat that me and my partner rescued and i'm not sure what this cat's name actually is so i'd love to give it a name but i can't so binary classification if we wanted to build an algorithm we wanted to feed it say 10 000 photos of dogs and 10 000 photos of cats and then we wanted to find a random image on the internet and pass it through to our model and say hey is this a dog or is this a cat it would be binary classification because the options are one thing or another dog or cat but then for multi-class classification let's say we've been working on a farm and we've been taking some photos of chickens because they're groovy right well if we updated our model and added some chicken photos in there we would now be working with a multi-class classification problem because we've got more than one thing or another so let's jump in to what we're going to cover this is broadly by the way because this is just text on a page you know i like to just write code of what we're actually doing so we're going to look at the architecture of a neural network classification model we're going to check what the input shapes and output shapes of a classification model are features and labels in other words because remember machine learning models neural networks love to have numerical inputs and those numerical inputs often come in tensors tensors have different shapes depending on what data you're working with we're going to see all of this in code creating custom data to view fit and predict on we're going to go back through our steps in modeling we covered this a fair bit in the previous section but creating a model for neural network classification is a little bit different to what we've done but not too outlandishly different we're going to see how we can set up a loss function and an optimizer for a classification model we'll recreate a training loop and an evaluation loop or a testing loop we'll see how we can save and load our models we'll harness the power of non-linearity well what does that even mean well if you think of what a linear line is what is that it's a straight line so you might be able to guess what a non-linear line looks like and then we'll look at different classification evaluation methods so ways that we can evaluate
our classification models and how are we going to do all this well of course we're going to be part cook part chemist part artist part scientist but for me i personally prefer the cook side of things because we're going to be cooking up lots of code so in the next video before we get into coding let's do a little bit more on what are some classification inputs and outputs i'll see you there welcome back in the last video we had a brief overview of what a classification problem is but now let's start to get more hands-on by discussing what the actual inputs to a classification problem look like and what the outputs look like so let's say we had our beautiful food photos from before and we were trying to build this app here called maybe food vision to understand what foods are in the photos that we take and so what might this look like well let's break it down into inputs some kind of machine learning algorithm and then outputs in this case the inputs we want to numerically represent these images in some way shape or form then we want to build a machine learning algorithm hey one might actually exist we're going to see this later on in the transfer learning section for our problem and then we want some sort of outputs and in the case of food vision we want to know okay this is a photo of sushi and this is a photo of steak and this is a photo of pizza you could get more hands-on and technical and complicated but we're just going to stick with single label multi-class classification so it could be a sushi photo it could be a steak photo or it could be a pizza photo so how might we numerically represent these photos well let's just say we had a function in our app so that every photo that gets taken automatically gets resized into a square 224 width and 224 height this is actually quite a common dimensionality for computer vision problems and so we've got the width dimension we've got the height and then we've got this c here which isn't immediately recognizable but in the case of pictures they often get represented by width height color channels and the color channels are red green and blue which means each pixel in this image has some value of red green or blue that makes whatever color is displayed here and this is one way that we can numerically represent an image by taking its width its height and color channels and whatever numbers make up this particular image we're going to see this later on when we work with computer vision problems so we create a numerical encoding which is the pixel values here then we input the pixel values of each of these images into a machine learning algorithm which often already exists and if it doesn't exist for our particular problem hey well we're learning the skills to build them now we could use pytorch to build a machine learning algorithm for this and then outputs what might these look like well in this case these are prediction probabilities the outputs of machine learning models are never actually discrete as in it is definitely pizza they give some sort of probability value between 0 and 1 the closer to one the more confident our model is that it's going to be pizza and the closer to zero means that hey this photo of pizza let's say this one and we're trying to predict sushi well it doesn't think that it's sushi so it's giving it quite a low value here and then the same for steak but the value here for pizza is really high we're gonna see this hands on and then it's the opposite here so it might
have got this one wrong but with more training and more data we could probably improve this prediction that's the whole idea of machine learning if you adjust the algorithm if you adjust the data you can improve your predictions and so the ideal outputs that we have here this is what our model is going to output but for our case of building out food vision we want to bring them back to labels so we could just put all of these numbers on the screen here but that's not really going to help people we want to put out labels of what's going on here so we can write code to transfer these prediction probabilities into these labels too and so how did these labels come about how do these predictions come about well it comes from looking at lots of different samples so this loop we could keep going improve these find the ones where it's wrong add more images here train the model again and then make our app better and so if we want to look at this from a shape perspective we want to create some tensors for an image classification example so we're building food vision we've got an image again this is just reiterating on some of the things that we've discussed we've got a width of 224 and a height of 224 this could be different this could be 300 by 300 this could be whatever values that you decide to use then we numerically encode it in some way shape or form we use this as the input to our machine learning algorithm because what do computers and machine learning algorithms love numbers they can find patterns in here that we couldn't necessarily find or maybe we could if we had a long enough time but i'd rather write an algorithm to do it for me then it has some outputs which come in the form of prediction probabilities the closer to one the more confident the model is in saying hey i'm pretty damn confident that this is a photo of sushi i don't think it's a photo of steak so i'm giving that zero it might be a photo of pizza but i don't really think so so i'm giving it quite a low prediction probability and so if we have a look at what the shapes are for our tensors here if this doesn't make sense don't worry we're going to see the code to do all of this later on but for now we're just focusing on a classification input and output the big takeaway from here is numerical encoding for the inputs and numerical encoding for the outputs but we want to change those numerical encodings from the outputs into something that we understand say the word sushi but this tensor may be batch size we haven't seen what batch size is that's all right we're going to cover it color channels width height so this is represented as a tensor of dimensions it could be none here none is a typical value for a batch size which means it's blank so when we use our model and we train it all the code that we write with pytorch will fill this in behind the scenes and then we have three here which is color channels then we have 224 which is the width and we have 224 as well which is the height now there is some debate in the field on the ordering we're using an image as our particular example here on the ordering of these shapes so for example you might have height width color channels typically width and height come together in this order or side by side in terms of where their dimension appears but color channels sometimes comes first that means after the batch size or at the end here pytorch's default for now is color channels first then height and width though you can write code to change this order because tensors are quite flexible
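here's a small sketch of the shapes being described, using random tensors as stand-ins for real image data (the batch size of 32 and the three classes follow the example discussed next)

```python
import torch

# a stand-in batch of 32 RGB images, 224x224 pixels, in pytorch's default
# [batch_size, colour_channels, height, width] ordering
image_batch = torch.rand(size=(32, 3, 224, 224))

# a stand-in model output: one prediction probability per class
# (e.g. sushi, steak, pizza) for each image in the batch
pred_probs = torch.rand(size=(32, 3))

print(image_batch.shape)  # torch.Size([32, 3, 224, 224])
print(pred_probs.shape)   # torch.Size([32, 3])
```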
and so the shape could be 32 for the batch size 3 224 224 because 32 is a very common batch size and if you don't believe me well let's go here yann lecun 32 batch size now what is a batch size great tweet just keep this in mind for later on training with large mini batches is bad for your health more importantly it's bad for your test error friends don't let friends use mini batches larger than 32 so this is quite an old tweet however it still stands quite true because like today it's 2022 when i'm recording these videos there are batch sizes a lot larger than 32 but 32 works pretty darn well for a lot of problems and so this means that if we go back to our slide that if we use a batch size of 32 our machine learning algorithm looks at 32 images at a time now why does it do this well because sadly our computers don't have infinite compute power in an ideal world we'd look at thousands of images at a time but it turns out that using a multiple of eight here is actually quite efficient and so if we have a look at the output shape here why is it three well because we're working with three different classes one two three so we've got shape equals three now of course as you could imagine these might change depending on the problem you're working with so say if we just wanted to predict if a photo was a cat or a dog we still might have this same representation here because this is the image representation however the shape here may be two or will be two because it's cat or dog rather than three classes here but a little bit confusingly with binary classification you could also have the shape just being one here but we're going to see this all hands on just remember the shapes vary with whatever problem you're working on the principle of encoding your data as a numerical representation stays the same for the inputs and the outputs will often be some form of prediction probability based on whatever classes you're working with so in the next video right before we get into coding let's just discuss the high level architecture of a classification model and remember architecture is just like the schematic of what a neural network is i'll see you there welcome back in the last video we saw some example classification inputs and outputs the main takeaway being that the inputs to a classification model particularly a neural network need to be some form of numerical representation and the outputs are often some form of prediction probability so let's discuss the typical architecture of a classification model and hey this is just going to be text on a page but we're going to be building a fair few of these so we've got some hyperparameters over here we've got binary classification and we've got multi-class classification now there are some similarities between the two in terms of what problem we're working with but there are also some differences here and by the way this has all come from if we go to the book version of the course we've got what is a classification problem and we've got architecture of a classification neural network so all of this text is available at learnpytorch.io in section two so we come back so the input layer shape which is typically decided by the parameter in features as you can see here is the same as the number of features so if we were working on a problem such as trying to predict whether someone had heart disease or not we might have five input features such as one for age a number for age it might be in my case 28 sex could be male height 180
centimeters if i've been growing overnight it's really close to 177 weight well it depends on how much i've eaten but it's around about 75 kilos and smoking status which is zero so it could be zero or one because remember we want a numerical representation so for sex it could be zero for male one for female height could be a number weight could be a number as well all of these numbers could be more could be less so this is really flexible and it's a hyperparameter why because we decide the values for each of these so in the case of our image prediction problem we could have in features equals three for the number of color channels and then we go hidden layers so there's the blue circle here i forgot that this was all timed and colorful but let's just discuss hidden layers each of these is a layer nn.linear nn.linear nn.relu and then nn.linear so that's the kind of syntax you'll see in pytorch for a layer nn.something now there are many different types of layers in pytorch if we go torch.nn basically everything in here is a layer in a neural network and then if we look up what a neural network looks like neural network recall that all of these are different layers of some kind of mathematical operation input layer hidden layer you could have as many hidden layers as you want do we have the resnet architecture the resnet architecture some of them have 50 layers look at this each one of these is a layer and this is only the 34 layer version i mean there's resnet 152 which is 152 layers we're not at that yet but we're working up the tools to get to that stage let's come back to here the neurons per hidden layer so we've got these out features the green circle the green square now if we go back to our neural network picture each one of these little circles is a neuron some sort of parameter so if we had a hundred what would that look like well we'd have a fairly big graphic so this is why i like to teach with code because you can customize this as flexibly as you want so behind the scenes pytorch is going to create a hundred of these little circles for us and within each circle is what some sort of mathematical operation so if we come back what do we got next output layer shape so this is how many output features we have so in the case of binary classification it's one one class or the other we're going to see this later on multi-class classification is you might have three output features one per class eg one for food person or dog if you're building a food person or dog image classification model hidden layer activation which is we haven't seen these yet relu which is a rectified linear unit but can be many others because pytorch of course has a lot of non-linear activations we're going to see this later on remember i'm kind of planting the seed here we've seen what a linear line is but i want you to imagine what a non-linear line is it's going to be a bit of a superpower for our classification problem what else do we have output activation we haven't got that here but we'll also see this later on which is generally sigmoid for binary classification but softmax for multi-class classification a lot of these things are just names on a page we haven't seen them yet i like to teach them as we see them but this is just a general overview of what we're going to cover loss function what does a loss function do it measures how wrong our model's predictions are compared to what the ideal predictions are
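here's a hypothetical sketch pulling the hyperparameters above together for a binary classification problem with five input features; the hidden layer sizes are arbitrary choices for illustration

```python
from torch import nn

# hypothetical binary classification architecture using the hyperparameters above:
# input layer shape = number of features, hidden layers and neurons chosen by us,
# output layer shape = 1 (one class or the other), relu as the hidden activation
model = nn.Sequential(
    nn.Linear(in_features=5, out_features=10),  # 5 input features (e.g. age, sex, ...)
    nn.ReLU(),                                  # non-linear activation
    nn.Linear(in_features=10, out_features=10),
    nn.ReLU(),
    nn.Linear(in_features=10, out_features=1)   # 1 output feature for binary classification
)
```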
so for binary classification we might use binary cross-entropy loss in pytorch and for multi-class classification we might just use cross-entropy rather than binary cross entropy get it binary classification binary cross entropy and then the optimizer sgd is stochastic gradient descent we've seen that one before another common option is the adam optimizer and of course the torch.optim package has plenty more options so this is an example multi-class classification problem this network here why is that and we haven't actually seen nn.sequential but as you could imagine sequential stands for it just goes through each of these steps so multi-class classification because it has three output features more than one thing or another so three for food person or dog but going back to our food vision problem we could have the classes sushi steak or pizza so we've got three output features which would be one prediction probability per class of image we have three classes sushi steak or pizza now i think we've done enough talking here and enough just pointing to text on slides how about in the next video let's code i'll see you in google colab welcome back now we've done enough theory of what a classification problem is what the inputs and outputs are and the typical architecture let's get in and write some code so i'm going to get out of this and i'm going to go to colab colab.research.google.com so we can start writing some pytorch code i'm going to click new notebook we're going to start exactly from scratch i'm going to name this section 02 and let's call it neural network classification with pytorch i'm going to put underscore video because i'll just show you you'll see this in the github repo but for all the video notebooks the ones that i write code in during these videos that you're watching the exact code is going to be saved on the github repo under video notebooks so there's zero zero which is the fundamentals and there's the workflow underscore video but the reference notebook with all the pretty pictures and stuff is in the main folder here so 02 pytorch classification dot ipynb or actually maybe we'll just rename it 02 pytorch classification but we know it's with neural networks pytorch classification okay and let's go here we'll add a nice title so 02 neural network classification with pytorch and so we'll remind ourselves classification is a problem of predicting whether something is one thing or another and there can be multiple things as the options such as email spam or not spam photos of dogs or cats or pizza or sushi or steak lots of talk about food and then i'm just going to link in here this resource because this is the book version of the course this is what the videos are based off so book version of this notebook and then all the resources are in here all other resources in the github and then if stuck ask a question here which is under the discussions tab and copy that in here that way we've got everything linked and ready to go but as always what's our first step in our workflow this is a little test see if you remember well it's data of course because all machine learning problems start with some form of data we can't write a machine learning algorithm to learn patterns in data that doesn't exist so let's do this in this video we're going to make some data of course you might start with some of your own data that exists but for now we're going to focus on just the concepts around the workflow so we're going to make our own custom data set
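before moving on to the data, here's a sketch of the classification loss function and optimizer choices mentioned above, assuming the nn.Sequential model from the sketch before; the learning rate values are assumptions

```python
# binary classification: binary cross entropy
# (nn.BCEWithLogitsLoss includes a sigmoid; nn.BCELoss expects sigmoid outputs)
loss_fn = nn.BCEWithLogitsLoss()

# multi-class classification: cross entropy
# loss_fn = nn.CrossEntropyLoss()

# optimizer: sgd or adam over the model's parameters (lr values are assumptions)
optimizer = torch.optim.SGD(params=model.parameters(), lr=0.1)
# optimizer = torch.optim.Adam(params=model.parameters(), lr=0.001)
```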
and to do so i'll write the code first and then i'll show you where i get it from we're going to import the scikit-learn library one of the beautiful things about google colab is that it has scikit-learn available if you're not sure what scikit-learn is it's a very popular machine learning library pytorch is mainly focused on deep learning but scikit-learn is focused on a lot of things around machine learning so google colab thank you for having scikit-learn already installed for us but we're going to import the make circles data set and rather than talk about what it does let's see what it does so make a thousand samples we're going to go n samples equals 1000 and we're going to create circles you might be wondering why circles well we're going to see exactly why circles later on so x and y we're going to use this variable what's the word nomenclature of capital x and lowercase y why is that because x is typically a matrix of features and y the labels so let's go here make circles and we're going to make n samples so a thousand different samples we're going to add some noise in there just for a little bit of randomness why not you can increase this as you want i've found that 0.03 is fairly good for what we're doing and then we're going to also pass in the random state variable which is equivalent to setting a random seed so we're flavoring the randomness here wonderful so now let's have a look at the length of x which should be what the length of y oh we don't have y underscore getting a bit trigger happy with this keyboard here a thousand so we have a thousand samples of x paired with a thousand samples of y features labels so let's have a look at the first five of x so print first five samples of x and then we'll put in here x and we can index on this five because we're adhering to the data explorer's motto of visualize visualize visualize first five samples of y and then we're going to go y same thing here wonderful let's have a look maybe we'll get a new line in here just so it looks a bit better wonderful so our samples are already numerical this is one of the reasons why we're creating our own data set we'll see later on how we get non-numerical data into numbers but for now our data is numerical which means we can build a model to learn patterns in here so this sample has the label of one and this sample has the label of one as well now how many features do we have per sample if i highlight this line how many features is this it would make it a bit easier if there was a comma here but we have two features of x which relate to one label of y and so far we've only seen let's have a look at all of y we've got zero and one so we've got two classes what does this mean zero or one one thing or another well it looks like binary classification to me because we've got only zero or only one if there was zero one two it would be multi-class classification because we have more than two things
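here's a sketch of the data creation just described, assuming the random state value of 42 used elsewhere in the course

```python
from sklearn.datasets import make_circles

# create 1000 circle samples: X holds the features, y the labels
n_samples = 1000
X, y = make_circles(n_samples=n_samples,
                    noise=0.03,       # a little randomness
                    random_state=42)  # seed the randomness for reproducibility

print(len(X), len(y))  # 1000 1000
print(f"First 5 samples of X:\n{X[:5]}")
print(f"First 5 samples of y:\n{y[:5]}")
```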
so let's x out of this let's keep going and do a little bit more data exploration so how about we make a data frame with pandas of circle data there is truly no one definite way of how to explore data for me i like to visualize it multiple different ways or even look at random samples in the case of large data sets such as images or text if you have 10 million samples perhaps visualizing them one by one is not the best way to do so so random can help you out there so we're going to create a data frame and we can insert a dictionary here so i'm going to call the features in this part of x x1 and these are going to be x2 so let's write some code to index on this so everything in the zeroth index will be x1 and everything in the first index there we go will be x2 let me clean up this code this should be on different lines enter and then we've got let's put in the label as y so this is just a dictionary here so x1 is the key for x's zeroth index x2 for the first index a little bit confusing because of zero indexing but x feature one x feature two and the label is y let's see what this looks like we'll look at the first ten samples okay beautiful so we've got x1 some numerical value x2 another numerical value correlates to or matches up with label zero but then this one zero point four four two two zero eight and a negative number matches up with label zero so i can't tell what the patterns are just looking at these numbers you might be able to but i definitely can't we've got some ones all these numbers look the same to me so what can we do next well how about we visualize visualize visualize and instead of just numbers in a table let's get graphical this time visualize visualize visualize so we're going to bring in our friendly matplotlib import matplotlib which is a very powerful plotting library i'm just going to add some cells here so we've got some space matplotlib dot pyplot as plt that's right we've got this plt.scatter we're going to do a scatter plot x equals x's zeroth index and then y is going to be x's first index so that's going to appear on the y axis and then we want to color it with c equals y the labels we're going to see what this looks like in a second and then the color map cmap stands for color map is going to be plt dot cm dot rdylbu red yellow blue one of my favorite color maps so let's see what this looks like you ready ah there we go there's our circles that's a lot better for me so what do you think we're going to try and do here if this is our data and we're working on classification we're trying to predict if something is one thing or another so our problem is we want to try and separate these two circles so say given two numbers an x one and an x two which are coordinates here we want to predict the label is it going to be a blue dot or is it going to be a red dot so we're working with binary classification do we have a blue dot or a red dot so this is going to be our toy data here and a toy problem let me just write this down this is a common thing that you'll also hear in machine learning note the data we're working with is often referred to as a toy data set a data set that is small enough to experiment on but still sizeable enough to practice the fundamentals and that's what we're really after in this notebook is to practice the fundamentals of neural network classification so we've got a perfect data set to do this and by the way we got this from scikit-learn so this little function here made all of these samples for us and how could you find out more about this function well you could go scikit-learn classification data sets there are actually a few more in here that we could have used i just like the circles one toy data sets we saw that so this is like a toy box of different data sets so if you'd like to learn more about some data sets that you can have a look at and potentially practice on with neural networks or other forms of machine learning models from scikit-learn check out scikit-learn i can't speak highly enough of it
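here's a sketch of the data frame and scatter plot described above, assuming X and y from the make circles sketch earlier

```python
import pandas as pd
import matplotlib.pyplot as plt

# put the circle data into a DataFrame for easier inspection
circles = pd.DataFrame({"X1": X[:, 0],
                        "X2": X[:, 1],
                        "label": y})
print(circles.head(10))

# visualize, visualize, visualize
plt.scatter(x=X[:, 0],
            y=X[:, 1],
            c=y,                 # colour by label
            cmap=plt.cm.RdYlBu)  # red-yellow-blue colour map
plt.show()
```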
i can't speak highly enough of scikit-learn. i know this is a pytorch course and we're not focused on it, but these libraries all come together in the machine learning and deep learning world: you might use something from scikit-learn, like we've done here, to create some data to practice on, and then use pytorch for the modelling, like what we're doing here. now, with that being said, what are the input and output shapes of our problem? have a think about that, and also have a think about how we'd split this data into training and test sets. give those a go, we covered those concepts in previous videos, but we'll do them together in the next video, i'll see you there. welcome back. in the last video we made some classification data so we can practice building a neural network in pytorch to separate the blue dots from the red dots, so let's keep pushing forward on that. where are we in our workflow? what have we done so far? well, we've got our data ready, a little bit; we haven't turned it into tensors yet, so let's do that in this video and then keep pushing through the rest. i'm going to make a heading here, 1.1, check input and output shapes. the reason we focus a lot on input and output shapes is because machine learning deals a lot with numerical representations as tensors, and mismatched shapes are some of the most common errors: if the input and output shapes of a certain layer or of the output layer don't match, you're going to run into a lot of errors. so it's good to get acquainted with whatever data you're using: what are the input shapes, and what are the output shapes you'd like? in our case we can check X.shape and y.shape. we're working with numpy arrays here, which is what the make_circles function created for us, but as our workflow says, if we're working with pytorch we want our data represented as pytorch tensors; we'll get to that shortly. so X has a shape of a thousand samples with two features each, and y has no feature dimension, each label is just a single number, a scalar, so there are a thousand samples of y too: two features of X equals one y label. now, if you're working on a larger problem you might still have a thousand samples of X, but each sample might be represented by 128 different numbers, or 200, or just 10; the number of features that represent a label is quite flexible. let's make that explicit and view the first example of features and labels, so we'll grab the zeroth index of X and the zeroth index of y (we could grab any of them, because they're all the same shape) and print the values and the shapes for one sample of each.
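in code, those checks look something like this:

```python
# check the shapes of our features and labels (still NumPy arrays at this point)
print(X.shape, y.shape)   # (1000, 2) (1000,)

# view the first example of features and labels
X_sample = X[0]
y_sample = y[0]
print(f"Values for one sample of X: {X_sample} and the same for y: {y_sample}")
print(f"Shapes for one sample of X: {X_sample.shape} and the same for y: {y_sample.shape}")
```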
so we've got one sample of X, two numbers you can try to find patterns in if you like (all the best there), and its matching y sample, a label of one. the shape for one sample of X is two, we have two features, while y is a scalar, it doesn't actually have a shape, it's just one value. so, speaking it out loud: two features of X, trying to predict one number for y. now let's create another heading, 1.2, turn data into tensors and create train and test splits. we have to convert our data from numpy, and we also want training and test splits, and even though we're working with a toy dataset here, the principle of turning data into tensors and creating train and test splits stays the same for almost any dataset you work with. so, turn data into tensors. for this we need to import torch and check the torch version; it should be at least 1.10, mine is 1.10 plus cuda 111, and if your version is higher than that, that's okay, the code below should still work, and if it doesn't, let me know. so X equals torch.from_numpy of X. why are we doing this? because X is currently a numpy ndarray (you can confirm that with type(X)), and we want it as a torch tensor, so we use from_numpy, which we saw in the fundamentals section, and then chain .type(torch.float) on the end; torch.float is an alias for float32, the two are equivalent, i'm just typing torch.float to write less code. and we do the same for y. why do we change the type? because the default datatype of numpy arrays is float64, whereas pytorch's default is float32, so we're converting into pytorch's default type; otherwise our tensors would be float64 as well, and that may cause errors later on. and now if we look at the first five values of X and y we have tensors, and if we check the datatypes they're torch tensors in float32. wonderful, our data is in tensors.
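a sketch of that conversion:

```python
import torch

print(torch.__version__)   # should be at least 1.10

# turn the NumPy arrays (float64 by default) into float32 tensors (PyTorch's default)
X = torch.from_numpy(X).type(torch.float)
y = torch.from_numpy(y).type(torch.float)

print(X[:5], y[:5])
print(X.dtype, y.dtype)    # torch.float32 torch.float32
```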
but now we would like training and test sets, so let's split our data. a very, very popular way to split data is a random split, and because our data points are scattered all over the place in a circle, splitting them randomly works fine here. to do it i'm going to use our faithful scikit-learn again; remember how i said scikit-learn has a lot of beautiful methods and functions for a whole bunch of machine learning purposes? well, one of them is train_test_split. (these videos are going to be a battle between me and colab's autocorrect; sometimes it's good, other times it's not.) so we're going to set this up: X_train for our training features, X_test for our testing features, then y_train and y_test for our training and testing labels; that order is the order train_test_split returns things in. if we wanted to find out more about the function i could press command shift space to bring up the docstring, but truly i don't have a great time reading it that way, so i prefer to just search train_test_split, and one of the first results is the scikit-learn documentation, sklearn.model_selection.train_test_split: split arrays or matrices into random train and test subsets. there's a code example there and you can read what the different parameters do, but we're going to see them in action. this is just another example of how machine learning libraries, scikit-learn, matplotlib, pandas, all interact to serve a great purpose. so we pass in our features and our labels, in that order, and the returns come out as X_train, X_test, y_train, y_test; it took me a little while to remember this, but once you've created enough splits with this function you'll know it off by heart: features first, train first, then labels. then i'm going to put in the test_size parameter of 0.2, which is percentage-wise, so 0.2 means 20 percent of the data will be test and 80 percent will be train. you could do a 50/50 split with 0.5, though that kind of split doesn't usually happen; test_size just says how big, percentage-wise, you want your test set to be, and behind the scenes train_test_split works out what 20 percent of our X and y samples is. let's also put in a random_state, because as the documentation says, train_test_split splits data randomly, and random_state is the random seed equivalent here, very similar to torch.manual_seed, except torch.manual_seed only affects pytorch code, not scikit-learn code; we set it so that you and i get the same random split, in fact they should be exactly the same. so let's run this and check the lengths: we have a thousand total samples (we know that because we asked make_circles for a thousand, it could have been ten thousand or a hundred, that's the beauty of creating your own dataset), and with 20 percent as test data that's 200 test samples, while the remaining 80 percent, 800 samples, goes to training. run it and beautiful, we have 800 training samples and 200 testing samples, and that's the dataset we'll be working with. so we've now got training and test sets and we've started to move through our beautiful pytorch workflow: we've got our data ready, we've turned it into tensors, we've created a training and test split, now it's time to build or pick a model.
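here's the split in code, again with 42 as an example random_state so the split is reproducible:

```python
from sklearn.model_selection import train_test_split

# split the data: 20% test, 80% train
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=42)

print(len(X_train), len(X_test), len(y_train), len(y_test))   # 800 200 800 200
```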
i think we're still in the building phase, so let's do that in the next video. welcome back. in the last video we split our data into training and test sets, and because we did an 80/20 split we've got 800 samples to train on and 200 samples to test on. remember, the training set is so the model can learn patterns, patterns that represent this dataset, the circles dataset of red dots and blue dots, and the test set is so we can evaluate those patterns. i took a little break before, you can tell because my notebook is disconnected, but to reconnect i can go runtime, run before, which runs all of the cells above; it shouldn't take too long because we haven't done any large computations. and this is good timing because we're up to part two, building a model. there are a fair few steps here but nothing we haven't covered before, so let's break it down: we want to build a model to classify our blue and red dots, and to do so we want to, number one, set up device agnostic code, so we get in the habit of creating it and so our code will run on an accelerator, a gpu, if there is one; number two, construct a model, by subclassing nn.Module, which we saw in the previous section, in fact all models in pytorch subclass nn.Module; number three, define a loss function and optimizer; and then finally we'll create a training and test loop, though that will probably be in the next section, we'll focus on building a model here. of course all of these steps are in line with our workflow: pick or build a pretrained model to suit your problem, pick a loss function and an optimizer, build a training loop. so how can we start this off? we'll import pytorch and nn (we've already done this, but we'll do it anyway for completeness, just in case you wanted to run your code from here), and we'll write device agnostic code: set the device equal to cuda if torch.cuda.is_available else cpu, the cpu being the default, because if there's no gpu, meaning cuda isn't available, all of our pytorch code will default to using the cpu. now, we haven't set up a gpu yet, you may have, but my target device is currently cpu, so how about we set one up? we can go runtime, change runtime type, gpu, and click save. this is going to restart the runtime and reconnect, and once it reconnects we could just run this cell to set up the gpu device, but because we restarted the runtime, things like X_train are no longer defined, so let's use run all, or run before, to rerun the cells above. and do we have X_train now? wonderful, yes we do. so we've got device agnostic code; in the next video let's get on to constructing a model, i'll see you there.
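the device setup in code is short:

```python
import torch
from torch import nn

# device agnostic code: use the GPU if there is one, otherwise the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
device
```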
welcome back. in the last video we set up some device agnostic code, and this is going to come in later on when we send our model, and also our data, to the target device. it's an important step because that way, if someone else were to run your code, or you run it in the future, it will run on the cpu by default, but if there's an accelerator present it can go faster by using a gpu. so we're up to step two here, construct a model by subclassing nn.Module. i'm going to write a little bit of text to plan out the steps; we've set up device agnostic code, now let's create a model, and we'll break this step down into some sub-steps. number one, we're going to subclass nn.Module; almost all models in pytorch subclass nn.Module because it does some great things for us behind the scenes. number two, we're going to create two nn.Linear layers that are capable of handling the shapes of our data. number three, we'll define a forward method; why? because we're subclassing nn.Module, and the forward method outlines the forward pass, or forward computation, of the model. and number four, we'll instantiate an instance of our model class and send it to the target device; this doesn't really have to be part of creating the class, but we're going to do it anyway. so, a couple of different steps here, but nothing too dramatic we haven't covered before. let's go: number one, construct a model that subclasses nn.Module. we're going to code this all out together, then go back through and discuss it, and maybe draw a few pictures to check out what's actually happening. the class is CircleModelV0, named that because we're trying to separate the red and blue circles in our data using a neural network, and it subclasses nn.Module. when we create a class in python we write a constructor, init, call super's init, and then inside the constructor we create our layers. so this is number two, create two nn.Linear layers capable of handling the shapes of our data. and if we have a look at X_train, what are the shapes here? what's the input shape? X_train is our features, and features are what go into our model: we have 800 training samples, that's the first number, of size 2 each, so 800 samples with two numbers inside each. again, depending on the dataset you're working with, your features might be a vector of 100 numbers, or a different size tensor altogether, or there might be millions of them; because we're working with a simple dataset we'll keep it small, but the principle is the same: you need to define a neural network layer that is capable of handling your input features.
so we're going to make self.layer_1 equal to nn.Linear, and if we want to find out what's going on in nn.Linear we can hit shift command space on a mac (maybe shift control space on windows) to bring up the docs. we need to define the in_features: what would in_features be here? well, we just decided that our X has two features, so in_features is going to be 2. and what about out_features? this one is a little bit tricky. we could set out_features to 1 if we only wanted a single linear layer, because if we look at y_train we want to map one sample of X to one sample of y, and y is a single number, so the final output needs to be one number; but we want to create two linear layers here, so instead we're going to set out_features to 5 and then create a second layer. and here's an important point about joining neural network layers together: what do you think the in_features of our second layer has to be if the first layer produces 5 out_features? the in_features of a layer has to match up with the out_features of the previous layer, otherwise we'll get shape mismatch errors, so the second layer gets in_features equal to 5. the number 5 itself is arbitrary, we could use 128 or 256; generally you'll see multiples of 8, like 8 or 64, which is a rule of thumb to do with the efficiency of computing hardware (i don't know enough about computer hardware to tell you exactly why, but it's a common rule of thumb in machine learning). we're just using five to keep it nice and simple. this out_features value is what's called the number of hidden units, or hidden neurons, and the rule of thumb is that the more hidden units there are, the more opportunity our model has to learn patterns in the data: to begin with it only has two numbers to learn patterns from, but if we upscale to five it has five numbers to work with. you might think, why don't we just go straight to 10,000 or something? well, there's an upper limit where the benefits start to trail off. so layer_1 takes in the two features of x, performs the nn.Linear computation (we'll look at the exact equation it applies shortly) and upscales them to five features, and then self.layer_2 is the output layer: it takes in the five features from the previous layer and outputs a single feature, the same shape as y. so what's our next step? we want to define a forward method, the forward computation, the forward pass. the forward method takes x, some form of data, as an input, and here's where we can use layer_1 and layer_2.
so, number three, define a forward method that outlines the forward pass. we're going to return self.layer_2(self.layer_1(x)), which is some notation we haven't quite seen yet, but the way it reads is: x goes into layer_1, and then the output of layer_1 goes into layer_2. so whatever data we have, our training data X_train for example, goes into layer_1, which performs its linear calculation, and the result goes into layer_2, which produces the output: x is the input, layer_1 is a computation, layer_2 gives the output. now let's do step four, which is to instantiate an instance of our model class and send it to the target device. our model class is CircleModelV0, and because it's the first model of this section we'll call the instance model_0, so model_0 equals CircleModelV0() and then .to(device), our target device. let's have a look at model_0... oh, a typo, yep, classic, i forgot to pass self into the forward method, let's fix that, there we go. the beautiful thing about creating a class like this is that we could put it into a python file such as model.py and then we wouldn't have to rewrite it all the time, we could just import it. and let's check what device it's on: the target device is cuda because we've got a gpu, thank you google colab, and if we call next on model_0.parameters() we can see the parameters are on the cuda device. we've covered enough code for this video, so if you want to understand it a little more, go back through it, but we'll come back in the next video and make it a little bit more visual, i'll see you there.
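pulling the whole class together, it looks roughly like this (using the device variable we set up earlier):

```python
from torch import nn

# 1. construct a model that subclasses nn.Module
class CircleModelV0(nn.Module):
    def __init__(self):
        super().__init__()
        # 2. create two nn.Linear layers capable of handling the shapes of our data
        self.layer_1 = nn.Linear(in_features=2, out_features=5)   # takes in 2 features, upscales to 5
        self.layer_2 = nn.Linear(in_features=5, out_features=1)   # takes in 5 features, outputs 1 (same shape as y)

    # 3. define a forward method that outlines the forward pass
    def forward(self, x):
        return self.layer_2(self.layer_1(x))   # x -> layer_1 -> layer_2 -> output

# 4. instantiate an instance of the model class and send it to the target device
model_0 = CircleModelV0().to(device)
next(model_0.parameters()).device   # check which device the parameters live on
```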
welcome back. in the last video we did something very, very exciting: we created our first multi-layer neural network. right now it's just code on a page, but truly this is what the majority of building machine learning models in pytorch looks like: you create some layers, as simple or as complex as you like, and then you use those layers in some form of forward computation to create the forward pass. so let's make this a little bit more visual by going over to the tensorflow playground at playground.tensorflow.org. tensorflow is another deep learning framework, similar to pytorch, that lets you write code to build neural networks, fit them to data to find patterns and then use those models in your applications, and the playground is a neural network we can train right in the browser, which is pretty darn cool. there's a dataset here that's kind of similar to the one we're working with; if we compare it to our circles plot, let's just say it's close enough, it's circular, that's what we're after. if we set it up with two input features, x1 and x2, and one hidden layer with five neurons, where is this reminding you of? there's a lot going on in the playground that we haven't covered yet, but don't worry too much, we're just focused on this network: two features as the input and five hidden units, which is exactly what's going on in the model we just built, in_features 2, out_features 5, feeding into another layer with in_features 5 and out_features 1, so it's the exact same model. now, if we set the activation to linear, because we're sticking with linear for now (we'll look at different activation functions later on), set the learning rate to 0.01, choose classification, and try to fit this neural network to this data, let's see what happens. hmm, the test loss sits at about 0.5, so about 50 percent. if we only have two classes, the perfect loss is zero and the worst loss is one, so a loss of about 0.5 means our model is doing about as well as randomly guessing whether each data point is blue or orange. in a binary classification problem where you have the same number of samples in each class, random guessing gets you about 50 percent, just like tossing a coin: toss a coin 100 times and you'll get roughly 50/50, maybe a little different, but around that over the long term. we've just fit for 3,000 epochs and we're still not getting a better loss; hmm, i wonder if that's going to be the case for our neural network too. to draw this another way i'm going to open a little tool called figjam, which is just a whiteboard in the browser we can put shapes on, nothing fancy, a simple diagram: i'll put down the inputs in green (my favourite colour), label them x1 and x2, add some dots for the hidden units, zoom out a bit for space, and add an orange output node, then connect the inputs through to all of the hidden units. this is what's happening in the forward computation method, and it can be a little bit confusing when we code it out, because from the code it looks like we've only got an input layer, a single hidden layer in blue and an output layer, but truly
this is the exact same shape as what the playground shows, you get the point: every input feeds into every hidden unit, and all of those feed into the output, and we'll see this computationally later on. so whatever dataset you're working with, you're going to have to manufacture some form of input layer; you might have 10 inputs if you have 10 features, or four if you have four. if you want to adjust things, you can increase the number of hidden units, the out_features of a layer; what has to match up is that the layer it feeds into must take in the same number of features as what's coming out, so keep that in mind as you go. and in our case we only have one output, y. so that's the visual version. you can play around with the tensorflow playground and change things, maybe you want five hidden layers with five neurons in each; in fact, here's a challenge: go to playground.tensorflow.org, replicate the network we built, and see if it fits this type of data. do you think it will? we'll find out over the next few videos, and in the next video i'll show you another way to create the network we just built, with even less code. i'll see you there. welcome back. a couple of videos ago we coded up this neural network, CircleModelV0, by subclassing nn.Module: we created two linear layers capable of handling the shape of our data, in_features 2 because we have two X features, out_features 5 to upscale those two features to five so the network has more of a chance to learn, then the next layer has to handle five features as input, and we have one output feature because that's the same shape as our y. then we got a little visual with the tensorflow playground (did you try the challenge of five hidden layers with five neurons each? did it work?) and with figjam. you might have to visualize things a fair few times when you first start with neural networks, but once you get some practice you can start to infer what's going on from the code alone. now let's keep pushing forward: because our network is quite simple, only two layers, we can replicate the model above using nn.Sequential. i'm going to code this out and then we can look up what nn.Sequential is, but i think you'll comprehend what's happening just by looking at it: nn.Sequential, then an nn.Linear with in_features 2 (because we have two input features) and the same out_features of 5 as before (remember we could customise this to 10, 100, 128, i'm keeping it at five), then another nn.Linear whose in_features is 5, because the out_features of the previous layer was 5, and whose out_features is 1, because we want one y value for our two X features. then i'm going to send that to the device and have a look at model_0.
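the same model written with nn.Sequential:

```python
# replicate CircleModelV0 using nn.Sequential
model_0 = nn.Sequential(
    nn.Linear(in_features=2, out_features=5),
    nn.Linear(in_features=5, out_features=1)
).to(device)

model_0
```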
this of course overrides our previous model_0, but have a look: the only thing different is the printout, this one just says Sequential instead of coming from our CircleModelV0 class. so can you see what's going on? as you might have guessed, nn.Sequential implements most of that class code for us behind the scenes: because we've told it the model is sequential, it just steps the data through this layer and then that layer, in order, and outputs basically the same model without us writing our own forward method. you might be thinking, daniel, why didn't you show us this earlier, that looks like such an easy way to create a neural network compared to subclassing? and yes, you're 100 percent right, it is an easier way, however the benefit of subclassing, and that's why i started there, is that when you have more complex operations to construct and a more complex forward pass, it's important to know how to build your own subclasses of nn.Module. for simple, straightforward stepping through each layer one by one, you can use nn.Sequential. in fact, we could even move this up into the class: self.two_linear_layers equals nn.Sequential with the same two nn.Linear layers, in_features 2 and out_features 5, then in_features 5 and out_features 1, and the forward method just returns self.two_linear_layers(x). that's the exact same model, just using nn.Sequential inside the class. i'm going to get rid of that so our code isn't too verbose, but this is the flexibility of pytorch: there are a fair few ways to make a model, the simplest is probably nn.Sequential, and subclassing is a little more complicated but extends to the much more complex neural networks you'll likely have to build later on. so let's keep pushing forward and see what happens if we pass some data through here. we'll rerun the cell to make sure model_0 is instantiated and make some predictions with the model. but first, if we have a look at model_0.state_dict(), oh, this will be a good experiment, look at this: we have a weight tensor and a bias tensor for the zeroth layer, and another weight and bias for the first layer. and notice how out_features is five: that's why the zeroth layer's bias parameter has five values, and why its weight tensor has ten values, because 2 times 5 equals 10.
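you can peek at those parameters like this:

```python
# the random weight and bias tensors pytorch created for us behind the scenes
model_0.state_dict()
# layer 0: weight of shape [5, 2] (2 * 5 = 10 values) and a bias with 5 values
# layer 1: weight of shape [1, 5] and a bias with a single value
```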
so that's just a simple two-layer network, and look at all the numbers going on behind the scenes; imagine coding all of these by hand, there's something like 20 numbers here for only two layers. now, in the previous section we created our weights and biases ourselves using nn.Parameter and random values, and you'll notice these are all random too (if yours are different to mine, don't worry, they're initialised randomly and we haven't set a random seed). the thing to note is that pytorch creates all of these parameters for us behind the scenes, and when we do backpropagation and gradient descent in our training loop, the optimizer is going to change all of these values ever so slightly to better fit, or better represent, the data, so we can split our two circles. you can imagine how verbose this would get if we had, say, 50 layers with 128 hidden units each; watch how quickly the numbers get out of hand when we change just one value, look how many parameters our model has now. you might be able to calculate that by hand, but i personally don't want to, so we'll let pytorch take care of it behind the scenes. for now we're keeping it simple, but that's how we can crack our models open and have a look at what's going on. that was a little detour; it's time to make some predictions with our random numbers, because i wanted to highlight the fact that our model really is instantiated with random values. so untrained_preds equals model_0 on X_test, and of course we have to send the test data to the device, because if the data is on a different device to the model we'll get errors; pytorch likes to make calculations on the same device. then we'll write some nice print statements, following the data explorer's motto of visualize, visualize, visualize (and sometimes print is one of the best ways to do so): the length of the predictions, the length of the test samples, the shapes of each, the first 10 predictions and the first 10 labels. so how do you think these predictions will fare, given they're made with random numbers? and what are we trying to predict again? whether a dot is a red dot or a blue dot, a zero or a one. let's have a look: the length of predictions is 200 and the length of test samples is 200, but the shapes and values look different, so what's going on here?
we're troubleshooting on the fly here, which is what you'll do with a lot of your code. there are our test features, there are the ideal labels, but our predictions don't look like our labels. we can see they're on the cuda device, which is good, and we can see they've got gradient tracking attached, because we forgot to use torch.inference_mode(), that's a poor habit of us, excuse me, so let's wrap the forward pass in inference mode; there we go, the gradient tracking goes away. and our predictions are nowhere near our test labels, in fact they're not even in the same ballpark: the labels are whole numbers, ones or zeros, and our outputs are all floats. hmm, maybe rounding will do something? torch.round on the predictions... oh, they're all zero, so our model is probably going to get about 50 percent accuracy, because every prediction looks like it's going to be zero and there are only two options, basically heads or tails. so when we evaluate our model we want our predictions to be in the same format as our labels, and we're going to cover the steps to do that in a second. what's important to take away from this is: nn.Sequential is another, simpler way to replicate the model we made above, but it's limited, because it literally just steps through each layer sequentially in order, whereas with a subclass of nn.Module you can get as creative as you want with the forward computation; inside our model, pytorch has created weight and bias tensors for each of our layers behind the scenes, with regard to the shapes that we set, and it would do the same even if our layers got quite ridiculous; and because those numbers are random, our model isn't making very good predictions yet. we're going to fix that in the next few videos when we fit the model to the data, but before we do, we need to pick a loss function and an optimizer and build a training loop, so let's get on to those.
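before moving on, here's the prediction check from this video in one place, with inference mode included this time:

```python
# make predictions with the untrained model (random weights)
model_0.eval()
with torch.inference_mode():
    untrained_preds = model_0(X_test.to(device))

print(f"Length of predictions: {len(untrained_preds)}, Shape: {untrained_preds.shape}")
print(f"Length of test samples: {len(y_test)}, Shape: {y_test.shape}")
print(f"\nFirst 10 predictions:\n{untrained_preds[:10]}")
print(f"\nFirst 10 test labels:\n{y_test[:10]}")
```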
welcome back. over the past few videos we've been setting up a classification model to deal with the specific shape of our data. recall that the layers you use will depend on the dataset you're working with; for now we're keeping it simple, nn.Linear is one of the most simple layers in pytorch, we've got two input features, we upscale that to five output features, so five hidden units, and then we have one output feature, in line with the shape of our data: two features of X equals one number for y. so let's continue with where we're up to: we've built a model, now we need to pick a loss function and an optimizer. new heading: setup loss function and optimizer. and here comes a question: previously, for regression, we used nn.L1Loss, which is MAE, mean absolute error, but heads up, that won't necessarily work for a classification problem. so which loss function or optimizer should you use? again, this is problem specific, but with a little bit of practice you'll get used to using different ones. for regression, which is predicting a number (and i know it can get confusing, because in a sense we are predicting a number here too, but that number relates to a class), you might want MAE or MSE, mean absolute error or mean squared error, and for classification you might want binary cross entropy or categorical cross entropy, which is sometimes just referred to as cross entropy. where would you find these things out? through the internet of course: you could search "what is binary cross entropy", and i'm going to leave it as extracurriculum for you to read "understanding binary cross-entropy / log loss" by daniel godoy (great first name, my friend); that's the article i'd recommend if you want to learn what's going on mathematically behind binary cross entropy. there's a lot of math there, but we're going to be writing code, because pytorch has implemented it for us. and as a reminder, what does a loss function do? it measures how wrong your model's predictions are. i'm also going to leave a reference to the little table in the book version of this course, section 02, neural network classification with pytorch, setup loss function and optimizer, which lists some example loss functions and optimizers, such as the stochastic gradient descent (SGD) optimizer and the Adam optimizer, which are both very popular, alongside the problem type and the pytorch code to implement them: binary cross entropy loss, cross entropy loss, mean absolute error (MAE) and mean squared error (MSE), the latter two being for regression. there are other loss functions you could use, but these are some of the most common, and the most common ones will get you through a fair few problems. we're working with binary classification, so we're going to look at torch.nn.BCEWithLogitsLoss, where BCE stands for binary cross entropy... but what the hell is a logit? and there's also nn.BCELoss; this is confusing, and trust me, when i first started using pytorch i got a little bit confused about why there are two, but we're going to explore that. so, what is a logit? if you search "what is a logit" you'll get statistics and the log odds formula, and if you want to read more i'd highly encourage it, but in deep learning a logit kind of means a different thing, it's really just a name for something. if we search "what is the word logits in tensorflow" (tensorflow being that other deep learning framework), we get a whole bunch of definitions, and one of my favourites is: in the context of deep learning, the logits layer is the layer that feeds into the softmax. softmax is a form of activation, we'll see it later on (this is just words on a page right now), or other such normalization; the output of the softmax are the probabilities for the classification task, and its input is the logits layer.
there's a lot going on there, so let's just take a step back and get into writing some code. for optimizers, two of the most common and useful are SGD and Adam, however pytorch has many built-in options, and as you learn more about the world of machine learning you'll find them in torch.optim. and in torch.nn we have the loss functions: L1Loss, which is MAE, MSELoss, CrossEntropyLoss, CTCLoss and so on; which one you use will depend on the problem you're working on, but for regression and classification these cover the most common cases... and see, there's that confusion again, BCELoss and BCEWithLogitsLoss, what the hell is a logit, my goodness. okay, that's enough. in torch.optim there are probably a dozen or so algorithms, Adadelta, Adagrad, AdamW; this can be pretty full-on when you first get started, but for now just stick with SGD and the Adam optimizer, they're two of the most common, they may not perform the best on every single problem, but they'll get you fairly far, and you'll pick up the extra ones as you go. so let's create our loss function: we're going to use torch.nn.BCEWithLogitsLoss, and i'll put a couple of comments in: for more on what binary cross entropy (BCE, there are a lot of abbreviations in machine learning and deep learning) is, check out that article, and for a definition of what a logit is in deep learning, we're going to see one in a second, because deep learning is one of those fields that likes to be a bit rebellious, a bit different from pure mathematics and statistics in general, it's this beautiful gestalt. and for different optimizers, see torch.optim. finally, i'll note that for common choices of loss functions and optimizers, a lot of this is covered in the book, so you can just go to the book website and reference it. now, i know i've been promising code this whole time, so let's write some: we'll set up the loss function, call it loss_fn, and it's nn.BCEWithLogitsLoss, which has the sigmoid activation function built in. we haven't covered what the sigmoid activation function is, but we are going to, don't you worry about that; in fact, if you wanted to learn what it is right now, how could you find out? search "sigmoid activation function", and you'll find articles on activation functions in neural networks, that's the beautiful thing about machine learning, there's so much great material out there, people have written up the formulas, and pytorch has implemented it behind the scenes for us, so thank you pytorch. but where did we discuss the architecture of a classification network before? right back in the zeroth section of the book version of this course.
in that little online reference table, for binary classification the output activation is sigmoid, torch.sigmoid in pytorch, and for multi-class classification we need softmax. names on a page for now, but it's a reference table we can keep coming back to. i also want to highlight that nn.BCELoss exists as well; the difference is that nn.BCELoss requires the inputs to have already gone through the sigmoid activation function before being passed to the loss, whereas nn.BCEWithLogitsLoss has the sigmoid built in. so why stick with BCEWithLogitsLoss? let's check the documentation: this loss combines a sigmoid layer and the BCELoss in one single class, so if we built an nn.Sequential with nn.Sigmoid followed by nn.BCELoss we'd get something similar; but if we keep reading, this version is more numerically stable than using a plain sigmoid followed by a BCELoss, because by combining the operations into one layer we take advantage of the log-sum-exp trick for numerical stability. beautiful, so by using this loss function for our binary cross entropy we get some numerical stability, and we'll see the difference between the two in the flesh later on. for the optimizer, let's stick with old faithful SGD, stochastic gradient descent, and we have to set the params parameter equal to our model's parameters, model_0.parameters(); we're saying, hey stochastic gradient descent, please update our model's parameters with respect to the loss, because we'd like the loss to go down. these two are going to work in tandem when we write our training loop. and we'll set our learning rate to 0.1 and see where that gets us. so that's what the optimizer is going to do, optimize all of those random parameters we looked at before, which is amazing, and the principle would be the same even if there were a hundred layers and millions of different parameters. now that we've got a loss function and an optimizer, how about we also create an evaluation metric? let's calculate accuracy, because accuracy is very helpful for classification problems. what is accuracy? the formula is the number of correct predictions (true positives plus true negatives) divided by the total number of predictions, times 100; in other words, out of 100 examples, what percentage does our model get right? for example, if our model made 100 coin-toss style predictions and got 99 of them right, it would have 99 percent accuracy. so let's implement something like that in pure pytorch with a little accuracy function.
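in code, the loss function, optimizer and accuracy function we just described look like this:

```python
# setup the loss function (this one has the sigmoid activation function built in)
loss_fn = nn.BCEWithLogitsLoss()

# setup the optimizer: stochastic gradient descent on our model's parameters
optimizer = torch.optim.SGD(params=model_0.parameters(),
                            lr=0.1)

# calculate accuracy: out of 100 examples, what percentage does our model get right?
def accuracy_fn(y_true, y_pred):
    correct = torch.eq(y_true, y_pred).sum().item()   # how many predictions match the labels
    acc = (correct / len(y_pred)) * 100
    return acc
```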
the accuracy function takes in y_true and y_pred, because any evaluation function or loss function is comparing predictions to the ground truth labels. torch.eq asks, hey, how many of these y_true samples are equal to y_pred? we take the sum of that, call .item() on it because we want a single python value, then the accuracy is correct divided by the number of samples, times 100, and we return it. wonderful, we now have an accuracy function, and we're going to see how all three of these, the loss function, the optimizer and the accuracy metric, come into play when we write a training loop, which we might as well get started on over the next few videos. i'll see you there. welcome back. in the last video we discussed some loss function options for our classification models: we learned that for binary classification problems we want binary cross entropy, and that pytorch has two versions, one of which, BCEWithLogitsLoss, is more numerically stable because it has the sigmoid activation function built in, that's straight from the pytorch documentation. optimizer-wise we have a few choices too; if you check out this section of the book there's a table of loss functions and optimizers for different problems and the pytorch code to implement them, but the premise is the same across the board: the loss function measures how wrong our model is, and the goal of the optimizer is to update the model's parameters in such a way that the loss goes down. we also implemented our own accuracy function, so we can evaluate our model's predictions with accuracy as well as loss. so let's now work on training a model. what should we do first? do you remember the steps in a pytorch training loop? if you google "unofficial pytorch optimization loop song" you'll find my fun little jingle for remembering them, and the book lists them too under train model: for an epoch in a range, 1. do the forward pass, 2. calculate the loss, 3. optimizer zero grad, 4. loss backward, which is backpropagation (i've linked extra resources if you'd like to dig into what's going on in backpropagation, we're focused on code here), and 5. optimizer step, which is gradient descent. you could keep singing those steps all day, but it's better to code them; i've sketched a rough skeleton below, and before we fill in the forward pass properly there are a few things to take care of with our model's raw outputs, the logits, which we looked up earlier.
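as a rough skeleton only, using the model, loss function and optimizer from above (we'll flesh out the classification-specific details, like handling the logits and tracking accuracy, over the next few videos, and the 100 epochs here is just a placeholder):

```python
# a rough skeleton of the training loop steps
epochs = 100

for epoch in range(epochs):
    model_0.train()

    # 1. forward pass (the raw outputs of the model are logits)
    y_logits = model_0(X_train.to(device)).squeeze()

    # 2. calculate the loss (BCEWithLogitsLoss works on raw logits)
    loss = loss_fn(y_logits, y_train.to(device))

    # 3. optimizer zero grad
    optimizer.zero_grad()

    # 4. loss backward (backpropagation)
    loss.backward()

    # 5. optimizer step (gradient descent)
    optimizer.step()
```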
so let's get a bit of clarity about what we're doing: going from raw logits to prediction probabilities to prediction labels. that's what we want, because to truly evaluate our model we need its outputs in the same format as our labels. our model outputs raw logits: that's the definition of a logit in machine learning and deep learning, you might read a few other definitions, but for us the raw outputs of model_0, whatever comes out of the forward pass, are referred to as logits. we can convert those logits into prediction probabilities by passing them to some kind of activation function, for example sigmoid for binary classification (binary cross entropy) and softmax for multi-class classification, which we'll see later on. then we can convert the prediction probabilities into prediction labels by either rounding them (for binary classification) or taking the argmax (for the outputs of the softmax activation function). so, logits to prediction probabilities to prediction labels; let's see it in action. first, let's view the first five outputs of the forward pass on the test data, remembering that our model is still instantiated with random values: y_logits equals model_0 on X_test.to(device), because our model is currently on the cuda device and we need our test data on the same target device (that's why we write device agnostic code, so this works whether or not a gpu is available). looking at the logits, we've got some values, they're on the cuda device, and they're tracking gradients; ideally we would have run torch.inference_mode() here, because we're making predictions, and the rule of thumb is that whenever you make predictions with your model you put it in eval mode and use inference mode (we just have to remember to turn it back to train mode when we want to train). with inference mode we get a very similar output, just without the gradients being tracked. so these are the logits, the raw outputs of our model without being passed to any activation function.
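viewing the first five logits, with inference mode on:

```python
# view the first 5 outputs of the forward pass on the test data (the raw logits)
model_0.eval()
with torch.inference_mode():
    y_logits = model_0(X_test.to(device))[:5]
y_logits
```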
behind the scenes it's performing this mathematical operation: the output y equals the input x multiplied by a transposed weight tensor A (this could really be called W), so it's doing a dot product, plus a bias term, i.e. y = xAᵀ + b. and if we jump into our model's state_dict we've got a weight and we've got a bias, so that's the formula that's happening in each of these two layers. it will be different depending on the layer that we choose, but for now we're sticking with nn.Linear. and so the raw output of our data going through our two layers, the logits, is this information here. however, it's not in the same format as our test labels, and if we want to measure how well our model is performing we need to compare apples to apples, so we need this in the same format as the labels, which it's not, of course. so we need the next step: let's use the sigmoid activation function on our model's logits. why sigmoid? well, recall that in a binary classification architecture the output activation is the sigmoid function. so let's jump back in here and create some pred probs, which stands for prediction probabilities, from our model's logits: y_pred_probs = torch.sigmoid(y_logits). now let's have a look at y_pred_probs, what do we get? well, we still get numbers on a page, goodness gracious me, but the important point is that they've now gone through the sigmoid activation function, which means we can pass them to torch.round. let's have a look: torch.round(y_pred_probs), and what do we get? the same format as our labels. now you might be asking, why don't we just call torch.round on the raw logits? well, this step is required: we can't round the raw logits, we need the sigmoid activation function to turn them into prediction probabilities first. and what is a prediction probability? it's a value, usually between 0 and 1, for how likely our model thinks a sample is a certain class. and in the case of binary classification, let me write this out in text: for our prediction probability values we perform a range-style rounding on them, this is a decision boundary (it will make more sense when we visualize it): if y_pred_probs is equal to or greater than 0.5 we set y = 1, so class 1, whatever that is, a red dot or a blue dot, and if y_pred_probs is less than 0.5 we set y = 0, so class 0.
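to recap those steps in one place, here's a minimal sketch of the logits to prediction probabilities to prediction labels pipeline; the tiny nn.Sequential stand-in and the random X_test are just so the snippet runs on its own, they're not the course's actual model_0 and data:

```python
import torch
from torch import nn

# a small stand-in for model_0 so the sketch runs on its own:
# two linear layers, 2 input features -> 5 hidden units -> 1 output
model_0 = nn.Sequential(nn.Linear(2, 5), nn.Linear(5, 1))
X_test = torch.randn(5, 2)  # 5 samples with 2 features each

model_0.eval()
with torch.inference_mode():
    y_logits = model_0(X_test).squeeze()  # raw outputs of the model = logits

y_pred_probs = torch.sigmoid(y_logits)  # logits -> prediction probabilities (between 0 and 1)
y_preds = torch.round(y_pred_probs)     # probabilities -> labels: >= 0.5 becomes 1, < 0.5 becomes 0
print(y_logits, y_pred_probs, y_preds, sep="\n")
```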
you can also adjust this decision boundary: say you wanted to increase the threshold so anything over 0.7 is a 1 and anything below is a 0, you could, but most commonly you'll find it split at 0.5. so let's keep going and see this in action. how about we recode this: find the predicted labels, because that's what we want. when we're evaluating our model we want to convert the outputs of our model (the raw outputs of our model are the logits) into prediction probabilities using the sigmoid function, and then into predicted labels. so: raw logits, the output of our model, then prediction probabilities, after passing them through an activation function, then prediction labels. those are the steps we want to take with the outputs of our model. so, find the predicted labels. let's go in here, a little bit different to our regression problem previously, but nothing we can't handle: y_preds = torch.round(y_pred_probs). i like to name them y_pred_probs for prediction probabilities and y_preds for prediction labels. now let's write it in full if we wanted to: y_pred_labels = torch.round(torch.sigmoid(model_0(X_test.to(device)))), with sigmoid being the activation function for binary classification. truly, this should be within inference mode code, but for now we'll leave it like this to have a single example of what's going on. i just need to index here because we only want the first five examples. then check for equality: print(torch.equal(y_preds.squeeze(), y_pred_labels)), since we're doing the exact same thing two ways, and we need squeeze here to get rid of the extra dimension that comes out (you can try doing this without squeeze). a fair bit of code there, but this is what's happened: we created y_preds by turning y_pred_probs into labels, and then we did the full set of steps over again in one line. so we make predictions with our model and get the raw logits, we turn the logits into prediction probabilities using torch.sigmoid, and we turn the prediction probabilities into prediction labels using torch.round, which fulfils the criteria above: everything at 0.5 or above becomes a one and everything below 0.5 becomes a zero, because that's what torch.round does. the predictions right now are going to be quite terrible, because our model is using random numbers, but y_preds found with the steps above is the same as y_pred_labels found in one hit, thanks to this equality check. and we have out here some labels that look like our actual y_test labels, they're in the same format, but of course they're not the same values, because this model is using random weights to make predictions. so we've done a fair few steps here, but i believe we are now in the right place to start building a training and test loop, so let's write that down: 3.2 building a training and testing loop. you might want to have a go at this yourself; we've got all the steps we need to do the forward pass, and the reason we've done this step here, the logits then the pred probs then the pred labels, is because of what the inputs to our loss function up here need to be.
so what does nn.BCEWithLogitsLoss require? well, we're going to see that in the next video, but i'd encourage you to have a go at implementing these steps yourself. remember the jingle: for an epoch in a range, do the forward pass, calculate the loss (which is BCEWithLogitsLoss), optimizer zero grad, loss backward, optimizer step step step. let's do that together in the next video. welcome back. in the last few videos we've been working through creating a model for a classification problem, and now we're up to training the model, and we've got some steps here. we started off by discussing the concept of logits: logits are the raw output of the model, whatever comes out of the forward functions of the layers in our model. then we discussed how we can turn those logits into prediction probabilities using an activation function, such as sigmoid for binary classification and softmax for multi-class classification (we haven't seen softmax yet, and we're going to stick with sigmoid for now because we have binary classification data). and then we can convert those prediction probabilities to prediction labels, because remember, when we want to evaluate our model we want to compare apples to apples: we want our model's predictions to be in the same format as our test labels. i took a little break after the previous video, so my colab notebook has once again disconnected, so i'm going to run all of the cells before here and it's going to reconnect. we should still have a gpu present, and that's a good thing about google colab: if you change the runtime type to gpu, it saves that wherever it saves the colab notebook, so when you restart it should still have a gpu present. and how can we check that? of course, we can type in device and run that cell, and we can also check nvidia-smi, which will tell us if we have an nvidia gpu with cuda enabled ready to go. so what's our device? cuda, wonderful, and nvidia-smi: excellent, i have a tesla p100 gpu ready to go. with that being said, let's start to write a training loop. we've done this before and we've got the steps up here, do the forward pass, calculate the loss, we've spent enough time on this, so we're just going to jump in and write code. there is a little tidbit in this one, though, but we'll conquer that when we get to it. i'm going to set a manual seed, torch.manual_seed, and i'm going to use my favorite number, 42.
this is just to ensure reproducibility as far as possible. i also want you to be aware that there is another form of random seed: a cuda manual seed. do we have that in the pytorch reproducibility docs? torch.cuda.manual_seed, there is a cuda seed somewhere, let's try and find it. i think pytorch have just had an upgrade to their documentation. yeah, there we go, i knew it was there: torch.cuda.manual_seed, so if we're using cuda we have a cuda manual seed as well. so let's add torch.cuda.manual_seed(42). we don't necessarily have to set these, it's just to try and get numbers that are as reproducible as possible on your screen and my screen; what's more important is not the numbers lining up exactly between our screens, it's the direction they're going in. so let's set the number of epochs, we're going to train for 100 epochs, epochs = 100. and as you might have guessed, the cuda manual seed is for when you're doing operations on a cuda device, which in our case we are, so perhaps we want those to be as reproducible as possible too. speaking of cuda devices, let's put the data on the target device, because we're writing device-agnostic code here. so i'm going to write X_train, y_train = X_train.to(device), y_train.to(device), that takes care of the training data, and i'll do the same for the testing data, X_test, y_test = X_test.to(device), y_test.to(device), because if we're going to run our model on the cuda device we want our data to be there too, and the way we're writing our code, it's device-agnostic (have i said that enough yet?). so let's also build our training and evaluation loop. because we've covered the steps in here before, we're going to start working a little bit faster through here, and don't worry, i think you're up to it. so, for epoch in range(epochs), what do we do? we start with training, so let me write this: training, model_0.train(), that's the model we're working with, and we call train, which is the default, but we're going to do it anyway. and as you might have guessed, the code that we're writing here can be functionized, and we're going to do that later on, but for now, for the next couple of videos, the next module or two, we're going to keep writing out the training loop in full. so this is the forward pass, and here's where there's a little bit of a tidbit compared to what we've done previously, because we're outputting raw logits if we just pass our data straight to the model: y_logits = model_0(X_train).squeeze(), and we're squeezing here to get rid of an extra dimension (you can try to see what the output sizes look like without squeeze; remember, squeeze removes an extra 1 dimension from a tensor). and then to convert the logits into prediction labels we go y_pred = torch.round(torch.sigmoid(y_logits)), because torch.sigmoid is an activation function which converts logits into, what? it converts the logits into prediction probabilities. i'm just going to put a note here: this turns logits into pred probs into pred labels. so we've done the forward pass, that's the little tidbit there; we could have done all of this in one step, but i'll show you why we broke it apart. now we're going to calculate the loss and accuracy. we don't necessarily have to calculate the accuracy, but we did make an accuracy function up here
so that we can calculate the accuracy during training. we could just stick with only calculating the loss, but sometimes it's cool to visualize different metrics, the loss plus a few others, while your model is training, so let's write some code to do that. we'll start off with loss = loss_fn(y_logits, ...), and here's the difference from what we've done before: previously, in notebook 01 (we're up to 02 now), we passed in the predictions right here. but what's our loss function? let's have a look, let's just call it and see what it returns: BCEWithLogitsLoss. so BCEWithLogitsLoss expects logits as input. whereas, as you might have guessed, a loss function without logits, if we had nn.BCELoss (notice how it hasn't got "with logits"), and we called loss_fn (fn stands for function, by the way) without logits, what do we get? BCELoss, and this loss expects prediction probabilities as input. so let's write some code to differentiate between these two. as i said, we're going to be sticking with BCEWithLogitsLoss, and why is that? because if we look up torch BCEWithLogitsLoss, the documentation states that it's more numerically stable: this loss combines a sigmoid layer and the bce loss into one single class. so let's come back here and keep writing some code. the accuracy is going to be acc = accuracy_fn(y_true=y_train, y_pred=y_pred) for the training data, so this will be the training accuracy. there's a little bit of a difference here: this is our own custom accuracy function that we wrote ourselves, which is a testament to the pythonic nature of pytorch; we've just got a pure python function that we've slotted into our training loop, which is essentially what the loss function is behind the scenes anyway. now let's write a note here: nn.BCEWithLogitsLoss expects raw logits, the raw output of our model, as input. now what if we were using BCELoss on its own here? well, let's just write some code for that: we'd call loss_fn(torch.sigmoid(y_logits), y_train). why would we pass torch.sigmoid of the logits here? because, remember, calling torch.sigmoid on our logits turns them into prediction probabilities, and BCELoss expects prediction probabilities as input. so that's the difference: our loss function, BCEWithLogitsLoss, requires logits as input, whereas if we did straight-up BCELoss we'd need to call torch.sigmoid on the logits first. i'm going to comment that line out, because our loss function is BCEWithLogitsLoss, but just keep it in mind if for some reason you stumble across some pytorch code that's using BCELoss rather than BCEWithLogitsLoss and you see torch.sigmoid being called here, or you come across errors because the inputs to your loss function are not what it expects. with that being said, we can keep going with our other steps. we're up to optimizer zero grad, so optimizer.zero_grad(), this is step 3 by the way. and what's after this? once we've zeroed the gradients of the optimizer we do number 4, which is loss.backward(), and then what's next? optimizer step step step, so optimizer.step(), and yes, i'm singing the unofficial pytorch optimization loop song there. loss.backward is backpropagation, it calculates the gradients with respect to all of the parameters in the model, and optimizer.step updates the parameters to reduce the loss, so gradient descent, hence the descent.
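to make the difference between the two loss functions above concrete, here's a small sketch; the tensors are made up purely for illustration:

```python
import torch
from torch import nn

torch.manual_seed(42)
y_logits = torch.randn(5)                    # pretend raw model outputs (logits)
y_train = torch.randint(0, 2, (5,)).float()  # pretend binary labels (0. or 1.)

# nn.BCEWithLogitsLoss expects raw logits (sigmoid is applied internally,
# which the docs note is more numerically stable)
loss_with_logits = nn.BCEWithLogitsLoss()(y_logits, y_train)

# nn.BCELoss expects prediction probabilities, so sigmoid must be called first
loss_without_logits = nn.BCELoss()(torch.sigmoid(y_logits), y_train)

print(loss_with_logits, loss_without_logits)  # same value (up to floating point error)
```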
now, if we want to do testing, well, we know what to do here too. we call model_0.eval() when we're testing, because when we test we make predictions on the test dataset using the patterns our model has learned on the training dataset, and we turn on inference mode because we're doing inference. then we do the forward pass, and of course we compute the test logits, because logits are the raw output of our model with no modifications: test_logits = model_0(X_test).squeeze(), getting rid of an extra 1 dimension there. then we create the test predictions, which is a similar calculation to what we've done above: test_pred = torch.round(torch.sigmoid(test_logits)); for our binary classification we want prediction probabilities, which we create by calling the sigmoid function on the test logits, and then prediction probabilities that are 0.5 or above go to label 1 and prediction probabilities under 0.5 go to label 0. step 2 is to calculate the test loss and accuracy. how do we do this? just as we've done before: test_loss = loss_fn(test_logits, y_test), because our loss function, BCEWithLogitsLoss, expects logits as input (we found that out in the documentation, of course), so we compare the test logits to the y_test labels. and then for the test accuracy, test_acc = accuracy_fn(y_true=y_test, y_pred=test_pred). now you might be thinking, why did i switch up the order of the arguments here? and by the way, this is important to know: with these loss functions the order of the parameters matters, predictions come first and then the true labels for our loss function. you might be wondering why i've done it the reverse for our accuracy function, y_true then y_pred. that's just because i like to be confusing... well, not really, it's because i base a lot of my code structure off how scikit-learn structures things, and scikit-learn's metrics accuracy_score goes y_true, y_pred, so i've based our evaluation metric function off that order, because the scikit-learn metrics package is very helpful. pytorch's loss functions do it in the reverse order, and it's important to get these in the right order; exactly why they do it that way, i couldn't tell you. and we've got one final step, which is to print out what's happening. we're doing a lot of epochs here, 100 epochs, so we'll check epoch % 10 to print out every 10th epoch. and we have a couple of different metrics we can print out this time: the epoch number, then the loss to 5 decimal places (this is the training loss), then the accuracy (the training accuracy; we could name the variable train_acc to make it a bit more understandable, but we're just going to leave it as loss and accuracy for now, because the test versions have test at the front), then the test loss, also to 5 decimal places, and then the test accuracy, test_acc, to 2 decimal places, and because it's accuracy we want a percentage sign: this is the percent out of 100 guesses that our model gets right on the training data and the testing data, as long as we've coded all the functions correctly.
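here's the whole loop in one place as a hedged sketch; the random data, the nn.Sequential stand-in for model_0 and the accuracy_fn below are fill-ins so it runs on its own, the real versions were built earlier in the section:

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
torch.manual_seed(42)

# stand-ins for the course's circle data and two-layer model_0
X_train, y_train = torch.randn(800, 2), torch.randint(0, 2, (800,)).float()
X_test, y_test = torch.randn(200, 2), torch.randint(0, 2, (200,)).float()
model_0 = nn.Sequential(nn.Linear(2, 5), nn.Linear(5, 1)).to(device)

loss_fn = nn.BCEWithLogitsLoss()  # expects raw logits
optimizer = torch.optim.SGD(params=model_0.parameters(), lr=0.1)

def accuracy_fn(y_true, y_pred):
    """Percentage of predictions that match the labels (like sklearn's accuracy_score * 100)."""
    return (y_true == y_pred).sum().item() / len(y_pred) * 100

epochs = 100
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)

for epoch in range(epochs):
    ### training
    model_0.train()
    y_logits = model_0(X_train).squeeze()          # 1. forward pass -> raw logits
    y_pred = torch.round(torch.sigmoid(y_logits))  # logits -> pred probs -> pred labels
    loss = loss_fn(y_logits, y_train)              # 2. calculate the loss (on logits!)
    acc = accuracy_fn(y_true=y_train, y_pred=y_pred)
    optimizer.zero_grad()                          # 3. zero the optimizer's gradients
    loss.backward()                                # 4. backpropagation
    optimizer.step()                               # 5. gradient descent

    ### testing
    model_0.eval()
    with torch.inference_mode():
        test_logits = model_0(X_test).squeeze()
        test_pred = torch.round(torch.sigmoid(test_logits))
        test_loss = loss_fn(test_logits, y_test)
        test_acc = accuracy_fn(y_true=y_test, y_pred=test_pred)

    if epoch % 10 == 0:
        print(f"Epoch: {epoch} | Loss: {loss:.5f}, Acc: {acc:.2f}% | "
              f"Test loss: {test_loss:.5f}, Test acc: {test_acc:.2f}%")
```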
now we've got a fair few steps here. my challenge to you is to run this, and if there are any errors, try to fix them; no doubt there's probably one or two, or maybe more, that we're going to have to fix in the next video. but speaking of next videos, i'll see you there, let's train our first classification model, oh this is very exciting, i'll see you soon. welcome back. in the last video we wrote a mammoth amount of code, but nothing we can't handle, we've been through a lot of these steps. we did have to talk about a few tidbits between using different loss functions, namely BCELoss, which is binary cross entropy loss, and BCEWithLogitsLoss. we discussed that BCELoss in pytorch expects prediction probabilities as input, so we have to convert our model's logits (logits are the raw output of the model) to prediction probabilities using the torch.sigmoid activation function, and if we're using BCEWithLogitsLoss it expects raw logits as input, as the name hints, so we just pass it the raw logits straight away, whereas our own custom accuracy function compares labels to labels. and that's what we've been stepping through over the last few videos: going from logits to pred probs to pred labels, because the ideal output of our model is some kind of label that we as humans can interpret. so let's keep pushing forward. you may have already tried to run this training loop, i don't know if it works, we wrote all this code together in the last video and there's probably an error somewhere. so, are you ready? we're going to train our first classification model together for 100 epochs, if it all goes to plan, in three, two, one, let's run. oh my gosh, it actually worked the first time, i promise you i didn't change anything in here from the last video. so let's inspect what's going on. it trains pretty fast. why? well, because we're using a gpu, so it's accelerated as much as it can be, and our dataset is quite small and our network is quite small, so you won't always get networks training this fast, but it did 100 epochs in about a second. so the loss, oh, 0.69973, it doesn't go down very much, and the accuracy even starts high and then goes down. what's going on here? our model doesn't seem to be learning anything. so what would an ideal accuracy be? an ideal accuracy is 100, and what's an ideal loss value? well, zero, because lower is better for loss. hmm, this is confusing. now, if we go have a look at our blue and red dots, where's our data? i reckon we still have a dataframe here. how many samples do we have of each? let's inspect, let's do some data analysis. where did we create the dataframe? circles, do we still have this instantiated? circles.label, and we're going to call on pandas here: value_counts, is this going to output how many of each? okay, wow, we've got 500 of class 1 and 500 of class 0, so we have 500 red dots and 500 blue dots, which means we have a balanced dataset.
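the class-balance check itself looks something like this; the make_circles arguments below are a guess at the ones used earlier in the section, so treat the exact call as illustrative:

```python
import pandas as pd
from sklearn.datasets import make_circles

# recreate a circles DataFrame like the one made earlier in the section
X, y = make_circles(n_samples=1000, noise=0.03, random_state=42)
circles = pd.DataFrame({"X1": X[:, 0], "X2": X[:, 1], "label": y})

print(circles.label.value_counts())  # expect 500 of class 1 and 500 of class 0 -> balanced
```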
so we're basically trying to predict heads or tails here, and if we're getting an accuracy of under fifty percent, or about fifty percent if you round it up, our model is basically doing as well as guessing. what gives? well, i think we should get visual with this, so let's make some predictions with our model, because these are just numbers on a page and it's hard to interpret what's going on. our intuition now is: we have 500 samples of each, or in the case of the training dataset 400 of each, because we have 800 samples in the training dataset, and in the testing dataset we have 200 total samples, so 100 of each. we're basically doing a coin flip here, our model is as good as guessing, so it's time to investigate why our model is not learning, and one of the ways we can do that is by visualizing our predictions. so let's write down here: from the metrics it looks like our model isn't learning anything, so to inspect it, let's make some predictions and make them visual, in other words, visualize, visualize, visualize. all right, so we've trained a model, and we've at least got the structure of the training code right; we've written this code before, so you know this setup for training code does allow a model to train. so there must be something wrong with either how we've built our model or the dataset, but let's keep going and investigate together. to do so, i've got a function that i've pre-built earlier. did i mention that we're learning side by side in a machine learning cooking show? so this is an ingredient i prepared earlier, a part of a dish. we're going to import a function called plot_decision_boundary (maybe i'll turn this into code font), welcome to the cooking show, cooking with machine learning, what model will we cook up today? so if we go to pytorch deep learning, it's already over here, this is the home repo for the course, the link for this will be scattered everywhere, and there's a little file here called helper_functions.py, which i'm going to fill up with helper functions throughout the course, and this is the one i'm talking about: plot_decision_boundary. now, we could just copy this into our notebook, or i'm going to write some code to import it programmatically so we can use other functions from in here too. here's our plot_predictions function that we made in the last section, 01, but this plot_decision_boundary is a function i was inspired to create by madewithml.com. this is another resource, a little bit of an aside, that i highly recommend going through, by goku mohandas: it gives you the foundations of neural networks and also mlops, which is a field based on getting your neural networks and machine learning models into applications that other people can use. i can't recommend this resource enough, so please check it out if you want another resource for machine learning. but this is where this helper function came from, so thank you goku; i've made a few modifications for this course, but not too many. so we could either copy that and paste it in here, or we could write some code to import it for us magically, using the power of the internet, right? because that's what we are, we're programmers, we're machine learning engineers, we're data scientists. so, from pathlib... the requests module in python is a module that allows you to make requests, and a request is like going to a website: hey, i'd like to get this code or this information from you, can you please send it to me? so that's what that allows us to do, and powerfully. and we've seen pathlib before, it allows us to create file paths, and why do we need that? because we want to save this helper_functions.py script to our google colab files, and we can do this with a little bit of code: download helper functions from the learn pytorch repo if it's not already downloaded. so let's see how we can do that.
we're going to write some if-else code to check whether the path helper_functions.py already exists, because if it does we don't want to download it again. at the moment it doesn't exist, so this if statement is going to return false, but let's print out what it does if it returns true: "helper_functions.py already exists". we could probably even do a try/except here, but if-else will help us out for now. so if it exists, print that, else print "downloading helper_functions.py". ours doesn't exist, so it's going to make a request. let's set up our request: requests.get(), and here's where we put in a url, but we need the raw version of it. so if we go back, this is just pytorch-deep-learning, the repo for this course, slash helper_functions.py, and if i click raw, i'm going to copy that url and paste it into requests.get (it has to be in string format). so we get the raw url, and then we go: with open("helper_functions.py", "wb") as f, we're opening a file called helper_functions.py and setting the context to write binary, which is wb, and f is a common short name for file, because we're going to call f.write(request.content). so this code is basically saying: hey requests, get the information that's at this link, which is of course all of this code here, a python script, then create a file called helper_functions.py with write permissions, name it f (short for file), and write the content of the request into it. but instead of talking through it, how about we see it in action? we'll know it works if we can run from helper_functions import plot_predictions, plot_decision_boundary (we're going to use plot_predictions later on as well, we wrote it in the last section). i'm going to run it: "downloading helper_functions.py", did it work? we have helper_functions.py, look at that, we've done it programmatically. can we view this in google chrome? oh my goodness, yes we can, look at that. this file may evolve by the time you do the course, but these are just some general helper functions, and rather than writing all of this out, if you'd like to know what's going on in plot_decision_boundary, i encourage you to read through it and step through it yourself; there is nothing here you can't tackle, it's all just python code, no secrets: we're making predictions with a pytorch model and then testing whether it's a multi-class or binary problem. but now the ultimate test is whether the plot_decision_boundary function works. again, we could discuss what plot_decision_boundary does behind the scenes till the cows come home, but we're going to see it in real life here, i like to get visual. so plt.figure(figsize=(12, 6)), we're going to create a plot here because we're adhering to the data explorer's motto of visualize, visualize, visualize, and we want a subplot because we're going to compare our training and test sets. plt.subplot, then plt.title("Train"), and then we'll plot the first one: plot_decision_boundary, and because we're doing the training plot here, we pass in model_0, X_train and y_train. now, this is the order that the parameters go in: if we press command shift space, google colab (if it's working with me) will put up a docstring, there we go, plot_decision_boundary, look at the inputs it takes: a model, which is a torch.nn.Module, an x which is a torch tensor, and a y which is also a torch tensor. so that's for the training data.
now let's do the same for the testing data: plt.subplot, this is going to be 1, 2, 2, that's the number of rows of the plot, the number of columns, and the index, so the first plot appears in the first slot and anything below this code will appear in the second slot. then plt.title("Test"), and then we call plot_decision_boundary again, on the test data. if this works, this is going to be some serious magic, i love visualization functions in machine learning. okay, are you ready? three, two, one, let's check it out. how's our model doing? oh, look at that, now it's clear. so behind the scenes these are the plots that plot_decision_boundary is making: this is the training data, this is the testing data, not as many data points here but the same sort of line. so this is the line that our model is trying to draw through the data. no wonder it's getting about 50 percent accuracy and the loss isn't going down: it's just trying to split the data straight through the middle. it's drawing a straight line, but our data is circular. why do you think it's drawing a straight line? do you think it has anything to do with the fact that our model is made using pure linear layers? let's go back to our model, what's it comprised of? just a couple of linear layers. and what's a linear line? if we look that up, is this going to work with me? there we go: all straight lines. so i want you to have a think about this, and even if you're completely new to deep learning you can answer this question: can we ever separate this circular data with straight lines? i mean, maybe we could if we drew straight lines here and there and tried to curve them around, but there's an easier way, and we're going to see that later on. for now, how about we try to improve our model? the model that we built trained for a hundred epochs, so i wonder if it will improve if we train it for longer. that's a little bit of a challenge before the next video: see if you can train the model for a thousand epochs. does that improve the results here? and if it doesn't, have a think about why that might be. i'll see you in the next video. welcome back. in the last video we wrote some code to download a series of helper functions from our helper_functions.py, and later on you'll see why this is quite standard practice: as you write more and more code, you write it once, store it somewhere such as a python script like this, and then instead of rewriting everything, you just import it and use it later on. this is similar to what we've been doing with pytorch: pytorch is essentially just a collection of python scripts that we're using to build neural networks, well, a lot more than what we've just done, i mean we've got one script here and pytorch is a collection of probably hundreds of different python scripts, but that's beside the point. we're trying to train a model here to separate blue and red dots, but our current model is only drawing straight lines, and i got you to have a think about whether our straight-line model, our linear model, could ever separate this data, maybe it could, and i issued the challenge to see if training for a thousand epochs improved anything. is the accuracy any higher?
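pulling the last two videos together, here's a hedged sketch of the download-if-missing pattern plus the train/test decision boundary plot; it assumes the trained model_0 and the X_train/y_train/X_test/y_test tensors from this section, and the raw github url should be double-checked against the course repo:

```python
import requests
from pathlib import Path
import matplotlib.pyplot as plt

# download helper_functions.py from the course repo if it isn't already here
if Path("helper_functions.py").is_file():
    print("helper_functions.py already exists, skipping download")
else:
    print("Downloading helper_functions.py")
    url = "https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/helper_functions.py"
    request = requests.get(url)
    with open("helper_functions.py", "wb") as f:
        f.write(request.content)

from helper_functions import plot_predictions, plot_decision_boundary

# compare the decision boundary on the training and test sets
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title("Train")
plot_decision_boundary(model_0, X_train, y_train)
plt.subplot(1, 2, 2)
plt.title("Test")
plot_decision_boundary(model_0, X_test, y_test)
plt.show()
```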
well, speaking of training for more epochs, we're up to section number five: improving a model, and this is from a model perspective. so let's discuss some ways, if you've trained a machine learning or deep learning model, whatever kind of model you're working with, and you weren't happy with the results, how could you go about improving them? this is going to be a bit of an overview of what we're going to get into. one way is to add more layers, to give the model more chances to learn about patterns in the data. why would that help? because our model currently has two layers, model_0.state_dict() shows however many numbers here, 20 or so, this is the zeroth layer, this is the first layer; if we had 10 of these, well, we'd have many times the number of parameters to try and learn the patterns in this data, a representation of this data. another way is to add more hidden units: the model we created has five hidden units in each of these layers, the first one has out_features=5 and the next one takes in_features=5, so we could go from five hidden units to ten hidden units. the same principle as above applies: the more parameters our model has to represent our data, the better it can potentially do. and i say potentially here because some of these things might not necessarily work: our dataset is quite simple, so if we added too many layers our model might be trying to learn things that are too complex, adjusting too many numbers for the dataset that we have, and the same goes for more hidden units. what other options do we have? well, we could fit for longer, giving the model more of a chance to learn, because every epoch is one pass through the data, so maybe a hundred looks at this dataset wasn't enough, and maybe you could fit for a thousand, which was the challenge. then there's changing the activation functions: we're using sigmoid at the moment, which is generally the activation function you use on the output of a binary classification problem, but there are also activation functions you can put within your model, hmm, there's a little hint, we'll get to that later. then there's changing the learning rate: the learning rate is the amount the optimizer adjusts the parameters by with each update, and if it's too small our model might not learn anything because it takes forever to change the numbers, but on the other side, if the learning rate is too high, the updates might be too large and our model might just explode (there's an actual problem in machine learning called the exploding gradient problem, where the numbers get too large, and on the other side there's the vanishing gradient problem, where the gradients basically go to zero too quickly). and then there's changing the loss function, but i feel like for now sigmoid and binary cross entropy are pretty good, pretty standard. so we're going to have a look at some of these options: add more layers, fit for longer, maybe change the learning rate. but let's add a little bit of color to what we've been talking about. right now we've fit the model to the data and made a prediction, so i'm just going to step through the workflow: where are we up to? we've done this, we've done these two, we've built a training loop, we've fit the model to the data, made a prediction, we've evaluated our model visually, and we're not happy with it, so we're up to number five: improve through experimentation. we don't need to use tensorboard just yet.
we'll just talk about it at a high level: tensorboard is a tool, a utility from pytorch, that helps you monitor experiments, and we'll see it later on. and then we'll get to saving: we won't save our model until we've got one that we're happy with. so if we look at what we've just talked about, improving a model from a model's perspective, let's go over the options again with some color this time. say we've got a model here; it isn't the exact model that we're working with, but it's a similar structure: we've got one, two, three, four layers, we've got a loss function, BCEWithLogitsLoss, we've got an optimizer, stochastic gradient descent, and if we wrote some training code, this is ten epochs (the testing code i've cut out because it wouldn't fit on the slide). then, if we wanted to go to a larger model, let's add some color so we can highlight what's happening. adding layers: this one's got one, two, three, four, five, six layers. and another color here, a greeny blue: increase the number of hidden units, so the hidden units are these features here, we've gone from 100 to 128 to 128 (remember, the out features of a previous layer have to line up with the in features of the next layer) and then to 256; and remember how i said multiples of 8 are generally pretty good in deep learning? well, this is where those numbers come from. and what else do we have? change or add activation functions: we haven't seen this before, nn.ReLU, and if you want to jump ahead and have a look at what nn.ReLU is, how would you find out about it? well, i'd just google nn.ReLU, but we're going to have a look at it later on. we can see that the smaller model has one activation function, but this larger model has some relus scattered between the linear layers, hmm, maybe that's a hint: what happens if we combine a linear layer with a relu? i'm not going to spoil this, we'll find out later on. change the optimization function: we've got sgd, and do you recall how i said adam is another popular one that works fairly well across a lot of problems? so adam might be a better option for us here. the learning rate as well: maybe this learning rate was a little too high, so we've divided it by 10. and then finally, fitting for longer: instead of 10 epochs we've gone to a hundred. so how about we try to implement some of these with our own model, to see if it improves what we've got going on here? because frankly, this isn't satisfactory: we're trying to build a neural network here, neural networks are supposed to be these models that can learn almost anything, and we can't even separate some blue dots from some red dots. so in the next video, how about we run through writing some code to do some of these steps? in fact, if you want to try it yourself, i'd highly encourage that: i'd start with adding some more layers, adding some more hidden units, and fitting for longer, and you can keep all of the other settings the same for now. but i'll see you in the next video. welcome back. in the last video we discussed some options to improve our model from a model perspective, and namely we're trying to improve it so that the predictions are better, so that the patterns it learns better represent the data, so we can separate blue dots from red dots. and you might be wondering why we said from a model perspective here, so let me just write these down: these options are all from a model's perspective because they deal directly with the model rather than the data.
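to put some rough code around that slide, here's a hedged sketch contrasting a smaller setup with a larger one; the layer sizes, optimizer choice and learning rate are illustrative examples of the knobs just described, not settings the course actually lands on:

```python
import torch
from torch import nn

# smaller setup: 2 linear layers, 5 hidden units, SGD, lr=0.1, 100 epochs
small_model = nn.Sequential(
    nn.Linear(in_features=2, out_features=5),
    nn.Linear(in_features=5, out_features=1),
)
small_optimizer = torch.optim.SGD(small_model.parameters(), lr=0.1)
small_epochs = 100

# larger setup: more layers, more hidden units, ReLU activations between the
# linear layers, Adam instead of SGD, a 10x smaller learning rate, more epochs
large_model = nn.Sequential(
    nn.Linear(in_features=2, out_features=128),
    nn.ReLU(),
    nn.Linear(in_features=128, out_features=256),
    nn.ReLU(),
    nn.Linear(in_features=256, out_features=1),
)
large_optimizer = torch.optim.Adam(large_model.parameters(), lr=0.01)
large_epochs = 1000
```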
so there's another way to improve a model's results, if the model itself was already sound: in machine learning and deep learning you may be aware that, generally, if you have more data samples, the model learns or gets better results, because it has more opportunity to learn. there are a few other ways to improve a model from a data perspective, but we're going to focus on improving a model from a model's perspective. and because these options are all values that we as machine learning engineers and data scientists can change, they are referred to as hyperparameters. there's an important distinction here: parameters are the numbers within a model, values like the weights and biases, the values a model updates by itself; hyperparameters, such as adding more layers, more hidden units, fitting for longer (the number of epochs), activation functions, learning rate and loss functions, are values that we can change. so let's change some of the hyperparameters of our model. we'll create CircleModelV1, and we're going to inherit from nn.Module as well. we could write this model using nn.Sequential, because as you'll see our model is not too complicated, but we subclass nn.Module for practice. in fact, nn.Sequential is itself a kind of nn.Module, but we subclass nn.Module here, for one, for practice, and also for later on, if we (or you) wanted to make more complex models; you're going to see subclasses of nn.Module a lot in the wild. so the first change we're going to make is the number of hidden units, the out_features, and i might write this down before we do it: let's try and improve our model by adding more hidden units, going from 5 up to 10; we also want to increase the number of layers, going from 2 to 3, adding an extra layer; and then increase the number of epochs, from 100 to a thousand. now, putting our scientist hats on for a second, what would be the problem with the way we're running this experiment, doing all three things in one hit? why might that be problematic? well, because we might not know which one offered the improvement, if there is any improvement, or degradation. so just keep in mind going forward, i'm doing this as an example of how we can change all of these, but generally when you're doing machine learning experiments you'd only like to change one value at a time and track the results; that's called experiment tracking in machine learning, and we're going to have a look at it a little later in the course, but just keep it in mind: a scientist likes to change one variable at a time so they can control what's happening. so we're going to create this next layer, layer_2, and of course its in features match the out features of the previous layer, whose in features are 2, because, why? our X_train (let's look at just the first five samples) has two features. and now we're going to create self.layer_3 = nn.Linear, and the in_features here is going to be 10, why? because the layer above has out_features=10.
so what we've changed here so far: the hidden units, which in v0 of this model were 5, and we've added a third layer, where previously there were two. those are two of our main changes. and out_features=1, because, why? let's have a look: our y is just one number. remember, the input and output shapes of a model are some of the most important things in deep learning; we're going to see different values for these shapes later on, but because we're working with this dataset, we're focused on two in features and one out feature. so now that we've got our layers prepared, what's next? well, we have to override the forward method, because every subclass of nn.Module has to implement a forward method. so what are we going to do here? let me show you one option: we could go z, as in z for logits (logits are actually often represented by z, fun fact), but you could use any variable name here, it could be x1, or you could reassign x if you wanted to, i just like using a different one because it's a little less confusing for me. so z = self.layer_1(x), then we update z with self.layer_2(z), because z above is the output of layer_1 and it now goes in here, and then z = self.layer_3(z), which takes z from above. so this is saying: hey, take x, put it through layer_1 and assign it to z, then override z with self.layer_2 applied to the previous z, then the output of layer_2 is the input to layer_3, and then we return z. that's just passing our data through each one of these layers. but a way you can leverage speedups in pytorch is to call them all at once: self.layer_3(self.layer_2(self.layer_1(x))), and this is generally how i'm going to write them, because behind the scenes, performing all the operations at once lets pytorch use whatever speedups it can. oh, this should be layer_1, it goes in order here, so what's happening? it computes the inside of the brackets first: x goes through layer_1, then the output of layer_1 goes into layer_2, and the same again for layer_3. so this way of writing operations leverages speedups where possible behind the scenes. and so we've done our forward method; we're just passing our data through layers, with extra hidden units and an extra layer overall. so now let's create an instance of CircleModelV1, which we're going to assign to model_1 = CircleModelV1().to(device), sending it to the target device because we like writing device-agnostic code, and then let's check out model_1, have a look at what's going on. beautiful, so now we have a three-layered model with more hidden units. i wonder, if we trained this model for longer, would we get improvements? so my challenge to you is, and we've already done these steps before, we're going to do them over the next couple of videos for completeness, but we need to: create a loss function (i'll give you a hint, it's very similar to the one we've already used), create an optimizer, and then write a training and evaluation loop for model_1. so give that a shot, otherwise i'll see you in the next video and we'll do it all together. welcome back. in the last video we subclassed nn.Module to create CircleModelV1, which is an upgrade on CircleModelV0 in that we added more hidden units, from five to ten, and a whole extra layer.
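for reference, here's CircleModelV1 as described above, written out as a sketch: three nn.Linear layers, 10 hidden units, and the nested forward pass.

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

class CircleModelV1(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer_1 = nn.Linear(in_features=2, out_features=10)   # 2 input features -> 10 hidden units
        self.layer_2 = nn.Linear(in_features=10, out_features=10)  # in features must match the previous
        self.layer_3 = nn.Linear(in_features=10, out_features=1)   # layer's out features

    def forward(self, x):
        # calling the layers in one nested expression can leverage speedups behind the scenes;
        # it's equivalent to z = self.layer_1(x); z = self.layer_2(z); return self.layer_3(z)
        return self.layer_3(self.layer_2(self.layer_1(x)))

model_1 = CircleModelV1().to(device)
print(model_1)
```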
and we've got an instance of it ready to go. so where are we up to in the workflow? we've got our data, well, we haven't changed the data; we've built our new model; we now need to pick a loss function, and i hinted before that we're going to use the same loss function as before and the same optimizer. you might have already done all of these steps, so you may know whether this model works on our dataset or not, but that's what we're going to find out in this video. so: we've built our new model, now let's pick a loss function and an optimizer (we could almost do all of this with our eyes closed now), build a training loop, fit the model to the data, make predictions and evaluate the model. and by the way, if you're wondering why adding more hidden units and an extra layer would improve our model, we've kind of hinted at this before: it's back to the fact that if we add more neurons, more hidden units, and more layers, it just gives our model more numbers to adjust. look at what's going on here, layer 1, layer 2, look how many more parameters we have compared to model_0.state_dict(): this is model_0, and we just upgraded it, look how many more we have from just adding an extra layer and more hidden units. so now our optimizer can change these values to hopefully create a better representation of the data we're trying to fit; we just have more opportunity to learn patterns in our target dataset. that's the theory behind it. so let's get rid of these and create a loss function. what are we going to use? well, nn.BCEWithLogitsLoss, and our optimizer is going to be the same as before, torch.optim.SGD, but we have to be aware that because we're using a new model, we pass in params=model_1.parameters(), these are the parameters we want to optimize, and the lr is going to be 0.1. is that the same lr we used before? learning rate 0.1... hmm, potentially our learning rate may be too big. where did we create our previous optimizer? we've written a lot of code here... there we go, 0.1, so we'll keep it at 0.1, just to keep as many things the same as possible. then we set torch.manual_seed(42) to make training as reproducible as possible, and torch.cuda.manual_seed(42). as i said before, don't worry too much if your numbers aren't exactly the same as mine, the direction is more important, whether it's heading in a good or bad direction. now let's set up the epochs: we want to train for longer this time, so a thousand epochs. this is one of our three improvements, adding more hidden units, increasing the number of layers, and increasing the number of epochs, so we're going to give our model a thousand looks at the data to try and improve its patterns. then put the data on the target device, we want to write device-agnostic code, and yes, we've already done this, but we're going to write it out again for practice, because even though we could functionize a lot of this, it's good while we're still in the foundation stages to practice what's going on here; i want you to be able to do this with your eyes closed before we start to functionize it. so let's put the training data and the testing data on the target device, whatever it is, cpu or gpu.
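a quick sketch of that setup before the loop itself, assuming model_1, the X/y tensors and device already exist from the cells above:

```python
import torch
from torch import nn

loss_fn = nn.BCEWithLogitsLoss()  # still expects raw logits as input
optimizer = torch.optim.SGD(params=model_1.parameters(), lr=0.1)  # same optimizer and lr as before

torch.manual_seed(42)
torch.cuda.manual_seed(42)

epochs = 1000  # train for longer this time

# device-agnostic data placement (model_1 is already on the target device)
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)
```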
and then, well, what's our song? for epoch in range(epochs), let's loop through the epochs. we're going to start off with training, and what do we do for training? we set model_1.train(), and then what's our first step? the forward pass. what are the outputs of the model? well, the raw outputs of a model are logits, so y_logits = model_1(X_train).squeeze(), squeezing so that we get rid of an extra 1 dimension (if you don't believe me that we'd like to get rid of that dimension, try running the code without the .squeeze()). and y_pred = torch.round(torch.sigmoid(y_logits)); why are we calling sigmoid on our logits? to go from logits to prediction probabilities to prediction labels. and then what do we do next? we calculate the loss and accuracy, step 2, and remember, accuracy is optional but the loss is not. so we're going to pass in here... i wonder if our loss function would work with just y_pred? i don't think it will, because we need logits in here, y_logits and y_train (ah, google colab is correcting the wrong thing), we use y_logits because we're using BCEWithLogitsLoss. so let's keep pushing forward: we want our accuracy now, which is our accuracy function, and we pass in the arguments in the reverse order of the loss function, which is a little confusing, but i've kept the evaluation function in the same order as scikit-learn: y_true=y_train, y_pred=y_pred. three, we're going to zero the gradients of the optimizer, optimizer.zero_grad(). and you might notice that we've started to pick up the pace a little; that's perfectly fine, if i'm typing too fast you can always slow down the video, or you could just watch what we're doing and then code it out yourself afterwards, the code resources will always be available, and the only reason we're going faster is because we've covered these steps before. four, loss.backward() to perform backpropagation. then optimizer.step(), and this is where the adjustments to all of our model's parameters take place, to hopefully create a better representation of the data. and then we've got testing. what's the first step we do in testing? well, we call model_1.eval() to put it in evaluation mode, and because we're making predictions we turn on torch.inference_mode() (i call them predictions, some other places call it inference, remember machine learning has a lot of different names for the same thing). forward pass: we create the test logits, test_logits = model_1(X_test).squeeze(), squeezing because we don't want the extra 1 dimension, and i'm just going to add some code cells here so we have more space and i'm typing in the middle of the screen. then test_pred: how do we get from logits to predictions? we go torch.round(torch.sigmoid(test_logits)); why sigmoid? because we're working with a binary classification problem, and to convert logits from a binary classification problem to prediction probabilities we use the sigmoid activation function. and then we calculate the loss, so how wrong is our model on the test data: test_loss = loss_fn(test_logits, y_test), passing in the test logits and then y_test for the ideal labels. and then we also calculate the test accuracy: test_acc = accuracy_fn(y_true=y_test, y_pred=test_pred), the test labels and the test predictions. and our final step is to print out what's happening.
print out what's happening so print out what's happening oh every tutorial needs a song if i could i'd teach everything with song song and dance so because we're training for a thousand epochs how about every 100 epochs we print out something so print f string and we're going to write epoch in here so we know what epoch our model's on and then we're going to print out the loss of course this is going to be the training loss because the test loss has test at the front of it and then the accuracy here now of course this is going to be the training accuracy we go here and then we're going to pipe and we're going to print out the test loss and we want the test loss here we're going to take this to five decimal places again when we see the printouts of the different values do not worry too much about the exact numbers on my screen appearing on your screen because that is inherent to the randomness of machine learning so have we got the direction is more important have we got we need a percentage sign here because that's going to be a bit more complete for accuracy have we got any errors here i don't know i've just we've just all coded this free hand right there's a lot of code going on here so we're about to train our next model which is the biggest model we've built so far in this course three layers 10 hidden units on each layer let's see what we've got three two one run oh what what a thousand epochs an extra hidden layer more hidden units and we still our model is still basically a coin toss 50 now this can't be for real let's plot the decision boundary plot the decision boundary to find out let's get a bit visual plot figure actually to prevent us from writing out all of the plot code let's just go up here and we'll copy this now you know i'm not the biggest fan of copying code but for this case we've already written it so there's nothing really new here to cover and we're going to just change this from model 0 to model 1 because why it's our new model that we just trained and so behind the scenes plot decision boundary is going to make predictions with the target model on the target data set and put it into a nice visual representation for us oh i said noise visual representation what does this look like we've just got a coin toss on our data set our model is just again it's trying to draw a straight line to separate circular data now why is this our model is based on linear is our data non-linear maybe i've revealed a few of my tricks i've done a couple of reveals over the past few videos but this is still quite annoying and it can be fairly annoying when you're training models and they're not working so how about we verify that this model can learn anything because right now it's just basically guessing for our data set so this model looks a lot like the model we built in section zero one let's go back to this this is the learn pytorch.io book pytorch workflow fundamentals where did we create a model model building essentials where did we build a model linear regression model yeah here nn.linear but we built this model down here so all we've changed from 0 1 to here is we've added a couple of layers the forward computation is quite similar if this model can learn something on a straight line can this model learn something on a straight line so that's my challenge to you is grab the data set that we created in this previous notebook so data you could just reproduce this exact data set and see if you can write some code to fit the model that we built here this one here on the data set 
that we created in here because i want to verify that this model can learn anything because right now it seems like it's not learning anything at all and that's quite frustrating so give that a shot and i'll see you in the next video welcome back in the past few videos we've tried to build a model to separate the blue from red dots yet our previous efforts have proven futile but don't worry we're going to get there i promise you we're going to get there and i may have a little bit of inside information here but we're going to build a model to separate these blue dots from red dots a fundamental classification model and we tried a few things in the last couple of videos such as training for longer so more epochs we added another layer we increased the hidden units because we learned of a few methods to improve a model from a model perspective such as upgrading the hyper parameters such as number of layers more hidden units fitting for longer changing the activation functions changing the learning rate we haven't quite done that one yet and changing the loss function one way that i like to troubleshoot problems is i'm going to put a subheading here 5.1 we're going to prepare or preparing data to see if our model can fit a straight line so one way to troubleshoot this is my trick for troubleshooting problems especially neural networks but just machine learning in general to troubleshoot a larger problem is to test out a smaller problem and so why is this well because we know that we had something working in a previous section so zero one pi torch workflow fundamentals we built a model here that worked and if we go right down we know that this linear model can fit a straight line so we're going to replicate a data set to fit a straight line to see if the model that we're building here can learn anything at all because right now it seems like it can't it's just tossing a coin to split between our data here which is not ideal so let's make some data but yeah this is the let's create a smaller problem one that we know that works and then add more complexity to try and solve our larger problem so create some data this is going to be the same as notebook 01 and i'm going to set up weight equals 0.7 bias equals 0.3 we're going to move quite quickly through this because we've seen this in module one but the overall takeaway from this is we're going to see if our model works on any kind of problem at all or do we have something fundamentally wrong create data we're going to call it x regression because it's a straight line and we wanted to predict a number rather than a class so you might be thinking oh we might have to change a few things of our model architecture well we'll see that in a second dot unsqueeze and we're going to go on the first dimension here or dim equals one and y regression we're going to use the linear regression formula as well weight times x x regression that is because we're working with a new data set here plus the bias so this is linear regression formula without epsilon so it's a simplified version of linear regression but the same formula that we've seen in a previous section so now let's check the data nothing we really haven't covered here but we're going to do a sanity check on it to make sure that we're dealing with what we're dealing with is not just a load of garbage because it's all about the data and machine learning i can't stress to you enough that's the data explorer's motto is to visualize visualize visualize oh what did we get wrong here unsqueezed did you 
notice that typo? Why didn't you say something — I'm kidding. There we go. So we've got 100 samples of X — the step size is a bit different, but that's all right, let's have a little fun with this — and one X value per y value, a very similar dataset to what we used before. Now, what do we do once we have a dataset? If we haven't already got training and test splits, we'd better make them. So: create train and test splits. train_split will be int(0.8 * len(X_regression)) — we could just put 100 in there, but let's be specific — and then X_train_regression and y_train_regression are the X and y values indexed up to train_split, and the test sets are everything from train_split onwards. Nothing really new here that we need to discuss: we're creating training and test sets, and the model will hopefully learn patterns in the training dataset that let it model the testing dataset — we'll see that in a second. If we check the lengths of each (excuse the long variable names, but we want to keep these separate from our existing X and y data), we get 80 and 20: 80 training samples and 20 testing samples. That should be enough. Now, because we've got our helper_functions.py file (remember, we wrote some code earlier to download it from the course GitHub) and we imported plot_predictions from it — the plot predictions function we created back in section 01 — we don't have to retype all of that plotting code; that's the beauty of having a helper_functions.py file. So let's plot our data to visually inspect it, because right now it's just numbers on a page. We won't plot any predictions yet, because we don't have any; we'll pass in train_data=X_train_regression, train_labels=y_train_regression, test_data=X_test_regression and test_labels=y_test_regression — I think that last parameter is labels too; I might be proven wrong when we run the function. And there we go: we have some training data and some testing data. Now, do you think our model_1 could fit this data? Does it have the right number of in and out features? We may have to adjust them slightly, so I'd like you to think about that: do we have to change the input features of our model for this dataset, and do we have to change the out features? We'll find out in the next video. Welcome back. We're currently working through a little side project here, but there's a real philosophy behind it: we created a straight-line dataset because we know we built a model back in section 01 that can fit a straight line — the whole setup we just walked through, condensed, is sketched below.
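Here's that condensed sketch (it assumes plot_predictions has already been imported from helper_functions.py as described):

weight, bias = 0.7, 0.3
start, end, step = 0, 1, 0.01

# 100 samples, one feature per label
X_regression = torch.arange(start, end, step).unsqueeze(dim=1)
y_regression = weight * X_regression + bias      # simplified linear regression formula (no noise term)

train_split = int(0.8 * len(X_regression))       # 80 training samples, 20 testing samples
X_train_regression, y_train_regression = X_regression[:train_split], y_regression[:train_split]
X_test_regression, y_test_regression = X_regression[train_split:], y_regression[train_split:]

plot_predictions(train_data=X_train_regression,
                 train_labels=y_train_regression,
                 test_data=X_test_regression,
                 test_labels=y_test_regression)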
And why are we doing this? Well, because the model we've built so far isn't working on our circular classification dataset, and one way to troubleshoot a larger problem is to test out a smaller problem first. Later on, if you're working with a big machine learning dataset, you'd probably start with a smaller portion of it first; likewise with a large model, you'd start with a small model. So we're taking a step back to see if our model can learn anything at all on a straight-line dataset, so that we can then improve it for a non-straight-line dataset — and there's another hint; we'll cover it in a second, I promise. Let's see how we can adjust model_1 to fit a straight line. I asked at the end of the last video: do we have to adjust the parameters of model_1 in any way, shape or form to fit this data? You may have realised that model_1 is set up for our classification data, which has two X input features, whereas if we look at X_train_regression (grab the first sample, or the first ten) there's only one value per sample. Let's remind ourselves: input and output shapes are one of the most fundamental things in machine learning and deep learning, and trust me, I still get them wrong all the time — that's why I keep harping on about it. We have one feature per label, so we have to adjust the model slightly: the in features need to be 1 instead of 2; the out features can stay the same, because we still want one number to come out. So we'll code up a slightly different version of model_1 — same architecture, but using nn.Sequential, the faster way of writing a model. Let's create model_2 with nn.Sequential; the only thing that changes is in_features=1 for the first layer, with out_features=10 so we have ten hidden units, and of course the in features of the second layer have to line up with the out features of the previous layer, so it takes 10 in and puts 10 out. We're scaling things up from one feature to ten to give our model as much of a chance — as many parameters — as possible. We could make this number quite large, a thousand if we wanted, but there is an upper bound on these things, and I'll let you find it through your own experience as a machine learning engineer and data scientist; for now we keep it nice and small so we can run as many experiments as possible. Beautiful — we've created a sequential model. What happens with nn.Sequential? Data goes in, passes through the first layer, then the second, then the third, and as it goes through each layer it triggers that layer's internal forward method — in the case of nn.Linear, the linear transformation we've seen before. Let's keep pushing forward and create a loss function and an optimizer — a sketch of the model plus the loss and optimizer setup is just below.
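Something like this — in_features=1 to match the single-feature regression data, and device assumed to be set up earlier (e.g. "cuda" if available); a reference sketch rather than the exact notebook cell:

model_2 = nn.Sequential(
    nn.Linear(in_features=1, out_features=10),
    nn.Linear(in_features=10, out_features=10),
    nn.Linear(in_features=10, out_features=1)
).to(device)

loss_fn = nn.L1Loss()                                             # MAE loss for regression
optimizer = torch.optim.SGD(params=model_2.parameters(), lr=0.1)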
For the loss we're going to use L1 loss — why? Because we're dealing with a regression problem here rather than a classification problem. And for the optimizer, how about we bring in the exact same SGD optimizer we've been using for the classification data: torch.optim.SGD with model_2.parameters() ("params or parameters", I always get a little bit confused) and an lr of 0.1, because that's what we've been using so far. So we want our optimizer to optimize model_2's parameters with a learning rate of 0.1 — the learning rate being the multiplier that scales how much each parameter gets adjusted on each optimizer step. Now let's train the model — do you think we can do that in this video? I think we can. We'll train on the training dataset and then evaluate on the test dataset separately. We'll set up both manual seeds (CPU and CUDA), and because we sent our model to the device up above, it should be on the GPU or whatever device you have active. Set the number of epochs — we used a thousand before, so we'll keep it at that. And now we're getting really good at this sort of thing: put the data on the target device. I know we've done a lot of these steps before, but there's a reason I've kept them all in — by the end of this course I'd like you to know this stuff pretty much off by heart, and even if you don't (trust me, I don't), to know where to look. So X_train_regression goes to the device, then y_train_regression, and the same for the test tensors. Just a reminder, or something to think about while we write this code: what would happen if we didn't put our data on the same device as the model? We've seen that error come up before... well, I've kind of given the answer away, haven't I — our code would error. Don't worry, there are plenty of questions I've asked that I haven't answered yet. Beautiful — device-agnostic code for the model and for the data. Now let's loop through the epochs: for epoch in range(epochs), do the forward pass and calculate the loss. y_pred = model_2(X_train_regression) is the forward pass, and it all works out hunky-dory because our model and data are on the same device. loss = loss_fn(y_pred, y_train_regression) compares the predictions to the labels. What next? optimizer.zero_grad(), then loss.backward(), then optimizer.step() — we're doing all of this straight from our comments, look at us go. And of course we can do some testing: model_2.eval(), then with torch.inference_mode() we do the forward pass, test_pred = model_2(X_test_regression), and test_loss = loss_fn(test_pred, y_test_regression). Look at that — we've just written an optimization loop, something we spent a whole hour (maybe longer) on before, in about ten lines of code. We could shorten it further by turning it into a function, but we'll see that later; I'd rather we get a bit more practice while this is still fresh. The final step is to print out what's happening — the whole regression setup, condensed, looks something like the sketch below.
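Here it is, reusing model_2, loss_fn and optimizer from the sketch above — again a sketch under those assumptions, not the exact notebook code:

torch.manual_seed(42)
torch.cuda.manual_seed(42)
epochs = 1000

# put the data on the same device as the model
X_train_regression, y_train_regression = X_train_regression.to(device), y_train_regression.to(device)
X_test_regression, y_test_regression = X_test_regression.to(device), y_test_regression.to(device)

for epoch in range(epochs):
    model_2.train()
    y_pred = model_2(X_train_regression)             # forward pass
    loss = loss_fn(y_pred, y_train_regression)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    model_2.eval()
    with torch.inference_mode():
        test_pred = model_2(X_test_regression)
        test_loss = loss_fn(test_pred, y_test_regression)

    if epoch % 100 == 0:
        print(f"Epoch: {epoch} | Loss: {loss:.5f} | Test loss: {test_loss:.5f}")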
a thousand epochs i like the idea of printing out something every 100 epochs that should be about enough of a step epoch what do we got we'll put in the epoch here with the f string and then we'll go the loss which will be loss and maybe we'll get the first five of those five decimal places that is we don't have an accuracy do we because we're working with regression and we'll get the test loss out here and that's going to be 0.5 f as well beautiful have we got any mistakes i don't think we do we didn't even run this code cell before we'll just run these three again see if we got look at that oh my goodness is our loss our losses going down so that means our model must be learning something now what if we adjusted the learning rate here i think if we went 0.01 or something will that do anything oh yes look how low our loss gets on the test data set but let's confirm that we're going to make some predictions or maybe we should do that in the next video yeah this one's getting too long but how good is that we created a straight line data set and we've created a model to fit it we set up a loss and an optimizer already and we put the data on the target device we trained and we tested so our model must be learning something but i'd like you to give a shot at confirming that by using our plot predictions function so make some predictions with our trained model don't forget to turn on inference mode and we should see some red dots here fairly close to the green dots on the next plot give that a shot and i'll see you in the next video welcome back in the last video we did something very exciting we solved a smaller problem that's giving us a hint towards our larger problem so we know that the model that we've previously been building model 2 has the capacity to learn something now how did we know that well it's because we created this straight line data set we replicated the architecture that we used for model one recall that model one didn't work very well on our classification data but with a little bit of an adjustment such as changing the number of in features and not too much different training code except for a different loss function because well we use mae loss with regression data and we change the learning rate slightly because we found that maybe our model could learn a bit better and again i'd encourage you to play around with different values of the learning rate in fact anything that we've changed try and change it yourself and just see what happens that's one of the best ways to learn what goes on with machine learning models but we trained for the same a number of epochs we set up device agnostic code we did a training and testing loop how good this looks oh my goodness well done and our loss went down so hmm what does that tell us what it tells us that model 2 or the specific architecture has some capacity to learn something so we must be missing something and we're going to get to that in a minute i promise you but we're just going to confirm that our model has learned something and it's not just numbers on a page going down by getting visual so turn on we're going to make some predictions and plot them and you may have already done this because i issued that challenge at the last of at the end of the last video so turn on evaluation mode let's go model 2 dot eval and let's make predictions which are also known as inference and we're going to go with torch dot inference mode inference mode with torch start inference mode make some predictions we're going to save them as y preds 
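Jumping slightly ahead, here's roughly what those predictions and the plotting call end up looking like — note the .cpu() calls, which we'll need in a moment because plot_predictions uses matplotlib and NumPy, and NumPy only works with CPU tensors (a sketch, assuming the trained model_2 and the regression tensors from above):

model_2.eval()
with torch.inference_mode():
    y_preds = model_2(X_test_regression)

plot_predictions(train_data=X_train_regression.cpu(),
                 train_labels=y_train_regression.cpu(),
                 test_data=X_test_regression.cpu(),
                 test_labels=y_test_regression.cpu(),
                 predictions=y_preds.cpu())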
We pass X_test_regression through model_2 and save the result as y_preds — this should all just work because we've set up device-agnostic code. Now let's plot the data and predictions; for that we can of course use the plot_predictions function we imported via helper_functions.py (the code for that is just a few cells above if you'd like to check it out). So we set train_data=X_train_regression — and my goodness, Google Colab, I'm already typing fast enough without you slowing me down with wrong autocompletes — train_labels=y_train_regression, test_data=X_test_regression, test_labels=y_test_regression (there are a lot of variables going on here, my goodness gracious, we could have done better with the naming, but it'll do for now) and predictions=y_preds. And if we plot this, what do we get? Oh no, an error. Now, secretly I knew that was coming ahead of time — that's the advantage of being the host of a machine learning cooking show. So how do we fix it? Remember how I asked in one of the last videos what would happen if our data wasn't on the same device as our model? We'd get an error — but this one is a little different, and we've actually seen it before: can't convert a CUDA device type tensor to NumPy. Where is this coming from? Our plot_predictions function uses matplotlib, and behind the scenes matplotlib references NumPy, another numerical computing library — and NumPy uses the CPU rather than the GPU. So we have to call .cpu(); the helpful error message even tells us to call Tensor.cpu() before using our tensors with NumPy. Let's call .cpu() on all of the tensor inputs and see if that solves the problem — wonderful, it does. Oh my goodness, look at those red dots, so close. So this just confirms our suspicion, what we kind of already knew: our model did have some capacity to learn — when we changed the dataset, it worked. So, hmm — is it our data that the model can't learn on, this circular data, or is it the model itself? Remember, our model is only comprised of linear functions. What is linear? A straight line. But is our data made of just straight lines? I think it's got some non-linearities in there. The big secret I've been holding back will reveal itself starting from the next video, and if you want a head start, go to torch.nn and look at the documentation: we've been speaking a lot about linear functions, so what are these non-linear activations? And another spoiler: we've actually already used one of these non-linear activations in this notebook. Go and check that out, see what you can infer from it, and I'll see you in the next video — let's get started with non-linearities. Welcome back. In the last video we saw that the model we've been building has some potential to learn — I mean, look at these predictions; you could get a little better, of course, and put the red dots right on top of the green dots, but the trend is what we're after. Except this is straight-line data, and as we've been hinting at a fair bit, we're using linear functions. If we look up linear data, what does it look like? It has quite a straight line — if we just search "linear", linear means straight. There we go, straight. And then what
happens if we search for non-linear i kind of hinted at this as well non-linear oh we get some curves we get curved lines so linear functions straight non-linear functions hmm now this is one of the beautiful things about machine learning and i'm not sure about you but when i was in high school i kind of learned a concept called line of best fit or y equals mx plus c or y equals mx plus b and it looks something like this and then if you wanted to go over these you use quadratic functions and a whole bunch of other stuff but one of the most most fundamental things about machine learning is that we build neural networks and deep down neural networks are just a combination it could be a large combination of linear functions and non-linear functions so that's why in torch.nn we have non-linear activations and we have all these other different types of layers but essentially what they're doing deep down is combining straight lines with if we go back up to our data non-straight lines so of course our model didn't work before because we've only given it the power to use linear lines we've only given it the power to use straight lines but our data is what it's curved although it's simple we need non-linearity to be able to model this data set and now let's say we were building a pizza detection model so let's look up some images of pizza one of my favorite foods images pizza right so could you model pizza with just straight lines and you're thinking daniel you can't be serious a computer vision model doesn't look for just straight lines in this and i'd argue that yes it does except we also add some curved lines in here that's the beauty of machine learning could you imagine trying to write the rules of an algorithm to detect that this is a pizza maybe you could put in oh it's a curve here and if you see red and no no no no imagine if you're trying to do a hundred different foods your program would get really large instead we give our machine learning models if we come down to the model that we created we give our deep learning models the capacity to use linear and non-linear functions we haven't seen any nonlinear layers just yet or maybe we've hinted at some but that's all right so we stack these on top of each other these layers and then the model figures out what patterns in the data it should use what lines it should draw to draw patterns through not only pizza but another food such as sushi if we wanted to build a food image classification model it would do this the principle remains the same so the question i'm going to pose to you we'll get out of this is we'll come down here we've unlocked the missing piece so we're about to we're going to cover it over the next couple of videos the missing piece of our model and this is a big one this is going to follow you throughout all of machine learning and deep learning non-linearity so the question here is what patterns could you draw if you were given an infinite amount of straight and non-straight lines or in machine learning terms an infinite amount but really it's but really it is finite by infinite in machine learning terms this is a technicality it could be a million parameters it could be as we've got probably 100 parameters in our model so just imagine a large amount of straight and not in straight lines an infinite amount of linear and non-linear functions you could draw some pretty intricate patterns couldn't you and that's what gives machine learning and especially neural networks the capacity to not only fit a straight line here but to 
separate two different circles, but also to do crazy things like drive a self-driving car — or at least power the vision system of a self-driving car; after that you need some programming to plan what to actually do with what you see in an image, but we're getting ahead of ourselves. Let's now start diving into non-linearity. The whole idea is combining the power of linear and non-linear functions, straight lines and non-straight lines: our classification data is not comprised of straight lines, it's circles, so we need non-linearity here. So: recreating non-linear data, the red and blue circles. We don't strictly need to recreate this, but we're going to do it anyway for completeness and a bit of practice — make and plot data, so you can practise using non-linearity on your own. We import matplotlib.pyplot as plt and make_circles, and we'll go a bit faster here because we've covered this code above. We recreate the exact same circle dataset as before: n_samples = 1000, then X, y = make_circles(...) with the number of samples passed in. (Colab, please — I wonder if I can turn off autocorrect and just see all of my errors in the flesh; maybe I'll sort that out later, we can work it out on the fly — for now I'm too excited to share the power of non-linearity.) Then we plot X: we've got two X features, we colour the points with the value of y because we're doing binary classification, and we use one of my favourite colormaps, the red/blue one from plt.cm. What do we get? A red circle and a blue circle — hey, is that the same colour as above? I like this colour better; it's all about aesthetics in machine learning, it's not just numbers on a page (Daniel, how could you be so crass). That's the better colour, red and blue, much more lively. So now let's convert the data to tensors and then into train and test splits, and then we can start to build a model with non-linearity — oh, this is so good. Nothing we haven't covered before, but it never hurts to practise code. We import torch and, from sklearn.model_selection, train_test_split, so we can split our red and blue dots randomly. Then we turn the data into tensors: X = torch.from_numpy(X).type(torch.float) — why do we do this? (Oh my goodness, autocorrect is getting the best of me here; you're watching me live-code this stuff and battle Google Colab's autocorrect — am I really teaching PyTorch, or am I just battling autocomplete?) We convert the type because NumPy's default floating-point type is float64 and PyTorch prefers float32, and make_circles uses NumPy behind the scenes. NumPy is actually used in a lot of other machine learning libraries — pandas is built on NumPy, scikit-learn uses a lot of NumPy, matplotlib too — which just shows the, what's the word, ubiquitous? ubiquity? maybe you can correct me — the ubiquity of NumPy (a condensed sketch of this whole data setup is just below).
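Here it is — the noise and random_state values are assumed to match the earlier circle dataset, and the colormap is the red/blue one used throughout the course:

import matplotlib.pyplot as plt
import torch
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split

n_samples = 1000
X, y = make_circles(n_samples, noise=0.03, random_state=42)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdYlBu)

# convert to float32 tensors (NumPy defaults to float64)
X = torch.from_numpy(X).type(torch.float)
y = torch.from_numpy(y).type(torch.float)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)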
But we're using PyTorch to leverage the power of autograd — which is what powers our gradient descent — and the fact that it can use GPUs. So we create training and test splits with train_test_split(X, y), test_size=0.2 and random_state=42, and then we view the first five samples — fingers crossed we haven't got an error — beautiful, we have tensors. Okay, now we're up to the exciting part: we've got our dataset back, and I think it's time to build a model with non-linearity. If you'd like to peek ahead — this is a little bit of a spoiler — check out torch.nn again, go into the non-linear activations and see if you can find the one that we've already used. That's your challenge: can you find the one we've already used, go into it, and search what our non-linear function is? Give that a go and see what comes up; I'll see you in the next video. Welcome back. Now, put your hand up if you're ready to learn about non-linearity — I know I can't see your hands, but I'd better feel some hands up, because my hand is up: non-linearity is a magic piece of the puzzle that we're about to learn about. So let's title this section: building a model with non-linearity. Just to re-emphasise, linear equals straight lines, and in turn non-linear equals non-straight lines. I left off the last video with the challenge of checking out the torch.nn module, looking for the non-linear function we've already used. Where would you go to find such a thing? Non-linear activations — and there are going to be a fair few things in there. Essentially, all of the modules within torch.nn are some form of layer in a neural network. If we recall the anatomy of a neural network, generally you'll have an input layer, then multiple hidden layers, and some form of output layer; those hidden layers can be almost any combination of what's in torch.nn — in fact they can be almost any combination of functions you could imagine (whether they work or not is another question), but PyTorch implements some of the most common layers you'd have as hidden layers: pooling layers, padding layers, activation functions. They all have the same premise: they perform some sort of mathematical operation on an input. And in the non-linear activation functions you might find nn.Sigmoid — where have we used that before? That's the sigmoid activation function: in maths terminology it takes some input x, performs this operation on it, and here's what it looks like if we apply it to a straight line. I think we should put it into practice though — there's an example there, and all the other non-linear activations have examples as well; I'll let you go through those in your own time, otherwise we'll be here forever. nn.ReLU is another common one — we saw it when we looked at the architecture of a classification network. So with that being said, how about we start to code a classification model with non-linearity? And of course, if you wanted to learn more you could look up "what is a non-linear function" — non-linear means the graph is not a straight line, beautiful, that's how I'd learn about it — but while we're here together, let's write some code: build a model with non-linear activation functions. And just one more thing before we write this code, just to
re-emphasize what we're doing here before we write this code i've got i just remembered i've got a nice slide which is the question we posed in the previous video the missing piece non-linearity but the question i want you to think about is what could you draw if you had an unlimited amount of straight in other words linear and non-straight non-linear lines so we've seen previously that we can build a model a linear model to fit some data that's in a straight line linear data but when we're working with non-linear data well we need the power of non-linear functions so this is circular data and now this is only a 2d plot keep in mind there whereas neural networks and machine learning models can work with numbers that are in hundreds of dimensions impossible for us humans to visualize but since computers love numbers it's a piece of cake to them so from torch import n we're going to create our first neural network with non-linear activations this is so exciting so let's create a class here we'll create circle model we've got circle model v1 already we're going to create circle model v2 and we'll inherit from an end.module and then we'll write the constructor which is the init function and we'll pass in self here and then we'll go self or super sorry too many s words dot underscore underscore init underscore there we go so we've got the constructor here and now let's create a layer one self dot layer one equals just the same as what we've used before nn.linear we're going to create this quite similar to the model that we've built before except with one added feature and we're going to create in features which is akin to the number of x features that we have here again if this was different if we had three x features we might change this to three but because we're working with two we'll leave it as that we'll keep out features as 10 so that we have 10 hidden units and then we'll go layer 2 and then dot linear again these values here are very customizable because why because they're hyper parameters so let's line up the out features of layer 2 and we'll do the same with layer 3 because layer 3 is going to take the outputs of layer 2 so it needs in features of 10 and we want out layer 3 to be the output layer and we want one number as output so we'll set one here now here's the fun part we're going to introduce a non-linear function we're going to introduce the relu function now we've seen sigmoid relu is another very common one it's actually quite simple but let's write it out first nn.relu so remember torch.nn stores a lot of existing non-linear activation functions so that we don't necessarily have to code them ourselves however if we did want to code a relu function let me show you it's actually quite simple if we dive into nn.relu or relu however you want to say it i usually say relo applies the rectified linear unit function element wise so that means element wise on every element in our input tensor and so it stands for rectified linear unit and here's what it does basically it takes an input if the input is negative it turns the input to zero and it leaves the positive inputs how they are and so this line is not straight and you could argue yeah well it's straight here and then straight there but this is a form of a non-linear activation function so it goes if it was linear it would just stay straight there like that but let's see it in practice do you think this is going to improve our model well let's find out together hey ford we need to implement the forward method and here's what 
we're going to do where should we put where should we put our non-linear activation functions so i'm just going to put a node here relu is a non-linear activation function and remember wherever i say function it's just performing some sort of operation on a numerical input so we're going to put a nonlinear activation function in between each of our layers so let me show you what this looks like self.layer 3. we're going to start from the outside in self.relu and then we're going to go self.layer2 and then we're going to go self dot relu and then there's a fair bit going on here but nothing we can't handle layer 1 and then here's the x so what happens is our data goes into layer 1 performs a linear operation with nn.linear then we pass the output of layer 1 to a relu function so we where's the relu up here we turn all of the negative outputs of our model of our of layer one to zero and we keep the positives how they are and then we do the same here with layer two and then finally the outputs of layer three stay as they are we've got out features there we don't have a relu on the end here because we're going to pass the outputs to the sigmoid function later on and if we really wanted to we could put self.sigmoid here equals nn.sigmoid but i'm going to that's just one way of constructing it we're just going to apply the sigmoid function to the logits of our model because what are the logits the raw output of our model and so let's instantiate our model this is going to be called model 3 which is a little bit confusing but we're up to model 3 which is circle model v2 and we're going to send that to the target device and then let's check model 3 what does this look like wonderful so it doesn't actually show us where the relus appear but it just shows us what are the parameters of our circle model v2 now i'd like you to have a think about this and my challenge to you is to go ahead and see if this model is capable of working on our data on our circular data so we've got the data sets ready you need to set up some training code my challenge to you is write that training code and see if this model works but we're going to go through that over the next few videos and also my other challenge to you is to go to the tensorflow playground and recreate our neural network here you can have two hidden layers does this go to ten well it only goes to eight we'll keep this at five so build something like this so we've got two layers with five it's a little bit different to ours because we've got two layers with ten and then put the learning rate to 0.01 or what do we have 0.1 with stochastic gradient descent we've been using 0.1 so we'll leave that so this is the tensorflow playground and then change the activation here instead of linear which we've used before change it to relu which is what we're using and press play here and see what happens i'll see in the next video welcome back in the last video i left off leaving the challenge of recreating this model here it's not too difficult to do we've got two hidden layers and five neurons we've got our data set which looks kind of like ours but the main points here i have to learning rate of 0.1 which is what we've been using but to change it from we've previously used a linear activation to change it from linear to relu which is what we've got set up here in the code now remember relu is a popular and effective non-linear activation function and we've been discussing that we need non-linearity to model non-linear data and so that's the crux of what neural 
networks are: artificial neural networks — not to be confused with the brain's neural networks, although who knows, this might be how they work too, I'm not a neurosurgeon or a neuroscientist — are a large combination of linear (straight) and non-linear (non-straight) functions which are potentially able to find patterns in data. Our dataset is quite small, just a blue and a red circle, but the same principle applies to larger datasets and larger models: combine linear and non-linear functions. So we've got a few tabs going on — let's get rid of some and come back to the TensorFlow playground. Did you try this out? Do you think it'll work? Let's find out together: ready, three, two, one — oh, look at that, almost instantly the training loss goes down towards zero and the test loss is basically zero as well. That's amazing. We can stop it there, and if we change the learning rate a little lower, let's see what happens: it takes a bit longer to get where it wants to go — that's the power of changing the learning rate. Make it really small and we only get a little bit of a downward trend, and we've already surpassed the number of epochs we had before — a smaller learning rate means the model learns more slowly. This is just a beautiful visual way of demonstrating different values of the learning rate; we could sit here all day and it still might not get much lower. Increase it by 10x — that previous run took over a thousand epochs and was still at about 0.5, and now we're going faster, about 0.4 by roughly 500 epochs. Increase it by another 10x, reset, start again — would you look at that, much faster this time. There's nothing better than watching a loss curve go down (in the world of machine learning, that is). Reset again, change it right back to what we had, and we get to zero in basically under 100 epochs. So that's the power of the learning rate, a little visual representation. Speaking of learning rates, it's time for us to build an optimizer and a loss function — we've got our non-linear model set up, so: loss and optimizer. You might have already done this, because it's code we've written before, but we're going to redo it for completeness and practice. For the loss function: we're working with logits and a binary classification problem (blue dots or red dots), so we use binary cross entropy — BCEWithLogitsLoss. For the optimizer: torch.optim.SGD with model_3.parameters() — those are the parameters we want to optimize, the ones belonging to this model — and we set the lr to 0.1, just like we had in the TensorFlow playground. What are some other binary classification problems you can think of? Email: spam or not spam. Credit cards: fraud or not fraud. Insurance claims: at fault or not at fault — if someone submits a claim about a car crash, whose fault was it, the person submitting the claim or the other party? There are many more, but those are just some off the top of my head (pulling the last couple of videos together, the model, loss and optimizer are sketched below).
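Here's that sketch — it assumes `from torch import nn` and the device setup from earlier, and it's a reference sketch rather than the exact notebook cell:

class CircleModelV2(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer_1 = nn.Linear(in_features=2, out_features=10)
        self.layer_2 = nn.Linear(in_features=10, out_features=10)
        self.layer_3 = nn.Linear(in_features=10, out_features=1)
        self.relu = nn.ReLU()   # non-linear activation, has no learnable parameters

    def forward(self, x):
        # relu between the linear layers; the final layer returns raw logits
        return self.layer_3(self.relu(self.layer_2(self.relu(self.layer_1(x)))))

model_3 = CircleModelV2().to(device)

loss_fn = nn.BCEWithLogitsLoss()    # works on raw logits for binary classification
optimizer = torch.optim.SGD(params=model_3.parameters(), lr=0.1)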
But now let's train our model with non-linearity — woo-hoo, we're on a roll here: training a model with non-linearity. We've seen that if we introduce a non-linear activation function into a model — remember, with a linear activation in the playground the loss doesn't go down, but adjusting it to relu makes the loss drop — so hopefully this replicates with our pure PyTorch code. Let's do it. We'll set the random seeds — and because we're working with CUDA, we'll set the CUDA random seed as well, torch.manual_seed and torch.cuda.manual_seed. Again, don't worry too much if the numbers on your screen aren't exactly the same as mine; that's due to the inherent randomness of machine learning — in fact, the "stochastic" in stochastic gradient descent means random. We set the seeds so things are as close as possible, but the direction is more important: if my loss goes down, your loss should also go down. Next, put the data on the target device — device-agnostic code, which we've done before, but we're doing it again for completeness and to practise every step of the puzzle. That's what this course is, a momentum builder, so that when you go to other repos and machine learning projects that use PyTorch, you can ask: does this code set up device-agnostic code? What problem are we working on — binary or multi-class classification? Now let's loop through the data again: set up the epochs — let's do a thousand, why not — and for epoch in range(epochs), we train. This is the training code: model_3.train() — and I want you to start thinking about how we could functionize this training code; we'll move towards that in a future video. One: the forward pass — y_logits, and why logits? Because the raw output of the model, without any activation function on the final layer, is called logits. Then y_pred, the prediction labels, comes from rounding torch.sigmoid of the logits — logits to prediction probabilities to prediction labels. Two: calculate the loss (that's from my unofficial PyTorch song — calculate the loss). loss = loss_fn(y_logits, y_train), because BCEWithLogitsLoss takes logits as its first input, and that calculates the loss between our model's logits and the training labels. We also calculate the accuracy with our accuracy function — this one is a bit backwards compared to PyTorch, since we pass the training labels in first, but it's constructed that way to match the style of scikit-learn. Three: optimizer.zero_grad() — we zero the gradients of the optimizer so it can start fresh, calculating the ideal gradients each epoch (it resets every epoch, which is fine). Then we perform backpropagation — PyTorch takes care of that for us when we call loss.backward() — and then we perform gradient descent, stepping the optimizer to see how we should improve the model's parameters: optimizer.step(). Oh, and speaking of model parameters, let's check model_3.state_dict(): the relu activation function doesn't actually have any parameters of its own, so you'll only see a weight and a bias for layer 1, layer 2 and layer 3 (see the quick check below).
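One way to check that for yourself — only the linear layers show up with learnable parameters (shapes shown assume the 2 → 10 → 10 → 1 architecture sketched above):

for name, param in model_3.named_parameters():
    print(name, param.shape)
# layer_1.weight torch.Size([10, 2])
# layer_1.bias torch.Size([10])
# layer_2.weight torch.Size([10, 10])
# layer_2.bias torch.Size([10])
# layer_3.weight torch.Size([1, 10])
# layer_3.bias torch.Size([1])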
So why doesn't the relu function have any parameters to optimize? If we go to nn.ReLU, it tells us what it implements: it's just the maximum of 0 and x. It takes the input and returns max(0, x) — if the input is a negative number, 0 is higher, so it zeroes all of the negative inputs, and it leaves the positive inputs as they are, because the max of a positive input and 0 is the positive input. So it has no parameters to optimize, and that's part of why it's so effective: every parameter in a model needs a little bit of computation to adjust, so the more parameters we add, the more compute is required. Generally the trade-off in machine learning is that more parameters give a model more ability to learn, but they need more compute. Then the testing step: model_3.eval(), and with torch.inference_mode() (if I could spell inference, that would be fantastic) we do the forward pass — test_logits = model_3 on the test data — then calculate the test prediction labels by calling torch.round on torch.sigmoid of the test logits, then the test loss with loss_fn on the test logits versus y_test, and the test accuracy (I'll give myself some more space so I can code in the middle of the screen) with the accuracy function, y_true=y_test and y_pred=test_pred. Beautiful. The final step is to print out what's happening — and this is important, because one, it's fun to know what your model is doing, and two, if our model does actually learn, I'd like to see the loss values go down and the accuracy values go up. As I said, there's not much more beautiful in the world of machine learning than watching a loss value — a loss curve — go down. So we print the current epoch, then the training loss to four decimal places, then the accuracy to two decimal places with a percentage sign, then the test loss and the test accuracy — remember, our model learns patterns on the training dataset and then evaluates those patterns on the test dataset. No doubt there might be an error or two in all of this code, but let's run it — we're training our first model with non-linearities built into it. Ready? Three, two, one, let's go. Ah, of course: "module torch.cuda has no attribute..." — just a typo in manual_seed; sound it out, m-a-n-u-a-l. There we go. And another one: "target size must be the same as input size" — where did it mess up? It's the test loss: the test logits and y_test aren't matching up. model_3(X_test) and y_test — what are their sizes? Let's do some troubleshooting on the fly; not everything always works out how you want. Checking len(X_test)... we've got a shape issue here — remember how I said one of the most common issues in deep learning is a shape issue? Let's check test_logits.shape against y_test.shape; the quick check below shows the kind of thing we're looking for.
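Something like this (a sketch of the diagnosis, not the exact cell):

with torch.inference_mode():
    test_logits = model_3(X_test)
print(test_logits.shape, y_test.shape)   # torch.Size([200, 1]) vs torch.Size([200]) -> mismatch
print(test_logits.squeeze().shape)       # torch.Size([200]) -> matches after .squeeze()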
Printing those shapes out, we get 200 versus... ah, here's what we missed: .squeeze(). See how I've been hinting that we needed to call .squeeze()? That's where the discrepancy is: test_logits has an extra 1 dimension, and that's why we get the value error about the target size — a shape mismatch, target size [200] must be the same as input size [200, 1]. Did we squeeze this one? Ah, that's why the training step worked — we squeezed there and missed it here. So let's get rid of the extra 1 dimension with .squeeze(), everything lines up, and there we go. Oh, look at that — yes, our accuracy has gone up, albeit not by too much. It's still not perfect; ideally we'd like it heading towards 100% and the loss to be lower, but I feel like we've got a better-performing model, don't you? That is the power of non-linearity: all we did was add in a relu layer — well, two of them, a relu here and a relu here — and we gave our model the power of straight lines and non-straight lines, so it can potentially draw a boundary to separate these circles. In the next video, let's plot our model's decision boundary using our function and see if it really did learn anything. I'll see you there. Welcome back. In the last video we trained our first model and, as you can tell, I've got the biggest smile on my face — we trained our first model that harnesses both the power of straight lines and non-straight lines, linear functions and non-linear functions, and by the thousandth epoch we look like we're getting better results than pure guessing, which is 50% because we have 500 red samples and 500 blue samples — evenly balanced classes. We've seen that adding a relu activation function, with a dataset similar to ours in the TensorFlow playground, makes the model start to fit where it doesn't with just linear. There are a few other activation functions you could play around with there, plus the learning rate and regularization — if you're not sure what regularization is, I'll leave that as extra curriculum — but we're going to retire the TensorFlow playground for now and go back to writing code. We now have to evaluate our model, because right now it's just numbers on a page, so let's write 6.4 here: what do we like to do to evaluate things? Visualize, visualize, visualize — evaluating a model trained with non-linear activation functions. We also discussed that neural networks are really just a big combination of linear and non-linear functions trying to draw patterns in data. With that being said, let's make some predictions with model_3, our most recently trained model: put it into eval mode, set up inference mode, and then y_preds equals torch.round of torch.sigmoid of model_3(X_test) — we could functionize this, of course — and we'll squeeze the result, because we ran into trouble with that extra dimension in the previous video. I actually really liked that we did, because we got to troubleshoot a shape error on the fly, and that's one of the most common issues you'll come across in deep learning. So let's check out y_preds, and let's check out the first ten values of y_test as well (a quick sketch of this check is below).
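Roughly like this, assuming the trained model_3 and the test tensors from above:

model_3.eval()
with torch.inference_mode():
    y_preds = torch.round(torch.sigmoid(model_3(X_test))).squeeze()

print(y_preds[:10])
print(y_test[:10])   # predictions and labels should be in the same format: float tensors on the same device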
so remember when we're evaluating predictions we want them to be in the same format as our original labels we want to compare apples to apples and if we compare the format here do these two things look the same yes they do they're both on cuda and they're both floats we can see that it's got this one wrong whereas the other ones look pretty good hmm this might look pretty good if we visualize it so now let's you might have already done this because i issued the challenge of plotting the decision boundaries plot decision boundaries and let's go plt.figure and we're going to set up the fig size to equal 12 6 because again one of the advantages of hosting a machine learning cooking show is that you can code ahead of time and then we can go plt.title is train and then we're going to call our plot decision boundary function which we've seen before plot decision boundary and we're going to pass this one in we could do model 3 but we could also pass it in our older models too model one that doesn't use non-linearity in fact i reckon that'll be a great comparison so we'll also create another plot here for the test data and this will be on index number two so remember subplot is number of rows number of columns index where the plot appears we'll give this one a title plot.title this will be test and google collab i didn't want that as i said this course is also a battle between me and google collabs autocorrect so we're going model three and we'll pass in the test data here and behind the scenes our plot decision boundary function will create a beautiful graphic for us perform some predictions on the x the features input and then we'll compare them with the y values let's see what's going on here oh look at that yes our first non-linear model okay it's not perfect but it is certainly much better than the models that we had before look at this model one has no linearity model one equals no non-linearity i've got a double negative there whereas model 3 equals has non-linearity so do you see the power of non-linearity or better yet the power of linearity or linear straight lines with non-straight lines so i feel like we could do better than this though here's your challenge is to can you improve model 3 to do better what did we get 79 accuracy to do better than 80 accuracy on the test data i think you can so that's the challenge and if you're looking for hints on how to do so where can you look well we've covered this improving a model so maybe you add some more layers maybe you add more hidden units maybe you fit for longer maybe you if you add more layers you put a relu activation function on top of those as well maybe you lower the learning rate because right now we've got 0.1 so give this a shot try and improve it i think you can do it but we're going to push forward that's going to be your challenge for some extra curriculum i think in the next section we've seen our nonlinear activation functions in action let's write some code to replicate them i'll see you there welcome back in the last video i left off with the challenge of improving model 3 to do better than 80 accuracy on the test data i hope you gave it a shot but here are some of the things i would have done is i potentially add more layers and maybe increase the number of hidden units and then if we needed to fit for longer and maybe lowered the learning rate to 0.01 but i'll leave that for you to explore because that's the motto of the data scientist right is to experiment experiment experiment so let's go in here we've seen our 
non-linear activation functions in practice, so let's replicate them: replicating non-linear activation functions. Remember, with neural networks, rather than telling the model what to learn, we give it the tools to discover patterns in data and it tries to figure out the best patterns on its own. And what are those tools? We've seen them in action: linear and non-linear functions. A neural network is a big stack of linear and non-linear functions — ours only has four or five layers, but as I've said, other networks can get much larger; the premise remains the same, some form of linear and non-linear manipulation of the data. So let's clean up our workspace a little and replicate some non-linear activation functions. Let's create a tensor to start with — everything starts from the tensor. A = torch.arange(-10, 10, 1), and we can set the dtype to torch.float32... actually, do we need to? Let's check A.dtype without it — torch.int64, huh, why is that happening? Well, because with a step of 1 all the values are integers, and it looks like PyTorch's default data type for integers is int64; if we made the values floats we'd get float32. We could do that, but it would give us more numbers than we need, so let's keep the range at negative 10 to 10 and explicitly set dtype=torch.float32 — we want float32 because the functions we're about to write might run into errors otherwise. Now let's visualize this data — I want you to guess: is this a straight line or a non-straight line? You've got three seconds: one, two, three — straight line. There we go, plotted from negative 10 up to... oh, 9 (a sketch of this tensor plus the activation functions we're about to replicate is below).
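Here's that sketch — the tensor is called A here, and the hand-rolled relu and sigmoid are the replications we walk through next, shown alongside the built-in versions for comparison:

A = torch.arange(-10, 10, 1, dtype=torch.float32)

plt.plot(A)                  # a straight (linear) line
plt.plot(torch.relu(A))      # negatives flattened to zero

def relu(x: torch.Tensor) -> torch.Tensor:
    return torch.maximum(torch.tensor(0), x)   # negatives -> 0, positives unchanged

def sigmoid(x: torch.Tensor) -> torch.Tensor:
    return 1 / (1 + torch.exp(-x))

plt.plot(relu(A))            # should match torch.relu(A)
plt.plot(sigmoid(A))         # should match torch.sigmoid(A)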
close enough and so how would we turn this straight line if it's a straight line it's linear how would we perform the relu activation function on this now we could of course call torch relu on a actually let's in fact just plot this plt.plot on torch relu what does this look like boom there we go but we want to replicate the relu function so let's go nn.relu what does it do we've seen this before so we need the max we need to return based on an input we need the max of 0 and x so let's give it a shot we'll come here again we need more space there can never be enough code space here i like writing lots of code i don't know about you but let's go relu we'll take an input x which will be some form of tensor and we'll go return torch dot maximum i think you could just do torch dot max but we'll try maximum torch dot tensor zero so the maximum is going to return the max between whatever this is one option and whatever the other option is so inputs must be tensors so maybe we could just give a type hint here that this is torch dot tensor and this should return a tensor too return torch dot tensor beautiful you ready to try it out let's see what our relu function does relu a wonderful it looks like we got quite a similar output to before here's our original a so we've got negative numbers there we go so recall that the relu activation function turns all negative numbers into 0 because it takes the maximum between 0 and the input and if the input's negative well then 0 is bigger than it and it leaves all of the positive values as they are so that's the beauty of relu quite simple but very effective so let's plot relu activation function our custom one we will go plt dot plot we'll call our relo function on a let's see what this looks like oh look at us go well done just the exact same as the torch reload function easy as that hey and what's another non-linear activation function that we've used before well i believe one of them is if we go down to here what did we say before sigmoid where is that where are you sigmoid here we go hello sigmoid oh this has got a little bit more going on here 1 over 1 plus exponential of negative x so sigmoid or this little symbol for sigmoid of x which is an input we get this so let's try and replicate this we might just bring this one in here right now let's do the same for sigmoid so what do we have here well we want to create a custom sigmoid and we want to have some sort of input x and we want to return one divided by do we have the function in sigmoid one divided by one plus exponential one plus torch dot exp for exponential on negative x and we might put the bottom side in brackets so that it does that operation i reckon that looks alright to me so 1 divided by 1 plus torch exponential of negative x do we have that yes we do well there's only one real way to find out well let's plot the torch version of sigmoid torch.sigmoid and we'll pass in x see what happens and then oh we have a my bad a is our tensor what do we get we get a curved line wonderful and then we go plt dot plot and we're going to use our sigmoid function on a did we replicate torch's sigmoid function yes we did now see this is what's happening behind the scenes with our neural networks of course you could do more complicated activation functions or layers and whatnot you can try to replicate them in fact that's a great exercise to try and do but we've essentially across the videos and the sections that we've done we've replicated our linear layer and we've replicated the relu so we've actually 
built this model from scratch or we could if we really wanted to but it's a lot easier to use pie torches layers because we're building neural networks here like lego bricks stacking together these layers in some way shape or form and because they're a part of pytorch we know that they've been error tested and they compute as fast as possible behind the scenes and use gpu and get a whole bunch of benefits pi torch offers a lot of benefits by using these layers rather than writing them ourselves and so this is what our model is doing it's literally like to learn these values and decrease the loss function and increase the accuracy it's combining linear layers and non-linear layers or nonlinear functions where's our relu function here a relative function like this behind the scene so just combining linear and non-linear functions to fit a data set and that premise remains even on our small data set and on very large data sets and very large models so with that being said i think it's time for us to push on we've covered a fair bit of code here but we've worked on a binary classification problem have we worked on a multi-class classification problem yet do we have that here where's my fun graphic we have multi-class classification i think that's what we cover next we're going to put together all of the steps in our workflow that we've covered for binary classification but now let's move on to a multi-class classification problem if you're with me i'll see you in the next video welcome back in the last few videos we've been harnessing the power of non-linearity specifically non-straight-line functions and we replicated some here and we learned that a neural network combines linear and non-linear functions to find patterns in data and for our simple red verse blue dots once we added a little bit of non-linearity we found the secret source of to start separating our blue and red dots and i also issued you the challenge to try and improve this and i think you can do it so hopefully you've given that a go but now let's keep pushing forward we're going to reiterate over basically everything that we've done except this time from the point of view of a multi-class classification problem so i believe we're up to section 8 putting it all together with a multi-class classification problem beautiful and recall the difference between binary classification equals one thing or another such as cat versus dog if you were building a cat versus dog image classifier spam verse not spam for say emails that were spam or not spam or even internet posts on facebook or twitter or one of their other internet services and then fraud or not fraud for credit card transactions and then multi-class classification is more than one thing or another so we could have cat versus dog versus chicken so i think we've got all the skills to do this our architecture might be a little bit different for a multi-class classification problem but we've got so many building blocks now it's not funny let's clean up this and we'll add some more code cells and just to reiterate so we've gone over non-linearity the question is what could you draw if you had an unlimited amount of straight linear and non-straight non-linear lines i believe you could draw some pretty intricate patterns and that is what our neural networks are doing behind the scenes and so we also learned that if we wanted to just replicate some of these nonlinear functions some of the ones that we've used before we could create a range linear activation is just the line itself 
and then if we wanted to do sigmoid we get this curl here and then if we wanted to do relu well we saw how to replicate the relu function as well these both are non-linear and of course torch.nn has far more non-linear activations where they came from just as it has far more different layers and you'll get used to these with practice and that's what we're doing here so let's go back to the keynote so this is what we're going to be working on multi-class classification so there's one of the big differences here we use the softmax activation function versus sigmoid there's another big difference here instead of binary cross entropy we use just cross entropy but i think most of it's going to stay the same we're going to see this in action in a second but let's just describe our problem space just to go visual we've covered a fair bit here well done everyone so binary versus multi-class classification binary one thing or another zero or one multi-class could be three things could be a thousand things could be 5 000 things could be 25 things so more than one thing or another but that's the basic premise we're going to go with let's create some data hey 8.1 creating uh tiny multi-class data set and so to create our data set we're going to import our dependencies we're going to re-import torch even though we already have it just for a little bit of completeness and we're going to go matplotlib so we can plot as always we like to get visual where we can visualize visualize visualize we're going to import from scikit-learn dot data sets let's get make blobs now where would i get this from so sk learn dot data sets what do we get toy data sets do we have classification [Music] toy data sets do we have blobs if we just go make scikit-learn classification data sets what do we get here's one option there's also make blobs beautiful make blobs this is the code for that so let's just copy this in here and make blobs we're going to see this in action anyway make blobs as you might have guessed it makes some blobs for us i like blobs it's a fun word to say blobs so we want train test split because we want to make a data set and then we want to split it into train and test let's set the number of hyper parameters so i set the high parameters for data creation now i got these from the documentation here number of samples how many blobs do we want how many features do we want so say for example we wanted two different classes that would be binary classification say for example you wanted 10 classes you could set this to 10 and we're going to see what the others are in practice but if you want to read through them you can well and truly do that so let's set up we want num classes let's double what we've been working with we've been working with two classes red dots or blue dots let's step it up a notch we'll go to four classes watch out everyone and we're gonna go number of features will be two so we have the same number of features and then the random seed is going to be 42. 
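as a rough sketch of where this data setup ends up, assuming the names used in this walkthrough (the exact n samples and cluster std values get picked in just a moment, and the label dtype gets revisited later on when the loss function complains about it), the code looks something like this:

```python
# minimal sketch of the multi-class data setup described above
import torch
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

# hyperparameters for data creation (values used in this walkthrough)
NUM_CLASSES = 4
NUM_FEATURES = 2
RANDOM_SEED = 42

# create the blobs (n_samples=1000 and cluster_std=1.5 are the values
# this walkthrough ends up settling on in the next part)
X_blob, y_blob = make_blobs(n_samples=1000,
                            n_features=NUM_FEATURES,
                            centers=NUM_CLASSES,
                            cluster_std=1.5,
                            random_state=RANDOM_SEED)

# turn the numpy arrays into tensors (note: the labels later need to be
# torch.LongTensor for nn.CrossEntropyLoss, which comes up further on)
X_blob = torch.from_numpy(X_blob).type(torch.float)
y_blob = torch.from_numpy(y_blob).type(torch.LongTensor)

# split into train and test sets (80/20)
X_blob_train, X_blob_test, y_blob_train, y_blob_test = train_test_split(
    X_blob, y_blob, test_size=0.2, random_state=RANDOM_SEED)

# visualize, visualize, visualize
plt.figure(figsize=(10, 7))
plt.scatter(X_blob[:, 0], X_blob[:, 1], c=y_blob, cmap=plt.cm.RdYlBu)
plt.show()
```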
you might be wondering why these are capitalized well generally if we do have some hyper parameters that we say set at the start of a notebook you'll find it's quite common for people to write them as capital letters just to say that hey these are some settings that you can change you don't have to but i'm just going to introduce that anyway because you might stumble upon it yourself so create multi-class data we're going to use the make blobs function here so we're going to create some x blobs some feature blobs and some label blobs we'll see what these look like in a second i know i'm just saying blobs a lot but we pass in here none samples how many do we want let's create a thousand as well that could really be a hyper parameter but we'll just leave that how it is for now number of features is going to be num features centers equals num classes so we're going to create four classes because we've set up num classes equal to four and then we're going to go center standard deviation we'll give them a little shake up add a little bit of randomness in here give the clusters a little shake up we'll mix them up a bit make it a bit hard for our model but we'll see what this does in a second random state equals random seed which is our favorite random seed 42 of course you can set it whatever number you want but i like 42. oh and we need a comma here of course beautiful now what do we have to do here well because we're using scikit-learn and scikit-learn leverage is numpy so let's turn our data into tensors turn data into tensors and how do we do that well we grab x blob and we call torch from numpy from numpy if i could type that would be fantastic that's right we're doing pretty well today haven't made too many typos we did make a few in a couple videos before but hey i'm only human so we're going to torch from numpy and we're going to pass in the y blob and we'll turn it into torch.float because remember numpy defaults as float 64 whereas pytorch likes float32 so split into training and test and we're going to create x blob train y or x test x blob test we'll keep the blob nomenclature here y blob train and y blob test and here's again where we're going to leverage the train test split function from scikit learn so thank you for that scikit learn x blob and we're going to pass the y blobs so features labels x's of features y are the labels and a test size we've been using a test size of 20 that means 80 of the data will be for the training data that's a fair enough split with our data set and we're going to set the random seed to random seed because generally normally train test split is random but because we want some reproducibility here we're passing random seeds finally we need to get visual so let's plot the data right now we've got a whole bunch of code and a whole bunch of talking but not too much visuals going on so we'll write down here visualize visualize visualize and we can call in plot dot figure what size do we want i'm going to use my favorite hand in poker which is 10 7 because it's generally worked out to be a good plot size for in my experience anyway we'll go xblob and we want the zeroth index here and then we'll grab xblob as well and you might notice that we're visualizing the whole data set here that's perfectly fine we could visualize train and test separately if we really wanted to but i'll leave that as a little challenge to you and we're going to go red yellow blue wonderful what do we get wrong oh of course we got something wrong center std did we spell center wrong 
cluster std that's what i missed so cluster std standard deviation what do we get wrong random seed oh this needs to be random state oh another typo you know what just as i said i wasn't getting too many typos i get three there we go look at that our first multi-class classification data set so if we set this to zero what does it do to our clusters just take note of what's going on here particularly the space between all of the dots now if we set this cluster std to zero what happens we get dots that are really just look at that that's too easy let's mix it up all right now you can pick whatever value you want here i'm going to use 1.5 because now we need to build a model that's going to draw some lines between these four colors two axes four different classes but it's not going to be perfect because we've got some red dots that are basically in the blue dots and so what's our next step well we've got some data ready it's now time to build a model so i'll see you in the next video let's build our first multi-class classification model welcome back in the last video we created our multi-class classification data set using scikit-learn's make blobs function and now why are we doing this well because we're going to put all of what we've covered so far together but instead of using binary classification or working with binary classification data we're going to do it with multi-class classification data so with that being said let's get into building a multi-class classification model so we'll create a little heading here building a multi-class classification model in pytorch and now i want you to have a think about this we spent the last few videos covering non-linearity does this data set need non-linearity as in could we separate this data set with pure straight lines or do we need some non-straight lines as well have a think about that it's okay if you're not sure we're going to be building a model to fit this data anyway or draw patterns in this data anyway and now before we get into coding a model so for multi-class classification we've got this for the input layer shape we need to define the in features so how many in features do we have for the hidden layers well we could set this to whatever we want but we're going to keep it nice and simple for now for the number of neurons per hidden layer again this could be almost whatever we want but because we're working with a relatively small data set we've only got four different classes we've only got a thousand data points we'll keep it small as well but you could change this remember you can change any of these because they're hyper parameters for the output layer shape well how many output features do we want we need one per class how many classes do we have we have four clusters of different dots here so we'll need four output features and then if we go back we have an output activation of softmax we haven't seen that yet and then we have a loss function rather than binary cross entropy we have cross entropy and then optimizer as well is the same as binary classification two of the most common sgd stochastic gradient descent or the atom optimizer but of course the torch.opt-in package has many different options as well so let's push forward and create our first multi-class classification model first we're going to create or we're going to get into the habit of creating device agnostic code and we'll set the device here equals cuda nothing we haven't seen before but again we're doing this to put it all together so that we have a lot of 
practice is available else cpu and let's go device so we should have a gpu available beautiful cuda now of course if you don't you can go change runtime type select gpu here that will restart the runtime you'll have to run all of the code that's before this cell as well but i'm going to be using a gpu you don't necessarily need one because our data set's quite small and our models aren't going to be very large but we set this up so we have device agnostic code and so let's build a multi-class classification model look at us go just covering all of the foundations of classification in general here and we now know that we can combine linear and non-linear functions to create neural networks that can find patterns in almost any kind of data so i'm going to call my class here blob model and it's going to of course inherit from nn.module and we're going to upgrade our class here we're going to take some inputs here and i'll show you how to do this if you're familiar with python classes you would have already done stuff like this but we're going to set some parameters for our models because as you write more and more complex classes you'll want to take inputs here and i'm going to pre-build the or preset the hidden units parameter to eight because i've decided you know what i'm going to start off with eight hidden units and if i wanted to change this to 128 i could but in the constructor here we've got some options so we have input features we're going to set these programmatically as inputs to our class when we instantiate it the same with output features as well and so here we're going to call self oh no super sorry i always get this mixed up dot init and underscore underscore beautiful so we could do a doc string here as well so let's write in this initializers multi-class classification if i could spell class e fiecation model oh this is great and then we have some args here this is just a standard way of writing doc strings if you want to find out this is google python docstring guide there we go google python style guide this is where i get mine from you can scroll through this this is just a way to write python code yeah there we go so we've got a little sentence saying what's going on we've got args we've got returns and we've got errors if something's going on so i highly recommend checking that out just a little tidbit so this is if someone was to use our class later on they know what the input features are input features which is an int which is number of input features to the model and then of course we've got output features which is also an int which is number of output features of the model and we've got the red line here is telling us we've got something wrong but that's okay and then the hidden features oh well this is number of output classes for the case of multi-class classification and then the hidden units int and then number of hidden units between layers and then the default is eight beautiful and then under that we'll just do that is that gonna fix itself yeah there we go we could put in what it returns returns whatever it returns and then an example use case but i'll leave that for you to fill out if you like so let's instantiate some things here what we might do is write self dot linear layer stack self.linear layer stack and we will set this as nn dot sequential oh we haven't seen this before but we're just going to look at a different way of writing a model here previously when we created a model what did we do well we instantiated each layer as its own parameter 
here and then we called on them one by one but we did it in a straightforward fashion so that's why we're going to use sequential here to just step through our layers we're not doing anything too fancy so we'll just set up a sequential stack of layers here and recall that sequential just steps through passes the data through each one of these layers one by one and because we've set up the parameters up here input features can equal to input features and output features what is this going to be is this going to be output features or is this going to be hidden units it's going to be hidden units because it's not the final layer we want the final layer to output our output features so input features this will be hidden units because remember the subsequent layer needs to line up with the previous layer output features we're going to create another one that outputs hidden units and then we'll go nn.linear in features equals hidden units because it takes the output features of the previous layer so as you see here the output features of this feeds into here the output features of this feeds into here and then finally this is going to be our final layer we'll do three layers output features equals output features wonderful so how do we know the values of each of these well let's have a look at xtrain.shape and y train dot shape so in the case of x we have two input features in the case of y well this is a little confusing as well because y is a scalar but what do you think the values for y are going to be well let's go np or is there torch.unique i'm not sure let's find out together hey torch unique 0 and 1 y train oh we need white blob train that's right blog i'm too used to writing blog blob and we need blob train but i believe it's the same here and then blob there we go so we have four classes so we need an output features value of four and now if we wanted to add non-linearity here we could put it in between our layers here like this but i asked the question before do you think that this data set needs non-linearity well let's leave it in there to begin with and one of the challenges for you oh do we need commas here i think we need commas here one of the challenges for you will be to test the model with non-linearity and without non-linearity so let's just leave it in there for the time being what's missing from this well we need a forward method so def forward self x what can we do here well because we've created this as a linear layer stack using nn dot sequential we can just go return linear layer stack and pass it x and what's going to happen whatever input goes into the forward method is just going to go through these layers sequentially oh we need to put self here because we've initialized it in the constructor beautiful and now let's create an instance of blob model and send it to the target device we'll go model 4 equals blob model and then we can use our input features parameter which is this one here and we're going to pass it a value of what 2 and then output features why because we have two x features now the output feature is going to be the same as the number of classes that we have for if we had 10 classes we'd set it to 10 so we'll go 4 and then the hidden units is going to be 8 by default so we don't have to put this here but we're going to put it there anyway and then of course we're going to send this to device and then we're going to go model 4. 
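for reference here's a condensed sketch of the model class being talked through here, it assumes the num features and num classes constants plus the device variable from the device agnostic code set up earlier, so treat it as a sketch rather than verbatim course code:

```python
from torch import nn

class BlobModel(nn.Module):
    def __init__(self, input_features, output_features, hidden_units=8):
        """Initializes a multi-class classification model.

        Args:
            input_features (int): number of input features to the model.
            output_features (int): number of output features (number of classes).
            hidden_units (int): number of hidden units between layers, default 8.
        """
        super().__init__()
        # nn.Sequential just passes the data through each layer in order
        self.linear_layer_stack = nn.Sequential(
            nn.Linear(in_features=input_features, out_features=hidden_units),
            nn.ReLU(),  # left in to begin with, the challenge is to try with and without
            nn.Linear(in_features=hidden_units, out_features=hidden_units),
            nn.ReLU(),
            nn.Linear(in_features=hidden_units, out_features=output_features),
        )

    def forward(self, x):
        return self.linear_layer_stack(x)

# instantiate and send to the target device (device was set earlier)
model_4 = BlobModel(input_features=NUM_FEATURES,   # 2 X features
                    output_features=NUM_CLASSES,   # 4 classes
                    hidden_units=8).to(device)
```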
what do we get wrong here unexpected keyword argument output features do we spell something wrong no doubt we've got a spelling mistake output features output features oh out features ah that's what we needed out features not output i got a little confused there okay there we go okay beautiful so just recall that the parameter here for nn.linear did you pick up on that is out features not output features output features a little confusing here is our final layout output layers number of features there so we've now got a multi-class classification model that lines up with the data that we're using so the shapes line up beautiful well what's next well we have to create a loss function and of course a training loop so i'll see you in the next few videos and let's do that together welcome back in the last video we created our multi-class classification model and we did so by subclassing nn.module and we set up a few parameters for our class constructor here so that when we made an instance of the blob model we could customize the input features the output features remember this lines up with how many features x has and the output features here lines up with how many classes are in our data so if we had 10 classes we could change this to 10 and it would line up and then if we wanted 128 hidden units well we could change that so we're getting a little bit more programmatic with how we create models here and as you'll see later on a lot of the things that we've built in here can also be functionalized in a similar manner but let's keep pushing forward what's our next step if we build a model if we refer to the workflow you'd see that we have to create a loss function and an optimizer for a multi-class classification model and so what's our option here for creating a loss function where do we find lost functions in pi torch i'm just going to get out of this and i'll make a new tab here and if we search torch.nn because torch.net is the basic building blocks for graphs in other words neural networks where do we find loss functions hmm here we go beautiful so we've seen that l1 loss or mse loss could be used for regression predicting a number and i'm here to tell you as well that for classification we're going to be looking at cross-entropy loss now this is for multi-class classification for binary classification we work with bce loss and of course there's a few more here but i'm going to leave that as something that you can explore on your own let's jump in to cross entropy loss so what do we have here this criterion computes remember a loss function in pi torch is also referred to as a criterion you might also see loss function referred to as cost function c-o-s-t but i call them loss functions so this criterion computes the cross-entropy loss between input and target okay so the input is something and the target is our target labels it is useful when training a classification problem with c classes there we go so that's what we're doing we're training a classification problem with c classes c is a number of classes if provided the optional argument weight should be a 1d tensor assigning a weight to each of the classes hmm so we don't have to apply a weight here but why would you apply a weight well it says if we look at weight here this is particularly useful when you have an unbalanced training set so just keep this in mind as you're going forward if you wanted to train a data set that has imbalanced samples in our case we have the same number of samples for each class but sometimes you might 
come across a data set with maybe you only have 10 yellow dots and maybe you have 500 blue dots and only 100 red and 100 light blue dots so you have an imbalanced data set so that's where you can come in and have a look at the weight parameter here but for now we're just going to keep things simple we have a balanced data set and we're going to focus on using this loss function if you'd like to read more please you can read on here and if you wanted to find out more you could go what is cross entropy loss and i'm sure you'll find a whole bunch of loss functions there we go there's the ml cheat sheet i love that the ml glossary that's one of my favorite websites towards data science you'll find that website wikipedia machine learning mastery is also another fantastic website but you can do that all in your own time let's code together hey we'll set up a loss function oh and one more resource before we get into code is that we've got the architecture while the typical architecture of a classification model the loss function for multi-class classification is cross entropy or torch dot n dot cross entropy loss let's code it out if in doubt code it out so create a loss function for multi-class classification and then we go los fn equals and then dot cross entropy loss beautiful and then we want to create an optimizer create an optimizer for multi-class classification and now the beautiful thing about optimizers is they're quite flexible they can go across a wide range of different problems so the optimizer so two of the most common and i say most common because they work quite well across a wide range of problems so that's why i've only listed two here but of course within the torch dot opt-in module you will find a lot more different optimizers but let's stick with sgd for now and we'll go back we'll go optimizer equals torch dot optim for optimizer sgd for stochastic gradient descent the parameters we want our optimizer to optimize model four we're up to our fourth model already oh my goodness model for dot parameters and we'll set the learning rate to 0.1 of course you could change the learning rate if you wanted to in fact i'd encourage you to see what happens if you do because why the learning rate is a hyper parameter hyper parameter i'm better at writing code than i am at spelling you can change wonderful so we've now got a loss function and an optimizer for a multi-class classification problem what's next well we could start to build building a training loop we could start to do that but i think we have a look at what the outputs of our model are so more specifically so getting prediction probabilities for a multi-class pie torch model so my challenge to you before the next video is to have a look at what happens if you pass x blob test through a model and remember what is the model's raw outputs what is that referred to as i'll let you have a think about that before the next video i'll see you there welcome back in the last video we created a loss function and an optimizer for our multi-class classification model and recall the loss function measures how wrong our model's predictions are and the optimizer optimizer updates our model parameters to try and reduce the loss so that's what that does and i also issued the challenge of doing a forward pass with model 4 which is the most recent model that we created and oh did i just give you some code that wouldn't work did i do that on purpose maybe maybe not you'll never know so if this did work what are the raw outputs of our model let's 
get some raw outputs of our model and if you recall the raw outputs of a model are called logits so we've got a runtime error expected all tensors to be on the same device of course why did this come up well because if we go next model 4 dot parameters and if we check device what happens here oh we need to bring this in our model is on the cuda device whereas our data is on the cpu still can we go x blob test that's our data can we check the device attribute of that i think we can i might be proven wrong here oh it's on the cpu of course we're getting a runtime error did you catch that one if you did well done so let's see what happens but before we do a forward pass how about we turn our model into eval mode to make some predictions with torch dot inference mode we'll make some predictions we don't necessarily have to do this because it's just tests but it's a good habit oh y preds equals what do we get y preds and maybe we'll just view the first ten what do we get here oh my goodness a whole bunch of numbers on a page is this the same format as our data or our test labels let's have a look no it's not okay oh we need y blob test excuse me i'm going to make that mistake a fair few times here so we need to get this into the format of this hmm how can we do that now i want you to notice one thing as well is that we have one value here per one value except that this is actually four values now why is that we have one two three four well that is because we set the out features up here our model outputs four features per sample so each sample right now has four numbers associated with it and what are these called these are the logits now what we have to do here so let's just write this down in order to evaluate and train and test our model we need to convert our model's outputs which are logits to prediction probabilities and then to prediction labels so we've done this before but for binary classification so we have to go from logits to pred probs to pred labels all right i think we can do this so we've got some logits here now how do we convert these logits to prediction probabilities well we use an activation function and if we go back to our architecture what's our output activation here for binary classification we use sigmoid but for multi-class classification we use softmax these are the two main differences between multi-class classification and binary classification one uses softmax instead of sigmoid and one uses cross entropy instead of binary cross entropy and it's going to take a little bit of practice to know this off by heart it took me a while but that's why we have nice tables like this and that's why we write a lot of code together so we're going to use the softmax function here to convert our logits our model's raw outputs which is this here to prediction probabilities and let's see that so convert our model's logit outputs to prediction probabilities so let's create y pred probs so i like to call prediction probabilities pred probs for short so torch dot softmax and then we go y logits and we want it across the first dimension so let's have a look if we print y logits we'll get the first five values there and then look at the conversion here y logits versus y pred probs that's what we want to compare the first five pred probs let's check this out oh what did we get wrong here y logits do we have y logits oh no we should change this to y logits because really that's the raw output of our model here y logits let's rerun that check that we know that these are different to these but we ideally want these to be in the same format as these
our test labels these are our models predictions and now we should be able to convert there we go okay beautiful what's happening here let's just get out of this and we will add a few code cells here so we have some space now if you wanted to find out what's happening with torch.softmax what could you do we could go torch softmax see what's happening softmax okay so here's the function that's happening we replicated some non-linear activation functions before so if you wanted to replicate this what could you do well if in doubt code it out you could code this out you've got the tools to do so we've got softmax to some x input takes the exponential of x so torch exponential over the sum of torch exponential of x so i think you could code that out if you wanted to but let's for now just stick with what we've got we've got some logits here and we've got some softmax some logits that have been passed through the softmax function so that's what's happened here we've passed our logits as the input here and it's gone through this activation function these are prediction probabilities and you might be like daniel these are still just numbers on a page but you'll also notice that none of them are negative okay and there's another little tidbit about what's going on here if we sum one of them up let's get the first one will this work and if we go torch dot sum what happens oh they all sum up to one so that's one of the effects of the softmax function and then if we go torch dot max of why pred probs so this is a prediction probability for multi-class you'll find that for this particular sample here the zeroth sample this is the maximum number and so our model what this is saying is our model is saying this is the prediction probability this is how much i think it is class 0 this number here it's in order this is how much i think it is class 1. this is how much i think it is class 2 this is how much i think it is class 3 and so we have one value for each of our 4 classes a little bit confusing because it's 0th indexed but the maximum value here is this index and so how would we get the particular index value of whatever the maximum number is across these values well we can take the arg max and we get tensor 1. 
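if you did want to code that out, here's a minimal sketch assuming y logits holds the raw test outputs from the forward pass above (torch's built-in version is numerically more stable, this is just to see the formula in action):

```python
# a quick sketch of what torch.softmax is doing, using the formula above:
# exp(x) divided by the sum of exp(x) across the class dimension
def softmax(x: torch.Tensor, dim: int = 1) -> torch.Tensor:
    return torch.exp(x) / torch.sum(torch.exp(x), dim=dim, keepdim=True)

y_pred_probs = torch.softmax(y_logits, dim=1)

# the hand-rolled version should give (almost) the same numbers
print(torch.allclose(softmax(y_logits, dim=1), y_pred_probs))

# every row of prediction probabilities sums to 1...
print(torch.sum(y_pred_probs[0]))     # -> tensor(1.)

# ...and the index of the maximum value is the predicted class
print(torch.argmax(y_pred_probs[0]))  # e.g. tensor(1) for the first sample (untrained model)
```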
so for this particular sample this one here our model and these guesses or these predictions aren't very good why is that well because our model is still just predicting with random numbers we haven't trained it yet so this is just random output here basically but for now the premise still remains that our model thinks that for this sample using random numbers it thinks that index one is the right class or class number one for this particular sample and then for this next one what's the maximum number here i think it would be the zeroth index and the same for the next one what's the maximum number here well it would be the zeroth index as well but of course these numbers are going to change once we've trained our model so how do we get the maximum index value of all of these so this is where we can go convert our models prediction probabilities to prediction labels so let's do that we can go why preds equals torch dot arg max on why pred probs and if we go across the first dimension as well so now let's have a look at why threads do we have prediction labels in the same format as our y blob test beautiful yes we do although many of them are wrong as you can see ideally they would line up with each other but because our model is predicting or making predictions with random numbers so they haven't been our model hasn't been trained all of these are basically random outputs so hopefully once we train our model it's going to line up the values of the predictions are going to line up with the values of the test labels but that is how we go from our model's raw outputs to prediction probabilities to prediction labels for a multi-class classification problem so let's just add the steps here logits raw output of the model pred probs to get the prediction probabilities use torch dot softmax or the softmax activation function pred labels take the arg max of the prediction probabilities so we're going to see this in action later on when we evaluate our model but i feel like now that we know how to go from logits to prediction probabilities to pred labels we can write a training loop so let's set that up 8.5 create a training loop and testing loop for a multi-class pi torch model this is so exciting i'll see you in the next video let's build our first training and testing loop for a multi-class pie touch model and i'll give you a little hint it's quite similar to the training and testing loops we've built before so you might want to give it a shot i think you can otherwise we'll do it together in the next video welcome back in the last video we covered how to go from raw logits which is the output of the model the raw output of the model for a multi-class pi torch model then we turned our logits into prediction probabilities using torch.softmax and then we turn those prediction probabilities into prediction labels by taking the arg max which returns the index of where the maximum value occurs in the prediction probability so for this particular sample with these four values because it outputs four values because we're working with four classes if we were working with 10 classes it would have 10 values the principal of these steps would still be the same so for this particular sample this is the value that's the maximum so we would take that index which is 1. 
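pulled together in one place, that logits to pred probs to pred labels conversion looks something like this sketch, which again assumes model 4, the device variable and the blob test data from above:

```python
# end to end: raw logits -> prediction probabilities -> prediction labels
model_4.eval()
with torch.inference_mode():
    y_logits = model_4(X_blob_test.to(device))   # raw outputs (logits), shape [n_samples, 4]

y_pred_probs = torch.softmax(y_logits, dim=1)    # prediction probabilities
y_preds = torch.argmax(y_pred_probs, dim=1)      # prediction labels (class indices)

print(y_preds[:10])        # now in the same format as y_blob_test...
print(y_blob_test[:10])    # ...though mostly wrong until the model gets trained
```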
for the next one the index 0 has the maximum value for this sample same again and then same again i mean these prediction labels are just random right so they're quite terrible but now we're going to change that because we're going to build a training and testing loop for our multi-class model let's do that so fit the multi-class model to the data let's go set up some manual seeds torch dot manual seed again don't worry too much if our numbers on the page are not exactly the same that's inherent to the randomness of machine learning we're setting up the manual seeds to try and get them as close as possible but these do not guarantee complete determinism which means the same output but we're going to try the direction is more important set number of epochs we're going to go epochs how about we just do a hundred i reckon we'll start with that we can bump it up to a thousand if we really wanted to let's put the data to the target device what's our target device well it doesn't really matter because we've set device agnostic code so whether we're working with a cpu or a gpu our code will use whatever device is available i'm typing blog again so we've got x blob train y blob train this is going to go where it's going to go to the device and y blob train to device and we're going to go x blob test and then y blob test equals x blob test to device otherwise we'll get device issues later on and we'll send this to device as well beautiful now what do we do now well we loop through data loop through data so for an epoch in range epochs for an epoch in range epochs i don't want that autocorrect come on google colab work with me here we're training our first multi-class classification model this is serious business now i'm joking it's actually quite fun so model 4 dot train and let's do the forward pass i'm not going to put much commentary here because we've been through this before but what are the logits the logits are raw outputs of our model so we'll just go x blob train and x test i didn't want that x blob train why did that do that i need to turn off autocorrect in google colab i've been saying it for a long time y pred equals torch dot softmax so what are we doing here we're going from logits to prediction probabilities here so torch softmax y logits across the first dimension and then we can take the arg max of this and dim equals one in fact i'm going to show you a little bit of oh i've written blog here maybe autocorrect would have been helpful for that a little trick you don't actually have to do the torch softmax on the logits if you just took the arg max of the logits as a little test for you just take the arg max of the logits and see do you get the same similar outputs as what you get here so i've seen that done before but for completeness we're going to use the softmax activation function because you'll often see this in practice and now what do we do we calculate the loss so the loss fn we're going to use categorical cross entropy here or just cross entropy loss so if we check our loss function what do we have we have cross entropy loss we're going to compare our model's logits to y blob train and then what are we going to do we're going to calculate the accuracy because we're working with a classification problem it'd be nice if we had accuracy as well as loss accuracy is one of the main classification evaluation metrics so we'll pass in y true equals y blob train and y pred equals y pred and now what do we do well we have to zero grad the optimizer optimizer dot zero grad then we go loss dot backward and then we step the optimizer optimize
the step step step so none of these steps we haven't covered before we do the forward pass we calculate the loss and any evaluation metric we choose to do so we zero the optimizer we perform back propagation on the loss and we step the optimizer the optimizer will hopefully behind the scenes update the parameters of our model to better represent the patterns in our training data and so we're going to go testing code here what do we do for testing code well or inference code we set our model to a val mode that's going to turn off a few things behind the scenes that our model doesn't need such as dropout layers which we haven't covered but you're more than welcome to check them out if you go torch n drop out layers do we have dropout drop out layers beautiful and another one that it turns off is batch norm beautiful and also you could search this what does model dot eval do and you might come across stack overflow question one of my favorite resources so there's a little bit of extra curriculum but i prefer to see things in action so with torch inference mode again this turns off things like gradient tracking and a few more things so we get as fast as code as possible because we don't need to track gradients when we're making predictions we just need to use the parameters that our model has learned we want x blob test to go to our model here for the test logits and then for the test spreads we're going to do the same step as what we've done here we're going to go torch dot softmax on the test logits across the first dimension and we're going to call the arg max on that to get the index value of where the maximum prediction probability value occurs we're going to calculate the test loss or loss function we're going to pass in what the test logits here then we're going to pass in y blob test compare the test logits behind the scenes our loss function is going to do some things that convert the test logits into the same format as our test labels and then return us a value for that then we'll also calculate the test accuracy here by passing in the y true as y blob test and we have the y pred equals y pred wonderful and then what's our final step well we want to print out what's happening because i love seeing metrics as our model trains it's one of my favorite things to watch if we go if epoch let's do it every 10 epochs because we've got 100 so far equals zero let's print out a nice f string with epoch and then we're going to go loss what do we put in here we'll get our loss value but we'll take it to four decimal places and we'll get the training accuracy which will be ack and we'll take this to two decimal places and we'll get a nice percentage sign there and we'll go test loss equals test loss and we'll go there and finally we'll go test ack at the end here test act now i'm sure by now we've written a fair few of these you're either getting sick of them or you're like wow i can actually do the steps through here and so don't worry we're going to be functionising all of this later on but i thought i'm going to include them as much as possible so that we can practice as much as possible together so you ready we're about to train our first multi-class classification model in three two one let's go no typos of course what did we get wrong here oh this is a fun error runtime error nll lost forward reduce cuda kernel 2d index not implemented for float okay that's a pretty full-on bunch of words there i don't really know how to describe that but here's a little hint we've got float there so we know 
that float is what float is a form of data it's a data type so potentially because that's our hint we said not implemented for float so maybe we've got something wrong up here our data is of the wrong type can you see anywhere where our data might be the wrong type well let's print it out where's our issue here why logits why blob train okay why blob train and why logits what does y blob train look like why blob train okay and what's the d type here float okay so it's not implemented for float hmm maybe we have to turn them into a different data type what if we went type torch long tensor what happens here expected all tenses to be on the same device but found at least two devices oh my goodness what do we got wrong here type torch long tensor friends guess what i found it and so it was to do with this pesky little data type issue here so if we run this again and now this one took me a while to find and i want you to know that that behind the scenes even though again this is a machine learning cooking show it still takes a while to troubleshoot code and you're going to come across this but i thought rather than spend 10 minutes doing it in a video i'll show you what i did so we went through this and we found that hmm there's something going on here i don't quite know what this is and that seems quite like a long string of words not implemented for float and then we looked back at the line where it went wrong and so that we know that maybe the float is hinting at that one of these two tenses is of the wrong data type now why would we think that it's the wrong data type well because anytime you see float or int or something like that it generally hints at one of your data types being wrong and so the error is actually right back up here where we created our tensor data so we turned our labels here into floats which generally is okay in pi torch however this one should be of type torch dot long tensor which we haven't seen before but if we go into torch long tensor let's have a look torch dot tensor do we have long tensor here we go 64-bit integer signed so why do we need torch.long tensor and again this took me a while to find so i want to express this that in your own code you probably will but your head up against some issues and errors that do take you a while to find and data types is one of the main ones but if we look in the documentation for cross entropy loss the way i kind of found this out was this little hint here the performance of the criterion is generally better when the target contains class indices as this allows for optimized computation but i read this and it says target contains class indices i'm like um al's are indices already but maybe they should be integers and not floats but then if you actually just look at the sample code you would find that they use d type equals torch dot long now that's the thing with a lot of code around the internet is that sometimes the answer you're looking for is a little bit buried but if in doubt run the code and bite your head up against a wall for a bit and keep going so let's just re-run all of this and see do we have an error here let's train our first multi-class classification model together no arrows fingers crossed what did we get wrong here okay so we've got different size we're slightly working through all of the errors in deep learning here value error input batch size 200 to match target size 200 so this is telling me maybe our test data which is of size 200 is getting mixed up with our training data which is of size 800 so 
we've got test loss the test logits model 4 what's the size let's print out print test logits.shape and y blob test so troubleshooting on the fly here everyone what do we got torch size 800. where are our test labels coming from why blob test equals oh there we go ah did you catch that before maybe you did maybe you didn't but i think we should be right here now if we just comment out this line so we've had a data type issue and we've had a shape issue two of the main and machine learning oh again we've had some issues why blob test what's going on here i thought we'd just change the shape oh no we have to go up and reassign it again because now this is definitely why blob yes let's rerun all of this reassign our data we are running into every single error here but i'm glad we're doing this because otherwise you might not see how to troubleshoot these type of things so the size of a tensor must match the size oh we're getting the issue here test spreads oh my goodness we have written so much code here test spread so instead of why pred this should be test spreads fingers crossed are we training our first model yet or what there we go okay and we're printing out some stuff i don't really want to print out that stuff and i want to see the loss go down so i'm going to so friends i hope you know that we've just been through some of the most fundamental troubleshooting steps and you might say oh daniel that's a cop out because you're just coding wrong and in fact i code wrong all the time but we've now worked out how to troubleshoot them shape errors and data type errors but look at this after all of that thank goodness our loss and accuracy go in the directions that we want them to go so our loss goes down and our accuracy goes up beautiful so it looks like that our model is working on a multi-class classification data set so how do we check that well we're going to evaluate it in the next step by visualize visualize visualize so you might want to give that a shot see if you can use our plot decision boundary function or use our model to separate the data here so it's going to be much the same as what we did for binary classification but this time we're using a different model and a different data set i'll see you there welcome back in the last video we went through some of the steps that we've been through before in terms of training and testing a model but we also butted our heads up against two of the most common issues in machine learning and deep learning in general and that is data type issues and shape issues but luckily we were able to resolve them and trust me you're going to run across many of them in your own deep learning and machine learning endeavors so i'm glad that we got to have a look at them and sort of i could show you what i do to troubleshoot them but in reality it's a lot of experimentation run the code see what errors come out google the errors read the documentation try again but with that being said it looks like that our model our multi-class classification model has learned something the loss is going down the accuracy is going up but we can further evaluate this by making and evaluating predictions with a pi torch multi-class model so how do we make predictions we've seen this step before but let's reiterate make predictions we're going to set our model to what mode a vowel mode and then we're going to turn on what context manager inference mode because we want to make inference we want to make predictions now what do we make predictions on or what are the 
predictions they're going to be logits because why they are the raw outputs of our model so we'll take model 4 which we just trained and we'll pass it the test data well it needs to be x blob test by the way i keep getting that variable mixed up we just had enough problems with the data daniel we don't need any more you're completely right i agree with you but we're probably going to come across some more problems in the future don't you worry about that so let's view the first 10 predictions y logits what do they look like all right just numbers on the page they're raw logits now how do we go from logits to prediction probabilities how do we do that with a multi-class model we go y pred probs equals torch dot softmax on the y logits and we want to do it across the first dimension and what do we have when we go y pred probs let's go up to the first ten are we apples to apples yet what does our y blob test look like we're not apples to apples yet but we're close so these are prediction probabilities you'll notice that we get some fairly different values here and remember the value closest to one here which looks like it's this the index of the maximum value is going to be our model's predicted class so this index is index one and does it correlate here yes one beautiful then we have index three which is the maximum value here three beautiful and then we have what do we have here index two yes okay wonderful but let's not step through that we're programmers we can do this with code so now let's go from pred probs to pred labels so y preds equals how do we do that well we can do torch dot arg max on the y pred probs and then we can pass dim equals one we could also do it this way y pred probs dot arg max there's no real difference between these two but we're just going to do it this way call torch dot arg max y preds let's view the first ten are we now comparing apples to apples when we go y blob test yes we are have a look at that one three two one zero three one three two one zero three beautiful now we could line these up and look at and compare them all day i mean that would be fun but i know something that would be even more fun let's get visual so plt dot figure and we're going to go fig size equals 12 6 just because the beauty of this being a cooking show is i kind of know what ingredients work ahead of time despite what you saw in the last video with all that troubleshooting but i'm actually glad that we did that because seriously shape issues and data type issues you're going to come across a lot of them they're two of the main issues i troubleshoot aside from device issues so let's go x blob train and y blob train we're going to do another plot here we're going to get subplot 1 2 2 and we're going to do this for the test data test and then plot decision boundary with model 4 on x blob test and y blob test as well let's see this did we train a multi-class oh my goodness yes we did our code worked faster than i can speak look at that beautiful looking plot we've separated our data about as well as we could though there are some dots here that are quite hard to separate because they sit inside the other clusters and now what's the thing about these lines how has this model worked i posed the question a fair few videos ago when we created our multi-class model could we separate this data without non-linear functions so how about we just test that since we've got the code ready let's go back up we've got non-linear functions here we've got relu
so i'm just going to recreate our model there i just took the relu out that's all i did commented it out this code will still all work and all fingers crossed it will don't count your chickens before they hatch don't you know come on we're just going to rerun all of these cells all the code's going to stay the same all we did was we took the non-linearity out of our model is it still going to work oh my goodness it still works now why is that well you'll notice that the lines are a lot straighter here did we get different metrics i'll leave that for you to compare maybe these will be a little bit different i don't think they're too far different but that is because our data is linearly separable so we can draw only straight lines to separate our data however a lot of the data that you deal with in practice will require linear and non-linear functions hence why we spent a lot of time on that like the circle data that we covered before and let's look up an image of a pizza if you're building a food vision model to take photos of food and separate different classes of food could you do this with just straight lines you might be able to but i personally don't think that i could build a model to do such a thing and in fact pytorch makes it so easy to add non-linearities to our model we might as well have them in so that our model can use them if it needs them and if it doesn't need them well hey it's still going to build a pretty good model as we saw before when we included the non-linearities in our model so we could uncomment these and our model is still going to perform quite well that is the beauty of neural networks they decide the numbers that should represent our data the best and so with that being said we've evaluated our model we've trained our multi-class classification model we've put everything together we've gone from binary classification to multi-class classification i think there's just one more thing that we should cover and that is let's go here section number nine a few more classification metrics so as i said before evaluating a model let's just put it here to evaluate our model our classification models that is evaluating a model is just as important as training a model so i'll see you in the next video let's cover a few more classification metrics welcome back in the last video we evaluated our multi-class classification model visually and we saw that it did pretty darn well because our data turned out to be linearly separable so our model even without non-linear functions could separate the data here however as i said before most of the data that you deal with will require some form of linear and non-linear function so just keep that in mind and the beauty of pytorch is that it allows us to create models with linear and non-linear functions quite flexibly so let's write down here if we wanted to further evaluate our classification models we've seen accuracy so accuracy is one of the main methods of evaluating classification models this is like saying out of 100 samples how many does our model get right and we've seen our model right now has a testing accuracy of nearly 100 so it's nearly perfect but of course there were a few tough samples i mean some of them are even within the other samples so you can forgive it a little bit here for not being exactly perfect what are some other metrics here well we've also got precision and we've also got recall both of these will be pretty important when you have classes with different amounts of values in them
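(by the way, if you want to see roughly what "commenting the relu layers out" looks like in code, here's a sketch — the class name is just illustrative, and the layer sizes of 2 input features, 4 output classes and 8 hidden units are assumptions about the blob data from earlier, so adjust them to match yours)

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

class BlobModel(nn.Module):
    """same multi-class model, non-linearities commented out (straight lines only)"""
    def __init__(self, input_features=2, output_features=4, hidden_units=8):
        super().__init__()
        self.linear_layer_stack = nn.Sequential(
            nn.Linear(in_features=input_features, out_features=hidden_units),
            # nn.ReLU(),   # <- commented out
            nn.Linear(in_features=hidden_units, out_features=hidden_units),
            # nn.ReLU(),   # <- commented out
            nn.Linear(in_features=hidden_units, out_features=output_features),
        )

    def forward(self, x):
        return self.linear_layer_stack(x)

model_4 = BlobModel().to(device)
```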
so that's precision and recall whereas accuracy is pretty good to use when you have balanced classes so this is just text on a page for now then there's f1 score which combines precision and recall there's also a confusion matrix and there is also a classification report so i'm going to show you a few code examples of where you can access these and i'm going to leave it to you as extracurriculum to try each one of these out so let's go into the keynote and by the way you should pat yourself on the back here because we've just gone through all of the pytorch workflow for a classification problem not only just binary classification we've done multi-class classification as well so let's not stop there though remember evaluating a model is just as important as building a model so we've been through non-linearity we've seen how we could replicate non-linear functions we've talked about the machine learning explorer's motto visualize visualize visualize and the machine learning practitioner's motto experiment experiment experiment i think i called that the machine learning or data scientist motto same thing you know and steps in modeling with pytorch we've seen this in practice so we don't need to look at these slides i mean they'll be available on the github if you want them but here we are some common classification evaluation methods so we have accuracy there's the formal formula if you want but there's also code which is what we've been focusing on so we wrote our own accuracy function which replicates this by the way tp stands for not toilet paper it stands for true positive tn is true negative fp is false positive fn is false negative and for the code we could do torch metrics oh what's that but when should you use it it's the default metric for classification problems note it is not the best for imbalanced classes so if you had for example a thousand samples of one class say label number one but you had only ten samples of class zero accuracy might not be the best metric to use then for imbalanced data sets you might want to look into precision and recall so there's a great article called i think it's beyond accuracy precision and recall which gives a fantastic overview of there we go this is what i'd recommend there we go by will koehrsen so i'd highly recommend this article as some extra curriculum here see this article for when to use precision recall we'll go there now if we look back there is the formal formula for precision true positives over true positives plus false positives so higher precision leads to fewer false positives so if false positives are not ideal you probably want to increase precision and if false negatives are not ideal you want to increase your recall metric however you should be aware that there is such a thing as a precision recall trade-off and you're going to find this in your experimentation precision recall trade-off so that means that generally if you increase precision you lower recall and inversely if you increase recall you lower precision so check that out just to be aware of it but again you're going to learn this through practice of evaluating your models if you'd like some code to do precision and recall you've got torchmetrics.precision or torchmetrics.recall as well as scikit-learn so scikit-learn has implementations for many different classification metrics and torchmetrics is a pytorch-like library and then we have f1 score which combines precision and recall so it's a good combination if you want something in between these two
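(a quick sketch of what a few of these metrics look like in code using scikit-learn — torchmetrics has equivalents like torchmetrics.Precision, torchmetrics.Recall and torchmetrics.F1Score — assuming the y_preds and y_blob_test tensors from our multi-class model exist)

```python
# extra classification metrics with scikit-learn (sketch)
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             confusion_matrix, classification_report)

y_true = y_blob_test.cpu().numpy()   # move tensors to cpu/numpy for scikit-learn
y_hat = y_preds.cpu().numpy()

print(precision_score(y_true, y_hat, average="macro"))
print(recall_score(y_true, y_hat, average="macro"))
print(f1_score(y_true, y_hat, average="macro"))
print(confusion_matrix(y_true, y_hat))
print(classification_report(y_true, y_hat))
```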
and then finally there's a confusion matrix i haven't listed a classification report here but i've listed it up above and we can see a classification report in scikit-learn classification report kind of just puts together all of the metrics that we've talked about and we can go there but i've been talking a lot about torchmetrics so let's look up torchmetrics accuracy torchmetrics so this is a library i don't think it comes with google colab at the moment but you can import torchmetrics and you can initialize a metric so we've built our own accuracy function but the beauty of using torchmetrics is that it uses pytorch-like code so we've got metric preds and target and then we can find out what the value of the accuracy is and if you wanted to implement your own metrics you could subclass the metric class here but let's just practice this so let's check to see i'm going to grab this and copy this in here if you want access to a lot of pytorch metrics see torchmetrics so can we import torchmetrics maybe it's already in google colab no not here but that's all right we'll go pip install torchmetrics so google colab has access to torchmetrics and that's going to download torchmetrics it shouldn't take too long it's quite a small package beautiful and now we're going to go from torchmetrics import accuracy wonderful and let's see how we can use this so setup metric so we're going to go torchmetric underscore accuracy we could call it whatever we want really but we need accuracy here i'm just going to set up the class and then we're going to calculate the accuracy of our multi-class model by calling torchmetric accuracy and we're going to pass it y preds and y blob test let's see what happens here oh what did we get wrong runtime error expected all tensors to be on the same device but found at least two devices oh of course now remember how i said torchmetrics implements pytorch-like code well let's check what device this is on oh it's on the cpu so something to be aware of is that if you use torchmetrics you have to make sure your metrics are on the same device as your data by using device agnostic code so if we run this what do we get beautiful we get an accuracy of 99.5 percent which is in line with the accuracy function that we coded ourselves so if you'd like a lot of pre-built metrics functions be sure to see either scikit-learn for any of these or torchmetrics for any pytorch-like metrics but just be aware if you use the pytorch version they have to be on the same device and if you'd like to explore it what do we have where's the metrics module metrics do we have classification there we go so look how many different types of classification metrics there are from torchmetrics so i'll leave that for you to explore the resources for this will be here this is an extracurricular article for when to use precision recall and another extra curriculum would be to go through the torchmetrics module for 10 minutes and have a look at the different metrics for classification so with that being said i think we've covered a fair bit but i think it's also time for you to practice what you've learned so let's cover some exercises in the next video i'll see you there welcome back in the last video we looked at a few more classification metrics a little bit of a high level overview of some more ways to evaluate your classification models and i'll link some extra curriculum here that you might want to look into as well but we have covered a whole bunch of code together
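(for reference, the torchmetrics accuracy snippet from this video looks roughly like this — note that newer torchmetrics versions need the task and num_classes arguments while older ones accept Accuracy() with no arguments, and num_classes=4 here is an assumption about our blob data)

```python
# !pip install torchmetrics   # if it isn't already in your environment
import torch
from torchmetrics import Accuracy

device = "cuda" if torch.cuda.is_available() else "cpu"

# the metric has to live on the same device as the tensors it's comparing
torchmetrics_accuracy = Accuracy(task="multiclass", num_classes=4).to(device)

print(torchmetrics_accuracy(y_preds, y_blob_test))  # e.g. tensor(0.9950, device='cuda:0')
```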
it's time for you to practice some of this stuff on your own and so i have some exercises prepared now where do you go for the exercises well remember on the learn pie torch dot io book for each one of these chapters there's a section now just have a look at how much we've covered if i scroll just keep scrolling look at that we've covered all of that in this module that's a fair bit of stuff but down the bottom of each one is an exercises section so all exercises are focusing on practicing the code in the sections above all of these sections here we've got number one two three four five six seven yeah seven exercises nice writing plenty of code and then of course extra curriculum so these are some challenges that i've mentioned throughout the entire section zero two but i'm going to link this in here exercises but of course you can just find it on the learn pitch.io book so if we come in here and we just create another heading exercises and extracurriculum and then we just write in here c exercises and extracurriculum here and so if you'd like a template of the exercise code you can go to the pie touch deep learning repo and then within the extras folder we have exercises and solutions you might be able to guess what's in each of these exercises we have o2 pi torch classification exercises this is going to be some skeleton code and then of course we have the solutions as well now this is just one form of solutions but i'm not going to look at those because i would recommend you looking at the exercises first before you go into the solutions so we have things like import torch set up device agnostic code create a data set turn data into a data frame and then etc etc for the things that we've done throughout this section so give that a go try it on your own and if you get stuck you can refer to the notebook that we've coded together all of this code here you can refer to the documentation of course and then you can refer to as a last resort the solutions notebooks so give that a shot and congratulations on finishing section 02 pie torch classification now if you're still there you're still with me let's move on to the next section we're going to cover a few more things of deep learning with pytorch i'll see you there hello and welcome back we've got another section we've got computer vision and convolutional neural networks with torch now computer vision is one of my favorite favorite deep learning topics but before we get into the materials let's answer a very important question and that is where can you get help so first and foremost is to follow along with the code as best you can we're going to be writing a whole bunch of pie torch computer vision code and remember our motto if in doubt run the code see what the inputs and outputs are of your code and that's try it yourself if you need the doc string to read about what the function you're using does you can press shift command and space in google colab or it might be control if you're on windows otherwise if you're still stuck you can search for the code that you're running you might come across stack overflow or the pi torch documentation we've spent a bunch of time in the pi torch documentation already and we're going to be referencing a whole bunch in the next module in section three we're up to now if you go through all of these four steps the next step is to try it again if in doubt run the code and then of course if you're still stuck you can ask a question on the pytorch deep learning repo discussions tab now if we open this up we 
can go new discussion and you can write here section 03 for the computer vision my problem is and then in here you can write some code be sure to format it as best you can that way it'll help us answer it and then go what's happening here now why do i format the code in these backticks here it's so that it looks like code and that it's easier to read when it's formatted on the github discussion then you can select a category if you have a general chat an idea a poll a q a or a show and tell of something you've made of what you've learned from the course for question and answering you'll want to put it as q a then you can click start discussion and it'll appear here and that way they'll be searchable and we'll be able to help you out so i'm going to get out of this and oh speaking of resources we've got the pytorch deep learning repo the links will be where you need the links all of the code that we're going to write in this section is contained within the section 3 notebook pie torch computer vision now this is a beautiful notebook annotated with heaps of text and images you can go through this on your own time and use it as a reference to help out if you get stuck on any of the code we write through the videos check it out in this notebook because it's probably here somewhere and then finally let's get out of these if we come to the book version of the course this is learnpinetorch.io we've got home this will probably be updated by the time you look at that but we have section 03 which is pie torch computer vision it's got all of the information about what we're going to cover in a book format and you can of course skip ahead to different subtitles see what we're going to write here all of the links you need and extra resources will be at learn pi torch dot io and for this section it's pi torch computer vision with that being said speaking of computer vision you might have the question what is a computer vision problem well if you can see it it could probably be phrased at some sort of computer vision problem that's how broad computer vision is so let's have a few concrete examples we might have a binary classification problem such as if we wanted to have two different images is this photo of steak or pizza we might build a model that understands what steak looks like in an image this is a beautiful dish that i cooked by the way this is of me eating pizza at a cafe with my dad and so we could have binary classification one thing or another and so our machine learning model may take in the pixels of an image and understand the different patterns that go into what a steak looks like and the same thing with a pizza now the important thing to note is that we won't actually be telling our model what to learn it will learn those patterns itself from different example of images then we could step things up and have a multi-class classification problem you noticing a trend here we've covered classification before but classification can be quite broad it can be across different domains such as vision or text or audio but if we were working with multi-class classification for an image problem we might have is this a photo of sushi steak or pizza and then we have three classes instead of two but again this could be a hundred classes such as what nutrify uses which is an app that i'm working on we go to nutrify.app this is bare bones at the moment but right now nutrify can classify up to a hundred different foods so if you were to upload an image of food let's give it a try nutrify we'll go into 
images and we'll go into sample food images and how about some chicken wings what does it classify this as chicken wings beautiful and then if we uploaded an image of not food maybe let's go to nutrify this is on my computer by the way you might not have a sample folder set up like this and then if we upload a photo of a cyber truck what does it say no food found please try another image so behind the scenes nutrify is using the pixels of an image and then running them through a machine learning model and classifying it first whether it's food or not food and then if it is food classifying it as what food it is so right now it works for 100 different foods so if we have a look at all these it can classify apples artichokes avocados barbecue sauce each of these work at different levels of performance but that's just something to keep in mind of what you can do so the whole premise of nutrify is to upload a photo of food and then learn about the nutrition about it so let's go back to our keynote what's another example well we could use computer vision for object detection where you might answer the question is where's the thing we're looking for so for example this car here i caught them on security camera actually did a hit and run on my new car it wasn't that much of an expensive car but i parked it on the street and this person the trailer came off the back of their car and hit my car and then they just picked the trailer up and drove away but i went to my neighbor's house and had a look at their security footage and they found this car so potentially you could design a machine learning model to find this certain type of car it was an orange jute by the way but the images were in black and white to detect to see if it ever recognizes a similar car that comes across the street and you go hey did you crash into my car the other day i didn't actually find who it was so sadly it was a hit and run but that's object detection finding something in an image and then you might want to find out what are the different sections in this image so this is a great example at what apple uses on their devices iphones and ipads and whatnot to segregate or segment the different sections of an image so person 1 person 2 skin tones hair sky original and then it enhances each of these sections in different ways so that's a practice known as computational photography but the whole premise is how do you segment different portions of an image and then there's a great blog post here that talks about how it works and what it does and what kind of model that's used i'll leave that as extracurriculum if you'd like to look into it so if you have these images how do you enhance the sky how do you make the skin tones look how they should how do you remove the background if you really wanted to so all of this happens on device so that's where i got that image from by the way semantic mass and this is another great blog apple machine learning research so keep this in mind we're about to see another example for computer vision which is tesla computer vision a lot of companies have websites such as apple machine learning research where they share a whole bunch of what they're up to in the world of machine learning so in tesla's case they have eight cameras on each of their self-driving cars that fuels their full self-driving beta software and so they use computer vision to understand what's going on in an image and then plan what's going on so this is three-dimensional vector space and what this means is they're basically 
taking these different viewpoints from the eight different cameras feeding them through some form of neural network and turning the representation of the environment around the car into a vector so a long string of numbers and why would it do that well because computers understand numbers far more than they understand images so we might be able to recognize what's happening here but for a computer to understand it we have to turn it into vector space and so if you want to have a look at how tesla uses computer vision so this is from tesla's ai day video i'm not going to play it all because it's three hours long but i watched it and i really enjoyed it so there's some information there and there's a little tidbit there if you go to two hours and one minute and 31 seconds of the same video have a look at what tesla do well would you look at that where have we seen that before that's some device agnostic code but with tesla's custom dojo chip so tesla uses pi torch so the exact same code that we're writing tesla uses similar python code to of course they write python code to suit their problem but nonetheless they use pytorch code to train their machine learning models that power their self-driving software so how cool is that and if you want to have a look at another example there's plenty of different tesla self-driving videos so oh we can just play it right here i was going to click the link so look this is what happens if we have a look in the environment tesla the cameras understand what's going on here and then it computes it into this little graphic here on your heads-up display in the car and it kind of understands well i'm getting pretty close to this car i'm getting pretty close to that car and then it uses this information about what's happening this perception to plan where it should drive next and i believe here it ends up going into it has to stop yeah there we go because we've got a stop sign look at that it's perceiving the stop sign it's got two people here i just saw a car drive past across this street so that is pretty darn cool that's just one example of computer vision one of many and how would you find out what computer vision can be used for here's what i do what can computer vision be used for plenty more resources so oh there we go 27 most popular computer vision applications in 2022. 
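(as a quick aside, the device agnostic pattern that popped up in the tesla ai day clip is the same one used throughout this course — a minimal version looks something like this)

```python
import torch

# device-agnostic setup: use the GPU if it's available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# then send tensors (and models) to that device
x = torch.rand(3, 4).to(device)
print(x.device)
```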
so we've covered a fair bit there but what are we going to cover specifically with pytorch code well broadly my bad we're going to get a vision data set to work with using torch vision so pytorch has a lot of different domain libraries torch vision helps us deal with computer vision problems and there's existing data sets that we can leverage to play around with computer vision we're going to have a look at the architecture of a convolutional neural network also known as a cnn with pytorch we're going to look at an end-to-end multi-class image classification problem so multi-class is what more than one thing or another could be three classes could be a hundred we're going to look at steps at modeling with cnns in pi torch so we're going to create a convolutional neural network with pi torch we're going to pick a loss function and optimize it to suit our problem we're going to train a model training a model a model a little bit of a timeframe there and then we're going to evaluate a model right so we might have typos with our text but we'll have less typos in the code and how are we going to do this well we could do it cook so we could do it chemist well we're going to do it a little bit of both part art part science but since this is a machine learning cooking show we're going to be cooking up lots of code so in the next video we're going to cover the inputs and outputs of a computer vision problem i'll see you there so in the last video we covered what we're going to cover broadly and we saw some examples of what computer vision problems are essentially anything that you're able to see you can potentially turn into a computer vision problem and we're going to be cooking up lots of machine learning or specifically pi torch computer vision code you see i fixed that typo now let's talk about what the inputs and outputs are of a typical computer vision problem so let's start with a multi-classification example and say we wanted to take photos of different images of food and recognize what they were so we're replicating the functionality of nutrify so take a photo of food and learn about it so we might start with a bunch of food images that have a heightened width of some sort so we have width equals 224 height equals 224 and then they have three color channels why three well that's because we have a value for red green and blue so if we look at this up if we go red green blue image format so 24 bit rgb images so a lot of images or digital images have some value for a red pixel a green pixel and a blue pixel and if you were to convert images into numbers they get represented by some value of red some value of green and some value of blue that is exactly the same as how we'd represent these images so for example this pixel here might be a little bit more red a little less blue and a little less green because it's close to orange and then we convert that into numbers so what we're trying to do here is essentially what we're trying to do with all of the data that we have with machine learning is represented as numbers so the typical image format to represent an image because we're using computer vision so we're trying to figure out what's in an image the typical way to represent that is in a tensor that has a value for the height width and color channels and so we might numerically encode these in other words represent our images as a tensor and this would be the inputs to our machine learning algorithm and in many cases depending on what problem you're working on an existing algorithm already 
exists for many of the most popular computer vision problems and if it doesn't you can build one and then you might fashion this machine learning algorithm to output the exact shapes that you want in our case we want three outputs we want one output for each class that we have we want a prediction probability for sushi we want a prediction probability for steak and we want a prediction probability for pizza now in our case in this iteration looks like our model got one of them wrong because the highest value was assigned to the wrong class here so for the second image it assigned a prediction probability of 0.81 for sushi now keep in mind that you could change these classes to whatever your particular problem is i'm just simplifying this in making it three you could have a hundred you could have a thousand you could have five it's just it depends on what you're working with and so we might use these predicted outputs to enhance our app so if someone wants to take a photo of their plate of sushi our app might say hey this is a photo of sushi here's some information about those the sushi rolls or the same for steak the same for pizza now it might not always get it right because after all that's what machine learning is it's probabilistic so how would we improve these results here well we could show our model more and more images of sushi steak and pizza so that it builds up a better internal representation of said images so when it looks at images it's never seen before or images outside its training data set it's able to get better results but just keep in mind this whole process is similar no matter what computer vision problem you're working with you need a way to numerically encode your information you need a machine learning model that's capable of fitting the data in the way that you would like it to be fit in our case classification you might have a different type of model if you're working with object detection a different type of model if you're working with segmentation and then you need to fashion the outputs in a way that best suit your problem as well so let's push forward oh by the way the model that often does this is a convolutional neural network in other words a cnn however you can use many other different types of machine learning algorithms here it's just that convolutional neural networks typically perform the best with image data although with recent research there is the transformer architecture or deep learning model that also performs fairly well or very well with image data so just keep that in mind going forward but for now we're going to focus on convolutional neural networks and so we might have input and output shapes because remember one of the chief machine learning problems is making sure that your tensor shapes line up with each other the input and output shapes so if we encoded this image of stake here we might have a dimensionality of batch size width height color channels and now the ordering here could be improved it's usually height then width so alphabetical order and then color channels last so we might have the shape of none two two four two four three now where does this come from so none could be the batch size now it's none because we can set the batch size to whatever we want say for example 32 then we might have a height of two to four and a width of two two four and three color channels now height and width are also customizable you might change this to be 512 by 512 what that would mean is that you have more numbers representing your image and 
in essence it would take more computation to figure out the patterns because there is simply more information encoded in your image but 224 by 224 is a common starting point for images and then 32 is also a very common batch size as we've seen in previous videos but again this could be changed depending on the hardware you're using depending on the model you're using you might have a batch size of 64 you might have a batch size of 512. it's all problem specific and that's this line here these will vary depending on the problem you're working on so in our case our output shape is three because we have three different classes for now but again if you have a hundred you might have an output shape of a hundred if you have a thousand you might have an output shape of a thousand the same premise of this whole pattern remains though numerically encode your data feed it into a model and then make sure the output shape fits your specific problem and so for this section computer vision with pytorch we're going to be building cnns to do this part we're actually going to do all of the parts here but we're going to focus on building a convolutional neural network to try and find patterns in data because it's not always guaranteed that it will finally let's look at one more problem say you had grayscale images of fashion items and you have quite small images that are only 28 by 28. the exact same pattern is going to happen you numerically represent it use it as inputs to a machine learning algorithm and then hopefully your machine learning algorithm outputs the right type of clothing that it is in this case it's a t-shirt but i've got dot dot dot here because we're going to be working on a problem that uses 10 different types of items of clothing and the images are grayscale so there's not much detail so hopefully our machine learning algorithm can recognize what's going on in these images there might be a boot there might be a shirt there might be pants there might be a dress etc etc but we numerically encode our images into a dimensionality of batch size height width color channels this is known as nhwc or number of images in a batch height width c for color channels this is color channels last so why am i showing you two forms of this do you notice color channels in this one is color channels first so color channels height width well because you come across a lot of different representations of data full stop but particularly image data in pytorch and other libraries many libraries expect color channels last however pytorch currently at the time of recording this video it may change in the future defaults to representing image data with color channels first now this is very important because you will get errors if your dimensionality is in the wrong order and so there are ways to go in between these two and there's a lot of debate about which format is best it looks like color channels last is going to win over the long term just because it's more efficient but again that's outside the scope of this course so just keep this in mind we're going to write code to interact between these two but it's the same data just represented in a different order and so we could rearrange these shapes to how we want color channels last or color channels first and once again the shapes will vary depending on the problem that you're working on so with that being said we've covered the input and output shapes how are we going to see them with code well of course we're going to be following the pytorch workflow that we've used before
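(to make the channel-ordering point concrete, here's a small sketch of going between the two formats with permute — the shapes are just example values)

```python
import torch

# the same batch of images in the two common channel orders
images_nchw = torch.rand(32, 3, 224, 224)        # [batch, color channels, height, width] - pytorch default
images_nhwc = images_nchw.permute(0, 2, 3, 1)    # rearrange to [batch, height, width, color channels]

print(images_nchw.shape)  # torch.Size([32, 3, 224, 224])
print(images_nhwc.shape)  # torch.Size([32, 224, 224, 3])
```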
so we need to get our data ready turn it into tensors in some way shape or form we can do that with torchvision transforms oh we haven't seen that one yet but we will and we can use torch.utils.data.dataset and torch.utils.data.dataloader we can then build a model or pick a pre-trained model to suit our problem we've got a whole bunch of modules to help us with that torch.nn torch.nn.module torchvision.models and then we have an optimizer and a loss function we can evaluate the model using torchmetrics or we can code our own metric functions we can of course improve through experimentation which we'll see later on which we've actually we've done that right we've done improvement through experimentation we've tried different models we've tried different things and then finally we can save and reload our trained model if we wanted to use it elsewhere so with that being said we've covered the workflow this is just a high level overview of what we're going to code you might be asking the question what is a convolutional neural network or a cnn let's answer that in the next video i'll see you there welcome back in the last video we saw examples of computer vision input and output shapes and we kind of hinted at the fact that convolutional neural networks or cnns are deep learning models that are quite good at recognizing patterns in images so we left off the last video with the question what is a convolutional neural network and where could you find out about that what is a convolutional neural network here's one way to find out and i'm sure as you've seen there's a lot of resources for such things a comprehensive guide to convolutional neural networks which one of these is the best well it doesn't really matter the best one is the one that you understand the best so there we go there's a great video from code basics i've seen that one before simple explanation of convolutional neural network i'll leave you to research these things on your own and if you wanted to look at images there's a whole bunch of images i prefer to learn things by writing code because remember this course is code first as a machine learning engineer 99 percent of my time is spent writing code so that's what we're going to focus on but anyway here's the typical architecture of a cnn in other words a convolutional neural network if you hear me say cnn i'm not talking about the news website in this course i'm talking about the architecture convolutional neural network so this is some pytorch code that we're going to be working towards building but we have some hyperparameters and layer types here we have an input layer so we have an input layer which takes some in channels and an input shape because remember it's very important in machine learning and deep learning to line up your input and output shapes of whatever model you're using whatever problem you're working with then we have some sort of convolutional layer now what might happen in a convolutional layer well as you might have guessed as with what happens in many neural networks the layers perform some sort of mathematical operation now convolutional layers perform a convolving window operation across an image or across a tensor and discover patterns using let's have a look actually let's go nn.conv2d there we go this is what happens so the output for each output channel equals a bias term plus the sum over each input channel k of that output channel's weight tensor cross-correlated with input channel k now if you want to dig deeper into what is actually going on here you're more than welcome to do that
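(here's a small sketch of that layer in code — the hyperparameter values are just example choices, not necessarily the ones we'll use later in the section)

```python
import torch
from torch import nn

# a single convolutional layer and the learnable weight/bias it applies to the input
conv_layer = nn.Conv2d(in_channels=3,     # color channels of the input
                       out_channels=10,   # number of filters / hidden units
                       kernel_size=3,
                       stride=1,
                       padding=1)

fake_image = torch.rand(1, 3, 64, 64)     # [batch, color channels, height, width]
output = conv_layer(fake_image)

print(conv_layer.weight.shape)  # torch.Size([10, 3, 3, 3]) - the weight tensor
print(conv_layer.bias.shape)    # torch.Size([10]) - the bias vector
print(output.shape)             # torch.Size([1, 10, 64, 64])
```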
but we're going to be writing code that leverages torch.nn.conv2d and we're going to fix up all of these hyperparameters here so that it works with our problem now what you need to know here is that this is a bias term we've seen this before and this is a weight matrix so typically a bias vector and a weight matrix and they operate over the inputs but we'll see these later on with code so just keep that in mind this is what's happening as with every layer in a neural network some form of operation is happening on our input data and those operations happen layer by layer until eventually hopefully they can be turned into some usable output so let's jump back in here then we have a hidden activation slash non-linear activation because why do we use non-linear activations well it's because if our data was non-linear so non-straight lines we need the help of straight and non-straight lines to model it to draw patterns in it then we typically have a pooling layer and i want you to take this architecture i've said typical here for a reason because these types of architectures are changing all the time so this is just one typical example of a cnn it's about as basic a cnn as you can get so over time you will start to learn to build more complex models and you will not only start to learn to build them you will also start to learn to use them as we'll see later on in the transfer learning section of the course and then we have an output layer so do you notice the trend here we have an input layer and then we have multiple hidden layers that perform some sort of mathematical operation on our data and then we have an output linear layer that converts our output into the ideal shape that we'd like so we have an output shape here and then how does this look in process well we put in some images they go through all of these layers here because we've used nn.sequential and then hopefully this forward method returns x in a usable state that we can convert into class names and then we could integrate this into our computer vision app in some way shape or form and here's the asterisk here note there are almost an unlimited amount of ways you could stack together a convolutional neural network this slide only demonstrates one so just keep that in mind it only demonstrates one but the best way to practice this sort of stuff is not to stare at a page it's to if in doubt code it out so let's code i'll see you in google colab welcome back now we've discussed a bunch of fundamentals about computer vision problems and convolutional neural networks but rather than talk to more slides let's start to code them out i'm going to meet you at colab.research just going to clean up some of these tabs and i'm going to start a new notebook and then i'm going to name this one this is going to be 03 pytorch computer vision and i'm going to call mine video so just so it has the video tag because if we go in here if we go to the video notebooks of the pytorch deep learning repo the video notebooks are stored in here they've got the underscore video tag so the video notebooks have all of the code i write exactly in the video but there are some reference notebooks to go along with it let me just write a heading here pytorch computer vision and i'll put a resource here see reference notebook now of course this is the one that's the ground truth it's got all of the code that we're going to be writing i'm going to put that in here explained with text and images and whatnot and then finally we've got see reference online book and that is at learn pi
torch dot io at section number three pi torch computer vision going to put that in there and then i'm going to turn this into markdown with command mm beautiful so let's get started i'm going to get rid of this get rid of this how do we start this off well i believe there are some computer vision libraries that you should be aware of computer vision libraries in pytorch so this is just going to be a text based cell but the first one is torch vision which is the base domain library for pytorch computer vision so if we look up torch vision what do we find we have torch vision 0.12 that's the version that put vision is currently up to at the time of recording this so in here this is very important to get familiar with if you're working on computer vision problems and of course in the documentation this is just another tidbit we have torch audio for audio problems we have torch text for text torch vision which is what we're working on torch rec for recommendation systems torch data for dealing with different data pipelines torch serve which is for serving pi torch models and pi torch on xla so i believe that stands for accelerated linear algebra devices you don't have to worry about these ones for now we're focused on torch vision however if you would like to learn more about a particular domain this is where you would go to learn more so there's a bunch of different stuff that's going on here transforming and augmenting images so fundamentally computer vision is dealing with things in the form of images even a video gets converted to an image we have models and pre-trained weights so as i referenced before you can use an existing model that works on an existing computer vision problem for your own problem we're going to cover that in section i think it's six for transfer learning and then we have data sets which is a bunch of computer vision data sets utils operators a whole bunch of stuff here so pytorch is really really good for computer vision i mean look at all the stuff that's going on here but that's enough talking about it let's just put it in here torch vision this is the main one i'm not going to link to all of these all the links for these by the way is in the book version of the course pi torch computer vision and we have what we're going to cover finally computer vision libraries in pi torch torch vision data sets models transforms etc but let's just write down the other ones so we have torch vision dot data sets something to be aware of so get data sets and data loading functions for computer vision here then we have torch vision so these all come from torch vision models is get pre-trained computer vision so when i say pre-trained computer vision models we're going to cover this more in transfer learning as i said pre-trained computer vision models are models that have been already trained on some existing vision data and have trained weights trained patterns that you can leverage for your own problems that you can leverage for your own problems then we have torch vision dot transforms and then we have functions for manipulating your vision data which is of course images to be suitable for use with an ml model so remember what do we have to do when we have image data or almost any kind of data for machine learning we have to prepare it in a way so it can't just be pure images so that's what transforms help us out with transforms helps to turn our image data into numbers so we can use it with a machine learning model and then of course we have some these are the torch utils this 
is not vision specific it's entirety of pi torch specific and that's data set so if we wanted to create our own data set with our own custom data we have the base data set class for pi torch and then we have finally torch utils data these are just good to be aware of because you'll almost always use some form of data set slash data loader with whatever pi touch problem you're working on so this creates a python iterable over a data set wonderful i think these are most of the libraries that we're going to be using in this section let's import some of them hey so we can see what's going on let's go import pi torch import pi torch so import torch we're also going to get nn which is stands for neural network what's in nn well n of course we have lots of layers lots of loss functions a whole bunch of different stuff for building neural networks we're going to also import torch vision and then we're going to go from torch vision import data sets because we're going to be using data sets later on to get a data set to work with from torch vision we'll import transforms you could also go from torch vision dot transforms import two tensor this is one of the main ones you'll see for computer vision problems two tensor you can imagine what it does but let's have a look transforms to tensor transforming and augmenting images so look where we are we're in pytorch.org vision slash stable slash transforms over here so we're in the torch vision section and we're just looking at transforming and augmenting images so transforming what do we have transforms are common image transformations available in the transforms module they can be trained together using compose beautiful so if we have two tensor what does this do convert a pill image or numpy and the array to a tensor beautiful that's what we want to do later on isn't it well this is kind of me giving you a spoiler is we want to convert our images into tensors so that we can use those with our models but there's a whole bunch of different transforms here and actually one of your extra curriculum is to be to read through each of these packages for 10 minutes so that's about an hour of reading but it will definitely help you later on if you get familiar with using the pie torch documentation after all this course is just a momentum builder we're going to write heaps of pine torch code but fundamentally you'll be teaching yourself a lot of stuff by reading the documentation let's keep going with this where were we up to when we're getting familiar with our data matplotlib is going to be fundamental for visualization remember the data explorers motto visualize visualize visualize become one with the data so we're going to import matplotlib matplotlib.pyplot as plt and then finally let's check the versions so print torch.version or underscore underscore version and print torch vision so by the time you watch this there might be a newer version of each of these modules out if there's any errors in the code please let me know but this is just a bare minimum version that you'll need to complete this section i believe at the moment google colab is running 1.11 for torch and maybe or maybe 1.10 we'll find out in a second it just connected so we're importing pi torch okay there we go so my pytorch version is 1.10 and it's got cuda available and torch vision is 0.11 so just make sure if you're running in google colab if you're running this at a later date you probably have at minimum these versions you might even have a later version so these are the minimum 
versions required for this upcoming section so we've covered the base computer vision libraries in pytorch we've got them ready to go how about in the next video we cover getting a data set i'll see you there welcome back so in the last video we covered some of the fundamental computer vision libraries in pi torch the main one being torch vision and then modules that stem off torch vision and then of course we've got torch utils.data.dataset which is the base dataset class for pytorch and data loader which creates a python iterable over a data set so let's begin where most machine learning projects do and that is getting a data set getting a data set i'm going to turn this into markdown and the data set that we're going to be used to demonstrating some computer vision techniques is fashion mnist which is a take of the data set we'll be using is fashion mnist which is a take on the original mnist data set mnist database which is modified national institute of standards and technology database which is kind of like the hello world in machine learning and computer vision which is these are sample images from the mnist test data set which are grayscale images of handwritten digits so this i believe was originally used for trying to find out if you could use computer vision at a postal service to i guess recognize postcodes and whatnot i may be wrong about that but that's what i know yeah 1998 so all the way back at 1998 how cool is that so this was basically where convolutional neural networks were founded i'll let you read up on the history of that but neural networks started to get so good that this data set was quite easy for them to do really well and that's when fashion mness came out so this is a little bit harder if we go into here this is by zolando research fashion mnist and it's of grayscale images of pieces of clothing so like we saw before the input and output what we're going to be trying to do is turning these images of clothing into numbers and then training a computer vision model to recognize what the different styles of clothing are and here's a dimensionality plot of all the different items of clothing visualizing where similar items are grouped together there's the shoes and whatnot is this interactive oh no it's a video excuse me there we go to serious machine learning researchers we are talking about replacing mnist mnist is too easy mnist is overused mnist cannot represent modern cv tasks so even now fashion mnist i would say has also been pretty much solved but it's a good way to get started now where could we find such a data set we could download it from github but if we come back to the torch vision documentation have a look at data sets we have a whole bunch of built-in data sets and remember this is your extracurricular to read through these for 10 minutes or so each but we have a example we could download imagenet if we want we also have some base classes here for custom data sets we'll see that later on but if we scroll through we have image classification data sets caltech 101 i don't even know what all of these are there's a lot here cfar 100 so that's an example of a hundred different items so that would be a 100 class multi-class classification problem c510 is 10 classes we have emnist we have fashion mnist oh that's the one we're after but this is basically what you would do to download a dataset from torchvision.datasets you would download the data in some way shape or form and then you would turn it into a data loader so imagenet is one of the most popular 
or is probably the gold standard data set for computer vision evaluation it's quite a big data set it's got millions of images but that's the beauty of torch vision is it allows us to download example data sets that we can practice on or even perform research on from a built-in module so let's now have a look at the fashion mnist data set how might we get this so we've got some example code here or this is the documentation torch vision.datasets.fashionmnist we have to pass in a route so where do we want to download the dataset we also have to pass in whether we want the training version of the data set or whether we want the testing version of the data set do we want to download it yes or no should we transform the data in any way shape or form so when we're going to be downloading images through this function call or this class call do we want to transform those images in some way what do we have to do to images before we can use them with a model we have to turn them into a tensor so we might look into that in a moment and target transform is do we want to transform the labels in any way shape or form so often the data sets that you download from torch vision.datasets are pre-formatted in a way that they can be quite easily used with pytorch but that won't always be the case with your own custom data sets however what we're about to cover is just important to get an idea of what the computer vision workflow is and then later on you can start to customize how you get your data in the right format to be used with the model then we have some different parameters here and whatnot let's just rather look at the documentation if in doubt code it out so we'll be using fashion mnist and we'll start by i'm going to just put this here from torch vision dot data sets and we'll put the link there and we'll start by getting the training data set up training data i'm just going to make some code cells here so that i can code in the middle of the screen set up training data train data equals data sets dot fashion mnist because recall we've already from torch vision we don't need to import this again i'm just doing it for demonstration purposes but from torch vision import data sets so we can just call datasets.fashionmnist and then we're going to type in root see how the docstring comes up and tells us what's going on i personally find this a bit hard to read in google collab so if i'm looking up the documentation i like to just go into here but let's code it out so root is going to be data so where to download data too we'll see what this does in a minute then we're going to go train we want the training version of the data set so as i said a lot of the data sets that you find in torch vision.datasets have been formatted into training data set and testing data set already so this building tells us do we want the training data set so if that was false we would get the testing data set of fashion mnist do we want to download it do we want to download yes no so yes we do we're going to set that to true now what sort of transform do we want to do so because we're going to be downloading images and what do we have to do to our images to use them with a machine learning model we have to convert them into tensors so i'm going to pass the transform to tensor but we could also just go torch vision dot transforms dot two tensor that would be the exact same thing as what we just did before and then the target transform do we want to transform the labels no we don't we're going to see how they come or the targets 
sorry high torch this is another way another naming convention often uses target for the target that you're trying to predict so using data to predict a target which is i often use data to predict a label they're the same thing so how do we want to transform the data and how do we want to transform the labels slash targets and then we're going to do the same for the test data so we're going to go data sets you might know what to do here it's going to be the exact same code as above except we're going to change one line we want to store it in data we want to download the training data set as false because we want the testing version do we want to download it yes we do do we want to transform it the data yes we do we want to use two tensor to convert our image data to tensors and do we want to do a target transform well no we don't we want to keep the label slash the targets as they are let's see what happens when we run this downloading fashion mnist beautiful so this is going to download all of the labels what do we have train images train labels lovely test images test labels beautiful so that's how quickly we can get a data set by using torch vision data sets now if we have a look over here we have a data folder because we set the route to be data then if we look what's inside here we have fashion mnist exactly what we wanted then we have the raw and then we have a whole bunch of files here which torch vision has converted into data sets for us so let's get out of that and this process would be much the same if we used almost any data set in here of course they might be slightly different depending on what the documentation says and depending on what the data set is but that is how easy torch vision.datasets makes it to practice on example computer vision data sets so let's go back let's check out some parameters or some attributes of our data how many samples do we have so we'll check the lengths so we have 60 000 training examples and 10 000 testing examples so what we're going to be doing is we're going to be building a computer vision model to find patterns in the training data and then use those patterns to predict on the test data and so let's see a first training example see the first training example so we can just index on the train data let's get the zeroth index and then we're going to have a look at the image and the label oh my goodness a whole bunch of numbers now you see what the two tensor has done for us so we've downloaded some images and thanks to this torch vision transforms to tensor how would we find the documentation for this well we could go and see what this does transforms to tensor can go to tensor there we go what does this do convert a pill image so that's python image library image or numpy array to a tensor this transform does not support torch script so converts a pill image or numpy array height with color channels in the range 0-255 to a torch float tensor of shape see here this is what i was talking about how pytorch defaults with a lot of transforms to chw so color channels first height then width in the range of zero to one so typically red green and blue values are between zero and two five five but neural networks like things between zero and one and in this case it is now in the shape of color channels first then height then width however some other machine learning libraries prefer height width then color channels just keep that in mind we're going to see this in practice later on so we've got an image we've got a label let's check out some more 
details about it remember how we discussed oh there's our label by the way so nine we can go train data dot classes find some information about our class names class names beautiful so number nine would be zero one two three four five six seven eight nine so this particular tensor seems to relate to an ankle boot how would we find that out well one second i'm just going to show you one more thing class to idx let's go train data dot class to idx what does this give us class to idx this is going to give us a dictionary of different labels and their corresponding index so if our machine learning model predicted 9 or class 9 we can convert that to ankle boot using this attribute of the train data there are more attributes that you can have a look at if you like you can go train data dot then i just push tab to find out a bunch of different things you could go data there's the data that'll be the images and then i believe you can also go targets so targets that's all the labels which is one big long tensor now let's check the shape check the shape of our image so image dot shape and label dot shape what are we going to get from that oh label doesn't have a shape why is that well because it's only an integer so oh beautiful look at that so our image shape is we have a color channel of one so let me print this out in something prettier print image shape which is going to be image shape remember how i said it's very important to be aware of the input and output shapes of your models and your data it's all part of becoming one with the data so that is what our image shape is and then if we go next this is print image label which is label but we'll index on class names for label and then we'll do that wonderful so our image shape is currently in the format of color channel's height width we've got a bunch of different numbers that's representing our image it's black and white it only has one color channel why do you think it only has one color channel well because it's black and white so if we jump back into the keynote fashion mnist we've already discussed this grayscale images have one color channel so that means that for black the pixel value is zero and for white it's some value for whatever color is going on here so if it's a very high number say it's one it's going to be pure white if it's like 0.001 it might be a faint white pixel but if it's exactly zero it's going to be black so color images have three color channels for red green and blue grayscale have one color channel but i think we've done enough of visualizing our images as numbers how about in the next video we visualize our image as an image i'll see you there welcome back so in the last video we checked the input output shapes of our data and we downloaded the fashion mnist dataset which is comprised of images or grayscale images of t-shirts trousers pullovers dress coat sandal shirt sneaker bag ankle boot now we want to see if we can build a computer vision model to decipher what's going on in fashion mness so to separate to classify different items of clothing based on their numerical representation and part of becoming one with the data is of course checking the input output shapes of it so this is a fashion mnist data set from zolando research now if you recall why did we look at our input and output shapes well this is what we looked at before we have 28 by 28 grayscale images that we want to represent as a tensor we want to use them as input into a machine learning algorithm typically a computer vision algorithm such as a 
CNN. Then we want some sort of outputs formatted in the ideal shape we'd like. In our case we have 10 different types of clothing, so we're going to have an output shape of 10. But our input shape is what? By default PyTorch arranges image tensors with color channels first, so we have an input shape of [None, 1, 28, 28], where None is going to be our batch size, which of course we can set to whatever we'd like. So our input shape format is NCHW, in other words color channels first. Just remember, if you're working with some other machine learning libraries you may want to use color channels last, and we'll have a look at where that might be the case. Now we're going to visualize our images, so I'll make a little heading here, 1.2. This is all part of becoming one with the data, in other words understanding its input and output shapes, how many samples there are, and what they look like: visualize, visualize, visualize. Let's import matplotlib, I'm just going to add a few code cells here, import matplotlib.pyplot as plt. Now let's create our image and label from train_data[0], and we're going to print the image shape so we can understand what inputs are going into our matplotlib function. Then we're going to call plt.imshow, pass in our image and see what happens, because recall what our image looks like: it's this big tensor of numbers with a shape of [1, 28, 28]. So what happens if we call plt.imshow? Oh, we get an error: invalid shape (1, 28, 28) for image data. Now as I said, one of the most common errors in machine learning is a shape issue, where the shape of your input tensor doesn't match the shape that's expected. This is one of those scenarios where our data format, color channels first, doesn't match up with what matplotlib is expecting. Matplotlib expects either just height and width (no color channel) for grayscale images, or it expects the color channels to be last, which we'll see later on. For grayscale we can get rid of that extra dimension by passing in image.squeeze(). Do you recall what squeeze does? It removes that singular dimension. If we have a look at what goes on now, beautiful, we get an ankle boot. Well, that's a very pixelated ankle boot, but we're only dealing with 28 by 28 pixels, so not a very high definition image. Let's add a title to it, we're going to add in the label. Beautiful, we've got the number nine here, which if we go up to here is an ankle boot. Now let's plot this in grayscale. How might we do that? We can do the same thing: plt.imshow, pass in image.squeeze(), and change the color map with cmap="gray". In matplotlib, if you ever have to change the colors of your plot, look into the cmap parameter (for some plotting functions it's shortened to just c, but for imshow it's cmap). And we want plt.title with class_names indexed by the label integer. If we have a look at it now we have an ankle boot, and we can remove the axes too if we want with plt.axis(False). So there we go, that's the type of image we're dealing with. But that's only a single image, so how about we harness the power of randomness and have a look at some random images from our dataset? How would we do this? Let's plot more images. We'll set a random seed of 42 so you and I are both looking at images that are as similar as possible.
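Before the random grid, here's the single-image plotting code from just above collected into one place — a sketch that assumes train_data from earlier:

```python
import matplotlib.pyplot as plt

class_names = train_data.classes           # e.g. index 9 -> "Ankle boot"
image, label = train_data[0]               # image shape: [1, 28, 28] (color channels first)

plt.imshow(image.squeeze(), cmap="gray")   # squeeze drops the single color-channel dim for matplotlib
plt.title(class_names[label])
plt.axis(False)
```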
now we'll create a plot by calling plot.figure and we're going to give it a size we might create a nine by nine grid so we want to see nine random images from our data set so rows coles oh sorry maybe we'll do four by four that'll give us 16. we're going to go for i in range and we're going to go 1 to rows times columns plus 1. so we can print i what's that going to give us we want to see 16 images or thereabouts so 16 random images but used with a manual c to 42 of our data set this is one of my favorite things to do with any type of data set that i'm looking at whether it be text image audio doesn't matter i like to randomly have a look at a whole bunch of samples at the start so that i can become one with the data with that being said let's use this loop to grab some random indexes we can do so using torches rand int so random integer between zero and length of the training data this is going to give us a random integer in the range of zero and however many training samples we have which in our case is what sixty thousand or thereabouts so i wanna create the size of one and we want to get the item from that so that we have a random index what is this going to give us oh excuse me maybe we print that out there we go so we have random images now because we're using manual seed it will give us the same numbers every time so we have three seven five four two three seven five four two and then if we just commented out the random seed we'll get different numbers every time but this is just to demonstrate we'll keep the manual seed there for now you can comment that out if you want different numbers or different images different indexes each time so we create the image and the label by indexing on the training data at the random index that we're generating and then we'll create our plot so fig or we'll add a subplot config add subplot and we're going to go rows coles i so at the ith index we're going to add a subplot remember we set rows and columns up to here and then we're going to go plt.imshow we're going to show what we're going to show our image but we have to squeeze it to get rid of that singular dimension as the color channel otherwise we end up with an issue with matplotlib we're going to use a color map of gray so it looks like the image we plotted above and then for our title it's going to be our class names indexed with our label and then we don't want the accesses because that's going to clutter up our plot let's see what this looks like oh my goodness look at that it works first go usually visualizations take a fair bit of trial and error so we have ankle boots we have shirts we have bags we have ankle boots sandal shirt pull over oh do you notice something about the data set right now pull over and shirt to me they look quite similar do you think that will cause an issue later on when our model is trying to predict between a pullover and a shirt how about if we look at some more images we'll get rid of the random seeds so we can have a look at different styles so we have a sandal ankle boot coat t-shirt top shirt oh is that a little bit confusing that we have a class for t-shirt and top and shirt like i'm not sure about you but what's the difference between a t-shirt and a shirt this is just something to keep in mind as a t-shirt and top does that look like it could be maybe even address like the shape is there so this is just something to keep in mind going forward chances are if we get confused on our like you and i looking at our data set if we get confused about different 
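As a reference point, the 4x4 grid of random samples being walked through here looks roughly like this once it's all put together — a sketch that reuses train_data, class_names and plt from above:

```python
import torch
import matplotlib.pyplot as plt

torch.manual_seed(42)                      # comment this out for different images each run
fig = plt.figure(figsize=(9, 9))
rows, cols = 4, 4                          # 16 random images
for i in range(1, rows * cols + 1):
    random_idx = torch.randint(0, len(train_data), size=[1]).item()
    img, label = train_data[random_idx]
    fig.add_subplot(rows, cols, i)
    plt.imshow(img.squeeze(), cmap="gray") # squeeze removes the color-channel dimension
    plt.title(class_names[label])
    plt.axis(False)
```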
samples and what they're labeled with our model might get confused later on so let's have a look at one more and then we'll go into the next video so we have sneaker trouser shirt sandal dress pullover bag bag t-shirt oh that's quite a difficult one it doesn't look like there's even much going on in that image but the whole premise of building machine learning models to do this would be could you write a program that would take in the shapes of these images and figure out write a rule based program that we would go hey if it's look like a rectangle with a buckle in the middle it's probably a bag i mean you probably could after a while but i prefer to write machine learning algorithms to figure out patterns and data so let's start moving towards that we're now going to go on figuring out how we can prepare this data to be loaded into a model i'll see you there all right all right all right so we've got 000 images of clothing that we'd like to build a computer vision model to classify into 10 different classes and now that we've visualized a fair few of these samples do you think that we could model these with just linear lines so straight lines or do you think we'll need a model with non-linearity so i'm going to write that down so do you think these items of clothing images could be modeled with pure linear lines or do you think we'll need non-linearity don't have to answer that now we could test that out later on you might want to skip ahead and try to build a model yourself with linear lines or non-linearities we've covered linear lines and non-linearities before but let's now start to prepare our data even further to prepare data loader so right now our data is in the form of pi torch data sets so let's have a look at it train data there we go so we have data set which is of fashion mnist and then if we go test data we see a similar thing except we have a different number of data points we have the same transform on each we've turned them into tensors so we want to convert them from a data set which is a collection of all of our data into a data loader recall that a data loader turns our data set into a python iterable so i'm going to turn this into markdown beautiful more specifically specific specifically can i spell right it doesn't really matter we want to just code right we're not here to learn spelling we want to turn our data into batches or mini batches why would we do this well we may get away with it by building a model to look at all 60 000 samples of our current data set because it's quite small it's only comprised of images of 28 by 28 pixels and when i say quite small yes 60 000 images is actually quite small for a deep learning scale data set modern data sets could be in the millions of images but if our computer hardware was able to look at 60 000 samples of 28 by 28 at one time it would need a fair bit of memory so we have ram space up here we have gpu memory we have compute memory but chances are that it might not be able to store millions of images in memory so what you do is you break a data set from say 60 000 into groups of batches or mini batches so we've seen batch size before why would we do this well one it is more computationally efficient as in your computing hardware may not be able to look store in memory at 60 000 images in one hit so we break it down to 32 images at a time this would be batch size of 32. 
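To make those numbers concrete, a quick back-of-the-envelope check:

```python
train_samples, test_samples, batch_size = 60_000, 10_000, 32
print(train_samples / batch_size)  # 1875.0 -> 1875 training batches per epoch
print(test_samples / batch_size)   # 312.5  -> 313 test batches (the last one is smaller)
```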
now again 32 is a number that you can change 32 is just a common batch size that you'll see with many beginner style problems as you go on you'll see different batch sizes this is just to exemplify the concept of mini batches which is very common in deep learning and why else would we do this the second point or the second main point is it gives our neural network more chances to update its gradients per epoch so what i mean by this this will make more sense when we write a training loop but if we were to just look at 60 000 images at one time we would per epoc so per iteration through the data we would only get one update per epoch across our entire data set whereas if we look at 32 images at a time our neural network updates its internal states its weights every 32 images thanks to the optimizer this will make a lot more sense once we write our training loop but these are the two of the main reasons for turning our data into mini batches in the form of a data loader now if you'd like to learn more about the theory behind this i would highly recommend looking up andrew ong mini batches there's a great lecture on that so yeah large scale machine learning mini batch gradient descent mini batch gradient descent yeah that's what it's called mini batch gradient descent if you look up some results on that you'll find a whole bunch of stuff i might just link this one i'm going to pause that i'm going to link this in there so for more on mini batches see here now to see this visually i've got a slide prepared for this so this is what we're going to be working towards there's our input and output shapes we want to create batch size of 32 across all of our 60 000 training images and we're actually going to do the same for our testing images but we only have 10 000 testing images so this is what our data set's going to look like batched so we're going to write some code mainly using the data loader from torch.utils.data we're going to pass it a data set which is our train data we're going to give it a batch size which we can define as whatever we want for us we're going to use 32 to begin with and we're going to set shuffle equals true if we're using the training data why would we set shuffle equals true well in case our data set for some reason has order say we had all of the pants images in a row we had all the t-shirt images in a row we had all the sandal images in a row we don't want our neural network to necessarily remember the order of our data we just want it to remember individual patterns between different classes so we shuffle up the data we mix it we mix it up and then it looks something like this so we might have batch number zero and then we have 32 samples now i ran out of space when i was creating these but we got that was fun up to 32 so this is setting batch size equals 32 so we look at 32 samples per batch we mix all the samples up and we go batch batch ratchet batch and we'll have however many batches we have will have number of samples divided by the batch size so 60 000 divided by 32 what's that 1800 or something like that so this is what we're going to be working towards i did want to write some code in this video but i think to save it getting too long we're going to write this code in the next video if you would like to give this a go on your own here's most of the code we have to do so there's the trained data loader do the same for the test data loader and i'll see you in the next video and we're going to batchify our fashion mnist data set welcome back in the last video 
we had a brief overview of the concept of mini batches and so rather than our computer looking at 60 000 images in one hit we break things down we turn it into batches of 32 again the batch size will vary depending on what problem you're working on but 32 is quite a good value to start with and try out and we do this for two main reasons if we jump back to the code why would we do this it is more computationally efficient so if we have a gpu it with say 10 gigabytes of memory it might not be able to store all 60 000 images in one hit in our data set because it's quite small it may be able to but it's better practice for later on to turn things into mini batches and it also gives our neural network more chances to update its gradients per epoch which will make a lot more sense once we write our training loop but for now we've spoken enough about the theory let's write some code to do so so i'm going to import data loader from torch.utils.data import data loader and this principle by the way preparing a data loader goes the same for not only images but for text for audio whatever sort of data you're working with many batches will follow you along or batches of data will follow you along throughout a lot of different deep learning problems set up the batch size hyperparameter remember a hyper parameter is a value that you can set yourself so batch size equals 32 and it's practice you might see it typed as capitals you won't always see it but you'll see it quite often a hyper parameter typed as capitals and then we're going to turn data sets into iterables so batches so we're going to create a trained data loader here of our fashion mnest data set we're going to use data loader we're going to see what the docs string is or actually let's look at the documentation torch data loader this is some extra curriculum for you too by the way is to read this data page taunt utils.data because no matter what problem you're going with with deep learning or pytorch you're going to be working with data so spend 10 minutes just reading through here i think i might have already assigned this but this is just so important that it's worth going through again read through all of this even if you don't understand all of it what's going on it's just it helps you know where to look for certain things so what does it take data loader takes a data set we need to set the batch size to something is the default of one that means that we would create a batch of one image at a time in our case do we want to shuffle it do we want to use a specific sampler there's a few more things going on number of workers number of workers stands for how many cores on our machine do we want to use to load data generally the higher the better for this one but we're going to keep most of these as the default because most of them are set to pretty good values to begin with i'll let you read more into the other parameters here we're going to focus on the first three data set batch size and shuffle true or false let's see what we can do so data set equals our train data which is 60 000 fashion mnist and then we have a batch size which we're going to set to our batch size hyperparameter so we're going to have a batch size of 32 and then finally do we want to shuffle the training data yes we do and then we're going to do the same thing for the test data loader except we're not going to shuffle the test data now you can shuffle the test data if you want but in my practice it's actually easier to evaluate different models when the test data isn't 
shuffled so you shuffle the training data to remove order and so your model doesn't learn order but for evaluation purposes it's generally good to have your test data in the same order because our model will never actually see the test data set during training we're just using it for evaluation so the order doesn't really matter the test data loader it's just easier if we don't shuffle it because then if we evaluate it multiple times it's not been shuffled every single time so let's run that and then we're going to check it out our train data loader and our test data loader beautiful we've got instances of torch utils data dataloader.dataloader and now let's check out what we've created hey i always like to print different attributes of whatever we make check out what we've created this is all part of becoming one with the data so print f i'm going to go data loaders and then pass in this is just going to output basically the exact same as what we've got above test data loader and we can also see what attributes we can get from each of these by going train data loader i don't need caps lock there train data loader full stop and then we can go tab we've got a whole bunch of different attributes we've got a batch size we've got our data set do we want to drop the last as in if our batch size overlapped with our 60 000 samples do we want to get rid of the last batch say for example the last batch only had 10 samples do we want to just drop that do you want to pin the memory that's going to help later on if we wanted to load our data faster a whole bunch of different stuff here if you'd like to research more you can find all the stuff about what's going on here in the documentation but let's just keep pushing forward what else do we want to know so let's find the length of the train data loader we will go length train data loader so this is going to tell us how many batches there are batches of which of course is batch size and we want print length of test data loader we want length test data loader batches of batch size dot dot so let's find out some information what do we have oh there we go so just we're seeing what we saw before with this one but this is more interesting here length of train data loader yeah we have about 1875 batches of 32. so if we do 60 000 training samples divided by 32 yeah it comes out to 1875. and if we did the same with 10 000 for testing samples of 32 it comes out at 3 13. this gets rounded up so this is what i meant that the last batch will have maybe not 32 because 32 doesn't divide evenly into 10 000. but that's okay and so this means that our model is going to look at 1875 individual batches of 32 images rather than just one big batch of 60 000 images now of course the number of batches we have will change if we change the batch size so we have 469 batches of 128 and if we reduce this down to 1 what do we get we have a batch per sample so 60 000 batches of one 10 thousand batches of one we're going to stick with 32. 
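Pulling the DataLoader setup from the last couple of paragraphs together, the code looks roughly like this (it assumes train_data and test_data from earlier):

```python
from torch.utils.data import DataLoader

BATCH_SIZE = 32  # hyperparameter: how many samples per batch

train_dataloader = DataLoader(dataset=train_data,
                              batch_size=BATCH_SIZE,
                              shuffle=True)    # shuffle training batches each epoch
test_dataloader = DataLoader(dataset=test_data,
                             batch_size=BATCH_SIZE,
                             shuffle=False)    # keep test order fixed for consistent evaluation

len(train_dataloader), len(test_dataloader)    # (1875, 313) batches of 32
```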
but now let's visualize so we've got them in train data loader how would we visualize a batch or a single image from a batch so let's show a sample i'll show you how you can interact with a data loader we're going to use randomness as well so we'll set a manual seed and we'll get a random index random idx equals torch rand int we're gonna go from zero to length of train features batch oh where did i get that from excuse me getting ahead of myself here i want to check out what's inside the training data loader we'll check out what's inside the training data loader because the test data loader is going to be similar so we want the train features batch so i say features as in the images themselves and the train labels batch is going to be the labels of our data set or the targets in pi torch terminology so next it a data loader so because our data loader has 1875 batches of 32 we're going to turn it into an iterable with ita and we're going to get the next batch with next and then we can go here train features batch.shape and we'll get train labels batch dot shape what do you think this is going to give us well there we go look at that so we have a tensor each batch we have 32 samples so this is batch size and this is color channels and this is height and this is width and then we have 32 labels associated with the 32 samples now where have we seen this before if we go back through our keynote input and output shapes so we have shape equals 32 28 28 1 so this is color channels last but ours is currently in color channels first now again i sound like a broken record here but these will vary depending on the problem you're working with if we had larger images what would change well the height and width dimensions would change if we had color images the color dimension would change but the premise is still the same we're turning our data into batches so that we can pass that to a model let's come back let's keep going with our visualization so we want to visualize one of the random samples from a batch and then we're going to go image label equals train features batch and we're going to get the random idx from that and we'll get the train labels batch and we'll get the random idx from that so we're matching up on the we've got one batch here train features batch train labels batch and we're just getting the image and the label at a random index within that batch so oh excuse me i need to set this equal there and then we're going to go plt dot m show what are we going to show we're going to show the image but we're going to have to squeeze it to remove that singular dimension and then we'll set the c map equal to gray and then we'll go plt dot title we'll set the title which is going to be the class names indexed by the label integer and then we can turn off the axises you can use off here or you can use false depends on what you'd like to use let's print out the image size because you can never know enough about your data and then print let's also get the label label and label shape or label size our label will be just a single integer so it might not have a shape but that's okay let's have a look oh bag see look that's quite hard to understand i wouldn't be out of detector that's a bag can you tell me that you could write a program to understand that that just looks like a warped rectangle to me but if we had a look at another one we'll get another random oh we've got a random seed so it's going to produce the same image each time so we have a shirt okay shirt so we see the image size there 1 
28 28 now recall that the image size is it's a single image so it doesn't have a batch dimension so this is just color channels height width we'll go again label 4 which is a coat and we could keep doing this to become more and more familiar with our data but these are all from this particular batch that we created here coat and we'll do one more another coat we'll do one more just to make sure it's not a coat there we go we've got a bag beautiful so we've now turned our data into data loaders so we could use these to pass them into a model but we don't have a model so i think it's time in the next video we start to build model 0 we start to build a baseline i'll see in the next video welcome back so in the last video we got our data sets or our data set into data loaders so now we have 1875 batches of 32 images or for the training data set rather than 60 000 in a one big data set and we have 13 or 313 batches of 32 for the test data set then we learned how to visualize it from a batch and we saw that we have still the same image size one color channel 28 28 all we've done is we've turned them into batches so that we can pass them to our model and speaking of model let's have a look at our workflow where are we up to well we've got our data ready we've turned it into tensors through a combination of torch vision transforms torch utils data.data set we didn't have to use that one because torch vision.datasets did it for us with the fashion mness set but we did use that one we did torch utils.data the data loader to turn our data sets into data loaders now we're up to building or picking a pre-trained model to suit your problem so let's start simply let's build a baseline model and this is very exciting because we're going to build our first model our first computer vision model albeit a baseline but that's an important step so i'm just going to write down here when starting to build a series of machine learning modeling experiments it's best practice to start with a baseline model i'm going to turn this into markdown a baseline model so a baseline model is a simple model you will try and improve upon with subsequent models models slash experiments so you start simply in other words start simply and add complexity when necessary because neural networks are pretty powerful right and so they have a tendency to almost do too well on our data set that's a concept known as overfitting which we'll cover a little bit more later but we built a simple model to begin with a baseline and then our whole goal will be to run experiments according to the workflow improve through experimentation again this is just a guide it's not set in stone but this is the general pattern of how things go get data ready build a model fit the model evaluate improve the model so the first model that we build is generally a baseline and then later on we want to improve through experimentation so let's start building a baseline but i'm going to introduce to you a new layer that we haven't seen before that is creating a flattened layer now what is a flattened layer well this is best seen when we code it out so let's create a flattened model which is going to be nn.flatten and where could we find the documentation for this we go nn flatten flatten empire torch what does it do flattens a continuous range of dimms into a tensor for use with sequential so there's an example there but i'd rather if in doubt code it out so we'll create the flattened layer and of course all nn.flatten or nn.modules could be used as a model on their 
own. So we're going to get a single sample: x = train_features_batch[0]. What does this look like? It's a tensor, and we'll get its shape as well with x.shape. There we go, that's the shape of x: [1, 28, 28]. Keep that in mind when we pass it through the flatten layer. Do you have an inkling of what flatten might do? Our shape to begin with is [1, 28, 28], that is color channels, height, width. Now let's flatten the sample: output = flatten_model(x). This is going to perform the forward pass internally on the flatten layer. Now let's print out what happened: the shape before flattening, x.shape, and the shape after flattening, output.shape. So we're just taking the output of the flatten model and printing its shape. Oh, do you notice what happened? We've gone from [1, 28, 28] to [1, 784]. Wow. What does the output look like? The values are now all in one big vector, and if we squeeze that we can remove the extra dimension, so we've got one big vector of values. Now where did this number come from? Well, to begin with we had color channels, height and width, and now we've flattened it to color channels by height times width, so we've got one big feature vector, because 28 by 28 equals 784: one value per pixel in our output vector. Now where did we see this before? If we go back to our keynote and have a look at Tesla's system, it takes eight cameras and turns them into a three-dimensional vector space. That's what we're trying to do here: we're trying to encode whatever data we're working with. In Tesla's case they have eight cameras, and theirs has more dimensions than ours because they have the time aspect, since they're dealing with video and multiple different camera angles, whereas we're just dealing with a single image. But regardless, the concept is the same: we're trying to condense information down into a single vector space. And if we come back to here, why might we do this? Because we're going to build a baseline model and use a linear layer as that baseline, and a linear layer can't handle multi-dimensional data like this; we want it to take a single vector as input. Now this will make a lot more sense after we've coded up our model, so let's do that. from torch import nn, and we're going to write class FashionMNISTModelV0, inheriting from nn.Module. Inside here we're going to have an __init__ function: in the constructor we pass in self, then an input_shape, which we'll type hint as an integer, because remember input shape is very important for machine learning models; we're going to define a number of hidden_units, which will also be an integer; and then we're going to define our output_shape. What do you think our output shape will be? How many classes are we dealing with? We're dealing with 10 different classes, so our output shape will be... I'll save that for later on, I'll let you guess for now, or you might already know. We're going to initialize it and then create our layer stack: self.layer_stack = nn.Sequential. Recall that whatever you put inside Sequential, data that goes through it goes through layer by layer. So let's create our first layer, which is going to be nn.Flatten. That means anything that comes into this first layer, what's
going to happen to it it's going to flatten its external dimensions here so it's going to flatten these into something like this so we're going to flatten it first flatten our data then we're going to pass in our linear layer and we're going to have how many in features this is going to be input shape because we're going to define our input shape here and then we're going to go out features equals hidden units and then we're going to create another linear layer here and we're going to set up in features equals hidden units why are we doing this and then out features equals output shape why are we putting the same out features here as the in features here well because subsequent layers the input of this layer here its input shape has to line up with the output shape of this layer here hence why we use out features as hidden units for the output of this nn.linear layer and then we use in features as hidden units for the input value of this hidden layer here so let's keep going let's go def we'll create the forward pass here because if we subclass nn.module we have to override the forward method the forward method is going to define what it's going to define the forward computation of our model so we're just going to return self.layer stack of x so our model is going to take some input x which could be here x in our case it's going to be a batch at a time and then it's going to pass each sample through the flattened layer it's going to pass the output of the flattened layer to this first linear layer then it's going to pass the output of this linear layer to this linear layer so that's it our model is just two linear layers with a flattened layer the flattened layer has no learnable parameters only these two do and we have no non-linearities so do you think this will work does our data set need non-linearities well we can find out once we fit our model to the data but let's set up an instance of our model so torch dot manual seed let's go set up model with input parameters so we have model 0 equals fashion mnist model which is just the same class that we wrote above and here's where we're going to define the input shape equals 784 where do i get that from well that's here that's 28 by 28 so the output of flatten needs to be the input shape here so we could put 28 by 28 there or we're just going to put 784 and then write a comment here this is 28 by 28 now if we go i wonder if nn.linear will tell us nn.linear will tell us what it expects as in features size of each input sample shape where star means any number of dimensions including none in features linear weight well let's figure it out let's see what happens if in doubt code it out hey we'll see what we can do hidden units equals let's go with 10 to begin with how many units in the hidden layer and then the output shape is going to be what output shape is length of class names which will be one for every class beautiful and now let's go model 0 we're going to keep it on the cpu to begin with we could write device agnostic code but to begin we're going to send it to the cpu i might just put that up here actually 2 cpu and then let's have a look at model zero wonderful so we can try and do a dummy forward pass and see what happens so let's create dummy x equals torch rand we'll create it as the same size of a image just a singular image so this is going to be a batch of one color channel 1 height 28 height 28 and we're going to go model 0 and pass through dummy x so this is going to send dummy x through the forward method let's see what 
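Assembled in one place, the flatten check, the baseline model and the dummy forward pass described above look roughly like this — a sketch where train_features_batch and class_names come from earlier, and 42 is the seed used throughout:

```python
import torch
from torch import nn

# Quick check of what nn.Flatten does to a single sample
flatten_model = nn.Flatten()
x = train_features_batch[0]          # shape: [1, 28, 28]
output = flatten_model(x)            # shape: [1, 784] -> 28*28 pixel values in one vector

class FashionMNISTModelV0(nn.Module):
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
        super().__init__()
        self.layer_stack = nn.Sequential(
            nn.Flatten(),                                              # [1, 28, 28] -> [784]
            nn.Linear(in_features=input_shape, out_features=hidden_units),
            nn.Linear(in_features=hidden_units, out_features=output_shape)
        )

    def forward(self, x):
        return self.layer_stack(x)

torch.manual_seed(42)
model_0 = FashionMNISTModelV0(input_shape=784,               # 28*28 pixels after flattening
                              hidden_units=10,
                              output_shape=len(class_names)   # one output per class
                              ).to("cpu")

dummy_x = torch.rand([1, 1, 28, 28])  # one fake image: [batch, color channels, height, width]
model_0(dummy_x)                      # -> tensor of shape [1, 10], one logit per class
```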
happens okay wonderful so we get an output of one two three four five six 7 8 9 10 logits beautiful that's exactly what we want so we have one logit value per class that we have now what would happen if we got rid of flatten and then we ran this ran this ran this what do we get oh mat 1 and mat 2 shapes cannot be multiplied so we have 28 by 28 and 7 okay what happens if we change our input shape to 28 we're getting shape mismatches here what happens here oh okay we get an interesting output but this is still not the right shape is it so that's where the flattened layer comes in what is the shape of this oh we get 1 1 28 10 oh so that's why we put in flatten so that it combines it into a vector so we get rid of this see if we just leave it in this shape we get 28 different samples of 10 which is not what we want we want to compress our image into a singular vector and pass it in so let's reinstantiate the flattened layer and let's make sure we've got the right input shape here 28 by 28 and let's pass it through torch size 110 that's exactly what we want one logit per class so this can be a bit fiddly when you first start but it's also a lot of fun once you get it to work and so just keep that in mind i showed you what it looks like when you have an error one of the biggest errors that you're going to face in machine learning is different tensor shape mismatches so just keep in mind the data that you're working with and then have a look at the documentation for what input shape certain layers expect so with that being said i think it's now time that we start moving towards training our model i'll see you in the next video welcome back in the last video we created model zero which is going to be our baseline model for our computer vision problem of detecting different types of clothing in 28 by 28 gray scale images and we also learned the concept of making sure our or we rehashed on the concept of making sure our input and output shapes line up with where they need to be we also did a dummy forward pass with some dummy data this is a great way to troubleshoot to see if your model shapes are correct if they come out correctly and if the inputs are lining up with where they need to be and just to rehash on what our model is going to be or what's inside our model if we check model 0 state dict what we see here is that our first layer has a weight tensor it also has a bias and our next layer has a weight tensor and it also has a bias so these are of course initialized with random values but the whole premise of deep learning machine learning is to pass data through our model and use our optimizer to update these random values to better represent the features in our data and i keep saying features but i just want to rehash on that before we move on to the next thing feature in data could be almost anything so for example the feature of this bag could be that it's got a rounded handle at the top it has a edge over here it has an edge over there now we aren't going to tell our model what features to learn about the data the whole premise of it is to or the whole fun the whole magic behind machine learning is that it figures out what features to learn and so that is what the weights and bias matrices or tensors will represent there's different features in our images and there could be many because we have 60 000 images of 10 classes so let's keep pushing forward it's now time to set up a loss function and an optimizer speaking of optimizers so 3.1 setup loss optimizer and evaluation metrics now 
recall in notebook 2 i'm going to turn this into markdown we created oh i don't need an emoji there so this is by the way we're just moving through this workflow we've got our data ready into tensors we've built a baseline model it's now time to pick a loss function and an optimizer so if we go back to google chrome let's write here loss function what's our loss function going to be since we're working with multi-class data our loss function will be nn dot cross entropy loss and our optimizer we've got a few options here with the optimizer but we've had practice in the past with sgd which stands for stochastic gradient descent and the atom optimizer so our optimizer let's just stick with sgd which is kind of the entry level optimizer torch opt-in sgd for stochastic gradient descent and finally our evaluation metric since we're working on a classification problem let's use accuracy as our evaluation metric so recall that accuracy is a classification evaluation metric now where can we find this well if we go into learn pi torch dot io this is the beauty of having online reference material in here neural network classification with pytorch in this notebook section 02 we created do we have different classification methods yes we did so we've got a whole bunch of different options here for classification evaluation metrics we've got accuracy precision recall f1 score a confusion matrix now we have some code that we could use if we wanted to use torch metrics for accuracy we could and torch metrics is a beautiful library that has a lot of evaluation oh it doesn't exist what happened to watch metrics maybe i need to fix that link torch metrics has a whole bunch of different pie touch metrics so very useful library but we also coded a function in here which is accuracy fn so we could copy this straight into our notebook here or i've also if we go to the pi torch deep learning github i'll just bring it over here i've also put it in helperfunctions.pi now this is a script of common functions that we've used throughout the course including if we find accuracy function here calculate accuracy now how would we get this helper functions file this python file into our notebook one way is to just copy the code itself straight here but let's import it as a python script so import requests then we're going to go from path lib import path so we want to download and this is actually what you're going to see very common practice in larger python projects especially deep learning and machine learning projects is different functionality split up in different python files and that way you don't have to keep rewriting the same code over and over again like you know how we've written a training and testing loop a fair few times well if we've written it once and it works we might want to save that to a dot pi file so we can import it later on but let's now write some code to import this helper functions.pi file into our notebook here so download helper functions from learn pi torch repo so we're going to check if our helper functions dot pi if this already exists we don't want to download it so we'll print helper functions dot pi already exists skipping download skipping download dot dot and we're going to go else here if it doesn't exist so we're going to download it downloading helper functions dot pi and we're going to create a request here with the request library equals request dot get now here's where we have to pass in the url of this file it's not this url here when dealing with github to get the actual url 
to the files many files you have to click the raw button so just go back and show you click raw here and we're going to copy this raw url see how it's just text here this is what we want to download into our colab notebook and we're going to write it in there request equals request.get and we're going to go with open and here's where we're going to save our helper functions dot pi we're going to write binary as file f is for file we're going to go f dot write request dot content so what this is saying is python is going to create a file called helperfunctions.pi and give it write binarypermissions as f f is for file short for file and then we're going to say f.right hey request get that information from helperfunctions.pi here and write your content to this file here so let's give that a shot beautiful so downloading helperfunctions.pi let's have a look in here do we have helper function stop hi yes we do wonderful so now we can import our accuracy function where is it there we go import accuracy function so this is very common practice when writing lots of python code is to put helper functions into dot pi scripts so let's import the accuracy metric accuracy metric from helper functions of course we could have used torch metrics as well that's another perfectly valid option but i just thought i'd show you what it's like to import your own helper function script of course you can customize helperfunctions.pi to have whatever you want in there so see this we've got from helper functions import accuracy function what's this saying could not be resolved is this going to work it did and we can go accuracy function do we get a doc string hmm seems like collab isn't picking things up but that's all right it looks like it still worked we'll find out later on if it actually works when we train our model so setup loss function and optimizer so i'm going to set up the loss function equals nn dot cross entropy loss and i'm going to set up the optimizer here as we discussed before as torch dot optim dot sgd for stochastic gradient descent the parameters i want to optimize are the parameters from model 0 our baseline model which we had a look at before which are all these random numbers we'd like our optimizer to tweak them in some way shape or form to better represent our data and then i'm going to set the learning rate here how much should they be tweaked each epoch i'm going to set it to 0.1 nice and high because our data set is quite simple it's 28 by 28 images there are 60 000 of them but again if this doesn't work we can always adjust this and experiment experiment experiment so let's run that we've got a loss function this is going to give me a doc string there we go so calculates accuracy between truth and predictions now where does this docs string come from well let's have a look helper functions that's what we wrote before good on us for writing good doc strings accuracy function well we're going to test all these out in the next video when we write a training loop so oh actually i think we might do one more function before we write a training loop how about we create a function to time our experiments yeah let's give that a go in the next video i'll see you there welcome back in the last video we downloaded our helperfunctions.pi script and imported our accuracy function that we made in notebook 2 but we could really beef this up our helperfunctions.pi file we could put a lot of different helper functions in there and import them so we didn't have to rewrite them that's just something to 
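Collected together, the helper-function download and the loss/optimizer setup from this section look roughly like this — note the raw GitHub URL below is an assumption based on the repo described above, and model_0 is the baseline model from earlier:

```python
import requests
from pathlib import Path
import torch
from torch import nn

# Download helper_functions.py from the course GitHub repo (raw URL assumed)
if Path("helper_functions.py").is_file():
    print("helper_functions.py already exists, skipping download")
else:
    print("Downloading helper_functions.py")
    request = requests.get("https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/helper_functions.py")
    with open("helper_functions.py", "wb") as f:
        f.write(request.content)

from helper_functions import accuracy_fn   # the accuracy metric written back in notebook 02

# Loss function and optimizer for the baseline model
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model_0.parameters(), lr=0.1)
```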
keep in mind for later on but now let's create a function to time our experiments so creating a function to time our experiments so one of the things about machine learning is that it's very experimental you've probably gathered that so far so let's write here so machine learning is very experimental two of the main things you'll often want to track are one your model's performance such as its loss and accuracy values etc and two how fast it runs so usually you want a higher performance and a fast model that's the ideal scenario however you could imagine that if you increase your model's performance you might have a bigger neural network it might have more layers it might have more hidden units it might degrade how fast it runs because you're simply making more calculations so there's often a trade-off between these two and how fast it runs will really be important if you're running a model say on the internet or say on a dedicated gpu or say on a mobile device so these are two things to really keep in mind so because we're tracking our model's performance with our loss value and our accuracy function let's now write some code to check how fast it runs and i did on purpose above i kept our model on the cpu so we're also going to compare later on how fast our model runs on the cpu versus how fast it runs on the gpu so that's something that's coming up let's write a function here we're going to use the time module from python so from time it import the default timer as i'm going to call it timer so if we go python default timer do we get the documentation for here we go time it so do we have default timer wonderful so the default timer which is alwaystime.perthcounter you can read more about python timing functions in here but this is essentially just going to say hey this is the exact time that our code started and then we're going to create another stop for when our code stopped and then we're going to compare the start and stop times and that's going to basically be how long our model took to train so we're going to go def print train time this is just going to be a display function so start we're going to get the float type hint by the way start and end time so the essence of this function will be to compare start and end time and we're going to set the torch or the device here or pass this in as torch.device and we're going to set that default to none because we want to compare how fast our model runs on different devices so i'm just going to write a little dock string here prints difference between start and end time and then of course we could add more there for the arguments but that's a quick one liner tell us what our function does so total time equals end minus start and then print we're going to write here train time on whichever device we're using might be cpu might be gpu total time equals we'll go to three and we'll say seconds three decimal places that is and return total time beautiful so for example we could do start time equals timer and then end time equals timer and then we can put in here some code between those two and then if we go print train oh maybe we need timer like this we'll find out if in doubt code it out you know we'll see if it works start equals start time and end equals end time and device equals we're running on the cpu right now cpu let's see if this works wonderful so it's a very small number here so train time on cpu very small number because the start time is basically on this exact line comment basically takes no time to run and then m time is on 
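Written out, the timing helper and the example usage just described are roughly:

```python
import torch
from timeit import default_timer as timer

def print_train_time(start: float, end: float, device: torch.device = None):
    """Prints difference between start and end time."""
    total_time = end - start
    print(f"Train time on {device}: {total_time:.3f} seconds")
    return total_time

# Example usage: time whatever code sits between the two timer() calls
start_time = timer()
# ... code to be timed goes here ...
end_time = timer()
print_train_time(start=start_time, end=end_time, device="cpu")
```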
here we get 3.304 times 10 to the power of negative 5. so quite a small number but if we put some modeling code in here it's going to measure the start time of this cell it's going to model our code in there then we have the end time and then we find out how long our model took the train so with that being said i think we've got all of the pieces of the puzzle for creating some training and testing functions so we've got a loss function we've got an optimizer we've got a valuation metric we've got a timing function we've got a model we've got some data how about we train our first baseline computer vision model in the next video i'll see you there morning well might not be morning wherever you are in the world it's nice and early here i'm up recording some videos because we have a lot of momentum going with this but look at this i took a little break last night i have a runtime disconnected but this is just what's going to happen if you're using google colab since i use google colab pro completely unnecessary for the course but i just found it worth it for how much i use google colab i get longer idle timeouts so that means that my collab notebook will stay persistent for a longer time but of course overnight it's going to disconnect so i click reconnect and then if i want to get back to wherever we were because we downloaded some data from torch vision.datasets i have to rerun all of these cells so a nice shortcut we've might have seen this before is to just come down to where we were and if all the code above works oh there we go i wrote myself some notes on where we're up to let's go run before so this is going to run all the cells above and we're up to here 3.3 creating a training loop and training a model on batches of data so that's going to be a little bit interesting and i wrote myself another reminder here this is a little bit of behind the scenes the optimizer will update a model's parameters once per batch rather than once per epoch so let's hold myself to that note and uh make sure i let you know so we'll go make another title here let's go creating a training loop and training a model on batches of data so something a little bit different to what we may have seen before if we haven't created batches of data using data loader and recall that just up above here we've got something like 1800 dead there we go so we've split our data into batches rather than our model looking at 60 000 images of fashion mnist data at one time it's going to look at 1875 batches of 32 so 32 images at a time of the training data set and 313 batches of 32 of the test data set so let's build a training loop and train our first model so i'm going to write out a few steps actually because we have to do a little bit differently to what we've done before so one we want to loop through epochs so a number of epochs loop through training batches and by the way you might be able to hear some birds singing the sun is about to rise i hope you enjoy them as much as i do so we're going to perform training steps and we're going to calculate calculate the train loss per batch so this is going to be one of the differences between our previous training loops and this is going to after number two we're going to loop through the testing batches so we'll train and evaluate our model at the same step or same loop and we're going to perform testing steps and then we're going to calculate the test loss per batch as well per batch wonderful four we're gonna of course print out what's happening you may have seen the unofficial 
pie torch optimization loop theme song and we're going to time it all for fun of course because that's what our timing function is for so let's get started there's a fair few steps here but nothing that we can't handle and remember the motto if in doubt code it out or there's another one if in doubt run the code but we haven't written any code to run just yet so we're going to import tqdm for a progress bar if you haven't seen tqdm before it's a very good python progress bar that you can add with a few lines of code so this is just the github it's open source software one of my favorite pieces of software and it's going to give us a progress bar to let us know how many epochs our training loop has gone through it doesn't have much overhead but if you want to learn more about it please refer to the tqdm github however the beautiful thing is that google colab has tqdm built in because it's so good and so popular so we're going to import from tqdm.auto so there's a few different types of tqdm progress bars dot auto is just going to recognize what compute environment we're using and it's going to give us the best type of progress bar for what we're doing so for example google collab is running a jupyter notebook behind the scenes so the progress bar for jupyter notebooks is a little bit different to python scripts so now let's set the seed and start the timer we want to write all of our training loop in this single cell here and then once it starts once we run this cell we want the timer to start so that we can time how long the entire cell takes to run so we'll go train time start on cpu equals we set up our timer before beautiful now we're going to set the number of epochs now we're going to keep this small for faster training time so we can run more experiments so we'll keep this small for faster training time that's another little tidbit do you notice how quickly all of the cells ran above well that's because we're using a relatively small data set in the beginning when you're running experiments you want them to run quite quickly so that you can run them more often so you can learn more about your data so that you can try different things try different models so this is why we're using number of epochs equals three we start with three so that our experiment runs in 30 seconds or a minute or so that way if something doesn't work we haven't wasted so much time waiting for a model to train later on we could train it for 100 epochs if we wanted to so we're going to create a training and test loop so for epoc in tqdm range epochs let's get this going so for tqdm to work we just wrap our iterator with tqdm and you'll see later on how this tracks the progress so i'm going to put out a little print statement here we'll go epoc and this is just going to say what epoch we're on we go here that's something that i like to do quite often is put little print statements here and there so that we know what's going on so let's set up the training we're going to have to instantiate the train loss we're going to set that to 0 to begin with and we're going to cumulatively add some values to the train loss here and then we'll see later on how this accumulates and we can calculate the training loss per batch that's what we're doing up here calculate the train loss per batch and then finally at the end of the loop we will divide our training loss by the number of batches so we can get the average training loss per batch and that will give us the training loss per epoch now that's a lot of talking if that 
doesn't make sense remember if in doubt code it out so add a loop to loop through the training batches so because our data is batchified now and i've got a crow or maybe a kookaburra sitting on the roof across from my apartment it's uh singing its song this morning lovely so we're going to loop through our training batch data so i've got four batch comma xy because remember our training batches come in the form of x so that's our data or our images and y which is label you could call this image label or target as pie touch would but it's convention to often call your features x and your labels y we've seen this before and we're going to enumerate the train data loader as well we do this so we can keep track of the number of batches we've been through so that will give us batch there i'm going to set model 0 to training mode because even though that's the default we just want to make sure that it's in training mode now we're going to do the forward pass if you remember what are the steps in a pi torch optimization loop we do the forward pass we calculate the loss optimize the zero grad last backwards optimize the step step step so let's do that hey model zero we'll put the features through there and then we're going to calculate the loss we've been through these steps before so we're not going to spend too much time on the exact steps here but we're just going to practice writing them out and of course later on you might be thinking daniel how come we haven't functionized this training loop already we've seemed to write the same generic code over and over again well that's because we like to practice writing pytorch code right we're going to functionalize them later on don't you worry about that so here's another little step that we have in number four is we have the training loss and so because we've set that to zero to begin with we're going to accumulate the training loss values every batch so we're going to just add it up here and then later on we're going to divide it by the total number of batches to get the average loss per batch so you see how this loss calculation is within the batch loop here so this means that one batch of data is going to go through the model and then we're going to calculate the loss on one batch of data and this loop is going to continue until it's been through all of the batches in the trained data loader so 1875 steps or whatever there was so accumulate train loss and then we're going to optimizer zero grad optimizer dot zero grad and then number four is what loss backward loss backward we'll do the back propagation step and then finally we've got number five which is optimizer step so this is where i left my little note above to remind me and to also let you know highlight that the optimizer will update a model's parameters once per batch rather than once per epoch so you see how we've got a for loop inside our epoch loop here so the batch loop so this is what i meant that the optimizer this is one of the advantages of using mini batches is not only is it more memory efficient because we're not loading 60 000 images into memory at a time we are updating our models parameters once per batch rather than waiting for it to see the whole data set with every batch our model is hopefully getting slightly better so that is because the optimizer. 
call is within the batch loop rather than the epoch loop so let's now print out what's happening print out what's happening so if batch let's do it every 400 or so batches because we have a lot of batches we don't want to print out too often otherwise we'll just fill our screen with numbers that might not be a bad thing but 400 seems a good number that'll be about five print outs if we have 2000 batches so print looked at and of course you can adjust this to whatever you would like that's the flexibility of pie torch flexibility of python as well so looked at how many samples have we looked at so we're going to take the batch number multiply it by x the length of x is going to be 32 because that is our batch size then we're going to just write down here the total number of items that we've got in our data set and we can access that by going train data loader dot data set so that's going to give us length of the data set contained within our trained data loader which is you might be able to guess 60 000 or should be now we have to because we've been accumulating the train loss this is going to be quite high because we've been adding every single time we've calculated the loss we've been adding it to the train loss the overall value per batch so now let's adjust if we wanted to find out see how now we've got this line we're outside of the batch loop we want to adjust our training loss to get the average training loss per batch per epoch so we're coming back to the epoch loop here a little bit confusing but you just line up where the loops are and this is going to help you figure out what context you're computing in so now we are in the epoch loop so divide total train loss by length of train data loader oh this is so exciting training our biggest model yet so train loss equals or divide equals we're going to reassign the train loss we're going to divide it by the length of the train data loader so why do we do this well because we've accumulated the train loss here for every batch in the train data loader but we want to average it out across how many batches there are in the train data loader so this value will be quite high until we readjust it to find the average loss per epoch because we are in the epoch loop all right fair few steps going on but that's all right we'll figure this out we'll watch it happening in a minute let's code up the testing loop so testing what do we have to do for testing well let's set up a test loss variable why don't we do accuracy for testing as well did we do accuracy for training we didn't do accuracy for training but that's all right we'll stick to doing accuracy for testing we'll go model zero dot eval we'll put it in evaluation mode and we'll turn on our inference mode context manager with torch dot inference mode now we'll do the same thing for x y in test data loader we don't need to keep track of the batches here again in the test data loader so we'll just loop through x so features images and labels in our test data loader we're going to do the forward pass because the test loop we don't have an optimization step we are just passing our data through the model and evaluating the patterns it learned on the training data so we're going to pass in x here this might be a little bit confusing let's do this x test y test that way we don't get confused with our x above for the training set now we're going to calculate the loss accumulatively might smell that wrong i have to sound that out what do we have here so we've got our test loss variable that we just 
assigned to 0 above just up here so we're going to do test loss plus equals we're doing this in one step here test pred y test so we're comparing our test prediction to our y test labels our test labels now we're going to back out of the for loop here because that's all we have to do the forward pass and calculate the loss for the test data set oh i said we're going to calculate the accuracy silly me so calculate accuracy let's go test acc and we've got plus equals we can bring in our accuracy function here that's what we downloaded from our helper_functions.py before y true equals y test and then y pred equals test pred dot argmax dim equals 1 why do we do this well because recall that the outputs of our model the raw outputs of our model are going to be logits and our accuracy function expects our true labels and our predictions to be in the same format if our test pred is just logits we have to call argmax to find the index of the logit with the highest value and that will be the prediction label and so then we're comparing labels to labels that's what the argmax does here so we can back out of the batch loop now and we're going to now calculate the test loss average per batch so let's go here test loss divide equals length test data loader so because we were in the context of the loop here of the batch loop our test loss and test accuracy values are per batch and accumulated every single batch so now we're just dividing them by how many batches we had test data loader and the same thing for the accuracy calculate the acc or test acc average per batch so this is giving us test loss and test accuracy per epoch test acc divide equals length test data loader wonderful oh we're so close to finishing this up and now we'll come back to where's our epoch loop we can these lines are very helpful in google colab we scroll down i believe if you want them you can go settings or something like that yes settings that's where you can get these lines from if you don't have them so print out what's happening we are going to print an f string with a new line let's get the train loss in here train loss and we'll print that to four decimal places and then we'll get the test loss of course test loss and we'll get that to four decimal places as well and then we'll get the test acc test accuracy we'll get that to four decimal places as well point four f wonderful and then finally one more step and then we're through oh we've written a lot of code in this video we want to calculate the training time because that's another thing that we want to track we want to see how long our model has taken to train so train time end on cpu is going to equal the timer and then we're going to get the total train time model zero so we'll set up a variable for this so we can compare our modeling experiments later on we're going to go print train time start equals train time start on cpu end equals train time end on cpu and finally the device is going to be string next model 0 dot parameters so this is one way of checking where our model zero parameters live so beautiful all right have we got enough brackets there i don't think we do okay there we go i'll just show you what the output of this is so next model 0 dot parameters what does this give us oh can we go device here ah what do we have here model zero dot parameters i thought this was a little trick and then if we go next parameter containing i thought we could get device oh there we go excuse me that's how we get it that's how we get the device that it's on so let me
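putting the spoken steps of the last few paragraphs together, a sketch of the full training and testing loop might look like the following; it assumes model_0, train_dataloader, test_dataloader, loss_fn, optimizer, accuracy_fn and the timer/print_train_time helpers were set up in earlier cells (the variable names are assumptions matching the transcript's wording):

```python
import torch
from tqdm.auto import tqdm  # progress bar over the epochs

torch.manual_seed(42)
train_time_start_on_cpu = timer()

epochs = 3  # keep small for faster experiments

for epoch in tqdm(range(epochs)):
    print(f"Epoch: {epoch}\n-------")

    ### Training
    train_loss = 0
    for batch, (X, y) in enumerate(train_dataloader):
        model_0.train()
        y_pred = model_0(X)              # 1. forward pass
        loss = loss_fn(y_pred, y)        # 2. calculate loss (per batch)
        train_loss += loss               # accumulate train loss across batches
        optimizer.zero_grad()            # 3. optimizer zero grad
        loss.backward()                  # 4. loss backwards (backpropagation)
        optimizer.step()                 # 5. optimizer step (once per batch, not per epoch)

        if batch % 400 == 0:
            print(f"Looked at {batch * len(X)}/{len(train_dataloader.dataset)} samples")

    # divide total train loss by number of batches -> average train loss per batch, per epoch
    train_loss /= len(train_dataloader)

    ### Testing
    test_loss, test_acc = 0, 0
    model_0.eval()
    with torch.inference_mode():
        for X_test, y_test in test_dataloader:
            test_pred = model_0(X_test)                     # forward pass only, no optimization
            test_loss += loss_fn(test_pred, y_test)         # accumulate loss per batch
            test_acc += accuracy_fn(y_true=y_test,
                                    y_pred=test_pred.argmax(dim=1))  # logits -> labels
        test_loss /= len(test_dataloader)
        test_acc /= len(test_dataloader)

    print(f"\nTrain loss: {train_loss:.4f} | Test loss: {test_loss:.4f}, Test acc: {test_acc:.4f}")

train_time_end_on_cpu = timer()
total_train_time_model_0 = print_train_time(start=train_time_start_on_cpu,
                                            end=train_time_end_on_cpu,
                                            device=str(next(model_0.parameters()).device))
```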
just turn this this is what the output of that's going to be cpu that's what we're after so troubleshooting on the fly here hopefully all of this code works so we went through all of our steps so we're looping through epochs at the top level here we loop through the training batches perform the training steps so our training loop forward pass loss calculation optimizes zero grad loss backwards calculate the loss per batch accumulate those we do the same for the testing batches except without the optimizer steps we print out what's happening and we time it all for fun a fair bit going on here but if you don't think there's any errors give that a go run that code i'm gonna leave this one on a cliffhanger and we're gonna see if this works in the next video i'll see you there welcome back the last video was pretty full on we did a fair few steps but this is all good practice the best way to learn pytorch code is to write more pie torch codes so did you try it out did you run this code did it work did we probably have an error somewhere well let's find out together you ready let's train our biggest model yet in three two one boom of course we did what do we have what's going on indentation error ah classic so print out what's happening do we not have an indent there oh is that not in line with where it needs to be excuse me okay why is this not in line so this is strange to me enter how did this all get off by one i'm not sure but this is just what you'll face like sometimes you'll write this beautiful code that should work but the main error of your entire code is that it's off by a single space i'm not sure how that happened but we're just going to pull this all into line we could have done this by selecting it all but we're going to do it line by line just to make sure that everything's in the right order beautiful and we print out what's happening ready three two one round two oh we're going okay so this is the progress bar i was talking about look at that how beautiful is that oh we're going quite quickly through all of our samples i need to talk faster oh there we go we've got some good results we've got the test the train loss the test loss and the test accuracy is pretty darn good oh my goodness this is a good baseline already 67 percent so this is showing us it's about seven seconds per iteration remember tqdm is tracking how many epochs we're going through so we have three epochs and our print statement is just saying hey we've looked at zero out of sixty thousand samples then we looked at twelve thousand out of sixty thousand samples and we finished on epoch two because it's zero indexed and we have a train loss of zero point four five five zero and test loss four seven six six and a test accuracy eight three four two six five and a training time about just over 21 seconds or just under 22. 
so keep in mind that your numbers may not be the exact same as mine they should be in the same realm as mine but due to inherent randomness of machine learning even if we set the manual seed might be slightly different so don't worry too much about that and what i mean by in the same realm if your accuracy is 25 rather than 83 well then it's probably something's wrong there but if it's 83.6 well then that's not too bad and the same with the train time on cpu this will be heavily dependent how long it takes to train will be heavily dependent on the hardware that you're using behind the scenes so i'm using google collab pro now that may mean i get a faster cpu than the free version of google colab it also depends on what cpu is available in google's computer warehouse where google colab is hosting of how fast this will be so just keep that in mind if your time is 10 times that then there's probably something wrong if your time is 10 times less than that well hey keep using that hardware because that's pretty darn good so let's keep pushing forward this will be our baseline that we try to improve upon so we have an accuracy of 83.5 and we have a train time of 20 or so seconds so we'll see what we can do with a model on the gpu later and then also later on a convolutional neural network so let's evaluate our model where are we up to what we just did we built a training loop so we've done that that was a fair bit of code but now we're up to we fit the model to the data and make a prediction let's do these two combined hey we'll evaluate our model so we'll come back number four is make predictions and get model zero results now we're going to create a function to do this because we want to build multiple models and that way we can if we have say model 0 1 2 3 we can pass it to our function to evaluate that model and then we can compare the results later on so that's something to keep in mind if you're going to be writing a bunch of code multiple times you probably want to functionize it and we could definitely do that for our training and lost loops but we'll see that later on so let's go def eval model so evaluate a given model we'll pass it in a model which will be a torch.nn dot module or of type and we'll pass it in a data loader which will be of type torch.utils.data.dataloader and then we'll pass in a loss function so that it can calculate the loss we could pass in an evaluation metric if we wanted to track that too so this would be torch nn.module as well and then oh there we go speaking of an evaluation function let's pass in our accuracy function as well and i don't want l i want that so we want to returns a dictionary containing the results of model predicting on data loader so that's what we want we're going to return a dictionary of model results that way we could call this function multiple times with different models and different data loaders and then compare the dictionaries full of results depending on which model we passed in here so let's set up loss and accuracy equals zero zero we'll start those off we'll go this is going to be much the same as our testing loop above except it's going to be functionalized and we're going to return a dictionary so we'll turn on our context manager for inferencing with torch.inferencemode now we're going to loop through the data loader and we'll get the x and y values so the x will be our data the y will be our ideal labels we'll make predictions with the model in other words do the forward pass so we'll go y print equals model on x now we 
don't have to specify what model it is because we've got the model parameter up here so this is where we're starting to make this function generalizable so it could be used with almost any model and any data loader so we want to accumulate the loss and accuracy values per batch because this is within the batch loop here per batch and then we're going to go loss plus equals loss function we'll pass it in the y pred and the y the true label and we'll do the same with the accuracy so except this time we'll use our accuracy function we'll send in y true equals y and y pred equals y pred dot argmax because the raw outputs of our model are logits and if we want to convert them into labels we could take the softmax for the prediction probabilities but we could also take the argmax and just by skipping the softmax step the argmax will get the index where the highest value logit is dim equals 1 and then we're going to make sure that we're still within the context manager here so with torch inference mode but outside the loop so that'll be this line here we're going to scale the loss and acc to find the average loss slash acc per batch so loss will divide and assign to the length of the data loader so that'll divide and reassign it to however many batches are in the data loader that we pass into our eval model function then we'll do the same thing for the accuracy here length data loader beautiful and now we're going to return a dictionary here so return we can return the model name by inspecting the model we'll get an attribute of the model which is its class name i'll show you how you can do that so this is helpful to track if you've created multiple different models and given them different class names you can access the name attribute so this only works when the model was created with a class so you just have to ensure that your models have different class names if you want to do it like that because we're going to do it like that we can set the model name to be its class name we'll get the model loss which is just this value here after it's been scaled we'll turn it into a single value by taking dot item and then we'll go model dot acc or we'll get model underscore acc for the model's accuracy we'll do the same thing here acc i don't think we need to take the item because accuracy comes back in a different form we'll find out if in doubt code it out so calculate model 0 results on test data set and i want to let you know that you can create your own functions here to do almost whatever you want i've just decided that this is going to be helpful for the models and the data that we're building but keep that in mind that your models your data sets might be different and will likely be different in the future so you can create these functions for whatever use case you need model 0 results equals eval model so we're just going to call our function that we've just created here model is going to equal model 0.
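a sketch of the eval_model function being described, using the names from the transcript and assuming the accuracy_fn helper, model_0 and test_dataloader already exist from earlier cells:

```python
import torch

torch.manual_seed(42)

def eval_model(model: torch.nn.Module,
               data_loader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               accuracy_fn):
    """Returns a dictionary containing the results of model predicting on data_loader."""
    loss, acc = 0, 0
    model.eval()
    with torch.inference_mode():
        for X, y in data_loader:
            y_pred = model(X)                       # forward pass (raw logits)
            loss += loss_fn(y_pred, y)              # accumulate loss per batch
            acc += accuracy_fn(y_true=y,
                               y_pred=y_pred.argmax(dim=1))  # logits -> prediction labels
        # scale loss and acc to find the average per batch
        loss /= len(data_loader)
        acc /= len(data_loader)
    return {"model_name": model.__class__.__name__,  # only works when model was made with a class
            "model_loss": loss.item(),
            "model_acc": acc}

# calculate model 0 results on the test dataset
model_0_results = eval_model(model=model_0,
                             data_loader=test_dataloader,
                             loss_fn=loss_fn,
                             accuracy_fn=accuracy_fn)
model_0_results
```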
the data loader is going to equal what the test data loader of course because we want to evaluate it on the test data set and we're going to send in our loss function which is loss function that we assigned above just before our training loop so if we come up here our loss function is up here and then if we go back down we have our accuracy function is equal to our accuracy function we just pass another function in there beautiful and let's see if this works model 0 results did you see any typos likely or errors in our code how do you think our model did well let's find out oh there we go we got model accuracy you see how we could reuse this dictionary later on so if we had model 1 results model 2 results we could use these dictionaries and compare them all together so we've got our model name our version 0 the model has an accuracy of 83.42 and a loss of 0.47 on the test data loader again your numbers may be slightly different they should be in the same realm but if they're not the exact same don't worry too much if they're 20 accuracy points less and the loss is 10 times higher then you should probably go back through your code and check if something is wrong and i believe if we wanted to do a progress bar here could we do that tqdm let's have a look hey oh look at that progress bar that's very nice so that's nice and quick because it's only on 313 batches it goes quite quick so now what's next well we've built model one we've got a model 0 sorry i'm getting ahead of myself we've got a baseline here we've got a way to evaluate our model what's our workflow say so we've got our data ready we've done that we've picked or built a model we've picked a loss function we've built an optimizer we've created a training loop we fit the model to the data we've made a prediction we've evaluated the model using loss and accuracy we could evaluate it by making some predictions but we'll save that for later on as in visualizing some predictions i think we're up to improving through experimentation so let's give that a go hey do you recall that we trained model 0 on the cpu how about we build model 1 and start to train it on the gpu so in the next section let's create number five is set up device agnostic code so we've done this one together for using a gpu if there is one so my challenge to you for the next video is to set up some device agnostic code so you might have to go into collab if you haven't got a gpu active change runtime type to gpu and then because it might restart the runtime you might have to rerun all of the cells above so that we get our helper functions file back and the data and whatnot so set up some device agnostic code and i'll see you in the next video how'd you go did you give it a shot did you set up some device agnostic code i hope you gave it a go but let's do it together this won't take too long the last two videos have been quite long so if i wanted to set device agnostic code i want to see if i have a gpu available do i i can check it from nvidia smi that fails because i haven't activated the gpu in google collab yet i can also check here torch cuda is available that will pi torch will check if there's a gpu available with cuda and it's not so let's fix these two because we want to start using a gpu and we want to set up device agnostic code so no matter what hardware our system is running pytorch leverages it so we're going to select gpu here i'm going to click save and you'll notice that our google colab notebook will start to reset and we'll start to connect there we go 
we've got a gpu on the back end python 3 google compute engine back-end gpu do we have to reset this nvidia smi wonderful i have a tesla t4 gpu with 16 gigabytes of memory that is wonderful and now do we have a gpu available oh torch is not defined well do you notice the numbers of these cells one two that means because we've reset our runtime to have a gpu we have to re-run all the cells above so we can go run before that's going to run all the cells above make sure that we download the data make sure that we download the helper functions file if we go back up we should see our data may be downloading it shouldn't take too long that is another advantage of using a relatively small data set that is already saved on pi torch data sets just keep in mind that if you use a larger data set and you have to re-download it into google colab it may take a while to run and if you build bigger models they may take a while to run so just keep that in mind for your experiments going forward start small increase when necessary so we'll rerun this we'll rerun this and finally we're going to oh there we go we've got a gpu wonderful but we'll write some device agnostic code here setup device agnostic code so import torch now realistically you will quite often do this at the start of every notebook but i just wanted to highlight how we might do it if we were in the middle and i wanted to practice running a model on a cpu only before stepping things up and going to a gpu so device equals cuda this is for our device agnostic code if torch.cuda is available and it looks like this is going to return true else use the cpu and then we're going to check device wonderful cuda so we've got some device agnostic code ready to go i think it's time we built another model and i asked the question before do you think that the data set that we're working with requires non-linearity so the shirts and the bags and the shoes do we need non-linear functions to model this well it looks like our baseline model without non-linearities did pretty well at modeling our data so i've got a pretty good test accuracy value so 83 so out of 100 images it predicts the right one 83 of the time 83 times out of 100 it did pretty well without non-linearities why don't we try a model that uses non-linearities and it runs on the gpu so you might want to give that a go see if you can create a model with non-linear functions try nn.relu run it on the gpu and see how it goes otherwise we'll do it together in the next video i'll see you there hello everyone and welcome back we are making some terrific progress let's see how far we've come we've got a data set we've repaired our data loaders we've built a baseline model and we've trained it evaluated it now it's time oh and the last video we set up device agnostic code but where are we in our little framework we're up to improving through experimentation and quite often that is building a different model and trying it out it could be using more data it could be tweaking a whole bunch of different things and so let's get into some coding i'm going to write it here model one i believe we're up to section six now model one is going to be building a better model with non-linearity so i actually do the challenge in the last video to give it a go to try and build a model with non-linearity i hope you gave it a go because if anything that this course i'm trying to impart on you in this course it's to give things a go to try things out because that's what machine learning and coding is all about trying 
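the device agnostic setup described here comes down to a couple of lines:

```python
import torch

# setup device agnostic code: use the GPU if one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)
```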
things out giving it a go but let's write down here we learned about the power of non-linearity in notebook o2 so if we go to the learn pi torch.io book we go to section number two we'll just wait for this to load and then if we come down here we can search for non-linearity the missing piece non-linearity so i'm going to get this and just copy that in there if you want to see what nonlinearity helps us do it helps us model non-linear data and in the case of a circle can we model that with straight lines in other words linear lines recall linear means straight non-linear means non-straight and so we learned that through the power of linear and nonlinear functions neural networks can model almost any kind of data if we pair them in the right way so you can go back through and read that there but i prefer to code things out and try it out on our data so let's create a model with non-linear and linear layers but we also saw that our model with just linear layers can model our data it's performing quite well so that's where the experimentation side of things will come into play sometimes you won't know what a model will do whether it will work or won't work on your data set but that is where we try different things out so we come up here we look at our data hmm that looks actually quite linear to me as a bag like it's just some straight lines you could maybe model that with just straight lines but there are some things which you could potentially classify as nonlinear in here it's hard to tell without knowing so let's give it a go let's write a non-linear model which is going to be quite similar to model 0 here except we're going to intersperse some relu layers in between our linear layers so recall that relu is a nonlinear activation function and relu has the formula if something comes in and it's a negative value relu is going to turn that negative into a zero and if something is positive rel is just going to leave it there so let's create another class here fashion mnist model v1 and we're going to subclass from nn.module beautiful and then we're going to initialize our model this is going to be quite the same as what we created before we want an input shape and that's going to be an integer and then we want a number of hidden units and that's going to be an int here and then we want an output shape int and i want to stress as well that although we're creating a class here with these inputs classes are as flexible as functions so if you need different use cases for your modeling classes just keep that in mind that you can build that functionality in self.layer stack we're going to spell layer stack correctly and we're going to set this equal to nn dot sequential because we just want a sequential set of layers the first one's going to be nn.flatten which is going to be flatten inputs into a single vector and then we're going to go nn.linear because we want to flatten our stuff because we want it to be the right shape if we don't flatten it we get shape issues input shape and then the out features of our linear layer is going to be the hidden units hidden units i'm just going to make some code cells here so that my code goes into the middle of the screen then here is where we're going to add a non-linear layer so this is where we're going to add in a relu function and where might we put these well generally you'll have a linear function followed by a non-linear function in the construction of neural networks however neural networks are as customizable as you can imagine whether they work or 
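as a quick check of the relu behaviour just described (negative inputs become zero, positive inputs pass through unchanged):

```python
import torch
from torch import nn

relu = nn.ReLU()
x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))  # tensor([0.0000, 0.0000, 0.0000, 1.5000, 3.0000]) -> negatives zeroed, positives kept
```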
not is a different question so we'll go output shape here as the out features oh do we mess this one up yes we did this needs to be hidden units and why is that well it's because the output shape of this linear layer here needs to match up with the input shape of this linear layer here the relu layer won't change the shape of our data and you could test that out by printing the different shapes if you'd like and then we're going to finish off with another non-linear layer at the end relu now do you think that this will improve our model's results or not well it's hard to tell without trying it out right so let's continue building our model we have to override the forward method self x is going to be we'll give a tie pin here this is going to be a torch tensor as the input and then we're just going to return what's happening here we'll go self dot layer stack x so that just means that x is going to pass through our layer stack here and we could customize this we could try it just with one nonlinear activation this is actually our previous network just with those commented out all we've done is added in two relu functions so i'm going to run that beautiful and so what should we do next well we should instantiate it but previously we ran our last model model 0 on if we go parameters do we run this on the gpu or the cpu on the cpu so how about we try out our fashion mnist model v1 running on the device that we just set up which should be cuda wonderful so we can instantiate so create an instance of model one so we want model one or actually we'll set up a manual seed here so that whenever we create a new instance of a model it's going to be instantiated with random numbers we don't necessarily have to set a random seed but we do so anyway so that our values are quite similar on your end and my end input shape is going to be 784 where does that come from well that's because this is the output of the flattened layer after our 28 by 28 image goes in then we're going to set up the hidden units we're going to use the same number of hidden units as before which is going to be 10 and then the output shape is what we need one value one output neuron for each of our classes so length of the class names and then we're going to send this to the target device so we can write send to the gpu if it's available so now that we've set up device agnostic code in the last video we can just put two device instead of hard coding that and so if we check so this was the output for model zeros device let's now check model one's device model one parameters and we can check where those parameters live by using the device attribute beautiful so our model one is now living on the gpu cuda at index zero index zero means that it's on the first gpu that we have available we only have one gpu available so it's on this tesla t4 gpu now we've got a couple more things to do now that we've created another model we can recreate if we go back to our workflow we've just built a model here what do we have to do after we built a model we have to instantiate a loss function and an optimizer now we've done both of those things for model zero so that's what we're going to do in the next video but i'd like you to go ahead and try to create a loss function for our model and an optimizer for model one the hint is that they can be the exact same loss function and optimizer as model 0. 
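a sketch of the non-linear model and its instantiation as described above; class_names and device are assumed to exist from earlier in the notebook:

```python
import torch
from torch import nn

class FashionMNISTModelV1(nn.Module):
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
        super().__init__()
        self.layer_stack = nn.Sequential(
            nn.Flatten(),  # flatten inputs into a single vector (28*28 -> 784)
            nn.Linear(in_features=input_shape, out_features=hidden_units),
            nn.ReLU(),     # non-linear layer between the linear layers
            nn.Linear(in_features=hidden_units, out_features=output_shape),
            nn.ReLU()
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layer_stack(x)

# create an instance of model 1 and send it to the target device
torch.manual_seed(42)
model_1 = FashionMNISTModelV1(input_shape=784,               # 28 * 28 after the flatten layer
                              hidden_units=10,               # same as the baseline model
                              output_shape=len(class_names)  # one output neuron per class
                              ).to(device)
next(model_1.parameters()).device  # check which device the parameters live on
```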
so give that a shot and i'll see you in the next video welcome back in the last video we created another model so we're continuing with our modeling experiments and the only difference here between fashion mnist model v1 and v0 is that we've added in non-linear layers now we don't know for now we could think or guess whether they would help improve our model and with practice you can start to understand how different functions will influence your neural networks but i prefer to if in doubt code it out run lots of different experiments so let's continue we now have to create a loss function loss optimizer and evaluation metrics so we've done this for model zero so we're not going to spend too much time explaining what's going on here and we've done this a fair few times now so from helper functions which is the script we downloaded before we're going to import our accuracy function and we're going to set up a loss function which is we're working with multi-class classification so what loss function do we typically use nn dot cross entropy loss and as our optimizer is going to be torch dot optim dot sgd and we're going to optimize this time i'll put in the params keyword here model 1 dot parameters and the learning rate we're just going to keep it the same as our previous model and that's a thing to keep a note for your experiments when you're running fair few experiments you only really want to tweak a couple of things or maybe just one thing per experiment that way you can really narrow down what actually influences your model and what improves it slash what doesn't improve it and a little pop quiz what does a loss function do this is going to measure how wrong our model is and what does the optimizer do tries to update our model's parameters to reduce the loss so that's what these two functions are going to be doing the accuracy function is of a course going to be measuring our model's accuracy we measure the accuracy because that's one of the base classification metrics so we'll run this now what's next we're getting quite good at this we've picked a loss function and optimizer now we're going to build a training loop however we spent quite a bit of time doing that in a previous video if we go up here that was our vowel model function oh that was helpful we turned it into a function how about we do the same with these why don't we make a function for our training loop as well as our testing loop so i think you can give this a go we're going to make a function in the next video for training we're going to call that train step and we'll create a function for testing called test step now they'll both have to take in some parameters i'll let you figure out what they are but otherwise we're going to code that up together in the next video so i'll see you there so we've got a loss function ready and an optimizer what's our next step well it's to create training and evaluation loop so let's make a heading here we're going to call this functionalizing training and evaluation or slash testing loops because we've written similar code quite often for training and evaluating slash testing our models now we're going to start moving towards functionalizing code that we've written before because that's not only a best practice it helps reduce errors because if you're writing a training loop all the time we may get it wrong if we've got one that works for our particular problem hey we might as well say that as a function so we can continually call that over and over and over again so how about we and 
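for reference, the loss function and optimizer setup being described would look something like this; the 0.1 learning rate is an assumption based on "keep it the same as our previous model", so match whatever value model 0 actually used:

```python
from helper_functions import accuracy_fn  # helper script downloaded earlier

loss_fn = nn.CrossEntropyLoss()  # multi-class classification loss
optimizer = torch.optim.SGD(params=model_1.parameters(),
                            lr=0.1)  # assumed to match model 0's learning rate
```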
this is going to be very rare that i'm going to allow you to do this is that is we're going to copy this training and you might have already attempted to create this that is the function called let's create a function for one training loop and we're going to call this train step and we're going to create a function for the testing loop i'm going to call this test step now these are just what i'm calling them you can call them whatever you want i just understand it quite easily by calling it train step and then we can for each epoch in a range we call our training step and then the same thing for each epoch in a range we can call a testing step this will make a lot more sense once we've coded it out so let's put the training code here to functionalize this let's start it off with train step now what parameters should our train step function take in well let's think about this we need a model we need a data loader we need a loss function and we need an optimizer we could also put in an accuracy function here if we wanted to and potentially it's not here but we could put in what target device we'd like to compute on and make our code device agnostic so this is just the exact same code we went through before we loop through a data loader we do the forward pass we calculate the loss we accumulate it we zero the optimizer we perform back propagation in respect to the loss with the parameters of the model and then we step the optimizer to hopefully improve the parameters of our model to better predict the the data that we're trying to predict so let's craft a train step function here we'll take a model which is going to be torch nn.module type hint and we're going to put in a data loader which is going to be of type torch utils.data.dataloader now we don't necessarily need to put this in these type hints but they're relatively new addition to python and so you might start to see them more and more and it also just helps people understand what your code is expecting so the loss function we're going to put in an optimizer torch dot optim which is a type optimizer we also want an accuracy function we don't necessarily need this either these are a lot of nice to haves the first four are probably the most important and then the device so torch is going to be torch.device equals device so we'll just hard code that to be our already set device parameter and we'll just write in here performs uh training step with model trying to learn on data loader nice and simple we could make that more explanatory if we wanted to but we'll leave it at that for now and so right at the start we're going to set up train loss and train ack equals zero zero we're going to introduce accuracy here so we can get rid of this let's just go through this line by line what do we need to do here well we've got four batch x y in enumerate train data loader but we're going to change that to data loader up here so we can just change this to data loader wonderful and now we've got model 0.train do we want that well no because we're going to keep this model agnostic we want to be able to use any model with this function so let's get rid of this model.train what we are missing one step here is put data on target device and we could actually put this model.train up here put model into training mode now this will be the default for the model but just in case we're going to call it anyway model.train put data on the target device so we're going to go x y equals x dot 2 device y dot 2 device wonderful and the forward pass we don't need to 
use model 0 anymore we're just going to use model that's up here the loss function can stay the same because we're passing in a loss function up there the train loss can be accumulated that's fine but we might also accumulate now the train accuracy calculate loss and accuracy per batch so train acc plus equals our accuracy function on y true equals y and y pred equals y pred so the outputs here y pred we need to take the argmax of because the model outputs raw logits and because our accuracy function expects our predictions to be in the same format as our true values we need to make sure that they are we can call the argmax here on the first dimension this is going to go from logits to prediction labels we can keep the optimizer zero grad the same because we're passing in an optimizer up here we can keep the loss backwards because the loss is just calculated there we can keep optimizer step and we could print out what's happening but we might change this up a little bit we need to divide the total train loss and accuracy i just want to type in accuracy here because now we've added in an accuracy metric acc so train acc divide equals length train data loader oh no sorry we can just use the data loader here data loader data loader and we're not going to print out per batch here i'm just going to get rid of this we'll make our printout at the end of this step here print notice how it's at the end of the step because we're outside the for loop now so here we're accumulating the loss on the training data set and the accuracy on the training data set per batch and then we're finding out at the end of the training step so after it's been through all the batches in the data loader we're finding out what the average loss is per batch and the average accuracy is per batch and now we're going to go train loss is going to be the train loss formatted to five decimal places point five f and then we're going to go train acc is going to be train acc and we're going to set that to point two f and get a percentage sign there wonderful so if all this works we should be able to call our train step function and pass it in a model a data loader a loss function an optimizer an accuracy function and a device and it should automatically do all of these steps so we're going to find that out in a later video in the next video we're going to do the same thing we've just done for the training loop with the test step but here's your challenge for this video is to go up to the testing loop code we wrote before and try to recreate the test step function in the same format that we've done here so give that a go and i'll see you in the next video welcome back in the last video we functionized our training loop so now we can call this train step function and instead of writing all this training loop code again well we can train our model through the art of a function now let's do the same for our testing loop so i issued you the challenge in the last video to give it a go i hope you did because that's the best way to practice pytorch code is to write more pytorch code let's put in a model which is going to be torch nn dot module and we're going to put in a data loader because we need a model and we need data the data loader is going to be of course the test data loader here torch.utils.data.dataloader and then we're going to put in a loss function which is going to be torch.nn.module as well because we're going to use nn.cross entropy loss we'll see that later on we're going to put in an accuracy function we don't need an
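a sketch of the train_step function the last couple of paragraphs describe, assuming device and the accuracy_fn helper were set up earlier:

```python
import torch

def train_step(model: torch.nn.Module,
               data_loader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               optimizer: torch.optim.Optimizer,
               accuracy_fn,
               device: torch.device = device):
    """Performs a training step with model trying to learn on data_loader."""
    train_loss, train_acc = 0, 0
    model.train()  # put model into training mode (default, but set it anyway)
    for batch, (X, y) in enumerate(data_loader):
        X, y = X.to(device), y.to(device)     # put data on the target device
        y_pred = model(X)                     # 1. forward pass (raw logits)
        loss = loss_fn(y_pred, y)             # 2. calculate loss
        train_loss += loss                    # accumulate loss per batch
        train_acc += accuracy_fn(y_true=y,
                                 y_pred=y_pred.argmax(dim=1))  # logits -> prediction labels
        optimizer.zero_grad()                 # 3. optimizer zero grad
        loss.backward()                       # 4. loss backwards
        optimizer.step()                      # 5. optimizer step (once per batch)

    # average loss and accuracy per batch across the whole data loader
    train_loss /= len(data_loader)
    train_acc /= len(data_loader)
    print(f"Train loss: {train_loss:.5f} | Train accuracy: {train_acc:.2f}%")
```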
optimizer because we're not doing any optimization in the testing loop we're just evaluating and the device can be torch.device and we're going to set that as a default to the target device parameter beautiful so i'll put a little docstring here so performs a testing loop step on model going over data loader wonderful so now let's set up a test loss and a test accuracy because we'll measure test loss and accuracy with our testing loop function and we're going to set the model into i'll just put a comment here put the model in eval mode so model dot eval we don't have to use any underscore here as in model 0 because we have a model coming in the top here now what should we do well because we're performing a test step we should turn on inference mode so turn on inference mode inference mode context manager remember whenever you're performing predictions with your model you should put it in model dot eval and if you want as many speed ups as you can get make sure the predictions are done within the inference mode context manager because remember inference is another word for predictions so we're going to loop through our data loader for x and y in data loader we don't have to specify that this is x test or y test we could if we wanted to but because we're in another function here we can just go for x y in data loader we can do the forward pass after we send the data to the target device target device so we're going to have x y equals x dot to device and the same thing with y we're just doing best practice here creating device agnostic code then what should we do well we should do the thing that i said before which is the forward pass now that our data and model will be on the same device we can create a variable here test pred equals model we're going to pass in x and then what do we do we can calculate the loss so to calculate the loss slash accuracy we're going to accumulate it per batch so we'll set up test loss equals loss function oh plus equals loss function we're going to pass it in test pred and y which is our truth label and then the test acc we will accumulate as well using our accuracy function we'll pass in y true equals y and then y pred what do we have to do to y pred well our test pred we have to take the argmax of to convert it because the model outputs raw logits remember a model's raw output is referred to as logits and then here we have to go from logits to prediction labels beautiful oh little typo here did you catch that one tab tab beautiful oh look how good this function is looking now we're going to adjust the metrics so adjust metrics and print out you might notice that we're outside of the batch loop here right so if we drop down from this line and we write some code here we're still within the context manager this is important because if we want to adjust a value created inside the context manager we have to modify it still within that context manager otherwise pytorch will throw an error so try to write this code outside the context manager if you want and see if it still works so test loss we're going to adjust it to find out the average test loss and test accuracy per batch across a whole step so we're going to go length data loader now we're going to print out what's happening print out what's happening so test loss what should we put in here well we're going to get the test loss let's get this to five decimal places and then we're going to go test acc and we will get that to two decimal places you could do this
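and a matching sketch of the test_step function described here, under the same assumptions as train_step above:

```python
import torch

def test_step(model: torch.nn.Module,
              data_loader: torch.utils.data.DataLoader,
              loss_fn: torch.nn.Module,
              accuracy_fn,
              device: torch.device = device):
    """Performs a testing loop step on model going over data_loader."""
    test_loss, test_acc = 0, 0
    model.eval()                                  # put the model in eval mode
    with torch.inference_mode():                  # turn on inference mode context manager
        for X, y in data_loader:
            X, y = X.to(device), y.to(device)     # send data to the target device
            test_pred = model(X)                  # forward pass (raw logits)
            test_loss += loss_fn(test_pred, y)    # accumulate loss per batch
            test_acc += accuracy_fn(y_true=y,
                                    y_pred=test_pred.argmax(dim=1))  # logits -> labels

        # adjust metrics inside the inference mode context: average per batch
        test_loss /= len(data_loader)
        test_acc /= len(data_loader)
    print(f"Test loss: {test_loss:.5f} | Test accuracy: {test_acc:.2f}%\n")
```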
as many decimals as you want you could even times it by a hundred to get it in proper accuracy format and we'll put a new line on the end here wonderful so now it looks like we've got functions i haven't run this cell yet for a training step and a test step so how do you think we could replicate if we go back up to our training loop that we wrote before how do you think we could replicate the functionality of this except this time using our functions well we could still use this for epoch in tqdm range epochs but then we would just call our training step for this training code our training step function and we would call our testing step function passing in the appropriate parameters for our testing loop so that's what we'll do in the next video we will leverage our two functions train step and test step to train model one but here's your challenge for this video give that a go so use our training step and test step function to train model one for three epochs and see how you go but we'll do it together in the next video welcome back how'd you go did you create a training loop or a pytorch optimization loop using our training step function and a test step function were there any errors in fact i don't even know but how about we find out together hey how do we combine these two functions to create an optimization loop so i'm going to go torch dot manual seed 42 and i'm going to measure the time of how long our training and test loop takes this time we're using a different model so this model uses non-linearities and it's on the gpu so that's the main thing we want to compare is how long our model took on cpu versus gpu so i'm going to import from time it import default timer as timer and i'm going to start the train time train time start on gpu equals timer and then i'll just right here set epochs i'm going to set epochs equal to three because we want to keep our training experiments as close to the same as possible so we can see what little changes do what and then it's create a optimization and evaluation loop using train step and test step so we're going to loop through the epochs for epoch in tqdm so we get a nice progress bar in epochs then we're going to print epoch little print out of what's going on epoc and we'll get a new line and then maybe one two three four five six seven eight or something like that maybe i miscounted there but that's all right train step what do we have to do for this now we have a little dock string we have a model what model would we like to use we'd like to use model one we have a data loader what data loader would we like to use well we'd like to use our trained data loader we also have a loss function which is our loss function we have an optimizer which is our optimizer and we have an accuracy function which is our accuracy function and oops forgot to put fn and finally we have a device which equals device but we're going to set that anyway so how beautiful is that for creating a training loop thanks to the code that we've functionized before and just recall we set our optimizer and loss function in a previous video you could bring these down here if you really wanted to so that they're all in one place either way up but we can just get rid of that because we've already set it now we're going to do the same thing for our test step so what do we need here let's check the docs string we could put a little bit more information in this docstring if we wanted to to really make our code more reusable and so that if someone else was to use our code or even us 
in the future knows what's going on but let's just code it out because we're just it's still fresh in our minds model equals model one what's our data letter going to be for the test step it's going to be our test data loader then we're going to set in a loss function which is going to be just the same loss function we don't need to use an optimizer here because we are only evaluating our model but we can pass in our accuracy function accuracy function and then finally the device is already set but we can just pass it in anyway look at that our whole optimization loop in a few lines of code isn't that beautiful so these functions are something that you could put in like our helperfunctions.pi and that way you could just import it later on and you don't have to write your training loops all over again but we'll see a more of an example of that later on in the course so let's keep going we want to measure the train time right so we're going to create once it's been through these steps we're going to create train time end on cpu and then we're going to set that to the timer so all this is going to do is measure at value and time once this line of code is run it's going to run all of these lines of code so it's going to perform the training and optimization loop and then it's going to oh excuse me this should be gpu it's going to measure a point in time here so once all this code's run measure a point in time there and then finally we can go total train time for model one is equal to print train time which is our function that we wrote before and we pass it in a start time and it prints the difference between the start and end time on a target device so let's do that start equals what train time start on gpu the end is going to be train time end on gpu and the device is going to be device beautiful so are you ready to run our next modeling experiment model one we've got a model running on the gpu and it's using nonlinear layers and we want to compare it to our first model which our results were model 0 results and we have total train time on model 0 yes we do so this is what we're going for does our model one beat these results and does it beat this result here so three two one do we have any errors no we don't okay train step got an unexpected keyword loss oh did you catch that i didn't type in loss function let's run it again there we go okay we're running we've got a progress bar it's going to output at the end of each epoch there we go training loss all right test accuracy training accuracy this is so exciting i love watching neural networks train okay we're improving per epoc that's a good sign but we've still got a fair way to go oh okay so what do we have here well we didn't beat our hmm it looks like we didn't beat our model 0 results with the non-linear layers and we only just slightly had a faster training time now again your numbers might not be the exact same as what i've got here right so that's a big thing about machine learning is that it uses randomness so your numbers might be slightly different the direction should be quite similar and we may be using different gpus so just keep that in mind right now i'm using a nvidia smi i'm using a tesla t4 which is at the time of recording this video wednesday april 20 2022 is a relatively fast gpu for making inference so just keep that in mind your gpu in the future may be different and your cpu that you run may also have a different time here so if these numbers are like 10 times higher you might want to look into seeing if your code 
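a sketch of the resulting optimization loop with timing, assuming the functions and objects defined earlier in the notebook (model_1, the data loaders, loss_fn, optimizer, accuracy_fn, device, print_train_time):

```python
import torch
from timeit import default_timer as timer
from tqdm.auto import tqdm

torch.manual_seed(42)
train_time_start_on_gpu = timer()

epochs = 3  # keep experiments as close to the same as possible
for epoch in tqdm(range(epochs)):
    print(f"Epoch: {epoch}\n---------")
    train_step(model=model_1,
               data_loader=train_dataloader,
               loss_fn=loss_fn,
               optimizer=optimizer,
               accuracy_fn=accuracy_fn,
               device=device)
    test_step(model=model_1,
              data_loader=test_dataloader,
              loss_fn=loss_fn,
              accuracy_fn=accuracy_fn,
              device=device)

train_time_end_on_gpu = timer()
total_train_time_model_1 = print_train_time(start=train_time_start_on_gpu,
                                            end=train_time_end_on_gpu,
                                            device=device)
```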
is there's some error if they're 10 times lower well hey you're running it on some fast hardware so it looks like my code is running on cuda slightly faster than the cpu but not dramatically faster and that's probably due to the fact that our data set isn't too complex and our model isn't too large what i mean by that is our model doesn't have a vast amount of layers this is all the layers our model has and our data set is only comprised of 60 000 images that are 28 by 28 so as you can imagine the more parameters in your model and the more features in your data the higher this time is going to be and you might sometimes even find that your model is faster on cpu so this is the train time on cpu you might sometimes find that your model's training time on a cpu is in fact faster for the exact same code running on a gpu now why might that be well let's write down this here let's go note sometimes depending on your data slash hardware you might find that your model trains faster on cpu than gpu now why is this so one of the number one reasons is that one it could be that the overhead for copying data slash model to and from the gpu outweighs the compute benefits offered by the gpu so that's probably one of the number one reasons is that for data to be processed on a gpu you have to copy it because it is by default on the cpu if you have to copy it to that gpu you have some overhead time for doing that copy into the gpu memory and then although the gpu will probably compute faster on that data once it's there you still have that back and forth of going between the cpu and the gpu and the number two reason is that the hardware you're using has a better cpu in terms of compute capability than the gpu now this is quite a bit rarer usually if you're using a fairly modern gpu it will be faster at running deep learning algorithms than your general cpu but sometimes these compute times are really dependent on the hardware that you're running so you'll get the biggest benefits of speed ups on the gpu when you're running larger models larger data sets and more compute intensive layers in your neural networks and so if you'd like a great article on how to get the most out of your gpus it's a little bit technical but this is something to keep in mind as you progress as a machine learning engineer is how to make your gpus go brrr and i mean that brrr from first principles there we go making deep learning go brrr as in your gpu is going brrr because it's running so fast from first principles so this is by horace he who works on pytorch and it's great it talks about compute as a first principle so here's what i mean by copying memory and compute there might be a fair few things you're not familiar with here but that's okay but just be aware bandwidth so bandwidth costs are essentially the cost paid to move data from one place to another that's what i was talking about copying stuff from the cpu to the gpu and then also there's one more where is it overhead overhead is basically everything else i called it overhead there are different terms for different things this article is excellent so i'm going to just copy this in here and you'll find this in the resources by the way so for more on how to make your models compute faster see here lovely so right now our baseline model is performing the best in terms of results and our model computing on the gpu is performing faster than our cpu again yours
Again, yours might be slightly different; in my case, for my particular hardware, CUDA is faster, except model 0, our baseline, is still better than model 1. So what's next? Keep experimenting, of course. I'll see you in the next video.

Welcome back. Before we move on to the next modelling experiment, let's get a results dictionary for model 1, the model we just trained, just like the one we've got for model 0. We can create it with our eval_model function, so we'll go right back down to where we were, get rid of this cell, and type in "get model 1 results dictionary". This is helpful because later on we can compare all of our modelling results, since they'll all be in the same dictionary format. So model_1_results equals eval_model with model=model_1, a data_loader which is our test_dataloader, a loss_fn which equals our loss function, and accuracy_fn=accuracy_fn. Then if we check model_1_results, what do we get? Oh no, an error. The code looks right to me... what does it say? "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda and cpu." Of course. Why did this happen? Let's go back up to our eval_model function, wherever we defined it. Ah, I see. This is a little gotcha in PyTorch, and in deep learning in general; there's a saying in the industry that deep learning models fail silently, and this is kind of one of those cases: our data and our model are on different devices. Remember how I said the three big errors are shape mismatches between your data and your model, device mismatches (which is what we've got here), and data type mismatches, where your data is in the wrong data type to be computed on. To fix this, let's bring our eval_model function down to where we were, and just like we did in our test_step and train_step functions, where we created device-agnostic data by sending the data to the target device, we'll do the exact same thing in eval_model. A note going forward: it's always handy to create device-agnostic code wherever you can. So in our new eval_model function, inside "for X, y in data_loader", let's make the data device agnostic just like our model: X.to(device) and y.to(device). We could also pass in the target device as a parameter, device=device, so we can run it on whatever device we want. Let's rerun the cell with the exact same call as before, now with device=device added, and see if it runs correctly... beautiful. If we compare this to model_0_results, it looks like our baseline is still out in front, but that's okay; in the next video we're going to step things up a notch and move on to convolutional neural networks, which is very exciting. And by the way, remember: if your numbers here aren't exactly the same as mine, don't worry too much. If they're outlandishly different, go back through your code and check whether a cell hasn't been run correctly or something like that.
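Here's a sketch of what the updated, device-agnostic eval_model function ends up looking like. It assumes the same accuracy_fn helper we've been using (taking y_true and y_pred keyword arguments); if your helper's signature differs, adjust accordingly:

```python
def eval_model(model: torch.nn.Module,
               data_loader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               accuracy_fn,
               device: torch.device = device):
    """Returns a dictionary containing the results of `model` predicting on `data_loader`."""
    loss, acc = 0, 0
    model.eval()
    with torch.inference_mode():
        for X, y in data_loader:
            # Make the data device agnostic (send it to the same device as the model)
            X, y = X.to(device), y.to(device)
            y_pred = model(X)
            loss += loss_fn(y_pred, y)
            acc += accuracy_fn(y_true=y, y_pred=y_pred.argmax(dim=1))
        # Scale the loss and accuracy to find the average per batch
        loss /= len(data_loader)
        acc /= len(data_loader)
    return {"model_name": model.__class__.__name__,
            "model_loss": loss.item(),
            "model_acc": acc}

model_1_results = eval_model(model=model_1,
                             data_loader=test_dataloader,
                             loss_fn=loss_fn,
                             accuracy_fn=accuracy_fn,
                             device=device)
```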
If they're a few decimal places off, that's okay; that's due to the inherent randomness of machine learning and deep learning. With that being said, I'll see you in the next video, where we'll get our hands on convolutional neural networks.

Welcome back. In the last video we saw that our second modelling experiment, model 1, didn't quite beat our baseline, but we're going to keep going with modelling experiments and move on to model 2. This is very exciting: we're going to build a convolutional neural network, also known as a CNN. CNNs are also known as ConvNets, and they're known for their capability to find patterns in visual data. So what are we going to do? Let's jump back into the keynote. We've looked at this slide before, showing the typical architecture of a CNN. There's a fair bit going on, but we'll step through it one by one. We have an input layer (just like any other deep learning model, we have to input some kind of data), a bunch of hidden layers (in a convolutional neural network you have convolutional layers, you often have hidden activation or non-linear activation layers, and you might have a pooling layer), and you generally always have some sort of output layer, which is usually a linear layer. The values for each of these different layers will depend on the problem you're working on. We're going to work towards building something like this, and you'll notice that a lot of the code is quite similar to the code we've written before for other PyTorch models; the only difference is that we'll use different layer types. If we want to visualise a CNN in a coloured-block fashion (we're going to code this out in a minute, so don't worry too much), we have a simple CNN: an input, which could be this image of my dad eating some pizza with two thumbs up; we pre-process that input, in other words turn it into a tensor with red, green and blue channels; and then we pass it through a combination of convolutional layers, ReLU layers and pooling layers. A thing to note about deep learning models: I don't want you to get too bogged down in the order of these layers, because they can be combined in many different ways, and research comes out almost every week about how best to construct them. The overall principle is what matters: how do you get your inputs into an idealised output? That's the fun part. Then of course we have the linear output layer, which outputs one value for however many classes we have, in the case of classification. And if you want to make your CNN deeper (this is where the "deep" in deep learning comes from), you can add more layers. The theory, or really the practice, behind this is that the more layers you add to your deep learning model, the more chances it has to find patterns in the data. How does it find those patterns? Each of these layers performs, just like we've seen before, a different combination of mathematical operations on whatever data we feed it, and each subsequent layer receives its input from the previous layer. There are some advanced networks you'll probably come across later in your research and machine learning career that take inputs from layers much earlier in the network; they're known as residual connections, but that's beyond the scope of what we're covering for now.
We just want to build our first convolutional neural network, so let's go back to Google Chrome. I'm going to show you my favourite website for learning about convolutional neural networks: the CNN Explainer website. Part of your extra-curriculum for this video is to spend 20 minutes clicking through this entire website; we're not going to do that together because I'd like you to explore it yourself, which is the best way to learn. Up the top you'll notice some images of different sorts, which will be our input; let's start with pizza. Then we have a convolutional layer, a ReLU layer, a conv layer, a ReLU layer, a max pool layer, then conv_2, relu_2, conv_2, relu_2, max_pool_2. This architecture is a convolutional neural network, and it's running live in the browser. We pass this image in, it breaks down into red, green and blue, it goes through each of these layers where something happens, and finally we have an output. The output has ten different classes because there are ten different classes of image in this demo; of course, if we had a hundred classes we might change this to a hundred, but the pieces of the puzzle would stay much the same. The class "pizza" has the highest output value because our image is of pizza; if we change to espresso, that class gets the highest value, so this is a pretty well performing convolutional neural network. Then we have a sports car. Now, if we click on each of these layers, something happens. We have a convolutional layer with an input image that's 64 by 64 by 3 (that's colour-channels-last format), and we have a kernel. This is what happens inside a convolutional layer, and you might be thinking there's a lot going on here, and yes, of course there is if this is the first time you've seen it, but essentially a kernel, also known as a filter, is passing over our image's pixel values (which of course are in the format of a tensor) and trying to find small, intricate patterns in that data. This is why it's so valuable to play around with this site: we start in the top-left corner and slowly move along, and on the output on the right-hand side you'll see another little square, and in the middle all of those numbers changing. That's the mathematical operation happening as a convolutional layer convolves over our input image. You might be able to see some higher values on the output around the headlight; do you notice on the right how there's some activation, some red tiles? That means that potentially this hidden unit (zooming out for a second, we have ten hidden units, and each one is going to learn a different feature about the data) has picked up on that part of the image. The beauty of deep learning, but also one of its curses, is that we don't actually control what each of these learns; the magic is that it figures out for itself what's best to learn. If we click into each one, notice it has a different representation on the right-hand side, and this is what happens layer by layer as the image goes through the convolutional neural network.
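If you'd like to see that kernel-times-pixel-values operation outside of the browser demo, here's a tiny hedged sketch using torch.nn.functional.conv2d. The image and kernel values here are made up purely for illustration (they're not from the video); the point is that, at each position, the 3x3 window of the image is multiplied element-wise by the kernel and summed to produce one output value:

```python
import torch
import torch.nn.functional as F

# A tiny single-channel 4x4 "image", shaped [batch, channels, height, width]
image = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

# One 3x3 kernel (filter), shaped [out_channels, in_channels, height, width]
kernel = torch.tensor([[[[0., 1., 0.],
                         [1., -4., 1.],
                         [0., 1., 0.]]]])

# Convolve the kernel over the image (stride 1, no padding)
output = F.conv2d(image, kernel, stride=1, padding=0)
print(output.shape)  # torch.Size([1, 1, 2, 2]) -> a 3x3 window fits into a 4x4 image in 2x2 positions
print(output)
```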
If you want to read about what a convolutional neural network is, you can go through this page, but we're going to replicate this exact neural network with PyTorch code; that's how I prefer to learn. If you want the intuition and the math behind it, you can check out all the resources linked here; that's your extra-curriculum for this video. So we have an input layer and a convolutional layer, and you can see how the input gets modified by some sort of mathematical operation, which is of course the convolution operation, with all these different numbers finding different patterns in the data. You'll also notice the output size changes slightly; that'll be a trend throughout each layer. Then we can dig into the different hyperparameters, but I'll leave that for you to explore on your own. In the next video we'll start writing PyTorch code to replicate everything that's going on here, so I'll link this in the notebook: to find out what's happening inside our CNN, see this website. Join me in the next video, this is super exciting: we're going to build our first convolutional neural network for computer vision. I'll see you there.

Welcome back. In the last video we went briefly through the CNN Explainer website, my favourite resource for learning about convolutional neural networks. We could spend 20 minutes clicking through everything there to find out what's going on with a convolutional neural network, or we could start to code one up. How about we do that? If in doubt, code it out. We're going to build this model together in this video, and because it uses PyTorch layers we haven't looked at before, we'll spend the next couple of videos stepping through those layers, so bear with me as we code the entire model first and break it down in subsequent videos. Let's build our first convolutional neural network (that's a mouthful, so I'll probably just stick to saying CNN). We're up to FashionMNIST model V2, and we're going to subclass nn.Module as we always do when building a PyTorch model, with a docstring saying "model architecture that replicates the TinyVGG model from the CNN Explainer website". You might be wondering where I got that name from. Often when new types of architecture come out, the authors of the research paper that presents the model get to name it, so that in the future you can refer to different model architectures with a simple name like TinyVGG and people roughly know what's going on. I believe somewhere on the CNN Explainer page it's called TinyVGG, and yes, there it is. If we look up VGG ConvNets, VGG16 was one of the originals ("Very Deep Convolutional Networks"); there's also ResNet, another convolutional neural network, and you can Google "popular CNN architectures" to get a fair few options: LeNet was one of the first, then AlexNet, ZFNet, a whole bunch of different resources. You could also look up "what is a convolutional neural network", but let's stop that for a moment and code this one up together.

We'll initialise our class with def __init__, passing in an input shape (as we often do), a number of hidden units (an int) and an output shape (an int), so nothing too outlandish that we haven't seen before, and we call super().__init__() to run the parent initialiser. We're going to create our neural network in a couple of blocks this time. As you learn more about convolutional neural networks, you'll often hear these combinations of layers referred to as convolutional blocks: going back to our keynote, this combination of layers might be called a convolutional block, and a deeper CNN might be comprised of multiple convolutional blocks. So, to add to the confusion, a block is comprised of multiple layers, and an overall architecture is comprised of multiple blocks; the deeper your models get, the more blocks they may contain and the more layers those blocks may contain within them. It's kind of like Lego, which is very fun. Let's put together an nn.Sequential for the first few layers, conv_block_1, starting with nn.Conv2d, our first CNN layer. We have to define in_channels, which refers to the number of channels in your visual data, and we set it to input_shape, since this is the first layer in our model and the input shape is what we define when we instantiate this class. The out_channels will be hidden_units, just like with our previous models. The difference here is that nn.Conv2d has a number of different hyperparameters we can set. I'm going to set some quickly now and then we'll step back through them, not only in this video but in subsequent videos: in_channels, which is our input shape; out_channels, which is our hidden units; kernel_size, which equals 3 (this could also be a tuple, 3 by 3, but I like to keep it as 3); a stride; and a padding. Because these are values we can set ourselves, what are they called? Let's write it down: values we can set ourselves in our neural networks are called hyperparameters, so these are the hyperparameters of nn.Conv2d. And you might be thinking, what is the "2d" for? It's because we're working with two-dimensional data: our images have height and width. There's also Conv1d for one-dimensional data and Conv3d for three-dimensional data; we're sticking with 2d for now. What does each of these hyperparameters do? We'll cover that when we step through this particular layer. What we've just done is replicate this particular layer of the CNN Explainer website; we've still got a ReLU, another conv and a ReLU, a max pool, then conv, ReLU, conv, ReLU, max pool. But this is the block I was talking about: one block of this neural network (or at least that's how I've broken it down), and this is another block; you might notice they're comprised of the same layers, just stacked on top of each other, and then we'll have an output layer.

If you want to learn where the hyperparameters we just coded came from, you could of course go to the PyTorch documentation for nn.Conv2d and read about them there, including the mathematical operation we briefly touched on before. But this is also why I showed you this beautiful website, so you can read about these hyperparameters in the "Understanding Hyperparameters" section. Your extra-curriculum for this video is to go through that little interactive graphic and find out what padding means, what kernel size means and what stride means; I'm not going to read it through for you. We're going to keep coding, because that's what we're all about here: if in doubt, code it out. So we now add an nn.ReLU layer, then another nn.Conv2d layer, where in_channels is hidden_units because we take the output size of the previous layer and use it as the input size to this one, out_channels equals hidden_units again, kernel_size is 3, stride is 1 and padding is 1. Of course we can change all these values later, but bear with me while we set them as they are. We'll have another ReLU layer, and then finish the block with an nn.MaxPool2d layer (the "2d" is there for the same reason as Conv2d: we're working with 2D data), setting kernel_size equal to 2, which could also be a tuple, (2, 2).
Now, where could you find out about nn.MaxPool2d? We go to the documentation: what does it do? "Applies a 2D max pooling over an input signal composed of several input planes", so it's taking the max of an input, and we've got some parameters here: kernel_size, the size of the window to take the max over. Where have we seen a window before? Let's come back up and dive into the max pool layer in the CNN Explainer; see where my mouse is, do you see that 2 by 2? That's a window. Now look at the difference between the input and the output. We have a tile that's 2 by 2, a window of four values, and we're taking the max of that tile. If you look at the four numbers inside the max brackets we have 0.07, 0.09, 0.06, 0.05, and the max of all of those is 0.09. You'll also notice the input and output shapes are different: the output is half the size of the input. That's what max pooling does: it takes the max value of whatever its input is and outputs it on the right here. And as our image moves through the model (this is a trend in all of deep learning, actually), notice all the different shapes: even if you don't completely understand what's going on, you'll see the two spatial values start to get smaller and smaller through the model. What our model is trying to do is take the input and learn a compressed representation through each of these layers, smushing it down to find the most generalisable patterns to get to the ideal outputs, and that input eventually becomes a feature vector for our final layer. A lot going on there, but let's keep coding. What we've just completed is the first block: a conv layer, a ReLU layer, a conv layer, a ReLU layer and a max pool layer. Should we move on to the next block? We can do this one a bit faster now that we've already coded the first one. I'll use nn.Sequential again, then nn.Conv2d, and set the in_channels. What should the in_channels be here? We set it to hidden_units as well, because our network flows straight through these layers and the output size of the previous block is hidden_units, so we want the in_channels to match up with the previous layer's out_channels. Then out_channels equals hidden_units as well, kernel_size equals 3 (oops, not 32, don't want it that big), stride equals 1, padding equals 1. What comes next? Because the two blocks, conv_block_1 and conv_block_2, are identical, it's the exact same combination of layers: nn.ReLU, then nn.Conv2d with in_channels equal to hidden_units and out_channels equal to (you might already know this) hidden_units, kernel_size equals 3, stride equals 1, padding equals 1, then another ReLU layer, and then another nn.MaxPool2d with kernel_size equals 2. So what have we coded up so far? We've got block number one, and then conv_2, relu_2, conv_2, relu_2, max_pool_2.
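As a quick sanity check of that max operation, here's a tiny hedged example (using the same four numbers from the explainer's window) showing what nn.MaxPool2d(kernel_size=2) does to a single 2x2 window:

```python
import torch
from torch import nn

window = torch.tensor([[0.07, 0.09],
                       [0.06, 0.05]]).reshape(1, 1, 2, 2)  # [batch, channels, height, width]
max_pool = nn.MaxPool2d(kernel_size=2)
print(max_pool(window))  # tensor([[[[0.0900]]]]) -> the max of the four values, spatial size halved from 2x2 to 1x1
```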
So we've built these two blocks; now what do we need? An output layer. What did we do before when we made model 1? We flattened the inputs before passing them to the last linear layer, so: flatten. This is going to be the same kind of setup for our classifier layer, and I say "classifier layer" on purpose because that's what you'll generally hear the last output layer in a classification model called. These two blocks are going to be feature extractors, in other words they're trying to learn the patterns that best represent our data, and this final layer is going to take those features and classify them into our target classes, whatever our model thinks those learned features represent in terms of our classes. So let's code it out: let's build our classifier layer, in our biggest neural network yet (you should be very proud). We have nn.Sequential again, and we pass in nn.Flatten, because the output of these two blocks is going to be a multi-dimensional tensor, something similar to size 13 by 13 by 10, and we want to flatten those outputs into a single feature vector. Then we pass that feature vector to an nn.Linear layer with in_features equal to hidden_units times something times something. The reason I write it this way is because we're going to figure something out later on; for now it's times 0 just so it doesn't error. Calculating what your in_features needs to be can be quite tricky, and I'll show you a trick I use to figure it out later. Then out_features relates to our output shape, which will be the number of classes we have, one value for each class. Now that we've defined all the components of our TinyVGG architecture, there's a lot going on, but it's the same methodology we've been using the whole time: define some components, then put them together to compute in some way in a forward method. So, def forward(self, x): we set x equal to self.conv_block_1(x), so x goes through conv_block_1, through the Conv2d layer, ReLU layer, Conv2d layer, ReLU layer and max pool layer, which is the equivalent of an image going through each of those layers on the explainer and ending up here. Then we can print out x.shape to get its shape (we'll check this later), then pass x through conv_block_2, which goes through all the layers in that block, equivalent to the output of the first block going through all of these layers. And then, because we've constructed a classifier layer, we take the output of that block and pass it through our output layer, or what we've termed our classifier: x = self.classifier(x), printing x.shape along the way so we can track the shape as data moves through the architecture, and finally we return x. Look at us go, we just built our first convolutional neural network by replicating what's on the CNN Explainer website. That is actually very common practice in machine learning: find an architecture that someone has found to work on some problem and replicate it with code to see if it works on your own problem; you'll see this quite often.
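Here's a sketch of the full class we've just written, plus the instantiation we're about to do in the next cell. Treat it as a reference rather than the definitive code: the in_features of the final Linear layer is deliberately left as a placeholder, exactly as in the video (it works out to hidden_units * 7 * 7 for our 28x28 inputs, which we'll confirm later with a dummy forward pass), and `class_names` plus the usual `from torch import nn` setup are assumed from earlier in the notebook:

```python
class FashionMNISTModelV2(nn.Module):
    """Model architecture that replicates the TinyVGG model from the CNN Explainer website."""
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
        super().__init__()
        self.conv_block_1 = nn.Sequential(
            nn.Conv2d(in_channels=input_shape, out_channels=hidden_units,
                      kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units,
                      kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        self.conv_block_2 = nn.Sequential(
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            # Placeholder in_features for now (emits a "zero-element tensors" warning) --
            # the right multiplier is hidden_units * 7 * 7 for 28x28 inputs, found later with a dummy pass
            nn.Linear(in_features=hidden_units * 0, out_features=output_shape)
        )

    def forward(self, x):
        x = self.conv_block_1(x)
        print(x.shape)  # track the shape as data moves through the model
        x = self.conv_block_2(x)
        print(x.shape)
        x = self.classifier(x)
        return x

# Instantiate it: one colour channel for our grayscale images, 10 hidden units like TinyVGG
torch.manual_seed(42)
model_2 = FashionMNISTModelV2(input_shape=1,
                              hidden_units=10,
                              output_shape=len(class_names)).to(device)
```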
So now let's instantiate the model. We'll set torch.manual_seed and instantiate our first convolutional neural network: model_2 equals FashionMNISTModelV2, and we need to set the input shape. What will the input shape be? For the Conv2d layer up here, the input shape is the number of channels in our images, so let's check an image: image.shape tells us the number of colour channels, which is one. If we had colour images, we would set the input shape to three. That's the difference between our CNN TinyVGG and the CNN Explainer TinyVGG: they're using colour images, so their input is three, one for each colour channel (red, green and blue), whereas we have black and white images with only one colour channel, so we set input_shape to one. Then hidden_units equals 10, exactly the same as TinyVGG uses, which sets the hidden units value of each of our layers; that's the power of creating an initialiser with hidden_units. Finally, our output shape is going to be the length of our class names, one value for each class in our dataset, and of course we send this model to the device. Hit shift and enter... oh no, what did we get wrong? "out_channels" — where did I spell it wrong? I forgot an "l", of course, a typo. Oh, "kernel_size", another typo; are there any other typos? Probably. There we go. Okay, what have we got? "Initializing zero-element tensors is a no-op" — so we've got a warning here because of the times-zero placeholder, but there's a trick to calculating that value, and we'll cover it in another video. Pat yourself on the back, we've written a fair bit of code here: this is a convolutional neural network that replicates the TinyVGG architecture from the CNN Explainer website. Don't forget your extra-curriculum: go through that website for at least 20 minutes and read about what's happening in our models. We're focused on code here, but if you read through "Understanding Hyperparameters" and play around with it, the next couple of videos will make a lot more sense, so read about padding, read about kernel size and read about stride. I'll see you in the next video, where we'll go through our network step by step.

Welcome back. I'm super stoked because in the last video we coded together our first ever convolutional neural network in PyTorch, so well done: we replicated the TinyVGG architecture from the CNN Explainer website, my favourite place for learning about CNNs in the browser. We introduced two new layers we haven't seen before, Conv2d and MaxPool2d, but they follow the same premise as everything we've done so far: they're trying to learn the best features to represent our data in some way, shape or form. In the case of MaxPool2d, it doesn't actually have any learnable parameters, it just takes the max, but we'll step through that later. Let's use this video to step through nn.Conv2d, and we'll do that with code, so I'll make a new heading: 7.1 Stepping through nn.Conv2d. Where could we find out what's going on in nn.Conv2d? Of course, we have the PyTorch documentation. If you want to learn the mathematical operation that's happening, it's written there: essentially, the output equals the bias term plus the sum of the weight tensor cross-correlated with the input, so just the weight tensor and the bias value manipulating our input in some way gives the output. If we map the symbols, we've got batch size, channels in, height, width, channels out and so on, but we're not going to focus too much on this; if you'd like to read more into it, you can. Let's try it with code and reproduce the first layer of the CNN Explainer website with a dummy input; in fact, that's one of my favourite ways to test things. I'll link the documentation here ("see the documentation for nn.Conv2d here"); it's also a great place to learn how to calculate the output shape (H_out, W_out and so on), which is very helpful if you need to calculate input and output shapes, but I'll show you my trick for doing so later on.

Let's create some dummy data. I'll set torch.manual_seed, and we need the data to be the same size as the CNN Explainer data, 64 by 64 by 3, but PyTorch-style, which is colour-channels-first rather than colour-channels-last. How about we create a batch of images: torch.randn with size equals (32, 3, 64, 64), and then create a single image by taking the first of that, images[0]. Now let's get the image batch shape, because a lot of machine learning and deep learning, as I've said before, is making sure your data has the right shape, so we'll check images.shape, the single test_image.shape, and finally print what the test image looks like. It's not going to be an actual image, of course, just a collection of random numbers, and that's also what our model is currently comprised of: if we look inside model_2 we see a whole bunch of random numbers, weights and biases for conv_block_2 and so on, all the way up to the top. Our model is comprised of random numbers, and what we're trying to do, just like with all our other models, is pass data in and adjust the random numbers within these layers to best represent our data. So let's see what happens if we pass some random data through one of our Conv2d layers. We'll create a single Conv2d layer: conv_layer equals nn.Conv2d, and we set in_channels equal to... three (revealed the answer too quickly). Why three? Because in_channels is the same as the number of colour channels in our images: if we look at test_image.shape, it has three colour channels. That's the same value as in the explainer, except the order is reversed; the explainer is colour-channels-last, while PyTorch defaults to colour channels first, or at least it does for now; in the future this may change, so keep that in mind. Then out_channels equals 10, which is equivalent to the number of hidden units in the explainer: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. Then we have kernel_size (and no, the kernel is not KFC, I can tell you that), then stride, then padding. We'll step through these in a second, but let's check out the kernel first. The kernel size can also be written as 3 by 3, but typing a single number is a shortcut: it's equivalent to typing in a tuple, which you could find out by reading the documentation. Where did I get that value? Let's dive into this beautiful website and see what's happening. We have a kernel, also called a filter, which is this little square here (we can even see its weights up the top, that's how good this website is). This is a convolution: it starts with this little square and moves pixel by pixel across our image, and you'll notice the output produces some sort of number, and in the middle there's a mathematical operation: weight times the input, exactly what we saw in the docs. The beauty of PyTorch is that it does all of this behind the scenes for us; if you'd like to dig more into the mathematical operation there's the resource here plus plenty of others online, but we're going to focus on code for now. If we keep doing this across the entire image, we get the output over here. So that's the kernel, and where did I get 3 by 3 from? Look at this: one, two, three across and one, two, three down, nine squares. If we scroll down to "Understanding Hyperparameters" (your extra-curriculum for the last video): what happens if we change the kernel size? Have a look at the red square on the left as we switch between 3 by 3 and 2 by 2; this is our kernel, also known as a filter, passing across our image and performing some sort of mathematical operation. The whole idea of a convolutional layer is to make sure this kernel performs the right operation to get the right output over here. What do these kernels learn? That's entirely up to the model; the beauty of deep learning is that it learns how to best represent our data, hopefully on its own, by looking at more data. So that's the equivalent of setting kernel_size to 3 by 3. What about stride? The docs say "stride of the convolving kernel, default: 1". If we keep the stride at its default of one, watch the red square on the left: it hops over one pixel at a time, so the convolving happens one pixel at a time; that's what stride sets. Now watch what happens to the output shape when I change the stride value to two: it went down. The kernel size is still the same, but now we're jumping over two pixels at a time (notice how on the left two new pixels become available with each hop), and the reason the output compresses is that we're skipping some pixels as we go across the image.
This pattern happens throughout the entire network, and it's one of the reasons you see the size of each layer go down over time. What our convolutional layer is doing, and in fact what a lot of deep learning neural networks do, is try to compress the input into some representation that best suits the data. There'd be no point just memorising the exact patterns; you want to learn generalisable patterns that you can move around. So we keep going: we've got padding equals 0. Let's see what happens if we change the padding value: notice the size as we add two extra pixels around the edge, then one, then back to zero. Why might we add some padding? So that our kernel can operate on what's going on in the corners, in case there's some information on the edges of our image. Now, you might be thinking, Daniel, there's a whole bunch of values here, how do we know what to set them to? Well, notice that I've just copied exactly what's going on in the explainer: there's a 3 by 3 kernel, no padding on the image, and the stride goes one pixel at a time. That's very common in machine learning: when you're just getting started and you're not sure what values to set, you copy some existing values from somewhere, see if they work on your own problem, and adjust them if they don't. So let's pass the data through the convolutional layer: conv_output equals conv_layer with our test image passed in, and we check conv_output. Oh no, we get an error, of course: a shape error, one of the most common issues in machine learning and deep learning. It's saying the conv layer expects a four-dimensional tensor but got a three-dimensional input of size (3, 64, 64). How do we add an extra batch dimension to our test image? We can go unsqueeze(0), and now we have a four-dimensional tensor. Keep in mind that if you're running this Conv2d layer on a later PyTorch version, I believe they changed this; this Google Colab instance is on 1.10, and I think you might not get this error if you're running 1.11, so a three-dimensional input should work there.
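Here's a sketch of the cells we just wrote: creating the dummy batch, building the single conv layer and passing the (unsqueezed) test image through it. Your exact output values will differ because everything is random, and on newer PyTorch versions the unsqueeze may not be strictly necessary:

```python
torch.manual_seed(42)

# A batch of dummy "images" in PyTorch's colour-channels-first format: [batch, channels, height, width]
images = torch.randn(size=(32, 3, 64, 64))
test_image = images[0]  # a single 3x64x64 image of random numbers

print(f"Image batch shape: {images.shape}")
print(f"Single image shape: {test_image.shape}")

# A single Conv2d layer matching the first layer of the CNN Explainer's TinyVGG
conv_layer = nn.Conv2d(in_channels=3,   # same as the number of colour channels in the image
                       out_channels=10, # one output channel per hidden unit
                       kernel_size=3,   # shorthand for a (3, 3) kernel
                       stride=1,
                       padding=0)

# Pass the image through (unsqueeze(0) adds a batch dimension; required on older PyTorch versions)
conv_output = conv_layer(test_image.unsqueeze(0))
print(conv_output.shape)  # torch.Size([1, 10, 62, 62]) with kernel_size=3, stride=1, padding=0
```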
But if it doesn't, you can always unsqueeze here. And look at that, we get another tensor output. Again, this is all just random numbers, because our test image is random numbers and our conv layer is instantiated with random numbers, but we'll set the manual seed here. If your numbers are different to what's on my screen, don't worry too much; the conv layer is instantiated with random numbers and our test image is random numbers as well. What we're paying attention to is the input and output shapes. Do you see what just happened? We put our input image in with three channels, and because we set out_channels to 10, we've got 10, and we've got 62 by 62, and the leading 1 is just the batch size, meaning one image. Essentially, our test image of random numbers has gone through the convolutional layer we created, through that mathematical operation with the values we set; PyTorch created the weight tensor and did the whole operation for us (thank you, PyTorch). You could code all of this by hand if you wanted, but it's a lot easier and simpler to use a PyTorch layer, and it's created this output. Whatever this output means, I don't know, it's random numbers, but the same process will happen when we use actual data. So let's see what happens if we change the values. If we increase the kernel size, notice how our output gets smaller, because we're using a bigger kernel to convolve across the image. If we put that back to 3 by 3 and set a stride of 2, what do you think happens? Our output size basically halves, because we're skipping two pixels at a time. Put that back to 1. And what do you think happens if we set padding to one? We get 64 by 64, basically the same size as the input, because we've added an extra pixel around the edges; padding of one just adds an extra dummy zero pixel around the edges. So play around with this, and in fact I encourage you to: practise and see what happens as you pass our test image of random numbers through a Conv2d layer with different values here. What do you think will happen if you change the out_channels to 64?
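If you want to sanity-check those shape changes without guessing, the Conv2d output size follows a simple formula (it's spelled out in the nn.Conv2d docs). Here's a small helper, written just for illustration, that reproduces the numbers we've been seeing:

```python
def conv2d_output_size(in_size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a Conv2d layer along one dimension (ignoring dilation)."""
    return (in_size + 2 * padding - kernel_size) // stride + 1

print(conv2d_output_size(64, kernel_size=3, stride=1, padding=0))  # 62 -> what we saw above
print(conv2d_output_size(64, kernel_size=3, stride=1, padding=1))  # 64 -> padding keeps the size
print(conv2d_output_size(64, kernel_size=3, stride=2, padding=0))  # 31 -> stride 2 roughly halves it
```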
Give that a shot, and I'll see you in the next video. Who's ready to step through the nn.MaxPool2d layer? Put your hand up; I've got my hand up, so let's do it together. We've got section 7.2, stepping through nn.MaxPool2d, and you might have already given this a shot yourself. This is what I do for a lot of concepts I haven't gone through before: I just write some test code and see what the inputs and outputs are. Where could we find out about MaxPool2d? Of course, the documentation, which I'll link here. In the simplest case, the output of a layer with input size (N, C, H, W) and output (N, C, H_out, W_out) (that's number of batches, colour channels, height, width), with kernel size (kH, kW), can be precisely described as the max over a window whose position depends on the kernel size and the stride. Let's look at that in practice, and of course you can read further through the documentation; I'll grab the link for it. Let's first try it with the test image we created above; remember, the test image is just a bunch of random numbers in the same shape as a single image from the CNN Explainer. We'll look at a visual of max pooling in a second, but you can also go through it in your own time; if in doubt, code it out. We'll print out the original image shape without the unsqueezed dimension (recall that we had to add an extra dimension to pass it through our Conv2d layer; if you're using a later version of PyTorch, you might not get an error passing a three-dimensional image tensor through a conv layer): "test image original shape" is test_image.shape, which just tells us what the cell above already told us, but I like to make pretty printouts. Then the test image with the unsqueezed dimension, which is our test image unsqueezed on the zeroth dimension (I was about to say first, but it's the zeroth). Now we create a sample nn.MaxPool2d layer, because remember, even individual layers in torch.nn are models in their own right, so this is like creating a single-layer model. We'll set kernel_size equal to 2, and recall from the CNN Explainer that kernel size 2 gives a 2 by 2 window that passes over our image, with an example input and an example output showing the operation max pooling performs. Keep that in mind as we pass some sample data through our max pool layer. Actually, we'll pass the data through just the conv layer first, because that's how you might stack things: a convolutional layer, and then a max pool layer on top of that convolutional layer. So test_image_through_conv equals our conv_layer taking test_image.unsqueeze(0) as its input again, and we print out the shape. This highlights how I like to troubleshoot things: do one step, print the shape, do another step, print the shape, and see what's happening as the data moves through the various layers.
So we print test_image_through_conv.shape to see what our conv layer does to the shape of our data, and then we pass that through the max pool layer, the layer we created a couple of lines above. So test_image_through_conv_and_max_pool (quite a long variable name, but it helps us avoid confusion about what's going on) equals the output of our convolutional layer passed through our max pool layer (after fixing another typo), and finally we print the shape after going through the conv layer and the max pool layer. Let's see how our max pool layer manipulates our test image's shape. Ready? Three, two, one, let's go. Okay, so we have the test image's original shape; recall that our test image is just a collection of random numbers, our conv layer is instantiated with random numbers, and max pool actually has no parameters, it just takes the maximum over a certain region of a tensor. When we unsqueeze the test image we get an extra dimension, and when we pass it through our conv layer... where did this 64 come from, (1, 64, 64, 64)? Let's go back up to our conv layer: we get the 64 because we changed the out_channels value. If we change it back to 10, like in the CNN Explainer model, what do you think will happen? We get 10 there instead. Then we keep going (I'll get rid of the extra cell checking the PyTorch version, we don't need it anymore): the test image shape is still (3, 64, 64), but as we pass it through the conv layer we get a different size; it originally had three colour channels as the input, but we've upscaled it to 10 so that we have 10 hidden units in our layer, and then we have 64 by 64. Again, these shapes will change if we change the hyperparameter values; if we put padding back to zero, instead of 64 by 64 we get 62 by 62. And what happens after we pass it through the conv layer and then through the max pool layer? We had (1, 10, 64, 64), and now we have (1, 10, 32, 32). Why is that? Let's go back into the CNN Explainer and jump into this max pool layer (maybe this one, because it's got a bit more going on): on the left is the input, we've got a 2 by 2 kernel, and what the max pooling layer does is take the maximum of whatever the input is, so you'll notice the input is 60 by 60 in this case whereas the output over here is 30 by 30.
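Here's a sketch of that conv-then-max-pool shape walkthrough in one cell. It reuses the conv_layer and test_image from above; your printed numbers will differ since everything is random, but the shapes should match (the example comments assume padding=0 on the conv layer):

```python
# Create a sample max pool layer (a single-layer "model" of its own)
max_pool_layer = nn.MaxPool2d(kernel_size=2)

print(f"Test image original shape: {test_image.shape}")                          # torch.Size([3, 64, 64])
print(f"Test image with unsqueezed dimension: {test_image.unsqueeze(0).shape}")  # torch.Size([1, 3, 64, 64])

# Pass the image through just the conv layer first
test_image_through_conv = conv_layer(test_image.unsqueeze(0))
print(f"Shape after going through conv_layer(): {test_image_through_conv.shape}")  # e.g. torch.Size([1, 10, 62, 62])

# Then pass the conv output through the max pool layer
test_image_through_conv_and_max_pool = max_pool_layer(test_image_through_conv)
print(f"Shape after going through conv_layer() and max_pool_layer(): {test_image_through_conv_and_max_pool.shape}")  # spatial size halved, e.g. torch.Size([1, 10, 31, 31])
```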
Why does the output halve? Because the max operation is reducing each section of four numbers (let's find one with a few different numbers, there we go, that'll do) down to the single maximum value within those four numbers. Why would it do that? As we've discussed before, what a deep learning neural network, in this case a CNN, is trying to do is take some input data, figure out what features best represent it, and compress them into a feature vector that becomes our output. One way to think about it from a neural network's perspective is that intelligence is compression: you're trying to compress the patterns that make up the actual data into a smaller vector space, going from a higher-dimensional space to a lower-dimensional one in terms of the tensor, while that smaller space still represents the original data and can be used to predict on future data. That's the idea behind max pooling: if we've got these learned features from our convolutional layers, will the most important patterns stick around if we just take the maximum of a certain section? Notice how in the output you can still see the outline of the car, albeit a little more pixelated: just by taking the max of a region we've kept potentially the most important feature of that little section. And of course you can customise this value when we create our max pool layer: what do you think happens if we increase the kernel size from two by two to four by four? We've gone from 62 to 15, so we've essentially divided our feature space by four and compressed it even further. Will that work better? I'm not sure; that's part of the experimental nature of machine learning, but we'll keep it at two for now. So that's with our 64 by 64 tensor; now let's do the same as above but with a smaller tensor, so we can really visualise things and replicate the same operation. We'll set the manual seed first and create a random tensor with a similar number of dimensions to our images; recall that the number of dimensions doesn't tell you the size of each dimension, so (1, 3, 64, 64) is one example of a four-dimensional tensor.
A tensor's dimensions can each have different sizes; what we want is a four-dimensional tensor like our images. It's way easier to explain with code: torch.randn with size equals (1, 1, 2, 2). If we look at this random tensor, it's got four dimensions (one, two, three, four), so you can read it as a batch size, colour channels, and a height and width: a very small random image, quite similar to the four numbers we were just looking at. Now what do you think happens if we create a max pool layer just like we did above? We'll go max_pool_layer, repeating the code from the cell above (a little bit of practice), with kernel_size equal to 2, and then pass the random tensor through it: max_pool_tensor equals max_pool_layer with the random tensor passed in. Then we print out some shapes and some tensors, as we always do, to visualise, visualise, visualise: the max pool tensor and its shape, and the random tensor itself and its shape. A lot of coding here, but that's the fun part about machine learning, right? You get to write lots of code. Okay, so we're visualising what's going on within the max pool layer, which we've now seen from a few different angles. We have a random tensor of four numbers, 0.3367, 0.1288, 0.2345, 0.2303, and once we pass it through the max pool layer, what happens? It takes the max, which is 0.3367, and see how we've reduced the shape from 2 by 2 to 1 by 1. One last time, to reiterate what's going on: the convolutional layer is trying to learn the most important features within an image. What are they? Well, we don't decide what a convolutional layer learns; it learns those features on its own. We pass them through a ReLU non-linear activation in case our data requires non-linear functions, and then we pass those learned features through a max pool layer to compress them even further; the convolutional layer can compress the features into a smaller space, but the max pooling layer really compresses them. That's the entire idea one more time: we start with some input data, and we design a neural network, in this case a convolutional neural network, to learn a compressed representation of the input data, so that we can later use that compressed representation to make predictions on images of our own. In fact, you can try that out if you want: click "add your own image" on the CNN Explainer; that's your extension for this video. But now that we've stepped through the MaxPool2d layer and the Conv2d layer, I think it's time we started to try and use our TinyVGG network.
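Here's a compact sketch of that smaller-tensor demo. Whether you see the exact values mentioned above depends on your seed and random function (torch.randn is assumed here); the point is the 2x2 window collapsing to its single maximum:

```python
import torch
from torch import nn

torch.manual_seed(42)

# A tiny four-dimensional tensor: [batch, channels, height, width]
random_tensor = torch.randn(size=(1, 1, 2, 2))

# The same kind of single-layer max pool "model" as before
max_pool_layer = nn.MaxPool2d(kernel_size=2)
max_pool_tensor = max_pool_layer(random_tensor)

print(f"Random tensor:\n{random_tensor}")
print(f"Random tensor shape: {random_tensor.shape}")      # torch.Size([1, 1, 2, 2])
print(f"\nMax pool tensor:\n{max_pool_tensor}")           # the single maximum of the four values
print(f"Max pool tensor shape: {max_pool_tensor.shape}")  # torch.Size([1, 1, 1, 1])
```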
Your challenge is to create a dummy tensor and pass it through this model, through its forward method, and see what happens to the shape of your dummy tensor as it moves through conv_block_1 and conv_block_2. Then I'll show you my trick for calculating the in_features of this final layer, which is equivalent to the final layer in the explainer. I'll see you in the next video.

Over the last few videos we've been replicating the TinyVGG architecture from the CNN Explainer website, and I hope you appreciate how exciting that is: years ago this would have taken months of work, and we've broken it down over the last few videos and rebuilt it ourselves with a few lines of PyTorch code. That goes to show how powerful PyTorch is and how far the deep learning field has come. But we're not finished yet. Let's go over to our keynote: this is what we've done with the CNN Explainer model. We have an input layer (we've created that), Conv2d layers (created those), ReLU activation layers (created those), pooling layers, and we finish off with an output layer. Now let's see what happens when we actually pass some data through this entire model; as I've said before, it's quite common practice to replicate a model you found somewhere and then test it out with your own data, so we'll start with some dummy data to make sure our model works. By the way, I've got another slide with a breakdown of torch.nn.Conv2d if you'd like to see it in text form; nothing we haven't already discussed, but it's in the slides if you want it. There's also a video animation we've seen before, and besides, I'd rather you go through the CNN Explainer website on your own and explore these different values than have me keep talking about them. Here's what we're working towards: we have the FashionMNIST dataset and our inputs; we numerically encode them (we've done that already); then we have our convolutional neural network, a combination of convolutional layers, non-linear activation layers and pooling layers (which, again, could be combined in many different ways, shapes and forms; in our case we've replicated the TinyVGG architecture); and finally we want an output layer to predict what class of clothing a particular input image is. So let's go back: we have our CNN model here, model 2.
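If you want to attempt that dummy forward pass yourself before watching the walkthrough, here's a hedged sketch of the idea. With the placeholder in_features still set to zero, the final Linear layer will complain with a matrix-multiplication error, which is exactly the error that leads us to the in_features trick:

```python
# A random tensor with the same shape as a single FashionMNIST image: [colour_channels, height, width]
rand_image_tensor = torch.rand(size=(1, 28, 28))
print(rand_image_tensor.shape)  # torch.Size([1, 28, 28])

# Add a batch dimension and send it to the same device as the model before the forward pass
model_2(rand_image_tensor.unsqueeze(0).to(device))
```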
So let's practice a dummy forward pass. We'll come back up to where we were and make sure we've got model_2 — we get an error here because I'd multiplied this by zero, so I'll just remove that and keep it there. Let's see what happens if we create a dummy tensor and pass it through. If you recall what our images look like, this is a Fashion MNIST image, so we can go plt.imshow() on the image, squeeze it and set cmap="gray" — there's our current image, wonderful. How about we first just try to pass this image through the model and see what happens? model(image) — our first forward pass. Well, we get an error: another shape mismatch. We've seen this before, so how do we deal with it? What's the shape of our current image? It's [1, 28, 28]. Now, if you don't have this image instantiated you might have to go back up a few cells to where we created it — we created it a fairly long time ago; my goodness, we've written a lot of code, well done to us. Rather than hunt it down, we can just create a dummy tensor of the same size, but if you do have image instantiated you can try that too. So let's create a random tensor that's the same shape as our image: rand_image_tensor = torch.rand(size=(1, 28, 28)), and if we check its shape we get the same shape as our test image — it's just random numbers, but that's okay, we only want to highlight a point about input and output shapes: can our random image tensor go all the way through our model? That's what we want to find out. We get an error: the model expects four dimensions but our image has three. How do we add an extra dimension for the batch size? (You might not get this error if you're running a later version of PyTorch, just keep that in mind.) We unsqueeze on dimension zero. Oh — "expected all tensors to be on the same device, but found at least two devices" — we're going through all three major issues in deep learning here: shape mismatch, device mismatch, data type mismatch. So let's put this on the target device, because we've set up device-agnostic code. Next: "mat1 and mat2 shapes cannot be multiplied" — but we do get some output here, and that is very exciting. What I might do is move this a couple of cells up so we can tell what's going on, and delete this cell. So where do these shapes come from? Well, we've printed them out. We pass our random image tensor through our model, we make sure it's got four dimensions with unsqueeze(0), and we make sure it's on the same device as our model, because our model has been sent to the GPU. Our input is [1, 28, 28] this time, instead of the [64, 64, 3] we've seen previously, and — I'm just going to clean this up a bit — we get different shapes printed out at each stage.
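Put together, that debugging sequence looks something like this (a sketch that assumes model_2 — the TinyVGG replica built over the previous videos — already exists and has been sent to the GPU):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# create a random tensor the same shape as a single Fashion MNIST image (C, H, W)
rand_image_tensor = torch.rand(size=(1, 28, 28))
print(rand_image_tensor.shape)  # torch.Size([1, 28, 28])

# add a batch dimension with unsqueeze(dim=0) -> (1, 1, 28, 28)
# and send the tensor to the same device as the model
output = model_2(rand_image_tensor.unsqueeze(dim=0).to(device))

# once the classifier's in_features are correct (fixed shortly), this prints
# raw logits with shape torch.Size([1, 10]) - one value per class
print(output, output.shape)
```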
You'll notice that as our input — whether it's [64, 64, 3] like on the CNN Explainer website or [1, 28, 28] here — goes through these layers, it gets reshaped into different values. This is going to be universal across all the different datasets you work on: you'll always be working with different shapes, so it's important (and also quite fun) to troubleshoot what shapes you need for your different layers. And this is where my trick comes in for finding the shapes of different layers. I construct my models as best I can with the information I've got — such as replicating what's on the CNN Explainer website — but I don't really know what the output shape is going to be before it goes into this final layer. So I recreate the model as best I can, pass data through it in the form of a dummy tensor the same shape as my actual data (we could customise this to be any shape we wanted), and print the shapes at each step of the forward pass. So when we pass this random tensor through the first conv block, it goes through those layers and outputs a tensor of size [1, 10, 14, 14] — 10 because that's how many output channels we've set, and 14 by 14 because our 28 by 28 tensor has gone through a convolutional layer and a MaxPool2d layer. Then it goes through conv_block_2, because that's what comes next in the forward method, and outputs a shape of [1, 10, 7, 7] — so the output of conv_block_1 has been compressed from 14 by 14 down to 7 by 7. Let me label these printouts — "output shape of conv_block_1" and "output shape of conv_block_2" — so we get a bit more information, and finally I want an "output shape of classifier". If I rerun all of this, I don't get an output shape for the classifier, so my model is running into trouble once it gets there: I get the output of conv_block_2, but no output from the classifier. That tells me I have an issue with my classifier layer. Now, I know this because I've coded this model before, and the in_features here need a special calculation. So what's going on with our shapes? "mat1 and mat2 shapes cannot be multiplied" — what's the rule of matrix multiplication? The inner dimensions have to match. We've got 490 — where could that number have come from? — and we've got 10 times 10. Okay, I know I've set hidden_units to 10, so maybe that's where that 10 came from.
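In code, the trick is just a few temporary print statements inside the model's forward() method (you'd comment these out again before training):

```python
def forward(self, x):
    x = self.conv_block_1(x)
    print(f"Output shape of conv_block_1: {x.shape}")  # e.g. torch.Size([1, 10, 14, 14])
    x = self.conv_block_2(x)
    print(f"Output shape of conv_block_2: {x.shape}")  # e.g. torch.Size([1, 10, 7, 7])
    x = self.classifier(x)
    print(f"Output shape of classifier: {x.shape}")    # e.g. torch.Size([1, 10])
    return x
```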
And what's the output shape of conv_block_2? If we look, the output of conv_block_2 goes into our classifier, where it gets flattened — so that's telling us something — and our nn.Linear layer is expecting the output of the flatten layer as its in_features. This is where my trick comes into play: I pass the output of conv_block_2 into the classifier, it gets flattened, and that flattened size is what my nn.Linear layer expects. So what happens if we flatten that shape — do we get this value? Let's have a look: 10 times 7 times 7 is 490. Where was this 10? That's our hidden units. And where were these sevens? They're the output shape of conv_block_2. So that's my trick: I print the shapes of previous layers and see whether they line up with subsequent layers. The in_features become hidden_units * 7 * 7, where the two sevens come from the output shape of conv_block_2. Do you see how this can be a little hard to calculate ahead of time? You could calculate it by hand if you went into the nn.Conv2d documentation — the H_out and W_out formulas combine all the different parameters, and by multiplying and dividing them you can work out the input and output shapes of your convolutional layers. You're more than welcome to try that by hand, but I prefer to write code to calculate things for me: if in doubt, code it out. Now, let's see what happens if we run our random image tensor through the model — do you think it will work? All we've done is add this little * 7 * 7, and we've calculated that by asking: what happens if we pass a tensor of this dimension through a flatten layer? And what's our rule of matrix multiplication? The inner dimensions must match. How do we know these are matrices? Because "mat1 and mat2 shapes cannot be multiplied", and we know there's a matrix multiplication inside a linear layer. So let's give this a go. Would you look at that — that is so exciting. The output shape of the classifier is [1, 10]: one number for each of the ten classes in our dataset, just like the CNN Explainer website has ten outputs — we just happen to have ten classes as well. This number could be whatever you need — 100, 30, 3 — depending on how many classes you have. But we've just figured out the input and output shapes of each layer in our model, which is very exciting. Now that we've passed a random tensor through, how about we pass some actual data through our model in the next video? Let's use our train and test step functions to train our first convolutional neural network. I'll see you there. Let's get ready to train our first CNN. So where are we up to in the workflow? We've built a model and we've stepped through it, we know what's going on — but let's really see what's going on by training this CNN (or see if it trains, because we don't always know that it will) on our own dataset, which is Fashion MNIST. So we're going to set up a loss function and an optimizer for model_2.
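In other words, the fix is the small calculation below (a sketch assuming hidden_units is 10 and the [1, 10, 7, 7] output we printed above):

```python
from torch import nn

hidden_units = 10

# flattened output of conv_block_2: channels * height * width
print(hidden_units * 7 * 7)  # 490 -> matches the "mat1" side of the error message

classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(in_features=hidden_units * 7 * 7,  # flattened conv_block_2 output
              out_features=10)                   # one output per clothing class
)
```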
Just as we've done before, I'll add a heading, turn that cell into markdown, and show you the workflow again. This is what we're doing: we've got some inputs, we've got a numerical encoding, we've built this architecture, and hopefully it gives us a predictive model that we can feed grayscale images of clothing into and get predictions out of. And if we look at where we are in the PyTorch workflow: we've got our data ready, we've built our next model, and now we're up to picking a loss function and an optimizer. So let's do that — setup loss function / eval metrics / optimizer. We want from helper_functions import accuracy_fn — we don't need to re-import it, but we'll do it anyway for completeness. The loss function is nn.CrossEntropyLoss(), because we're working with a multi-class classification problem, and the optimizer we'll keep the same as before: torch.optim.SGD, where the params we're trying to optimize are model_2.parameters(), with a learning rate of 0.1. Run that. Just to reiterate what we're optimizing, model_2.state_dict() shows a lot of random weights — there's the bias, there's the weight — and we're going to try to optimize these to help us predict on our Fashion MNIST dataset. So without any further ado, in the next video let's build our training loop — although thanks to the work we did earlier, we've now got functions to do this for us. If you want to give it a go, use our train_step and test_step functions to train model_2; try that out and we'll do it together in the next video. We're getting so close to training our model — let's write some code to train our first CNN. Training and testing: I'm going to make another heading here, "model 2: training and testing using our training and test functions", so we don't have to rewrite all the steps of a training loop and a testing loop, because we've already created that functionality with our train_step function ("performs a training step with model trying to learn on data_loader"). Let's set this up. I'm going to set torch.manual_seed(42), and we can set a CUDA manual seed as well, to make our experiments as reproducible as possible, because we're going to be using CUDA. We're also going to measure the time, because we want to compare our models not only on their evaluation metrics but on how long they take to train — there's no point having a model that performs really well but takes ten times longer to train (well, maybe there is, depending on what you're working on). So we start the timer: train_time_start_model_2 = timer(), and then we train and test the model. The thing to be aware of with time is that usually a better-performing model takes longer to train — not always the case, but something to keep in mind. So: for epoch in a range of epochs, using tqdm to measure progress — we'll train for just three epochs, keeping our experiments short for now to see how things work — and we'll print a new line with the epoch. Then we do the training step with our train_step function: the model is model_2, our convolutional neural network (our TinyVGG), and the data_loader is the train_dataloader, the same one we've used before.
The loss function is the loss_fn we set up above, the optimizer is our stochastic gradient descent optimizer, the accuracy_fn is our accuracy function, and the device is the target device. How easy was that? Now we do the same for the testing step: the model is model_2, the data_loader is the test_dataloader, the loss function is the same loss_fn, there's no optimizer for this one, and we pass in the accuracy function and the device. Then what? Well, we measure the end time so we know how long this code took to run: train_time_end_model_2 — this will be on the GPU again, by the way, but this time using a convolutional neural network — and total_train_time_model_2 = print_train_time(), the function we created before to measure start and end times, passing in train_time_start_model_2, train_time_end_model_2 and the device it's running on. So, are you ready to train our first convolutional neural network? Hopefully this code works — we've created these functions before, so it should be all right, but if in doubt, code it out; if in doubt, run the code. Let's see what happens. Oh, of course — we forgot to comment out the shape printouts in the forward method, so every time our data goes through the forward pass it prints the output shapes, and this cell is going to take a long time to run with so many printouts ("streaming output truncated to the last 5000 lines"). So we'll stop that — okay, beautiful, that worked; sometimes it doesn't stop so quickly. We'll rerun our FashionMNISTModelV2 cell so the print lines are commented out, rerun the cells down here, and — fingers crossed there are no errors — train our model again. Beautiful, not as many printouts this time. So here we go, our first CNN is training. How do you think it'll go? That's why we have printouts, right, so we can see the progress. You can see all the functions that have been called behind the scenes by PyTorch — thank you, PyTorch — and there's our train_step function in there. Wonderful. So there's epoch zero — oh, we get a pretty good test accuracy, and it keeps climbing. Have we beaten our baseline? We're looking at about 14 seconds per epoch, and on the final epoch we finish at about 88.5 percent accuracy in 41.979 — call it 42 — seconds. Again, your mileage may vary: don't worry too much if these numbers aren't exactly the same on your screen, and the same goes for the training time, because we might be using slightly different hardware. What GPU do I have today? A Tesla P100; you might not have the same one. If your training time is something like ten times higher, you might want to look into what's going on, and if your values are around ten percent lower or higher, you might want to check your code as well.
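Putting that whole training cell together, it might look like the sketch below — it assumes the train_step, test_step, print_train_time and accuracy_fn helpers, plus the train_dataloader and test_dataloader created earlier in the course:

```python
import torch
from torch import nn
from timeit import default_timer as timer
from tqdm.auto import tqdm

torch.manual_seed(42)
torch.cuda.manual_seed(42)

# loss function and optimizer (from the cell above)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model_2.parameters(), lr=0.1)

train_time_start_model_2 = timer()

epochs = 3
for epoch in tqdm(range(epochs)):
    print(f"Epoch: {epoch}\n-------")
    train_step(model=model_2,
               data_loader=train_dataloader,
               loss_fn=loss_fn,
               optimizer=optimizer,
               accuracy_fn=accuracy_fn,
               device=device)
    test_step(model=model_2,
              data_loader=test_dataloader,
              loss_fn=loss_fn,
              accuracy_fn=accuracy_fn,
              device=device)

train_time_end_model_2 = timer()
total_train_time_model_2 = print_train_time(start=train_time_start_model_2,
                                            end=train_time_end_model_2,
                                            device=device)
```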
But let's now calculate our model_2 results — I think it's the best-performing model we have so far. Let's get a results dictionary: model_2_results = eval_model(), a function we created before that returns a dictionary containing the results of a model predicting on a data loader. We pass in the model, which is our trained model_2, the data_loader, which is our test_dataloader, our loss function, our accuracy function, and the device (it's already set, but we can set it anyway). And we check out model_2_results — it makes some predictions and, look at that, a model accuracy of about 88 percent. Does that beat our baseline, model_0_results? It does — we beat our baseline with a convolutional neural network. That's quite exciting, but now let's compare the results of all of our models. I'll see you in the next video. Welcome back. In the last video we trained our first convolutional neural network, and from the looks of things it improved upon our baseline — but let's make sure by comparing, because comparing the results (and training time) across your experiments is another important part of machine learning. We've done it in a way where we've got three dictionaries: model_0_results, model_1_results and model_2_results. So how about we create a DataFrame comparing them? Let's import pandas as pd, and because our model results dictionaries all have the same keys, we can pass them in as a list: compare_results = pd.DataFrame([model_0_results, model_1_results, model_2_results]). Wonderful. So, recall: our first model, the baseline V0, was just two linear layers, and it got an accuracy of 83.4 percent and a loss of 0.47. The next model we trained on the GPU and introduced non-linearities, and we actually found it was worse than our baseline. Then we brought in the big guns — the TinyVGG architecture from the CNN Explainer website — trained our first convolutional neural network, and got the best results so far. There are a lot more experiments we could do: we could go back through our TinyVGG and increase the number of hidden units (say, to 30) and see what happens; since we found non-linearities didn't help our second model, we could comment out the ReLU layers; we could change the kernel size, the padding, the max pool — a whole bunch of different things; and we could train for longer, say 10 epochs, and it might perform better. These are all things to keep in mind and try out — I'd encourage you to give them a go yourself — but for now we've kept all our experiments quite similar.
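As a sketch, the comparison so far looks like this (assuming the eval_model helper from earlier in the course and the model_0_results / model_1_results dictionaries created for the previous experiments):

```python
import pandas as pd

# eval_model returns a dictionary of results for a model predicting on a dataloader
model_2_results = eval_model(model=model_2,
                             data_loader=test_dataloader,
                             loss_fn=loss_fn,
                             accuracy_fn=accuracy_fn,
                             device=device)
print(model_2_results)

# the three results dictionaries share the same keys, so they stack into a DataFrame
compare_results = pd.DataFrame([model_0_results,
                                model_1_results,
                                model_2_results])
compare_results
```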
How about we also add the training time to the results comparison, since that's another important thing we've been tracking? The reason we do this is that if a simpler model performs quite well compared to our CNN — say a difference of about five percent accuracy — maybe that's tolerable in the space you're working in, especially if that simpler model trains and performs inference ten times faster. That's the performance-speed trade-off, and it's something to be aware of. So let's add another column to compare_results — oh, excuse me, I got a little error there because I got trigger-happy on shift-and-enter — training_time, which is another list: total_train_time_model_0, total_train_time_model_1 and total_train_time_model_2. Then we look at our compare_results DataFrame again. Wonderful. Now, I keep stressing this: if your numbers aren't exactly what I've got here, don't worry too much; go back through the code and see if you've set up the random seeds correctly — you might need a CUDA random seed, and we may have missed one of those. If your numbers are outlandishly different, then you should go back through your code and see if something's wrong. And again, the training time will be highly dependent on the compute environment you're using: if you're running this notebook locally, or on a different GPU to mine (check with nvidia-smi), you might get different training times. I'm using a Tesla P100, which is quite a fast GPU — that's because I'm paying for Colab Pro, which generally gives you faster GPUs — and model 0 was trained on the CPU, so depending on what compute resource Google Colab allocates to you, these numbers may vary. But if your numbers are dramatically different, you might want to change something in your code and see what's going on. How about we finish this off with a graph? Let's visualize our model results — and while we're doing this, look at the DataFrame above and ask: is 10 seconds longer training time worth that extra 5 percent of accuracy? In our case we're using a relatively toy problem (by which I mean quite a simple dataset for trying things out), but in your own practice it may well be worth it if a model takes longer to train but gets quite a bit better performance — it really depends on the problem you're working on. So we take compare_results, set the index to the model name (that's what we want on our graph), and plot the model accuracies with kind equal to barh, a horizontal bar chart; the x label is accuracy as a percentage and the y label is the model name. This is something you could share if someone asked how your modelling experiments on Fashion MNIST went — and if they asked what FashionMNISTModelV2 is, you could say it's a convolutional neural network that replicates the CNN Explainer architecture and trained on a GPU, and you've got the training time right there too. We could have done it as a vertical bar chart, but I did it horizontally so the model names sit over on the side. Wonderful.
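Something like this — note the "model_name" and "model_acc" column names are whatever keys your eval_model helper returns, so adjust them if yours differ:

```python
import matplotlib.pyplot as plt

# add a training time column (one entry per experiment, in the same order)
compare_results["training_time"] = [total_train_time_model_0,
                                    total_train_time_model_1,
                                    total_train_time_model_2]

# horizontal bar chart of accuracy per model
compare_results.set_index("model_name")["model_acc"].plot(kind="barh")
plt.xlabel("accuracy (%)")
plt.ylabel("model")
plt.show()
```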
So now I feel like we've got a trained model — how about we make some visual predictions? Right now we've just got numbers on a page, but our model is trained on computer vision data, and the whole point of making a machine learning model on computer vision data is to be able to visualize predictions. So let's give that a shot. In the next video we're going to use our best-performing model, FashionMNISTModelV2, to make predictions on random samples from the test dataset. You might want to give that a shot yourself: make some predictions on random samples from the test dataset and plot them with their predictions as the title. Try that out, otherwise we'll do it together in the next video. In the last video we compared our models' results. We tried three experiments: one was a basic linear model, one was a linear model with non-linear activations, and FashionMNISTModelV2 is a convolutional neural network. From an accuracy perspective our convolutional neural network performed the best, however it had the longest training time. And I just want to emphasise again that the training time will vary depending on the hardware you run on — we spoke about this in the last video — but I took a break after finishing that video, came back to the notebook, reran all of the code cells with Run all, and if you compare the training times here to the last video we get some different values. I'm not sure exactly what hardware Google Colab is using behind the scenes, but this is just something to keep in mind; at least we now know how to track our different variables, such as how long our model takes to train and what its performance values are. But it's time to get visual, so let's create another heading — this is one of my favourite steps after training a machine learning model: make and evaluate random predictions with the best model. We're going to follow the data explorer's motto of visualize, visualize, visualize. Let's make a function called make_predictions. It takes a model, which will be a torch.nn.Module; some data, which can be a list; and a device, of type torch.device, which we'll default to the device we've already set up. We'll create an empty list for prediction probabilities, because what we'd like to do is take random samples from the test dataset, make predictions on them with our model, and then plot those predictions — we want to visualize them. We'll also put our model into evaluation mode, because if you're making predictions with your model you should turn on eval mode, and we'll switch on the torch.inference_mode() context manager, because predictions is another word for inference. Then we loop through each sample in the data and prepare it: this function takes in single images, so we unsqueeze on dim 0 to add a batch size dimension and send the sample to the device — that way our data and model are on the same device and we can do a forward pass (we could also call model.to(device) up here, so we know we've got device-agnostic code). Now the forward pass: the model outputs raw logits — recall that if we have a linear layer at the end of our model, it outputs raw logits — so pred_logit for a single sample is the model applied to that sample.
Then we get the prediction probability. How do we go from a logit to a prediction probability? Since we're working with a multi-class classification problem, we use the softmax activation function on our pred_logit, squeezing it to get rid of the extra dimension and using dim=0. That gives us the prediction probability for a given sample. We could also turn the prediction probabilities into prediction labels here, but actually I think we'll just return the pred_probs and see what that looks like — that's what the empty pred_probs list up top is for. For matplotlib we're going to need our data on the CPU, because matplotlib doesn't work with the GPU, so we hard-code that here: pred_probs.append(pred_prob.cpu()), getting the pred_prob off the GPU for further calculations. And then, if we've done it right, we'll have a list of prediction probabilities relating to particular samples, so we stack them — torch.stack just says "hey, concatenate everything in this list into a single tensor". This is only one way of doing things, by the way; there are many different ways you could make predictions and visualize them, I'm just showing one. We might need to fix the indentation — tab, tab — beautiful. So let's try this function in action and see what happens.
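Assembled, the function looks roughly like this (a sketch, assuming device has already been set up as our target device):

```python
import torch

def make_predictions(model: torch.nn.Module,
                     data: list,
                     device: torch.device = device):
    """Return a stacked tensor of prediction probabilities, one row per sample."""
    pred_probs = []
    model.eval()
    with torch.inference_mode():
        for sample in data:
            # add a batch dimension and send the sample to the target device
            sample = torch.unsqueeze(sample, dim=0).to(device)

            # forward pass (the model outputs raw logits)
            pred_logit = model(sample)

            # logit -> prediction probability (multi-class, so softmax)
            pred_prob = torch.softmax(pred_logit.squeeze(), dim=0)

            # matplotlib works on the CPU, so take the result off the GPU
            pred_probs.append(pred_prob.cpu())

    # turn the list of per-sample tensors into a single tensor
    return torch.stack(pred_probs)
```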
I'm going to import random and set the random seed to 42, then create test_samples as an empty list and test_labels as an empty list — because when we evaluate predictions we want to compare them to the ground truth, so we want some test samples and their actual labels. Then: for sample, label in random.sample(list(test_data), k=9) — note this is the test dataset, not the test dataloader. If you have a look at test_data (our dataset, not yet converted into a dataloader) and try to slice the first ten samples, you'll hit "only one element tensors can be converted to Python scalars"; if we go test_data[0] we get an image tensor and an associated label (the labels are just integers, so they have no shape attribute). So that's what this line is doing: randomly sampling nine (image, label) pairs — and this could be any number you want, I'm using nine as a spoiler because later on we're going to create a three-by-three plot, so nine is just a fun number. We append each sample to test_samples and each label to test_labels. Then, down here, let's view the first sample's shape with test_samples[0].shape, and test_samples[0] itself is a tensor of image values. If we want to plot it we can go plt.imshow() with cmap="gray", and we'll have to squeeze it to remove the batch dimension — there we go, beautiful. To me that's a shoe, a high-heel shoe of some sort, and if we set plt.title to test_labels[0] we get a 5, which we can of course index into class_names to get "Sandal". Okay, beautiful: we have nine random samples and the nine labels associated with them. Now let's make some predictions — this is one of my favourite things to do, I can't stress it enough: randomly pick samples from the test dataset and predict on them, over and over and over again, to see what the model is doing. We'll get the prediction probabilities by calling our make_predictions function, passing it the model (our trained model_2) and the data (the test_samples list we just created from random samples of the test dataset). Not only should you become one with the data at the start of a problem — even after you've trained a model, you'll want to become one with your model's predictions on that data. So let's view the first two entries of the prediction probabilities list, pred_probs[:2] — we don't want to view them all. Now, how do we convert prediction probabilities into labels? If we look at test_labels, we can't really compare prediction probabilities straight to the test labels — we want to compare apples to apples when evaluating our model — so we need to convert these prediction probabilities into prediction labels. We can use argmax to take the index of whichever value is highest among the prediction probabilities: pred_classes = pred_probs.argmax(dim=1). And if we have a look at pred_classes — wonderful, they're in the same format as our test_labels. In the next video we're going to plot these and compare them, so if you'd like to go ahead, write some code to create a matplotlib plotting function that plots nine different samples along with their original labels and their predicted labels. Give that a shot. We've just written some code to make predictions on random samples; if you'd like them to be truly random you can comment out the seed, but I've kept it at 42 so that random.sample selects the same samples on your end and mine. So in the next video, let's plot these. Let's now continue following the data explorer's motto of visualize, visualize, visualize: we have some prediction classes and some labels we'd like to compare them to, and comparing them visually, it looks like our model is doing pretty well.
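The sampling and prediction steps together look something like this (assuming the Fashion MNIST test_data dataset and the make_predictions function above):

```python
import random

random.seed(42)
test_samples = []
test_labels = []

# grab nine random (image, label) pairs from the test dataset (not the dataloader)
for sample, label in random.sample(list(test_data), k=9):
    test_samples.append(sample)
    test_labels.append(label)

# make predictions on the sampled images with the trained CNN
pred_probs = make_predictions(model=model_2, data=test_samples)

# convert prediction probabilities to predicted class indices
pred_classes = pred_probs.argmax(dim=1)
print(pred_classes)  # e.g. tensor([5, 1, 7, ...]) -> directly comparable to test_labels
```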
But since we're making predictions on images, let's plot those images along with the predictions. I'm going to write some code to plot the predictions: create a matplotlib figure with figsize=(9, 9) — because we've got nine random samples; you could of course change this to however many you want, I just find a three-by-three plot works pretty well in practice — and set nrows to 3 and ncols to 3. Then I'm going to enumerate through the samples in test_samples, and each time through the loop create a subplot for that sample: plt.subplot(nrows, ncols, i + 1) — the index is i + 1 because subplot indices can't start at zero. (What's going on here? Oh, excuse me, "in enumerate" — wonderful.) Now we plot the target image: plt.imshow(sample.squeeze(), cmap="gray"), squeezing to remove the extra dimension. Next we find the prediction label in text form, because we don't want it in numeric form — we want to look at things in human-readable language, like "Sandal" rather than whatever number that class is — so pred_label = class_names[pred_classes[i]]. So right now we're plotting our sample and finding its prediction; next we get the truth label, also in text form, which is class_names indexed with test_labels[i] — we're just matching up our indexes here. Finally we create a title for the plot, and here's something I like to do: if we're getting visual, we might as well get really visual, so let's change the colour of the title text depending on whether the prediction is right or wrong. I'll create the title with an f-string showing the pred label and the truth label — we could even include the prediction probabilities here if we wanted, that's an extension you might like to try — and then check for equality between pred and truth to change the colour of the title text. It'll be a lot easier to explain if we just code it out: if pred_label equals truth_label, I want plt.title to show the title text with a font size of 10.
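The whole plotting cell, roughly (assuming the test_samples, test_labels, pred_classes and class_names from above):

```python
import matplotlib.pyplot as plt

plt.figure(figsize=(9, 9))
nrows, ncols = 3, 3
for i, sample in enumerate(test_samples):
    # create a subplot for each sample (subplot indices start at 1)
    plt.subplot(nrows, ncols, i + 1)

    # plot the target image (squeeze removes the channel dimension)
    plt.imshow(sample.squeeze(), cmap="gray")

    # predicted and true labels in text form
    pred_label = class_names[pred_classes[i]]
    truth_label = class_names[test_labels[i]]

    # green title if the prediction matches the truth, red otherwise
    title_text = f"Pred: {pred_label} | Truth: {truth_label}"
    if pred_label == truth_label:
        plt.title(title_text, fontsize=10, c="g")
    else:
        plt.title(title_text, fontsize=10, c="r")
    plt.axis(False)
```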
And I want the colour to be green — green text if the prediction is the same as the truth — else I'll set plt.title with the same title text, font size 10, and the colour red. Does that make sense? All we're doing is enumerating through the test_samples we randomly selected from the test dataset; each time we create a subplot, plot the image, find the prediction label by indexing class_names with our pred_classes value, get the truth label, and create a title comparing pred to truth, coloured according to whether the prediction is correct or not. Let's see what happens. Did we get it right? Oh yes, we did. One more thing: I'm going to turn off the axes just so we get more real estate. I love these kinds of plots, and it helps that our model got all of these predictions right: pred Sandal, truth Sandal; pred Trouser, truth Trouser. That's pretty darn good, right? For me, numbers on a page look good, but there's nothing quite like visualizing your machine learning model's predictions, especially when it gets them right. So how about we select some different random samples? We could functionalize all of this to run in one hit, but we'll be a bit hacky for now. This time we're sampling with no seed at all, so your samples might be different to mine — nine different samples, an ankle boot this time — we make some predictions, step through all the code again, and oh, there we go, it got one wrong. All the others are correct, but this is more interesting: where does your model get things wrong? It predicted a dress, but this is a coat. Do you think this could potentially be a dress? I can see it being a dress, so I kind of understand where the model's coming from there. Let's make some more random predictions — we might do two more of these before moving on. All correct; we're interested in getting something wrong here, but our model seems to be too good. All correct again. Okay, one more time — if we don't get any wrong we're going on to the next video — oh, there we go, two wrong, beautiful. It predicted a dress and that's a shirt; okay, I can kind of see where the model might have stuffed up — it's a little long for a shirt to me, but I can still see it being a shirt. And this one is predicted as a pullover but the truth is a coat, so maybe there are some issues with the labels. That's probably what you'll find in a lot of datasets, especially quite large ones — just through the sheer law of large numbers, some truth labels in the datasets you work with may be wrong. That's why I like to compare the model's predictions versus the truth on a bunch of random samples: to ask whether our model's results are better or worse than they actually appear. That's what visualizing helps you figure out — maybe the accuracy says the model is good but the predictions don't look so good when you visualize them, or vice versa. So keep playing around with this and look at some more random samples by running it again; we'll do one more for good luck, and then we'll move on to another way of evaluating.
Oh, see — this is another example where some of the labels could be confusing, and speaking of confusing, that's a spoiler for the next video. Do you see how the prediction is "T-shirt/top" but the truth is "Shirt"? To me those labels kind of overlap — I don't really know the difference between a t-shirt and a shirt — and that's something you'll find as you train models: your model can tell you things about your data as well. So we've hinted that the model is confused between T-shirt/top and Shirt; how about we plot a confusion matrix in the next video? I'll see you there. We're up to a very exciting point in evaluating our machine learning model, and that is visualize, visualize, visualize. We saw in the previous video that our model gets a little bit confused — and in fact I would personally get confused between a t-shirt/top and a shirt — and these kinds of insights into our model's predictions can also give us insights into whether some of our labels could be improved. Another way to check that is to make a confusion matrix, so let's do that: making a confusion matrix for further prediction evaluation. A confusion matrix is another one of my favourite ways of evaluating a classification model — and that's what we're doing, multi-class classification. If you go back to section 02 of the learnpytorch.io book and scroll down, there's a section on more classification evaluation metrics: accuracy (probably the gold standard of classification evaluation), precision, recall, F1 score, and a confusion matrix. So how about we try to build one of those? I'll copy this line in and write it down: a confusion matrix is a fantastic way of evaluating your classification models visually. We'll break it down: first, to plot a confusion matrix we need to make predictions with our trained model on the test dataset; second, we make a confusion matrix, and to do so we'll leverage torchmetrics.ConfusionMatrix; and third, we plot it. Recall that torchmetrics, which we've touched on before, is a great package with a whole bunch of evaluation metrics for machine learning models in PyTorch flavour — classification, audio, image, detection metrics and more. I've only touched on five or six metrics here, but torchmetrics has something like 25 different classification metrics, so if you want some extra curriculum you can read through those. Let's go to ConfusionMatrix: looking at the example code, we pass in the number of classes, we can normalize if we want, and you'll notice the documentation looks quite similar to the PyTorch documentation — that's the beautiful thing about torchmetrics, it's created with PyTorch in mind. You could try it out on some test code if you wanted, but since we've already got code of our own, let's just bring it in. Then step three is to plot it, and we've got another helper package for that: plot_confusion_matrix from mlxtend. This is another one of my favourite helper libraries for machine learning things — it's got a lot of functionality that you could code up yourself, but that you often find yourself coding a few too many times, such as plotting a confusion matrix.
If you look up mlxtend's plot_confusion_matrix, it's a wonderful library — I believe it was created by Sebastian Raschka, a machine learning researcher and also the author of a great book, Machine Learning with PyTorch and Scikit-Learn. As a side note, I just got that book — it was released at the start of 2022 — and it's a great resource for learning more about machine learning with PyTorch and scikit-learn, so shout-out to Sebastian Raschka, and thank you for this package as well. It's going to help us plot a confusion matrix with the predicted labels along the bottom and the true labels along the side. We can copy in the example code for plot_confusion_matrix and for ConfusionMatrix. The thing is, torchmetrics doesn't come with Google Colab, and while I think mlxtend does, we need a version of mlxtend that Colab doesn't yet have — we actually need version 0.19.0 — but we'll import those in a second. First, let's make some predictions across our entire test dataset. Previously we made predictions on only nine random samples, so now let's write some code to predict across the whole test dataset. We import tqdm.auto for progress bar tracking — we don't need to re-import it, I believe we've already got it above, but I'll do it anyway for completeness. This is step one from above: make predictions with the trained model, which is model_2. We create an empty y_preds list so we can add our predictions to it, put our model into evaluation mode, and use torch.inference_mode() as our context manager. Inside that, we build the same sort of code we used for our testing loop, except this time we append all of our predictions to a list. We iterate through the test_dataloader, giving tqdm a description — "Making predictions" — so you'll see what that looks like in a minute. We send the data and targets to the target device: X, y = X.to(device), y.to(device). Then the forward pass: y_logit = model_2(X) — remember the raw outputs of a model with a linear layer at the end are referred to as logits — and we don't need to calculate the loss, we just want to turn the logits into prediction probabilities and then prediction labels, so y_pred = torch.softmax(y_logit, dim=1).argmax(dim=1). You could actually skip the softmax step and just take the argmax of the logits, but we'll go from prediction probabilities to pred labels for completeness — and a little tidbit: if you take the softmax or argmax over different dimensions you'll get different values, so check the inputs and outputs of your code to make sure you're using the right dimension. Then we put the predictions on the CPU for evaluation, because if we're going to plot anything, matplotlib will want them there, so we append y_pred.cpu() to y_preds. Beautiful. And because we end up with a list of per-batch predictions, we concatenate the list of predictions into a single tensor: if we just print y_preds we get a big list of tensors (and there's our progress bar, going through each of the 313 batches of 32 in the test dataloader), which isn't really what we want, so y_pred_tensor = torch.cat(y_preds) turns that list of tensors into one big long tensor. We can view the first ten values, and if we check len(y_pred_tensor) there should be one prediction per test sample: 10,000. Beautiful.
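As one cell, that prediction loop looks something like this (a sketch assuming model_2, test_dataloader and device from earlier; the softmax is written over dim=1, the class dimension — taking the argmax of the raw logits gives the same labels):

```python
import torch
from tqdm.auto import tqdm

# 1. Make predictions with the trained model across the whole test dataset
y_preds = []
model_2.eval()
with torch.inference_mode():
    for X, y in tqdm(test_dataloader, desc="Making predictions"):
        # send the data to the target device
        X, y = X.to(device), y.to(device)

        # forward pass (raw logits)
        y_logit = model_2(X)

        # logits -> prediction probabilities -> prediction labels
        y_pred = torch.softmax(y_logit, dim=1).argmax(dim=1)

        # matplotlib and mlxtend want things on the CPU
        y_preds.append(y_pred.cpu())

# concatenate the list of per-batch predictions into one tensor
y_pred_tensor = torch.cat(y_preds)
print(len(y_pred_tensor))  # 10000 -> one prediction per test sample
```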
So now we need to install torchmetrics, because torchmetrics doesn't come with Google Colab at the time of recording — if we try to import it, it fails (it might come pre-installed in the future, because it's a pretty useful package, so just keep that in mind). Let's see if the required packages are installed, and if not, install them. We'll finish this video by setting up a try/except block: Python will try to import torchmetrics and mlxtend — I write it like this because you may already have them if you're running this code on a local machine, but if you're running it in Google Colab, which I'm sure many of you are, we'll try to import them anyway and install them if that doesn't work. For mlxtend I'm also going to check the version, because our plot_confusion_matrix function needs version 0.19.0 or higher, so I'll add an assert: take mlxtend.__version__, split it on the dots, convert the minor version to an int, and assert that it's greater than or equal to 19 — otherwise raise an error saying "mlxtend version should be 0.19.0 or higher". (Let me just fix this — I hadn't converted that properly, and I had an extra bracket on the end — there we go.) So this is just saying: hey, the version of mlxtend you have should be 0.19 or higher, because right now Google Colab has 0.14 by default (this may change in the future). Then let's finish off the except block: if the condition above fails — which it should in Colab — we pip install torchmetrics into Colab, quietly, and pass the -U flag to upgrade mlxtend; then we import torchmetrics and mlxtend after they've been installed and upgraded, and print the mlxtend version. Let's see what happens when we run this — we should see some installation happening here.
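The whole install-and-check cell looks something like this — note the !pip line is notebook (Colab/Jupyter) syntax, and the version numbers reflect what Colab shipped at the time of recording:

```python
# see if torchmetrics and a recent-enough mlxtend exist, and if not, install/upgrade them
try:
    import torchmetrics, mlxtend
    print(f"mlxtend version: {mlxtend.__version__}")
    assert int(mlxtend.__version__.split(".")[1]) >= 19, \
        "mlxtend version should be 0.19.0 or higher"
except (ImportError, AssertionError):
    !pip install -q torchmetrics -U mlxtend
    import torchmetrics, mlxtend
    print(f"mlxtend version: {mlxtend.__version__}")
```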
This installs torchmetrics — but do we not have the upgraded version of mlxtend? Let's have a look; we may need to restart our Google Colab instance. I'll take the quiet flag off — is it going to tell us to restart? Yes: after you've run this cell, if you're using Google Colab you may have to restart your runtime so it picks up the updated version of mlxtend, because we need 0.19.0, otherwise we won't be able to plot our confusion matrix. So I'm going to restart my runtime now and then run all of the cells — I'll pause the video here and click Run all. Note that if you run into any errors, you'll have to run those cells manually; and a little heads-up: if you restart your runtime and click Run all, your Colab notebook will stop running cells when it hits an error — which is exactly the error we found in a previous video, where our data and model were on different devices — so to skip past it, jump to the next cell and click "Run after", and it will run all the cells that follow, retraining our models and rerunning everything until we're back down to where we were. I'm back. While the code is running, I'll write a little more: import mlxtend and make sure we've got the right version. You may require a runtime restart, you may not — just try, after running this install of torchmetrics and upgrade of mlxtend, to re-import mlxtend and check that you have version 0.19 or above. There we go, wonderful: mlxtend 0.19.0, and the assert and import worked. Beautiful. We've ended up with a fair bit of extra code here, but I just wanted to show you how to install and upgrade packages in Google Colab if you don't have them. In the next video let's move forward with creating a confusion matrix — we now have predictions across our entire test dataset, and we'll be moving towards using the ConfusionMatrix class to compare those predictions to the target data of our test dataset. I'll see you in the next video; let's plot a confusion matrix. Welcome back. In the last video we wrote a bunch of code to import some extra libraries we need for plotting a confusion matrix (this is really helpful, by the way — Google Colab comes with a lot of stuff pre-installed, but later on down the track you're going to need some experience installing things, and this is one way to do it), and we also made predictions across our entire test dataset, so we've got 10,000 predictions in this tensor. What we're going to do with the confusion matrix is compare these predictions to the target labels in our test dataset. So we've done step number one, and we've prepared ourselves for steps two and three by installing torchmetrics and the later version of mlxtend. Now let's go through step two, making a confusion matrix, and step three, plotting it — this is going to look so good, I love how good confusion matrices look. Because we've got torchmetrics now, we're going to import the ConfusionMatrix class, and from mlxtend's plotting module we'll import plot_confusion_matrix.
Recall that the documentation for both of these lives in torchmetrics and mlxtend respectively. Number two is to set up a confusion matrix instance and compare predictions to targets — that's what evaluating a model is, right? Comparing our model's predictions to the target labels. So I'll set up a confusion matrix under the variable confmat by calling the ConfusionMatrix class from torchmetrics, and to create an instance of it I need to pass in the number of classes we have. Our ten classes are all contained within class_names — recall that class_names is a list of all the different classes we're working with — so I'll pass in num_classes as the length of class_names. Then I can use that confmat instance to create a confusion matrix tensor by calling it, much like we do with our loss function: I pass in preds equal to our y_pred_tensor (the tensor of predictions across the whole test dataset that we calculated just above) and target equal to test_data.targets. This is the test_data dataset we've seen before: if we go test_data and press tab, it has a bunch of attributes — we can get the classes, and of course the targets, which are the labels (PyTorch calls labels "targets"; I usually refer to them as labels, but here the targets are the test data's labels). So we want to compare our model's predictions on the test dataset to the test data targets. Let's keep going — we're up to step number three now. This creates our confusion matrix tensor — let's see what it actually looks like: confmat_tensor. Okay, there's a fair bit going on here, so let's turn it into a prettier version, where along the bottom we'll have our predicted labels and along the side our true labels. This is where the power of mlxtend comes in: we're going to plot our confusion matrix. Let's create a figure and an axis by calling the plot_confusion_matrix function we imported above, passing conf_mat equal to our confmat_tensor — but because we're working with matplotlib, it wants it as NumPy, so I'll note here that matplotlib likes working with NumPy and convert it — along with the class_names (our list of text-based class names, so we get labels for each of our rows and columns), and I'm going to set the figsize to my favourite hand in poker, 10 by 7.
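Steps two and three in code look roughly like this (note that newer torchmetrics releases also require a task="multiclass" argument to ConfusionMatrix; the call below matches the version used at the time of recording):

```python
from torchmetrics import ConfusionMatrix
from mlxtend.plotting import plot_confusion_matrix

# 2. Set up a confusion matrix instance and compare predictions to targets
confmat = ConfusionMatrix(num_classes=len(class_names))
confmat_tensor = confmat(preds=y_pred_tensor,
                         target=test_data.targets)

# 3. Plot it (matplotlib/mlxtend like NumPy, hence .numpy())
fig, ax = plot_confusion_matrix(
    conf_mat=confmat_tensor.numpy(),
    class_names=class_names,  # row/column labels
    figsize=(10, 7)
)
```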
also happens to be a good dimension for google collab look at that oh that is something beautiful to see now a confusion matrix the idl confusion matrix will have all of the diagonal rows darkened with all of the values and no values here and no values here because that means that the predicted label lines up with the true label so in our case we have definitely a very dark diagonal here but let's dive into some of the highest numbers here it looks like our model is predicting shirt when the true label is actually t-shirt top so that is reflective of what we saw before do we still have that image there okay we don't have an image there but in a previous video we saw that when we plotted our predictions the model predicted t-shirt top when it was actually a shirt and of course vice versa so what's another one here it looks like our model is predicting shirt when it's actually a coat and now this is something that you can use to visually inspect your data to see if the the errors that your model is making make sense from a visual perspective so it's getting confused by predicting pull over when the actual label is coat predicting pull over when the actual label is shirt so a lot of these things clothing-wise and data-wise may in fact look quite the same here's a relatively large one as well it's predicting sneaker when it should be an ankle boot so it's confusing two different types of shoes there so this is just a way to further evaluate your model and start to go hmm maybe our labels are a little bit confusing could we expand them a little bit more so keep that in mind a confusion matrix is one of the most powerful ways to visualize your classification model predictions and a really really really helpful way of creating one is to use torch metrics confusion matrix and to plot it you can use plot confusion matrix from ml extend however if you're using google colab for these you may need to import them or install them so that's a confusion matrix if you'd like more classification metrics you've got them here and you've got of course more in torch metrics so give that a look i think in the next video we've done a fair bit of evaluation where are we up to in our workflow i believe it's time we saved and loaded our best trained model so let's give that a go i'll see you in the next video in the last video we created a beautiful confusion matrix with the power of torch metrics and ml extend but now it's time to save and load our best model because if we if we evaluated it our convolutional neural network can go you know what this model is pretty good let's export it to a file so we can use it somewhere else let's see how we do that and by the way if we go into our keynote we've got evaluate model torch metrics we've been through this a fair few times now we've improved through experimentation we haven't used tensorboard yet but that'll be in a later video and save and reload your train model so here's where we're up to if we've gone through all these steps enough times and we're like you know what let's save our model so we can use it elsewhere and we can reload it in to make sure that it's it's saved correctly let's go through with this step we want number 11 i'm going to go save and load best performing model you may have already done this before so if you've been through the other parts of the course you definitely have so if you want to give that a go pause the video now and try it out yourself i believe we did it in notebook number one we have here we go saving and loading a pie torch model 
you can go through this section of section number one on your own and see if you can do it otherwise let's code it out together so i'm going to start from with importing path from path lib because i like to create a model directory path so create model directory path so my model path is going to be set equal to path and i'm going to save it to models this is where i want to i want to create file over here called models and save my models to there model path dot mkdir for make directory parents yes i wanted to make the parent directories if they don't exist and exist okay also equals true so if we try to create it but it's already existing we're not going to get an error that's fine and next we are going to create a model save path i'm just going to add some code cells here so we have more space let's pass in here our model name going to set this equal to since we're on section three i'm going to call this o3 pi torch computer vision model 2 is our best model and i'm going to save it to pth for pytorch you can also save it to dot pt i like to use pth and we're going to go model save path equal model path slash model name so now if we have a look at this we're going to have a path called model save path but it's going to be a posix path in models o3 pytorch computer vision model 2.pth and if we have a look over here we should have yeah we have a models directory now that's not going to have anything in it at the moment we've got our data directory that we had before there's fashion mnist this is a good way to start setting up your directories break them down data models helper function files etc but let's keep going let's save save the model state dict we're going to go print saving model to just going to give us some information about what's happening model save path and we can save a model by calling torch dot save and we pass in the object that we want to save using the object parameter obj then we get a dock string there we're going to go model 2 we want to save the state dict recall that the state dict is going to be our models what our models learned parameters on the data set so all the weights and biases and all that sort of jazz beautiful so when we first created model 2 these were all random numbers they've been or since we trained model 2 on our training data these have all been updated to represent the training images and we can leverage these later on as you've seen before to make predictions so i'm not going to go through all those but that's what we're saving and the file path is going to be our model save path so let's run this and see what happens beautiful we're saving our model to our model directory and now let's have a look in here do we have a model yes we do beautiful so that's how quickly we can save a model of course you can customize what the name is where you save it etc etc now let's see what happens when we load it in so create a new instance because we only saved the state dick of model 2 we need to create a new instance of our model 2 or how it was created which was with our class fashion m nest v2 if we saved the whole model we could just import it to a new variable but i'll let you read back more on that on the different ways of saving a model in here there's also a link to the pytorch documentation would highly recommend that but let's see it in action we need to create a new instance of our fashion mnist model v2 which is our convolutional neural network so i'm going to set the manual seed that way when we create a new instance it's instantiated with the 
same random numbers so we're going to set up loaded model 2 equals fashion mnist v2 and it's important here that we set it up with the same parameters as our original saved model so fashion mnist v2 oh we've got a typo here old-fashioned mnist model v2 wonderful so the input shape is going to be one because that is the number of color channels in our test in our images test image dot shape do we still have a test image should be oh well we've created a different one but our image size our image shape is 1 28 28 image shape for color channels height width then we create it with hidden units we use 10 for hidden units so we can just set that here this is important they just have to otherwise if the shapes aren't the same what are we going to get we're going to get a shape mismatch error and our output shape is what is also going to be 10 or length of class names if you have the class names variable instantiated that is so we're going to load in the saved state dick the one that we just saved so we can go loaded model 2 dot load state dict and we can pass in torch dot load in here and the file that we want to load or the file path is model save path up here this is why i like to just save my path variables to a variable so that i can just use them later on instead of retyping out this all the time which is definitely prone to errors so we're going to send the model to the target device loaded model 2.2 device beautiful let's see what happens here wonderful so let's now evaluate the loaded model so evaluate loaded model the results should be very much the same as our model 2 results so model 2 results so this is what we're looking for we want to make sure that our saved model saved these results pretty closely now i say pretty closely because you might find some discrepancies in this lower these lower decimals here just because of the way files get saved and something gets lost etc etc so that's just to do with precision and computing but as long as like the first few numbers are quite similar well then we're all gravy so let's go torch manual seed remember evaluating a model is almost as well is just as important as training a model so this is what we're doing we're making sure our model saved correctly before we deployed it if it didn't if we deployed it and it didn't save correctly well then we'd get a we would get less than ideal results wouldn't we so model equals loaded model 2 we're going to use our same avail model function by the way and of course we're going to evaluate it on the same test data set that we've been using test data loader and we're going to create a loss function or just putting in our loss function that we've created before and our accuracy function is the accuracy function we've been using throughout this notebook so now let's check out loaded model 2 results they should be quite similar to this one i'm going to make some predictions and then if we go down do we have the same numbers yes we do so we have five six eight two nine five six eight two nine wonderful and three one three five eight three one three five eight beautiful it looks like our loaded model gets the same results as our previously trained model before we even saved it and if you wanted to check if they were close you can also use torch dot is close check if model results if you wanted to check if they were close programmatically that is because we just looked at these visually check if model results are close to each other now we can go torch is close we're going to pass in torch.tensor we have to 
turn these values into a tensor we're going to go model to results and we'll compare the model loss how about we do that we want to make sure the loss values are the same or very close that is with torch dot is close torch.tensor model or we want this one to be loaded model 2 results model loss another bracket on the end there and we'll see how close they are true wonderful now if this doesn't return true you can also adjust the tolerance levels in here so we go atol equals this is going to be the absolute tolerance so if we do one to the negative eight it's saying like hey we need to make sure our results are basically the same up to eight decimal points that's probably quite low i would say just make sure they're at least within two but if you're getting discrepancies here between your saved model and your loaded model or sorry this model here the original one and your loaded model if they are quite large so they're like more than a few decimal points off in this column or even here i'd go back through your code and make sure that your model is saving correctly make sure you've got random seeds set up but if they're pretty close like in terms of within three or two decimal places of each other well then i'd say that's that's close enough but you can also adjust the tolerance level here to check if your model results are close enough programmatically wow we have covered a fair bit here we've gone through this entire workflow for a computer vision problem let's uh in the next video i think that's enough code for this section section three pi torch computer vision i've got some exercises and some extra curriculum lined up for you so let's have a look at those in the next video i'll see you there my goodness look how much computer vision pie torch code we've written together we started off right up the top we looked at the reference notebook and the online book we checked out computer vision libraries and pi torch the main one being torch vision then we got a data set namely the fashion mnist data set there are a bunch more data sets that we could have looked at and in fact i'd encourage you to try some out in the torch vision.datasets use all of the steps that we've done here to try it on another dataset we repaired our data loaders so turned our data into batches we built a baseline model which is an important step in machine learning because the baseline model is usually relatively simple and it's going to serve as a baseline that you're going to try and improve upon through just go back to the keynote through various experiments we then made predictions with model 0 we evaluated it we timed our predictions to see if running our models on the gpu was faster and we learned that sometimes a gpu won't necessarily speed up code if it's a relatively small data set because of the overheads between copying data from cpu to gpu we tried a model with non-linearity and we saw that it didn't really improve upon our baseline model but then we brought in the big guns a convolutional neural network replicating the cnn explainer website and by gosh didn't we spend a lot of time here i'd encourage you as part of your actual curriculum to go through this again and again i still even come back to refer to it too i've referred to it a lot making the materials for this video section and this code section so be sure to go back and check out the cnn explainer website for more of what's going on behind the scenes of your cnns but we coded one using pure pie torch that is amazing we compared our model results 
across different experiments we found that our convolutional neural network did the best although it took a little bit longer to train and we also learned that the training time values will definitely vary depending on the hardware you're using so that's just something to keep in mind we made and evaluated random predictions with our best model which is an important step in visualizing visualizing visualizing your model's predictions because you could get evaluation metrics but until you start to actually visualize what's going on well in my case that's how i best understand what my model is thinking we saw a confusion matrix using two different libraries torch metrics and ml extend a great way to evaluate your classification models and we saw how to save and load the best performing model to file and made sure that the results of our saved model weren't too different from the model that we trained within the notebook so now it is time i'd love for you to practice what you've gone through this is actually really exciting now because you've gone through an end-to-end computer vision problem i've got some exercises prepared if you go to the learn pi torch.io website in section o3 scroll down you can read through all of this this is all the materials that we've just covered in pure code there's a lot of pictures in this notebook too that are helpful to learn things what's going on we have some exercises here so all of the exercises are focused on practicing the code and the sections above we have two resources we also have some extra curriculum that i've put together if you want an in-depth understanding of what's going on behind the scenes in the convolutional neural networks because we focused a lot on code i'd highly recommend mit's induction to deep computer vision lecture you can spend 10 minutes clicking through the different options in the pytorch vision library torch vision look up most common convolutional neural networks in the torch vision model library and then for a larger number of pre-trained power torch computer vision models and if you get deeper into computer vision you're probably going to run into the torch image models library otherwise known as tim but i'm going to leave that as extra curriculum i'm going to just link this exercises section here again it's at learnpytorch.oh in the exercises section we come down there we go but there is also resource here an exercise template notebook so we've got one what are three areas in industry where computer vision has been currently used now this is in the pytorch deep learning repo extras exercises number three i've put out some template code here for you to fill in these different sections so some of them are code related some of them are just text-based but they should all be able to be completed by referencing what we've gone through in this notebook here and just as one more if we go back to pytorch deep learning this will probably be updated by the time you get here you can always find the exercises and extra curriculum by going computer vision go to exercises and extracurricular or if we go into the extras file and then we go to solutions i've now also started to add video walkthroughs of each of the solutions so this is me going through each of the exercises myself and coding them and so you'll get to see their unedited videos so they're just one long live stream and i've done some for o2 03 and o4 and there will be more here by the time you watch this video but if you'd like to see how i figure out the solutions to the 
exercises you can watch those videos and go through them yourself but first and foremost i would highly recommend trying out the exercises on your own first and then if you get stuck refer to the notebook here refer to the python documentation and finally you can check out what i would have coded as a potential solution so there's number three computer vision exercise solutions so congratulations on going through the pytorch computer vision section i'll see you in the next section we're going to look at pytorch custom data sets but no spoilers i'll see you soon [Music] hello hello hello and welcome to section number four of the learn pie torch for deep learning course we have custom data sets with pytorch now before we dive into what we're going to cover let's answer the most important question where can you get help now we've been through this a few times now but it's important to reiterate follow along with the code as best you can we're going to be writing a bunch of pytorch code remember the motto if in doubt run the code that's in line with try it for yourself if you'd like to reach or read the docs string you can press shift command plus space in google colab or if you're on windows command might be control then if you're still stuck you can search for it two of the resources you'll probably come across is stack overflow or the wonderful pie torch documentation which we've had a lot of experience with so far then of course try again go back through your code if in doubt code it out or if in doubt run the code and then finally if you're still stuck ask a question on the pytorch deep learning discussions github page so if i click this link we come to mr d burke slash python deep learning the url is here we've seen this before if you have a trouble or a problem with any of the course you can start a discussion and you can select a category general ideas polls q a and then we can go here video put the video number in so 99 for example my code doesn't do what i'd like it to so say your problem and then come in here write some code here code here and then my question is something something something click start discussion and then we can help out and then if we come back to discussions of course you can search for what's going on so if you have an error and you feel like someone else might have seen this error you can of course search it and find out what's happening now i just want to highlight again the resources for this course are at learnpytorch.io we are up to section 4. 
this is a beautiful online book version of all the materials we are going to cover in this section so spoiler alert you can use this as a reference and then of course in the github we have the same notebook here pytorch custom data sets this is the ground truth notebook so check that out if you get stuck so i'm just going to exit out of this we've got pi torch custom data sets that learn pytorch.oh and then of course the discussions tab for the q a now if we jump back to the keynote what do we have we might be asking what is a custom data set now we've built a fair few pi torch deep learning neural networks so far on various data sets such as fashion mnist but you might be wondering hey i've got my own data set or i'm working on my own problem can i build a model with pytorch to predict on that data set and the answer is yes however you do have to go through a few pre-processing steps to make that data set compatible with pytorch and that's what we're going to be covering in this section and so i'd like to highlight the pytorch domain libraries now we've had a little bit of experience before with torch vision such as if we wanted to classify whether a photo was a pizza steak or sushi so a computer vision image classification problem now there's also text such as if these reviews are positive or negative and you can use torch text for that but again these are only just one problem within the vision space within the tech space i want you to just understand that if you have any type of vision data you probably want to look into torch vision and if you have any kind of text data you probably want to look into torch text and then if you have audio such as if you wanted to classify what song was playing this is what shazam does it uses the input sound of some sort of music and then runs a neural network over it to classify it to a certain song you can look into torch audio for that and then if you'd like to recommend something such as you ever have an online store or if you're netflix or something like that and you'd like to have a home page that updates for recommendations you'd like to look into torch rec which stands for recommendation system and so this is just something to keep in mind because each of these domain libraries has a data sets module that helps you work with different data sets from different domains and so different domain libraries contain data loading functions for different data sources so torch vision let's just go into the next slide we have problem space vision for pre-built data sets so existing data sets like we've seen with fashion mnist as well as functions to load your own vision data sets you'll want to look into torch vision.datasets so if we click on this we have built-in data sets this is the pi torch documentation and if we go here we have torch audio torch text torch vision torch rec torch data now at the time of recording which is april 2022 this is torch data is currently in beta but it's going to be updated over time so just keep this in mind update it over time to add even more ways to load different data resources but for now we're just going to get familiar with torch vision data sets if we went into torch text there's another torch text.datasets and then if we went into torch audio we have torch audio data sets and so you're noticing a trend here that depending on the domain you're working in whether it be vision text audio or your data is recommendation data you'll probably want to look into its custom library within pytorch and of course the bonus is 
torch data it contains many different helper functions for loading data and is currently in beta as of april 22 so 2022 so that by the time you watch this torch data may be out of beta and then that should be something that's extra curriculum on top of what we're going to cover in this section so let's keep going so this is what we're going to work towards building food vision mini so we're going to load some data namely some images of pizza sushi and steak from the food 101 dataset we're going to build an image classification model such as the model that might power a food vision recognition app or a food image recognition app and then we're going to see if it can classify an image of pizza as pizza an image of sushi as sushi and an image of steak as steak so this is what we're going to focus on we want to load say we had images existing already of pizza sushi and steak we want to write some code to load these images of food so our own custom data set for building this food vision mini model which is quite similar to if you go to this is the project i'm working on personally nutrify.app this is a food image recognition model here we go so it's still a work in progress as i'm going through it but you can upload an image of food and nutrifier will try to classify what type of food it is so do we have steak there we go let's upload that beautiful steak so we're going to be building a similar model to what powers nutrify and then there's the macronutrients for the steak yada yada if you'd like to find out how it works i've got all the links here but that's at nutrifi.app so let's keep pushing forward we'll go back to the keynote this is what we're working towards as i said we want to load these images into pytorch so that we can build a model we've already built a computer vision model so we want to figure out how do we get our own data into that computer vision model and so of course we'll be adhering to our pi torch workflow that we've used a few times now so we're going to learn how to load a data set with our own custom data rather than an existing data set within pytorch we'll see how we can build a model to fit our own custom data set we'll go through all the steps that's involved in training a model such as picking a loss function and an optimizer we'll build a training loop we'll evaluate our model we'll improve through experimentation and then we can see save and reloading our model but we're also going to practice predicting on our own custom data which is a very very important step whenever training your own models so what we're going to cover broadly we're going to get a custom data set with pytorch as we've said we're going to become one with the data in other words preparing and visualizing it we'll learn how to transform data for use with a model very important step we'll see how we can load custom data with pre-built functions and our own custom functions we'll build a computer vision model aka food vision mini to classify pizza steak and sushi images so a multi-class classification model we'll compare models with and without data augmentation we haven't covered that yet but we will later on and finally we'll see how we can as i said make predictions on custom data so this means data that's not within our training or our test data set and how we're going to do it well we could do it cooks or chemists but i like to treat machine learning as a little bit of an art so we're going to be cooking up lots of code with that being said i'll see you in google collab let's code welcome 
back to the pie torch cooking show let's now learn how we can cook up some custom data sets i'm going to jump into google collab so colab.research.google.com and i'm going to click new notebook just going to make sure this is zoomed in enough for the video wonderful so i'm going to rename this notebook 04 because we're up to section 04 and i'm going to call it pytorch custom data sets underscore video because this is going to be one of the video notebooks which has all the code that i write during the videos which is of course contained within the video notebooks folder on the pi touch deep learning repo so if you'd like the resource or the ground truth notebook for this i'm going to just put a heading here o4 pi torch custom data sets video notebook make that bigger and then put resources so book version of the course materials for o4 we'll go there and then we'll go ground truth version of notebook 04 which will be the reference notebook that we're going to use for this section come into pi torch custom data sets and then we can put that in there wonderful so the whole synopsis of this custom data set section is we've used some data sets with pi torch before but how do you get your own data into pytorch because that's what you want to start working on right you want to start working on problems of your own you want to come into any sort of data that you've never worked with before and you want to figure out how do you get that into pytorch so one of the ways to do so is via custom data sets and then i want to put a note down here so we're going to go 0 section 0 is going to be importing pi torch and setting up device agnostic code but i want to just stress here that domain libraries so just to reiterate what we went through last video so depending on what you're working on whether it be vision text audio recommendation something like that you'll want to look into each of the pytorch domain libraries for existing data loader or data loading functions and customizable data loading functions so just keep that in mind we've seen some of them so if we go torch vision which is what we're going to be looking at torch vision we've got data sets and we've got documentation we've got data sets for each of the other domain libraries here as well so if you're working on a text problem it's going to be a similar set of steps to what we're going to do with our vision problem when we build food vision mini what we have is a data set that exists somewhere and what we want to do is bring that into pi torch so we can build a model with it so let's import the libraries that we need so we're going to import torch and we'll probably import nn so we'll import that from pytorch and i'm just going to check the torch version here so note we we need pi torch 1.10.0 plus is required for this course so if you're using google colab at a later date you may have a later version of pi torch i'm just going to show you what version i'm using i'm just going to let this load we're going to get this ready we're going to also set up device agnostic code right from the start this time because this is best practice with pytorch so this way if we have a cuda device available our model is going to use that cuda device and our data is going to be on that computer device so there we go wonderful we've got pie torch 1.10.0 plus cuda 111 maybe that's 11.1 so let's check if cuda dot is available now i'm using google colab we haven't set up a gpu yet so it probably won't be available yet let's have a look wonderful so because we've 
started a new collab instance it's going to use the cpu by default so how do we change that we come up to runtime change runtime type i'm going to go harder accelerator gpu we've done this a few times now i am paying for google colab pro so one of the benefits of that is that it uh google collab reserves faster gpus for you you do not need google collab pro as i've said to complete this course you can use the free version but just recall google colab pro tends to give you a better gpu just because gpus aren't free wonderful so now we've got access to a gpu cuda what gpu do i have nvidia smi we have i have a tesla p100 with 16 gigabytes of memory which will be more than enough for the problem that we're going to work on in this video so i believe that's enough uh to cover for the first coding video let's in the next section we are working with custom data sets after all let's in the next video let's get some data hey now as i said in the last video we can't cover custom data sets without some data so let's get some data and just remind ourselves what we're going to build and that is food vision mini so we need a way of getting some food images and if we go back to google chrome torch vision data sets has plenty of built-in data sets and one of them is the food 101 data set food 101 so if we go in here this is going to take us to the original food 101 website so food 101 is 101 different classes of food it has a challenging data set of 101 different food categories with 101 000 images so it's a quite a beefy data set and so for each class 250 manually reviewed test images are provided so we have per class 101 classes 250 testing images and we have 750 training images now we could start working on this entire data set straight from the get go but to practice i've created a smaller subset of this data set and i'd encourage you to do the same with your own problems start small and upgrade when necessary so i've reduced the number of categories to three and the number of images to ten percent now you could reduce this to an arbitrary amount but i've just decided three is enough to begin with and 10 of the data and then if it works hey you could upscale that on your own accord and so i just want to show you the notebook that i used to create this data set and as extracurricular you could go through this notebook so if we go into extras oh for custom data creation this is just how i created the subset of data so making a data set to use with notebook number four i created it in custom image data set or image classification style so we have a top level folder of pizza steak and sushi we have a training directory with pizza steak and sushi images and we have a test directory with pizza steak and sushi images as well so you can go through that to check it out how it was made but now oh and also if you go to learn pytorch.io section 4 there's more information here about what food 101 is so get data here we go there's all the information about food 101 there's some resources the original food 101 data set torch vision data sets food 101 how i created this data set and actually downloading the data but now we're going to write some code because this data set the smaller version that i've created is on the pytorch deep learning repo under data and then we have pizza steak sushi.zip this one is a little spoiler for one of the exercises for this section but you'll see that later so let's go in here let's now write some code to get this data set from github pizza state sushi.zip and then we'll explore it 
we'll become one with the data so i just want to write down here our data set is a subset of the food 101 data set food 101 starts with 101 different classes of food so we could definitely build computer vision models for 101 classes but we're going to start smaller our data set starts with three classes of food and only 10 of the images so it's right here and a thousand images per class which is 750 training 250 testing and we have about 75 training images per class and about 25 testing images per class so why do this when starting out ml projects it's important to try things on a small scale and then increase the scale when necessary the whole point is to speed up how fast you can experiment because there's no point trying to experiment on things that if we try to train on 100 000 images to begin with our models might train take half an hour to train at a time so at the beginning we want to increase the rate that we experiment at and so let's get some data we're going to import requests so that we can request something from github to download this url here then we're also going to import zip file from python because our data is in the form of a zip file right now then we're going to get path lib because i like to use paths whenever i'm dealing with file paths or directory paths so now let's set up a path to a data folder and this of course will depend on where your data set lives what you'd like to do but i typically like to create a folder over here called data and that's just going to store all of my data for whatever project i'm working on so data path equals path data and then we're going to go image path equals data path slash pizza steak sushi that's our we're going to have images from those three classes pizza steak and sushi are three of the classes out of the 101 in food 101 so if the image folder doesn't exist so if our data folder already exists we don't want to re-download it but if it doesn't exist we want to download it and unzip it so if image path is dur so we want to print out the image path directory already exists skipping download and then if it doesn't exist we want to print image path does not exist creating one beautiful and so we're going to go imagepath.mkdir to make a directory we want to make its parents if we need to so the parent directories and we want to pass exist okay equals true so we don't get any errors if it already exists and so then we can write some code i just want to show you what this does if we run it so our target directory data slash pizza state sushi does not exist it's creating one so then we have now data and inside pizza steak sushi wonderful but we're going to fill this up with some images so that we have some data to work with and then the whole premise of this entire section will be loading this data of just images into pi torch so that we can build a computer vision model on it but i just want to stress that this step will be very similar no matter what data you're working with you'll have some folder over here or maybe it'll live on the cloud somewhere who knows wherever your data is but you'll want to write code to load it from here into pie torch so let's download pizza steak and sushi data so i'm going to use with i'll just x over here so we have more screen space with open i'm going to open the data path slash the file name that i'm trying to open which will be pizza steak sushi dot zip and i'm going to write binary as f so this is essentially saying i'm doing this in advance because i know i'm going to download this folder here 
so i know the the file name of it pizzastakessushi.zip i'm going to download that into google colab and i want to open it up so request equals request dot get and so when i want to get this file i can click here and then if i click download it's going to what do you think it's going to do well let's see if i wanted to download it locally i could do that and then i could come over here and then i could click upload if i wanted to so upload to session storage i could upload it from that but i prefer to write code so that i could just run this cell over again and have the file instead of being downloaded to my local computer it just goes straight into google colab so to do that we need the url from here and i'm just going to put that in there it needs to be as a string excuse me i'm getting trigger happy on the shift and enter wonderful so now i've got a request to get the content that's in here and github can't really show this because this is a zip file of images spoiler alert now let's keep going we're going to print out that we're downloading pizza steak and sushi data dot dot dot and then i'm going to write to file the request dot content so the content of the request that i just made to github so that's request is here using the python request library to get the information here from github this url could be wherever your file has been stored and then i'm going to write the content of that request to my target file which is this this here so if i just copy this i'm going to write the data to here data path slash pizza steak sushi zip and then because it's a zip file i want to unzip it so unzip pizzas take sushi data let's go with zip file so we imported zip file up there which is a python library to help us deal with zip files i'm going to use zip file.zip file i'm going to pass it in the data path so just the path that we did below datapad pizza steak sushi zip and this time instead of giving it write permissions so that's what wb stands for it stands for write binary i'm going to give it read permissions so i want to read this target file instead of writing it and i'm going to go as zip ref we could call this anything really but zip ref is kind of you'll see this a lot in different python examples so we're going to print out again so unzipping pizza steak and sushi data then we're going to go zip underscore ref dot extract all and we're going to go image path so what this means is it's taking the zip ref here and it's extracting all of the information that's within that zip ref so within this zip file to the image path which is what we created up here so if we have a look at image path let's see that image path wonderful so that's where all of the contents of that zip file are going to go into this file so let's see it in action you ready hopefully it works three two one run file is not a zip file oh no what did we get wrong so did i type this wrong dot zip data path oh we got the zip file here pizza steak sushi dot zip read data path okay i found the error so this is another thing that you'll have to keep in mind and i believe we've covered this before but i like to keep the errors in these videos so that you can see where i get things wrong because you never write code right the first time so we have this link in github we have to make sure that we have the raw link address so if i come down to here and copy the link address from the download button you'll notice a slight difference if we come back into here so i'm just going to copy that there so if we step through this github mr d 
burke pie torch deep learning we have raw instead of blob so that is why we've had an error is that our code is correct it's just downloading the wrong data so let's change this to the raw so just keep that in mind you must have raw here and so let's see if this works do we have the correct data oh we might have to delete this oh there we go test beautiful train pizza steak sushi wonderful so it looks like we've got some data and if we open this up what do we have we have various jpegs okay so this is our testing data and if we click on there we've got an image of pizza beautiful so we're going to explore this a little bit more in the next video but that is some code that we've written to download datasets or download our own custom data set now just recall that we are working specifically on a pizza steak and sushi problem for computer vision however our whole premise is that we have some custom data and we want to convert these how do we get these into tensors that's what we want to do and so the same process will be for your own problems we'll be loading a target data set and then writing code to convert whatever the format the data set is in into tensors for pytorch so i'll see you in the next video let's explore the data we've downloaded welcome back in the last video we wrote some code to download a target data set our own custom data set from the pi torch deep learning data directory and if you'd like to see how that data set was made you can go to pytorch deep learning extras and it's going to be in the custom data creation notebook here for o4 so i've got all the code there all we've done is take data from the food 101 dataset which you can download from this website here or from torch vision so if we go to torch vision food 101 we've got the data set built into pi torch there so i've used that data set from pytorch and broken it down from 101 classes to three classes so that we can start with a small experiment so there we go get the training data data sets food 101 and then i've customized it to be my own style so if we go back to colab we've now got pizza steak sushi a test folder which will be our testing images and a train folder which will be our training images this data is in standard image classification format but we'll cover that in a second all we're going to do in this video is kick off section number two which is becoming one with the data which is one of my favorite ways to refer to data preparation and data exploration so becoming one with the data and i'd just like to show you one of my favorite quotes from abraham loss function so if i had eight hours to build a machine learning model i'd spend the first six hours preparing my data set and that's what we're going to do abraham lost function sounds like he knows what is going on but since we've just downloaded some data let's explore it hey and we'll write some code now to walk through each of the directories how you explore your data will depend on what data you've got so we've got a fair few different directories here with a fair few different folders within them so how about we walk through each of these directories and see what's going on if you have visual data you'll probably want to visualize an image so we're going to do that in a second too write a little doc string for this helper function so walks through dirt path returning its contents now just in case you didn't know abraham lost function does not exist as far as i know but i did make up that quote so we're going to use the os dot walk function os 
dot walk and we're going to pass it in a dir path and what does walk do we can get the doc string here directory tree generator for each directory in the directory tree rooted at the top including top itself but in excluding dot and dot dot yields a three tuple der path der names and file names you can step through this in the python documentation if you'd like but essentially it's just going to go through our target directory which in this case will be this one here and walk through each of these directories printing out some information about each one so let's see that in action this is one of my favorite things to do if we're working with standard image classification format data so there are lane length der names directories and let's go lang lan file names i'm saying it length like i've got the g on the end but it's just len images in let's put in here der path so a little bit confusing if you've never used walk before but it's so exciting to see all of the information in all of your directories oh we didn't we didn't run it let's check our function now walk through dirt and we're going to pass it in the image path which is what well it's going to show us how beautiful so let's compare what we've got in our print out here there are two directories and zero images in data pizza steak sushi so this one here there's zero images but there's two directories test and train wonderful and there are three directories in data pizza steak sushi test yes that looks correct three directories pizza steak sushi and then we have 0 directories and 19 images in pizza steak sushi slash tesla steak if we have a look at this so that means there's 19 testing images for steak let's have a look at one of them there we go now again these are from the food 101 data set the original food 101 data set which is just a whole bunch of images of food 100 000 of them there's some steak there wonderful and we're trying to build a food vision model to recognize what is in each image then if we jump down to here we have three directories in the training directory so we have pizza steak sushi and then we have 75 steak images 72 sushi images and 78 pizza so slightly different but very much the same numbers they're not too far off each other so we've got about 75 or so training images and we've got about 25 or so testing images per class now these were just randomly selected from the food 101 data set 10 percent of three different classes so let's keep pushing forward and we're going to set up our training and test paths paths so i just want to show you we'll just set up this and then i'll just show you the the standard image classification setup imagepath.train and we're going to go tester so if you're working on an image classification problem we want to set this up as test and then if we print out the trainer and the tester this is what we're going to be trying to do we're going to write some code to go hey look at this path for our training images and look at this path for our testing images and so this is the standard image classification data format is that you have your overall data set folder and then you have a training folder dedicated to all of the training images that you might have and then you have a testing folder dedicated to all of the testing images that you might have and you could have a validation data set here as well if you wanted to but to label each one of these images the class name is the folder name so all of the pizza images live in the pizza directory the same for steak and the same for sushi 
so depending on your problem your own data format will depend on whatever you're working on you might have folders of different text files or folders of different audio files but the premise remains we're going to be writing code to get our data here into tensors for use with pytorch and so where does this come from this image data classification format well if we go to the torch vision.datasets documentation as you start to work with more data sets you'll start to realize that there are standardized ways of storing specific types of data so if we come down to here base classes for custom data sets we'll be working towards using this image folder data set but this is a generic data loader where the images are arranged in this way by default so i've specifically formatted our data to mimic the style that this pre-built data loading function is for so we've got a root directory here in case of we were classifying dog and cat images we have root then we have a dog folder then we have various images and the same thing for cat this would be dog vs cat but the only difference for us is that we have food images and we have pizza steak sushi if we wanted to use the entire food 101 data set we would have 101 different folders of images here which is totally possible but to begin with we're keeping things small so let's keep pushing forward as i said we're dealing with a computer vision problem so what's another way to explore our data other than just walking through the directories themselves let's visualize an image hey but we've done that before with just clicking on the file how about we write some code to do so we'll replicate this but with code i'll see you in the next video welcome back in the last video we started to become one with the data and we learned that we have about 75 images per training class and about 25 images per testing class and we also learned that the standard image classification data structure is to have the stake images within the stake folder of the training data set and the same for test and the pizza images within the pizza folder and so on for each different image classification class that we might have so if you want to create your own data set you might format it in such a way that your training images are living in a directory with their classification name so if you wanted to classify photos of dogs and cats you might create a training folder of train slash dog train slash cat put images of dogs in the dog folder images of cats in their cat folder and then the same for the testing data set but the premise remains i'm going to sound like a broken record here we want to get our data from these files whatever files they may be in whatever data structure they might be in into tensors but before we do that let's keep becoming one with the data and we're going to visualize an image so visualizing an image and you know how much i love randomness so let's select a random image from all of the files that we have in here and let's plot it hey because we could just click through them and visualize them but i like to do things with code so specifically let's let's plan this out let's write some code to number one is get all of the image paths we'll see how we can do that with the path path lib library we then want to pick a random image path using we can use python's random for that python's random.choice we'll pick a single image random.choice then we want to get the image class name and this is where pathlib comes in handy class name recall that whichever target image we 
pick the class name will be whichever directory that it's in so in the case of if we picked a random image from this directory the class name would be pizza so we can do that using i think it's going to be pathlib.path and then we'll get the parent folder wherever that image lives so the parent image parent folder that parent directory of our target random image and we're going to get the stem of that so we have stem stem is the last little bit here number four what should we do well we want to open the image so since we're working with images let's open the image with python's pill which is python image library which will actually be pillow so if we go python pillow a little bit confusing when i started to learn about python image manipulation so pillow is a friendly pill fork but it's still called pill so just think of pillow as a way to process images with python so pill is the python imaging library by frederic lund and so alex clark and contributors have created pillow so thank you everyone and let's go to number five what do we want to do as well we want to yeah let's get some metadata about the image we'll then show the image and print metadata wonderful so let's import random because machine learning is all about harnessing the power of randomness and i like to use randomness to explore data as well as model it so let's set the seed so we get the same image on both of our ends so random.seed i'm going to use 42. you can use whatever you'd like but if you'd like to get the same image of me i'd suggest using 42 as well now let's get all the image paths so we can do this because our image path list we want to get our image path so recall that our image path is this so this folder here i'm just going to close all this so this is our image path this folder here you can also go copy path if you wanted to we're just going to get something very similar there that's going to error out so i'll just comment that so it does an error that's our path but we're going to keep it in the posix path format and we can go list let's create a list of image path dot glob which stands for grab i don't actually know what glob stands for but to me it's like glob together all of the images that or all of the files that suit a certain pattern so glob together for me means stick them all together and you might be able to correct me if i've got the wrong meaning there i'd appreciate that and so we're going to pass in a certain combination so we want star star and then we want star dot jpg now why are we doing this well because we want every image path so star is going to be this first directory here so any combination it can be train or test and then this star means anything for what's inside tests and let's say this first star is equal to test this second star is equal to anything here so it could be any of pizza steak or sushi and then finally this star let's say it was test pizza this star is anything in here that is before dot jpg so it could be any one of these files here now this will make more sense once we print it out so image path list let's have a look there we go so now we've got a list of every single image that's within pizza steak sushi and this is just another way that i like to visualize data is to just get all of the parts and then randomly visualize it whether it be an image or text or audio you might want to randomly listen to it recall that each each of the domain libraries have different input and output methods for different data sets so if we come to torch vision we have utils so we have 
different ways to draw on images reading and writing images and videos so we could load an image via read image we could decode it we could do a whole bunch of things i'll let you explore that as extracurricular but now let's select a random image from here and plot it so we'll go number two which was our step up here pick a random image so pick a random image path let's get rid of this and so we can go random image path equals random.choice harness the power of randomness to explore our data let's get a random image from image part list and then we'll print out random image path which one was our lucky image that we selected beautiful so we have a test pizza image as our lucky random image and because we've got a random seed it's going to be the same one each time yes it is and if we comment out the random seed we'll get a different one each time we've got a stake image we've got another steak image another steak image oh three in a row four in a row pizza okay let's keep going so we'll get the image class from the path name so the image class is the name of the directory because our image data is in standard image classification format where the image is stored so let's do that image class equals random image path dot parent dot stem and then we're going to print image class what do we get so we've got pizza wonderful so the parent is this folder here and then the stem is the end of that folder which is pizza beautiful now what are we up to now we're working with images let's open up the image so we can open up the image using pill we could also open up the image with pytorch here so with read image but we're going to use pill to keep things a little bit generic for now so open image image equals image so from pill import image and the image class has an open function and we're just going to pass it in here the random image path note if this is corrupt if your image is corrupt this may error so then you could potentially use this to clean up your data set i've imported a lot of images with image.open of our target dataset here i don't believe any of them are corrupt but if they are please let me know and we'll find out later on when our model tries to train on it so let's print some metadata so when we open our image we get some information from it so let's go our random image path is what random image path we're already printing this out but we'll do it again anyway and then we're going to go the image class is equal to what will be the image class wonderful and then we can print out and get some metadata about our images so the image height is going to be img.height we get that metadata from using the pill library and then we're going to print out image width and we'll get img.width and then we'll print the image itself wonderful and we can get rid of this then we can get rid of this let's now have a look at some random images from our data set lovely we've got an image of pizza there now i will warn you that one of the downsides of working with food data is it does make you a little bit hungry so there we've got some sushi and then we've got some more sushi some steak and we have a steak we'll go one more for good luck and we finish off with some sushi oh that could be a little bit confusing to me i thought that might be state to begin with and this is do you see now we'll do one more why it's important to sort of visualize your images randomly because you never know what you're going to come across and this way once we visualize enough images you could do this 100 more times you 
You could do this 20 or 100 more times until you feel comfortable to go, hey, I know enough about the data, now let's see how well a model goes on it. So we'll finish on this steak image, and I'll set you a little challenge before the next video: visualize an image like we've done here, but this time do it with matplotlib — the same setup with a random image, but instead of printing things out, plot it. Try that out and we'll do it together in the next video.

We're well on the way to creating our own PyTorch custom dataset, and we've started to become one with the data, so let's continue and visualize another image. I set you the challenge of replicating what we've done with the PIL library using matplotlib, so let's give it a go. Why matplotlib? Because it's one of the most fundamental data science libraries — you're going to see it everywhere — so it's important to know how to plot images and data with it. I'm going to import numpy as well, because we'll need to convert the image into an array first; that was a little trick I didn't elaborate on, but I hope you tried it and figured it out from the errors you received. So, turn the image into an array: img_as_array = np.asarray(img), passing in the same image we opened with PIL. Then plot it with matplotlib: plt.figure(figsize=(10, 7)), then plt.imshow(img_as_array) to pass in the array of numbers, and I'll set the title as an f-string containing the image class and the image shape. The shape is another important thing to be aware of when exploring any dataset, because one of the main errors in machine learning and deep learning is shape mismatch issues — if we know the shape of our data, we can start to work out what shape our model layers and our other data need to be. Finally, I'll turn the axes off. Beautiful — I've thrown this in without fully explaining it, but we've seen it before in the computer vision section: our image shape is (512, 306, 3).
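Here's a minimal sketch of that matplotlib cell, reusing the `img` and `image_class` variables from the cell above:

```python
import numpy as np
import matplotlib.pyplot as plt

# Turn the PIL image into a NumPy array so matplotlib can plot it
img_as_array = np.asarray(img)

# Plot the image with matplotlib
plt.figure(figsize=(10, 7))
plt.imshow(img_as_array)
plt.title(f"Image class: {image_class} | Image shape: {img_as_array.shape} -> [height, width, color_channels]")
plt.axis(False)
```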
Now, the dimensions here: the height is 512 pixels, the width is 306 pixels, and there are three color channels. So what format is this? Color channels last, which is the default for the PIL library and also the default for matplotlib, whereas PyTorch's default, recall, is color channels first. There's a lot of debate, as I've said, over which order is best — it looks like things are leaning towards channels last — but for now PyTorch defaults to channels first, and that's okay, because we can rearrange these dimensions to whatever the code we're writing needs. The three color channels are red, green and blue: combine red, green and blue in some way, shape or form and you get the different colors that make up our image. If we look at img_as_array, our image is in numerical format — wonderful. So that's one way to do it for one image; let's finish this video by visualizing one more. Same premise: the image as an array with different numerical values — a delicious-looking pizza of shape (512, 512, 3), color channels last again. So that is one way to become one with the data: visualize different images, especially random ones. You could do the same thing with different text samples you're working with, or listen to different audio samples — it depends what domain you're in. We've seen that the heights and widths vary but the images all have three color channels because they're color images; in the next video, let's start writing code to turn all of these images into PyTorch tensors. I'll see you there.

Hello and welcome back. In the last video we converted an image to a NumPy array and saw how an image can be represented as an array. But what if we'd like to get the images from our custom dataset — pizza, steak, sushi — into PyTorch? Let's cover that in this video. I'm going to create a new heading here: transforming data. What we'd like to do — and I've been hinting at this the whole time — is get our data into tensor format, because that's the datatype PyTorch accepts. So let's write this down: before we can use our image data with PyTorch — and this goes for images, other vision data, text, audio, basically whatever kind of dataset you're working with — we need, number one, to turn our target data into tensors, in our case a numerical representation of our images, and number two, to turn it into a torch.utils.data.Dataset. Recall from a previous video that we use a Dataset to house all of our data in tensor format, and then subsequently we turn our Datasets into a torch.utils.data.DataLoader, which creates an iterable, or batched, version of our Dataset — for short we'll just call these Dataset and DataLoader. Now, as I discussed previously, if we go to the PyTorch documentation for torchvision — and this is quite similar for torchaudio, torchtext, torchrec, and torchdata eventually, when it
comes out of beta — there are different ways to create such datasets. We can go into the datasets module and find built-in datasets, and also base classes for custom datasets. But if we go into ImageFolder, there's another parameter I'd like to show you, one that's going to be universal across many of your different data types: the transform parameter. The transform parameter lets us pass in some transforms to run on our data, so when we load our dataset from an image folder, it performs those transforms on the samples in the target data folder. This is a lot easier to understand through illustration than by just talking about it, so let's create a transform — and the main transform we're doing is turning our data into tensors. I'm going to re-import the main libraries we'll use: from torch.utils.data import DataLoader, and from torchvision import datasets and transforms. Beautiful. I'll create another little heading here: 3.1, transforming data with torchvision.transforms. The main transform we're after is turning our images from JPEGs — if we go into train and then into any class folder, we've got JPEG images — into tensor representations. So let's write a transform for our images and call it data_transform. I'm going to show you how to combine a few transforms together: you can use transforms.Compose (you can also use nn.Sequential to combine transforms, but we'll stick with Compose for now), and it takes a list. Let's write out three transforms and then talk about them. First, we want to resize our images to 64x64. Why? Recall the last section, computer vision: we used the TinyVGG architecture — we replicated the CNN Explainer website's version — and it took images of size 64x64, so perhaps we want to leverage that computer vision model later on. So we'll resize our images to 64x64.

Then we'll create another transform — I just want to highlight how transforms can help you manipulate your data in a certain way. If we want to flip the images, which is a form of data augmentation (in other words, artificially increasing the diversity of our dataset), we can flip them randomly on the horizontal with transforms.RandomHorizontalFlip, and I'm going to pass in a probability of p=0.5, which means 50 percent of the time an image going through this transform pipeline will get flipped on the horizontal axis. As I said, this makes a lot more sense when we visualize it, which we'll do very shortly. And finally, we'll turn the image into a torch tensor with transforms.ToTensor. If we look at the docstring for ToTensor, it says: convert a PIL Image — which is what we're working with right now — or a numpy.ndarray to a tensor. This transform does not support TorchScript; if you'd like to find out what that is, read the documentation — it's essentially a way to turn your PyTorch code into a serializable, optimizable representation. It converts a PIL Image or numpy.ndarray of shape height x width x color channels, in the range 0 to 255 — which is what our values are, 0 to 255 for red, green and blue — to a torch float tensor of shape color channels x height x width, in the range 0.0 to 1.0. We'll see this in action later on, but this is our first transform, and we can pass data through it. In fact, I'd encourage you to try that: see what happens when you pass things into data_transform. If we pass in img_as_array we get an error — img should be PIL Image, got numpy.ndarray — but if we pass in our straight-up PIL image, there we go, beautiful. If we look at the shape of the result we get (3, 64, 64), and if we wanted to change the resize to 224 — another common value for computer vision models — we could: 224 by 224. Do you see how powerful this little transforms module of the torchvision library is? We'll change that back to 64x64.
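Putting the pipeline together, this is roughly the transform cell — the 64x64 size is just the value chosen to match TinyVGG, so swap in 224 or anything else to experiment:

```python
from torchvision import transforms

# Write a transform for our images: resize -> random horizontal flip -> tensor
data_transform = transforms.Compose([
    # Resize the images to 64x64 (same size as the CNN Explainer / TinyVGG inputs)
    transforms.Resize(size=(64, 64)),
    # Flip the images randomly on the horizontal (a simple form of data augmentation)
    transforms.RandomHorizontalFlip(p=0.5),
    # Turn the PIL image into a torch.Tensor with values in [0.0, 1.0] and shape [C, H, W]
    transforms.ToTensor()
])

# Try the transform on a single PIL image (img from the earlier cell)
transformed_img = data_transform(img)
print(transformed_img.shape, transformed_img.dtype)  # torch.Size([3, 64, 64]) torch.float32
```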
And if we have a look at the dtype of our transformed tensor, we get torch.float32. Beautiful — so now we've got a way to transform our images into tensors, but we're still only doing it for one image. How about we progress towards doing it for every image in our data folder? Before we do that, I'd like to visualize what this transform looks like, so in the next video let's write some code to visualize transforming multiple images at a time, and I think it'll be a good idea to compare the transformed version to the original image. I'll see you in the next video — let's write some visualization code.

Let's now follow our data explorer's motto of visualizing our transformed images. We saw what it looks like to pass one image through a data transform, and if we wanted more documentation on torchvision transforms, where could we go? The "Transforming and augmenting images" page — this is actually going to be your extracurricular for this video. Transforms are common image transformations available in the transforms module, and they can be chained together using Compose, which is what we've already done. There's a whole bunch of different transforms you can do, including some data augmentation transforms, and if you'd like to see them visually, check out the "Illustration of transforms" page. But let's write some code to explore our own transform visually first. I'll leave this as a link and write up here: transforms help you get your images ready to be used with a model / perform data augmentation. We've got a way to turn images into tensors, which is what we want for our model (the same goes for any other data type you're working with), but now I'd like to visualize what a number of transformed images look like. So we're going to make a function that takes in some image paths, a transform, a number of images to transform at a time, and a random seed — because we're going to harness the power of randomness, and sometimes we want to set the seed and sometimes we don't. We have the image_path_list we created before, which is all of the image paths of our dataset, data/pizza_steak_sushi. How about we select some random image paths, take the image from each path, run it through our data transform, and compare the original image with the transformed image? Let's give it a try. I'll write a docstring: "Selects random images from a path of images and loads/transforms them, then plots the original vs. the transformed version." We could also document the image paths, transform and seed parameters, but let's just code it out. For the seed: we'll set seed=None by default, and if a seed is passed in, call random.seed(seed). Then random_image_paths = random.sample(image_paths, k=n) — image_paths is a list, and we randomly sample k=n, so three, images from it. Then we loop: for image_path in random_image_paths — you know how much I love harnessing the power of randomness for visualization — and open each image with PIL: with Image.open(image_path) as f. We create a figure and axes with a matplotlib subplot: fig, ax = plt.subplots(nrows=1, ncols=2), one row and two columns. On the zeroth axis we plot the original image — ax[0].imshow(f) — set the title with ax[0].set_title to an f-string, "Original" plus a new line with the size, which is f.size, the size attribute of the file, and turn the axis off. On the first axis we transform and plot the target image, so the original and the transformed version sit side by side. Now, there's one thing we'll need to handle — I'll code it the wrong way first, because I think that's a good way to illustrate what's going on — so I'll put a note here: we will need to change the shape for matplotlib. What have we noticed our transform does? It converts our image to color channels first, whereas matplotlib prefers color channels last, so keep that in mind; this code is going to error on purpose. So: transformed_image, then ax[1].set_title set to "Transformed" plus a new line with the size, which will be transformed_image.shape, and ax[1].axis set to False — you can also write "off"; you might see different versions of that around. And I'll write a suptitle: f"Class: {image_path.parent.stem}" — getting the parent attribute and then the stem attribute from the target image path, just like we did before to get the class name — and set it to a larger fontsize, because if we're going to visualize our data, we might as well make our plots visually appealing. So let's plot some transformed images: image_paths is image_path_list, the variable containing all of our image paths; transform is our data_transform, which means every image passed in will go through all of those steps — resized, randomly horizontally flipped, and converted to a tensor; n is 3, so we plot three images; and we'll set the seed to 42 to begin with. Let's see if this works. Oh — what did we get wrong? We have an invalid shape error, and as I said, I love
seeing this error, because we've seen it many times and we know what to do with it: we have to rearrange the shape of our data. Let's permute — we have to swap the order of the axes. Right now our color channels dimension is first, so we need to bring it to the end: in other words, turn it from color channels first to color channels last. We can do that by permuting so that dimension 1 comes first, dimension 2 comes second, and dimension 0 goes to the back — essentially going from C, H, W and changing the order to H, W, C. The exact same data stays in the tensor; we're only changing the order of the dimensions. Let's see if this works — look at that, I love seeing some manipulated data. We have a class of pizza: the original image is there at 512 by 512, and then we've resized it with our transform. Notice it's a lot more pixelated now, but that makes sense, because it's only 64 by 64 pixels. Why might we do such a thing? Well, does this image still look like pizza? To me it does, but the most important question is whether it still looks like the original to our model. At 64 by 64 there's less information encoded in the image, so our model will be able to compute faster on images of this size, but we may lose some performance, because not as much information is encoded as in the original. The size of an image is something you can control — you can treat it as a hyperparameter and tune it to see if it improves your model — but I've decided to go with 64x64x3 in line with the CNN Explainer website. A little hint: we're going to be replicating that model, which we've built before, so you'll notice our images are now the same size, 64x64x3, as what the CNN Explainer model uses. But again, you could change this size to whatever you want. And we see a steak image here, and notice it's been flipped on the horizontal axis — same with this one — so this is the power of torchvision transforms. There are a lot more transforms, as I said; the illustrations of transforms page is a great place to look: there's resize, center crop, five crop, grayscale, color adjustments, a whole bunch of different things. I'd encourage you to check those out — that's your extracurricular for this video. Now that we've visualized a transform, this is what I hinted at before: we're going to use this transform when we load all of our images into a torchvision Dataset, and I just wanted to make sure it had been visualized first. We'll use our data_transform in the next video when we load all of our data using a torchvision.datasets helper function, so let's give that a go — I'll see you in the next video.
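For reference, here's a sketch of the full plot_transformed_images function from the last couple of videos, including the permute fix — it assumes the `image_path_list` and `data_transform` variables from the cells above:

```python
import random
from PIL import Image
import matplotlib.pyplot as plt

def plot_transformed_images(image_paths, transform, n=3, seed=42):
    """Selects random images from a path of images and loads/transforms
    them, then plots the original vs. the transformed version."""
    if seed:
        random.seed(seed)
    random_image_paths = random.sample(image_paths, k=n)
    for image_path in random_image_paths:
        with Image.open(image_path) as f:
            fig, ax = plt.subplots(nrows=1, ncols=2)

            # Original image
            ax[0].imshow(f)
            ax[0].set_title(f"Original\nSize: {f.size}")
            ax[0].axis(False)

            # Transformed image - matplotlib wants color channels last,
            # so permute from [C, H, W] to [H, W, C]
            transformed_image = transform(f).permute(1, 2, 0)
            ax[1].imshow(transformed_image)
            ax[1].set_title(f"Transformed\nShape: {transformed_image.shape}")
            ax[1].axis(False)

            fig.suptitle(f"Class: {image_path.parent.stem}", fontsize=16)

plot_transformed_images(image_paths=image_path_list,
                        transform=data_transform,
                        n=3,
                        seed=42)
```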
Have a look at that beautiful plot — some original images and their transformed versions, and the beautiful thing about the transformed images is that they're in tensor format, which is exactly what we need for our model. That's what we've been slowly working towards: we've got a dataset, and now we've got a way to turn it into tensors ready for a model. Let's visualize a few more — I'll turn the seed off so we can look at some more random images. There we go: we've got steak, pixelated because we're downsizing to 64x64x3; same for this one, and it's been flipped on the horizontal; same for this pizza image; and we'll do one more to finish off. Wonderful — that's the premise of transforms: turning our images into tensors and also manipulating those images if we want to. So let's get rid of this, and I'm going to make another heading — we're up to part four now — and this is going to be option 1: loading image data using ImageFolder, and I'll turn that into markdown. So, torchvision datasets: recall how each of the torchvision domain libraries has its own datasets module with built-in functions to help you load data. In this case we have ImageFolder — there are a few others if you'd like to look into those — and this class is going to help us load data that's in the generic image classification format. It's a pre-built dataset function: just like there are pre-built datasets, there are pre-built dataset functions we can use. Option two, later on — spoiler — is that we're going to create our own custom version of a dataset loader, but we'll see that in a later video. Let's see how we can use ImageFolder to load all of our custom images into tensors; this is where the transform comes in helpful. So let's write here: we can load image classification data using torchvision.datasets.ImageFolder — use ImageFolder to create datasets. In a previous video I hinted that we can pass a transform to the ImageFolder class, so let's see what that looks like in practice. From torchvision we import datasets, because that's where ImageFolder lives, and then train_data = datasets.ImageFolder, passing in root, which is our train_dir (we'll do the training directory first), then transform, which is our data_transform, and then target_transform, which we'll leave as None — the default, since target_transform is optional. What this means is that transform is a transform for the data, and target_transform is a transform for the label (or target — PyTorch likes to use "target", I like "label", but that's okay). We don't need a target transform, because our labels are going to be inferred from the directory the images live in: our pizza images are in the pizza directory, so they'll get pizza as their label, because our dataset is in standard image classification format. If your dataset weren't in a standard format, you might use a different loader, but a lot of them will have a transform parameter like this one. This transform is going to run whatever images are loaded from these folders through the pipeline we created: resize them, randomly flip them on the horizontal, and turn them into tensors, which is exactly how we want them for our PyTorch models. And if we wanted to transform the labels in some way, shape or form, we could pass in a target_transform, but in our case we don't need to. Let's do the same thing for the test data — this is why I wanted to visualize our transforms in the previous videos, because otherwise we're just passing them in blind; what happens behind the scenes is that all of our images go through those steps, and that's what they'll look like when we turn them into a Dataset. So we create the test dataset, transforming it the same way we transformed the training dataset, and then print out train_data and test_data. Beautiful — we have a torch Dataset, an ImageFolder, with 225 data points for the training set (about 75 images per class), the root location being the training directory we set up before, and the transform listed: a Resize, followed by a RandomHorizontalFlip, followed by ToTensor. We get basically the same output for the test directory, except with fewer samples. Now let's get a few attributes from ImageFolder — one of the benefits of using a PyTorch pre-built dataset function is that it comes with a fair few attributes. We could go to the documentation (it inherits from DatasetFolder) and keep digging in there, or we can just come into Google Colab and press tab on train_data. Let's get the class names as a list: train_data.classes gives us pizza, steak, sushi. This is very helpful later on — we're trying to do everything with code, so if we have this list we can use it for labelling when we plot images straight from our dataset or make predictions on them. You can also get the class names as a dictionary mapped to their integer index: train_data.class_to_idx gives us pizza as 0, steak as 1 and sushi as 2 — this is where the target_transform would come into play if you wanted to transform those labels in some way. We can also check the lengths of our datasets — len(train_data) and len(test_data) — which just tells us how many samples we have. And if you'd like to explore more attributes, train_data dot-tab shows a few other things: imgs, loader, samples, targets. If you want to see just the image paths you can go .samples, and if you want just the labels you can go .targets — that's all of our labels, and I believe they're in order, so 0 0 0, 1 1 1, 2 2 2. And if we wanted to, let's have a look at the first sample.
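Here's roughly what those ImageFolder cells look like, assuming the `train_dir` and `test_dir` paths set up earlier in the course:

```python
from torchvision import datasets

# Use ImageFolder to create Dataset(s) - train_dir/test_dir point at
# data/pizza_steak_sushi/train and data/pizza_steak_sushi/test
train_data = datasets.ImageFolder(root=train_dir,
                                  transform=data_transform,  # transform for the data (images)
                                  target_transform=None)     # transform for the labels (not needed here)
test_data = datasets.ImageFolder(root=test_dir,
                                 transform=data_transform)

# Helpful attributes that come with ImageFolder
class_names = train_data.classes        # ['pizza', 'steak', 'sushi']
class_dict = train_data.class_to_idx    # {'pizza': 0, 'steak': 1, 'sushi': 2}
print(class_names, class_dict)
print(len(train_data), len(test_data))  # 225, 75
```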
Hey — we have data/pizza_steak_sushi/train/pizza/... there's the image path, and its label, 0, for pizza. Wonderful. Now that we've done that, we've been visualizing this whole time, so let's keep up the trend and visualize a sample and a label from the train_data Dataset. In this video we've used ImageFolder to load our images into tensors, and because our data is already in standard image classification format, we can use one of torchvision.datasets' pre-built functions. Let's do some more visualization in the next video — I'll see you there.

Welcome back. In the last video we used datasets.ImageFolder to turn all of our image data into tensors, and we did that with the help of our data transform, a little pipeline that takes in an image, resizes it to the value we set (64x64), randomly flips it along the horizontal (we don't strictly need this, I've just put it in to show what happens when an image goes through a transforms pipeline), and then, most importantly, turns it into a torch tensor. That means our custom dataset — this is so exciting — is now compatible with a PyTorch model. But we're not finished yet; let's keep pushing forward and visualize some samples from the train_data Dataset. How can we do this? We can index on train_data to get a single image and its label. If we go train_data[0], that gives us an image tensor and its associated label — in this case an image of pizza, because its label is pizza. So we'll take the image as train_data[0][0] and the label as train_data[0][1], and have a look at them separately. Beautiful: one of our target images is in tensor format, exactly how we want it, and its label is in numeric format, also exactly how we want it. And if we wanted to convert that back to a non-numeric, human-readable label, we can just index on class_names with it, and we see pizza. Let's print out some information about what's going on — I love f-strings, if you haven't noticed: the image tensor itself, then its shape and its dtype. This is still all becoming one with the data: we're slowly finding out information about our dataset, so that if errors arise later on we can go, hmm, we're getting a shape error and I know our images are this shape, or we're getting a datatype error, which is why I've printed the dtype. We'll also print the label and the label's datatype — as I said, the three big issues are shape mismatch, device mismatch and datatype mismatch. So we've got our image tensor, its shape is torch.Size([3, 64, 64]) — exactly how we want it — the datatype is torch.float32, the default datatype in PyTorch, our image label is 0, and the label datatype is int. Let's try to plot this with matplotlib and see what it looks like. First, what do we have to do? Rearrange the order of the dimensions — in other words, matplotlib likes color channels last. So we take image.permute(1, 2, 0): we're reordering the dimensions, taking the 0th dimension (the color channels) and putting it on the end, and shuffling the other two forward. Let's print out the different shapes — I love printing out the change in shapes, it really helps me understand what's going on, because sometimes I look at a line like this and it doesn't help me at all, but if I print what the shapes were originally and what they changed to, that's a big help, and that's what Jupyter notebooks are all about, right? So the original is color channels, height, width — and whatever data you're using, even text rather than images, knowing the shape of your data is a very good thing — and the permuted shape, if everything's gone right, is height, width, color channels. Then we plot the image (you can never get enough plotting practice): plt.figure(figsize=(10, 7)), plt.imshow on the permuted image, turn off the axes, and set the title to class_names indexed by the label, just as before, with a fontsize of 14 so it's nice and big. Here we go — beautiful, there's our image of pizza. It's very pixelated, because we're going from the original size of roughly 512 by 512 down to 64 by 64.
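As a sketch, the indexing and plotting cell looks something like this — `class_names` is the list pulled from `train_data.classes` above:

```python
import matplotlib.pyplot as plt

# Index on the train_data Dataset to get a single image and label
img, label = train_data[0][0], train_data[0][1]
print(f"Image tensor shape: {img.shape}")       # torch.Size([3, 64, 64]) -> [C, H, W]
print(f"Image datatype: {img.dtype}")           # torch.float32
print(f"Label: {label} ({class_names[label]})") # 0 (pizza)

# Rearrange the order of dimensions for matplotlib: [C, H, W] -> [H, W, C]
img_permute = img.permute(1, 2, 0)
print(f"Original shape: {img.shape} -> Permuted shape: {img_permute.shape}")

# Plot the image
plt.figure(figsize=(10, 7))
plt.imshow(img_permute)
plt.axis("off")
plt.title(class_names[label], fontsize=14)
```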
I'd encourage you to try this out — potentially use a different image: we've indexed on sample zero, but you might change that to a random image and go through the same steps. If you'd like to see different transforms, try changing our transform pipeline, maybe increase the size and see what it looks like, and if you're feeling really adventurous you can go into torchvision, look at the transforms library, try one of them and see what it does to our images. But we're going to keep pushing forward. We wrote up before that we wanted to turn our images into a Dataset and then subsequently a torch.utils.data.DataLoader — we've done this before, batching the data we've been working with — so I'd encourage you to give this a shot yourself before the next video: create a train dataloader using our train_data and a test dataloader using our test_data. Give that a shot and we'll do it together in the next video, where we'll turn our Datasets into DataLoaders.

Welcome back — how'd you go? In the last video I issued you the challenge to turn our Datasets into DataLoaders, so let's do that together now; I hope you gave it a shot, that's the best way to practice. Turning loaded images into DataLoaders: we're still adhering to our PyTorch workflow here — we've got a custom dataset, we found a way to turn it into tensors in the form of Datasets, and now we're going to turn those into DataLoaders so we can make our Datasets iterable, or batchify our data. So let's write down: a DataLoader is going to help us turn our Datasets into iterables, and we can customize the batch size so our model sees batch_size images at a time. This is very important, as we touched on in the computer vision section: we use a batch size because if we had, say, a hundred thousand images all in one dataset — there are over a hundred thousand images in the Food101 dataset, and we're only working with about 200 —
if we tried to load all 100,000 in one hit, chances are our hardware would run out of memory. Have a look at nvidia-smi: our GPU — I'm using a Tesla T4 right now — only has about 15 to 16 gigabytes of memory, so if we tried to load a hundred thousand images into that while also computing on them with a PyTorch model, we'd potentially run out of memory and run into issues. Instead we turn the data into a DataLoader, so our model looks at, say, 32 images at a time and can make full use of the memory it has. So let's turn our train and test Datasets into DataLoaders — and this isn't just for image data, it's for all kinds of data in PyTorch: images, text, audio, you name it. We import DataLoader, then create train_dataloader = DataLoader, passing in dataset=train_data and a batch size. What should the batch size be? I'll set it up as a capital BATCH_SIZE variable — 32 is a good batch size — but actually, let's start small with a batch size of 1 and see what happens. Then num_workers: this is an important parameter I may have covered before, but I'll introduce it again — it's how many CPU cores are used to load your data, and usually the higher the better. You can set it via os.cpu_count(), which counts how many CPUs your hardware has: import os (the Python os module), then os.cpu_count() — my Google Colab instance has two; your number may vary, and if you're running this on your local machine or dedicated deep learning hardware, you may have more. Setting num_workers to 1 uses one CPU core, and setting it to os.cpu_count() uses as many as possible; we'll leave it at 1 for now, but you can customize it however you want. And I'm going to shuffle the training data, because I don't want my model to learn any order in the training data, so let's mix it up. Then the test dataloader: dataset=test_data, batch_size=1, and num_workers=1 as well — again, these are hyperparameters you can customize, and for the number of workers, generally the more the better — and shuffle=False for the test data, so that when we evaluate our models later on, the test dataset is always in the same order. Now let's have a look at train_dataloader and test_dataloader — wonderful, we get two instances of torch.utils.data.DataLoader. Let's see if we can visualize something from the train dataloader (no need to double-handle both). Do we get a length? Because we're using a batch size of one, the lengths of our dataloaders are the same as our datasets. Of course, that would change if we — oh, we didn't even pass the BATCH_SIZE variable to the batch_size parameter; let's come down here and do the same here with batch size.
So we'll watch this change: if we wanted to look at 32 images at a time, we definitely could. Now we have eight batches, because 225 divided by 32 is roughly eight, and 75 divided by 32 is roughly three — remember these numbers get rounded if there's a partial batch left over. Let's change it back to one, keep that there, and see what it looks like to grab an image from our dataloader — or at least check out the shapes, which is probably the most important point at this time, since we've already plotted enough things. So let's iterate through our train dataloader, grab the next batch — the image and the label — and print them out. The batch size is one for now, and you can change it if you like; this is just another way of getting familiar with the shapes of our data. So image.shape — and I'll write down here that this shape is going to be batch size, color channels, height, width: the batch dimension is what our DataLoader adds to our images. Then let's check out the label shape — same thing, the DataLoader adds a batch dimension to the labels. Let's see what happens — oh, we forgot the end of the bracket — beautiful. Our label shape is only one because we have a batch size of one, and our image shape is batch size 1, color channels 3, height 64, width 64. If we change this to 32, what do you think happens? We get a batch size of 32, still three color channels, still 64 by 64, and now we have 32 labels — so within each batch we have 32 images and 32 labels, and we could use this with a model. I'm going to change it back to 1.
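Here's a minimal sketch of those DataLoader cells — BATCH_SIZE is set to 1 here; try 32 and watch the shapes change:

```python
import os
from torch.utils.data import DataLoader

BATCH_SIZE = 1                          # start small; try 32 later
print(f"CPU count: {os.cpu_count()}")   # how many workers we *could* use

train_dataloader = DataLoader(dataset=train_data,
                              batch_size=BATCH_SIZE,
                              num_workers=1,      # how many CPU cores load the data
                              shuffle=True)       # mix up the training data
test_dataloader = DataLoader(dataset=test_data,
                             batch_size=BATCH_SIZE,
                             num_workers=1,
                             shuffle=False)       # keep test order fixed for evaluation

# Grab a single batch and check the shapes
img, label = next(iter(train_dataloader))
print(f"Image shape: {img.shape} -> [batch_size, color_channels, height, width]")
print(f"Label shape: {label.shape}")
```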
And I think we've covered enough in terms of loading our datasets. How cool is this? We've come a long way: we downloaded a custom dataset, loaded it into a Dataset using ImageFolder, turned it into tensors using our data transform, and now we've batchified our custom dataset into DataLoaders. We've used these with models before, so if you wanted to, you could go right ahead and build a convolutional neural network to try to find patterns in our image tensors. But in the next video, let's pretend we didn't have this ImageFolder class available to us. How could we load our image dataset so it's compatible, like the one here? How could we replicate the ImageFolder class so we can still use it with a DataLoader — because DataLoader is part of torch.utils.data and you're going to see it everywhere. So let's pretend we don't have the torchvision.datasets ImageFolder helper function, and in the next video we'll see how to replicate that functionality. I'll see you there.

Welcome back. Over the past few videos we've been working out how to get the data in our data folder — images of pizza, steak and sushi — into tensor format, and we've seen how to do that with an existing dataset helper function, ImageFolder. But what if ImageFolder didn't exist and we needed to write our own custom data loading function? The premise here is that, although it does exist, this is going to be good practice, because you might come across a dataset where a pre-built function doesn't exist. So let's replicate the functionality of ImageFolder by creating our own data loading class. We want a few things: to get the class names as a list from our loaded data, and to get the class names as a dictionary as well — the whole goal is to write a function or class capable of loading data from here into tensor format, and capable of being used with PyTorch's DataLoader class like we've done before. So let's create another heading, number five: option 2, loading image data with a custom Dataset. We want a few pieces of functionality: one, be able to load images from file; two, be able to get class names from the dataset; and three, be able to get classes as a dictionary from the dataset. Let's also briefly discuss the pros and cons of creating your own custom dataset. Option one was to use a pre-existing dataset loading function from torchvision (and it's quite similar if you're using the data loading utilities of the other domain libraries), but at the base level of PyTorch is torch.utils.data.Dataset — the base Dataset class — and we want to build on top of this to create our own ImageFolder-style loading class. So, pros and cons. The pros: you can create a dataset out of almost anything, as long as you write the right code to load it in, and you're not limited to PyTorch's pre-built dataset functions. A couple of cons: even though you can create a dataset out of almost anything, it doesn't mean it will automatically work — of course, you can verify this through extensive testing, checking that it loads the data the way you want and that your model actually works — and using a custom dataset often results in writing more code, which could be prone to errors or performance issues. Typically, if functionality makes it into the PyTorch standard library or the PyTorch domain libraries, it's generally been tested many, many times and verified to work quite well, whereas if we write our own code, sure, we can test it ourselves, but it doesn't have that robustness to begin with, even if we could improve it over time. Nonetheless, it's important to be aware of how to create such a custom dataset. So let's import a few things we're going to use: os, because we'll be working with Python's file system; pathlib, because we'll be working with file paths; torch (we don't strictly need to again, but for completeness); the Image class from PIL, because we want to open images; and Dataset from torch.utils.data, which is the base dataset class — if we click through to torch.utils.data.Dataset in the documentation, it's an abstract class representing a dataset, and many of PyTorch's pre-built dataset functions subclass it, which is what we're going to do. There are a few notes there: all subclasses should overwrite __getitem__, and optionally overwrite __len__ — we'll see these two methods in a future video; for now we're just setting the scene. We'll also import transforms from torchvision, because we want to not only load our images but transform them into tensors, and from Python's typing module we'll import Tuple, Dict and List so we can add type hints when we create our class and loading functions. Wonderful. So here's our instance of torchvision.datasets.ImageFolder — let's have a look at the train data. We want to write a function that replicates getting the classes from a particular directory and turning them into a dictionary index. In other words, I'd like a helper function that, given a file path such as our pizza, steak, sushi data folder, goes in and returns the class names as a list, and also turns them into a dictionary — helpful for later when we'd like to access classes and class_to_idx, and if we really want to completely recreate ImageFolder, well, ImageFolder has this functionality, so we'd like it too. That's a high-level overview of what we're doing; I'll also link here that all custom datasets in PyTorch often subclass torch.utils.data.Dataset. So over the next few videos we want to be able to load images from a file (you could replace images with whatever data you're working with — the same premise applies), get the class names from the dataset, and get the classes as a dictionary from the dataset.
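For completeness, here are those imports gathered into one cell:

```python
import os       # for working with Python's file system
import pathlib  # for working with file paths
import torch

from PIL import Image                  # for opening images
from torch.utils.data import Dataset   # the base Dataset class to subclass
from torchvision import transforms     # for turning images into tensors
from typing import Tuple, Dict, List   # for type hints on our class and functions
```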
We're going to map our samples — our image samples — to their class names just by passing a file path to a function we're about to write. We've been through the pros and cons of creating a custom dataset, so in the next video let's start coding up a helper function to retrieve these two things from our target directory.

In the last video we discussed the exciting concept of creating a custom dataset, wrote down a few things we want, discussed some pros and cons, and learned that many custom datasets inherit from torch.utils.data.Dataset — that's what we'll be doing later on. In this video, let's focus on writing a helper function to recreate this functionality. I'll title this 5.1, creating a helper function to get class names, and turn that into markdown. Let's write down some steps and then code it out: one, get the class names using os.scandir to traverse a target directory — ideally the directory is in standard image classification format, because just like the ImageFolder class, our custom data class is going to require the data to already be formatted that way, with train and test folders and the images for a particular class in a particular directory; two, raise an error if the class names aren't found, because if that happens we want it to hint that something might be wrong with the directory structure; and three, turn the class names into a dict and a list and return them. Let's get started by setting up the path for the target directory — the directory we want to load our data from — and we'll start with the training dir as an example, printing out the target directory to show what we're doing. Then we get the class names from the target directory. I'll show you how os.scandir works (of course you could look this up in the Python documentation): class_names_found = sorted(entry.name for entry in list(os.scandir(target_directory))). I initially passed in image_path / target_directory, which errors — we just want the target directory on its own — and there we go. If we simply run list(os.scandir(...)) on the training directory, we get three directory entries: one for pizza, one for steak and one for sushi, which is exactly what's in the training folder, and entry.name gives us each folder's name. So now we have a way to get a list of class names, and we could quite easily turn that into a dictionary, couldn't we — which is exactly what we want to do: recreate the classes list (done) and the class_to_idx mapping (also done). Now let's take this functionality and turn it into a function. I'll call it find_classes, and say that it takes in a directory, which is a string, and returns — this is where the typing imports come in — a Tuple of a List of strings and a Dict mapping strings to integers. We want this function, given a target directory, to return those two things, and we've seen how to get a list of the directories inside a target directory using os.scandir. Let's write the docstring: "Finds the class folder names in a target directory." Step one: get the class names by scanning the target directory — classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir()), where the is_dir() check makes sure each entry really is a directory. If we just return classes and call find_classes on our training directory, beautiful, we get the class list — but we also need to return class_to_idx, so let's keep going. Step two: raise an error if class names could not be found — if not classes, raise a FileNotFoundError with an f-string along the lines of "Couldn't find any classes in directory... please check file structure." We're just writing some error-checking code here: if we can't find any class folders in the target directory, we raise this error, and the is_dir() check up above helps us as well. Finally, step three: create a dictionary of index labels — why? Because computers prefer numbers rather than strings as labels. We've already got a list of classes, so class_to_idx = {class_name: i for i, class_name in enumerate(classes)} (and yes, I spelled enumerate wrong the first time) — this maps each class name to an integer: 0 will be pizza, 1 will be steak, 2 will be sushi. Let's see how this goes — beautiful, look at that, we've just replicated the functionality of ImageFolder. Now we can use this helper function, find_classes, in our own custom dataset to traverse a target directory such as train (and we could do the same for test), giving us a list of classes and a dictionary mapping those classes to integers. In the next video, let's move towards subclassing torch.utils.data.Dataset and fully replicating ImageFolder — I'll see you there.
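Pulling it together, here's a sketch of the find_classes helper we just wrote — it assumes the os and typing imports from above and the train_dir path from earlier in the course:

```python
def find_classes(directory: str) -> Tuple[List[str], Dict[str, int]]:
    """Finds the class folder names in a target directory."""
    # 1. Get the class names by scanning the target directory
    classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())

    # 2. Raise an error if class names could not be found
    if not classes:
        raise FileNotFoundError(f"Couldn't find any classes in {directory}... please check file structure.")

    # 3. Create a dictionary of index labels (computers prefer numbers over strings)
    class_to_idx = {class_name: i for i, class_name in enumerate(classes)}
    return classes, class_to_idx

find_classes(train_dir)  # (['pizza', 'steak', 'sushi'], {'pizza': 0, 'steak': 1, 'sushi': 2})
```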
those class names to an integer so let's move forward and this time we're going to create a custom data set to replicate image folder now we don't necessarily have to do this right because image folder already exists and if something already exists in the pi touch library chances are it's going to be tested well it's going to work efficiently and we should use it if we can but if we needed some custom functionality we can always build up our own custom data set by subclassing torch.utils.data.dataset or if a pre-built dataset function didn't exist well we're probably going to want to subclass torchutil's data.dataset anyway and if we go into the documentation here there's a few things that we need to keep in mind when we're creating our own custom data set all data sets that represent a map from keys to data samples so that's what we want to do we want to map keys in other words targets or labels to data samples which in our case are food images so we should subclass this class here now to note all subclasses should overwrite getitem so getitem is a method in python which is going to get an item or get a sample supporting fetching a data sample for a given key so for example if we wanted to get sample number 100 this is what get item should support it should return us sample number 100 and subclasses could also optionally override len which is the length of a data set so return the size of the data set by many sampler implementations and the default options of data loader because we want to use this custom data set with data loader later on so we should keep this in mind when we're building our own custom subclasses of torch utils data data set let's see this hands-on we're going to break it down it's going to be a fair bit of code but that's all right nothing that we can't handle so to create our own custom data set we want to number one first things first is we're going to sub class subclass torch dot utils.data.dataset two what do we want to do we want to init our subclass with a target directory so the directory we'd like to get data from as well as a transform if we'd like to transform our data so just like when we used image folder we could pass a transform to our data set so that we could transform the data that we were loading we want to do the same thing and we want to create several attributes let's write them down here we want paths which will be the paths of our images what else do we want we want transform which will be the transform we'd like to use we want classes which is going to be a list of the target classes and we want class to idx which is going to be a dict of the target classes mapped to integer labels now of course these attributes will differ depending on your data set but we're replicating image folder here so these are just some of the things that we've seen that come with image folder but regardless of what data set you're working with there are probably some things that you want across from universal you probably want all the paths of where your data is coming from the transforms you'd like to perform on your data what classes you're working with and a map of those classes to an index so let's keep pushing forward we want to create a function to load images because after all we want to open some images so this function will open an image number five we want to overwrite the len method to return the length of our data set so just like it said in the documentation if you subclass using torch.utils.data the data set you should overwrite get item and you should 
and 6. overwrite the __getitem__ method to return a given sample when passed an index. excellent, so we've got a fair few steps here, and if they don't all make sense now that's okay, let's code it out. remember our motto: if in doubt, code it out, and if in doubt, run the code. so we're going to write a custom dataset, and this is exciting, because working with pre-built datasets is pretty cool, but when you can write code to create your own datasets, well, that's magic. step number zero is to import torch.utils.data.Dataset — we've already imported it, but we'll do it again for completeness. step number one is to subclass torch.utils.data.Dataset, so just like when we built a model and subclassed nn.Module, this time we're going to call our class ImageFolderCustom and inherit from Dataset, which means all the functionality contained within torch.utils.data.Dataset comes along for our own custom class. number two, let's initialize our custom dataset. there are a few things we'd like to init our subclass with: the target directory (the directory we'd like to get data from) as well as a transform if we'd like to transform our data. so let's write an __init__ function: def __init__(self, targ_dir, which is a string, and transform, which we'll set equal to None). this way we can pass in a target directory of images we'd like to load, and we can also pass in a transform, similar to the transforms we've created previously. now we're up to number three, create several attributes. first we'll get all of the image paths: self.paths equals list(pathlib.Path(targ_dir).glob("*/*.jpg")). what's our target directory going to be? spoiler alert: it's going to be a path like the test directory or the train directory, because we're going to use this once for each, just like we used the original ImageFolder. so we go through the target directory and collect all of the paths that follow the file name convention of */*.jpg — if we pass in the test folder, the first star means any of the class folders (pizza, steak, sushi), then slash, and *.jpg means any file in that class folder ending in .jpg. so this gets us a list of all the image paths within a target directory, in other words within the test directory and within the train directory when we call these separately. let's keep going: we've got all of the image paths, next we set up the transform, self.transform = transform, and i set its default to None because the transform can be optional. then let's create the classes and class_to_idx attributes, which are next on our list, and lucky us, in the previous video we created a function
to return just those things. so let's go self.classes, self.class_to_idx = find_classes(targ_dir), passing in the target directory from the init. now what's next? we've done step number three, so number four is to create a function to load images. let's call it load_image, it takes in self and an index (the index of the image we'd like to load) and it returns an Image.Image. where does that come from? well, previously we imported Image from PIL, the python imaging library (pillow), so given a file path, such as one of our pizza images, we're going to open it with the Image class, and i believe we can do that with Image.open. so let's write a little docstring, opens an image via a path and returns it, then image_path = self.paths[index] — this is why we collected all of the image paths above — and return Image.open(image_path). so we grab a particular image path and open it. now we're up to step number five, overwrite the __len__ method to return the length of our dataset. this is optional, but we're going to do it anyway; it just needs to return how many samples we have, so def __len__, returns the total number of samples, and that's simply return len(self.paths). so if our target directory was the training directory, we'd return the number of image paths this code found, and the same for the test directory. next, number six, we want to overwrite the __getitem__ method, which is required if we subclass torch.utils.data.Dataset — it's right there in the documentation, all subclasses should overwrite __getitem__ — so that if we pass our dataset an index, it returns that particular sample. this method is going to leverage all the code we've created above: it takes in self and an index, which will be an integer, and it returns a tuple of torch.Tensor and int, which is the same thing that gets returned when we index on our training data. if we look at image, label = train_data[0], __getitem__ is going to replicate this: we pass it an index and get back the image and the label. remember, train_data was created with ImageFolder from torchvision.datasets, so we want our __getitem__ to return an image and a label as a tuple, where the image is a torch tensor and the label is an integer, the particular index of the class this image relates to. so let's keep pushing forward, i'll write down here: returns one sample of data, data and label, and we'll just call it X, y so we know it's a tuple. beautiful. so let's set up the image, and this is where we call our self.load_image function which we created above.
do you see the customization capabilities of creating your own class? we've got a fair bit of code here, right, but essentially all we're doing is creating functions that help us load our images in some way, shape or form. and i can't stress this enough: regardless of the data you're working on, the pattern here will be quite similar, you'll just change the functions you use to load your data. so we load an image at a particular index — if we pass in an index, it loads in that image. then we want the class name, which is self.paths[index].parent.name. note this expects the path to be in the format data_folder/class_name/image.jpg, so because our data is in standard image classification format we can get the class name straight from the folder; you may have to change this depending on the format your data is in. then the class_idx is self.class_to_idx[class_name], indexing on the dictionary attribute we set up above. now there's one small step left, transform if necessary: remember our transform parameter up in the init? if a transform exists, let's pass the image through it and return the transformed image along with the class_idx, so we return a tuple — a torch tensor (if our transform exists) and the class index — which is exactly what we want, X and y, image as a tensor and label as an integer. and if the transform doesn't exist, we just return the untransformed image and the label. so that's a fair bit of code, and the full class is sketched below. the pro of subclassing torch.utils.data.Dataset is that we can customize it in almost any way we want to load whatever data we're working with — well, almost any data. however, because we've written so much code, it may be prone to errors, which we'll find out in the next video when we see if it actually works. essentially all we've done is follow the documentation of torch.utils.data.Dataset to replicate the functionality of an existing dataset class, namely ImageFolder. so if we scroll back up, ideally, if we've done it right, we should be able to write code just like before — passing in a root directory such as the training directory and a particular data transform — and get back very similar instances to ImageFolder, but using our own custom dataset class. let's try that out in the next video. so now we've got a custom ImageFolderCustom class that replicates the functionality of the original ImageFolder dataset class, let's test it out and see if it works on our own custom data.
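before testing it, here's the whole walkthrough put together as one sketch of the custom dataset class, assuming the find_classes helper from earlier is in scope:

```python
import pathlib
from typing import Tuple
import torch
from PIL import Image
from torch.utils.data import Dataset

# 1. Subclass torch.utils.data.Dataset
class ImageFolderCustom(Dataset):
    # 2. Initialise with a target directory and an (optional) transform
    def __init__(self, targ_dir: str, transform=None) -> None:
        # 3. Create class attributes
        # Get all image paths (assumes standard image classification format: targ_dir/class_name/image.jpg)
        self.paths = list(pathlib.Path(targ_dir).glob("*/*.jpg"))
        self.transform = transform
        self.classes, self.class_to_idx = find_classes(targ_dir)  # helper from the previous video

    # 4. Create a function to load images
    def load_image(self, index: int) -> Image.Image:
        "Opens an image via a path and returns it."
        image_path = self.paths[index]
        return Image.open(image_path)

    # 5. Overwrite __len__()
    def __len__(self) -> int:
        "Returns the total number of samples."
        return len(self.paths)

    # 6. Overwrite __getitem__() to return a given sample when passed an index
    def __getitem__(self, index: int) -> Tuple[torch.Tensor, int]:
        "Returns one sample of data, data and label (X, y)."
        img = self.load_image(index)
        class_name = self.paths[index].parent.name  # expects path format: data_folder/class_name/image.jpg
        class_idx = self.class_to_idx[class_name]

        # Transform if necessary
        if self.transform:
            return self.transform(img), class_idx  # return data, label (X, y)
        else:
            return img, class_idx  # return untransformed image and label
```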
so we're going to create a transform here so we can turn our raw jpeg images into tensors, because that's the whole goal of importing data into pytorch. let's set up train_transforms = transforms.Compose() and pass in a list: first transforms.Resize to 64, 64 — whatever the image size is, it'll be reduced down to 64 by 64 — then transforms.RandomHorizontalFlip. we don't necessarily need to flip the images, but we'll do it anyway just to see that it works. and then transforms.ToTensor, because our images get opened as PIL images with Image.open and we're converting them with the ToTensor transform from torchvision.transforms — i'll just put from torchvision import transforms here so you know where transforms is coming from. let's create one for the test dataset as well, test_transforms = transforms.Compose(), with the same Resize to 64, 64 and ToTensor, but we skip the data augmentation for the test data, because typically you don't manipulate your test data with augmentation, you just convert it into a tensor rather than changing its orientation, shape, size, etc. so let's run this, and now let's test out our ImageFolderCustom class. we'll set up train_data_custom = ImageFolderCustom with targ_dir equal to the training directory and transform equal to the train_transforms we just created above — i think that's all we need, there were only two parameters, and we're not going to use a target_transform because we've already got a helper function to transform our labels. then test_data_custom = ImageFolderCustom with targ_dir equal to the test directory and transform equal to the test_transforms from the cell above. what's colab telling me there? oh, we spelled train_transforms wrong, there we go, beautiful. now let's have a look at train_data_custom and test_data_custom and see if it worked. we get an ImageFolderCustom object — printing it doesn't give us as much rich information as printing the train_data made with ImageFolder does, but that's okay, we can still inspect it. let's check the length of the original train_data and see if we can use len() on our train_data_custom — did that work? wonderful. and the same for the original test_data versus our custom version made with ImageFolderCustom — beautiful, that's exactly what we want. now let's look at train_data_custom.classes, and the class_to_idx attribute too. if we let google colab's autocomplete load, what do we get? classes, class_to_idx, load_image, paths, transform — all of these attributes come from the code we wrote in our custom dataset class. and class_to_idx gives us the same as before, a dictionary containing our string class names and their integer associations, beautiful.
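for reference, a sketch of those transforms and the custom dataset instances, assuming train_dir and test_dir are the directory paths set up earlier in the course:

```python
from torchvision import transforms

# Train transform: resize, a simple flip augmentation, then convert to tensor
train_transforms = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor()
])

# Test transform: no augmentation, just resize and convert to tensor
test_transforms = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor()
])

# Test out ImageFolderCustom
train_data_custom = ImageFolderCustom(targ_dir=train_dir, transform=train_transforms)
test_data_custom = ImageFolderCustom(targ_dir=test_dir, transform=test_transforms)

len(train_data_custom), len(test_data_custom), train_data_custom.classes, train_data_custom.class_to_idx
```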
so now let's check for equality between the original ImageFolder dataset and our ImageFolderCustom dataset. we've kind of already done that above, but let's try it out: print train_data_custom.classes == train_data.classes — that's two equals signs for comparison, i don't want three — and print test_data_custom.classes == test_data.classes. True and True. you could try this on the other attributes as well, in fact that's a little exercise, compare the others. but congratulations to us, we have replicated the main functionality of the ImageFolder dataset class. the takeaway here is that whatever data you have, pytorch gives you a base Dataset class to inherit from, and then you can write a function or a class that somehow interacts with whatever data you're working with — in our case, loading an image — and as long as you overwrite the __len__ method and the __getitem__ method and return some sort of values, you can create your own dataset loading class. how beautiful is that? that's going to help you work with your own custom datasets in pytorch. so let's keep pushing forward. we've seen analytically that our custom dataset is quite similar to the original torchvision.datasets.ImageFolder dataset, but you know what i like to do, i like to visualize things, so in the next video let's create a function to display some random images from our train_data_custom class. it's time to follow the data explorer's motto: visualize, visualize, visualize. so let's create another section, i'm going to write a title called create a function to display random images. sure, we've had a look at the different attributes of our custom dataset, we've seen it gives back a list of class names and that the lengths match the original, but there's nothing quite like visualizing some data. so let's write a helper function. step one, take in a dataset — one of the datasets we just created, whether that's train_data_custom or train_data — plus a number of other parameters such as class names and how many images to visualize. step two, to prevent the display getting out of hand, cap the number of images at 10, because if our dataset has thousands of images, a maximum of 10 should be plenty to look at. step three, set the random seed for reproducibility. step four, get a list of random sample indexes from the target dataset — do you see how much i use randomness to get an understanding of our data? i really love harnessing the power of randomness. step five, set up a matplotlib plot. step six, loop through the random sample indexes and plot them with matplotlib. and step seven, as a side to that, make sure the dimensions of our images line up with what matplotlib expects, which is height, width, color channels. alright, let's take it on. number one, create a function to take in a dataset: we'll call it def display_random_images, another one of our helper functions — we've created a few like this before.
it takes in a dataset, which is of type torch.utils.data.Dataset, then classes, a list of strings — the class names for whichever dataset we're using — which i'll set to None by default. then n, the number of images we'd like to plot, set to 10 by default so we can see 10 random images at a time. do we want to display the shape? let's add display_shape and set it to True, so we can see what the shape of each image is, because we're passing it through a transform as it goes into the dataset and we want to check that looks okay. and let's also set up a seed parameter, an integer, defaulting to None. step two, prevent the display getting out of hand by capping the number of images at 10: if n is greater than 10, set n back to 10 and turn off display_shape, because with 10 images the titles can get out of hand, and print out: for display purposes, n shouldn't be larger than 10, setting to 10 and removing shape display. i only know to do this because i've cooked this dish before, in other words i've written this type of code before — and the beautiful thing about python and pytorch is that you can customize display functions like this any way you see fit. step three, set the random seed for reproducibility: if seed, then random.seed(seed). step four, get some random sample indexes from the target dataset: random_samples_idx = random.sample(range(len(dataset)), k=n) — have we got enough brackets there? i always get confused with the brackets — so in this case we randomly sample 10 indexes from the range of the length of our dataset. let's have a look at what this gives us with train_data_custom: its length is 225, which we checked just above, so from indexes 0 to 224 we should get back 10 random indexes if we've done this correctly, beautiful, there are 10 random indexes from our train_data_custom. now we're up to the loop, loop through the random indexes and plot them with matplotlib: for i, targ_sample in enumerate(random_samples_idx), and then, because all of the samples in our target dataset are tuples, we'll pull out the target image and the target label from the dataset at targ_sample,
which might be one of the index values we just printed: the zeroth element is the image and element number one is the label, so targ_image, targ_label = dataset[targ_sample][0], dataset[targ_sample][1]. oh, excuse me, we missed a step in the numbering — setting up the plot should come before the loop, did you catch that? we can do that quite easily with plt.figure, and we set the figure up outside the loop so that each time we iterate through another sample we add a subplot to that same figure. now for the step where we make sure the dimensions of our images line up with matplotlib: recall that by default pytorch puts our image dimensions as color channels first, however matplotlib prefers color channels last. so let's adjust the tensor dimensions for plotting: targ_image_adjust = targ_image.permute(1, 2, 0), which reorders the dimensions from color channels, height, width to height, width, color channels — that one will probably catch you off guard a few times, but we've seen it a couple of times now. then we plot the adjusted samples: we add a subplot to our matplotlib figure, one row of n images (this will make a lot more sense once we visualize it), with the index set to i plus 1. then plt.imshow(targ_image_adjust) to plot the image, and we turn off the axis. and if the classes list exists, let's set the title of the subplot to the particular entry in the class list: title = f"class: {classes[targ_label]}", because targ_label comes back in numerical format. if display_shape is on, we append a new line with the shape of targ_image_adjust to the title, and then we set plt.title(title) — so you see, if display_shape is set we just adjust the title variable before putting it onto the plot. that is quite a beautiful function.
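here's roughly what that helper ends up looking like; the figure size is my own assumption, the transcript doesn't specify one:

```python
import random
from typing import List
import matplotlib.pyplot as plt
from torch.utils.data import Dataset

# 1. Take in a Dataset as well as a list of class names and a few other parameters
def display_random_images(dataset: Dataset,
                          classes: List[str] = None,
                          n: int = 10,
                          display_shape: bool = True,
                          seed: int = None):
    # 2. Adjust display if n is too high
    if n > 10:
        n = 10
        display_shape = False
        print("For display purposes, n shouldn't be larger than 10, setting to 10 and removing shape display.")

    # 3. Set the random seed for reproducibility
    if seed:
        random.seed(seed)

    # 4. Get random sample indexes from the target dataset
    random_samples_idx = random.sample(range(len(dataset)), k=n)

    # 5. Setup plot (outside the loop so every sample gets added to the same figure)
    plt.figure(figsize=(16, 8))

    # 6. Loop through the random indexes and plot them with matplotlib
    for i, targ_sample in enumerate(random_samples_idx):
        targ_image, targ_label = dataset[targ_sample][0], dataset[targ_sample][1]

        # 7. Adjust tensor dimensions for plotting:
        # [color_channels, height, width] -> [height, width, color_channels]
        targ_image_adjust = targ_image.permute(1, 2, 0)

        plt.subplot(1, n, i + 1)
        plt.imshow(targ_image_adjust)
        plt.axis("off")
        if classes:
            title = f"class: {classes[targ_label]}"
            if display_shape:
                title = title + f"\nshape: {targ_image_adjust.shape}"
            plt.title(title)
```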
so let's pass one of our datasets in and see what it looks like. first, display random images from the ImageFolder-created dataset, the inbuilt pytorch one: display_random_images, the function we just created, passing in train_data, then n — let's look at five — classes equal to class_names, our list of class names, and the seed set to None so it's random. doesn't that look good? this is from our original train_data made with ImageFolder, option number one from earlier, and because we passed in the class names this one is labelled sushi, resized to 64, 64, 3, same with all the others but from different classes. let's set the seed to 42 and see what happens: we get a sushi, a pizza, then pizza, sushi, pizza, and if we run it again without the seed we get random images again, wonderful. now let's write the same code but this time using our train_data_custom dataset, the one we created: display_random_images, passing in train_data_custom — oh, this is exciting — let's set n equal to 10 and see how far we can push the plot, or maybe set it to 20 and see if our code for adjusting the plot kicks in, classes equal to class_names and seed equal to 42 this time. there we go: for display purposes, n shouldn't be larger than 10, setting to 10 and removing shape display. so we have a steak image, a pizza image, pizza, steak, pizza, pizza, pizza, pizza, steak, pizza. if we turn off the random seed we should get another 10 random images — beautiful, look at that, steak, steak, sushi, pizza, steak, sushi, pizza, pizza, pizza, pizza. so it looks like our custom dataset is working from a qualitative standpoint, looking at the different images, and quantitatively too — if we change n to five and check the shape, yes, the same shape as above, wonderful. so we've got train_data_custom and we've got train_data made with ImageFolder, but the premise remains the same: we've built up a lot of different ideas, we're looking at things from different points of view, and we are getting our data from the folder structure into tensor format. there's still one more step, though, and that's to go from dataset to dataloader. so in the next video let's see how we can turn our custom loaded images, train_data_custom and test_data_custom, into dataloaders. you might want to go ahead and give that a try yourself — we've done it before, up where we turned loaded images into dataloaders, and we're going to replicate the same thing as we did for option number two, except this time with our custom dataset. i'll see you in the next video.
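as a quick recap, the two calls look roughly like this, with class_names being the list of class names from earlier in the course:

```python
# Display random images from the ImageFolder-created Dataset
display_random_images(train_data,
                      n=5,
                      classes=class_names,
                      seed=None)

# Display random images from the ImageFolderCustom Dataset (n gets capped to 10 by the function)
display_random_images(train_data_custom,
                      n=20,
                      classes=class_names,
                      seed=42)
```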
aren't they some good looking images, and even better that they're from our own custom dataset. now we've got one more step: we're going to turn our dataset into a dataloader, in other words we're going to batchify all of our images so they can be used with a model. i gave you the challenge of trying this yourself in the last video, so i hope you gave it a go, but let's see what it looks like. i'm going to go 5.4, turn custom loaded images into dataloaders — this just goes to show that we can write our own custom dataset class and still use it with pytorch's DataLoader. so from torch.utils.data import DataLoader (we don't need to import it again, but i'm doing it for completeness), and then train_dataloader_custom is an instance of DataLoader, passing in dataset=train_data_custom, and i'll set a universal BATCH_SIZE parameter in capitals equal to 32 and pass that as the batch_size. we can set num_workers here as well — what's the default? if we check the torch.utils.data.DataLoader documentation, the default is zero, okay, beautiful. recall that num_workers sets how many cpu cores load your data with a dataloader; generally higher is better, but you can experiment with this value and see what suits your model and your hardware best, just keep in mind that num_workers alters how much compute your hardware uses to load your data. so by default it's set to zero, and we're going to shuffle the training data, wonderful. let's do the same for the test data: test_dataloader_custom is a new instance of DataLoader with dataset=test_data_custom — again, these datasets are the ones we created with our own custom dataset class — batch_size equal to BATCH_SIZE, num_workers equal to zero (in a previous video we set it to the cpu count; you could also hardcode it to one or four, it all depends on what hardware you're using, and i like to use os.cpu_count()), and we're not going to shuffle the test data, so shuffle=False. let's have a look at what we get: two instances of torch.utils.data.DataLoader. i'm just going to reset num_workers back to zero instead of os.cpu_count() so it's in line with the one above — of course we could also define NUM_WORKERS as zero or os.cpu_count() up top and pass that in for both. now let's get a single sample from the train dataloader just to make sure the image shape and batch size are correct: img_custom, label_custom = next(iter(train_dataloader_custom)), then print img_custom.shape and label_custom.shape. beautiful, there we go: a shape of 32, our batch size, then three color channels, 64, 64, which is in line with the transform we set all the way up above. you may want to change that image size depending on the model you're using and how much information you want each image to carry — recall that generally a larger image size encodes more information — and all of this is coming from our original ImageFolderCustom dataset class. so look at us go. i mean, this is a fair bit of code, but you can think of it as write it once, and if your data keeps coming in this format you can use it over and over again — you might put this ImageFolderCustom class into a helper file such as dataset.py and call it in future code instead of rewriting it every time, which is exactly what pytorch has done with torchvision.datasets.ImageFolder. so we've got some shapes here, and if we wanted to change the batch size, we just change it, like that, 64.
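a sketch of that dataloader setup, with BATCH_SIZE kept at 32 as in the walkthrough:

```python
import os
from torch.utils.data import DataLoader

BATCH_SIZE = 32
NUM_WORKERS = 0  # could also be os.cpu_count() to dedicate more CPU cores to data loading

train_dataloader_custom = DataLoader(dataset=train_data_custom,
                                     batch_size=BATCH_SIZE,
                                     num_workers=NUM_WORKERS,
                                     shuffle=True)

test_dataloader_custom = DataLoader(dataset=test_data_custom,
                                    batch_size=BATCH_SIZE,
                                    num_workers=NUM_WORKERS,
                                    shuffle=False)

# Get a single batch to check the shapes
img_custom, label_custom = next(iter(train_dataloader_custom))
print(img_custom.shape, label_custom.shape)  # -> torch.Size([32, 3, 64, 64]) torch.Size([32])
```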
remember, a good batch size is often a multiple of eight, because that tends to play more nicely with how gpus compute things, and if we set batch_size equal to one we get a batch size of one. we've been through a fair bit, but we've covered something very important: loading your own data with a custom dataset. generally you'll be able to load your own data with an existing dataset function from one of the torch domain libraries, such as torchaudio, torchtext, torchvision, torchrec, and later on, when it's out of beta, torchdata. but if you need to create your own, you can subclass torch.utils.data.Dataset and add your own functionality to it. so let's keep pushing forward. previously we touched a little bit on transforming data, and you may have heard me say that torchvision transforms can be used for data augmentation — if you haven't, that's what the documentation says — and data augmentation is manipulating our images in some way, shape or form so that we can artificially increase the diversity of our training dataset. let's have a look at that in the next video, i'll see you there. over the last few videos we've created functions and classes to load in our own custom dataset, and we learned that one of the biggest steps in loading a custom dataset is transforming your data, particularly turning your target data into tensors. we also had a brief look at the torchvision transforms module and saw that there are a fair few ways we can transform our image data, and one of those ways is augmentation. if we go into the illustration of transforms, let's look at the different options: we've got resize, which changes the size of the original image, center crop, five crop, grayscale, random transforms, gaussian blur, random rotation, random affine, random crop, and we could keep going — in fact i'd encourage you to check out all of the different options — oh, and there's auto augment, wonderful, and random augment. this is what i was hinting at with data augmentation: do you notice how the original image gets augmented in different ways? it gets artificially changed — rotated a little here, darkened a little there (or maybe brightened, depending on how you look at it), shifted up here, the colors changed there. this process is known as data augmentation, so let's create another section, number six: other forms of transforms, this is data augmentation. how could you find out what data augmentation is? you could search what is data augmentation and there will be plenty of resources — wikipedia says data augmentation in data analysis are techniques used to increase the amount of data by adding slightly modified copies of already existing data or newly created synthetic data from existing data. so i'm going to write down: data augmentation is the process of artificially adding diversity to your training data. in the case of image data, this may mean applying various image transformations to the training images, and we saw a whole bunch of those in the torchvision transforms package. now let's look at one type of data augmentation in particular, trivial augment, and just to illustrate it i've got a slide ready to go.
what is data augmentation? it's looking at the same image but from different perspectives, and we do this, as i said, to artificially increase the diversity of a dataset. imagine our original image is on the left: if we wanted to rotate it, we could apply a rotation transform; if we wanted to shift it on the vertical and horizontal axis, we could apply a shift transform; and if we wanted to zoom in, we could apply a zoom transform. there are many different kinds of data augmentation, such as cropping, replacing, shearing — this slide only demonstrates a few. but i'd like to highlight one type in particular, the one recently used to train pytorch's torchvision image models to state-of-the-art levels. and just in case you're not sure why we'd do this: we want to increase the diversity of our training data so that our images become harder for our model to learn, or rather so it gets a chance to view the same image from different perspectives, so that when you use your image classification model in practice it has seen the same sort of images from many different angles and has hopefully learned patterns that generalize to those angles. this practice hopefully results in a model that's more generalizable to unseen data. so let's go to the recent blog post by the pytorch team, how to train state-of-the-art models using torchvision's latest primitives. state of the art means best in the business, otherwise known as sota, an acronym you'll see quite often; torchvision is the package we've been using to work with vision data, and primitives are, in other words, functions that help us train really well-performing models. if we jump into the blog post and scroll down, there are improvements applied to an original resnet50 model — resnet50 is a common computer vision architecture — and they get a boost over the previous results. as your extracurricular, i'd encourage you to look at what each of the improvements is; you won't get them all on the first go, but that's alright, blog posts like this come out all the time and the recipes are continually changing, so even though i'm showing you this now, it may change in the future. scrolling down to find the baseline: there it is, about 76 percent accuracy, and with all these little additions it gets right up to nearly 81 percent, nearly a five percent boost — that's pretty good. the recipe includes a whole bunch of items: learning rate optimization, training for longer, random erasing of image data, label smoothing (which you can add as a parameter to loss functions such as cross entropy loss), mixup and cutmix, weight decay tuning, fixres mitigations, exponential moving average (ema), and inference resize tuning.
but we're going to focus on one and break it down: let's have a look at trivial augment. can we find it in the transforms illustrations? yes we can, it's right here, trivial augment, and as you'll see, if you pass an image into trivial augment it's going to change it in a few different ways. so let's see this in action on some of our own data. we'll import from torchvision import transforms and create a train_transform equal to transforms.Compose, very similar to the transforms we've composed before. first, let's say we want to resize any image going through this transform to 224, 224, which is a common size in image classification, then it goes through transforms.TrivialAugmentWide. there's a parameter here, num_magnitude_bins, which is basically a number from 0 to 31, 31 being the maximum intensity of augmentation that can be applied: if we only set it to 5, the augmentation intensity would be sampled from 0 to 5, so the maximum wouldn't be too intense, whereas 31 allows the full intensity. what i mean by intensity is, take rotation: on a scale of 0 to 31, a 10 might be a slight rotation whereas 31 would rotate it a lot, and the same goes for the other augmentations, so the lower this number, the lower the upper bound of the applied transform will be. then we finish with transforms.ToTensor — wonderful, we've just implemented trivial augment, how beautiful is that? that's TrivialAugmentWide from pytorch's torchvision transforms library, and trivial augment was used to train the latest state-of-the-art vision models in the torchvision models repository. if you wanted to look trivial augment up, you could search for it — here's the paper if you'd like to read how it's implemented. it's actually very, let's just say, trivial (i didn't want to say simple, because i don't want to downplay it); trivial augment leverages the power of randomness quite beautifully. but i'd rather try it out on our data and visualize it first, so let's also make a test_transform with transforms.Compose. and you might have the question, which transforms should i use with my data? well, that's the million dollar question, the same as asking which model should i use for my data. there are a fair few answers, and my best one is: try out a few, see what has worked for other people — like we've done here by finding that trivial augment worked well for the pytorch team — and try that on your own problem. if it works well, excellent, and if it doesn't — excuse me, we've got a spelling mistake — you can always set up an experiment to try something else.
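a sketch of that augmentation pipeline; the test transform presumably mirrors the train one minus the augmentation, which is the convention the course has used so far:

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    # num_magnitude_bins sets the upper bound on how intense a randomly chosen augmentation can be (31 = max)
    transforms.TrivialAugmentWide(num_magnitude_bins=31),
    transforms.ToTensor()
])

test_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor()
])
```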
so let's test out our augmentation pipeline. we'll get all the image paths — we've already done this, but we'll do it again just to reiterate, since we've covered a fair bit. i'm going to get image_path_list, and let me just show you our image_path folder: we want all of the images within it, so we'll use image_path.glob to glob together all the files and folders that match the pattern, and then we'll check the first ten, beautiful. then we can leverage our function from before, plot_transformed_images, to plot some random transformed images and see what they look like when they go through trivial augment: image_paths equals image_path_list, transform equals train_transform (the transform we just created above that contains trivial augment), n equals three for three images, and seed equals None. so look at this: we've got class pizza, trivial augment resized it, and i'm not quite sure what else it did to this one, maybe it got a little darker; this one looks like the colors have been manipulated in some way, shape or form; and this one looks like it's been resized and not much else has happened to it, from my perspective. if we go again and look at another three images: trivial augment works, and as i said before, it harnesses the power of randomness — it selects randomly from all of those augmentation types and applies them at some random level of intensity from 0 to 31, because that's what we've set. you can read more in the paper, but i like to see it happening: this one looks like it's been cut off a little over here, this one's had its colors changed, this one's been darkened. do you see how we're artificially adding diversity to our training dataset? instead of all our images being from one perspective, we're adding a bunch of different angles and telling our model: hey, you've got to try and still learn these patterns even when they've been manipulated. let's try one more — look at that, that's pretty manipulated, isn't it, but it's still an image of steak, and that's what we're trying to get our model to do, still recognize this as steak even though it's been changed a bit. now, will this help or not? it might, it might not, that's the nature of experimentation, so play around — go into the transforms documentation illustrations like we just did, swap TrivialAugmentWide out for another type of augmentation you find there, and see what it does to some of our images. i've just highlighted trivial augment because it's what the pytorch team used in their most recent training recipe blog post to train state-of-the-art vision models. so speaking of training models, let's move forward, we've got to build our first model for this section, i'll see you in the next video. welcome back. in the last video we covered how the pytorch team used TrivialAugmentWide, the latest state of the art in data augmentation at the time of recording, to train their latest state-of-the-art computer vision models within torchvision, and we saw how easily we could apply trivial augment thanks to torchvision.transforms. we'll just run it one more time to highlight what's going on.
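the cell being re-run looks roughly like this; plot_transformed_images is the helper built earlier in the course (not shown in this section), and the exact glob pattern is my assumption based on the data/pizza_steak_sushi/train-or-test/class/image.jpg layout used so far:

```python
# Get all image paths (image_path is the data folder set up earlier, e.g. data/pizza_steak_sushi)
image_path_list = list(image_path.glob("*/*/*.jpg"))

# Plot random images passed through the trivial augment training transform
plot_transformed_images(image_paths=image_path_list,
                        transform=train_transform,
                        n=3,
                        seed=None)
```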
so it doesn't look like much happened to that first image when we augmented it, but this one has been shifted over, leaving some black space, and this one has been rotated a little, with some black space there too. now it's time to build our first computer vision model on our own custom dataset, so let's get started. we're going to go model 0, and we're going to reuse the tiny vgg architecture that we covered in the computer vision section. for our first experiment we're going to build a baseline, which is what model 0 usually is, and we're going to build it without data augmentation — so rather than use trivial augment, which the pytorch team used to train their state-of-the-art computer vision models, we'll start by training a model without data augmentation, and later on we can try one with data augmentation to see whether it helps or not. let me put a link here to the cnn explainer, the model architecture we covered in depth in the last section, so we won't spend too much time on it. all you have to know is that we're going to have an input of 64, 64, 3 going into multiple layers, such as convolutional layers, relu layers and max pool layers, and then an output layer that suits the number of classes we have — the cnn explainer has 10 different classes, but in our case we have three, one each for pizza, steak and sushi. so let's replicate the tiny vgg architecture from the cnn explainer website; this is going to be good practice, and we're going to spend more time writing code than referencing their architecture. of course, before we can train a model, what do we have to do? let's go 7.1, create transforms and load data for model 0. we could reuse some of the variables we've already loaded, but we're going to recreate them for practice. the whole premise of loading data for model 0 is to take our data from the data folder — pizza, steak, sushi, with their respective train and test folders — load those images and turn them into tensors. we've done this a few times now, and one way to do it is to create a simple transform: transforms.Compose, passing in transforms.Resize to resize our images to the same size as the tiny vgg architecture on the cnn explainer website, 64 by 64, and then transforms.ToTensor, so our images get resized to 64, 64 and converted into tensors, with the values inside those tensors scaled to be between 0 and 1.
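a sketch of that simple baseline transform (no data augmentation):

```python
from torchvision import transforms

# Simple transform for the model 0 baseline: resize to 64x64 and convert to tensor (values scaled to [0, 1])
simple_transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor()
])
```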
so there's our transform. now we're going to load some data — if you want to pause the video here and try it yourself, i'd encourage you to try out option one, loading the image data using the ImageFolder class and then turning that dataset into a dataloader, batchifying it so we can use it with a pytorch model. give that a shot, otherwise let's do it together. step one, load and transform data, we've done this before but let's rehash it: from torchvision import datasets, then train_data_simple — i call it simple because at first we're using a simple transform with no data augmentation, and later on, for another modeling experiment, we'll create another transform with data augmentation — equals datasets.ImageFolder with root equal to the training directory and transform equal to the simple transform above. then test_data_simple equals datasets.ImageFolder with root as the test directory and the same simple transform, so we're performing the same transformation on our training data and on our testing data. what's the next step? turn the datasets into dataloaders. we'll import os, and from torch.utils.data import DataLoader, then set up a batch size and a number of workers: BATCH_SIZE, we'll use 32 for our first model, and NUM_WORKERS — excuse me, i've got a typo up here, classic — which is the number of cpu cores we dedicate towards loading our data. now let's create the dataloaders: train_dataloader_simple equals DataLoader, the dataset that goes in is train_data_simple, the batch_size is the BATCH_SIZE hyperparameter we just created (recall a hyperparameter is something you can set yourself), we'd like to shuffle the training data, and num_workers equals NUM_WORKERS. in our case, how many cores does google colab have? let's run this and find out — i think there are two cpus, wonderful. then the same thing for the test dataloader: test_dataloader_simple equals DataLoader, passing in test_data_simple as the dataset, batch_size equals BATCH_SIZE, we're not going to shuffle the test dataset, and num_workers stays the same as above, beautiful. so i hope you gave that a shot, but do you see how quickly we can get our data loaded when it's in the right format? i know we spent a lot of time going through all of these steps over multiple videos and wrote lots of code, but this is how quickly we can get set up: we create a simple transform, we load in and transform our data at the same time, and then we turn the datasets into dataloaders, just like that. now we're ready to use these dataloaders with a model.
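putting that loading cell together as one sketch, again assuming train_dir and test_dir from earlier in the course:

```python
import os
from torch.utils.data import DataLoader
from torchvision import datasets

# 1. Load and transform data
train_data_simple = datasets.ImageFolder(root=train_dir, transform=simple_transform)
test_data_simple = datasets.ImageFolder(root=test_dir, transform=simple_transform)

# 2. Turn the Datasets into DataLoaders
BATCH_SIZE = 32
NUM_WORKERS = os.cpu_count()  # number of CPU cores dedicated to loading data

train_dataloader_simple = DataLoader(dataset=train_data_simple,
                                     batch_size=BATCH_SIZE,
                                     shuffle=True,
                                     num_workers=NUM_WORKERS)

test_dataloader_simple = DataLoader(dataset=test_data_simple,
                                    batch_size=BATCH_SIZE,
                                    shuffle=False,
                                    num_workers=NUM_WORKERS)
```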
so speaking of models, how about we build the tiny vgg architecture in the next video? in fact, we've already done this in notebook number three, so if you want to refer back to the model we built there, which was model number two, and give it a go yourself, i'd encourage you to do so — otherwise we'll build the tiny vgg architecture together in the next video. welcome back. in the last video we got set up and ready to model our first custom dataset, and i issued the challenge to try replicating the tiny vgg architecture from the cnn explainer website, which we covered in notebook number three — now let's see how fast we can do it together. i'm going to write down section 7.2, create TinyVGG model class. i know we've already coded this up before, but it's good practice to see what it's like to build pytorch models from scratch. the model comes from the cnn explainer, and there's one big change from the model we created in section number three: that model used black and white images, but our images now are color images, so there are going to be three color channels rather than one, and there might be a little trick we have to do later to find out the shape going into the classifier layer. so let's get started: class TinyVGG, inheriting from nn.Module, with a docstring saying this is the model architecture copying tiny vgg from the cnn explainer — and remember, it's quite a common practice in machine learning to find a model that works for a problem similar to yours, copy it, and try it on your own problem. i only want two underscores there for the init. we initialize our class with an input_shape, which is an int, hidden_units, how many hidden units we want, also an int, and an output_shape, an int as well, returning None. then we call super().__init__(), beautiful, and now let's create the first convolutional block, conv_block_1, which, if you recall, is this section of layers here. we'll use nn.Sequential, and inside we need conv, relu, conv, relu, max pool. so nn.Conv2d with in_channels equal to the input_shape parameter of our model, out_channels equal to the number of hidden units, kernel_size equal to 3 (which is how big the convolving window over our image data will be), a stride of 1 and padding equal to 1 — these are similar parameters to what the cnn explainer website uses, but i want to stress that even if someone else uses certain values, you don't have to copy them exactly; these are all hyperparameters you can set yourself. then nn.ReLU(), then another nn.Conv2d with in_channels equal to hidden_units, out_channels equal to hidden_units, kernel_size 3, stride 1 and padding 1 — and i believe i've forgotten a comma up there — then another relu layer, and we finish the block off with nn.MaxPool2d with kernel_size equal to 2 and stride equal to 2.
wonderful. oh, by the way, for MaxPool2d the default stride value is the same as the kernel size. so what can we do now? well, we can replicate this block as conv_block_2, so let's copy it down — we've had enough practice writing this sort of code — but we need to change the in_channels of the first layer, because this block receives the output of the block above, so in_channels becomes hidden_units, and i believe that's all we need to change, beautiful. now let's create the classifier layer, which, if you recall, is the output layer: at some point we need a linear layer whose number of outputs equals the number of classes we're working with — in the cnn explainer that's 10, but for our custom dataset we have three classes, pizza, steak, sushi. so the classifier is an nn.Sequential, and we pass in nn.Flatten to turn the outputs of our convolutional blocks into a feature vector, then nn.Linear, and for the in_features — do you remember my trick for calculating in_features? i'm going to put hidden_units here for the time being, and out_features is going to be output_shape. i put hidden_units as a placeholder because we don't yet know what the output shape of all these convolutional operations is going to be. of course we could calculate it by hand by looking up the formula for the input and output shapes of convolutional layers, but i prefer to do it programmatically and let the errors tell me where i'm wrong, and we can do that with a forward pass. speaking of which, let's create a forward method, because every time we subclass nn.Module we have to override forward — we've done this a few times, so as you can see i'm picking up the pace a little, because you've got this. we pass x through self.conv_block_1, print out x.shape, then reassign x to self.conv_block_2(x), passing it through our second block of convolutional layers, and print x.shape again to check it. now this is where our model will probably error, because the in_features of the linear layer won't line up: everything coming out of conv_block_1 and conv_block_2 goes through the flatten layer, because we want a feature vector going into our nn.Linear output layer with an out_features size of output_shape, and then we return x. i also want to let you in on one little secret we haven't covered before: we could rewrite this entire forward method, this whole stack of code, as return self.classifier(self.conv_block_2(self.conv_block_1(x))), working from the outside in. that's essentially the exact same thing as what we've done here, except it benefits from operator fusion. the topic is beyond the scope of this course, but essentially all you need to know is that operator fusion, behind the scenes, speeds up how your gpu performs computations, so all of these operations can happen in one hit rather than reassigning x every time we make a computation through these layers.
if you'd like to read more about this, i'd encourage you to look up the blog post making deep learning go brrrr from first principles by horace he (brrrr meaning fast). i love this post because it's half satire, half legitimate gpu computer science. if you go in here, here's what we want to avoid, all of this transportation between memory and compute, and if we look further in we get to operator fusion, the most important optimization in deep learning compilers. i'll link this blog post, and it's also in the extracurricular section of the course, so don't worry, it'll be there. now we've got a model. oh, where did we forget a comma? right here, of course we did, and we forgot another one up here, did you notice these? okay, so now we can create an instance of TinyVGG to see if our model holds up. let's create model_0 = TinyVGG, and i'm going to pass in the input shape. what is the input shape? it's the number of color channels in our image data, which is three, because we have color images. then we put in hidden_units=10, the same number of hidden units as the tiny vgg architecture (1 through 10); again, we could put in 10, we could put in 100, we could put in 64, but let's leave it at 10 for now. and the output shape is going to be the length of our class names, because we want one output unit per class. then we send it to the target device, which is of course cuda, and we can check out model_0. beautiful. you saw it took a few seconds to move the model to gpu memory, so that's just something to keep in mind when you build large neural networks: the model has to be moved from the cpu, which is the default, to the gpu, and once it's there, tricks like operator fusion can help speed up its computation. so we've got our architecture, but of course we know the classifier layer is potentially wrong, and how would we find that out? well, we could find the right shape, or find that it's wrong, by passing some dummy data through our model, which is one of my favorite ways to troubleshoot. so in the next video let's pass some dummy data through our model and see if we've implemented the forward pass correctly, and also check the input and output shapes of each of our layers. i'll see you there. in the last video we replicated the tiny vgg architecture from the cnn explainer website, very similar to the model we built in section 03, but this time using color images instead of grayscale, and we did it quite a bit faster than before, because we've already covered it and you've had some experience building pytorch models from scratch, so we're going to pick up the pace when we build our models. but let's now go and try a dummy forward pass to check that our forward method is working correctly and that our input and output shapes are correct. so let's create a new heading: try a forward pass on a single image.
this is one of my favorite ways to test a model. let's first get a single image, or rather an image batch, because our images are already batched: image_batch, label_batch = next(iter(train_dataloader_simple)), that's the data loader we're working with for now, and then we'll check image_batch.shape and label_batch.shape. wonderful. now let's try a forward pass (oh, i spelt single wrong up here). we could try this on a single image, but trying it on the whole batch will give similar results, so let's call model_0 on the image batch and see what happens. oh no, of course: input type torch.FloatTensor and weight type torch.cuda.FloatTensor should be the same. we've got tensors on different devices: the image batch is on the cpu whereas our model is on the target device. we've seen this error a number of times, so let's see if sending the batch to the device fixes it. and we get another error, which we kind of expected: runtime error, mat1 and mat2 shapes cannot be multiplied, 32 by 2560 and 10 by something. 32 looks like the batch size, and what is 10? recall that 10 is the number of hidden units, so it's trying to multiply a matrix of this size by this size, and we need the two middle numbers to satisfy the rules of matrix multiplication, because that's what happens in our linear layer. my hint, my trick, is to look at the previous layer: if 32 is our batch size, where does 2560 come from? well, could it be that a tensor of this size goes through the flatten layer? recall that we printed out the shape of the output of conv_block_1, and this shape here is the output of conv_block_2, so that must be the input to our classifier layer. and if we go 10 times 16 times 16, what do we get? 2560. beautiful. so we can set the in features to our hidden units (10) multiplied by 16 by 16. let's see if that works: rerun the model, rerun the image batch, pass it through, and look at that, our model works, well, the shapes at least line up, we don't know if it works yet because we haven't trained it. this is the output: it's on the cuda device, of course, and we've got 32 samples with three numbers each. these are going to be as good as random because we've only initialized the model with random weights, so we've got a batch worth of random predictions on 32 images. you can see the output shape of 3 corresponds to output_shape = len(class_names), which is exactly the number of classes we're dealing with. but i think our intermediate shape is a little different to what's in the cnn explainer: we have 16 by 16, how did they end up with 13 by 13? you know what, i think we got one of kernel size, stride or padding wrong, so let's jump into the cnn explainer and have a look. if we wanted to truly replicate it, is there any padding? i actually don't think there's any padding there.
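before changing the padding, here's a compact sketch of the dummy-forward-pass check just described (model_0, train_dataloader_simple and the device variable are assumed from earlier cells, and at this point the classifier's in_features is hidden_units * 16 * 16):

```python
# grab one batch of images and labels from the simple (non-augmented) dataloader
img_batch, label_batch = next(iter(train_dataloader_simple))
print(img_batch.shape, label_batch.shape)   # torch.Size([32, 3, 64, 64]) torch.Size([32])

# the batch has to be on the same device as the model before the forward pass
pred_logits = model_0(img_batch.to(device))
print(pred_logits.shape)                    # torch.Size([32, 3]) -> one raw logit per class per sample
```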
so what if we go back and change this padding to zero, and this one to zero as well? i'm not sure if this will work by the way, and if it doesn't that's not too bad, we're just trying to line up the shapes with the cnn explainer to truly replicate it. the output of conv_block_1 should be 30 30 10; what are we working with at the moment? 32 32 10. so let's see if removing the padding from our convolutional layers lines our shapes up with the cnn explainer. i'm going to rerun this, rerun our model, with the padding set to zero on all of our padding hyperparameters, and we get another shape error, of course we do, because we've now got different shapes. do you see how often these errors come up? trust me, i spend a lot of time troubleshooting shape errors. so we now have to line up these shapes: we've got 13 13 10, and does that equal 1690? let's try it: 13 times 13 times 10 is 1690, beautiful. and do our shapes line up with the cnn explainer? we've got 30 30 10, remembering that these are in pytorch format, so color channels first, whereas the cnn explainer has color channels last, so yep, the output of our first conv block lines up, and same with the second block. how good is that, we've officially replicated the cnn explainer model. so we can take this value, 13 13 10, and bring it back up to the classifier: hidden units is 10, so we just multiply it by 13 by 13. you could calculate these shapes by hand, but my trick is to let the error messages give me a hint of where to go, and boom, we get it working again, some shape troubleshooting on the fly. so now that we've done a single forward pass on the model, we can verify that our data at least flows through it. what's next? well, i'd like to show you another little package that i like to use to look at the input and output shapes of my model, and that's called torchinfo. you might want to give it a shot before the next video, but in the next video we're going to see how we can use torchinfo to print out a summary of our model, something like this, and this is how beautifully easy torchinfo is to use. so give it a go, install it into google colab and run it in a cell, and see if you can get a similar output for our model_0, and i'll see you in the next video where we'll try it together. in the last video we checked our model by doing a forward pass on a single batch, and we learned that our forward method looks like it's intact and that we don't get any shape errors as our data moves through the model. but i'd like to introduce you to one of my favorite packages for finding out information about a pytorch model, and that is torchinfo, so let's use torchinfo to get an idea of the shapes going through our model. you know how much i love doing things in a programmatic way, and that's exactly what torchinfo does. before, we used print statements to find out the different shapes going through our model, so i'm going to comment those out in our forward method so that when we run this later on during training we don't get excessive printouts of all the shapes. in the last video i issued a challenge to give torchinfo a go; it's quite straightforward to use, but let's see it together. this is the type of output we're looking for from our tiny vgg model, and of course you could get this type of output from almost any pytorch model, but we have to install it first, and as far as i know google colab doesn't come with torchinfo installed by default.
you might as well try importing it first and see if it works, but right now i get a no module error because my google colab instance doesn't have it installed. no problem, let's install torchinfo here and then import it. we're going to try to import torchinfo, and if that try block fails we'll run pip install torchinfo and then import it. then down here we'll go from torchinfo import summary, and if this all works we can get a summary of our model: we pass in model_0 and we have to give it an input size, which is an example of the size of the data that will flow through our model. in our case let's put in an input size of (1, 3, 64, 64), so a batch of one image; you could potentially put 32 here if you wanted, but let's just use a batch of a single image. of course we could change these values if we wanted, say to 224 by 224, but what you'll notice is that if torchinfo doesn't get the right input size it produces an error, just like when we printed out our shapes manually, because behind the scenes torchinfo does a forward pass on whichever model you pass it, with whichever input size you give it. so let's put in the input size our model was built for. oh, excuse me, we didn't comment out the printouts before, so just make sure the print statements in the forward method of our tiny vgg class are commented out; i'll rerun that, rerun everything to make sure it all still works, and then run torchinfo. there we go, no printouts from our model, and look how beautifully this prints out. we have our tiny vgg class, and we can see it's comprised of three sequential blocks, and inside those sequential blocks we have different combinations of layers: some conv layers, some relu layers, some max pool layers, and the final block is our classification layer with a flatten and a linear layer. we can also see the shapes changing throughout the model as our data goes in and gets manipulated by the various layers, and if we check the last one, yes, it's in line with the cnn explainer, which we've already verified. we also get some other helpful information down here, such as the total number of parameters. you can see that each of these layers has a different number of parameters to learn, and recall that a parameter is a value, such as a weight or a bias term, within each of our layers, which starts off as a random number, and the whole goal of deep learning is to adjust those random numbers to better represent our data. in our case we have just over 8,000 total parameters, which is actually quite small; in the future you'll probably play around with models that have a million parameters or more, and models now are starting to have many billions of parameters. we also get information such as the estimated model size, which can be very helpful depending on where we have to deploy our model. what you'll notice is that as a model gets larger and has more layers, it will have more parameters, more weights and bias terms that can be adjusted to learn patterns in data, and its estimated total size will get bigger as well.
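to consolidate the pattern just described, here's a minimal sketch of installing and running torchinfo in a colab/jupyter cell (model_0 and the 64x64 rgb input size are assumed from the earlier cells):

```python
# install torchinfo if it isn't already available (the ! syntax only works in a notebook cell)
try:
    import torchinfo
except ImportError:
    !pip install torchinfo
    import torchinfo

from torchinfo import summary

# torchinfo performs a forward pass with the given input size behind the scenes,
# so the size has to match what the model was built for
summary(model_0, input_size=(1, 3, 64, 64))  # (batch_size, color_channels, height, width)
```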
that's just something to keep in mind if you have size constraints, in terms of storage, in your future applications. ours is under a megabyte, which is quite small, but you might find that some models in the future get up to 500 megabytes, maybe even over a gigabyte, so keep that in mind going forward. and that's the crux of torchinfo, one of my favorite packages: it gives you an idea of the input and output shapes of each of your layers. you can use torchinfo wherever you need it, it should work with most of your pytorch models, just be sure to pass in the right input size, and you can also use it to verify, like we did before, that the input and output shapes are correct. so check that out, and a big shout out to tyler yep and everyone who's created the torchinfo package. now, in the next video, let's move towards training our tiny vgg model. we're going to have to create some training and test functions, and if you want to jump ahead, we've already done this, so i'd encourage you to go back to section 6.2, functionizing training and test loops; we're going to build functions very similar to those, but for our custom data set. so if you want to try replicating those functions in this notebook, give it a go, otherwise i'll see you in the next video and we'll do it together. how'd you go? did you give it a shot, did you try replicating the train step and the test step functions? i hope you did, otherwise let's do it in this video, but this time for our custom data sets, and what you'll find is that not much, if anything, changes, because we've created our train and test loop functions in such a way that they're generic. so we want to create a train_step function, and by generic i mean these functions can be used with almost any model and data loader: train_step takes in a model and a data loader and trains the model on that data loader, and we also want another function called test_step, which takes in a model and a data loader (and a few other things) and evaluates the model on that data loader. of course, the train step and the test step each take their own respective data loader: train_step takes the training data loader and test_step takes the test data loader. oh, and i might make this a third-level heading so that our outline looks nice; section seven is turning out to be quite a big section. without any further ado, let's create the train_step function; we've seen this one in the computer vision section, so let's see what we can make here. train_step is going to take in a model, which will be a torch.nn.Module, a data loader, which will be a torch.utils.data.DataLoader, a loss function, which will be a torch.nn.Module as well, and an optimizer, which will be a torch.optim.Optimizer. and what's the first thing we do in a training step? we put the model in train mode, so model.train(). then let's set up some evaluation metrics, one being loss and one being accuracy, and we'll accumulate these per batch because we're working with batches, so train_loss, train_acc = 0, 0. now we can loop through our data loader, and we'll loop through each of the batches in it, because we've batchified our data loader.
so for batch, (x, y) in enumerate(dataloader). we want to send the data to the target device, so we could even put a device parameter up in the function signature, device=device by default, and then x, y = x.to(device), y.to(device). and now what do we do? well, remember the unofficial pytorch optimization loop song: number one, we do the forward pass, y_pred = model(x), and number two, we calculate the loss, loss = loss_fn(y_pred, y). we've done this a few times now, which is why we're doing it a little faster, and i hope you've noticed that for the things we've covered before i'm stepping up the pace a bit; it might be a bit of a challenge, but that's all right, you can handle it. we're also accumulating the loss: we start from zero up here, and for each batch we do a forward pass, calculate the loss and add it to the overall train loss. number three, optimizer.zero_grad(), to zero the gradients of the optimizer for each new batch, number four, we perform backpropagation with loss.backward(), and number five, optimizer.step(). look at that, look at us coding a training loop in a minute or so. now let's calculate the accuracy and accumulate it. you'll notice we don't have an accuracy function here, and that's because accuracy is quite a straightforward metric to calculate. we first get the predicted class: because, as we've seen before, the raw output of a model is logits, to get the class we take the argmax of torch.softmax(y_pred, dim=1) across dimension one, and that should give us the labels (we can find out later if this is wrong by checking it). then we create the accuracy by taking the predicted classes, checking for equality with the right labels, which tells us how many of these values are true, taking the sum of that, taking the item of that, which is just a single number, and dividing by the length of y_pred. so we're getting the total number of correct predictions divided by the number of samples, which is the formula for accuracy. now we come down here, outside of the batch loop (we know that because of this helpful line drawn here), and we adjust the metrics to get the average loss and accuracy per batch: train_loss = train_loss / len(dataloader), so divided by the total number of batches, and train_acc = train_acc / len(dataloader) as well. that gives us the average loss and average accuracy per epoch, across all batches. that's a pretty good looking train step function to me.
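to make that accuracy calculation concrete, here's a tiny toy example (the logit and label values are made up); it also shows that taking the argmax of the softmax gives the same labels as taking the argmax of the raw logits, since softmax is monotonic:

```python
import torch

# raw logits for a batch of 4 samples across 3 classes, plus the true labels
y_logits = torch.tensor([[ 1.2, -0.5, 0.3],
                         [-0.1,  2.0, 0.4],
                         [ 0.7,  0.6, 0.5],
                         [-1.0, -0.2, 1.5]])
y_true = torch.tensor([0, 1, 2, 2])

# softmax is only needed if you want prediction probabilities
y_pred_class = torch.argmax(torch.softmax(y_logits, dim=1), dim=1)
assert torch.equal(y_pred_class, y_logits.argmax(dim=1))

# number of correct predictions divided by the number of samples
acc = (y_pred_class == y_true).sum().item() / len(y_pred_class)
print(y_pred_class, acc)   # tensor([0, 1, 0, 2]) 0.75
```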
now, do you want to take on the test step? pause the video, give it a shot, and you'll get great inspiration from this notebook here, otherwise we're going to do it together in three, two, one. let's create the test_step function; we want to be able to call these functions in an epoch loop, so that instead of writing out training and test code for multiple different models, we write it once and just call those functions. so def test_step, which takes a model, a torch.nn.Module, a data loader, a torch.utils.data.DataLoader, and then we're just going to pass in a loss function here, because we don't need an optimizer for the test step: we're not trying to optimize anything, we're just trying to evaluate how our model does on the test data set. and let's put in the device here too, why not, that way we can change the device if we need to. first we put the model in eval mode, because we're going to be evaluating, or testing. then we set up test loss and test accuracy values, test_loss and test_acc, starting at zero, and we'll accumulate them per batch. but before we go through the batches, let's turn on inference mode; behind the scenes this turns off a lot of pytorch functionality that's very helpful during training but that we don't need here, such as tracking gradients. then we loop through the data loader, or the data batches: for batch, (x, y) in enumerate(dataloader). you'll notice that above we didn't actually use that batch variable, and we probably won't use it here either, but i like to have it there in case we want it. then we send the data to the target device, x.to(device) and y.to(device). and what do we do for an evaluation step, a test step? of course, we do the forward pass, and let's call the results test_pred_logits, the raw outputs of our model. then we calculate the loss on those raw outputs, loss = loss_fn(test_pred_logits, y), and accumulate it, test_loss += loss.item(); remember .item() just gets a single number out of whatever tensor you call it on. then we calculate the accuracy, and we can do this exactly how we did it in the training step, with test_pred_labels. i just want to highlight that you don't actually need to take the softmax here, you could take the argmax directly from the logits; the reason we take the softmax is just for completeness, so that if you want the prediction probabilities you can use torch.softmax on the prediction logits, but it's not strictly necessary to get the same predicted labels, and you can test this yourself, try it with and without the softmax and see if you get the same results. so test_acc += our accuracy calculation on the fly: test_pred_labels, check for equality with y, take the sum of that, take the item of that, and divide by the length of test_pred_labels, which gives us accuracy per batch. now we adjust the metrics to get the average loss and accuracy per batch: test_loss = test_loss / len(dataloader) and test_acc = test_acc / len(dataloader), and finally we return the test loss and the test accuracy. look at us go: in previous sections that took us a fairly long time, but now we've done it in about 10 minutes or so, so give yourself a pat on the back for all the progress you've been making.
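putting the last two videos together, here's a minimal sketch of what the two functions end up looking like (the global device variable, and the exact signatures, are assumptions based on the description above):

```python
from typing import Tuple
import torch
from torch import nn

def train_step(model: torch.nn.Module,
               dataloader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               optimizer: torch.optim.Optimizer,
               device: torch.device = device) -> Tuple[float, float]:
    model.train()
    train_loss, train_acc = 0, 0
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)
        y_pred = model(X)                       # 1. forward pass (raw logits)
        loss = loss_fn(y_pred, y)               # 2. calculate the loss
        train_loss += loss.item()
        optimizer.zero_grad()                   # 3. zero the gradients
        loss.backward()                         # 4. backpropagation
        optimizer.step()                        # 5. update the parameters
        y_pred_class = torch.argmax(torch.softmax(y_pred, dim=1), dim=1)
        train_acc += (y_pred_class == y).sum().item() / len(y_pred)
    # average the loss and accuracy across all batches
    return train_loss / len(dataloader), train_acc / len(dataloader)

def test_step(model: torch.nn.Module,
              dataloader: torch.utils.data.DataLoader,
              loss_fn: torch.nn.Module,
              device: torch.device = device) -> Tuple[float, float]:
    model.eval()
    test_loss, test_acc = 0, 0
    with torch.inference_mode():                # no gradient tracking needed for evaluation
        for batch, (X, y) in enumerate(dataloader):
            X, y = X.to(device), y.to(device)
            test_pred_logits = model(X)
            test_loss += loss_fn(test_pred_logits, y).item()
            test_pred_labels = test_pred_logits.argmax(dim=1)
            test_acc += (test_pred_labels == y).sum().item() / len(test_pred_labels)
    return test_loss / len(dataloader), test_acc / len(dataloader)
```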
now, did we create a train function back in the computer vision section? we've done so much i'm not sure what we've done... oh, okay, it looks like we might not have, but we could, so let's create a function to functionize this, because we want to train our model. so in the next video give yourself this challenge: create a function called train that combines these two functions and loops through them both over an epoch range, just like we've done here in the previous notebook. can you functionize this, just this step here? you'll need to take in a number of epochs, a train data loader and a test data loader, a model, a loss function, an optimizer and maybe a device, and with that i think you'll be pretty well on your way to all the steps we need for train. so give that a shot, but in the next video we're going to create a function that combines train_step and test_step to train a model, i'll see you there. how'd you go? in the last video i issued you the challenge to combine our train_step function and our test_step function into a single function, so that we can call one function that calls both of them and, of course, trains and evaluates a model. so let's now do that together, and i hope you gave it a shot, that's what it's all about. we're going to create a train function, and its role, as i said, is to combine train_step and test_step. we're doing all of this on purpose, right, because we don't want to have to rewrite all of our code all the time; we want to functionize as many things as possible so that we can import them later on if we want to train more models, and leverage the code we've already written, as long as it works. so let's see if it does. i'm going to first import tqdm from tqdm.auto, because i'd like to get a progress bar while our model is training, there's nothing quite like watching a neural network train. step number one is to create a train function that takes in various parameters: a model, plus an optimizer, plus data loaders, plus a loss function, a whole bunch of different things. so def train, and i'm going to pass in a model, which is a torch.nn.Module; you'll notice the inputs here are quite similar to train_step and test_step. we also want a train_dataloader for the training data, a torch.utils.data.DataLoader, and a test_dataloader, which is also a torch.utils.data.DataLoader, and then an optimizer; the optimizer will only be used with our training data, but that's okay, we take it as an input. then we want a loss function, which will generally be used for both our training and testing steps, because that's what we're combining here, and since we're working with multi-class classification i'm going to give it a default of nn.CrossEntropyLoss. then epochs, which i'll set to a default of five, so we train for five epochs by default, and finally device, which defaults to the target device. now, what have we got wrong here? that's all right, we'll just keep coding and ignore these little red lines; if they stay around we'll come back to them. step number two, and this is a step you might not have seen before, is to create an empty results dictionary, which is going to help us track our results. do you recall that in a previous notebook we outputted a results dictionary
for how a model went so if we look at model one results yeah we've got a dictionary like this so i'd like to create one of these on the fly but keep track of the result every epoch so what was the loss on epoch number zero what was the accuracy on epoch number three so we'll show you how i'll do that we can use a dictionary and just update that while our model trains so results i want to keep track of the train loss so i'm going to set that equal to an empty list and just append to it i also want to keep track of the train accuracy we'll set that as an empty list as well i also want to keep track of the test loss and i also want to keep track of the test accuracy now you'll notice over time that these what you can track is actually very flexible and what your functions can do is also very flexible so this is not the the gold standard of doing anything by any means it's just one way that works and you'll probably find in the future that you need different functionality and of course you can code that out so let's now loop through our epochs so for epoch in tqdm let's create a range of our epochs above and then we can set the train loss have i missed a comma up here somewhere type annotation not supported for that type of expression okay that's all right we'll just leave that there so we're going to go train loss and train ack recall that our train step function that we created in the previous video train step returns our train loss and train ack so as i said i want to keep track of these throughout our training so i'm going to get them from train step then for each epoch in our range of epochs we're going to pass in our model and perform a training step so the data loader here is of course going to be the train data loader the loss function is just going to be the loss function that we pass into the train function and then the optimizer is going to be the optimizer and then the device is going to be device beautiful look at that we just performed a training step in five lines of code so let's keep pushing forward it's telling us we've got a whole bunch of different things here epochs is not defined maybe we just have to get rid of this we can't have the type annotation here and that'll that'll stop um that'll stop google collab getting angry at us if it does anymore i'm just going to ignore it for now epochs anyway we'll leave it at that we'll find out if there's an error later on test loss you might be able to find it before i do so test step we're going to pass in the model we're going to pass in a data loader now this is going to be the test data loader look at us go creating training and test step functions loss function and then we don't need an optimizer we're just going to pass in the device and then behind the scenes both of these functions are going to train and test our model how cool is that so still within the loop right this is important within the loop we're going to have number four is we're going to print out let's print out what's happening print out what's happening we can go print and we'll do a fancy little print statement here we'll get the epoch and then we will get the train loss which will be equal to the train loss we'll get that to let's go four decimal places how about that and then we'll get the train accuracy which is going to be the train ack we'll get that to four maybe three decimal or four just for just so it looks nice looks aesthetic and then we'll go test loss we'll get that coming out here and we'll pass in the test loss we'll get that to four decimal 
places as well and then finally we'll get the test accuracy so a fairly long print statement here but that's all right we'd like to see how our model is doing while it's training beautiful and so again still within the epoch we want to update our results dictionary so that we can keep track of how our model performed over time so let's pass in results we want to update the train loss and so this is going to be this and then we can append our train loss value so this is just going to expend the list in here with the train loss value every epoch and then we'll do the same thing on the train accuracy append train ack and then we'll do the same thing again with test loss dot append test loss and then we will finally do the same thing with the test accuracy test accuracy now this is a pretty big function but this is why we write the code now so that we can use it multiple times later on so return the filled results at the end of the epoch so outside the epochs loop so now loop we're outside it now let's return results now i've probably got an error somewhere here and you might be able to spot it okay train data loader where do we get that invalid syntax maybe up here we don't have a comma here was that the issue the whole time wonderful you might have seen that i'm completely missed that but we now have a train function to train our model and the train function of course is going to call out our train step function and our test step function so what's left to do well nothing less than train and evaluate model 0. so our model is way back up here how about in the next video we leverage our functions namely just the train function because it's going to call our train step function and our test step function and train our model so i'm going to encourage you to give that a go you're going to have to go back to the workflow maybe or maybe you already know this so what have we done we've got our data ready and we turned it into tensors using a combination of these functions we've built and picked a model while we've built a model which is the tiny vgg architecture have we created a loss function yet i don't think we have or an optimizer i don't think we've done that yet we've definitely built a training loop though we aren't using torch metrics we're just using accuracy but we could use this if we want we haven't improved through experimentation yet but we're going to try this later on and then save and reload the model we've seen this before so i think we're up to picking a loss function and an optimizer so give that a shot in the next video we're going to create a loss function and an optimizer and then leverage the functions we've spent the last two videos creating to train our first model model 0 on our own custom data set this is super exciting i'll see you in the next video who's ready to train and evaluate model zero put your hand up i definitely am so let's do it together we're gonna start off section 7.7 and we're going to put in train and evaluate model 0 our baseline model on our custom data set now if we refer back to the pytorch workflow i issued you the challenge in the last video to try and create a loss function and an optimizer i hope you gave that a go but we've already built a training loop so we're going to leverage our training loop functions namely train train step and test step all we need to do now is instantiate a model choose a loss function and an optimizer and pass those values to our training functions so let's do that all right this is so exciting let's set the random 
seeds: i'm going to set torch.manual_seed(42) and torch.cuda.manual_seed(42). now, i just want to highlight something: i read an article the other day about not using random seeds. the reason we're using random seeds here is for educational purposes, to try to get the numbers on my screen and your screen as close as possible, but in practice you often don't use random seeds all the time, because you want your model's performance to be similar regardless of the random seed you use. so keep that in mind going forward: we're using random seeds to show how we can get similar numbers on our pages, but ideally, no matter what the random seed is, our models would head in the same direction, and that's where we want our models to eventually go. we're going to train for five epochs, and now let's recreate an instance of tiny vgg, which we can do because we've created the TinyVGG class. this is our model_0, and we don't have to recreate it, but we're going to anyway, just so we've got all the code in one place. what is our input shape going to be? the number of color channels of our target images, and because we're dealing with color images that's three; previously we used an input shape of one to deal with grayscale images. i'm going to set hidden units to 10, in line with the cnn explainer website, and the output shape is going to be the number of classes in our training data set, and then of course we send the model to the target device. so what do we do now? well, we set up a loss function and an optimizer. because we're dealing with multi-class classification, our loss function is going to be nn.CrossEntropyLoss, and for the optimizer, how about we mix things up and try the adam optimizer this time? of course, the optimizer is one of the hyperparameters you can set for your model, a hyperparameter being a value you can set yourself. the parameters we want to optimize are our model_0 parameters, and we'll set a learning rate of 0.001. recall that you can tweak this learning rate if you like, but i believe the default learning rate of adam is 0.001, yeah, there we go, adam's default learning rate is 1 times 10 to the negative 3, and as i've said, oftentimes different settings in the pytorch library, such as optimizers, have good default values that work across a wide range of problems, so we're just going to stick with the default; if you want to, you can experiment with different values. now let's start the timer, because we want to time our models. we're going to import the default_timer class from timeit, and import it as timer, just so we don't have to type out default_timer. the start time is timer(), which just puts a line in the sand of what the time is at this particular line of code, and then we're going to train model 0.
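pulling the last few videos together, here's roughly what the train function and the full training run look like end to end (NUM_EPOCHS, train_dataloader_simple, test_dataloader_simple, class_names and the device variable are assumed from earlier cells, and the defaults shown are the ones discussed above):

```python
from timeit import default_timer as timer
from tqdm.auto import tqdm
import torch
from torch import nn

def train(model: torch.nn.Module,
          train_dataloader: torch.utils.data.DataLoader,
          test_dataloader: torch.utils.data.DataLoader,
          optimizer: torch.optim.Optimizer,
          loss_fn: torch.nn.Module = nn.CrossEntropyLoss(),
          epochs: int = 5,
          device: torch.device = device):
    # empty results dictionary, filled in every epoch
    results = {"train_loss": [], "train_acc": [], "test_loss": [], "test_acc": []}
    for epoch in tqdm(range(epochs)):
        train_loss, train_acc = train_step(model, train_dataloader, loss_fn, optimizer, device)
        test_loss, test_acc = test_step(model, test_dataloader, loss_fn, device)
        print(f"Epoch: {epoch} | train_loss: {train_loss:.4f} | train_acc: {train_acc:.4f} | "
              f"test_loss: {test_loss:.4f} | test_acc: {test_acc:.4f}")
        results["train_loss"].append(train_loss)
        results["train_acc"].append(train_acc)
        results["test_loss"].append(test_loss)
        results["test_acc"].append(test_acc)
    return results

torch.manual_seed(42)
torch.cuda.manual_seed(42)

NUM_EPOCHS = 5
model_0 = TinyVGG(input_shape=3, hidden_units=10, output_shape=len(class_names)).to(device)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params=model_0.parameters(), lr=0.001)

start_time = timer()
model_0_results = train(model=model_0,
                        train_dataloader=train_dataloader_simple,
                        test_dataloader=test_dataloader_simple,
                        optimizer=optimizer,
                        loss_fn=loss_fn,
                        epochs=NUM_EPOCHS)
end_time = timer()
print(f"Total training time: {end_time - start_time:.3f} seconds")
```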
now this is using of course our train function so let's write model zero results and then we wrote model one but we're not up to there yet so let's go train model equals model zero and this is just the training function that we wrote in a previous video and the train data is going to be our train data loader and we've got train data loader simple because we're not using data augmentation for model one and then our test data loader is going to be our test data loader simple and then we're going to set our optimizer which is equal to the optimizer we just created the friendly atom optimizer and the loss function is going to be the loss function that we just created which is nn cross entropy loss finally we can send in epochs is going to be num epochs which is what we set at the start of this video to five and of course we could train our model for longer if we wanted to but the whole idea of when you first start training a model is to keep your experiments quick so that's why we're only training for five maybe later on you train for 10 20 tweak the learning rate do a whole bunch of different things but let's go down here let's end the timer see how long our model took to train end the timer and print out how long it took so in a previous section we created a helper function for this we're just going to simplify it in this section and we're just going to print out how long the training time was total training time let's go end time minus start time and then we're going to go point we'll take it to three decimal places hey seconds you're ready to train our first model our first convolutional neural network on our own custom data set on pizza steak and sushi images let's do it you ready three two one no errors oh there we go okay should this be train data loader did you notice that what is our train data take as input oh we're not getting a dock string oh there we go we want train data loader data loader and same with this i believe let's try again beautiful oh look at that lovely progress bar okay how's our our model is training quite fast okay all right what do we get so we get an accuracy on the training data set of about 40 percent and we get an accuracy on the test data set of about 50 percent now hmm what's that telling us it's telling us that about 50 of the time our model is getting the prediction correct but we've only got three classes so even if our model was guessing it would get things right 33 of the time so even if you just guessed pizza every single time because we only have three classes if you guessed pizza every single time you get a baseline accuracy of 33 so our model isn't doing too much better than our baseline accuracy of course we'd like this number to go higher and maybe it would if it trained for longer so i'll let you experiment with that but if you'd like to see some different methods of improving a model recall back in section number o2 we had an improving a model section improving a model here we go so here's some things you might want to try we can improve a model by adding more layers so if we come back to our tiny vgg architecture right up here we're only using two convolutional blocks perhaps you wanted to add in a convolutional block three you can also add more hidden units right now we're using 10 hidden units you might want to double that and see what happens fitting for longer this is what we just spoke about so right now we're only fitting for five epochs so if you maybe wanted to try double that again and then even double that again changing the 
activation functions so maybe relu is not the ideal activation function for our specific use case change the learning rate we've spoken about that before so right now our learning rate is 0.001 for atom which is the default but perhaps there's a better learning rate out there and change the loss function this is probably not in our case not going to help too much because cross entropy loss is a pretty good loss for multi-class classification but these are some things that you could try these first three especially you could try quite quickly you could try doubling the layers you could try adding more hidden units and you could try fitting for longer so i'd give that a shot but in the next video we're going to take our model 0 results which is a dictionary or at least it should be and we're going to plot some loss curves so this is a good way to inspect how our model is training yeah so we've got some values here let's plot these in the next video i'll see you there in the last video we trained our first convolutional neural network on custom data so you should be very proud of that that is no small feat to take our own data set of whatever we want and train a pytorch model on it however we did find that it didn't perform as well as we'd like it to we also highlighted a few different things that we could try to to do to improve it but now let's uh plot our model's results using a loss curve so i'm going to write another heading down here we'll go i believe we're up to 7.8 so plot the loss curves of model 0. so what is a loss curve so i'm going to write down here a loss curve is a way of tracking your model's progress over time so if we just looked up google and we looked up lost curves oh there's a great guide by the way i'm going to link this but i'd rather if in doubt code it out then just look at guides yeah lost curves so yeah loss over time so there's our loss value on the left and there's say steps which is epochs or batches or something like that then we've got a whole bunch of different loss curves over here essentially what we want it to do is go down over time so that's the ideal loss curve let's go back down here and a good guide for different loss curves can be seen here we're not going to go through that just yet let's focus on plotting our own models lost curves and we can inspect those let's get the model keys get the model 0 results keys i'm going to type in model 0 results dot keys because it's a dictionary let's see if we can write some code to plot these values here so yeah over time so we have one value for train loss train act test loss and test act for every epoch and of course these lists would be longer if we train for more epochs but let's just how about we create a function called def plot loss curves which will take in a results dictionary which is of string and a list of floats so this just means that our results parameter here is taking in a dictionary that has a string as a key and it contains a list of floats that's what this means here so let's write a dock string plots training curves of a results dictionary beautiful and so we're in this section of our workflow which is kind of like uh we're kind of doing something similar to tensorboard but what it does i'll let you look into that if you want to otherwise we're going to see it later on but we're really evaluating our model here so let's write some plotting code we're going to use matplotlib so we want to get the lost values of the results dictionary so this is training and test let's set loss equal to 
results["train_loss"], so this is the loss on the training data set, and then we'll create the test loss by indexing the results dictionary for "test_loss". now we'll do the same for the accuracy values of the results dictionary, training and test: accuracy equals results["train_acc"], and test_accuracy equals results["test_acc"]. next, let's figure out how many epochs there were, which we can do by counting the length of one of those lists, so epochs will be a range over the length of our results, because we want to plot our model's results over time, that's the whole idea of a loss curve. now we can set up a plot: plt.figure with a figsize that's nice and big, because we're going to do two plots, one for the loss and one for the accuracy. to plot the loss we use plt.subplot with one row, two columns, index number one, then plt.plot with epochs and the training loss, giving it a label of train loss, then another plot of epochs against the test loss with the label test loss, then a title of loss, an x label of epochs so we know how many steps we've done (the plots in that loss curves guide use steps, i'm going to use epochs, they mean almost the same thing, it just depends on what scale you'd like to see your loss curves at), and a legend so the labels appear. then we plot the accuracy: plt.subplot, one row, two columns, index number two, then plt.plot of epochs against the training accuracy with the label train accuracy, then on the same subplot the test accuracy, so that we have the train and test accuracy side by side, the same as the train and test loss, then a title of accuracy, an x label of epochs as well, and finally the legend. that's a lot of plotting code, but let's see what it looks like: if we've done it all right, we should be able to pass in a dictionary just like this and see some nice plots like this, so let's call plot_loss_curves and pass in model_0_results. alrighty then, okay, so that's not too bad. why do i say that? well, because we're mainly looking for trends here, and we haven't trained our model for very long. quantitatively we know our model hasn't performed the way we'd like it to: we'd like the accuracy on both the train and test data sets to be higher, and of course if the accuracy goes up, the loss should come down. the ideal trend for a loss curve is to go down from the top left to the bottom right, in other words the loss decreases over time, and that trend is about right here, so potentially, if we trained for more epochs, which i'd encourage you to try, our model's loss might get lower. the accuracy is also trending in the right direction.
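here's a minimal sketch of the plotting function just described (it assumes the results dictionary uses the train_loss, train_acc, test_loss and test_acc keys from the train function above):

```python
import matplotlib.pyplot as plt

def plot_loss_curves(results: dict):
    """Plots training and test loss/accuracy curves from a results dictionary."""
    loss, test_loss = results["train_loss"], results["test_loss"]
    accuracy, test_accuracy = results["train_acc"], results["test_acc"]
    epochs = range(len(results["train_loss"]))   # one entry per epoch

    plt.figure(figsize=(15, 7))

    plt.subplot(1, 2, 1)                         # loss curves
    plt.plot(epochs, loss, label="train_loss")
    plt.plot(epochs, test_loss, label="test_loss")
    plt.title("Loss"); plt.xlabel("Epochs"); plt.legend()

    plt.subplot(1, 2, 2)                         # accuracy curves
    plt.plot(epochs, accuracy, label="train_accuracy")
    plt.plot(epochs, test_accuracy, label="test_accuracy")
    plt.title("Accuracy"); plt.xlabel("Epochs"); plt.legend()

plot_loss_curves(model_0_results)
```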
we want our accuracy to go up over time, so if we train for more epochs these curves may continue on their current trajectory, or they may not, you never really know; you can guess these things, but until you try it you don't really know. in the next video we're going to have a look at some different forms of loss curves, but before we do that i'd encourage you to go through google's guide on interpreting loss curves; if you just search up loss curves you'll find google's guide, or you can search interpreting loss curves, because as you'll see there are many different ways loss curves can be interpreted, but the ideal trend is for the loss to go down over time and for metrics like accuracy to go up over time. so in the next video let's cover a few different forms of loss curves, such as the ideal loss curve, what it looks like when your model is underfitting, and what it looks like when your model is overfitting, and if you'd like a primer on those things, read through that guide. don't worry too much if you're not sure what's happening, we're going to cover a bit more about loss curves in the next video, i'll see you there. in the last video we looked at our model's loss curves and accuracy curves, and a loss curve is a way to evaluate a model's performance over time, such as over how long it was trained for, and as you'll see if you google some images of loss curves, they come in all different shapes and sizes and there are many different ways to interpret them. this is google's testing and debugging in machine learning guide, and i'm going to set it as extra curriculum for this section. so we're up to number eight: what should an ideal loss curve look like? i'll just link that in there, and i'll write down here, making some space, that a loss curve is one of the most helpful ways to troubleshoot a model. the trend of a loss curve you want to go down over time, and the trend of an evaluation metric like accuracy you typically want to go up over time. so let's go into the keynote: loss curves, a way to evaluate your model's performance over time. these are three of the main forms of loss curve you'll face, but again, there are many different types, as mentioned in the interpreting loss curves guide: sometimes your curves go all over the place, sometimes your loss explodes, sometimes your metrics are contradictory, say the testing loss is higher than your training loss (we'll have a look at what that means), and sometimes your model gets stuck, in other words the loss doesn't reduce. let's have a look at some loss curves in the cases of underfitting, overfitting and just right, the goldilocks zone. underfitting is when your model's loss on the training and test data sets could be lower, so in our case, if we go back to our loss curves, of course we want this loss to be lower and we want our accuracy to be higher, so from our perspective it looks like our model is underfitting, and we'd probably want to train it for longer, say 10 or 20 epochs, to see if this trend continues; if the loss keeps going down, it may stop underfitting. so underfitting is when your loss could be lower. now, the inverse of underfitting is called overfitting, and two of the biggest problems in machine learning are trying to reduce underfitting, in other words making your loss lower, and also reducing overfitting. these are both active areas of research, because you always want your model to perform
better but you also want it to perform pretty much the same on the training set as it does the test set and so overfitting would be when your training loss is lower than your testing loss and why would this be overfitting so it means overfitting because your model is essentially learning the training data too well and that means the loss goes down on the training data set which is typically a good thing however this learning is not reflected in the testing data set so your model is essentially memorizing patterns in the training data set that don't generalize well to the test data set so this is where we come to the just right curve is that we want ideally our training loss to reduce as much as our test loss and quite often you'll find that the loss is slightly lower on the training set than it is on the test set and that's just because the model is exposed to the training data and it's never seen the test data before so it might be a little bit lower on the training data set than on the test data set so under fitting the model's loss could be lower overfitting the model is learning the training data too well now this would be equivalent to say you were studying for a final exam and you just memorized the course materials the training set and when it came time to the final exam because you'd only memorize the course materials you couldn't adapt those skills to questions you hadn't seen before so the final exam would be the test set so that's overfitting the train loss is lower than the test loss and just right ideally you probably won't see lost curves this exact smooth i mean they might be a little bit jumpy ideally your training loss and test loss go down at a similar rate and of course there's more combinations of these if you'd like to see them check out their google's lost curve guide that you can check that out there that's some extra curriculum now you probably want to know how do you deal with underfitting and overfitting let's look at a few ways we'll start with overfitting so we want to reduce overfitting in other words we want our model to perform just as well on the training data set as it does on the test data set so one of the best ways to reduce overfitting is to get more data so this means that our training data set will be larger our model will be exposed to more examples and we'll thus in theory it doesn't always work these all come with a caveat right they don't always work as with many things in machine learning so get more data give your model more chance to learn patterns generalizable patterns in a data set you can use data augmentation so make your model's training data set harder to learn so we've seen a few examples of data augmentation you can get better data so not only more data perhaps the data that you're using isn't that the quality isn't that good so if you enhance the quality of your data set your model may be able to learn better more generalizable patterns and in turn reduce overfitting use transfer learning so we're going to cover this in a later section of the course but transfer learning is taking one models that works taking its patterns that it's learned and applying it to your own data set so for example i'll just go into the torch vision models library many of these models in here in torch vision the models module have already been trained on a certain data set and such as imagenet and you can take the weights or the patterns that these models have learned and if they work well on an imagenet dataset which is millions of different images you can 
adjust those patterns to your own problem, and oftentimes that will help with overfitting. if you're still overfitting, you can try to simplify your model; usually this means taking away things like extra layers or extra hidden units, so say you had 10 layers, you might reduce it to 5. what's the theory behind this? well, if you simplify your model and take away complexity, you're kind of telling your model: hey, use what you've got, you've only got five layers now, so you're going to have to make sure those five layers work really well, because you no longer have 10. the same goes for hidden units: say you started with 100 hidden units per layer, you might reduce that to 50 and say, hey, you had 100 before, now use those 50 and make your patterns generalizable. you can also use learning rate decay. the learning rate is how much your optimizer updates your model's weights at every step, and learning rate decay means decaying the learning rate over time; you can look this up, search pytorch learning rate scheduling. now, i know i'm giving you a lot of different things here, but you've got this keynote as a reference, so you can come back to these over time. so, learning rate scheduling: if we look in the pytorch documentation, do we have a scheduler? beautiful, so this is going to adjust the learning rate over time. for example, at the start of training you might want a higher learning rate, and then, as the model learns patterns more and more, you might want to reduce that learning rate so that your model doesn't update its patterns too much in later epochs; the closer you get to convergence, the lower you might want to set your learning rate. think of it like reaching for a coin at the back of a couch: at the beginning you might take big steps, but the closer you get to the coin, the smaller the steps you take to pick it out, because if you take a big step when you're really close, the coin might fall down the couch. it's the same with learning rate decay: at the start of training you take bigger steps as your model works its way down the loss curve, but as you get closer and closer to the ideal position on the loss curve, you lower and lower the learning rate until, right near the end, you can pick up the coin, or in other words, your model can converge. and then finally, use early stopping. if we look up early stopping loss curves, there we go, there are heaps of different guides, early stopping with pytorch, beautiful. what this means is that you keep track of your model's testing error, and before it starts to go up, you stop training, or you save the weights, the patterns, from where your model's loss was the lowest. so you could set your model to train for an effectively unlimited number of training steps, and as soon as the testing error starts to increase for, say, 10 steps in a row, you go back to the point where the model was at its best, before the testing error started to increase, and keep that saved model instead of the later one. so that's the concept of early stopping.
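as a concrete illustration of the learning rate decay idea described a little further up, here's a minimal sketch using pytorch's built-in StepLR scheduler; the toy model, the step size and the gamma value here are just example values, not something from the video:

```python
import torch
from torch import nn

# toy model and optimizer purely for demonstration
model = nn.Linear(10, 3)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# every `step_size` epochs, multiply the learning rate by `gamma`
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

for epoch in range(15):
    # ... the real training and test steps for this epoch would go here ...
    optimizer.step()        # placeholder for the actual optimization steps
    scheduler.step()        # decay the learning rate on an epoch-by-epoch basis
    print(epoch, scheduler.get_last_lr())  # 0.001 -> 0.0001 after epoch 4 -> 1e-05 after epoch 9
```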
so that's dealing with overfitting. there are other methods for dealing with underfitting. recall underfitting is when our loss isn't as low as we'd like it to be, our model is not fitting the data very well, so it's underfitting. to reduce underfitting you can add more layers or units to your model, you're trying to increase your model's ability to learn by adding more layers or more hidden units. you can again tweak the learning rate, perhaps your learning rate is too high to begin with and your model doesn't learn very well, so you can adjust the learning rate, just like we discussed with reaching for that coin at the back of the couch. if your model is still underfitting, you can train for longer, that means giving your model more opportunities to look at the data, so more epochs, it gets to look at the training set over and over and over again and try to learn those patterns. however, you might find that if you train for too long, your testing error starts to go up, your model might start overfitting if you train too long. so machine learning is all about a balance between underfitting and overfitting. you want your model to fit quite well, but if you try to reduce underfitting too much you might start to overfit, and vice versa, if you try to reduce overfitting too much your model might underfit, so this is one of the most fun dances in machine learning, the balance between overfitting and underfitting. finally, you might use transfer learning, and transfer learning helps with both overfitting and underfitting. recall transfer learning is taking a model's learned patterns from one problem and adjusting them to your own, we're going to see this later on in the course. and then finally, use less regularization. regularization is holding your model back, it's trying to prevent overfitting, so if you do too much preventing of overfitting, in other words regularizing your model too much, you might end up underfitting. so if we go back and have a look at the ideal curves, if you try to prevent underfitting too much, by increasing your model's capability to learn, you might end up overfitting, and if you try to prevent overfitting too much, you might end up underfitting. we're going for the just right section, and this is going to be a balance between these two throughout your entire machine learning career, in fact it's probably one of the most prevalent areas of research, trying to get models not to underfit but also not to overfit. so keep that in mind, a loss curve is a great way to evaluate your model's performance over time, and a lot of what we do with loss curves is try to work out whether our model is underfitting or overfitting, and we're trying to get to that just right curve. we might not get exactly there, but we want to keep getting as close as we can. so with that being said, let's build another model in the next video, and we're going to see if we can use data augmentation to prevent our model from overfitting, although that experiment doesn't sound like the most ideal one we could do right now, because it looks like our model is underfitting. so with what you've just learned in the previous video about how to prevent underfitting, what would you do to increase this model's capability of learning
patterns in the training data set would you train it for longer would you add more layers would you add more hidden units have a think and we'll start building another model in the next video welcome back in the last video we covered the important concept of a loss curve and how it can give us information about whether our model is underfitting in other words our model's loss could be lower or whether it's overfitting in other words the training loss is lower than the test loss or far lower than the validation loss that's another thing to note here is that i've put training in test sets here you could also do this with a validation data set and that the just right the goldilocks zone is when our training and test loss are quite similar over time now there was a fair bit of information in that last video so just wanted to highlight that you can get this all in section 04 which is the notebook that we're working on and then if you come down over here if we come to section eight what should an ideal loss curve look like we've got underfitting overfitting just right how to deal with overfitting we've got a few options here we've got how to deal with underfitting and then we've got a few options there and then if we wanted to look for more how to deal with overfitting you could find a bunch of resources here and then how to deal with underfitting you could find a bunch of resources here as well so that is a very fine line very fine balance that you're going to experience throughout all of your machine learning career but it's time now to move on we're going to move on to creating another model which is tiny vgg with data augmentation this time so if we go back to the slide data augmentation is one way of dealing with overfitting now it's probably not the most ideal experiment that we could take because our model 0 our baseline model looks like it's underfitting but data augmentation as we've seen before is a way of manipulating images to artificially increase the diversity of your training data set without collecting more data so we could take our photos of pizza sushi and steak and randomly rotate them 30 degrees and increase diversity forces a model to learn or hopefully learn again all of these come with a caveat of not always being the silver bullet to learn more generalizable patterns now i should have spelled generalizable here rather than generalization but similar thing let's go here let's create to start off with we'll just write down now let's try another modeling experiment so this is in line with our pi touch workflow trying a model then trying another one and trying another one so and so over again this time using the same model as before but with some slight data augmentation well maybe we want not slight that's probably not the best word we'll just say with some data augmentation and if we come down here we're going to write section 9.1 we need to first create a transform with data augmentation so we've seen what this looks like before we're going to use the trivial augment data augmentation create training transform which is as we saw in a previous video what pytorch the pie torch team have recently used to train their state-of-the-art computer vision models so train transform trivial this is what i'm going to call my transform and i'm just going to from torch vision import transforms we've done this before we don't have to re-import it but i'm going to do it anyway just to show you that we're re-importing or we're using transforms and we're going to compose a transform here 
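for reference, the transform cell being described over the next step might look roughly like this, assuming a torchvision version recent enough to include TrivialAugmentWide (0.12 or later); the variable names simply mirror the ones used in the narration:

```python
from torchvision import transforms

# training transform: resize, then TrivialAugmentWide, then convert to tensor
train_transform_trivial = transforms.Compose([
    transforms.Resize(size=(64, 64)),
    transforms.TrivialAugmentWide(num_magnitude_bins=31),  # random augmentations with magnitude 0-31
    transforms.ToTensor()
])

# test transform: no data augmentation, just resize and convert to tensor
test_transform_simple = transforms.Compose([
    transforms.Resize(size=(64, 64)),
    transforms.ToTensor()
])
```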
recall that transforms help us manipulate our data so we're going to transform our images into size 64 by 64. then we're going to set up a trivial augment transforms just like we did before trivial augment wide and we're going to set the number of magnitude bins here to be 31 which is the default here which means we'll randomly use some data augmentation on each one of our images and it will be applied at a magnitude of 0 to 31 also randomly selected so if we lowered this to 5 the upper bound of intensity of how much that data augmentation is applied to a certain image will be less than if we set it to say 31. now our final transform here is going to be two tensor because we want our images in tensor format for our model and then i'm going to create a test transform i'm going to call this simple which is just going to be transforms dot compose and all that it's going to have oh i should put a list here all that it's going to have i'll just make some space over here is going to be transforms all we want to do is resize the image size equals 64 64. now we don't apply data augmentation to the test data set because we only just want to evaluate our models on the test data set our models aren't going to be learning any generalizable patterns on the test data set which is why we focus our data augmentations on the training data set and i've just readjusted that i don't want to do that beautiful so we've got a transform ready now let's load some data using those transforms so we'll create train and test data sets and data loaders with data augmentation so we've done this before you might want to try it out on your own so pause the video if you'd like to to test it out create a data set and a data loader using these transforms here and recall that our data set is going to be creating a data set from pizza steak and sushi for the train and test folders and that our data loader is going to be batchifying our data set so let's um turn our image folders into data sets data sets beautiful and i'm going to write here train data augmented just so we know that it's it's been augmented we've got a few of similar variable names throughout this notebook so i just want to be as clear as possible and i'm going to use i'll just re-import torch vision data sets we've seen this before the image folder so rather than use our own custom class we're going to use the existing image folder class that's within torch vision data sets and we have to pass in here a route so i'll just get the dock string there root which is going to be equal to our trainer which recall is the path to our training directory i've got that saved and then i'm going to pass in here the transform is going to be train transform trivial so our training data is going to be augmented thanks to this transform here and transforms trivial augment wide you know where you can find more about trivial augment wide of course in the pi torch documentation or just searching transforms trivial augment wide and did i spell this wrong trivial oh train train transform i spelt that wrong of course i did so test data let's create this as test data simple equals data sets dot image folder and the root dir is going to be here the test directory and the transform is just going to be what the test transform simple beautiful so now let's turn these data sets into data loaders so turn our data sets into data loaders we're going to import os i'm going to set the batch size here to equal to 32 the number of workers that are going to load our data loaders i'm going to set 
this to os.cpu count so there'll be one worker per cpu on our machine we're going to set here the torch manual seed to 42 because we're going to shuffle our training data train data loader i'm going to call this augmented equals data loader now i just want to i don't need to re-import this but i just want to show you again from torch.utils you can never have enough practice right.data let's import data loader so that's where we got the data loader class from now let's go train data augmented we'll pass in that as the data set and i'll just put in here the parameter name for completeness that's our data set and then we want to set the batch size which is equal to batch size i'm going to set shuffle equal to true and i'm going to set num workers equal to num workers beautiful and now let's do that again with the test data loader that this time test data loader i'm going to call this test data loader simple we're not using any data augmentation on the test data set just turning our images our test images into tensors the data set here is going to be test data simple i'm going to pass in the batch size equal to batch size so both our data loaders will have a batch size of 32 i'm gonna keep shuffle on false and num workers i'm going to set to num workers look at us go we've already got a data set and a data loader this time our data loader is going to be augmented for the training data set and it's going to be nice and simple for the test data set so this is really similar this data loader to the previous one we made the only difference in this modeling experiment is that we're going to be adding data augmentation namely trivial augment wide so with that being said we've got a data set we've got a data loader in the next video let's construct and train model one in fact you might want to give that a go so you can use our tiny vgg class to make model one and then you can use our train function to train a new tiny vgg instance with our training data loader augmented and our test data loader simple so give that a go and we'll do it together in the next video i'll see you there now that we've got our data sets and data loaders with data augmentation ready let's now create another model so 9.3 we're going to construct and train model one and this time i'm just going to write what we're going to doing going to be doing sorry this time we'll be using the same model architecture but we're changing the data here except this time we've augmented the training data so we'd like to see how this performs compared to a model with no data augmentation so that was our baseline up here and that's what you'll generally do with your experiments you'll start as simple as possible and introduce complexity when required so create model one and send it to the target device that is to the target device and because of our helpful selves previously we can create a manual seed here torch dot manual seed and we can create model one leveraging the class that we created before so although we built tiny vgg from scratch in this video in this section sorry in subsequent coding sessions because we've built it from scratch once and we know that it works we can just recreate it by calling the class and passing in different variables here so let's get the number of classes that we have in our train data augmented classes and we're going to send it to device and then if we inspect model one let's have a look wonderful now let's keep going we can also leverage our training function that we did you might have tried this before so 
let's now train our model. i'm just going to write here, wonderful, now we've got a model and data loaders, let's create a loss function and an optimizer and call upon our train function that we created earlier to train and evaluate our model. so i'm going to set the random seeds, torch dot manual seed and torch dot cuda dot manual seed, because we're going to be using cuda, both set to 42. i'm going to set the number of epochs, we're going to keep many of the parameters the same, num epochs equals five, we could of course train this model for longer if we really wanted to by increasing the number of epochs. now let's set up the loss function, so loss fn equals nn dot cross entropy loss. don't forget, this just came to mind, the loss function in pytorch is often also called the criterion, as in the criterion you're trying to reduce, but i just like to call it a loss function. and then we're going to have an optimizer, let's use the same optimizer we used before, torch dot optim dot adam, recall sgd and adam are two of the most popular optimizers. so model one dot parameters, they're the parameters we're going to optimize, and we're going to set the learning rate to 0.001, which is the default for the adam optimizer in pytorch. then we're going to start the timer, so from timeit let's import the default timer as timer, and we'll go start time equals timer. and then let's train model one, how can we do this? well, we're going to get a results dictionary as model 1 results, and we'll call upon our train function. inside our train function we'll pass the model parameter as model one, for the train data loader parameter we'll pass in train data loader augmented, so our augmented training data loader, and for the test data loader we can pass in test data loader simple. then we pass our optimizer, which will be the adam optimizer, our loss function is going to be the nn cross entropy loss we created above, then we set the number of epochs equal to num epochs, and if we really wanted to we could set the device equal to device, which will be our target device. and now let's end the timer and print out how long it took, end time equals timer, and we'll print total training time for model one, which is going to be end time minus start time, and oh, it would help if i could spell, we'll format that to three decimal places, and that'll be in seconds. so are you ready? look how quickly we built a training pipeline for model one, and look how easily we created it, so go us for coding all of that stuff up before. let's train our second model, our first model using data augmentation, you ready, three two one, let's go. no errors, beautiful, we're going nice and quick here, so, oh, about just over seven seconds. now, what gpu do i have currently? just keep in mind that i'm using google colab pro, so i get preference in terms of being allocated a faster gpu, your model training time may be longer than what i've got depending on the gpu, it may also be faster, again depending on the gpu, but we get about seven seconds. it looks like our model with data augmentation didn't perform as well as our model without data augmentation, hmm. so how long did our model without data augmentation take to train? oh, just over seven seconds as well, but we've got better results in terms of accuracy on the training and test data sets for model zero. so maybe data augmentation doesn't help in our case, and we kind of hinted at that earlier, because the loss for model zero was already going down.
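pulling the last few cells together, here's a sketch of what the whole model-1 pipeline looks like. it assumes the TinyVGG class, the train() function and the train_dir / test_dir paths built earlier in the section, and their exact signatures below are assumptions based on how they're described being called here:

```python
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from timeit import default_timer as timer

device = "cuda" if torch.cuda.is_available() else "cpu"

# datasets: reuse torchvision's ImageFolder rather than a custom Dataset class
train_data_augmented = datasets.ImageFolder(root=train_dir, transform=train_transform_trivial)
test_data_simple = datasets.ImageFolder(root=test_dir, transform=test_transform_simple)

# dataloaders: batchify the datasets
BATCH_SIZE = 32
NUM_WORKERS = os.cpu_count()  # one worker per CPU core
torch.manual_seed(42)
train_dataloader_augmented = DataLoader(dataset=train_data_augmented, batch_size=BATCH_SIZE,
                                        shuffle=True, num_workers=NUM_WORKERS)
test_dataloader_simple = DataLoader(dataset=test_data_simple, batch_size=BATCH_SIZE,
                                    shuffle=False, num_workers=NUM_WORKERS)

# model_1: re-create the TinyVGG class built earlier in the section
torch.manual_seed(42)
model_1 = TinyVGG(input_shape=3,  # colour channels
                  hidden_units=10,
                  output_shape=len(train_data_augmented.classes)).to(device)

# loss function, optimizer, and the train() function created earlier
torch.manual_seed(42)
torch.cuda.manual_seed(42)
NUM_EPOCHS = 5
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params=model_1.parameters(), lr=0.001)

start_time = timer()
model_1_results = train(model=model_1,
                        train_dataloader=train_dataloader_augmented,
                        test_dataloader=test_dataloader_simple,
                        optimizer=optimizer,
                        loss_fn=loss_fn,
                        epochs=NUM_EPOCHS,
                        device=device)
end_time = timer()
print(f"Total training time for model_1: {end_time - start_time:.3f} seconds")
```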
we weren't really over fitting yet so recall that data augmentation is a way to help with overfitting generally so maybe that wasn't the best step to uh try and improve our model but let's nonetheless keep evaluating our model in the next video we're going to plot the loss curves of model one so in fact you might want to give that a go so we've got a function plot loss curves and we've got some results in a dictionary format so try that out plot the lost curves and see what you see let's do it together in the next video i'll see you there in the last video we did the really exciting thing of training our first model with data augmentation but we also saw that quantitatively it looks like that it didn't give us much improvement so let's keep evaluating our model here i'm going to make a section recall that one of my favorite ways or one of the best ways not just my favorite to evaluate the performance of a model over time is to plot the loss curves so a loss curve helps you evaluate your model's performance over time and it will also give you a great visual representation or a visual way to see if your model is underfitting or overfitting so let's plot the loss curves of model one results and see what happens we're using this function we created before and oh my goodness is that going in the right direction it looks like our test loss is going up here now is that where we want it to go remember the ideal direction for a loss curve is to go down over time because loss is measuring what it's measuring how wrong our model is and the accuracy curve looks like it's all over the place as well i mean it's going up kind of but maybe we don't have enough time to measure these things so an experiment you could do is train both of our models model 0 and model 1 for more epochs and see if these lost curves flatten out so i'll pose you the question is our model underfitting or overfitting right now or both so if we want to have a look at the loss curves our just right is for the loss that is this is not for accuracy this is for loss over time we want it to go down so for me our model is under fitting because our loss could be lower but it also looks like it's overfitting as well so it's not doing a very good job because our test loss is far higher than our training loss so if we go back to section 4 of the learn pi torch.io book what should an ideal loss curve look like i'd like you to start thinking of some ways that we could deal with overfitting of our model so could we get more data could we simplify it could we use transfer learning we're going to see that later on but you might want to jump ahead and have a look and if we're dealing with underfitting what are some other things that we could try with our model could we add some more layers potentially another convolutional block could we increase the number of hidden units per layer so if we've got currently 10 hidden units per layer maybe you want to increase that to 64 or something like that could we train it for longer that's probably one of the easiest things to try with our current training functions we could train for 20 epochs so have a go at this reference this try out some experiments with see if you can get these lost curves more towards the ideal shape and in the next video we're going to keep pushing forward we're going to compare our model results so we've done two experiments let's now see them side by side we've looked at our model results individually and we know that they could be improved but a good way to compare all of your 
experiments is to compare your models' results side by side, and that's what we're going to do in the next video, i'll see you there. now that we've looked at our models' loss curves on their own, how about we compare our model results to each other? so let's have a look at comparing our model results, and i'm going to write a little note here that after evaluating our modeling experiments on their own, it's important to compare them to each other, and there's a few different ways to do this. number one is hard coding, so like we've done, we've written helper functions and manually plotted things, and i'm going to write in here that this is what we're doing. then of course there are tools to do this, such as pytorch plus tensorboard, and so i'll link to this, pytorch tensorboard, we're going to see this in a later section of the course. tensorboard is a great resource for tracking your experiments, and if you'd like to jump forward and have a look at what that is in the pytorch documentation, i'd encourage you to do so. then another one of my favorite tools is weights and biases. these all involve some code as well, but they help with automatically tracking different experiments, so weights and biases is one of my favorites, it's a platform for experiments, that's what you'd be looking at, so if you run multiple experiments you can set up weights and biases pretty easily to track your different models' hyperparameters. so pytorch, there we go, import weights and biases, start a new run, you can save the learning rate value and whatnot, go through your data and just log everything there. now this is not a course about different tools, we're going to focus on pure pytorch, but i thought i'd leave these here anyway because you're going to come across them eventually, and mlflow is another one of my favorites as well, it has experiment tracking, projects, a model registry, all that sort of stuff. if you'd like to look into more ways to track your experiments, there are some options there, but for now we're going to stick with hard coding, we're going to do it as simply as possible to begin with, and if we want to add other tools later on we can sure do that. so let's create a data frame for each of our model results, we can do this because our model results are in the form of dictionaries, so model 0 results, but you can see by hard coding this it's quite cumbersome, can you imagine if we had say 10 models, or even just 5, we'd have to write a fair bit of code for all of our dictionaries and whatnot, whereas these tools help you track everything automatically. so we've got a data frame here, model zero results over time, these are our epochs, and we notice that the training loss starts to go down, the testing loss also starts to go down, and the accuracy on the training and test data sets starts to go up. those are the trends we're looking for, so an experiment you could try would be to train model 0 for longer to see if it improves, but we're currently just interested in comparing results. so let's set up a plot, i want to plot model 0 results and model 1 results on the same plot, so we'll need a plot for training loss, a plot for training accuracy, test loss and test accuracy, and then we want two separate lines on each of them, one for model zero and one for model one, and this particular pattern would be similar regardless.
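as a quick sketch, the hard-coded data frame step is only a couple of lines, assuming each results dictionary maps metric names (e.g. "train_loss", "train_acc", "test_loss", "test_acc") to one value per epoch, which is how they're used in the plotting code that follows:

```python
import pandas as pd

# each results dictionary is assumed to hold per-epoch lists,
# e.g. {"train_loss": [...], "train_acc": [...], "test_loss": [...], "test_acc": [...]}
model_0_df = pd.DataFrame(model_0_results)
model_1_df = pd.DataFrame(model_1_results)
model_0_df  # one row per epoch
```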
if we had 10 different experiments or if we had 10 different metrics we wanted to compare you generally want to plot them all against each other to make them visual and that's what tools such as weights and biases what tensorboard and what ml flow can help you to do so i'm just going to get out of that clean up our browser so let's set up a plot here i'm going to use matplotlib going to put in a figure i'm going to make it quite large because we want four subplots one for each of the metrics we want to compare across our different models now let's get number of epochs so epochs is going to be length or we'll turn it into a range actually range of len model zero df so that's going to give us five beautiful range between zero and five now let's create a plot for the train loss we want to compare the train loss across model 0 and the train loss across model 1. so we can go plt.subplot let's create a plot with two rows and two columns and this is going to be index number one will be the training loss we'll go plt dot plot i'm going to put in here epochs and then model 0 df inside here i'm going to put train loss for our first metric and then i'm going to label it with model 0. so we're comparing the train loss on each of our modeling experiments recall that model 0 was our baseline model and that was tiny vgg without data augmentation and then we tried out model one which was the same model but all we did was we added a data augmentation transform to our training data so plt we'll go x label they both used the same test data set and plt dot legend let's see what this looks like wonderful so there's our training loss across two different models so we notice that model 0 is trending in the right way model 1 kind of exploded on epoc number that would be 0 1 2 or 1 depending how you're counting let's just say epoch number two because that's easier the loss went up but then it started to go back down so again if we continued training these models we might notice that the overall trend of the loss is going down on the training data set which is exactly what we'd like so let's now plot we'll go the test loss so i'm going to go test loss here and then i'm going to change this i believe if i hold ctrl or command maybe nope or option on my mac keyboard yeah so it might be a different key on windows but for me i can press option and i can get a multi-cursor here so i'm just going to come back in here and that way i can backspace there and just turn this into test loss wonderful so i'm going to put this as test loss as the title and i need to change the index so this will be index 1 index 2 index 3 index 4. 
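for reference, the loss half of the comparison plot being built here might look roughly like this (the column names are assumed to match the results data frames above); the accuracy panels added in the next step follow the same pattern:

```python
import matplotlib.pyplot as plt

plt.figure(figsize=(15, 10))
epochs = range(len(model_0_df))  # number of epochs trained

# train loss comparison (subplot index 1 of a 2x2 grid)
plt.subplot(2, 2, 1)
plt.plot(epochs, model_0_df["train_loss"], label="Model 0")
plt.plot(epochs, model_1_df["train_loss"], label="Model 1")
plt.title("Train Loss")
plt.xlabel("Epochs")
plt.legend()

# test loss comparison (subplot index 2)
plt.subplot(2, 2, 2)
plt.plot(epochs, model_0_df["test_loss"], label="Model 0")
plt.plot(epochs, model_1_df["test_loss"], label="Model 1")
plt.title("Test Loss")
plt.xlabel("Epochs")
plt.legend()

# the train/test accuracy panels (subplot indices 3 and 4) follow the exact
# same pattern, swapping in the "train_acc" and "test_acc" columns
```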
let's see what this looks like, do we get the test loss? beautiful, that's what we get. however, we notice that model one is probably overfitting at this stage, so maybe the data augmentation wasn't the best change to make. recall that even if you make a change to your model aimed at preventing overfitting or underfitting, it won't always guarantee that the change takes your model's evaluation metrics in the right direction. ideally, loss goes from the top left towards the bottom right over time, so it looks like model 0 is winning out on the loss front at the moment. now let's plot the accuracy for both training and test. so i'm going to change this to train, i'm going to put this as accuracy, and this is going to be index number 3 on the plot, and do we save it as, yeah, just acc, wonderful. so i'm going to option click here on my mac to get a multi-cursor, this is going to be train acc, and the title becomes accuracy, and then i'm going to duplicate it again for the test accuracy, which is going to be plot number 4, two rows, two columns, index number four, and i'll option click again for two cursors, change this to test acc, and i'm going to get rid of the legend here. it takes a little while to plot because we're doing four graphs in one hit. wonderful, so that's comparing our models, and do you see how we could potentially functionize this to plot however many model results we have? but if we had say another five models, another five experiments, which is actually not that many for a single problem, you might find you do over a dozen experiments for a single modeling problem, maybe even more, these graphs can get pretty outlandish with all the lines going through, so that's again what tools like tensorboard, weights and biases and mlflow will help with. but if we have a look at the accuracy, it seems that both of our models are heading in the right direction, we want to go from the bottom left up in the case of accuracy, but the test accuracy, oh, excuse me, is this not training accuracy, i messed that up, did you catch that? so training accuracy, we're heading in the right direction, but it looks like model one is still overfitting, the results we're getting on the training data set aren't carrying over to the testing data set, and the test data set is where we really want our models to shine. metrics on the training data set are good, but ideally we want our models to perform well on the test data set, data they haven't seen before. so that's just something to keep in mind whenever you do a series of modeling experiments, it's always good to not only evaluate them individually but also to evaluate them against each other, so you can go back through your experiments and see what worked and what didn't. if you were to ask me what i would do for both of these models, i would probably train them for longer and maybe add some more hidden units to each of the layers, and see where the results go from there, so give that a shot. in the next video, let's see how we can use our trained models to make a prediction on our own custom image of food. so yes, we used a custom data set of pizza, steak and sushi images, but what if we had our own image? what if we finished this model training and decided, you know what, this is a good enough model, and then we deployed it to an app like nutrify dot app, which is a food recognition app that i'm personally working on, and then we wanted to upload an image and have it classified by our pytorch model? so let's give that a shot and see
how we can use our trained model to predict on an image that's not in our training data and not in our testing data i'll see you in the next video welcome back in the last video we compared our modeling experiments now we're going to move on to one of the most exciting parts of deep learning and that is making a prediction on a custom image so although we've trained a model on custom data how do you make a prediction on a sample slash image in our case that's not in either the training or testing data set so let's say you are building a food recognition app such as nutrify take a photo of food and learn about it you wanted to use computer vision to essentially turn foods into qr codes so i'll just show you the workflow here if we were to upload this image of my dad giving two thumbs up for a delicious pizza and what is nutrified predicted as pizza beautiful so macronutrients then you get some nutrition information and then the time taken so we could replicate a similar process to this using our trained pie torch model albeit it's not going to be too great of results or performance because we've seen that we could improve our models based on the accuracy here and based on the loss and whatnot but let's just see what it's like the workflow so the first thing we're going to do is get a custom image now we could upload one here such as clicking the upload button in google colab choosing an image and then importing it like that but i'm going to do so programmatically as you've seen before so let's write some code in this video to download a custom image i'm going to do so using requests and like all good cooking shows i've prepared a custom image for us so custom image path but again you could use this this process that we're going to go through with any of your own images of pizza steak or sushi and if you wanted to train your own model on another set of custom data the workflow will be quite similar so i'm going to download a photo called pizza dad which is my dad two big thumbs up and so i'm going to download it from github so this image is on the course github and let's write some code to download the image if it doesn't already exist in our colab instance so if you wanted to upload a single image you could click with this button just be aware that like all of our other data it's going to disappear if collab disconnects so that's why i like to write code so we don't have to re-upload it every time so if not custom image path dot is file let's um open a request here or open a file i'm going to open up the custom image path with write binary permissions as f short for file and then when downloading this is because our image is stored on github when downloading an image or when downloading from github in general you typically want the raw link need to use the raw file link so let's write a request here equals request.get so if we go to the pi torch deep learning repo then if we go into i believe it might be extras not in extras it's going to be in images that would make a lot more sense wouldn't it daniel let's get o4 pizza dad so if we have a look this is pie torch deep learning images oh for pizza dad there's a big version of the image there and then if we click download it's just going to give us the raw link yeah there we go so that's the image hey dad how you doing is that pizza delicious it looks like it let's see if our model can get this right what do you think will it so of course we want our model to predict pizza for this image because it's got a pizza in it so custom image path 
we're going to download that, i've just put the raw url in above, so notice the raw github user content, that's from the course github. then i'm going to go f dot write, so file dot write, the request content, so the content from the request, in other words the raw file from github. it'd be a similar workflow if you were getting another image from somewhere else on the internet. and else, if it's already downloaded, let's just not download it, so print f, custom image path already exists, skipping download. let's see if this works, we'll run the code, so downloading data slash 04 pizza dad dot jpeg, and if we go in here and refresh, there we go, beautiful, our custom image is now in our data folder. if we click on this, this is inside google colab now, beautiful, we've got a nice big image there, and there's a nice big pizza there. so we're going to be writing some code over the next few videos to do the exact same process as what we've been doing to import our custom data set, but for our custom image. what do we still have to do? we still have to turn it into tensors, and then we have to pass it through our model, so let's see what that looks like over the next few videos. we are up to one of the most exciting parts of building deep learning models, and that is predicting on custom data, in our case a custom image, a photo of my dad eating pizza. of course we're training a computer vision model here on pizza, steak and sushi, so hopefully the ideal result will be for our model to predict pizza on this image. so let's keep going, let's figure out how we can get our custom image, our singular image, into tensor form. loading in a custom image with pytorch, creating another section here, and i'm just going to write down that we have to make sure our custom image is in the same format as the data our model was trained on, namely in tensor form with data type torch float32, then of shape 64 by 64 by 3, so we might need to change the shape of our image, and then we need to make sure that it's on the right device, command mm, beautiful. so let's see what this looks like, hey. i'm going to import torchvision now, and the package you use to load your data will depend on the domain you're in, so let's open up the torchvision documentation. we can go to models, that's okay, so if we were working with text you might want to look in the text library for some input and output functions, so some loading functions, torchaudio same thing, but torchvision is what we're working with, so let's click into torchvision. now we want to look into reading and writing images and videos, because we want to read in an image, right, we've got a custom image, we want to read it in. this is part of your extracurricular, by the way, to go through these for at least 10 minutes each, so spend an hour if you're going through torchvision, and you could do the same across the other domain libraries, it will really help you familiarize yourself with all the functions of the pytorch domain libraries. so here's some options for video, we're not working with video, here's some options for images, now what do we want to do? we want to read in an image, so we've got a few things here, decode image, oh, i've skipped over one, we can write a jpeg if we wanted to, we can encode a png, but let's jump into this one, read image, what does it do? reads a jpeg or png into a three-dimensional rgb or grayscale tensor, that is what we want, and then optionally converts the image to the desired format, and the values of the output tensor are uint8, okay, beautiful, so let's see what this looks like, hey, and there's also a mode parameter, the
read mode, used optionally for converting the image. let's see what we can do with this, i'm going to copy this in, so we'll write this down, we can read an image into pytorch using read image, we'll go with that. so let's see what this looks like in practice, read in a custom image. i can't explain to you how much i love using deep learning models to predict on custom data. so custom image, we're going to call it uint8, because as we read in the documentation, read image reads it in uint8 format, so let's have a look at what that looks like rather than just talking about it, torchvision dot io dot read image. what's our target image path? well, we've got custom image path up here, this is why i like to do things programmatically, so if our colab notebook resets we can just run that cell again, get our custom image, and then we've got it here. so custom image uint8, let's see what this looks like, oh, what did we get wrong? unable to cast python instance, oh, does it need to be a string, expected a value of type string but found posix path, so the path needs to be a string, okay. if we have a look at our custom image path, what did we get wrong? oh, we've got a posix path, so let's convert this custom image path into a string and see what happens. oh, look at that, that's our image in integer form. i wonder if this is plottable, let's go plt dot imshow, custom image uint8, maybe we'll get a dimensionality problem here, invalid shape, okay, let's permute it, permute, and we'll go one two zero, is this going to plot? it's a fairly big image, there we go, two thumbs up, look at us. so that is the power of torchvision dot io, io stands for input output, we were just able to read in our custom image. now how about we get some metadata about it, we'll print it up here actually, i'll keep that there because it's fun to plot it, let's find the shape of our data and the data type, and yeah, we've got it in tensor format, but it's uint8 right now, so we might have to convert that to float32, we want to find out its shape, and we need to make sure that if we're predicting on a custom image, the data we're predicting on needs to be on the same device as our model. so let's print out some info, print, let's go custom image tensor, and this is going to be a new line, and then we'll go custom image uint8, wonderful, and then let's go custom image shape, we'll get the shape attribute, and then we also want to know the data type, custom image data type, though we have an inkling because the documentation said it would be uint8, and we'll go d type. let's have a look, what do we have? so there's our image tensor, and it's quite a big image, so custom image shape, and what was our model trained on? our model was trained on images of 64 by 64.
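here's a consolidated sketch of the download-and-load steps from the last couple of videos. the raw github url and filename are assumptions reconstructed from the description above, so swap in your own image path if you're following along with a different photo:

```python
import requests
import torchvision
import matplotlib.pyplot as plt
from pathlib import Path

# assumed filename and raw GitHub URL for the "pizza dad" image
data_path = Path("data/")
data_path.mkdir(parents=True, exist_ok=True)
custom_image_path = data_path / "04-pizza-dad.jpeg"
url = "https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/04-pizza-dad.jpeg"

# download the image if it isn't already in the Colab instance
if not custom_image_path.is_file():
    with open(custom_image_path, "wb") as f:   # write-binary mode
        request = requests.get(url)            # use the *raw* file link when downloading from GitHub
        print(f"Downloading {custom_image_path}...")
        f.write(request.content)
else:
    print(f"{custom_image_path} already exists, skipping download.")

# read_image wants a string path and returns a uint8 tensor of shape [color_channels, height, width]
custom_image_uint8 = torchvision.io.read_image(str(custom_image_path))
print(f"Custom image shape: {custom_image_uint8.shape}")
print(f"Custom image dtype: {custom_image_uint8.dtype}")  # torch.uint8

# matplotlib expects colour channels last, so permute [C, H, W] -> [H, W, C]
plt.imshow(custom_image_uint8.permute(1, 2, 0))
```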
so this image encodes a lot more information than what our model was trained on so we're going to have to change that shape to pass it through our model and then we've got a image data type here or tensor data type of torch uint eight so maybe that's going to be some errors for us later on so if you want to go ahead and see if you can resize this tensor to 6464 using a torch transform or torch vision transform i'd encourage you to try that out and if you know how to change a torch tensor from uint 8 to torch float 32 give that a shot as well so let's try and make a prediction on our image in the next video i'll see you there in the last video we loaded in our own custom image and got two big thumbs up from my dad and we turned it into a tensor so we've got a custom image tensor here it's quite big though and we looked at a few things of what we have to do before we pass it through our model so we need to make sure it's in the data type torch float32 shape 64643 and on the right device so let's make another section here we'll go 11.2 and we'll call it making a prediction on a custom image with a pie torch model with a trained pie torch model and albeit our models aren't quite the level we'd like them at yet i think it's important just to see what it's like to make a prediction end to end on some custom data because that's the fun part right so try to make a prediction on an image now i want to just highlight something about the importance of different data types and shapes and whatnot and devices three of the biggest errors in deep learning in let's see what happens if we try to predict on uiint8 format so we'll go model one dot eval and with torch dot inference mode let's make a prediction we'll pass it through our model one we could use model zero if we wanted to here they're both performing pretty poorly anyway let's send it to the device and see what happens oh no what did we get wrong here oh runtime error input type ah so we've got uint 8 so this is one of our first errors that we talked about we need to make sure that our custom data is of the same data type that our model was originally trained on so we've got torch cuda float tensor so we've got an issue here we've got a uint 8 image data or image tensor trying to be predicted on by a model with its data type of torch cuda float tensor so let's try fix this by loading the custom image and convert to torch dot float 32 so one of the ways we can do this is we'll just recreate the custom image tensor and i'm going to use torch vision.io dot read image we don't have to fully reload our image but i'm going to do it anyway for completeness and a little bit of practice and then i'm going to set the type here with the type method to torch float 32 and then let's just see what happens we'll go custom image let's see what this looks like i wonder if our model will work on this let's just try again we'll bring this up copy this down to make a prediction and custom image dot to device our image is in torch float 32 now let's see what happens oh we get an issue oh my goodness that's a big matrix now i have a feeling that that might be because our image our custom image is of a shape that's far too large custom image dot shape what do we get oh my gosh four thousand and three thousand twenty four and do you notice as well that our values here are between zero and one whereas our previous images do we have an image there we go that our model was trained on were between zero and one so how could we get these values to be between zero and one well 
one of the ways to do so is by dividing by two five five now why would we divide by two five five well because that's a standard image format is to store the image tensor values in values from 0 to 255 for red green and blue color channels so if we want to scale them so this is what i meant by 0 to 255 if we wanted to scale these values to be between 0 and 1 we can divide them by 255 because that is the maximum value that there can be so let's see what happens if we do that okay we get out image values between zero and one can we plot this image so plt.mshow let's plot our custom image we've got to permute it so it works nicely with matplotlib what do we get here beautiful we get the same image right but it's still quite big look at that we've got a pixel height of or image height of almost 4000 pixels and a width of over 3000 pixels so we need to do some adjustments further on so let's keep going we've got custom image to device we've got an error here so this is a shape error so what can we do to transform our image shape and you might have already tried this well let's create a transform pipeline to transform our image shape so create transform pipeline or composition to resize the image because remember what are we trying to do we're trying to get our model to predict on the same type of data it was trained on so let's go custom image transform is transforms dot compose and we're just going to since our image is already of a tensor let's do transforms.resize and we'll set the size to the same shape that our model was trained on or the same size that is so let's go from torch vision we don't have to rewrite this it's already imported but i just want to highlight that we're using the transforms package we'll run that there we go we've got a transform pipeline now let's see what happens when we transform our target image transform target image what happens custom image transformed i love printing the inputs and outputs of our different pipelines here so let's pass our custom image that we've just imported so custom image transform our custom image is recall of shape quite large we're going to pass it through our transformation pipeline and let's print out the shapes let's go original shape and then we'll go custom image dot shape and then we'll go print transformed shape is going to be custom image underscore transformed dot shape let's see the transformation oh would you look at that how good we've gone from quite a large image to a transformed image here so it's going to be squished and squashed a little so that's what happens let's see what happens when we plot our transformed image we've gone from 4 000 pixels on the height to 64. and we've gone from 3000 pixels on the height to 64. 
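put together, the rescale-and-resize step might look roughly like this, assuming the custom_image_path from the download cell earlier:

```python
import torch
import torchvision
from torchvision import transforms
import matplotlib.pyplot as plt

# load as float32 (read_image gives uint8) and scale the 0-255 values down to 0-1
custom_image = torchvision.io.read_image(str(custom_image_path)).type(torch.float32) / 255.

# resize to the same size the model was trained on
custom_image_transform = transforms.Compose([
    transforms.Resize(size=(64, 64)),
])
custom_image_transformed = custom_image_transform(custom_image)

print(f"Original shape: {custom_image.shape}")                  # e.g. [3, ~4000, ~3000]
print(f"Transformed shape: {custom_image_transformed.shape}")   # [3, 64, 64]

# the pixelated 64x64 version the model will actually "see"
plt.imshow(custom_image_transformed.permute(1, 2, 0))
```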
so this is what our model is going to see let's go custom image transformed and we're going to permute it to be one two zero okay so quite pixelated do you see how this might affect the accuracy of our model because we've gone from custom image is this going to oh yeah we need to plot.m show we've gone from this high definition image to an image that's a far lower quality here and i can kind of see myself that this is still a pizza but i know that it's a pizza so just keep this in mind going forward is that another way that we could potentially improve our model's performance if we increased the size of the training image data so instead of 6464 we might want to upgrade our model's capability to deal with images that are of two two four two two four so if we have a look at what this looks like two two four two two four wow that looks a lot better than 6464. so that's something that you might want to try out later on but we're going to stick in line with the cnn explainer model how about we try to make another prediction so since we've transformed our image to be the same size as the data our model was trained on so with torch inference mode let's go custom image pred equals model 1 on custom image underscore transformed does it work now oh my goodness still not working expected all tensors on the same device of course that's what we forgot here let's go to device or actually let's leave that error there and we'll just copy this code down here and let's put this custom image transform back on the right device and see if we finally get a prediction to happen with our model oh we still get an error oh my goodness what's going on here oh we need to add a batch size to it so i'm just going to write up here this will error no batch size and this will error image not on right device and then let's try again we need to add a batch size to our image so if we look at custom image transformed dot shape recall that our images that pass through our model had a batch dimension so this is another place where we get shape mismatch issues is if our model because what's going on in neural network is a lot of tensor manipulation if the dimensions don't line up we want to perform matrix multiplication and the rules if we don't play to the rules the matrix multiplication will fail so let's fix this by adding a batch dimension so we can do this by going a custom image transformed let's unsqueeze it on the first dimension and then check the shape there we go we add a single batch so that's what we want to do when we make a prediction on a single custom image we want to pass it to our model as an image or a batch of one sample so let's finally see if this will work let's just not comment what we'll do this well maybe we'll try anyway this should work added a batch size so do you see the steps we've been through so far i'm just going to unsqueeze this unsqueeze on the zeroth dimension to add a batch size oh it didn't error oh my goodness it didn't error have a look at that yes that's what we want we get a prediction logit because the raw outputs of our model we get a logit value for each of our custom classes so this could be pizza this could be steak and this could be sushi depending on the order of our classes let's just have a look class to idx um did we not get that class names beautiful so pizza steak sushi we've still got a ways to go to convert this into that but i just want to highlight what we've done so note to make a prediction on a custom image we had to this is something you'll have to keep in mind 
for almost all of your custom data, it needs to be formatted in the same way that your model was trained on. so we had to load the image and turn it into a tensor, we had to make sure the image was the same data type as the model, that was torch float32, and then we had to make sure the image was the same shape as the data the model was trained on, which was 64, 64, 3, with a batch size, so that was 1, 3, 64, 64, and excuse me, this should actually be the other way around, color channels come first because we're dealing with pytorch here, then 64, 64. and then finally we had to make sure the image was on the same device as our model. so they are three of the big ones we've talked about so much, a data type mismatch will result in a bunch of issues, a shape mismatch will result in a bunch of issues, and a device mismatch will also result in a bunch of issues. if you want these highlighted, they're in the learnpytorch.io resource, we have putting things together, where do we have it, oh, no, it's in the main takeaways section, sorry, predicting on your own custom data with a trained model is possible as long as you format the data into a similar format to what the model was trained on, so make sure you take care of the three big pytorch and deep learning errors, wrong data types, wrong data shapes and wrong devices. regardless of whether that's images or audio or text, these three will follow you around, so just keep them in mind. but now we've got some code to predict on custom images, although it's kind of all over the place, we've got about 10 coding cells here just to make a prediction on a custom image, so how about we functionize this and see if it works on our pizza dad image, i'll see you in the next video. welcome back, we're now well on our way to making custom predictions on our own custom image data, so let's keep pushing forward. in the last video we finished off getting some raw model logits, the raw outputs from our model, now let's see how we can convert these logits into prediction labels. let's write some code, so convert logits to prediction labels, or let's first convert them to prediction probabilities. how do we do that? let's go custom image pred probs equals torch dot softmax, to convert our custom image pred across the first dimension, and the first dimension of this tensor will be the inner brackets, of course, so just this little section here. let's see what these look like, these will be prediction probabilities, wonderful. so you'll notice that these are quite spread out, now this is not ideal, ideally we'd like our model to assign a fairly large prediction probability to the right target class, however, since our model isn't actually performing all that well, the prediction probabilities are quite spread out across all of the classes, but nonetheless we're just highlighting what it's like to predict on custom data. so now let's convert the prediction probabilities to prediction labels. you'll notice we used softmax because we're working with multi-class classification data, and we can get the custom image pred labels, the integers, by taking the argmax of the prediction probabilities, custom image pred probs, across the first dimension as well. so let's go custom image pred labels, let's see what they look like, zero, so the index here with the highest value is index number zero, and you'll notice that it's still on the cuda device. so what would happen if we tried to index on our class names with that?
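here's a short sketch that strings those last few cells together, assuming model_1, device, custom_image_transformed and class_names all exist from the previous steps:

```python
import torch

# make a prediction on the resized custom image
model_1.eval()
with torch.inference_mode():
    # unsqueeze adds the batch dimension: [3, 64, 64] -> [1, 3, 64, 64]
    custom_image_pred = model_1(custom_image_transformed.unsqueeze(dim=0).to(device))

# raw logits -> prediction probabilities -> prediction label -> class name
custom_image_pred_probs = torch.softmax(custom_image_pred, dim=1)
custom_image_pred_label = torch.argmax(custom_image_pred_probs, dim=1)

print(custom_image_pred)        # logits, one value per class
print(custom_image_pred_probs)  # probabilities that sum to 1
print(class_names[custom_image_pred_label.cpu()])  # index on CPU to avoid device errors
```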
if we index on our class names with the custom image pred label, or maybe that doesn't need to be a plural, oh, there we go, we get pizza, but you might also have to move this to the cpu, otherwise you might run into some errors, so just be aware of that, notice how we put it on the cpu. so we get pizza, a correct prediction, but this is about as good as guessing in my opinion, because the prediction probabilities are quite spread out, ideally this value would be higher, maybe something like 0.8 or above for our pizza dad image, but nonetheless our model is getting two thumbs up, even on this 64 by 64 image. that's a lot of code we've written though, so let's functionize it so we can just pass in a file path and get a custom prediction from it. so, putting custom image prediction together, building a function. the ideal outcome is a function where we pass an image path to our model, have it predict on that image, and plot the image plus the prediction. so this is our ideal outcome, and i think i'm going to issue this as a challenge, so give it a go, put all of our code above together, you'll just have to import the image, process it and so on. i know i said we were going to build a function in this video, but we're going to save that for the next video, i'd like you to give it a go first. so start from way back up here, import the image via torchvision io read image, format it using what we've done, change the data type, change the shape, change the device, and then plot the image with its prediction as the title. so give that a go and we'll do it together in the next video. how'd you go? i just realized i had a typo in the previous cell, but that's all right. did you give it a shot, did you put together the custom image prediction in a function format? i'd love it if you did, but if not, that's okay, let's keep going and see what that might look like. there are many different ways you could do this, but here's one of the ways i thought of. we want a function that's going to pred and plot a target image, we want it to take in a torch model, ideally a trained model, we want it to also take in an image path, which will be a string, it can take in a class names list so that we can index it and get the prediction label in string format, so let's make this a list of strings, and by default it can equal none, just in case we only want the prediction. it also takes in a transform, so we can pass in some form of transform to transform the image, and then it's going to take in a device, which by default will be the target device. let's write a little docstring here, makes a prediction on a target image with a trained model and plots the image and prediction, beautiful. now what do we have to do first? let's load in the image, just like we did before with torchvision, so target image equals torchvision dot io dot read image, and we'll call str on the image path, we convert it to a string just in case it doesn't get passed in as one, and then let's change it into type torch float32, because we want to make sure our custom data is the same type as what we trained our model on. now let's divide the image pixel values by 255 to get them into the zero to one range, so we can just do this with target image equals target image divided by 255, and we could also just do this division in one step up
now we want to transform our data if necessary in our case it is but it won't always be so we want this function to be pretty generic pred and plot image so if the transform exists let's pass the target image through the transform that is wonderful and the transform we're going to get from here now what's left to do well let's make sure the model is on the target device it might be by default but if we're passing in a device parameter we might as well make sure the model is there too and now we can make a prediction so let's turn on eval and inference mode and make a prediction with our model so on the model we call eval and then with torch.inference_mode because we're making a prediction we want to put our model in the inference mode context let's add an extra dimension to the image let's go target image we could do this step above actually but we're just going to do it here i'm kind of remembering things on the fly here of what we need to do let's write this down this is the batch dimension eg our model will predict on batches of one x image so we're just unsqueezing it to add an extra dimension at the zeroth dimension just like we did in a previous video now let's make a prediction on the image with an extra dimension otherwise if we don't have that extra dimension we saw that we get a shape issue so right down here target image pred and remember this is going to be the raw model outputs raw logit outputs we'll go target image pred and yeah i believe that's all we need for the prediction oh wait there was one more thing to device we need to also make sure the target image is on the right device beautiful so a fair few steps here but nothing we can't handle all we're really doing is replicating what we've done for batches of images but we want to make sure that if someone passed any image to our pred and plot image function that we've got functionality in here to handle that image and do we get this oh we want just target image to device did you catch that error so let's keep going now let's convert the logits our model's raw logits let's convert those to prediction probabilities this is so exciting we're getting so close to making a function to predict on custom data so we'll set this to target image pred probs which is going to be torch.softmax and we will pass in the target image pred here we want to take the softmax across the first dimension now let's convert our prediction probabilities which is what we get in the line above into prediction labels so let's get target image pred label equals torch.argmax we want to get the argmax or in other words the index of the maximum value from the pred probs across the first dimension as well now what should we return here well we don't really need to return anything we want to create a plot so let's plot the image alongside the prediction and prediction probability beautiful so plt.imshow what are we going to pass in here we're going to pass in our target image now we have to squeeze this i believe because we've added an extra dimension up here so we'll squeeze it to remove that batch dimension and then we still have to permute it because matplotlib likes images in the format color channels last so we permute with 1 2 0 to remove the batch dimension and rearrange the shape to be hwc height width color channels that is color channels last
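pulling the prediction steps just described together (transform, device, eval and inference mode, the extra batch dimension and the logits to probabilities to label conversion), a minimal sketch might look like this, assuming model is the trained model, device is the target device and target_image is the scaled float32 image from the sketch above, with the 64 by 64 resize mirroring the size the model was trained on:

```python
import torch
from torchvision import transforms

custom_image_transform = transforms.Resize(size=(64, 64))  # resize to the training image size

model.to(device)  # make sure the model is on the target device
model.eval()      # evaluation mode
with torch.inference_mode():
    transformed_image = custom_image_transform(target_image)
    # add a batch dimension at dim 0 ([C, H, W] -> [1, C, H, W]) and put the image on the
    # same device as the model, otherwise we hit shape or device errors
    target_image_pred = model(transformed_image.unsqueeze(dim=0).to(device))  # raw logits

# raw logits -> prediction probabilities -> prediction label
target_image_pred_probs = torch.softmax(target_image_pred, dim=1)
target_image_pred_label = torch.argmax(target_image_pred_probs, dim=1)
```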
now if the class names parameter exists so we've passed in a list of class names this function is really just replicating everything we've done in the past 10 cells by the way so right back up here we're replicating all of this stuff in one function so pretty large function but once we've written it we can pass in our images as much as we like so if class names exist let's set the title to showcase that class name so the pred is going to be class names and let's index on that with the target image pred label and this is where we'll have to put it to the cpu because if we're using a title with matplotlib matplotlib cannot handle things that are on the gpu this is why we have to put it to the cpu and then i believe that should be enough for that let's add a little line in here so that we can have it oh i've missed something an outside bracket there wonderful let's add the prediction probability because that's always fun to see so we want target image pred probs and we want to get the maximum pred prob from that and we'll also put that on the cpu and i think we might format this to three decimal places now this is saying pred labels we don't need that plural we need just the non-plural pred label beautiful now if class names doesn't exist let's just set the title equal to an f-string we'll go pred target image pred label is google colab still telling me this is wrong target image pred label oh no we've still got the same thing it just hasn't caught up with me yet i'm coding a bit fast here and then we'll pass in the prob which will be just the same as above we could even copy this in beautiful and let's now set the title to the title and we will turn the axis off plt.axis false fair bit of code there but this is going to be a super exciting moment let's see what this looks like when we pass in a target image and a target model some class names and a transform are you ready we've got our transform ready by the way it's back up here custom image transform it's just going to resize our image so let's see oh this file was updated remotely or in another tab sometimes this happens and usually google colab sorts itself out but that's all right it doesn't affect our code for now pred on our custom image are you ready save failed would you like to override yes i would so you might see that in google colab usually it fixes itself there we go pred and plot image i was going to say google colab don't fail me now we're about to predict on our own custom data using a model trained on our own custom data image path let's pass in custom image path which is going to be the path to our pizza dad image let's go class names equals class names which is pizza steak and sushi we'll pass in our transform to convert our image to the right shape and size custom image transform and then finally the target device is going to be device are you ready let's make a prediction on custom data one of my favorite things one of the most fun things to do when building deep learning models three two one how did it go oh no what did we get wrong cpu okay so close but yet so far has no attribute cpu oh maybe we need to put this to cpu that's where i got the square bracket wrong so that's what we needed to change because this is going to be potentially on the gpu the target image pred label we need to put it on the cpu why do we need to do that because this is going to be the title of our matplotlib plot and matplotlib doesn't interface too well with data on a gpu
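continuing the sketch, the plotting part plus the final call might look roughly like this; pred_and_plot_image is the function being assembled in the narration, and model_1, class_names, custom_image_path, custom_image_transform and device are assumed to be the objects already created in the notebook, so treat the exact names as assumptions:

```python
import matplotlib.pyplot as plt

# matplotlib wants [height, width, color_channels], so drop any batch dimension
# and permute from [C, H, W] to [H, W, C]
plt.imshow(target_image.squeeze().permute(1, 2, 0))

if class_names:
    # the label tensor has to come back to the cpu before indexing and plotting,
    # matplotlib can't work with tensors that live on the gpu
    title = f"Pred: {class_names[target_image_pred_label.cpu()]} | Prob: {target_image_pred_probs.max().cpu():.3f}"
else:
    title = f"Pred: {target_image_pred_label} | Prob: {target_image_pred_probs.max().cpu():.3f}"
plt.title(title)
plt.axis(False)

# once all of the above lives inside the pred_and_plot_image() function, the call looks like:
pred_and_plot_image(model=model_1,
                    image_path=custom_image_path,
                    class_names=class_names,
                    transform=custom_image_transform,
                    device=device)
```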
let's try it again three two one running oh look at that prediction on a custom image and it gets it right two thumbs up i didn't plan this our model is performing actually quite poorly so this is as good as a guess to me you might want to try this on your own image and in fact if you do please share it with me i would love to see it but you could potentially try this with another model and see what happens steak okay there we go so even though model one performs worse quantitatively it performs better qualitatively so that's the power of visualize visualize visualize and if we use model 0 also which isn't performing too well it gets it wrong with a prediction probability of 0.368 which isn't too high either so we've talked about a couple of different ways to improve our models and now we've even got a way to make predictions on our own custom images so give that a shot i'd love to see your custom predictions upload an image here if you want or download it into google colab using code that we've used before but we've come a fairly long way i feel like we've covered enough for custom data sets let's summarize what we've covered in the next video and i've got a bunch of exercises and extra curriculum for you so this is exciting stuff i'll see you in the next video in the last video we did the very exciting thing of making a prediction on our own custom image although it's quite pixelated and although our model's performance quantitatively didn't turn out to be too good qualitatively it happened to work out but of course there are a fair few ways that we could improve our model's performance but the main takeaway here is that we had to do a bunch of pre-processing to make sure our custom image was in the same format as what our model expected and this is quite a lot of what i do behind the scenes for nutrify if you upload an image here it gets pre-processed in a similar way to go through an image classification model to output a label like this so let's get out of this to summarize i've got a colorful slide here but we've already covered this predicting on custom data these are three things to make sure of regardless of whether you're using images text or audio make sure your data is in the right data type in our case it was torch.float32 make sure your data is on the same device as the model so we had to put our custom image to the gpu which was where our model also lived and then we had to make sure our data was in the correct shape so the original shape was 64 64 3 actually this should be reversed because it was color channels first but the same principle remains here we had to add a batch dimension and rearrange if we needed so in our case we used images of this shape batches first color channels first height width
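a quick way to sanity check those three things (data type, shape and device) on your own data is just to print them and compare against what the model expects; this is a small sketch assuming target_image and model are the custom image tensor and trained model from the sketches above:

```python
# the three usual suspects when predicting on custom data: dtype, shape and device
print(f"Image dtype: {target_image.dtype}")                   # want torch.float32, the same dtype the model was trained on
print(f"Image shape: {target_image.unsqueeze(dim=0).shape}")  # want [batch_size, color_channels, height, width], e.g. [1, 3, 64, 64] after resizing
print(f"Image device: {target_image.device}")
print(f"Model device: {next(model.parameters()).device}")     # the image and the model should live on the same device
```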
but depending on your problem will depend on your shape depending on the device you're using will depend on where your data and your model live and depending on the data type you're using will depend on whether you're using torch.float32 or something else so let's summarize if we go here main takeaways you can read through these but some of the big ones are pytorch has many built-in functions to deal with all kinds of data from vision to text to audio to recommendation systems so if we look at the pytorch docs you're going to become very familiar with these over time we've got torchaudio torchdata torchtext and torchvision which is what we practiced with and we've got a whole bunch of things here for transforming and augmenting images data sets utilities operators and torchdata is currently in beta but this is just something to be aware of later on so it's a prototype library right now but by the time you watch this it might be available but it's another way of loading data so just be aware of this for later on and if we come back up here if pytorch's built-in data loading functions don't suit your requirements you can write your own custom data set classes by subclassing torch.utils.data.Dataset and we saw that way back up here in option number two option two here we go loading image data with a custom data set we wrote plenty of code to do that and there's a stripped-back sketch of that subclassing pattern coming up just before the end of this section if you'd like a quick refresher and then a lot of machine learning is dealing with the balance between overfitting and underfitting we've got a whole section in the book here to check out what an ideal loss curve should look like and how to deal with overfitting and how to deal with underfitting it is a fine line and so much of the research in machine learning is actually dedicated towards this balance and then three big things to be aware of when you're predicting on your own custom data wrong data types wrong data shapes and wrong devices this will follow you around as i said and we saw that in practice getting our own custom image ready for a trained model now we have some exercises here if you'd like to link to them you can go to learnpytorch.io section number four exercises and of course extra curriculum a lot of the things i've mentioned throughout the course that would be a good resource to check out are contained in here but the exercises this is your time to shine your time to practice let's go back to this notebook scroll right down to the bottom look how much code we've written goodness me exercises for all exercises and extra curriculum see here turn that into markdown wonderful and so if we go in here you've got a couple of resources there's an exercise template notebook for number four and example solutions for notebook number four which is what we're working on now so of course i'd encourage you to go through the pytorch custom data sets exercises template first try to fill out all of the code here on your own so we've got some questions here we've got some dummy code we've got some comments so give that a go go through this use this book resource to reference use all the code we've written use the documentation whatever you want but try to go through this on your own and then if you get stuck somewhere you can look at an example solution that i created which is here pytorch custom data sets exercise solutions
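and here's that stripped-back sketch of the custom dataset pattern, just to show the shape of it: subclass torch.utils.data.Dataset and fill in __init__, __len__ and __getitem__ (the class name, folder layout and jpg-only glob here are illustrative choices, not the exact code from the section):

```python
import pathlib
from PIL import Image
from torch.utils.data import Dataset

class ImageFolderCustom(Dataset):
    """Loads images from a folder laid out as targ_dir/class_name/image.jpg."""
    def __init__(self, targ_dir, transform=None):
        self.paths = sorted(pathlib.Path(targ_dir).glob("*/*.jpg"))
        self.transform = transform
        self.classes = sorted({path.parent.name for path in self.paths})
        self.class_to_idx = {name: i for i, name in enumerate(self.classes)}

    def __len__(self):
        # how many samples the dataset holds
        return len(self.paths)

    def __getitem__(self, index):
        # return a single (image, label) pair
        image = Image.open(self.paths[index])
        label = self.class_to_idx[self.paths[index].parent.name]
        if self.transform:
            image = self.transform(image)
        return image, label
```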
and just be aware that those example solutions are just one way of doing things it's not necessarily the best it's just a way to reference what you're writing against what i would do and there are actually now live walkthroughs of the solutions errors and all on youtube so if you go to this video i'm just going to mute it so this is me live streaming the whole thing writing a bunch of pytorch code if you just keep going through all of that you'll see me writing all of the solutions running into errors trying different things etc etc but that's on youtube you can check that out in your own time but i feel like we've covered enough exercises oh by the way this is in the extras exercises tab of the pytorch deep learning repo so extras exercises and solutions are contained in there far out we've covered a lot look at all that so that has been pytorch custom data sets i will see you in the next section holy smokes that was a lot of pytorch code but if you're still hungry for more there are five more chapters available at learnpytorch.io which cover transfer learning my favorite topic pytorch model experiment tracking pytorch paper replicating and pytorch model deployment how do you get your model into the hands of others and if you'd like to learn in this video style the videos for those chapters are available at zerotomastery.io but otherwise happy machine learning and i'll see you next time