Hi, this video will be an introduction to PyTorch, the most popular deep learning framework for Python, which Meta (Facebook) created seven years ago. It's behind OpenAI models like GPT-3, GPT-4, and DALL-E, behind Tesla's Autopilot, and behind Stable Diffusion, to name a few services; it powers many of the AI models that you are using every day. There is another big player in the deep learning frameworks arena, which is Google's TensorFlow, released in 2015, so two years before PyTorch. It was more popular before, but we can see here in Google Trends that, year by year, PyTorch has become the most popular framework. So in this tutorial we'll learn the basics of PyTorch, nothing too complex, just the first steps in 10 cells of a Kaggle notebook. Kaggle notebooks, if you don't know, are like Jupyter notebooks or Google Colab, but on Kaggle. Kaggle is the most popular data science platform, acquired by Google about seven years ago, and here we can find not only these notebooks but also competitions, datasets (so, free data), models including pretrained models, the code notebooks that we will be using today (here you can create a new notebook), plus the discussions, which are like forums where people ask questions or share experience, and free courses that are really nice for starting to learn, from Python to more advanced topics like machine learning, visualization, and other data and AI topics. Today, as mentioned, we'll be using this public notebook that I created for this video, so you can copy it and run the cells from there, or you can create your own and copy every cell from scratch. Okay, so today we will see how to install and import torch, how to create tensors and operate with them, how to do automatic differentiation, datasets and data loaders, how to create neural networks, and how to train them with a loss function, optimizers, and a training loop that puts them together. Also, how to do all of this on the GPU, because these Kaggle notebooks let us use a GPU for free (we have 30 hours of GPU per week); we'll see how to do this, and finally we'll see how to save, load, and use these trained PyTorch models. So let's go. First of all, how to install torch. Luckily, in these Kaggle notebooks, as in Google Colab, it's already installed, so we don't have to install these libraries with pip. But if you want to run torch on your personal computer or on your server, you can go to this link here, the PyTorch website, where you can choose the build version, your operating system (here Windows, but you can choose another one), the package manager (pip), and the language; PyTorch is also available for C++ and Java, but I think 90+% of people use it with Python. Also, if you have a GPU, you can choose the CUDA version. Very important: you don't have to install CUDA on your computer beforehand. If you have installed it before, that's okay, but this torch installation will install CUDA as well. And if you don't have a GPU, you can install it for CPU with this command up here, so it's also available for CPU, but the main advantage of PyTorch is working with GPUs, so that's the better option if you have one, like an NVIDIA GPU. After selecting all of these options, you can copy this command here and run it in your terminal; it can be pip3, or maybe only pip in your case. Okay, so since torch is already installed here in this Kaggle notebook, as mentioned, we can just run the import cell. Maybe the first time you run it, it will be starting the notebook session. I already started it before, but it can take a few seconds, even a minute, to start the session, so don't worry.
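As a rough sketch of that install-and-import step (the exact pip command depends on the OS, package manager, and CUDA version you select on the PyTorch website, so copy yours from pytorch.org rather than from here):

```python
# Example install command (run in a terminal, not in the notebook):
#   pip install torch
# On Kaggle and Colab this is already installed, so you only need the import.

import torch

# Confirm the import worked and check which version the environment provides
print(torch.__version__)
```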
And now we have imported torch, so we can run torch commands. Okay, so the next step is to create tensors. Tensors are very similar to NumPy arrays, but they can be used on the GPU. By default they are created and operated on the CPU, but we'll see that there are options to send these tensors to the GPU, which is very useful whenever we have lots and lots of computation, say very large tensors that have to be operated together; that will be faster on the GPU. For now we are doing it on the CPU, and this is how we create torch tensors, for example passing here the first dimension and the second dimension, like a NumPy array. If you don't know NumPy, it's better to do a NumPy tutorial first to learn the basics, but otherwise you can follow along. Also, in this notebook you can copy the cells, as mentioned, and run each one, but you can also create new cells and play around with the things I'm showing you. So you can check here the tensor that we have created, and one thing we can do with tensors is slice them, or select parts of them, like a NumPy array: with this zero we take the first element of our tensor, with a one we take the second element (the second row here), and we can also take from the second row the first element, which is this three here. So we can do many of the operations that we can do with NumPy. The next step is tensor operations; we can do very similar operations to NumPy arrays. Here we are adding the two tensors. Remember that our tensor was this array here, 1 2 3 4, so we are adding element-wise: 1 + 1 is this 2, 2 + 2 is this 4, and so on. Here we are doing the same but with multiplication, so the tensor_mul that we are printing here is 1 times 1, 2 times 2, and so on, and here we have the matrix multiplication with matmul, which gives these results here. There are many other operations, so you can Google any specific one. The next step is automatic differentiation, and this is a key aspect of PyTorch, because this is what allows neural networks to improve, to learn how to reduce the error after every training epoch. We do this by first enabling requires_grad, enabling the gradient on a tensor, which builds a graph of the operations between the different tensors. Then we declare y as x to the power of two, and we compute the derivative of y with respect to x. This is a very simple example to understand the concept, but in real neural networks this is done across many, many operations and many numbers; that's when it really makes sense. Also, most of the time we don't do these low-level operations ourselves; we use higher-level functions that do this behind the scenes, but it's very important to know these basic concepts to understand what we are doing when we use more advanced functions that handle this in the backend. And here, if we want to see our gradient: x is our variable, well, this tensor that holds only a single value, and x.grad stores the gradient of our operation. The gradient in this case is four, the derivative of our operation, and this is how neural networks learn to improve their internal parameters step after step.
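Something like the following is what those tensor and autograd cells could look like; a minimal sketch, where the exact values and variable names in the notebook may differ:

```python
import torch

# Create a 2x2 tensor, similar to a NumPy array
t = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
print(t[0])     # first row -> tensor([1., 2.])
print(t[1, 0])  # second row, first element -> tensor(3.)

# Element-wise addition and multiplication, and matrix multiplication
print(t + t)               # element-wise sum
print(t * t)               # element-wise product
print(torch.matmul(t, t))  # matrix multiplication (also written t @ t)

# Automatic differentiation: y = x**2, so dy/dx = 2x
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2
y.backward()   # traverse the recorded graph and compute gradients
print(x.grad)  # tensor(4.), the derivative of y with respect to x at x = 2
```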
So the fifth cell of our tutorial is datasets and data loaders. These two are not strictly essential, we can train models with PyTorch without them, but they are very helpful, because datasets let us declare our data (our inputs and also labels, for example our images or our time-series data), do some transformations if needed, and declare what the labels are in every case, and then data loaders let us very efficiently generate every batch of data, load it to the GPU, and handle other advanced functionality that we would otherwise have to implement ourselves, like shuffling the data. Data loaders already arrange all of this in a very efficient way. To train a model we need a dataset of data, and we also need to pass this data to the model, and these Dataset and DataLoader classes that torch makes available in torch.utils.data let us reduce the boilerplate and do these operations more efficiently. For example, imagine that we have this very small and basic dataset with three features (these will be the inputs) and this target. The first element of each of the features and the target is the first example, this is the second example, the third example, and so on. You can imagine, for instance, that these are the temperatures of one city, another city, and another city, so this could be the three temperatures from three cities, and this is the fourth temperature that we want to predict for a fourth city; it could be many other things, but that's the first example that came to my mind. So this would be the training set: each set of these features is used to predict the target. If this dictionary here were our dataset, first of all we create a DataFrame for convenience, so we go from this dictionary to this DataFrame, df, with pandas, which we have imported here. Now we can create our dataset class like this: we create a class, we can call it whatever we want, in this case CustomDataset, but it could be CityTemperatureDataset or whatever name you want, and, very importantly, it has to inherit from the Dataset class from PyTorch that we imported here. First of all, we have to init our dataset: we put here self and also the DataFrame that we just created, and we tell it which are the features and which are the targets. This is one way of doing it, and for this example here it's a very convenient way: you can see that for the features we are dropping the target column and getting the values, so this will now be a NumPy array of only the features, and here we have only the targets. Then it's very important to define how we measure how many examples we have in our dataset, and in this case it is the length of the DataFrame, so this determines how many instances our dataset has. We also have to create a __getitem__ method. These are the three basic methods; we can create other, more advanced ones, but we always have to tell the Dataset class what our data is, how long our dataset is (how many examples we have), and how to get each one of them, like this. So we say that features is self.features[idx]. Very important, this is how we access each of our examples, with the index into this NumPy array, so index 0, 1, 2, 3, and it will know how many indexes there are from the __len__ method that we defined here, so it will go from zero to, in this case, five, because there are six elements and the fifth index is the last one.
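A minimal sketch of that custom Dataset, assuming a pandas DataFrame with hypothetical columns feature1, feature2, feature3, and target (the actual column names and values in the notebook may differ):

```python
import pandas as pd
import torch
from torch.utils.data import Dataset

# Hypothetical toy data: three input features and one target per example
df = pd.DataFrame({
    "feature1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "feature2": [2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    "feature3": [3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "target":   [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

class CustomDataset(Dataset):
    def __init__(self, dataframe):
        # Keep features and targets as NumPy arrays
        self.features = dataframe.drop("target", axis=1).values
        self.targets = dataframe["target"].values

    def __len__(self):
        # Number of examples (same as the length of the DataFrame)
        return len(self.features)

    def __getitem__(self, idx):
        # Return one (features, target) pair as float tensors
        x = torch.tensor(self.features[idx], dtype=torch.float32)
        y = torch.tensor(self.targets[idx], dtype=torch.float32)
        return x, y

dataset = CustomDataset(df)
print(len(dataset))  # 6 samples
```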
And here we transform, in this case, our NumPy array into a tensor, so every example will be converted into a torch tensor, taking the data (in this case a NumPy array) and also the targets. The targets are self.targets, as defined here, and we are also saying that features[idx] goes with targets[idx]: features zero go with target zero, features one go with target one; features zero are these three values, and target zero is this one up here. For every example we return the features and the target, which are now tensors, and this will be automatically picked up by the data loader and passed to the model, so we don't have to deal with many of the transfer functions that these classes implement for us. So here we define what this dataset is, and here we are instantiating it: we create this dataset using the CustomDataset class that we defined, and we have to pass it this DataFrame, since we specified that the input of our class is the DataFrame. So this df here is the DataFrame that will later be used as we defined. Now we can check the length of the dataset; let's run this, and the length is six samples. And here we have the data loader. The DataLoader is the class that we imported here, and we don't need to define a custom data loader class; we can just create the data loader by calling this class and telling it what the dataset is, what the batch size is, and whether to shuffle, true or false. These are two of the most common parameters; there are many others, but these are the ones you'll usually want to define. The batch size says how many samples the model will see at one time, at every step of the training loop within one epoch; in this case it will be two. In this example we could use the entire dataset, because it's very small, but if we had thousands and thousands of high-quality images, maybe our GPU memory could only hold up to, let's say, 50. So maybe we have 1,000 images and our GPU can take only 50; then we should set the batch size to at most 50, maybe 45 for example, and the data loader will take care of sending 45 images at every step of the training loop. Shuffle=True makes every new training pass see the examples in a different order; this is useful for training (for testing we don't do that), so the model doesn't overfit as much. Here we can try our dataset and data loader to check that everything works, by doing iter(dataloader); now we have an iterator object from which we can take the next batch, and we'll see here sample_x and sample_y. What our data loader returns is the features and the target: sample_x is the features and sample_y is the labels, and here we can see the examples. If you look here, I'm printing the sample batch (maybe let's add another newline character here): in the sample batch, sample_x, this tensor here, holds our first two samples of features, of inputs. This is the first one (feature one, feature two, feature three), this is the second sample, and here is the label of the first sample and the label of the second sample. This is how our data loader automatically sends a batch of two samples to our model; for now we are only printing it to make sure that we declared everything correctly.
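A sketch of that DataLoader step, continuing from the CustomDataset sketch above; batch_size=2 and shuffle=True mirror what is described here:

```python
from torch.utils.data import DataLoader

# Batch two samples at a time and shuffle the order every epoch
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)

# Grab one batch to check that everything is wired up correctly
sample_x, sample_y = next(iter(dataloader))
print(sample_x)  # shape (2, 3): two samples, three features each
print(sample_y)  # shape (2,): the two corresponding targets
```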
Okay, so let's see the next step, which is neural networks, and this is how we define models in PyTorch. We'll see here one of the most basic neural networks, a fully connected neural network. We can import the layers of the neural network from torch.nn (torch neural network), and here we have to create a class for our model; you can call it whatever you want, SimpleNN for example, or MyTemperatureModel, or whatever you like, and it has to inherit from nn.Module, because we are building on that class. Here we have to define the layers: first of all, in the __init__ we define every layer that we will be using, and in the forward step we define how the information, the data, goes through every layer of our neural network. In this case we are using fully connected layer number one, which will have two neurons that take three inputs; then we have another layer that takes two inputs into a single neuron, a perceptron; and also the ReLU activation function, which gives us nonlinearity. Basically, the ReLU function makes it so that if the signal is negative it returns zero: the input is the x-axis and the output is the y-axis, so for every negative input the output will be zero, and for every positive input the output will be that same positive number. You can read more about this on Google and YouTube, but this allows our model to learn nonlinear correlations in our data. So here we define them, what layer one is, layer two, and the ReLU, and here, in the forward step, we say how to connect them: the inputs go first into fully connected one, and this is the output of the first layer of our neural network; then we have the ReLU here, so we put the ReLU between layer one and layer two; and then layer two outputs a single value, which will be our target value. Since we have three inputs, we have to declare that we have three inputs, two neurons in the first layer, and one neuron to calculate the output, the target of our model. We can instantiate our model like this, model = SimpleNN(), and here we can print the model and see the representation of our very simple model. Okay, so to understand the neural network we just built, our very basic neural network: we have three inputs (those are not neurons, only inputs), then we have a hidden layer here with two neurons, two perceptrons, and the ReLU is applied after them. The representation looks like this because the ReLU is applied to the outputs of the first layer, and then the output of this ReLU is passed to the second neuron, whose output will directly be the target of our prediction. Every connection of every input to every neuron implicitly has a weight, plus a bias, and these are the parameters that the neural network learns in every backpropagation step, with the automatic differentiation that we have seen before. So every connection not only links each input, or each step, with each next step, but also has learnable parameters, and these are represented by matrices: PyTorch internally uses tensors to represent the weights of these connections, and the biases with another tensor, and it adds and multiplies them like this; the weights are multiplied and the biases are added at every one of these steps. Here, with fewer parameters, these are learned too, and with the ReLU the model can extract nonlinear signal. And here is the output of this cell: basically, taking these values with these weight and bias parameters, the output is directly the target.
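A minimal sketch of the fully connected network described here: three inputs, a hidden layer with two neurons, a ReLU, and one output neuron (the class name SimpleNN is just the example name used in the video):

```python
import torch.nn as nn

class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(3, 2)  # 3 inputs -> 2 hidden neurons
        self.fc2 = nn.Linear(2, 1)  # 2 hidden neurons -> 1 output
        self.relu = nn.ReLU()       # non-linearity between the two layers

    def forward(self, x):
        x = self.fc1(x)   # first fully connected layer
        x = self.relu(x)  # zero out negative activations
        x = self.fc2(x)   # second fully connected layer: the prediction
        return x

model = SimpleNN()
print(model)  # prints the layer structure of the network
```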
What it does, we'll see now, but in training the model looks at what the predicted target is and what the true value, the label, is; the difference is the error, how large it is, how wrong the model was, and with automatic differentiation it can learn how to adjust every one of these steps to get closer to the correct answer. That's more or less what we do. So here we can understand it better: fully connected one, which is all of this; fully connected two; the ReLU here, after the outputs of fully connected one; and how the target is calculated from the inputs using our model, which is all of this here. So now you understand better how the inputs travel through fully connected one, here we have the output of that layer, we pass it through the ReLU, then we pass the output of the ReLU into fully connected two, and that is already our final result. Okay, we have run all of this, and the next step is the loss function and the optimizer. To train a neural network we need those two. The loss function is the error that I was mentioning before: it helps us calculate the error, and it needs to be differentiable, because it's the function we'll be differentiating with backpropagation to learn how to improve. The optimizer is the function that navigates this loss gradient to optimize our model, to move it toward a lower error. The good thing is that PyTorch provides many loss functions and many optimizers, so we don't have to implement them step by step to make them run on the GPU, be differentiable, and so on. For our case, to predict the target from our inputs, we can use MSELoss, the mean squared error loss; this will be our criterion, our loss function. Our optimizer will be a very common one, SGD, stochastic gradient descent. We create them like this, very simply; they have more parameters, but this is a very simple way of calling them, and this is the learning rate, which tells the optimizer how big a step to take in the direction that reduces the error. It is a hyperparameter that needs to be tuned for every case. Okay, so let's go to the next step, which is our training loop, where we put everything together to train our model. We put it in train mode, like here, and we pass all our data examples 10 times, 10 epochs as they are called, and every epoch will have different batches. So we have 10 epochs, every epoch goes over all our features and labels, and every batch has, as we mentioned before, batch size two, so every batch is two samples with their features and labels. Here we have to call optimizer.zero_grad(), so we reset the optimizer's gradients, and here we pass our inputs through our model, getting the outputs; we squeeze them into a single array; then we pass our outputs to the criterion, our loss function, with the outputs here and the labels here, and it calculates, for every prediction from our model (which starts with random weights, so it will probably be very bad in the first epoch), how large the loss is, how large the error is. Then we call backward, so we compute the derivatives with respect to our internal weights and biases, our internal parameters, and we take one optimization step using this backward pass. Now here we can print the batch number and the loss, so how large the error is at that step. And if we run this, okay, I forgot to run these two cells here, the neural network and the loss and optimizer, so remember to run every cell.
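A sketch of the loss, optimizer, and training loop described here, reusing the model and dataloader from the earlier sketches; the learning rate of 0.01 is only an illustrative value:

```python
import torch.nn as nn
import torch.optim as optim

criterion = nn.MSELoss()                            # mean squared error loss
optimizer = optim.SGD(model.parameters(), lr=0.01)  # stochastic gradient descent

model.train()  # put the model in training mode
for epoch in range(10):
    print(f"--- epoch {epoch + 1} ---")
    for batch_idx, (features, labels) in enumerate(dataloader):
        optimizer.zero_grad()                # reset gradients from the previous step
        outputs = model(features).squeeze()  # forward pass, shape (batch_size,)
        loss = criterion(outputs, labels)    # how wrong the predictions are
        loss.backward()                      # backpropagation through the network
        optimizer.step()                     # update the weights and biases
        print(f"batch {batch_idx + 1}, loss {loss.item():.4f}")
```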
Now if we run this training loop, we can see that it's very fast, because we have very little data and the model is so small, but we can see that at every epoch we print every batch, and at every new epoch we print this separator up here. So at epoch one, batch one, we have this loss of around 14; then epoch two, and we see that at every epoch the loss is getting lower and lower. We could plot this and we would see the loss curve and how it improves; in this case it looks good, and eventually it becomes stable, so it doesn't learn further. It's a very simple example, but with more data we could get better results in a real case. Okay, so we have done all of this so far on the CPU, but what if we wanted to use the GPU, since we have one here in our Kaggle notebook? First we need to activate it, because by default the notebook comes with no accelerator. Here we can come and select; we have two GPU options and one TPU, which is a more advanced accelerator. So let's select here the GPU P100; they tell us that we have 30 hours per week, as I said before. Turn on the GPU P100, and here they show how many hours of our quota we have spent so far this week. Now we need to restart our notebook, because when we change the accelerator we need to restart, so let's run every cell again; now it's starting a new session. (If you press Shift+Enter you can run each cell one after the other, so let's run them quickly.) Now we have a CPU but also a GPU, although so far everything is still running on the CPU. To know whether we have a GPU available or only a CPU, we can check torch.cuda.is_available(), and now that we have a GPU it returns True; when we were using only the CPU before, this would be False. Like this we can automatically know whether we have a GPU. We also have another function, torch.cuda.device_count(), and we have only one GPU; if we were using two GPUs, like here, then we would see a device count of two, and we might need to select one of them or decide how to use them. And finally we can get the name of our GPU with torch.cuda.get_device_name(), and this is our P100 with 16 GB of VRAM, pretty good for being free. Okay, now that we see that CUDA is available, we can set torch.device to cuda if torch.cuda.is_available(), otherwise use the CPU. So now we have our device, and if we create a new torch tensor (this one up here, a single-dimension tensor, for example; it could be any tensor), after creating it we can send it to the GPU. We create it on the CPU as before, but we can send it to the GPU like this: our torch tensor .to(device), and if device is the GPU, now this tensor will be sent to the GPU. And our model, in the same way, we can send to the GPU: we instantiate our model that we created before, the same model, and we send it to the GPU with .to(device), where device is the GPU again. So now if we run this, you see that we are using CUDA, we are using the GPU: the new tensor that we are printing here, new_sample (our inputs), is on the device cuda, and our model is on the GPU too. And now if we run our model on our new features (we trained our model before and we are feeding the features into it), our prediction is this one here, and it's on CUDA, so we could now send it back to the CPU, but for now we have used our trained model on the GPU.
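A sketch of those GPU checks and device transfers; the tensor values are arbitrary examples, and on a CPU-only machine the device simply falls back to cpu:

```python
import torch

print(torch.cuda.is_available())  # True if a GPU is visible to PyTorch
print(torch.cuda.device_count())  # number of available GPUs
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. a Tesla P100 on Kaggle

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move a tensor and the model to the chosen device
new_sample = torch.tensor([1.0, 2.0, 3.0]).to(device)
model = SimpleNN().to(device)

prediction = model(new_sample)
print(prediction.device)  # cuda:0 when running on the GPU
```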
In a similar way, if we want to train our model on the GPU as well, we could use these same two methods with device set to the GPU in our training loop: send the model to the GPU first, and then every input to the GPU before feeding it to the model, and then all the training logic would happen on the GPU. In this case it would probably be slower, because sending the data from CPU to GPU takes some time, and if the calculations are this simple it's not worth running them on the GPU; but if our data were very large and our model very complex, with many neurons and many layers, with many parameters to multiply and backpropagate through, then it makes sense to use the GPU. So the last step is knowing how to save this model, because we have been training it and using it to predict on new features in our notebook, but if we restart the notebook we'll lose all of this, especially our trained model. If we want to save our trained model as a file, only its weights, we can do it like this: torch.save, then we pass the model's state_dict, and we put the name we want for this file, simple_nn.pth (it could have other extensions), and this becomes a file. Let's run this, and here in this Output section we can see the output of our notebook: our simple_nn.pth, which is only 2 kilobytes, and we can even download our model; this is our model file. Now, in the next session, if we want to load this model without having to retrain it, we can add it again here as an input of our notebook: we pass this simple_nn.pth file, and then we need to declare our SimpleNN again, we need to define this model class here, but we don't need to train it again. We instantiate our model from the defined code and then load the state dict, torch.load of this simple_nn.pth file that we have here in our outputs, and here we see "model loaded successfully", so it works. Now we can move this loaded model to the device, which is the GPU, since we still have the GPU active here, and we can create a new, different input to predict on, these three input values, and send our new sample to the device, so to the GPU again. So we have our loaded model on the GPU, our input on the GPU, and with torch.no_grad (because now we are not using gradients to improve the model, we are only predicting with it) we take our loaded model and our inputs here, and this is the prediction. We can see the prediction here; it was 1.5 in this case, so something reasonable. Basically this is how we save a trained model, how we load it (this could be a different notebook, a different Python file, to put it in production), and how we can predict on new features, on the GPU or on the CPU, using this loaded model.
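A sketch of those save, load, and predict steps, continuing from the earlier sketches; the filename simple_nn.pth matches the one mentioned above, and the new input values are just examples:

```python
import torch

# Save only the learned weights (the state dict), not the class definition
torch.save(model.state_dict(), "simple_nn.pth")

# In a new session or script, re-declare the architecture, then load the weights
loaded_model = SimpleNN()
loaded_model.load_state_dict(torch.load("simple_nn.pth"))
loaded_model.to(device)
print("model loaded successfully")

# Predict on a new sample without tracking gradients (we are not training here)
new_sample = torch.tensor([2.0, 3.0, 4.0]).to(device)
loaded_model.eval()  # evaluation mode, a good habit for inference
with torch.no_grad():
    prediction = loaded_model(new_sample)
print(prediction)
```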
So that was the basic introduction to PyTorch. From here you can learn more advanced things, like convolutional neural networks, or how to deal with more complex models and larger datasets. We have seen so far how to install and import torch, how to create tensors, how to do basic operations with them, how to differentiate these tensor operations and backpropagate through them; we have seen how to create our datasets and data loaders so that later, when we build neural networks, we can pass them data efficiently and train our models; then we have seen how to create the training loop that puts it all together and produces models that learn with every iteration, and also how to do all of this on the GPU, using the free Kaggle GPUs. Finally, we have seen how to save a trained model and how to load it again in a separate environment, being able to pass it new inputs; basically putting our model into production, so that after being trained once it can be used as many times as we want to predict new future samples that arrive without a label; we are predicting those future cases. You can find the link to this notebook in the description of the video. This is fundamental knowledge if you want to go on to build more complex and more advanced models, like LLMs or computer vision models, to name a few. I hope you enjoyed it, subscribe and leave a like if you did, and see you in the next video. Bye-bye.