Transcript for:
Training CNN with PyTorch for CIFAR-10

What is going on, guys? Welcome back. In this video today, we're going to learn how to train a convolutional neural network to classify different classes of images, like dogs, frogs, trucks, ships, and so on, using PyTorch. So let us get right into it.

All right, so we're going to train a convolutional neural network in PyTorch today that will be able to classify 10 different types of images: the 10 classes present in the CIFAR-10 dataset, which we're going to use for training. It includes vehicles and animals, so the classes are dog, cat, horse, ship, truck, car, and so on. We're going to go through the full process: we're going to load the data, we're going to have training and testing data, we're going to train the model, and we're going to evaluate the model.

Now, the first thing we want to do is install the packages needed for this video. There are four packages, so we run pip (or pip3) install numpy pillow torch torchvision. I already have them installed, so I'm not going to run the command again. Then we want to open up a Jupyter notebook. I recommend working with a Jupyter notebook rather than an ordinary Python file, because this way you can rerun individual cells and you don't have to retrain the model every time you make a small mistake.

We're going to start with the imports. First of all, import numpy as np. From PIL we want to import Image. Then we import torch, import torch.nn as nn, import torch.nn.functional as F, and import torch.optim as optim. And then we also want to import torchvision and import torchvision.transforms as transforms. I think that's actually it; these are the imports.

Then we're going to start by defining the transformations that we want to apply to any incoming data before it goes into the network. When we get image data, the values are going to be between 0 and 255; these are the RGB values. We want to scale them to be between -1 and 1, and we want them to be tensors. So we say that our transform is going to be equal to transforms.Compose, which combines a couple of transforms. The first one is transforms.ToTensor(). This turns the image into a tensor, and it also already scales it, so the range 0 to 255 becomes a range of 0 to 1. To get the values between -1 and 1, we then normalize with a mean of 0.5 and a standard deviation of 0.5. Normalizing computes (x - 0.5) / 0.5: subtracting 0.5 shifts the range from [0, 1] to [-0.5, 0.5], and dividing by 0.5 (that is, multiplying by two) stretches it to [-1, 1]. We do that, by the way, to speed up convergence; this is just a better range to work with. So we add transforms.Normalize and pass 0.5 for each channel as the mean and 0.5 for each channel as the standard deviation as well.

All right, so that is our transform; this is what we're going to do with the data before we feed it into the neural network. Now we need to get the training data by saying torchvision.datasets.CIFAR10. You can also go with CIFAR100 if you want to use 100 different classes of images. We define the directory to be ./data, so just a data directory inside the current directory. We want to load the training data, so we say train=True. We want to apply a transform, so transform is going to be equal to transform, and we
want to download the data, so download=True, and actually that's it. Then we copy that and change it to the test data; the only difference here is that we set train=False.

Then we want to create data loaders that load the data from these datasets. So we say train_loader is going to be equal to torch.utils.data.DataLoader (with a capital D and L, so the class). Here we pass our training data, a batch size of 32, and shuffle=True to shuffle the data, and we want to use workers (num_workers) to load in parallel, just to speed it up a little bit. The test loader is going to be the same thing, but with the test data, of course. So this is now our data. Running this already downloads the CIFAR-10 dataset, and we can keep coding while this is happening.

The next thing we want to do is take a look at the data so that we can see how it's structured and what the shape is. For that we say image, label = train_data[0], and then look at image.size(). We can see that the shape here is 3 × 32 × 32, so we have three channels (RGB) and 32 × 32 pixels. That is how an image looks in our dataset, and this is going to be our input shape.

Then we also need to know what the output is. The output is going to be one of the 10 classes, and we need to actually code in the labels here. So I'm going to say class_names is going to be equal to the following list: plane, car, bird, cat, deer, dog, frog, horse, ship, and truck. These are the labels for the 10 classes, because our network is going to produce a number between 0 and 9 representing the individual classes: zero is going to be a plane, one is going to be a car, and so on, and nine is going to be a truck. All right, so let's get to the
interesting part: our neural network. For this, I'm going to create a class NeuralNet, which is going to extend nn.Module; this is just the base class in PyTorch for a neural network. The important thing here is that we need two methods. First of all, the __init__ method, which is the constructor and takes self as an argument; this is going to define the structure. The second thing is the forward function, which is going to define the feed-forward logic.

We call super().__init__() first so that the __init__ method of the parent class runs, and then we define our architecture. The first thing we want is a convolutional layer, so we say self.conv1, the first convolutional layer, is going to be equal to nn.Conv2d, a 2D convolution. Now we need to specify what we want here. We start with an input of 32 × 32 pixels. We define that we have three input channels, and we want a certain number of output channels, or, as we call them in convolutional neural networks, feature maps. These are the results of applying the kernel, or filter. I don't want to go too deep into the theory of neural networks here; if you want a detailed explanation of convolutional neural networks, let me know in the comment section down below. But basically we have a kernel, or filter, that has a certain size; we're going to use a 5 × 5 kernel. This is a window that moves over the image and extracts features by calculating a dot product. In the beginning the filter is just random values; the filter (the kernel) itself is what gets trained during the training process. So basically we have our 32 × 32 pixel image and a 5 × 5 kernel that moves across the image, calculates dot products, and creates feature maps. So we have three input channels; this is
defined by the shape of our data, and we want to produce a certain number of feature maps; let's just go with 12 in this case. Our kernel will have a size of 5 × 5, so we have a 5 × 5 kernel moving across the image and producing 12 feature maps as a result.

Now what happens here is that we get a new shape. The calculation is: take the input size minus the kernel size, divide that by the stride, floor the result if necessary, and add one. The stride is how much the kernel moves at each step: if it moves one pixel, it's a stride of one; if it moves two pixels, it's a stride of two. In our case we use the default of one. So we have (32 - 5) / 1 = 27, and adding one leaves us with a new size of 28. Our new shape after the convolutional layer is 12 channels (or 12 feature maps, you could say) of 28 × 28 pixels.

After this, we want to apply a max pooling layer, so we say self.pool is going to be equal to nn.MaxPool2d. We're going to use a size of 2 × 2, which means it takes a 2 × 2 area, so four pixels, and creates one pixel out of them by keeping the maximum value, the strongest activation. This halves the height and the width, so it leaves us with 12 × 14 × 14 as a shape.

Then we have a second convolutional layer, self.conv2, which is going to be nn.Conv2d. This now takes 12 input channels, obviously, and produces 24 as an output, so from the 12 feature maps we create another level of abstraction, 24 feature maps, again with a 5 × 5 kernel. This leads to the same calculation: (14 - 5) / 1 = 9, plus one is 10, so our new shape is 24 × 10 × 10. Now we're going to apply another max pooling operation, but we don't need to define the layer again; we just reuse the existing one, which gives us 24 × 5 × 5. We define that in the forward logic; we don't need to create an extra layer for it here. So the order is conv1, pool, conv2, pool, and I don't have to specify the second pool here because I already have the layer.

Then we move to fully connected layers, or dense layers in TensorFlow language. So self.fc1 is going to be equal to nn.Linear, a linear layer, which is again just a dense or fully connected layer. Since we have 24 × 5 × 5 after the second pooling, we want to flatten all of this, so the flattened size is 24 × 5 × 5 = 600. Because of that, I take 24 * 5 * 5 as the input here, and I want to output 120 neurons. Then I want another fully connected layer, self.fc2, that takes these 120 neurons and outputs 84 neurons, for example. self.fc3 then takes these 84 neurons, and now the important thing is that we need 10 output neurons.

This is important: everything in between you can play around with. You need three input channels, obviously, but you can choose a different number of feature maps, a different kernel size, a different max pooling size, different channels, and a different number of neurons in the fully connected layers. Of course, you need to keep everything compatible, so you cannot just choose any number: the input size of fc1 needs to match the shape that comes out of the last pooling layer. But in between you can do anything. You can go with 28 here, or with 140 here and 92 here; it doesn't really matter. You can add more layers and change the numbers, but you need three input channels and 10 output neurons at the end, one for each of the 10 classes. (These outputs are raw scores, or logits; a softmax would turn them into probabilities for the 10 classes.) That is our model itself.

Now the forward method takes self and x as parameters, where x is the input, and the forward function just applies these layers to the input. First of all, we take our input value and feed it through the first convolutional layer, so self.conv1 applied to x. The result of that is then fed through a ReLU function, rectified linear unit. This is
basically just zero if the input is zero or negative, and it's the input itself if the input is above zero. If we plot it in a coordinate system, the function outputs zero for every negative input and is a straight linear function for positive inputs. That's the ReLU activation function, which is very efficient and useful. So we apply F.relu (not self.relu, sorry) to the output of the convolutional layer to introduce non-linearity here; that's the important thing about an activation function, it breaks the linearity. Then this result is fed through the pooling layer, so x = self.pool(F.relu(self.conv1(x))). The next thing is the same thing again: x = self.pool(F.relu(self.conv2(x))). Then we apply a flatten operation, x = torch.flatten(x, 1), which keeps the batch dimension and flattens everything else. Then we say x is equal to ReLU applied to the output of the first fully connected layer, so we feed the input through self.fc1 and apply ReLU, and we do the same thing with self.fc2. In the end we get self.fc3 applied to x.

What this means is that we feed in the input data, apply the kernels or filters to it, apply a ReLU function, and so on. Everything these layers contain, the kernels and also the weights and biases of the fully connected layers, is going to be trained. All of it is initialized randomly in the beginning, and then it is trained into the neural network by showing it examples. So we return x, and that is our neural network's structure and logic.

Now, to actually train this, we first define the network itself: net is going to be a NeuralNet instance. The loss function that we're going to use is categorical cross-entropy, so we use nn.CrossEntropyLoss() as the loss function, and we're going to say that
the optimizer is going to be stochastic gradient descent, so optimizer is going to be optim.SGD. We're going to train the parameters of our network, net.parameters(), since that's what we apply the optimization to, with a learning rate of 0.001, and we're going to use momentum, passing 0.9 as the parameter for that. So now everything is defined.

To actually train this, we need to go through some epochs, so I say for epoch in range(30). Inside, I print "Training epoch {epoch}" using an f-string. Then we set running_loss to 0.0 in the beginning, because we're going to add up the losses of the individual batches. Then, for i, data in enumerate(train_loader), we get the batches from the train loader and say inputs, labels = data; that's what the train loader gives us for each batch.

Then we call optimizer.zero_grad() to reset the gradients (recent PyTorch versions set them all to None). Then we say outputs = net(inputs): we take the inputs from the data, feed them through the network, and get some output; in the beginning it's going to be essentially random and wrong. Then we calculate the loss, because we want to know how our outputs compare to the labels, the outputs we actually desired. So we say loss = loss_function(outputs, labels), passing in the outputs that we got and the actual desired outputs. Then we backpropagate the loss: we call loss.backward() to do the gradient computation, and then we take a step using our optimizer with optimizer.step().

All of this is already implemented in PyTorch; we don't need to implement backpropagation ourselves. What we do is get the output of our network, calculate the loss, propagate the error back through the network to compute the gradients, and take a small step, scaled by the learning rate, in the right direction for each parameter. At the end of each batch we say running_loss += loss.item(), and at the end of every epoch we print the loss. The loss is running_loss divided by len(train_loader), the number of batches, because we want an average here, rounded to four decimal places.

All right, so that is our training loop. I can run this now and see what happens. It trains epoch zero, which takes some time (hopefully my recording can handle this), and then it shows us the loss. Now, I think I messed something up, because this loss seems a little bit too low. Yes, I messed something up: I wrote =+ instead of +=, so the running loss was just overwritten with the last batch's loss instead of being added up. So let me restart, run all of this again, and then we start the training loop. Training epoch zero: the loss in the beginning should now be clearly higher than that 0.something value; it should be something around 2, 2.1. Then training epoch 1, and so on.

The thing you're hoping for here is to see convergence: you want to see that the loss goes down. So it's 2, then 1.6, then 1.4, 1.3, and so on, up until a certain point where it doesn't show a lot of progress anymore, hopefully somewhere around 0.4 or 0.3. You can keep going and use more epochs, 40 epochs, 50 epochs, whatever, but you want to see that the loss is decreasing. If the loss is staying the same, or even diverging, going up and down and not
converging to a value, then probably something's wrong with your network, or you're training it for too long, or your learning rate is too high, or something like that. But if you have something like this, where the loss is constantly decreasing a little bit and converging to some small value, then this is what you want.

While this is training, I want to keep coding, because once it's trained we're going to export our model parameters so that we don't have to retrain the model when we want to use it again. So, assuming that this is already trained, we do torch.save(net.state_dict(), 'trained_net.pth'): we get the state dictionary of the neural network, so the parameters, and export them to trained_net.pth. Then, in order to load that in another script (we don't need to do it here, but we can), we need to create a NeuralNet instance. That means we need to define this architecture again, because we're not exporting the whole model, just the parameters, so the weights and biases; the network itself needs to be defined as a structure. Then we can load the state dictionary with net.load_state_dict(torch.load('trained_net.pth')).

Then we can move on to the evaluation part. I can actually run this now; once training is done, we can save all of that, and then we can evaluate the model on the test data. How do we do that? We say correct = 0 and total = 0, because we want an accuracy metric: how many of the total instances were classified correctly. For this, we set the network to evaluation mode with net.eval(), because depending on what's in a network, certain parts, such as dropout or batch normalization layers, behave differently during evaluation (inference) than during training. Our network has neither, but it's a good habit. We also say with torch.no_grad() so that no gradients are computed. (Actually, I just ran this cell by accident, which is not good.) We don't need gradient computations because we're not training the model; we're just feeding input forward through the network and getting a result. You can see the loss is still decreasing here, so it might make sense to run it for more epochs, maybe 50 or something like that, but I'm going to stop here at 30 epochs.

So, with torch.no_grad() disabling gradient computation, we say for data in test_loader, so now we're getting our test data. We say images, labels = data again, and then outputs = net(images), our neural network applied to the images, so to a batch of 32 images. Then we want to see what the prediction is, because our network outputs 10 values and we want a label. So we say _, predicted = torch.max(outputs, 1): we want the index of the output with the highest activation, which is going to be our label. Then total += labels.size(0), so 32 in our case, the 32 instances that we classified in this batch, and correct += (predicted == labels).sum().item(). So basically we're counting how many images were classified in total, 32 per batch, and how many of them were classified correctly. The accuracy in the end is just 100 * correct / total, and I want to print that as a metric: "Accuracy: {accuracy}%".

So I can run this now, and in this case it gives us 68.99%. It will probably be better if you continue training with more epochs, but this is already quite good considering that we have 10 different classes. This would not be so good if you just had cat versus dog, but with 10 different classes, where random guessing would only get about 10%, this is quite good.

We can actually check whether it works on example images. I have two example images here, example1 and example2, which are not from the dataset; I got them from the internet. We can feed them in now, but they also don't have 32 × 32 pixels, so we need to resize them, and this is what we're going to do in the final cell. We'd say from PIL import Image, but we already did that in the beginning, so we don't need to. We do need to define a new transform, which is going to be the same as the old transform but with an extra step. So we say transforms.Compose and do the same transforms.ToTensor() and transforms.Normalize, because we need the same input format, so again (0.5, 0.5, 0.5) and again (0.5, 0.5, 0.5). But we also need to add an additional operation here, transforms.Resize, and we want to resize everything we get to 32 × 32 pixels, since our images are not always going to be in that format.

Then we define a function load_image, which takes an image path. Inside, we say image = Image.open(image_path), and then image is whatever we get after applying our new transform to it. Then we add an additional dimension with image.unsqueeze(0). The reason we do that is that we want a batch dimension: even a single image has to be presented as a batch of one image for the shape to be compatible. Then we return the image.

Then we define image_paths, where you can put in all the images you want. In my case it's example1.jpg and example2.jpg, and if you have a folder full of images, you can do that here as well. Then images = [load_image(img) for img in image_paths]. All we need to do now is the evaluation: we set the network to evaluation mode again with net.eval(), say with torch.no_grad() so there are no gradient computations again, and for image in images we get output = net(image) and the prediction with _, predicted = torch.max(output, 1). Then we print the prediction as a class name: we get a tensor, but we want the number value, so we call predicted.item() and look up class_names[predicted.item()].

So when I run this, you can see I get "Prediction: plane" and "Prediction: dog", which is accurate: this is a plane and this is a dog. These images were classified correctly even though they didn't have 32 × 32 pixels, because they were resized. This is how you can use the model to classify new images.

I think, if it's not online already, I will soon upload a video where I show you how to deploy such a model onto an actual server. So either it's already on the channel, or you're going to see that video in a couple of days, showing how to deploy such a machine learning model as an application on an actual server.

So that's it for today's video. I hope you enjoyed it and hope you learned something. If so, let me know by hitting the like button and leaving a comment in the comment section down below, and of course don't forget to subscribe to this channel and hit the notification bell to not miss a single future video for free. Other than that, thank you very much for watching, see you in the next video, and bye.
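Companion sketch for the transform explanation near the start: the per-pixel arithmetic behind transforms.ToTensor() followed by transforms.Normalize with mean 0.5 and standard deviation 0.5, written in plain Python so it runs without PyTorch. The helper names to_tensor_scale and normalize are hypothetical; torchvision applies the same formula channel-wise to whole tensors.

```python
# Per-pixel arithmetic behind ToTensor() + Normalize(mean=0.5, std=0.5).
# Pure Python sketch; helper names are hypothetical, not torchvision APIs.

def to_tensor_scale(pixel: int) -> float:
    """ToTensor() maps an 8-bit value in [0, 255] to a float in [0.0, 1.0]."""
    return pixel / 255.0


def normalize(value: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Normalize() computes (value - mean) / std for each channel."""
    return (value - mean) / std


for pixel in (0, 51, 255):
    scaled = to_tensor_scale(pixel)
    print(pixel, round(scaled, 3), round(normalize(scaled), 3))
```

An 8-bit value of 0 ends up at -1.0 and 255 at 1.0, which is the [-1, 1] range the video aims for.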
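The shape calculation from the architecture walkthrough, as a small pure-Python sketch. The formulas follow the output-shape rules in the nn.Conv2d and nn.MaxPool2d documentation for dilation 1; padding defaults to 0 in nn.Conv2d, which matches the layers in the video. The function names are hypothetical.

```python
def conv2d_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size after a Conv2d: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def maxpool2d_output_size(size: int, kernel: int = 2, stride: int = 2) -> int:
    """Spatial size after MaxPool2d without padding: floor((size - kernel) / stride) + 1."""
    return (size - kernel) // stride + 1


size = 32
size = conv2d_output_size(size, kernel=5)   # conv1: 3 -> 12 channels, 32 -> 28
size = maxpool2d_output_size(size)          # pool: 28 -> 14
size = conv2d_output_size(size, kernel=5)   # conv2: 12 -> 24 channels, 14 -> 10
size = maxpool2d_output_size(size)          # pool: 10 -> 5
flattened = 24 * size * size                # number of input features for fc1
print(size, flattened)
```

The final 600 is the 24 * 5 * 5 that the first fully connected layer must accept.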
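To make the 2 × 2 max pooling step concrete, here is a pure-Python sketch on a small 4 × 4 feature map with made-up values. max_pool_2x2 is a hypothetical helper that mimics nn.MaxPool2d(2, 2) for a single channel.

```python
def max_pool_2x2(grid):
    """Return a grid where each cell is the max of a 2x2 window (stride 2).

    Odd trailing rows/columns are dropped, matching floor-based output sizes.
    """
    rows, cols = len(grid), len(grid[0])
    return [
        [
            max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


# Made-up single-channel feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 4, 6],
]
pooled = max_pool_2x2(feature_map)
print(pooled)
```

Each output cell keeps the strongest activation of its window, and the 4 × 4 map shrinks to 2 × 2, the same halving that takes 28 × 28 to 14 × 14 in the network.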
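A sketch of the NeuralNet class as the video describes it, assembled from the spoken walkthrough rather than copied from the author's notebook, followed by a shape check. The random input batch is an assumption standing in for real CIFAR-10 images.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NeuralNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 12, 5)       # (3, 32, 32) -> (12, 28, 28)
        self.pool = nn.MaxPool2d(2, 2)         # halves height and width; reused twice
        self.conv2 = nn.Conv2d(12, 24, 5)      # (12, 14, 14) -> (24, 10, 10)
        self.fc1 = nn.Linear(24 * 5 * 5, 120)  # flattened (24, 5, 5) -> 120
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)           # one output (logit) per CIFAR-10 class

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # -> (N, 12, 14, 14)
        x = self.pool(F.relu(self.conv2(x)))   # -> (N, 24, 5, 5)
        x = torch.flatten(x, 1)                # keep batch dim -> (N, 600)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)                        # raw scores; CrossEntropyLoss applies log-softmax itself
        return x


net = NeuralNet()
dummy_batch = torch.randn(4, 3, 32, 32)        # random stand-in for 4 CIFAR-10 images
print(net(dummy_batch).shape)
```

No softmax is applied in forward because nn.CrossEntropyLoss expects raw logits.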
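The ReLU description boils down to max(0, x). A one-function pure-Python sketch; relu here is a hypothetical scalar helper, while F.relu applies the same rule element-wise to tensors.

```python
def relu(x: float) -> float:
    """Rectified linear unit: 0 for zero or negative inputs, identity for positive ones."""
    return max(0.0, x)


print([relu(v) for v in (-2.0, -0.5, 0.0, 0.5, 2.0)])
```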
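The =+ versus += slip from the training section is easy to reproduce in plain Python. The batch loss values below are made up for illustration.

```python
batch_losses = [2.3, 2.1, 1.9]  # hypothetical per-batch losses

# Buggy version: `=+` parses as `running_loss = (+loss_value)`, so it overwrites.
running_loss = 0.0
for loss_value in batch_losses:
    running_loss =+ loss_value
buggy_total = running_loss

# Correct version: `+=` accumulates every batch's loss.
running_loss = 0.0
for loss_value in batch_losses:
    running_loss += loss_value

print(buggy_total, running_loss / len(batch_losses))
```

With =+ only the last batch's loss survives, and dividing that single value by the number of batches is why the first printed loss looked too low.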
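A runnable sketch of the training loop on random stand-in data, so it finishes in seconds. Assumptions: the fake tensors replace CIFAR-10, the network is the video's layer stack written compactly as nn.Sequential, and it runs 2 epochs instead of 30. Loss values on random labels carry no meaning; the point is the loop structure.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Same layer sizes as the video's NeuralNet, written as nn.Sequential.
net = nn.Sequential(
    nn.Conv2d(3, 12, 5), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Conv2d(12, 24, 5), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Flatten(),
    nn.Linear(24 * 5 * 5, 120), nn.ReLU(),
    nn.Linear(120, 84), nn.ReLU(),
    nn.Linear(84, 10),
)

# Random stand-in data (assumption): 128 fake 3x32x32 images with labels 0-9.
fake_images = torch.randn(128, 3, 32, 32)
fake_labels = torch.randint(0, 10, (128,))
train_loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=32, shuffle=True)

loss_function = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

epoch_losses = []
for epoch in range(2):                      # the video uses range(30) on real CIFAR-10
    print(f'Training epoch {epoch}...')
    running_loss = 0.0
    for inputs, labels in train_loader:
        optimizer.zero_grad()               # clear gradients from the previous step
        outputs = net(inputs)               # forward pass
        loss = loss_function(outputs, labels)
        loss.backward()                     # backpropagation: compute gradients
        optimizer.step()                    # update parameters
        running_loss += loss.item()         # accumulate with +=, not =+
    epoch_losses.append(running_loss / len(train_loader))
    print(f'Loss: {epoch_losses[-1]:.4f}')
```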
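A sketch of the save/load round trip with torch.save and load_state_dict. To keep it quick it uses a tiny nn.Linear stand-in (build_model is a hypothetical helper) and a temporary directory; the same calls apply unchanged to the CNN.

```python
import os
import tempfile

import torch
import torch.nn as nn


def build_model():
    # Tiny stand-in (assumption) so the example runs instantly.
    return nn.Linear(4, 2)


net = build_model()
path = os.path.join(tempfile.mkdtemp(), 'trained_net.pth')
torch.save(net.state_dict(), path)            # saves only the parameters, not the class

restored = build_model()                      # the architecture must be rebuilt first
restored.load_state_dict(torch.load(path))    # then the saved weights are copied in

same = all(
    torch.equal(a, b)
    for a, b in zip(net.state_dict().values(), restored.state_dict().values())
)
print(same)
```

Because only the state dictionary is saved, the loading script has to construct the same architecture before calling load_state_dict.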
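The accuracy bookkeeping from the evaluation loop, shown on one hand-built batch so the result can be checked. The outputs and labels are hypothetical values, not real model predictions.

```python
import torch

# Hypothetical scores for a batch of 4 images (10 classes each) and the true labels.
outputs = torch.full((4, 10), -1.0)
outputs[0, 3] = 5.0   # highest score at class 3
outputs[1, 8] = 5.0   # highest score at class 8
outputs[2, 0] = 5.0   # highest score at class 0
outputs[3, 1] = 5.0   # highest score at class 1
labels = torch.tensor([3, 8, 0, 9])

correct, total = 0, 0
with torch.no_grad():
    _, predicted = torch.max(outputs, 1)           # index of the largest score per row
    total += labels.size(0)                        # number of images in this batch
    correct += (predicted == labels).sum().item()  # count matching predictions

accuracy = 100 * correct / total
print(f'Accuracy: {accuracy}%')
```

Three of the four predictions match, so this sketch reports 75.0%.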