Transcript for:
Lecture on Convolutional Neural Networks (CNNs)

Okay, so can you guys see my screen? Yes, you can see your screen. Okay, so today we'll be going over CNNs, which are known as convolutional neural networks. It's a subset of AI that's used a lot because it's really efficient and it can handle a lot of data, so you're going to be working with it, and you might want to use it in your final project too. It goes really in depth, so we wanted to do a whole lesson on it.

Today's agenda: first we're going to review neural networks. We went over them briefly in the first class last week, but we stayed pretty surface level, so we'll review the basics you already know about neural networks. Then we'll obviously go into convolutional neural networks, and then breakout room discussions, because when we go over CNNs there's going to be a lot of information thrown at you, so it'll be helpful for you to talk to each other and discuss anything you have questions about or want to clarify.

Yeah, just to add on to that, in the breakout rooms we will also do some programming practice. It will split you up randomly, so you can meet some of your peers and work on an example together. We have a code skeleton for you to continue and try to figure out.

Okay, I think I've got it. So we're also going to go over techniques you can use to improve your models: make them more efficient, run faster, have less computational strain, and be more accurate. Then we'll go over ways to evaluate your model, like how accurate it is and how good it is at performing the task it's supposed to. And then we'll go into some hands-on projects and show you what it would look like if you actually had to code it. If nobody has questions, I'm going to begin the actual lesson.

Okay, so the first part is a review of neural networks. A neural network is a specialized algorithm that's capable of improving on itself by utilizing a data set you provide: it learns from the data set and improves on itself to perform its specific task. Convolutional neural networks are usually used on images for a variety of tasks like image segmentation and classification; you see them in self-driving cars, in medical technology, and in a bunch of other places. They're used for object detection, image segmentation, and image classification. Object detection would be, say, if you had a picture of a bunch of animals and you wanted to detect a cat in it. Image segmentation would be if you wanted to segment out the cat, so you get an actual analysis of exactly where the cat is. And image classification would just be classifying what's inside an image, like an image of a cat.

Yeah, just to add on to that, we'll be covering the two highlighted in orange, image classification and digit recognition, in depth today.

Okay, so now I'm just going to briefly go over the neural network layers, which is what we went over last time.
The input layer is obviously the first layer; it receives the input, so whatever you want to train your model on, your data set, is what goes into the input. The hidden layers are a series of nodes that apply transformations to manipulate that data and learn complex patterns. If you have one layer it might only learn one thing, but with multiple layers you can learn different things: one layer could learn the shape of something, another could learn edges, and so on. The output layer is just what those hidden layers produce at the end, which, if you were doing binary classification, would be a yes or a no, a certain diagnosis or not.

Neural networks on their own, a basic neural network, are usually used for binary classification, meaning something either is or isn't a thing: you're deciding between one option and another. The number of output nodes corresponds to the number of outputs, so more nodes means more possible outputs, and you need more computation to actually reach those outputs. If you only have a few nodes you'll probably have one hidden layer, and that corresponds to binary classification like I said, but once you start bringing in more nodes it becomes more feasible to use a convolutional neural network as opposed to a basic-architecture neural network, which you'll usually see doing binary classification.

There are a lot of different types of data. There's numerical data, which is obviously numbers, but a lot of deep learning networks, CNNs included, take different things, like images; they take various data that isn't just quantitative. There's also categorical data, like gender or country. So if you wanted to diagnose somebody, you would need to know their age, which is numerical, but you would also want to know where they live, because that could have implications, and their gender, because that could also have implications for the diagnosis.

Yeah, just to add on to that: qualitative data is categorical, as in it's a quality, and quantitative data is numerical, as in a number. Any time you have a number it's quantitative, or numerical, and whenever you have something like the color of an item, like "the item is blue," that's qualitative, or categorical, data. And adding on to that, AI models can be trained on any type of data, even qualitative data like text, but every single thing ends up represented as numbers and matrices using some linear algebra; we'll cover that a bit in the hands-on project. We also covered this in last week's session, so feel free to go back and watch last week's recording to review.
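To make the input / hidden / output structure described above concrete, here is a minimal sketch of a small fully connected network for a yes/no classification, assuming TensorFlow/Keras is installed; the three input features and the layer sizes are made-up illustrations, not the course's actual model.

```python
import tensorflow as tf

# A tiny fully connected ("basic architecture") neural network for binary classification.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),                      # 3 input features, e.g. age, location code, lab value
    tf.keras.layers.Dense(8, activation="relu"),     # one hidden layer of 8 nodes
    tf.keras.layers.Dense(1, activation="sigmoid"),  # single output node: yes (1) or no (0)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

The single sigmoid output node gives a value between 0 and 1, which you can read as the probability of a "yes."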
Okay, so we also want to distinguish between supervised and unsupervised learning, because that's a pretty important thing you should know how to tell apart. Supervised learning is when you provide a labeled data set. So if you wanted to distinguish types of fruit, and it was supervised learning, your data set would have the fruit images, but it would also have the labels: this is a watermelon, this is an orange. You'd be training the model on those images, but you'd also be giving it the labels so it can learn: it sees the watermelon, then it sees the label, and that's how it learns. Likewise, if you wanted to classify emails as spam or not spam, you'd have emails and each one would be labeled: this email is spam, this email is not spam.

Unsupervised learning is when you don't have labeled data. You're just feeding in raw data, and the algorithm has to identify patterns in that data and make its predictions. Usually that falls under something called clustering. Say you gave it a bunch of numbers: it doesn't know what the numbers are, it just has a bunch of numbers, so it tries to cluster them into groups based on patterns it sees. You'd end up with a bunch of different categories, but you wouldn't know what each category means; it wouldn't say "these ones are even, these ones are odd," it would just be cluster one, cluster two.

Customer segmentation in marketing is an example. If someone who runs a website wanted to see where their customers are going, what the most in-demand stuff is, or what customers who buy one thing would also like, they could take customer data, cluster it, and then look at those clusters to find patterns: "oh, these two things have high correlation, they're usually in similar groups, so that must be one segment." Then they'd analyze it and say, okay, a person who likes shoes usually also likes sweatpants, or something like that. You have to make the connection on your own; it isn't given to you.
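As a rough illustration of the clustering idea just described, here is a minimal sketch using scikit-learn's KMeans on made-up customer data; the two features (items bought and average order value) and the choice of three clusters are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer data: each row is [items_bought, avg_order_value]
customers = np.array([
    [2, 20.0], [3, 25.0],      # occasional small orders
    [40, 300.0], [38, 280.0],  # frequent big spenders
    [15, 90.0], [14, 95.0],    # somewhere in between
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)  # cluster IDs only -- the model doesn't know what each cluster "means"
```

The algorithm only returns cluster IDs; deciding what each cluster actually represents (say, "bargain shoppers" versus "big spenders") is still up to you, as mentioned above.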
Features and labels are also things you look at in machine learning models. Features are the attributes of the data being used for predictions; in a housing data set, features could be different things about a house, like the number of bedrooms, the location, and the size. They're the things the model can use to make its predictions. Labels are the outcome the model is trying to predict. For houses, the label could be the type of house, like whether it's a pre-colonial house or a manor, or it could be the price of the house. There's different stuff the label could be, but the features are the list of attributes about each data point that lets the model make the connections to reach that label.

Yeah, and adding on to that, for image classification specifically, features and labels are usually organized in a regular file directory. You have a folder of images, and they can be split into multiple folders; with the houses-and-manors example Arav gave, the house folder has multiple houses and the manor folder has multiple manors. There will also be a CSV file, which you can open in Excel if you're familiar with that, with two columns: one full of all the image paths and one full of all the labels. We'll see this in the hands-on project and how we can actually load that data for our program to use.
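Here is a minimal sketch of loading that kind of file with pandas; the file name labels.csv and the column names image_path and label are hypothetical and would need to match your actual data set.

```python
import pandas as pd

# Hypothetical labels.csv with two columns: "image_path" and "label"
df = pd.read_csv("labels.csv")
print(df.head())                          # quick look at the first rows

image_paths = df["image_path"].tolist()   # features live at these paths
labels = df["label"].tolist()             # one label per image
```

From there you would read each image from its path and pair it with its label for training.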
Okay, before you go to the next slide, I have a question for everyone, just to recall what you learned last session: what do you think a KNN algorithm is? Feel free to type it in the chat and explain it in your own words; you don't have to write a very formal explanation. For those who can't remember, KNN stands for K-nearest neighbors. I see some great answers; let's see if we can get one more. Okay, Arav, you can continue.

Yes, so K-nearest neighbors, we're going to go over that right now. If you have a certain data point, say a certain patient in a data set of people diagnosed with a disease, the algorithm finds a certain number of points that are near it and assigns a label based on the ones closest to it. So say you were labeling patients and one of your features was age, and a lot of people of the same age usually got diagnosed with a certain disease, say Alzheimer's. K-nearest neighbors would see a lot of people with similar ages, that feature would have a pretty big correlation, and it would diagnose based on that.

Selecting K, the number of neighbors, is a parameter that needs to be chosen. K is the number of nearby points you look at. If you only look at one or two points near your data point, it's less accurate, and there might not be a strong correlation between the feature you're looking at and the label you assign. The more neighbors you look at, the better you can see that correlation, but there's also a limit: if you use too many neighbors you end up classifying really broadly, because you're looking at points that are pretty far apart and diagnosing anyway, so it's not really looking at anything specific.

Yeah, so if you have a small K it can lead to too much noise sensitivity: a small variation and the label automatically flips. But a large K can make the classification too broad, lumping things into one class even when the variation is big. You want to find a good middle ground. Any questions on that?

And I just wanted to add on to this: for those who came to our last live session, can you recall why we would want an odd number for K instead of an even number, numbers like 3, 5, 7? We'll wait a little bit for answers. I see some good answers, great, yes, that's correct. The reason we go for odd numbers is that with an even number the vote can end up tied and the algorithm can't decide; an odd number doesn't allow for a tie. Yes, a voting process is a great analogy for it.
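As a rough sketch of the idea, here is K-nearest neighbors with scikit-learn on a made-up age-versus-diagnosis data set; the numbers are invented, and k = 3 is just an example of an odd choice that avoids tied votes.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical feature: age. Hypothetical label: 1 = diagnosed, 0 = not diagnosed.
ages = [[55], [60], [62], [30], [25], [28]]
diagnosed = [1, 1, 1, 0, 0, 0]

knn = KNeighborsClassifier(n_neighbors=3)  # k = 3: look at the 3 nearest points
knn.fit(ages, diagnosed)

# Classify a new 58-year-old by majority vote among the 3 nearest ages.
print(knn.predict([[58]]))
```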
Okay, so before we get into more of the CNN stuff, we want to go over the distinction between machine learning and deep learning. Machine learning is pretty straightforward: you have an input, then you extract some features, and then you do your classification into your labels, which in this case is car or not car. Deep learning goes a little beyond that: you have your input, and feature extraction and classification happen at the same time. It's extracting more complex features, not just features you hand it like age and location; it's extracting more complex relationships between, say, patients, and then you get your label. You have input layers, hidden layers, and the output layer, which in this case is car or not car. You can also go more complex than binary classification: instead of just "is this a car," you could ask "is this a car, and if it is, what type of car."

That concludes our review section. We'll do an opener now, which is a fun activity; just click the link. Okay, so we mentioned this very briefly last class: there's a pretty cool website you can go to right now if you're on a computer, called thispersondoesnotexist.com. You can refresh it to get a new person, and it looks very realistic. Every time you refresh it you get a different person. What it's doing is this: it's trained on a bunch of images of people, and then something called a generative adversarial network, or GAN, learns to create new images based on that. So this might look like a person, but it isn't technically a person that's been photographed; it's a fake image. The website is pretty fun; try the link in the chat.

Okay, so how is this site powered? As Arav talked about earlier, it's called a GAN, a generative adversarial network. "Generative" means it generates something, it makes something. "Adversarial" means two things battle against each other; you can think of it as a kind of game theory, but we don't have to worry about the adversarial part too much today. And the "N" in GAN stands for the two neural networks involved. As the bottom bullet point says, GANs consist of two neural networks, the generator and the discriminator. GANs in general are a class of machine learning frameworks: they generate new data samples that resemble the training data set. As we talked about last week, machine learning is an essential part of data science, and all these models do is take in some input during the training process, which is called a data set. A GAN basically uses that data set to try to generate something new that still resembles the original data it was trained on. So thispersondoesnotexist.com is trained on human faces, but because it's a generative adversarial network, it generates new images. In the chat I'll send a flowchart image that describes how a GAN works; it involves something called a noise vector, and we talked about vectors last week as well, so if you're interested you can look more deeply into that and rewatch last week's recording.

Next slide. So the GAN has been trained on some data set, for example five million images of human faces. In every epoch, which is one complete cycle through the training data set, it takes in all five million images, and in this case it also generates a noise vector for the GAN specifically, but we're not really covering GANs today. Epochs are basically universal across all sorts of machine learning models, and the thing with epochs is that with each one the generator improves at producing realistic samples, images in this case, and the discriminator, the other half of the GAN, gets better at detecting fakes. This is something I'd recommend you take notes on: an epoch consists of one complete cycle through the training data set. It's not like I take 5,000 of the five million images; I have to go through all five million images in my data set.

Continuing with that: the GAN specifically, and this is true for most image generation models, starts with an initial layer that produces a very low-resolution, noisy image, and subsequent layers gradually increase the resolution by adding details, which creates the realistic high-resolution image you see on thispersondoesnotexist.com. And then I'll hand it over to J or Arav.
Yeah, I guess I can do this one. So, nodes: you start with input nodes, which are in your input layer. A node takes in signals, values coming from the previous layer, and sends its output on to other nodes. Before it sends anything on, it performs some transformation. Each input of a node is associated with a weight, which is basically the weight of the feature that the node holds. Say one node is doing something that helps the model understand the edges in an image; the weight of that node corresponds to how much that feature matters in determining the final output class. You also have something called a bias, an additional parameter that allows the activation function to be shifted to the left or right, and it helps introduce nonlinearity at the node. We'll go over what an activation function is on the next slide, so don't worry if you don't know that yet, but weights and biases are associated with a node to express how important that node is in the overall process, in the final prediction. The whole flow is: the node receives an input from a node in a previous layer, applies whatever transformation that node holds, and that result, together with its weight and bias, is what gets passed on to the next layer.

Yeah, and I'd also recommend you take notes on the two concepts of weights and biases; this is very important for machine learning. As Arav said, weights are parameters that a model learns during training, and they get changed as you train or fine-tune it, usually measured in epochs like we just talked about. The bias allows the activation function to be shifted to the left or right, and it's another factor that affects the outcome of your model.

Okay, so the node output is what it feeds into the next layer, and the way that's calculated is: it takes a weighted sum of the inputs, each input multiplied by its corresponding weight. That's how the weight plays a role in what gets passed to the next layers and how important that feature is, because the weight is scaling the emphasis that the node's input gets. So it takes the sum of each input multiplied by its weight, plus the bias; the weights and biases are what determine the importance of a node. After you get the weighted sum, that value is passed into an activation function. There are multiple types of activation functions, and we'll go over them later, but the activation function introduces nonlinearity and decides the node's output, the final value that gets passed to the next node.

The reason activation functions exist is that different models want to emphasize different things, and a plain weighted sum keeps the model very linear. That's decent for binary classification, but once you want to do more complex things, like manipulating an image or audio, or diagnosing between more than two classes, you need activation functions. In a really basic neural network like the earlier picture, just an input layer, hidden layers, and an output layer doing binary classification with nodes of roughly equal weight and bias, an activation function wouldn't do much; but since each node here has a different influence, you need an activation function to decide what to emphasize and what each layer should take from the previous layer.
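As a tiny sketch of what a single node computes, here is the weighted sum plus bias followed by a ReLU activation, with made-up numbers for the incoming values, weights, and bias.

```python
import numpy as np

inputs = np.array([0.5, -1.2, 3.0])   # values coming from the previous layer
weights = np.array([0.8, 0.1, -0.4])  # one weight per input
bias = 0.2

z = np.dot(weights, inputs) + bias    # weighted sum: z = sum(w_i * x_i) + b
output = max(0.0, z)                  # ReLU activation decides the node's output
print(z, output)
```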
That's a great explanation. For those who might be a little confused about the weighted sum: the sigma sign you see in the first formula is called sigma notation, and I think you'd benefit from me explaining it a little. Before I get started: if you know what a for loop is in programming, it's very similar to that. So in the first formula, z = sum from i = 1 to n of (w_i · x_i), plus b, the part I boxed, the i = 1 below the sigma and the n above it, means you start a counter i at 1 and end at n. For example, if n = 10, you'd have i = 1, 2, 3, all the way to 9 and 10, and you plug each of those values of i into the expression after the sigma. The dot means multiplication, so for i = 1 you multiply the weight w_1 by the input x_1; you do the same for i = 2, and so on up to i = n, add all of those terms together, and then add the bias b at the end. Verbally: you multiply the weights with the inputs, sum them up, and add the bias, which, as we talked about, shifts things left or right. I hope that clarified things for those who have never seen sigma notation; I also sent a picture in the chat you can refer to.

Okay, so before we move on to CNNs, there's one more thing you need to know about basic neural networks: they have fully connected layers. Each node is connected to every node in the next layer; everything goes completely forward and it's all connected, which is why they're called fully connected layers. CNNs don't have that at the beginning, but at the end, when they're doing classification, they do use those fully connected layers, so keep in mind this isn't exclusive to plain neural networks. Each connection between nodes has its own weight, and that leads to a very high number of parameters if you're using a basic neural network on large inputs, because every single node is connected to every other node and each connection has its own weight. That takes a lot of computation and a lot of GPU to run and to make accurate, and it makes the model prone to overfitting, where it over-classifies everything, because so much emphasis is put on each node as everything is fed forward.

So, for example, say you had a neural network distinguishing between animals and you gave it a picture of a horse. It sees the horse has four legs, and one of the nodes says "this is the shape of leg that's usually associated with a dog," and it might classify it as a dog because it's putting so much emphasis on that one feature, and it's doing that for all features. It emphasizes every feature to the point where each one has a huge influence over the final output, so if anything is similar between two different data points, it classifies them as the same thing even if they're completely different.
So CNNs are a bit different. They have something called convolutional layers, which is where they learn their features, something known as a max pooling layer, and a dense layer. The convolutional and max pooling layers are what's exclusive to CNNs, and the dense layer is where it goes back to fully connected layers, the basic neural network framework, for its final output. What it learns through the convolutional and max pooling layers gets passed to the dense layer so it can produce the final output.

Yeah, so convolutional neural networks are a class of deep learning models primarily used for image tasks, anything related to images. The way they do that is by automatically learning spatial hierarchies of features across multiple layers. Spatial hierarchies would be things like volume, shape, and edges; that's how the network learns, and it puts more emphasis on one than another depending on which gives a more accurate classification. On the right you can see an MRI of a liver, and the regions in red are segmentations of liver cysts, which are abnormal tissue inside the liver. If you wanted to teach a model to segment those cysts, it would learn from different parts of the image: a certain circular shape, specific edges, the specific area it tends to appear in; those are the kinds of features it looks at.

Okay, so this is another example of the model framework: you have the input, and that's fed into convolutional layers and pooling layers, which can be repeated. Whoever has their mic on, please mute. Arav, can you pause for a second while I answer a question? Okay, so Javil, I think that's how you pronounce your name, sorry if I got it wrong, asked about a neural network classifying things as the same even when there's no real similarity. I think what he's asking about is that a neural network is prone to overfitting. I'll wait for him to rejoin before I explain. Okay, I think he just joined. So what that means is: if you have two data points that are not in the same class, not under the same label, but they share some similarity, like a watermelon and an orange have a similar shape even though they're different sizes, then with overfitting the model notices "this is the same shape," and since everything is emphasized so heavily and the weights aren't really moderating that influence, it sees the shape and automatically says "this is a watermelon" even if you gave it a picture of an orange. That's what overfitting looks like: putting something in a class it doesn't belong to because a certain feature is being overemphasized.
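Here is a minimal sketch of that convolutional → pooling → dense stack in Keras, assuming TensorFlow is installed; the input size (28×28 grayscale, as in the digit-recognition task mentioned earlier), the filter counts, and the ten output classes are illustrative choices, not the exact architecture from the slides.

```python
import tensorflow as tf
from tensorflow.keras import layers

# conv -> pool blocks learn features; the dense layers at the end do the classification
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),             # e.g. 28x28 grayscale digits
    layers.Conv2D(16, (3, 3), activation="relu"),  # 16 filters with 3x3 kernels
    layers.MaxPooling2D((2, 2)),                   # downsample by 2 in each direction
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),           # fully connected ("dense") layer
    layers.Dense(10, activation="softmax"),        # one probability per class
])
model.summary()
```

model.summary() prints each layer with its parameter count, which makes it easy to see how much heavier the fully connected layers are than the convolutional ones.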
Yeah, I'll also add on to that. Overfitting basically means the model doesn't really learn the general patterns; it learns the patterns in the training data for each label, but it also learns all the noise and outliers. For example, if you have too many features, it doesn't really know what to do with them. A symptom of this, which we touched on earlier, is the gap between training accuracy and test accuracy: the model does really well on the training set, but when you give it a test set, data it wasn't exposed to during training, it fails to generalize that knowledge, and there's a huge gap. When you see the two curves diverge, you know that's overfitting and you'll have to do something to counteract it; we'll talk a lot about ways to mitigate overfitting in later slides. Yes, that's part of the techniques for making your model more accurate and less prone to overfitting.

So, convolutional layers are what we were talking about: that's how a CNN learns patterns. It's a set of operations applied to the input data to extract features from that data, using something called filters, or kernels, which are small matrices that slide over the input and detect patterns. The output is feature maps that highlight the presence of the specific features it's looking for. The stride is how far the filter moves each step. In the example on the right, say you gave it a stride of two: it would do its 3×3 window at the top left, and then it would skip over by two positions instead of one. Let me annotate it: the first window is this 3×3; with a stride of one the next window shifts over by one, but with a stride of two it jumps from the original window straight over by two. That's what the stride does; it gives you a smaller output, and you might want that if there's a specific thing you want to emphasize or if you want to reduce computational strain.

Right here we have something called edge detection. The kernel is 3×3, and it's being compared against each 3×3 patch of the input. The actual arithmetic is a bit more involved than I want to go into right now, and it's a lot clearer when you're not just using numbers, but think of the output as what the layer is learning from the data: you take this specific kernel and place it over the input a bunch of different times, with a stride of one since we're not specifying otherwise, and at each position you compare the kernel to the patch underneath it.
It goes one by one over each of these patches. Say it's on the top-left patch: you'd get one specific value in the convolutional result. It'll be a little clearer when I get to the max pooling layer, but the idea is that the convolutional layer's output feeds into further layers so the network can predict a specific class; with these plain numbers there isn't much to predict, it's just an example. So a stride of one moves one pixel at a time, and a stride of two skips every other pixel.

Okay, max pooling layers, which were in that image. What pooling layers do is downsample the dimensions of the input. We said CNNs usually work with images, and if you fed a full-size image through every single layer it would take a long time to train and a long time to make predictions, so you want to reduce the dimensions and make the image smaller. That reduces computational complexity and also helps control overfitting, because the network no longer looks at every single feature with equal emphasis; it narrows down to specific features, retaining the important ones and discarding the others. That's how it helps prevent overfitting: it's not emphasizing everything equally.

A pooling window is usually 2×2 or 3×3, and it either takes the maximum value within that window or the average value. Say you take the maximum of each 2×2 window: the maximum of the first window here is 9, so 9 goes over here; the maximum of the next one is 8, so 8 goes here; then you go down and get 7 and 9. This one's a lot simpler than convolution because all it's doing is reducing the dimensions. Right now we're just using numbers, but say you had an actual image and you wanted to do max pooling: it works on pixels, and it downsamples toward the most informative ones, which comes down to pixel intensity. On a gradient between zero and one, just black and white, the pixels closer to white or closer to black are the more informative, more pronounced ones, as opposed to the ones in the middle. So pooling layers just make the image smaller, and the convolutional layers are what hone in on the features.
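Here is a minimal sketch of 2×2 max pooling with stride 2 in NumPy; the 4×4 matrix is made up to reproduce the 9, 8, 7, 9 result described above rather than copied from the slide.

```python
import numpy as np

x = np.array([[9, 1, 8, 2],
              [3, 4, 5, 6],
              [7, 2, 1, 9],
              [0, 5, 3, 4]])

# 2x2 max pooling with stride 2: keep only the largest value in each window.
# Reshaping splits the 4x4 grid into four 2x2 blocks, then we take each block's max.
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)   # [[9 8]
                #  [7 9]]
```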
Yeah, I'll also add something to that. Pooling layers, as Arav explained, basically just shrink down your input; as you can see, it shrinks from 4×4 to 2×2, and that makes computing a lot easier and also helps control overfitting. I'd recommend you take a note on that, something like: pooling layers = reduced computational complexity + overfitting control. And the most important part, of course, is that it still detects the complex features, which matters a lot if you have, for example, X-ray scans, where the signs of a disease are really small differences between the images; pooling helps with detecting those small differences.

For those who are interested in how pooling prevents overfitting, since some of you might have that question: because you're separating the input into smaller sections and then aggregating them, it introduces something called spatial invariance (I'll type that in the chat). It makes the network less sensitive to the exact position of a feature within the image. For example, if you had pictures of a chicken where its feet were always on the ground, the network could learn that "the feet are always at the bottom of the image," and if you then flipped a picture 180 degrees upside down, it would have no idea what it was; spatial invariance helps avoid that kind of dependence. Does anyone have any questions? All right, we can continue.

Okay, I actually want to go back a slide and go over the convolutional layers a little more, because I don't think my explanation was that clear. What it's doing is taking this kernel, multiplying it element by element against the patch underneath it, and adding the results together, so you get a single value for each kernel position. You're doing 1 × 1, 0 × 1, and so on, matching each position in the patch with the exact same position in the kernel, and then adding the products together; that's the value you get for that position. The result for this first position would be 1; if you move one to the right you'd get -1, then -1, then 2, and going down, 2, and so on. You go through each position, get a single value for each, and the collection of those values is the convolutional result, which is itself a matrix.
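Here is a minimal sketch of that multiply-and-sum sliding window in NumPy; the 5×5 input and the vertical-edge kernel are made up for illustration rather than copied from the slide, and the stride parameter shows the skipping behavior described earlier.

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide the kernel over the image; at each position multiply
    element by element and sum the products into one output value."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(window * kernel)
    return out

# Made-up 5x5 image: dark on the left, bright on the right (a vertical edge).
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
# A classic vertical-edge-detection kernel.
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)
print(convolve2d(image, kernel, stride=1))
```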
Then activation functions are functions you apply to the output of each node. They introduce nonlinearity into the model so it can learn complex functions. There are a lot of different activation functions; the most common are sigmoid and ReLU, the rectified linear unit. ReLU is probably the most widely used because it has good computational efficiency while also being pretty accurate, and these two are pretty simple: ReLU just takes the maximum of zero and the value you pass in, so if a node computes a specific value, ReLU outputs either that value or zero. Some activation functions get more complex, like maxout, which looks like it's doing a lot more, but something like that is typically too computationally expensive to be viable in most scenarios, and it might not even be tailored to your task, so it could end up prone to overfitting or less accurate depending on what you're doing. So: you have your input, the nodes with their associated weights, and what each node outputs is fed into the activation function before you get its final output.

Yeah, on the difference between sigmoid and ReLU and when to use each: ReLU is really good for detecting edges, and sigmoid is good when you have, for example, a lot of noisy matrices from images; it helps generalize them. Darren asked why we want nonlinearity: with ReLU, if a value is less than zero it gets set to zero, and if it's above zero it's kept, which helps eliminate unneeded information in the matrix and helps detect edges. You'll see in the hands-on project how we can use ReLU to our advantage.

Yeah, I'll explain activation functions a little more. The reason you want nonlinearity, Darren, is that the raw outputs carry a lot of noise and a lot of values you generally don't want. A network will sometimes output negative values, and in this setup those negative values are basically just noise, so ReLU sets them straight to zero, which is why you see a flat line on the negative side of the ReLU graph. I'm going to do some annotations here: think of it as a coordinate plane; anything on the negative side gets set to zero, and the positive side passes straight through. Visually, you can think of the activation function as turning the raw number your network returns into something that can actually be used for things like image classification or predictions. We're mainly going to focus on ReLU and sigmoid for activation functions. And yes, J, these functions are non-repeating; the reason you wouldn't use something repeating like sine is that you want consistent behavior. ReLU sets all negative values to zero and has a linear path for the positive values, which is especially helpful for hidden layers, and both ReLU and sigmoid are nonlinear and compress the output into a smaller range, which is what's useful to us. Does that answer your questions? Yeah, so most of the time we'll use sigmoid or ReLU, one of the two.
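Here are the two functions written out as a minimal NumPy sketch, with a few made-up values to show how ReLU zeroes out negatives while sigmoid squashes everything into the range (0, 1).

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)        # negatives become 0, positives pass through unchanged

def sigmoid(z):
    return 1 / (1 + np.exp(-z))    # squashes any value into the range (0, 1)

z = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(relu(z))      # [0.  0.  0.  1.  3.]
print(sigmoid(z))   # values strictly between 0 and 1
```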
Okay, so when you get your final values, they're used by something called softmax, which happens in the classification part of the CNN. It doesn't happen after each node, and not after the convolutional layers; it happens at the end. Once you have all your individual values from the series of convolutional and pooling layers, you feed them into softmax to get your final probabilities: the probability that the input belongs to each given class. So if you were deciding what type of car a certain image is, it would give a probability for each type of car; say Honda is the most likely and it gives 60% probability for Honda and 5% for Lamborghini, it gives you a probability for every class you're looking at. Each probability is between zero and one, and you get one per class no matter how many classes you have; if there are a hundred classes, you get a probability for each of the hundred, and the highest one is the class you assign. All the class probabilities sum to one, which is 100%, and if one class gets a probability of one, that means the model is completely certain the input belongs to that class.

The steps softmax takes: it uses the exponentiation of each logit. A logit is the raw, unnormalized score output by the final layer of the neural network, whatever number the network gives you. Exponentiation means e to the power of x, and the reason for using e^x instead of just x is that you might have a mix of negative and positive values, which would cause problems if you worked with them directly; e^x is always positive no matter what, so that isn't an issue. Then it computes the sum of those exponentiated logits, adding every one of them together, and divides each individual one by that sum to get the probabilities. So every probability ends up between zero and one. The way this works in practice is: the output layer gives you a series of numbers, you feed them into the softmax function, and it gives you a probability for each class.
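Here is a minimal NumPy sketch of exactly those steps (exponentiate, sum, divide); the three logits for cat, dog, and horse are made-up numbers, and subtracting the maximum logit first is a standard numerical-stability trick rather than something from the slides.

```python
import numpy as np

def softmax(logits):
    exps = np.exp(logits - np.max(logits))  # subtracting the max keeps exp() from overflowing
    return exps / np.sum(exps)              # divide each term by the sum -> probabilities

# Hypothetical raw scores (logits) from the final layer for [cat, dog, horse]
logits = np.array([1.2, 4.0, -0.8])
probs = softmax(logits)
print(probs, probs.sum())   # one probability per class; they always sum to 1
```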
So say the output layer gave you 5.1 for some class, and softmax tells you that corresponds to a 90% probability; those raw numbers correspond to individual classes, individual labels. It's clearer with a qualitative example: you feed in a picture of a dog, and the CNN gives you these raw numbers for each label. Cat got 35.4, which is really hard to interpret as a probability, but after the softmax function cat only gets 0.06, about 6%, which means the final class is probably not cat, and dog gets 0.94, 94%, so you assign the final class as dog.

Yeah, that's correct. Can you go back one slide? This is the last slide before we take a little break, but I just want to talk about what this means. This curve is also similar to ReLU but a different shape; you can think of it as some sort of blend between ReLU and a sigmoid function, it still looks like a sigmoid, so you can visualize it that way. And as I was saying earlier about negative values: if the CNN returns a negative value, for horse for example, that's mostly noise, and we don't want negative values, so for all negative values we essentially push the probability down toward zero, so there's no noise or unwanted stuff in the output that gives us the probability of cat, dog, or horse. Also, the way softmax handles those negative values is by raising e to the power of each score, so you never end up with a negative value, and it then finds a probability for each class, normalizing everything into something you can actually interpret. Exactly: it takes the output from your output layer and gives it back in the form of probabilities. And in a lot of cases these functions magnify the share of your largest raw score: for example, dog here returns 38 and that turns into a very large percentage, because that's the most confident answer the convolutional neural network can return; after going through the softmax activation function, it returns this probability.

Should we take a five-minute break? Yeah, ten minutes is good; let's do a ten-minute break, and then we'll come back and go more in depth on those techniques we were talking about that mitigate things like computational strain and overfitting. If anyone has any questions during the break, feel free to ask; I'll try to answer them as best I can. Just open your mic, since we're on a break. I'm going to send a ten-minute timer, so everyone come back at 12:21 p.m.
That's my time zone, so it might be different in yours; just remember, ten minutes from now.

Hey everyone, it's been ten minutes, everyone come back. Nice. For those who have come back, type hi in the chat, or say where you're tuning in from; I know we have a lot of people from Uzbekistan. Hello, hello. We'll do an attendance link right now. Nice, someone from India; do we have any people from the US? I'm from the US. Okay everyone, here's the attendance link; you can check in with one tap, just click the link I sent in the chat and we'll be good to go. All right everyone, we're back from the break, sit back down and we'll get started again, and also click the attendance link if you haven't.

Will the slides be provided? We will not provide the slides, but you'll have access to the entire recording; you can just skip to where you left off on YouTube. All you have to do is the exit ticket. The exit ticket is basically a reference for us that you listened to the entire lecture and took notes; it covers concepts from each part of the lecture. There might be one question about the review we did, and some questions about the other parts, a bit of everything. Your score on it doesn't really matter, but still try your best, not so we can penalize you or anything, but so you can see whether you're actually understanding everything and whether you can really use it for your final project. Exactly, and as Arav said, you can redo the exit ticket as many times as you want to get full marks. Just please do it, and if you see any mistakes you made, try to fix them, because that's very good practice for your understanding; human minds are basically wired for practice, and doing things like the exit ticket really helps you remember the concepts we covered. We'd also recommend you take notes if possible. Let's start.

Okay, so resuming where we left off, we talked about overfitting and underfitting; here's a pretty good visualization of it. For a training set, you can imagine just plotting points on a graph like this. With the blue points here, there is some trend, but there's also some noise, and that's usually the case with most data: there's going to be some form of a trend, but also some noise, which is what you don't want. The first image over here is an example of underfitting: it's basically a straight line, it's very basic, and it doesn't capture the trend. The trend, if we were to draw it out, would look something like this; it follows the shape of the dots. For neural networks it's more complicated than a straight line, and you'd want a proper fit to actually capture this trend. Proper fitting is a good representation of the trend that doesn't chase the noise; for example, there's some noise down here and some noise over here, and it doesn't really capture that, because the majority of the points sit on the trend line.
However, on the right-hand side there's an example of overfitting, and what you can see is that it basically connects the dots. It's very, very specific; it doesn't really capture a trend so much as fit every single data point, which includes the noise, so it's no longer a smooth line, it's a very jagged line. That's something you don't want, because with overfitting the model learns the training data very well, so it has a very low training error, since it basically memorizes what the training set looks like, but if you feed it a different point that it wasn't trained on, the generalization error is very, very high. That's why you don't want overfitting or underfitting; you want a proper fit that actually captures the pattern.

Yeah, adding on to this: underfitting usually happens when your model isn't complex enough, you don't have enough layers, or you need to fine-tune your hyperparameters; we'll talk about hyperparameters when we do the hands-on project. Overfitting usually happens when your model is too complex or there's too little data, for example a data set of only 100 images, which is very small. You really want to be right in the middle. And just to add to this: last session we talked about regression, and regression is a perfect example of a good fit, but regression won't necessarily apply to these kinds of data points, because the data can be very noisy, going from one extreme to another and fluctuating a lot. Machine learning algorithms are basically a more complex form of regression; that's why you see these kinds of curves, and those kinds of functions can be generated using machine learning. Also, for those who just came back, click the attendance link in the chat if you haven't already. You can go to the next slide.

Okay, so now we'll talk about loss functions. Arav, do you want to explain this part? Yeah, sure, I can explain this. What loss functions do is measure your model; I don't want to just say "how accurate it is," but they quantify how good it is. They measure the difference between the predicted output and the actual target that's in your data set. Loss functions are used for supervised learning, because you need a target value, you need to know what the actual answer is; the loss function compares each predicted output to the actual label for that example. There are different types of loss functions depending on exactly what you want to do. One is mean squared error, which is usually used for regression but also shows up a lot in super-resolution for images, and sometimes even in audio and NLP tasks. All it does is take the actual value minus the predicted value and square it, so it's the squared difference between the two, and it averages that over every example.
And then binary cross-entropy is used for classification tasks between one class and another, so binary classification tasks. For each example it takes the log of the predicted probability, roughly y times log(p) plus (1 minus y) times log(1 minus p), sums that over all the examples, and negates it. It looks a lot more complex, but it's usually used for somewhat simpler tasks: you usually use binary cross-entropy when you have a large dataset and you just want a straightforward two-class classification. So even though the formula looks more complicated, it's used for simpler tasks than mean squared error in that sense, because mean squared error is more informative about the average size of the errors, while binary cross-entropy reflects how confident the model was on each individual prediction, which can be a little harder to interpret. Then on the right, this is a training and validation loss graph. The x-axis is the epochs, which is how many times you've fed in all of the data, and the loss is on the y-axis, so you can see how the loss is being minimized, because you always want to minimize the loss in your loss function, and the more you train, the lower the loss gets, so the more accurate the model becomes. What overfitting would look like on a loss graph is that your training loss would be really, really good, but your validation loss would be much worse. On this graph things look fine; validation is usually going to be a little worse than training, but it should not be dramatically worse. For example, here the training loss goes below 0.1 by the 500th epoch; if the validation loss were something like 0.5 (I don't know exactly which loss function they're using, but say it was five times higher), that would be an indication of some overfitting going on. Arov, can you go back one slide? One second. Okay, so just to summarize, and this is what you can write down in your notes: the loss is basically how well this machine learning model or neural network performs, and you can think of it as the difference between the predicted output and the actual target values the model is supposed to produce. A loss function is the function that calculates that loss, and you can loosely treat the loss as a stand-in for accuracy: lower loss, better model. For the graph on the right-hand side, the training curve is your training set: with each epoch you measure the loss on the training set, and as the epoch number increases, meaning the number of times you go through the entire dataset increases, the model learns the patterns in the training set, so you can see the training loss decrease significantly. As for the validation set, it is not exposed during training. Training and validation are two separate things: you train on one and you test on the other, and they never overlap, so the validation would be the actual
representation of your model or your network skills and once you see usually for these loss graphs we also talked about overfitting earlier if you see the validation go like this go up and it diverges with the training that means that's a signal of it overfitting and you should probably like stop right here you know you shouldn't have kept trained and we'll talk about this um more in the in next slide yeah so one analogy for like validation versus testing U I mean sorry training imagine training split is like your lesson content in school and then validation is your actual test except in machine learning this H happens multiple times um the number of epoch and then um it gets feedback from the validation Loop and updates its weights and biases okay so back propagation is a supervised learning algorithm where you trying to minimize the loss function so it updates the weights depending on the the output you get in the loss function the loss function is kind of telling you the error in the model and you want to minimize the loss function you want to get close to zero right and the way that you do that is back propagation so you're updating the weights um to change the loss in your loss function so first the network passes through for pass and get cre prediction then you compute the loss so compute the loss of that prediction and then you compute something called the gradient which is when you're you're Computing um how much a difference in changing the weight so how much if you change this this node's weight how much will that impact the loss that's what that gradient is and you're Computing that for all of your nodes right and then you try to um update the weights to minimize the loss in the most effective way so there's a bunch of different back propagation algorithms right but um think about as like an Optimizer there's one thing called atoms Optimizer that's one thing that you're probably going to use if you use a CNN um all it's doing is it's optimizing the weights to minimize the loss and make your model more accurate is there any questions on that back propagation is very important for supervised learning which is what we will cover today so there's a bunch of different strategies for mitigation for mitigating overfitting yes yeah for over to reduce overfitting we talk about early stopping just stopping the model once you notice it start to Veer off and start to get really high loss values um there is something just simplifying the model if it's too complex you're going to start to kind of overfit and you're going to start to train too much on your own data and not be applicable for other data you could also increase your trading data because it could be underfitting or overfitting could be caused by having less data so it starts to kind of um hyperfixate on small things within that data so you can increase the training data if that's it um regularization um it starts to kind of penal uh penalize large um large um outliers and it prevents it from fitting within the noise so it stops it from including noise in its data and it kind of just doesn't include it so it regularizes the data into more normalized values um and then drop out which randomly drops nodes um from the network so that it prevents co- adoption Co adaptation so what that is it's it kind of like if you randomly took um nodes and their connections away from neural network so if you sto looking at features at random because usually in your own networks I know the diagrams only have like four nodes but you're going to have a 
lot more nodes right each layer would have way more nodes and each node's looking at completely different things and some of them you're not going be able to see exactly what they're looking at so when we you say that looking at edges and shapes it's not like we're manually doing that it's not like they're telling us they're doing that we're kind of hypothesizing that's what they're looking at because that's what the final output is the final output includes something that if we were to look at it we'd like okay so this is circle and right so that's things that we're going to notice about it but each node kind of does something very specific so by randomly dropping out nodes you're able to kind of stop overfitting by making small specif specificities uh stop making huge differences in the model so if it's looking at something so specific that's not really impacting the final class and it keeps hyper fixating on that by dropping nodes at random you might get rid of that it's it it's it's a lot more accurate than it sounds at at actually reducing the overfitting okay so now we're going to go into some techniques that you can use to you know prevent over fitting to make your model more accurate just overall a lot of that stuff we're going to go into it way more in depth right now so I think um Sam you want to do that sure um one second okay um okay so we go into CNN techniques what is co adapt adaptation yeah we will cover um this okay so next slide so we'll talk about future hierarchy um repeating convolutional and pooling layers to detect more sensitive and smaller features so basically what that is is essentially um as you can see uh uh this is detecting the number one and as you can see what is the um I guess feature of the number one that this is writing we'll we'll we'll do this um in a like interactive activity later but um this is the number one and in this picture it is written like this so the model repeats pooling layers to detect those smaller layers I'll use a different color the most distinct here it notifies that this this Square over here is basically this feature of the number one so the pooling layers that we talked about earlier is um because it separates in it separates the entire image into smaller and smaller um I guess um parts and then aggregate it together it is able to detect sensitive um features and next slide and it also allows a redu computational strain as we talked about earlier and it also improves the feature that the model can detect so overall like the reduction in computational strain is that you're looking at smaller and smaller dimensions of an image increasing the specificity of what you're looking at that's ma spooling so it's just reducing the dimensions yes also is kind of like if two different layers two different nodes I meant like two different nodes are supposed to be looking at two different features hyper fixate on the same thing that it kind of gives like double importance to that same feature so it's going to create overfitting based off of that so let's say it starts to hyperfixate on let's say you were distinguishing animals right and for some reason two of the the nodes are looking at something related to four legs the specific animals having four legs and then it starts to overfit and it starts to think that in your validation that a lot of the cats that you're showing are horses because they have four legs and horses have four legs too so it must be horse that's kind of like overfitting because it's two different things that are separate 
but still contributing to the same thing so creating Dropout what that kind of does is since you have so many nodes it ensures that your nodes are at least going to be a little bit varied yes and then we'll talk about Skip connection I think I can do skip connections um just because I have a little bit more like oh yeah sure sure feel free to do I use skip connections a lot in like personal projects of mine and Skip connections are usually used when you're doing two different different paths that are connecting to each other right it's not like a CNN going left to right all the way through it's something that is connecting to another path that has a completely different purpose right so you're not going to use it unless you're doing something with images usually like you're usually going to use it for images right and the thing that uses it the most something called a unet which is used for medical images and Skip connections what that do what does is it it provides output about features and dimensions to one layer uh from another layer and it doesn't always have to be adjacent it can be like it can go anywhere right um and as you can see here so the white skip connects to the the white right so what that's going to be passing is whatever the feature the white is learning is going to be passed to the same one on the right and that is going to be passing Dimensions features all that stuff so let me kind of go more in depth on this right so the encoding path would be learning all of those features the encoding path would learn the complex features it would reduce the dimensionality through Max pooling layers or whatever pooling layers and that would kind of give you the features and it would make the image a lot smaller right decoding path is let's say you want to segment the image um you would make the image back to its original size and then you would display those features onto the image back onto the image as those segmentations like if you wanted to segment the eyes off of a dog that would be the segmentation that would what decoding would put onto that onto that image of a dog right it would be that segmented eyes so that's kind of what that's doing and if you didn't have these skip connections it would be really hard to like display those features that you learned and it would be hard to keep all those while you're reducing Dimensions right because you're reducing the dimensions right so something you learn when the image is like let's say 256 x 256 pixels might be hard to still keep intact when the image is like 24x 24 pixels at the end right so you use skip connections to display the images dimensions and the features that you learned at that stage onto its corresponding stage and the decoder path right um so yeah it's just a good way to pass that information and it's usually used in unet like if you guys want to use image segmentation I would suggest unit right so if you guys decide that for your final project you're going to need to use skip connections so I would take note of that we'll allow people to uh wait a second to maybe take some notes that's okay yeah that's a great explanation uh you can go to the next slide and then we'll talk about pattern so this is a lot of text basically you can think of it as adding extra pixels which is zeros um around the border of the feature map before doing doing um more convolution um stuff I guess so um this would be let me find if I can yeah I actually have an example of this so this is what it would look like SEC so something like that um 
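Before looking at that padding example, here is a minimal sketch of the same idea in PyTorch; the image size and layer sizes are made up for illustration:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)  # a fake single-channel 28x28 image (batch of 1)

# "valid"-style convolution: no padding, so the output shrinks
conv_valid = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=0)

# "same"-style convolution: a 1-pixel border of zeros keeps the size the same
conv_same = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)

print(conv_valid(x).shape)  # torch.Size([1, 8, 26, 26]) -- smaller than the input
print(conv_same(x).shape)   # torch.Size([1, 8, 28, 28]) -- same spatial size as the input
```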
I sent this in the chat. So basically you can see it pads the image with zeros, and when it goes through a convolutional filter you can add different kinds of padding; there are two main types. With valid padding, no extra border is added and the output dimensions are actually smaller than the input. With same padding, which is the example I showed in chat, padding is added around the input so that the output dimensions match the original input dimensions. So overall, padding is used to preserve the spatial dimensions, and it adds extra pixels, usually zeros, around the feature map. Now we'll talk about dropout. We mentioned dropout earlier: it is a technique that simply removes certain neurons. As you can see in the image on the right-hand side, for dropout these nodes are just removed to prevent overfitting, and usually each neuron has a probability of somewhere between 20 and 50 percent, as you can see over here, of being dropped out. So how does this work? This is also a good visualization of what dropout looks like: during each training step, some of the nodes are set to zero, which ensures the model doesn't rely entirely on one neuron. It can't just take all of its information from a single neuron, because there is a random chance that the neuron it relies on too much gets removed, so it has to spread the work across different neurons, which gives more stable results and helps avoid anomalies. Then we'll talk about batch normalization. There will be a good visualization in the next slide, but basically this is very important: it is a technique that normalizes the inputs to a layer to have zero mean and unit variance, and it basically speeds up training and provides some regularization. Usually when you're training something, whether it's fine-tuning or training in general, or testing as well, you'll have a certain batch size, and batch normalization provides a regularization that not only reduces the need for dropout but also makes the outputs more consistent. Jil, do you have anything to add to this? No, I think we're good. We'll go to the next slide; there's a pretty good visualization of what batch normalization looks like, and this is also what you use in your code, since you have to specify the batch size. There's this shift right here... actually, sorry, this image is for dropout: as you can see, it uses all of the nodes and doesn't rely on one single node, because there is that random chance. Another thing to mention, can you scroll up a bit: you can see the little check-mark shape, like a flat line and then a ramp. Guess what type of activation function that would be. For those who know the answer, wait a little before putting it in chat; let's give everyone about ten seconds. Another hint: if the input value is less than zero, the output is going to be zero, but if the input value is more than zero, the output will be the input itself. Yes, ReLU, good job. Everyone put your answers in, good job everyone.
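Since the answer was ReLU, here is a tiny sketch of that behavior in PyTorch (the input values are made up):

```python
import torch

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

# ReLU: negative inputs become 0, positive inputs pass through unchanged
print(torch.relu(x))  # tensor([0.0000, 0.0000, 0.0000, 0.5000, 2.0000])
```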
Good job, good job. I don't think there's a separate visualization for batch normalization; that one was actually the dropout visualization. But for batch normalization you can show the code: as you can see, the 16 here would be the batch size and the activation function would be ReLU. That example uses a library called TensorFlow; when we code we'll use PyTorch, which is similar, and the idea is the same, so you'll see where we add the activation functions. We won't do batch normalization in the code for our hands-on project, since we don't have a lot of noisy data and the convolutional network itself is very simple. Also, I don't think we were very clear about what a batch is. A batch is the number of training points, data points, that you put into your model at once. An epoch is not when you throw in all of your data points at the same time; it's once you have fed in and trained on all of your data points one time, that's one epoch. You're not going to feed all of your data in at once, you feed it in batches. So a batch size of 16 means you put 16 data points into the model at a time, and if you had 32 data points in total you'd only need two batches to complete one epoch, but with much more data you'd need many more batches. Another thing about batches to be careful of: the larger the batch size, the better your model tends to do, but you will hit GPU limits. If you have a very large dataset or very large images, 32 is a pretty standard batch size, but if you go to 64 on a big dataset you may run out of RAM or VRAM. I sent a very clear definition of the two in chat, and I'd recommend taking notes of it: an epoch is one cycle through all of the data, and a batch is how much data is fed into the model at once, so for each epoch you will usually have multiple batches, and of course this varies based on how large your dataset is. And just to add on to what Jil said, the larger the batch size, the more VRAM it takes, so in Google Colab, if you're using a slower GPU, you'll usually run into some VRAM issues and you'd have to turn down the batch size, which in turn makes the training, or whatever you're doing, slower. We'll see the GPU usage for our hands-on project when we're training it later. So now we'll talk about regularization as well. This is the L1 and L2 that we saw in the first week. There's the concept of Euclidean distance behind it, but we're not going to talk too much about that; all you really have to know is that regularization is a technique that keeps the model's weights from blowing up, it penalizes large values so the model doesn't chase the noise, and doing that kind of regularization can speed up your training and also reduce the need for dropout. Because if you're dropping out your nodes at a very high rate, like 80 percent or 100 percent, that's generally not a very good practice; you want the dropout rate to be a bit lower, in the 20 to 50 percent range.
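As a rough sketch of how a 20 to 50 percent dropout rate and L2 regularization typically show up in PyTorch code (the layer sizes and the weight_decay value here are illustrative, not the ones we'll use in the project):

```python
import torch.nn as nn
import torch.optim as optim

# A small fully connected model with a 30% dropout layer
model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Dropout(p=0.3),   # each neuron has a 30% chance of being zeroed during training
    nn.Linear(128, 10),
)

# L2 regularization is usually applied through the optimizer's weight_decay argument,
# which penalizes large weights on every update
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```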
Now we'll talk about model evaluation. This is a confusion matrix; Rob will explain it. Okay, so what a confusion matrix shows you (right now you're looking at a 2x2, which is for binary classification, so you're only looking at two things) is that every prediction is either a true positive, a false positive, a false negative, or a true negative. Say you were diagnosing Alzheimer's: a true positive is that the person has Alzheimer's and the model diagnosed it as Alzheimer's; a false positive is that they don't have Alzheimer's but the model said they did; a false negative is that the model didn't diagnose it but they did have Alzheimer's; and a true negative is that they didn't have it and the model said they didn't. You want true positives and true negatives; if everything were a true positive or a true negative, that would be 100 percent accuracy. Recall measures your true positives as a fraction of all of the actual positives: the true positives plus the false negatives are all the positive instances in your dataset, so recall is the number of positives you caught versus the total number of positive instances. Precision is the true positives versus everything the model called positive, which means the true positives plus the false positives: out of all the cases where the model said this is a person with Alzheimer's (say you fed in a brain scan and it was diagnosing it), precision is how many of those it actually got right. And accuracy is just everything it got right, the true positives and true negatives, compared to everything. Then there's also the F1 score, which is used a lot because it summarizes things well; it's usually what's used in the literature to evaluate the efficacy of a machine learning model, and it is two times the precision times the recall, divided by the precision plus the recall. It puts more emphasis on the positive instances of the data as a whole, because it involves the true positives, false negatives, and false positives, and it's a good way to quantify how good the model is without being misled, because accuracy can be misleading: if there are way more negatives than positives, you might have a high accuracy while catching none of the positives, and that matters, because you want to diagnose Alzheimer's, and if you say nobody has Alzheimer's, then even the very few people who did, you never actually diagnosed them. You'd have a high accuracy, but the model isn't actually useful. So the F1 score tells you the practical accuracy of the model, not accuracy in the strict definitional sense, but how good the model is when you say accuracy in the everyday sense.
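Here is a minimal sketch of those formulas applied to a made-up binary confusion matrix; the counts are hypothetical and chosen to show how accuracy can look better than it really is:

```python
# Hypothetical counts from a 2x2 confusion matrix
tp, fp, fn, tn = 80, 10, 20, 890

accuracy  = (tp + tn) / (tp + fp + fn + tn)                 # 0.97, looks great...
precision = tp / (tp + fp)                                  # ~0.889
recall    = tp / (tp + fn)                                  # 0.80, we missed 20 positives
f1        = 2 * precision * recall / (precision + recall)   # ~0.842

print(accuracy, precision, recall, f1)
```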
Then there's something called area under the curve. What you're looking at there, let me double-check what it's plotting, is the true positive rate on one axis and the false positive rate on the other, and you want to see the area under that curve. So those are the two things being plotted, and it tells you how well your model separates the positives from the negatives: how many true positives it catches relative to the false positives it lets through. Area under the curve is usually used in cases where you're looking at more than two classes, when it's not just a binary classification, because it can summarize performance beyond a single binary split. If you only have two classes it's doing something similar to recall, but when you have more classes than that, area under the curve can tell you how many positive instances you caught across all of the classes, so that's when you'd use it, when you're doing something beyond binary classification. Going back to the metrics and the confusion matrix, I have an example to show. Arov, can you stop screen sharing for me? Thank you. So I trained a model for knee degradation, osteoarthritis, let me see if I can pull up the dataset. It looks at knee cartilage and determines whether there's a case of arthritis and how severe it is. There were three classes, healthy, moderate, and severe, and this is its confusion matrix: this axis is the actual classification with labels and this is the predicted. You can see that healthy had the best results, it correctly got 630 and misclassified some healthy cases as moderate; for moderate it's mostly consistent, with some misclassifications into severe and healthy; and for severe it sometimes guessed moderate but mostly got severe. That could be due to an imbalanced dataset, but our final accuracy comes out to 96.82 percent, and these were the different metrics, such as precision, recall, F1 score, and support. So that's a real-life example of a confusion matrix and what it would actually look like in a real scenario. Confusion matrices are just a good way of visualizing things, because instead of a bunch of summary numbers you can see the actual details: with a binary classification everything is either positive or negative, and you can see the counts for every case without having to compute a metric for each thing. Okay, so we're almost done, and we're going to go over some errors. Arov, can you reload the Google Slides? I made some changes. I can also present that part. Okay, so, some errors to look out for, because obviously we talked about overfitting, but there are a lot more things you want to keep in mind when training your model. You're going to have to look out for dimension mismatching. This is something that I've encountered a lot, and it's really annoying. Most models
you can't put in any size image you want like uh you need to put in a specific sized image if you want to do multiple sizes of images it's going to take a lot more computational efficiency you might not be able to run it your model is going to have to be more complex so you usually want to put all of your dimensions of your all of your images as the same right let's say you wanted to put in a picture of a cat that was 125 by 125 then you have to put in all of your data set images as 125 by 125 or you could normalize the the dimensions of that image to be similar to all of the other ones like if you had some outliers like if most of your images were 125 by 125 but one of your images was 128 by 128 you could clip off some pixels to make it the same size um and then neural network layers mismatching um that's kind of when uh if your layers aren't able to uh give the same features right if they're giving features that are completely different you're going to get two things that are kind of arguing with each other layers are feeding in different outputs and both of those outputs are completely different from each other and they indicate two different things that could be good they could make your layer more uh your model more complex but if you have a small model it could lead to really weird results in a lot of different outliers even if your model is fine so you would wouldn't have to worry about layers mismatching if you had a very large model if you had a very complex model but if you have a small one then you should try to keep that in mind and make sure you don't have any weird values because that might be one of the errors and then you might want to make your model a little bigger then memory management um that's why we always worry about computational strain that's why so many of our uh techniques were related to trying to make the model easier to run like all the pooling layers um it's just trying to make the model fast and easy to run is as important as making the model accurate because if you have a model that you can't run then it's not going to like do anything and if you have a model that takes too long to run then you won't know if it's making any errors until it goes all the way through so you won't want that's like a good middle um and then device Computing make sure that you're whatever you're going to run it on can actually compute like the model and is able to handle like the model um that's why a lot of people try to invest in gpus or you can go on Google collab and buy some GPU um like online GPU there um those are pretty good um so always make sure that you're able to actually run the model and another thing to add to device Computing a lot of the times you're if you have a GPU available you're going to train it on um the GPU memory that's called the vram um but then you also have to transfer over through the CPU like for example when you when you predict when you do model prediction you want the all of the calculations for the model um to be done on the GPU but when you when you want to finally like interact with the probabilities and load it into a soft Max you have to convert it back into CPU same applies for model training you have to send your labels and your images back into you um into your GPU but then you have to convert it back into CPU to like of course um calculate the metrics how accurate it uh accurate it is the average loss and such okay so the final thing that we're going to go over that you should definitely take into account that a lot of people don't 
think about is model degradation which is the decline in performance or effectiveness of your model due to like abnormalities or variations in data so the drifting is when you have changes in how your data is distributed um which is kind of like um if your data is like kind of if you're training on your data and your first batch is shifted towards one of your classes and then your next batch is shifted to another one of your classes um that's kind of inaccurate you don't want to train your model on one thing and then go complete 180 and train it on another thing you want to have a good distribution so it's able to learn between those images and not overfit on one thing and then try to backpedal and then retrain because that's kind of what it's going to be doing if you have too much variation in terms of how you're Distributing your data um also feature drift is when you have a change in the quality of your features like if you're inputting like 4K images and then you start to input like 240p images um your model is going to get really confused it's going to start to take it as features like it's going to take the quality of the image as a feature even if you didn't want it to like oh this this part being blurry that might be a feature so then it might have nodes that relate to like those type of things and pixel intensities because pixel intensities could be caused by blur or noise in the image because it has low quality and it might start to look at that so make sure that your data is pretty normalized that's why you do pre-processing we're going to go into pre-processing two in a future lesson make sure you pre-process your data so it has same dimensions good distribution no variations something like variations but not huge variations something like that um also um lower performance can also be a uh indication of model degradation um which means that shifting towards a specific way like it's either underfitting or overfitting your data and it's either putting all your classes as one classification when they're not and or it's not classifying anything that is the classification and it's underfitting so that all those biases all the underfitting overfitting could also be a indication of model degradation and then the accuracy and performance metrics randomly becoming really bad could also be model degradation so if you see that you're is like 92 like okay that's pretty good and then you put it in for another few Epoch and then it comes out as like 85 and your model accuracy is getting worse try to figure out why it's getting worse because it could be model degradation but it also could be like some other thing it's not it's not always an error your accuracy just might be getting more more indicative of your actual accuracy and having high accuracy could have just been that one Epoch but oh just make sure that it's not like your model is become coming worse because it's overfitting or underfitting so before we stop we're just going to briefly go over um uh CNN and then unit because we also mentioned that so CNN you have your filters right you have your kernels you take a part of the image and then you do feature mapping which is um uh convolutional layers and Max pooling that's what feature mapping is um and what that does is you learn the patterns and you make your image smaller and then you put it into another convolution layer and then you do that again so that pair that pair between um convolutional layers and um uh Max pulling or pooling would be feature mapping and then after you get 
as many convolutional layers and as many pooling layers as you want, then you go into a normal neural network, and that's basically the classification part: you're feeding your learned features into nodes to get your final classification, or a final segmentation, or something like that. So in this example, taking a brain scan, they could be segmenting something like the ventricles here, or they could be diagnosing something from it. There's a lot of different stuff you could do from the same input, and what you have in your convolutional layers and your neural network, the way the model is structured, plus your dataset, is how you determine what the model is going to do. And then this is U-Net, the architecture for U-Net that I mentioned before. It's called U-Net because it's shaped like a U. Encoding is the path on the left, the encoding pathway, where it learns the features and uses pooling layers to reduce the dimensions, and then it has skip connections, which pass dimensions and features across to the decoding path, which brings the image back to its original size and displays the learned features onto the image, which in U-Net's case is usually going to be a segmentation. So if you're thinking about doing your final project on segmenting medical images specifically, then U-Net is probably what you're going to use: if you want to segment a brain tumor, something like that, you're probably going to use U-Net, and you could use it for cancer too. It's pretty good at medical images and segmentation. Okay, so before we go we have one more activity, I think Sam wanted to go over that one. Actually, we'll take a break first, it's been an hour, so we'll take a five minute break: get some food, get some water, and be back by 1:18 CST. Should we set anything up? Yes, just see if you can get into Google Colab; this will be the link: colab.research.google.com.
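While everyone opens Colab, here is a minimal sketch of the U-Net skip-connection idea described above, with toy layer sizes I made up (a real U-Net has many more stages than this):

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """One encoder stage, one decoder stage, one skip connection."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(1, 16, kernel_size=3, padding=1)          # encoder features
        self.pool = nn.MaxPool2d(2)                                    # downsample
        self.up = nn.ConvTranspose2d(16, 16, kernel_size=2, stride=2)  # upsample back
        # the decoder sees the upsampled features AND the skip-connected encoder features
        self.dec = nn.Conv2d(16 + 16, 1, kernel_size=3, padding=1)

    def forward(self, x):
        skip = torch.relu(self.enc(x))         # features at full resolution
        down = self.pool(skip)                 # reduced dimensions
        up = self.up(down)                     # back to the original size
        merged = torch.cat([up, skip], dim=1)  # the skip connection: concatenate features
        return self.dec(merged)

out = TinyUNet()(torch.randn(1, 1, 64, 64))
print(out.shape)  # torch.Size([1, 1, 64, 64]) -- same spatial size as the input
```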
We will send the link for it in chat. We will first do a demonstration that won't require you to use Google Colab or VS Code or anything like that, but after the demonstration we'll give you a little code puzzle, put you into breakout rooms, and you guys can discuss it with each other. Hey everyone, the five minutes are up, you can come back now; that was a shorter break. All right, we'll do some code analysis for those of you who are interested in actually programming this stuff. We'll start off with a very basic CNN: I'll be introducing a pre-trained model, and it is based on the CIFAR-10 dataset, which is basically very small images. It's actually quite a popular dataset, and you have things like airplanes, cats, birds, cars, that sort of thing. Okay, I will also share an example of this; anyone with the link will be able to view it, so copy the link, and for those who are interested you can also follow along there, but I'll be sharing my screen. Arov, can you stop sharing? Yes, thank you. Can everyone see this page? Yes, we can see it. Okay, so I'll just briefly cover what this program is essentially doing. You have the first block, and basically this is just installing your packages and importing the pre-trained model (I'll do a live demo a little bit later), so it installs the necessary libraries, imports all the libraries we need, and then this command over here sets your device to use the GPU, as you can see over here. In Google Colab, if you're not familiar, you can choose whichever runtime you want; you can use the CPU, but for typically more intensive ML training or testing you'll want to use a GPU of some sort. Right here I'm using the L4, for example; on the free version most of you will have access to the T4 GPU, and there will be some amount of free quota you can use, something like three hours, but that is plenty for our use case. Okay, and over here this is the data preparation part: we're going to load the dataset, and as you can see we also have a batch size for loading these datasets. Does anyone remember what a batch size is, what a batch is? Feel free to type it in the chat. Good answers, good job, yes. So in this case the batch size would be 128. And over here, I believe these would be the bias values, I'm pretty sure, and we got them from the pre-trained model, which is a model you can also access on GitHub; I might share this code if you want to play around with it, you can make a copy. And sorry, the transform function: we'll use it for our hands-on project too, but think of it as basically a way to define what you want the model to be consistent on. For example, Normalize uses some preset values to make sure that if there are any varying pixel intensities, like differences in brightness or very bright colors, it tones them down or boosts them so that all the images are consistent.
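A rough sketch of what that device setup and transform pipeline can look like in PyTorch (the normalization numbers here are placeholders, not the exact ones from the notebook on screen):

```python
import torch
import torchvision.transforms as transforms

# Use the GPU ("cuda") if one is available, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# A transform pipeline: convert images to tensors, then normalize pixel intensities
# so every image the model sees is on a consistent scale
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),  # placeholder stats
])

print(device)
```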
In that list of transforms you can also reshape images, or apply a bell curve, also known as a Gaussian filter, if you're familiar with statistics: if you have some very bright images and you want to bring them to a similar brightness just for the sake of training the model, you can apply a Gaussian filter like that. Okay, yeah, that's a great explanation. And over here you can see the different classes that this convolutional network will return: you see a plane, a car, a bird, etc., and we load the model over here, we just download the model from here and then load it. I'll actually connect the runtime and start it running already, because downloading will probably take some time while I explain this. Okay, so here's the actual convolutional network we have set up. Jil, did you want to add anything? Yeah, what I just wanted to add is that ResNet-18 is itself a convolutional neural network, but the difference between ResNet-18 and a regular CNN you write yourself is that it has a predefined structure. We're basically loading that structure in, then loading our dataset and training the model, and it updates the weights and biases in the classification module of the original architecture, so you're not stuck building that architecture yourself. For example, it might have ten of these pairs, like a conv layer, then a ReLU activation function, then a pooling layer, ten of those stacked up; that's what a predefined architecture gives you. So you can see this is a very good example of a CNN: it has a convolution layer, we do batch normalization, then we also do max pooling, and then we also have a fully connected classification module at the end. So, does anyone remember what max pooling does? You can see the different layers over here: conv1 would be the first convolution layer, conv2 would be the second convolution layer, and pool would be the pooling. So what does the max pooling layer do to a feature map? Put your answer in the chat. This might be a little bit hard, and if you don't know, that's fine, we can define it; it's okay to not know anything, we're here to clear your doubts, so don't feel embarrassed, just put out your best guess. Good job, Jil, that's basically what max pooling does: it's in the name, it finds the maximum value in a small matrix of the feature map and keeps that pixel value, whereas average pooling would take the average of that matrix and give you the average value. Max pooling is really good for finding edges or extremely bright parts. Also, you don't have to worry too much about how complex this code looks; we're basically just covering the concepts, we'll slowly get you familiar with coding machine learning, and we're going to give you a code skeleton at first, so don't worry about that. Oh, okay, we encountered some errors, but that is fine, that's the whole purpose of a live demo; I'm just going to restart the runtime. While this continues downloading, we'll explain things a little bit more.
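To make that pooling answer concrete, here is a tiny sketch comparing max pooling and average pooling on a made-up 4x4 feature map:

```python
import torch
import torch.nn as nn

# A fake 4x4 feature map (batch of 1, single channel)
fmap = torch.tensor([[[[1., 2., 5., 6.],
                       [3., 4., 7., 8.],
                       [9., 8., 1., 0.],
                       [7., 6., 3., 2.]]]])

max_pool = nn.MaxPool2d(kernel_size=2)  # keeps the maximum of each 2x2 block
avg_pool = nn.AvgPool2d(kernel_size=2)  # keeps the average of each 2x2 block

print(max_pool(fmap))  # [[4., 8.], [9., 3.]]
print(avg_pool(fmap))  # [[2.5, 6.5], [7.5, 1.5]]
```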
So, we talked about batch normalization earlier in our presentation, and this is what it would look like in your actual coding experience: you don't have to really write a whole lot of code or write out the actual function yourself. Same with ReLU over here, it's pretty simple: you use the torch library and you call torch.relu. Oh yeah, by the way, to introduce what PyTorch is: is everyone familiar with the concept of modules and libraries in programming? If not, just put it in the chat, or you can DM me personally if you prefer, and I'll explain it to you. Okay, we'll briefly explain it anyway. So, is everyone familiar with Chrome extensions? They add a little bit of extra functionality to your Google Chrome browser; you can see right here I have a lot of extensions. What libraries do is add a little bit of extra functionality to your code, and it's code other people have written, basically. For example, PyTorch was created by Facebook's research lab in 2016, so Facebook itself created PyTorch as a machine learning library, because there's a lot of complex math behind this, graduate-level math, something you'd learn in higher education like a college or PhD program. What they've done is allow you to do machine learning projects without necessarily knowing all the math behind it. That's the whole basis of libraries: without being an expert in that field, you can use their code in your programming. So you'll see we use PyTorch heavily, and then NumPy; NumPy is a really important library for doing linear algebra calculations in general, and you'll see how NumPy is used in our code in the future. So mainly there are two main libraries used for machine learning, PyTorch and TensorFlow. TensorFlow was created by Google itself and it's mainly used for, you could say, embedded devices; for example, if you want a security camera that can detect people, TensorFlow would be used for that, because Google itself creates special small computer circuit boards that you can run TensorFlow on, and I've done a bunch of efficiency programming where you run TensorFlow on small computers that don't require much power. And then PyTorch is built to work very closely with NVIDIA. Is everyone familiar with the company NVIDIA and its GPUs, like all the gaming GPUs? If not, I can go into more detail; NVIDIA is actually a very big name in the AI space. NVIDIA basically creates GPUs, and the main reason GPUs are used over CPUs is that GPUs can do a lot of processing in parallel. For example, imagine you have to solve a hundred math problems: a CPU will do them one by one, but a GPU can work on all hundred problems at once. That's the really good thing about GPUs, and NVIDIA creates those. So what NVIDIA has done is create a platform called CUDA. CUDA is basically a program that allows you to use the GPU at its rawest level, at its maximum level. You won't be directly interacting with CUDA, that's what PyTorch does in the background. So PyTorch and NVIDIA GPUs go hand in hand, because PyTorch can use CUDA to make sure that whatever model training you're doing happens at its fastest and uses the maximum amount of memory available, and TensorFlow can't tap into it in quite the same way.
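As a rough sketch of that "hundred problems at once" point, this times the same matrix multiplication on the CPU and, if one is available, on the GPU; the matrix size is arbitrary and the exact speedup depends on your hardware:

```python
import time
import torch

a = torch.randn(2000, 2000)
b = torch.randn(2000, 2000)

# CPU: the multiply runs with far less parallelism
start = time.time()
_ = a @ b
print("CPU seconds:", time.time() - start)

# GPU: thousands of cores work on the multiply in parallel (skipped if no CUDA GPU)
if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()          # make sure timing starts/ends when the GPU is done
    start = time.time()
    _ = a_gpu @ b_gpu
    torch.cuda.synchronize()
    print("GPU seconds:", time.time() - start)
```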
And why does NVIDIA dominate the GPU market? Because its GPUs are what's used in AI projects, and there are barely any other platforms like CUDA. There might be open-source ones, but with open source you're not guaranteed it will be maintained well, and it might not work on your system. That's why CUDA and PyTorch are used together; it's pretty easy to use CUDA with PyTorch, and that's the main reason we're using PyTorch for this program. You can see right here I highlighted torch.device with 'cuda:0'. The little zero means use the first GPU available, because sometimes models are trained using multiple GPUs; if you have heard of OpenAI, the company that created ChatGPT, they had thousands of GPUs to train their large language models, like GPT-4o, so you have to specify which device to use. Google Colab only provides one GPU, so if you just put 'cuda' instead of 'cuda:0', that's fine. And the reason we wrote cuda if torch.cuda.is_available() is that if you're training on another device, that device might not have CUDA installed, or might not have a GPU at all, so then it falls back to the CPU; model training is possible on a CPU, but it will take a long time. Okay, I think I fixed it. So here it's just loading all the function definitions we talked about a little earlier, and here is basically the structure of your CNN: you can see there's a convolution layer, a batch normalization layer (actually two, no, three of those), and a pooling layer. And here is basically what the training would look like. Right here I defined the epochs to be 10, so if I wanted to train this model it would go through the entire dataset ten times, and as you can see with this Python for loop over here, for epoch in range(epochs), it will go through the entire dataset ten times. I don't think we've talked about gradient descent; I think we'll cover that more in depth next week, when we start doing some more intensive programming examples. For today, just get familiar with ReLU, the activation function, and with defining a CNN like the one over here, and I would recommend you check the documentation, like the PyTorch docs; there will usually be a lot of directions if you want to set this up. So, Darren, you asked what the forward function does. As we learned from the whole CNN diagram, you first have your layers that do the feature extraction, such as conv, max pooling, and then batch normalization, and forward is the actual neural network part that takes all of those extracted features, converts them into individual neurons, and can have multiple layers; for example, in our project we're going to have two layers. It's the final neural network that classifies the image into the different classes: is it a plane, is it a car, or not. So let me pull
up a diagram to better show you um okay so if you're more familiar with um here High classes it would be basically this oh man uh arov can you let me can you turn on Multi participant um it would be very similar it would be it would serve the same function as call here Sam can you unpresented then I'll then I can present sure okay thank you I think you both should be able to present now okay thank you yeah so um you can see this is our convolutional layers but this whole part is our forward layer that's our or forward layer does that answer your question which actually does it's actual neural network that you have been learning about it do the classification sorry for the bad annotation well yeah thanks so yeah and I'll talk a little bit more about the training function that we have set up although for this demoe particularly we aren't training it we're just using a pre-trained model um but I'll let me share my screen again uh Windows there we go can you guys see this oh it seems to be a black screen on my screen okay yeah so for training over here we're running this through the loss function that we talked about cross entropy loss I think we talked about in this in the slides and this is how it is implemented and we're also using an Optimizer called Adam and as we go through the training data we perform a bunch of forward and backward passes so as you can see the backward pass is over here and this is to update the model weights because essentially what you're doing in the training process is essentially you're updating the model weights based on the data that you're exposing it the training data that you're giving it and um over here it also prints like the loss as as your training and um in the test over here um there is no gradient descent but um it will only Pro provide you with the accuracy and this is just the shower images we don't have to care too much about that now we have every every all of our um previous cells written we um ran we installed our packages we we prepared our data we um defined our CNN we trained the CNN we and we test and we also defined some functions for testing showing the images now if I run this main it should give me some something and yeah I also want to add on to the training it prints the loss at like a regular interval as as this is loading um this is printing at a regular interval so you have a good idea on how the training is going okay as we can see the fils are already downloaded and our accuracy would be 91% and here's the images that it was exposed to and here are um here are four questions that I picked out cat ship ship plane and it detected successfully that is cat ship ship plane and we have a 91% accuracy so yeah um I just made some changes to the Imports and right now we should be in the clear me yep yeah so does that give you a good idea of what um running a basic CNN would look like actually if you look at it it isn't super difficult um most of the code are essentially um taken from a template and mainly the main things you have to consider would be like loss functions setting up the CNN and um using your reu your activation functions and all of that yeah the hardest part about training new models is of course doing the trial and error right figuring out what layers are needed when to use a convolutional when to use a REO when to use a Max booling or average pooing when to do m u normalization typically there are a lot of um pre-trained models ready for you available like for example um to do General classification you can use like uh 
EfficientNet, which is a very small model but can also be used for general classification; I actually used that for the confusion matrix example I showed you. Yeah, this is the pre-trained model I was using; I just cloned the GitHub repo and the model. You can see a bunch of pre-trained models, like VGG, and then all the ResNet ones, and there isn't only one type of each model: you see ResNet-18, 34, and 50, and all that means is more layers; the higher the number, the more layers. Yes, because of time constraints I couldn't really train my own CNN here, but Jil will present his trained model, which is also a little bit more advanced and is on the MNIST dataset, which is the digit recognition one. So, do we want to get started on actually training the model? Well, this was just an example to showcase how this works, but we will also provide this example so you guys can play around with it. Maybe I'll assign it as additional bonus homework, something like that, but it would be purely optional: see if you can get my code to train and run its tests, if you're a more advanced user, on the dataset we have loaded, CIFAR-10, and see whether our network, which is probably something like eight or nine layers, is good enough or comparable to ResNet-18, which has 18. We'll be right back. Also, just to continue talking about image classification: usually with these datasets the convolutional networks will learn features, and usually those are things like edges and textures, and contrary to what you might think, it may not look directly at a chicken's head, for example; it might look at how its feet are shaped, or what the texture of its feathers is, something like that. Yes, does anyone have any questions about our overview of what the code for programming, training, and running a CNN would look like? Also, don't worry if this seems very complicated at this beginning stage; for our first homework assignment, or first mini-project if you want to call it that, we will give you the majority of the code, and it will be an extension of the MNIST dataset that Jil is going to cover today.
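To make the ResNet-18/34/50 naming a bit more concrete, here is a quick sketch, assuming a recent torchvision, that compares parameter counts; weights are left off so nothing downloads, and this is not the CIFAR-specific model from the repo shown on screen:

```python
from torchvision import models

# The number in the name is roughly the number of layers; more layers, more parameters
for build in (models.resnet18, models.resnet34, models.resnet50):
    net = build(weights=None)  # random weights, architecture only
    n_params = sum(p.numel() for p in net.parameters())
    print(build.__name__, f"{n_params / 1e6:.1f}M parameters")
```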
So make sure you're logged in with your Google account. First, search up Google Colab, or you can just go to colab.research.google.com; it will be tied to the same Google account. I put the link in the chat so you guys can actually code this live, and it doesn't matter whether you have a good laptop or not. First, create a new notebook and name it whatever you want; I'm naming it CNN. For your runtime you want to use a GPU: Change runtime type (right now it's CPU), select the T4 GPU, and click Save. First we're going to import libraries. As Javil is setting up, you can just copy the import cell: import torch and torchvision; we'll see where these get used. torch is just PyTorch, which is essential because we are obviously using PyTorch, and normally you would also have to install it with pip install, but not on Google Colab, because everything is already pre-installed there. nn is the module with all the layers, the convolutional layers, the batch normalization and so on, so I'll be using that, and from torch we import optim; optim is the class that actually changes all the weights and biases in your classification model. I talked about the Adam optimizer, and we're going to be using that. One last thing is to import torchvision's transforms. Should I do some explanation while Javil codes this up? Just to restate: we're building a CNN from scratch using PyTorch and training it on the MNIST dataset, for those watching the recording. About ToTensor: you know how I described that images are represented as matrices; a tensor is basically a matrix, except it's used to do calculations on the GPU, so we have to import that too. Now we're going to download the actual dataset. First we declare our transformations: what Compose does is hold a list of transformations to apply, and the only one we're doing for now is ToTensor, but it can also be used for things like reshaping images. For example, if you have 1,024 by 1,024 images but also 512 by 512 ones, you want to make sure all the image dimensions are consistent. Then we actually create the dataset: MNIST is a very popular dataset, so it's built into PyTorch through torchvision. I just wanted to add something: in the first lecture we talked about scalars, which are just numbers like 1, 2, 3; we talked about vectors, which are one-dimensional arrays written in square brackets, something like [1, 2, 3]; and a matrix is usually two-dimensional, a grid of numbers. You can see we have a separate train dataset and a valid dataset, we download both onto Google Colab, and one of them has the train flag set to True and the other to False. You never want images that are in both training and validation, and you always want more training data than validation data; a good split is usually 90% training and 10% validation, or 80% training, 10% validation, and 10% for testing.
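For those coding along, here is a sketch of the import and dataset cells being described; the variable names are mine, not necessarily the ones typed in the live notebook.

```python
import torch
import torch.nn as nn          # layers (convolutions, batch norm, linear, ...)
import torch.optim as optim    # optimizers such as Adam, used in a later cell
from torchvision import datasets, transforms

# The only transform for now: convert PIL images to tensors so the GPU can use them.
# Compose can also hold resizing steps if your images come in mixed sizes.
transform = transforms.Compose([transforms.ToTensor()])

# MNIST is built into torchvision; train=True / train=False keeps the
# training and validation images strictly separate.
train_dataset = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
valid_dataset = datasets.MNIST(root="./data", train=False, download=True, transform=transform)
```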
The test images are ones the model has never encountered. So we've loaded the datasets, but we can't use them directly; we have to create data loaders, which pair up the labels and the images in memory and serve them in batches. Feel free to code along with me; this is all done in the browser, so nothing heavy runs on your own computer. Here we declare the batch size, and you can see we're doing 64 images at a time. Everyone, if you want to code along, just go to the link I sent in chat, colab.research.google.com, create a new file, and code with us. Now when we run this code you can see it downloading the dataset. Oh, sorry, those were the labels; now it's downloading both the labels and the images. If you click on the file icon here you can actually see the raw files, but we can't view them yet because they're in a different format; we'll visualize them in a moment. Does anyone have any questions? For those following along: can you share the data loader code? Yes, I'll paste the data loader code in chat. Basically, the train data loader loads the training dataset and the valid data loader loads the validation dataset. And to continue where I was with tensors: matrices are two-dimensional, so you can picture them as a rectangle, but tensors can go beyond the second dimension; they can be three-dimensional or higher, like a cube of numbers. Sorry, I'm getting an error, and yeah, sorry guys, programming ML involves a lot of trial and error. I'm just trying to visualize the dataset; it's not strictly needed, but it's better for you to see how the data is actually represented. In the meantime, here is the import cell and here is the data-loading cell for those who want to follow along. Now you can see how the whole image is represented as values, and you can visualize it; that's why we're using matplotlib, a graphing library, and we also want to show it in black and white. You can see this is the number five, and the whole image is 28 by 28, by the way; that's how you can see the dimensions. Let's keep coding and actually create our model now. (Which library are you referring to? The copy library; we'll use it later to copy the model in the training and validation loop, so you don't need it right now. I'll import it then. Good question though.)
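Here is a sketch of the data-loader and visualization cells just described, under the assumption that `train_dataset` and `valid_dataset` come from the previous cell.

```python
import matplotlib.pyplot as plt
from torch.utils.data import DataLoader

batch_size = 64  # 64 images at a time, as mentioned above
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
valid_loader = DataLoader(valid_dataset, batch_size=batch_size, shuffle=False)

# Peek at one sample: each MNIST image is a 1x28x28 tensor plus an integer label.
image, label = train_dataset[0]
print(image.shape, label)                           # e.g. torch.Size([1, 28, 28]) 5
plt.imshow(image.squeeze().numpy(), cmap="gray")    # show it in black and white
plt.title(f"label: {label}")
plt.show()
```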
So let's actually define the model. The creator of this architecture is Yann LeCun, which is why the model is called LeNet, and we're copying the LeNet architecture because it's very simple, very small, and not overly complex for our use case. We're going to write create_cnn; this is where we build the LeNet. nn.Sequential is a container class used for simplicity and readability: sequential means one by one, so the image has to go through one layer, then the next layer, then the next, which is why we're using nn.Sequential. It's really easy; you just list all your layers in order. The first layer we want is a 2D convolutional layer. First you specify the input channels: remember, an image is made of matrices of values from 0 to 255, and a regular color image has three matrices, the red, green, and blue values, but for our use case we only have one, because these are grayscale images. Then we apply six filters, the kernel size is 5 by 5, and the padding is 2. Then we have a ReLU activation layer, and then an average pooling layer. You can see our example image has some gray edges, so we want to keep things smooth; the handwritten images we'll draw later won't be pixelated like this, they'll be sharper, and that's why we use average pooling here instead of max pooling. The average pooling decreases the feature map size by a factor of two, and the stride is two; we learned about striding before. (By the way, remember to add commas between the layers.) Then we have another convolutional layer: this one has six input channels. Notice that because the previous layer produced six feature maps, we have to specify six channels here, and this layer has 16 filters with a 5 by 5 kernel. The padding will actually be zero this time; since we have so many filters, we don't need extra padding. (Can you scroll up a bit? Someone wants to see the code for visualizing the image. Here, can you see it? No, the code for doing that. Oh sorry, I'll take a screenshot. All right, you can continue.) Then we have another ReLU layer and another average pooling layer, again with kernel size two and stride two. Going back to the earlier neural-network-layer example: remember the forward function someone was asking about? Instead of writing a separate function for it, we implement it directly in our Sequential architecture; since our model is very small we don't need a separate forward function, although one is generally recommended. Now, the image at this point is still 2D, a bunch of feature maps, and to convert everything into one layer (let me pull up the diagram again, these six dots), we flatten it. That's called flattening, so we add nn.
Flatten(). Then we create the linear layers: we have 400 neurons, and we want to reduce them to 120 neurons, followed by another ReLU layer. We use ReLU a lot here because these images are just black and white, so detecting edges is what matters; that's why ReLU shows up so often and why sigmoid isn't as useful right now, since there aren't many colors, and this also helps prevent overfitting. Then another linear layer from 120, which we convert down to 84 neurons, then another ReLU layer, and then one more linear layer. I forgot to mention: the whole dataset has 10 classes, the digits 0 through 9, and that's why in the final layer we take the 84 neurons and reduce them to 10. You can see even here it goes from zero to nine; this is basically our model (this part is drawn a little differently, but it goes from 0 to 9), and that's why the final layer has to have 10 outputs, one per class. That's standard everywhere. Then we return the model. Let's run this function. Yep, no errors. I already sent the reason for doing flatten, and for linear, do you want to explain? Sure: a linear layer just maps one layer of neurons to another; I'll show you a diagram. And anyone, if you have basic questions you feel you totally don't understand, like "what is a library?", just private-chat me and I'll answer you if you're embarrassed; don't shy away from asking questions, we're here to help you learn. This is a good example: notice how in this image it goes from 120 neurons to 84. That's what a linear layer does; you specify how many neurons you want to map to. Doing some math: when we flatten the image we end up with 16 channels of 5 by 5, which gives 400 values, 400 neurons in that layer, 400 of these circles; then we convert them to 120, then from 120 to 84, and from 84 back down to 10. This is the part that does the classification itself. We learned about weights and biases, and this is where those weights and biases live; this is what gets changed through model training. Any more questions? How do you determine the parameters that go into the functions, for example the padding, strides, etc.? Padding really depends on your image. We're following the LeNet diagram, so we know what the padding is, but in general padding is determined by how many edges you want to detect and how complex the image is; it's really experimentation, trial and error, seeing how much the padding and the stride help. I'll see if I can find a more detailed guide on choosing padding and such for your model and send it in the Discord server after this. Also, Sam made a good point: each layer reduces the dimensions, so we can pick out exact edges and exact features, which helps with overfitting. That's the whole point.
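Putting the walkthrough together, here is a sketch of the create_cnn function as described: a LeNet-style nn.Sequential for 1x28x28 inputs. The layer comments are mine.

```python
import torch.nn as nn

def create_cnn():
    # A LeNet-style model, following the walkthrough above.
    # Input: 1x28x28 grayscale MNIST images.
    model = nn.Sequential(
        nn.Conv2d(1, 6, kernel_size=5, padding=2),   # 1 input channel -> 6 feature maps, 28x28
        nn.ReLU(),
        nn.AvgPool2d(kernel_size=2, stride=2),       # 6 x 14 x 14
        nn.Conv2d(6, 16, kernel_size=5, padding=0),  # 6 -> 16 feature maps, 10x10
        nn.ReLU(),
        nn.AvgPool2d(kernel_size=2, stride=2),       # 16 x 5 x 5
        nn.Flatten(),                                # 16 * 5 * 5 = 400 values
        nn.Linear(400, 120),
        nn.ReLU(),
        nn.Linear(120, 84),
        nn.ReLU(),
        nn.Linear(84, 10),                           # 10 output classes: digits 0-9
    )
    return model
```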
Now we're going to declare our training and validation loops. First the validation loop: we want to load the model and also take in the actual data. A very important step is model.eval(), which sets the model's mode to evaluation instead of training mode; you need to do this whenever you're predicting, so it belongs in your validation loop. Then we'll track how many predictions there are in total and how many are correct, so set up variables for those, and the device: we're sending the parameters, all of our weights and biases (as far as I know there's no easy way to visualize the weights and biases of each layer), to the device, whether that's CUDA or the CPU. Then we use torch.no_grad(): since we're only evaluating here, not training, we don't need to compute gradients. For each batch of images and labels in the data, we send them to the device, the GPU, then we compute the model's outputs, then we take the predictions for each of them, then we update the total. We also have to count the correct ones: compare the predicted labels to the true labels, take the sum, and get the value. Then we return the validation accuracy. Let's run it to make sure we have no errors. No errors.
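A minimal sketch of that validation loop, with my naming; it assumes batches of images and labels like the MNIST loaders above.

```python
import torch

def validate(model, data_loader, device):
    # Evaluation mode plus no_grad: we are only predicting, not training.
    model.eval()
    model.to(device)
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in data_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            predicted = outputs.argmax(dim=1)             # class with the highest score
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return 100.0 * correct / total                        # accuracy as a percentage
```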
Now we're going to add a training loop. Just wait a moment, guys. In the training loop we want to specify the number of epochs and the learning rate. The learning rate is a hyperparameter that controls the step size, how much we change the weights and biases, so it basically determines how quickly the model learns during training. If you set the learning rate too low, the model will barely make any changes and its progress will stagnate; if you set it too high, the updates overshoot and training becomes unstable. So you want to find a sweet spot; usually a lower learning rate works better, and you might need more epochs, but it pays off in the long run. Then the device; for now I'll just leave the default. We also want to plot the accuracies over time, so let's make a list for that. Then we get our model: create_cnn().to(device), so we create the model and send it to the device, which is CUDA. Then we get the cross-entropy loss we talked about: it's a loss function that computes the softmax probabilities and measures the loss between the predicted and actual labels; the loss is used for backpropagation and also gives us a metric. We get our optimizer: we pass in the model's parameters, so it can actually change them, and the learning rate from the input. We also set max_accuracy; we'll only save the model when it beats the best accuracy so far, so it starts at zero. Then, for each epoch in range(epochs), basically repeating over the number of epochs, we first train the model: for each batch in the train loader, we send the images and labels to our device, then we zero out the gradients, then we get the model's predictions, then calculate the loss (you can see we're comparing the predictions and the labels), then we do our backpropagation on this line, and then the optimizer updates the parameters. To explain it line by line: we go over the epochs; for each epoch we train the model; for each batch of images and labels we put them on CUDA, we zero the gradients, we compute the loss, the difference between the predictions and the true labels, we do backpropagation, and then we change the model parameters. That's our training loop. Now the validation part: we put the model back into evaluation mode, then we compute the accuracy by running the validation loop on the CNN with the validation data loader, and we append that to our accuracies list. If the accuracy is greater than the max accuracy, we do best_model = a copy of the model. So, back to the earlier question, this is where we copy the model and save it into best_model, and this is where I import the copy module and rerun that cell. Then we set max_accuracy to the current accuracy, print that this model has the current accuracy, and print our epoch progress. Once the whole loop is done, we plot our accuracies and return the best model. Let's run it to make sure we have no problems. One more code cell, and this is our final one: we actually call the training loop. We'll do 10 epochs for now, since that's a good number, keep the learning rate the same, and keep the device the same since we have access to CUDA. Then we also have to save the model: torch.save, our final model's weights, and the name of the file will be weights.pth.
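Here is a hedged sketch of that training loop. It assumes the `create_cnn`, `train_loader`, `valid_loader`, and `validate` names from the earlier sketches, and it saves the best model's state dict at the end; the live notebook may have saved the model differently.

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt

def train(epochs=10, learning_rate=1e-3, device="cuda"):
    model = create_cnn().to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    accuracies, max_accuracy = [], 0.0
    best_model = model

    for epoch in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()                    # clear the old gradients
            loss = criterion(model(images), labels)  # forward pass and loss
            loss.backward()                          # backpropagation
            optimizer.step()                         # update weights and biases

        accuracy = validate(model, valid_loader, device)
        accuracies.append(accuracy)
        if accuracy > max_accuracy:                  # keep a copy of the best model so far
            best_model = copy.deepcopy(model)
            max_accuracy = accuracy
            print(f"saving best model with accuracy {accuracy:.2f}%")
        print(f"epoch {epoch + 1}/{epochs}: accuracy {accuracy:.2f}%")

    plt.plot(accuracies)
    plt.xlabel("epoch")
    plt.ylabel("validation accuracy (%)")
    plt.show()
    return best_model

best_model = train(epochs=10, learning_rate=1e-3, device="cuda")
torch.save(best_model.state_dict(), "weights.pth")   # save the trained weights
```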
What this does is save the weights of the model, so that when we actually do prediction we have to define the model again, but we load these weights instead of starting from zero. So let's run this cell, and now let's run this one. What's this error? I might have misspelled Sequential. Okay, run this cell, now run this one. Okay, the optimizer: oh, parameters is a function, it needs parentheses. Yep, it's loading now. Meanwhile, if you look at the GPU RAM, you can see it's very minimal; that's how small the dataset is. "Expected all tensors to be on the same device, found at least two devices": you can see this is a common error, not keeping track of where your tensors are, whether they're in CPU or CUDA memory, and that's something you always have to know. Let's see where it says the problem is. The validation loop and accuracy, do you need to pass the device? Oh, this thing. One sec, I sent it to you. Yes, we will send the actual Colab file in Discord as well; we'll send the CIFAR-10 example, we'll send this training example if you want to try it, and there will also be an assignment posted a little later. Actually, we'll probably post all three, but we'll continue where we left off next week, because we're not going to have time today for the interactive activity we had planned; we went a bit overboard with our time. Sorry guys, it's been a long one. Thank you so much for staying for three hours; we prepped very hard to give you a good balance of material. You can see our first epoch accuracy is 94.6%, and the accuracy keeps increasing. This is a really simple model, which is why it starts at 94% right off the bat; normally you'd expect a model to start below 50%, and it keeps getting better and better. You can see it saves the best model in memory, and if the accuracy starts going back down you'll see fluctuations too; you'll see how that looks in the final graph. Any questions so far? And is everyone's code running, the model training? If you need a particular code cell, we can provide it; I know I changed the validation loop code, so here's the updated version. You can see that from epoch 4 to epoch 5, epoch 5 did a little worse, so it didn't save that model, but we're doing pretty well. From epoch 7 to 8 the change is very minimal, so we're almost done training. Epoch 9 did slightly worse, but it's so minimal that it doesn't matter much, and it kept the model it had already saved. So this is our graph so far; it started off well. Now that you're done with model training, what you're going to do is download the weights file to your local computer: click the Files icon on the sidebar, then right-click the weights.pth file and download it.
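As an optional aside, if you prefer to do this from a code cell instead of the Colab file browser, Colab's download helper can trigger the same download; this is a small sketch, not part of the instructors' notebook.

```python
# Download the saved weights from the Colab runtime to your local machine.
from google.colab import files
files.download("weights.pth")
```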
That's how you save the model onto your own computer, essentially saving the progress of your training. Now I'm going to reshare my whole screen so you guys can see better. Can everyone see my whole screen? Yeah, it's okay. So we have the weights file ready; this one is from a previous run. I'm going to make a new folder, name it something like digit recognition, and open it. This is VS Code, by the way. Does everyone have VS Code installed? I can describe it: VS Code is basically a text editor, but you can think of it as an IDE; it's where you write and run your programs. I made a new folder in my directory and I'm going to drag the weights file in (or you can just Ctrl+X and Ctrl+V it). Can everyone see my screen? Yeah, your screen's good. Let me just set up a few things. We're going to import some libraries, and this time we're installing everything locally. The first thing you want to do is install torch and torchvision on your computer; mine are already installed, so my output will look a bit different from yours. We'll be doing CPU computing; if you have a GPU, it takes extra steps to use it, and we won't cover that today. Then install pygame, which will be used for the actual GUI for the handwriting recognition, and numpy; I'm pretty sure it comes with most Python setups, but just make sure everything is installed. Now let's import some libraries: we're importing torch again, and to declare our model structure we need to import the neural network module, so you can go back into your old notebook and copy the cell where we defined create_cnn. Just recreate the model; it's basically the same function from your Google Colab. If you have any questions or errors, Sam is going to help you. Now, before we build the GUI, we first want to load the model: we'll write a load_model function that takes the file path of the weights, builds the architecture with create_cnn(), and then loads the weights with torch.load.
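A minimal sketch of that load_model function; it assumes the weights file holds the model's state dict, and the details about the CPU map location and eval mode are explained right after this.

```python
import torch

def load_model(file_path):
    model = create_cnn()                               # same architecture as in the notebook
    state = torch.load(file_path, map_location="cpu")  # keep everything on the CPU
    model.load_state_dict(state)                       # assumes the file holds a state dict
    model.eval()                                       # evaluation mode for prediction
    return model
```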
The file path we pass to torch.load is our input, and the map location will be the CPU; I don't have a usable GPU here, and if you do have an NVIDIA GPU it takes extra steps to install CUDA and make sure you have the right version of torch that can use it, so for now we'll just use the CPU. Then we put the model into evaluation mode and return it. We'll actually call load_model later; let's write all the functions first. Now let's create the canvas: we're going to save an image file and run the classification on it, so the file path parameter is where, and under what name, we want to store the drawing. We're going to use pygame: pygame.init() to initialize it. Oh, I forgot to import pygame, and we also have to import Image from Pillow (PIL), a Python library for opening and processing images. So: initialize pygame and create our screen. How big do we want it? 280 by 280; it's small, but we can draw the digit at 280 by 280 (and it's a tuple, by the way, so it's a single parameter, (280, 280)). Then set the caption, basically naming your window, "Digit Recognition". Then create some variables: the last position, drawing, and running. While running, meaning while the GUI is open, we loop over every single event; an event in pygame is any interaction, like clicking a mouse button or pressing a key. If event.type equals QUIT, meaning the window is being closed, we set running to False to end the loop. If the mouse button goes down, we set drawing to True, and we also check whether the mouse is moving; the reason we separate these is that we want to know both that the mouse button is held down and that the mouse is moving, and that's what drawing is for. While drawing, we get the current mouse position and the last position. We're going to draw a line, and we want it pitch black: draw a line from the last position to the current mouse position with a thickness of 20, and then set the last position to the current one. We're basically drawing a bunch of short lines; in pygame, a line is drawn between two coordinates, from (x1, y1) to (x2, y2), which is why we store the last and current mouse positions. If you're getting errors, just take screenshots and ask about whichever cells aren't working; we'll provide you the code, and at the end of the day we'll also send the final program. We're almost done. If the mouse button goes up, we set drawing to False and the last position to None. And once we finally press Enter, to store our image and run predictions on it, we have to check for that too.
So: if event.type is KEYDOWN and event.key is K_RETURN (guys, we're probably going to go over this again briefly next time, since this meeting has gone way too long; we'll just walk through the code, so don't worry about that. Sorry, rookie mistake), then we want to save the whole screen as an image to the file path, and set running to False. While the loop is running we also call the display flip to update the screen, and once running is False the loop ends and we quit pygame. Now we're actually going to do the model prediction. This part, if __name__ equals "__main__", works like this: if you're familiar with Python, you can import a file's functions as a module and use them in a different file, and since we only have one file, adding this guard means that when you run the file directly from the terminal, only this part of the code runs. So, for simplicity, let's load the model (our file path is the digit weights file) to make sure we have no errors, and let's create our canvas with the file path digit_image.png. Now let's run it: load the model. "Cannot import torch nn"? Oh sorry, it's import torch.nn as nn. Rookie mistake. Yes, it loads the model. You can see that if we draw a number, nothing happens until we press Enter; then our image gets saved, and this is the image we'll be predicting on. So let's actually run the prediction: we first loaded the model, and now we create the canvas with create_canvas on digit_image.png.
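Putting those canvas pieces together, here is a rough sketch of the create_canvas function as described. It assumes a white background with black strokes and a stroke width of 20, which is my reading of the walkthrough rather than the exact file shown on screen.

```python
import pygame

def create_canvas(file_path):
    # A 280x280 window you can draw on with the mouse; press Enter to save it to file_path.
    pygame.init()
    screen = pygame.display.set_mode((280, 280))
    pygame.display.set_caption("Digit Recognition")
    screen.fill((255, 255, 255))          # assumption: white background, black strokes

    drawing, last_pos, running = False, None, True
    while running:
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                running = False
            elif event.type == pygame.MOUSEBUTTONDOWN:
                drawing = True
            elif event.type == pygame.MOUSEBUTTONUP:
                drawing = False
                last_pos = None
            elif event.type == pygame.MOUSEMOTION and drawing:
                if last_pos is not None:
                    # draw a short, thick black segment from the last position to here
                    pygame.draw.line(screen, (0, 0, 0), last_pos, event.pos, 20)
                last_pos = event.pos
            elif event.type == pygame.KEYDOWN and event.key == pygame.K_RETURN:
                pygame.image.save(screen, file_path)   # save the drawing to disk
                running = False
        pygame.display.flip()                          # update the window each pass
    pygame.quit()
```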
Then we load our image: we have to write some code to load it and convert it into the right format for prediction, since it needs reformatting. Given the file path, we open the image with the Pillow module and convert it to black and white; it should already be black and white, but we want to make sure the imported image doesn't have three matrices (channels), only the one grayscale channel, so we call convert. Then we resize the image to 28 by 28, the resolution our model was trained on. Then we turn the image into an array, a matrix, which is what numpy is for, and then we convert it into a tensor for running predictions, with the data type torch.float32. And then comes the trickier part: like I said when I talked about dimensions, we need to make sure our dimensions are right. Currently the image is 28 by 28, but we also have to tell the model how many images it has to process, the batch size, and how many matrices (channels) there are; an RGB image would have three, but this one only has one. That's why we call unsqueeze twice, to add the batch dimension and the channel dimension, and then we return the image tensor. After that, we load our image by calling load_image on digit_image.png, the file path of our drawing. Then we use softmax. First, to run the model (if you go back to our earlier code, let me share my screen. Can everyone see my screen, by the way? Yeah, but I feel like people probably have to go soon, so why don't we finish the code next time? Oh yeah, just a moment, it's almost done), you can see we pass images into the model there, so we do the same thing here: we store the loaded model in a variable and compute the outputs of the model on the loaded image, and then we run softmax on the outputs to get the probabilities. The dimension for softmax has to be 1, since we want it across one linear vector; that's why you put the dimension as one. Then we detach the probabilities from PyTorch's graph so we don't accidentally modify them, move them back into CPU memory (even if that's not strictly needed here, we do it to be safe), convert to numpy, and then for each index and probability in the enumerated probabilities we print the class, which digit it is, and its probability. So our whole project should work now; let's try it. Pygame initializes; let's draw the number three and press Enter. You can see it predicted 3 at 100%, and you can see the image got saved and reformatted to 28 by 28. Let's try another number, nine: 100%. Let's try the number four. Well, this one is probably an error and we'd need to retrain the model; it gave class 4 only a few percent and leaned toward class 7 instead. The dataset's images are all centered, so we'd probably need to augment the dataset, retrain the model, or increase the model's complexity, because we only trained for 10 epochs even though there are tens of thousands of images. Let's try one more, the number eight; you can see it's kind of bugging out because we didn't use the whole dataset. Try the number six. So this is a good representation of how a CNN model is used for prediction: load the model, use the weights from the weights file, load the image as a tensor, make sure it's in the right format, and then do whatever you need with the result. So this was a great example of how image training and classification can be done. Any questions, guys?
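Here is a sketch of that load_image and prediction step, again with my naming and assuming the load_model and create_canvas sketches above. Note the comments about inversion and scaling: whether you need them depends on how the training images looked, which isn't visible in the transcript.

```python
import numpy as np
import torch
from PIL import Image

def load_image(file_path):
    # Open the drawing, force a single grayscale channel, and shrink to 28x28.
    img = Image.open(file_path).convert("L").resize((28, 28))
    arr = np.array(img, dtype=np.float32)
    # MNIST digits are white on a black background; if your canvas is black-on-white,
    # you may need to invert here, e.g. arr = 255.0 - arr. You may also want to
    # scale to [0, 1] (arr / 255.0) to match the ToTensor() preprocessing used in training.
    tensor = torch.tensor(arr, dtype=torch.float32)
    return tensor.unsqueeze(0).unsqueeze(0)      # add batch and channel dims -> 1x1x28x28

if __name__ == "__main__":
    model = load_model("digit_weights.pth")      # hypothetical file names for illustration
    create_canvas("digit_image.png")
    image = load_image("digit_image.png")
    with torch.no_grad():
        probabilities = torch.softmax(model(image), dim=1)
    for i, p in enumerate(probabilities.detach().cpu().numpy()[0]):
        print(f"digit {i}: {p * 100:.1f}%")
```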
Good job, Javil, on saving the model. We'll send the code shortly, through a GitHub link or a Google Drive link. Actually, we'll probably wait until next week, go over it again, and then send the code, because a lot of people left since the session ran extremely long, and it was probably hard to follow just because of how much time it took. So we'll most likely go over it briefly again and send the code after next week. Thank you guys for attending. We're super sorry this session took so long, but it was nice being with you all. Thank you. All right, bye. Bye, everyone.