Understanding Max Pooling in CNNs

Hello all, my name is Krish Naik, and welcome to my YouTube channel. Today, in this video, we'll be discussing max pooling, and we'll try to understand why max pooling is used. In my previous video, I already gave you a hint about max pooling: I told you about a very key term called location invariance.

If you go and read Yann LeCun's research paper on CNNs, this term, location invariance, is specifically mentioned there. If you want the research paper, I'll try to provide it in the description box so you can go and read it. Now, what does this location invariance do?

Suppose this is my image of size 4x4, and I am applying a 2x2 filter with no padding and a stride of 1. In that case I will get a 3x3 output. The reason is the formula n - f + 1, where n is the image size and f is the filter size: 4 - 2 + 1 = 3, so the output is 3x3. This is the convolution operation that I have done. After the convolution operation, we can either stack more convolution layers or apply max pooling, so now we'll try to understand what exactly max pooling is.

Now, guys, I told you about the term location invariance. Suppose this image contains multiple cats, and suppose this is my filter, which is detecting the faces. As we go horizontally into the higher levels, these faces should be detected more easily and more precisely. When we come to this particular layer, we will be able to see some information regarding the face. But as we go to the higher levels, we should be able to see the faces clearly.
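As a quick check on the output-size arithmetic above, here is a minimal Python sketch. The function name `conv_output_size` and its `padding` and `stride` parameters are my own additions for illustration; the talk itself only states n - f + 1, which is the special case of no padding and a stride of 1.

```python
def conv_output_size(n, f, padding=0, stride=1):
    """Side length of the output for an n x n input and an f x f filter."""
    return (n + 2 * padding - f) // stride + 1


# The 4x4 image with a 2x2 filter, no padding, stride 1: n - f + 1
print(conv_output_size(4, 2))  # 3

# The same image and filter with one pixel of padding on every side
print(conv_output_size(4, 2, padding=1))  # 5
```

Note that the 3x3 result only comes out when the padding is 0; with a padding of 1 the same filter would give a 5x5 output.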

At the very least, we should be able to see the shape of the faces. For this, we use something called max pooling. Now, how does the max pooling operation take place? Suppose this is my convolution output, which is of size 3x3. I will go and apply another filter on it.

This is called a max pooling filter, and it can be 3x3 or 2x2. The most important thing is to understand how this operation takes place. In max pooling, remember the word "max". We take this filter and place it on top of the output, starting over here. Once it is placed, the max pooling operation checks which value is the highest among these pixels. I can see that 4 is higher than all the other pixels, so 4 is taken and placed as the first value of my output.

In this case we usually take the stride as 2, which means that after this the filter jumps directly two places over, like this. You may be wondering: can't we just take s = 1? But understand, guys, we are trying to do max pooling here. Max pooling means that at a specific location I am taking the highest intensity value and placing it in my output. So wherever the face is detected clearly, I'll be taking that information and putting it in my output.

So, with a stride of 2, we jump over here to the next position. Over here, part of the window doesn't have any pixels, so max pooling will take the maximum of just these two values. From these two values I see that 6 is the highest, so I'll be placing 6 as the next value in the output.

After this, the filter jumps down here. Again, part of the window doesn't have any values, so from these, 8 will be taken, because 8 is the maximum one. Then it jumps over here, where I have just one value, which is 4, so here 4 will be taken. So you can see that the maximum intensity values are picked up from the output of the convolution layer with the help of max pooling, and finally we get these values.

As I told you in the example, if I have multiple cat images, then in the first filter some shape has been detected, something like this, where the filter is roughly able to detect the face shape. But as we come to this layer, the face will be properly detected, because with the help of max pooling we are taking the high-intensity pixel values from here.

Now, there are also other pooling techniques.
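The max pooling walkthrough above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the 3x3 values below are hypothetical (the talk only gives the maxima 4, 6, 8 and 4), the function name `max_pool` is mine, and windows that hang over the edge are kept and use whatever values exist, as in the walkthrough. Many libraries drop such partial windows by default, so the output size can differ there.

```python
import math


def max_pool(feature_map, size=2, stride=2):
    """Max pooling over a square feature map given as a list of rows.

    Windows that hang over the right or bottom edge are kept and use
    only the values that exist (an assumption matching the walkthrough).
    """
    n = len(feature_map)
    out = math.ceil((n - size) / stride) + 1
    pooled = []
    for i in range(out):
        row = []
        for j in range(out):
            r0, c0 = i * stride, j * stride
            window = [
                feature_map[r][c]
                for r in range(r0, min(r0 + size, n))
                for c in range(c0, min(c0 + size, n))
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical 3x3 convolution output, chosen so the maxima match the talk
conv_output = [
    [1, 4, 6],
    [2, 3, 5],
    [8, 7, 4],
]
print(max_pool(conv_output))  # [[4, 6], [8, 4]]
```

The four windows here hold four values, two values, two values and one value, in the same order as the filter positions in the walkthrough.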

Besides max pooling, there is something called min pooling: instead of taking the highest intensity value, we take the lowest intensity value, the lowest pixel value. There is also mean pooling, M-E-A-N, but I'll not say mean; instead I'll say average pooling. So all these different kinds of pooling are there.
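To see how these pooling variants differ, here is a small Python sketch that applies the same window with three different reducers. The helper names `pool` and `average` and the 4x4 values are hypothetical, chosen only for illustration; this version uses full windows only.

```python
def pool(feature_map, reducer, size=2, stride=2):
    """Apply `reducer` to each full size x size window of a square feature map."""
    n = len(feature_map)
    out = (n - size) // stride + 1
    return [
        [
            reducer([
                feature_map[i * stride + a][j * stride + b]
                for a in range(size)
                for b in range(size)
            ])
            for j in range(out)
        ]
        for i in range(out)
    ]


def average(values):
    """Arithmetic mean of the values in one window."""
    return sum(values) / len(values)


# Hypothetical 4x4 feature map
feature_map = [
    [1, 3, 2, 0],
    [5, 7, 4, 6],
    [9, 1, 8, 2],
    [3, 5, 0, 6],
]
print(pool(feature_map, max))      # max pooling:     [[7, 6], [9, 8]]
print(pool(feature_map, min))      # min pooling:     [[1, 0], [1, 0]]
print(pool(feature_map, average))  # average pooling: [[4.0, 3.0], [4.5, 4.0]]
```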

Again, guys, pooling can be stacked horizontally anywhere after some convolution operations. In some of the neural networks that I will discuss as we go ahead, you will see a combination of horizontally stacked convolution layers and max pooling layers.
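To get a feel for what stacking does to the size of the feature map, here is a rough Python sketch that traces the side length through a stack of layers. The layer list, the 28x28 input and the names `output_size` and `trace_sizes` are all hypothetical, and the sketch assumes no padding and that partial windows at the edge are dropped.

```python
def output_size(n, f, padding=0, stride=1):
    """Side length after an f x f convolution or pooling window."""
    return (n + 2 * padding - f) // stride + 1


# Hypothetical stack: (layer type, filter size, stride)
layers = [
    ("conv", 3, 1),
    ("max_pool", 2, 2),
    ("conv", 3, 1),
    ("max_pool", 2, 2),
]


def trace_sizes(n, layers):
    """Return the side length of the input followed by each layer's output."""
    sizes = [n]
    for _name, f, stride in layers:
        n = output_size(n, f, stride=stride)
        sizes.append(n)
    return sizes


print(trace_sizes(28, layers))  # [28, 26, 13, 11, 5]
```

Each convolution trims the map a little, while each max pooling layer with a stride of 2 roughly halves it.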

You will also get to see different architectures when we learn about transfer learning, which is a very good concept. If you look at how the convolution and max pooling layers are stacked in those transfer learning architectures, you will be surprised at how efficiently they produce the output. After getting this output, there is also a very important layer called the fully connected layer.

How this output gets converted into a fully connected layer, we'll discuss in our next class. But understand, guys, it is the filter in the convolution layer that gets updated during backpropagation; the max pooling filter only picks the maximum and has no values of its own to update.

The filter values inside the convolution layer are what we update. I told you that after we get the convolution output, we apply ReLU, right? Now, when I'm doing backpropagation, this filter gets updated, just like how we update the weights.
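One detail worth keeping in mind here: a max pooling window has no weights, so the values that change during backpropagation are the convolution filter weights. What max pooling does in the backward pass is send each gradient back to the position that held the maximum. Below is a minimal Python sketch of that routing; the function name `max_pool_backward`, the 3x3 values and the all-ones gradient are hypothetical and only meant as an illustration.

```python
def max_pool_backward(feature_map, grad_output, size=2, stride=2):
    """Route each output gradient back to the position of its window's maximum."""
    n = len(feature_map)
    grad_input = [[0.0] * n for _ in range(n)]
    for i, grad_row in enumerate(grad_output):
        for j, g in enumerate(grad_row):
            r0, c0 = i * stride, j * stride
            # Positions covered by this window, clipped at the edges
            positions = [
                (r, c)
                for r in range(r0, min(r0 + size, n))
                for c in range(c0, min(c0 + size, n))
            ]
            r, c = max(positions, key=lambda rc: feature_map[rc[0]][rc[1]])
            grad_input[r][c] += g
    return grad_input


# Hypothetical 3x3 convolution output and a gradient of 1 for every pooled value
conv_output = [
    [1, 4, 6],
    [2, 3, 5],
    [8, 7, 4],
]
grad_output = [
    [1.0, 1.0],
    [1.0, 1.0],
]
print(max_pool_backward(conv_output, grad_output))
# [[0.0, 1.0, 1.0], [0.0, 0.0, 0.0], [1.0, 0.0, 1.0]]
```

Only the positions that won the max receive a gradient, and from there the gradient continues back to the convolution filter that produced those values.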

The filter keeps getting updated until we get the right values for detecting every object present in these images properly. So that is how max pooling works. I hope you understood clearly what exactly max pooling is. That was all for this video, guys. I hope you liked it. Please do subscribe to the channel if you have not already subscribed, and please share it with all your friends. I'll see you all in the next video, where we'll discuss the fully connected layer. Thank you, one and all.
