Hello everyone. In this video we will implement handwritten digit classification using TensorFlow. We are targeting absolute beginners who are starting their careers in computer vision and deep learning. At the end of this video we will show you a demo of how a digit can be recognized by a deep learning architecture. It is based on a paint program, Microsoft Paint, in which a person writes any digit from 0 to 9 and the deep learning architecture identifies which digit was written. So let's get started.

My name is Chanullah and I'm a PhD student at Enhan University, South Korea. My major is computer vision and deep learning.

Before diving into the implementation, let's do some brainstorming. This is an image of a cat, and for a computer an image is nothing but a set of integers. In this case we have, for example, an 800 by 600 image: this side is 600 and this side is 800. Every pixel corresponds to an integer value. If the value is 255 the pixel is white, and if it is 0 the pixel is black. Everything in between is a shade of gray. Here you can see multiple strips of black and white, but they are neither pure black nor pure white; they are gray. The values in this strip are far from 255 and closer to 0, which is why it looks black, and this other strip is the same. So the closer a value is to 0, the blacker the pixel, and the closer it is to 255, the whiter. That was a typical example of a grayscale image: as I told you, values from 0 to 255 correspond to gray levels. If we have multiple channels, like three such channels, they form a color image; a color image has three channels, red, green, and blue. Now that we have
understood what an image is, consider one that is 28 by 28 pixels. This pixel corresponds to 255, which is white, and this one corresponds to 0, which is black. Image classification is nothing but deciding, based on salient features, which category an image belongs to, so we need to extract those features. These days we use deep learning architectures, which learn the features automatically.

In this example we have shown three types of layers: convolution, pooling, and dense layers. This image of a seven is cropped and given to the deep learning architecture, which extracts its salient features using a convolution layer. So a convolution layer is nothing but a feature extractor. Then we have a pooling layer, which can be max pooling or average pooling. Pooling reduces the size: as you can see here, the square was bigger, 28 by 28, and then it becomes 14 by 14. So pooling reduces the size and forwards only the pixels that are most valuable. Then we have another convolution layer to extract more information, and then we reduce the size again, letting only the strongest features through.

At the end of this deep learning architecture we have three dense layers. A dense layer is nothing but a typical neural network layer that is connected to all neurons, and it sums up all the features we have extracted so far from the seven. The last one is a decision layer, technically known as the classification layer, and it must have the same size as the number of classes. Here we have the digits 0 to 9, so 10 classes, and the last dense layer must have the same number of neurons, so in this network
we have 10 neurons in the last dense layer, equal to the number of classes.

We must have some data set in order to train and generalize our solution. In this case we are using the MNIST data set, which contains 60,000 training images of 28 by 28 pixels. They contain the digits 0 to 9 written by different people in different styles. You see here that five is written in different styles; different people write the same digit differently. So in order to generalize the solution, training is performed and the deep learning model learns how a human can write any digit.

As I told you before, a convolution layer is nothing but a feature extractor, and these are its operations. If we have a 5 by 5 by 3 filter, it extracts features based on the weights of the filter. Suppose you want to detect an edge in an image. Where the image has an edge and the filter also encodes an edge, multiplying them gives a boosted response; if the image has a circle or something else there, the response is lower. So convolution is for feature extraction, and we must decide what kind of feature we are going to extract. Classically, you would initialize your weight matrix according to the feature extraction technique you are using. These days, thanks to TensorFlow and other deep learning libraries, we do not need to design the weight matrix ourselves; the library initializes the weights and adjusts them using backpropagation and related techniques.

Now that we have developed some understanding of how an image can be classified, let's dive into our first basic handwritten digit classification. The tool we will require is Anaconda. Anaconda is a collection of libraries and IDEs. An IDE is nothing
but a program in which we write our code, such as VS Code, Spyder, or Jupyter Notebook. In this video we will focus on Jupyter Notebook. After installing Anaconda and Jupyter Notebook we will install our deep learning library, TensorFlow. We will use OpenCV for loading images and, if you want, for overlaying rectangles and the recognized digits on them.

To install Anaconda, just type "download anaconda" into the Google search bar and open the first link. At the bottom of that page you will find installers to download. Pick the one for your operating system, Windows, macOS, or Linux. After that it's super simple: click Next, Next, Next and it will be installed.

Now I assume you have installed Anaconda. Type "anaconda" in the Windows search bar and click Anaconda Navigator, and you will reach this kind of window. Here you can see that some of the buttons are blue and some are green. In your case all of them will be green because you have not installed any IDE yet. I have already installed Jupyter Notebook, which is why it is blue. So in your case, please click the Install button for Jupyter Notebook.

Now I assume you have installed both Anaconda and Jupyter Notebook. After installation, please make a folder. In your case it will be empty; in my case I have already done some work in it. Then type "anaconda" again, but this time, rather than clicking Anaconda Navigator, click Anaconda Prompt. Copy the folder's path, go to that folder in the prompt, and then type jupyter notebook. Once you have installed it,
you can launch it from here, but it is better to open that folder first and then start Jupyter Notebook from it. Your folder will then open in Jupyter Notebook. After that, please open an empty notebook.

Now I assume you have this kind of window. The first important thing is that we need to import tensorflow as tf, but for that you first have to install it with pip install tensorflow. To install TensorFlow, open Anaconda Prompt again; this time right-click it and choose Run as administrator. Once it is running as administrator, all you need to do is type pip install tensorflow. In my case it is already installed, so I don't need to do this.

Once you have installed TensorFlow, the second thing you need to install is OpenCV. For that you type pip install opencv-python. OpenCV is also available for other languages such as C++, so for Python you write opencv-python.

Once TensorFlow is installed, you can execute this cell. You can click Run here, but the shortcut is Shift+Enter. I have pressed Shift+Enter, so TensorFlow will be loaded.

Let me make some notes. The beauty of Jupyter Notebook is that you can write notes alongside your code. Whatever you have written is converted into text: just go to the Cell menu and change the cell type to Markdown. This cell's type is Code, and this
cell's type, for notes, is Markdown.

We don't need to download the MNIST data set ourselves; TensorFlow will download it. Let me create a variable mnist. Since I have imported tensorflow as tf, I write tf.keras (Keras is an API for TensorFlow; if you install TensorFlow, Keras is installed automatically), then datasets. If you press Tab all of the options appear, and if you press Tab again after datasets you see cifar10, imdb, and a lot of other data sets. We are interested in mnist, the handwritten digits data set. Let me write some notes: all of these are handwritten characters. So now the variable mnist refers to this data set, with 60,000 training samples and 10,000 testing samples. In your case it might start downloading once you execute the loading command; in my case it was already downloaded during my practice.

Again I need to write some notes, so I go to Cell, then Cell Type, then Markdown. After loading the MNIST data set, let's divide it into training and test sets. We should always keep our training and test data separate. Here that is already done: 60,000 images are provided for training and 10,000 for testing. The data was divided randomly, so both sets contain similar kinds of data; it would be unfair to train on one kind of data and test on something else, because the results would be entirely different. And the test data must not be part of the training data. They should be separated so that we can verify the performance of our deep learning architecture.

So mnist was a variable, and mnist.load_data() has now loaded the data. For example, if I check x_train.shape, it has 60,000 images and every image is 28 by 28, as we discussed at the start.

In order to plot an image I need to import matplotlib.pyplot as plt. In your case you first have to install it with pip install matplotlib, and then you will be good to go. Sorry for the mistake; check that the spelling is matplotlib. Shift+Enter runs the cell.

After importing, I am checking the first image of x_train. plt.imshow is the command to show this image. At the start we don't know whether it is a color image or a grayscale one; we assume it is grayscale, so to display it we pass cmap=plt.cm.binary. This is the image, and its size is 28 by 28. x_train is an array of 60,000 images, but I am just taking the first instance. So plt.imshow followed by plt.show displayed the image first with the default color map, and then with the binary color map.

Now let me convert this cell into notes again: checking the value of each pixel. As I told you, a pixel near white is close to 255 and a pixel near black is close to 0. Earlier I plotted the image; now I am just printing it. Before normalization, you see that 255 is white, these intermediate values are gray, and the zeros are black. The array is 28 by 28. When we displayed it with the binary color map, the digit appeared black on a white background, but actually the background is black and the digit is white. We used cm.binary, which is why it inverted all of the
colors. Here we can see that everything is black and only the digit is white; you see, the five is written in white. So do not confuse this image with that one: plt.cm.binary inverted the pixels for display, but inside the array the background is black, just like this image in the Microsoft Paint program. The background is black, and the digit, you see here, this stroke and this and this, looks like a five. So it is the number five.

Let me convert this into notes again; Shift+Enter runs it. So it's a gray image and all values are from 0 to 255, and in order to train on it we need to normalize it. x_train = tf.keras.utils.normalize(x_train, axis=1) is the command to normalize it. One method would be to simply divide x_train by 255, but we don't need to do that by hand because we have a built-in command. Here the axis is 1; axis 0 and axis 1 select different directions of the array. As each image is 28 by 28 we don't need to worry much about the axis here; either choice is OK, because all we need is to normalize.

Once we have normalized, let's check the values again. Now you see that all of the values are between 0 and 1: the brightest pixels get the largest values and everything else is scaled down with them. (Strictly speaking, tf.keras.utils.normalize rescales the values along the chosen axis to unit L2 norm rather than dividing by 255, so the result is not identical to x_train / 255, but in both cases the values end up between 0 and 1.) So normalization is a really necessary step; we need to perform
normalization so that we do not need to care if the intensity scale changes.

Now let's verify the corresponding label. x_train has the data and y_train has the labels. If I print y_train[0], which corresponds to x_train[0], it has a single value, 5. So the data is already labeled: this image of a five corresponds to the label 5. For deep learning we need to tell the architecture that this image corresponds to 5 so that it can learn toward that target.

We do need another library, NumPy; you install it with pip install numpy. The image size is 28. What we are doing here, after normalizing, is adding one dimension for the kernel operation. In the reshape, -1 tells NumPy to infer that dimension from the data, which here gives 60,000. So the shape becomes 60,000 by 28 by 28 by 1. We add the last dimension because we have to perform the convolution operation, which needs one extra (channel) dimension. After doing this you see 60,000 by 28 by 28 by 1 for the training samples, and for the testing samples 10,000 by 28 by 28 by 1.

Now we need to create our deep learning architecture. The most important thing is that from tensorflow.keras.models we import Sequential. What is Sequential? It is a model of sequentially connected deep learning layers. And what are the layers? They can be Dense, Dropout, Activation, Flatten, convolution (Conv2D), and max pooling (MaxPooling2D) layers. These are the layers we will be
needing. So let's create our neural network. First we initialize it: model is our variable name, and we initialize it using Sequential. This means we are going to add our layers one after another (sequentially adding, not concatenating, sorry).

In the first convolution layer the number of filters, or kernels, is 64, and each filter has size 3 by 3. In the introduction you saw what a kernel or filter is; some books say filter and some say kernel. So we have one convolution layer with 64 different 3 by 3 filters. Each of these 64 kernels is initialized differently, and each is convolved with the input image.

For the input shape we pass x_trainr.shape[1:]. We call it x_trainr because we have reshaped it. Why from index 1 onward, skipping index 0? The shape has indices 0, 1, 2, 3, and it was 60,000 by 28 by 28 by 1. Index 0 is the number of images, 60,000, and the layer does not need that; it only needs the shape of a single image. The Sequential model takes care of the rest: once we compile and train the model, it iterates through the images automatically.

Then we have an activation function. An activation function is there to make the network non-linear (not linear, sorry). There are different activation functions, such as ReLU. With ReLU, for example,
all values less than 0 are dropped (set to zero), and all values greater than 0 are allowed through to the next layer.

Next we have a max pooling layer, and the pool size is 2 by 2. What does max pooling do? From each 2 by 2 block it keeps only the single maximum value and drops the rest.

Now we have completed our first convolution block. We can have two more convolution layers as well. How many you need depends on your accuracy, which you have to check; it is not a hard and fast rule, but as I have already practiced this, here we are using three convolution layers. In each one we extract features, allow only the positive values, reduce the size, and propagate only the maximum values to the next layer. Each again uses 64 different kernels, each with its own 3 by 3 weight matrix.

Now that we have three convolution layers, we can add our fully connected layer, but before that we need to flatten. The input size was 28 by 28 by 1. After the first convolution it is 28 minus 3 plus 1, which is 26, so 26 by 26. By the same rule the next convolution would give 26 minus 3 plus 1, which is 24, so 24 by 24, and so on; when we check the model summary you will see exactly what input and output each layer has. I was just giving you an idea of what the flatten layer does: it multiplies the dimensions together. For example, if we have 20 by 20, the flattened size will be 20 times 20, which is 400. So the flatten layer is
actually a conversion from two dimensions to one dimension.

Then we have a dense layer. All of these layers are neural network layers, but we call this one dense because all of its neurons are connected to every input. The convolution layer is also a neural network layer, and you can think of each 3 by 3 filter as a neuron. In this dense layer we have 64 neurons, all of them fully connected: taking the 400 from the example, each of the 400 flattened values is connected to each of the 64 neurons. Then again we have an activation layer, ReLU, and after it a second dense layer with its activation. We are now going downward, reducing the size. Why? Because we need to reach 10: the number of classes is 10, so the last dense layer must have 10 neurons.

Make sure the activation function of the last layer is softmax, sigmoid, or similar. The most common one is softmax, which gives you the class probabilities when you have multiple classes. If you have binary classification, with only two classes, you would have one neuron in the last dense layer and the activation would most of the time be sigmoid; I am just telling you the usual practice. But as we have multiple classes, 10 of them, softmax is the best choice for the classification.

I press Shift+Enter to execute, and now I can check model.summary(). OK, so this is the summary of my model.
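As a rough cross-check of what such a summary reports, the layer sizes and parameter counts can be worked out by hand with the "size minus 3 plus 1" rule from above. The sketch below is plain Python, not the notebook code. It assumes stride-1 convolutions without padding, a 2 by 2 max pooling after each of the three convolution layers, and dense layers of 64, 32, and 10 units; the 32 is an assumption, since the transcript only states 64 for the first dense layer and 10 for the last.

```python
# Sketch: reproduce by hand the shapes and parameter counts a model summary
# would list for the network described above.
# Assumptions (not all stated in the transcript): stride-1 convolutions with
# no padding, 2x2 max pooling after every convolution, dense widths 64/32/10.

def conv_out(size, kernel=3):
    """Output width of an unpadded stride-1 convolution: size - kernel + 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output width of non-overlapping max pooling (floor division)."""
    return size // pool

def conv_params(in_channels, filters, kernel=3):
    """Each filter has kernel*kernel*in_channels weights plus one bias."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    """Each unit has one weight per input plus one bias."""
    return (inputs + 1) * units

size, channels, total = 28, 1, 0
rows = []
for _ in range(3):  # three conv + ReLU + max-pool blocks, 64 filters each
    total += conv_params(channels, 64)
    size, channels = conv_out(size), 64
    rows.append(("conv", size))
    size = pool_out(size)
    rows.append(("pool", size))

features = size * size * channels  # number of values Flatten produces
for units in (64, 32, 10):         # assumed dense layer widths
    total += dense_params(features, units)
    features = units

for name, s in rows:
    print(f"{name}: {s} x {s} x 64")
print("flattened features:", size * size * channels)
print("total parameters:", total)
```

Under these assumptions the first convolution gives 26 by 26 and the first pooling 13 by 13, matching the sizes read off the summary, and the total comes out close to the roughly 81,000 parameters quoted for this model.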
Starting from the convolution layer, you see 26 is the size; as I told you before, it is 28 minus 3 plus 1. So 26 by 26 is the size, and the 64 is there because this convolution layer has 64 filters. The activation layers do not have any parameters, as you can check here. Earlier, when I was giving you the sizes, I forgot the max pooling layer. You see, the max pooling layer reduces the size to half, so what comes after it is not 26: 26 becomes 13. Then we have the next convolution layers, and the rest you can see here.

Now that we have everything, let me print my total number of training samples, which is 60,000. Before training I need to compile my model. For the loss, the common choice here is sparse categorical cross-entropy; there are a lot of other losses as well. Similarly, for the optimizer we have Adagrad and others, but Adam is the most popular so far. The target metric is accuracy; we want to improve the accuracy, so that is what we track. So I compile my model.

Then comes the training step: model.fit. What is my data? x_trainr. What are my labels? y_train. As I told you before, each image has one label; for each image we know whether the digit is five or something else. We train on the 60,000 samples and then test on the 10,000. Let's start training. It's really fast because I'm using a GPU.

While it trains, let's discuss something we should have covered earlier. This is a typical deep learning architecture. We
can also implement it with the Keras functional API, but this is not the functional API; this is a Sequential model. Keras has two styles of implementation, the Sequential model and the functional API, and here we are not using the functional API.

Secondly, you can see that we have about 81,000 parameters. This is an important thing, because if you keep adding layers your model size increases, and a large model is not easy to deploy. Our problem is very easy, classifying images of only 28 by 28, which is why the model is so small.

I am training for five epochs, and you can see that our accuracy is 98 percent, and the best thing is that our validation accuracy is also 98 percent. You must make sure these two, training accuracy and validation accuracy, stay close to each other. Why? We set validation_split to 30 percent, so out of the 60,000 samples the model trains on 70 percent and validates itself on the remaining 30 percent. You see that at the start it was 96, then 97, and it keeps improving. If your validation accuracy is almost equal to your training accuracy, you are doing well. But if your training accuracy is very high and your validation accuracy is much lower, for example training accuracy 98 percent and validation accuracy 30 percent, then you are suffering from overfitting. Overfitting means the network has fitted the training data too closely and does not generalize. In this case the result is ideal; it is a very simple problem, which is
why. That other case was overfitting, but here we do not have any overfitting problem. If you do have overfitting, one solution is a dropout layer, and there are many other solutions, but this is a simple tutorial for MNIST, so we are not discussing a lot of other concepts.

Now that we have trained the model, let me check the accuracy on the test set; this is again very important. I have three sets of data: I split the training data in two, pure training and validation, and now I am testing on the 10,000 test samples. I have evaluated on those 10,000 samples, and you see the test accuracy is about 98.3 percent, while the validation accuracy was 98.17 percent. All three accuracies are close to each other, so it seems I am doing well.

You can also save your model, but rather than saving it, let's predict. Prediction lets me check whether I am going in the right direction. This is my prediction, but as I told you, the last layer was softmax, so what we get is class probabilities. We need to decode them: the class with the maximum probability is our answer. So let's convert; you see, this is 7. I hadn't printed it, so let's print it: the model has predicted 7. Let's check whether we really have a seven at index 0. You see, the image does contain a seven. Let's look at another value at random. I am just converting again, and the prediction has already been performed on the whole test data set, which has 10,000
samples, so it has predicted all of them. Let me check index 128: the prediction is 8. Now let me check what the image at index 128 is. You see, we have an eight. So we are getting the right answers.

Now, to check the model further, let me write a handwritten digit myself. I don't want to save this one. Let me write it again like this. Sorry, why is it not working? It's because the pen was black on a black background; sorry for that. Let me write it a little more clearly because it is our first test, and save it, replacing this file. So I have written my own digit 8 in Microsoft Paint; it is different from the 8 in the data set.

I need to import cv2. As I told you before, to install OpenCV you run pip install opencv-python. This is the command to load an image, and plt.imshow shows it. This is the eight I have just written in my paint program.

In order to classify it I need to repeat all the preprocessing steps, because as you know the network input is 28 by 28, while my image is 400-something pixels across. So I need to reduce the size. But first I need to convert it to grayscale, because when I stored it, it was a color image. If I show you img.shape, you see 903 by 413 by 3 was the size. If I show you the shape of the gray image, the three channels are gone; now it is only one channel, with just the width and height. Then I resize it. If I check resized.shape, it is 28 by 28.

One thing we must not forget: before giving this image to the network for prediction, I need to normalize it, again scaling the 0 to 255 values into the 0 to 1 range. Now we have done the scaling, but there is one more important thing: we need to change the shape. For the kernel operation of the convolution layer we need to reshape it, as before, to 1 by 28 by 28 by 1. Let me check newimg.shape: you see 1 by 28 by 28 by 1. Now everything is done.

Now I can predict using the model I have just trained. model is the name I have used everywhere, in model.evaluate and so on. Let's check whether it works on my own digit. You see, it has predicted 8. I gave it my own handwritten 8 and it predicted 8. You can try any other digit and make your own as well.

I have also prepared another demo for you. This is my code for the demo; let me remove the extra stuff. It is a video demo, so let me convert this cell into Markdown: this is a video demo. We don't need to import OpenCV again because we already imported it, but anyhow. The same goes for NumPy; if you need to install it, pip install numpy is the command. This is the font configuration: which font
you want to use and what the scale is. Then I am reading a video. In this video I recorded myself writing, for example, zero, then one. You can do it using a webcam as well, but I have just recorded my own video. I used oCam, and you can use any screen-capturing tool to make your own video, or you can just do it live with a webcam. You see over here, I did it before, so I have recorded all of them and I can just load this. I have shown you this video, 517, so I am just calling this video.

Over here I'm just doing a check. This can be a webcam and this can be a video as well, because the code is the same for webcam and video; all you need is to write 1 over here in case of the webcam. So this is my code. What I'm doing is drawing a rectangle at the top of the screen. The video is in cap, and I'm reading it frame by frame. I'm skipping images to make it a little faster; I could skip 20 or any other number of images, so that I can make it faster.

Over here, this is the command which I have just used. You see, I'm converting every image into gray, then resizing it, then normalizing it, then reshaping it, and then predicting on it. These are the same commands which I ran for the single image. So now I can just do
it for my demo. Whatever the status is, it was an integer, so now I need to convert it into text. Here I'm putting the rectangle, but right now both colors are black. Let me make the rectangle green; green is better. The order is BGR, so this will be green, and the text color will be red. So this is the handwritten digits code; let's execute it. You see over here, this is 0 and it is predicting 0; if it is 1, it is predicting 1; and now we are writing 2. So this is my video; I have just shown you how to do it. That's all for this video. Thank you so much, keep watching videos on our channel, and thank you so much for your support.