Transcript for:
Introduction to Convolutional Neural Networks with TensorFlow

hi everyone my name is nathan paccini i am a marketing manager from data science dojo uh and today i have neil leiser here with me and he is going to be giving um a presentation on an introduction to convolutional neural networks with tensorflow and this will include a little tutorial at the end i might refer to it as part two at the end here but neil is a data scientist at iwaka which is a fintech startup and he's also the host of the ai stories podcast where he brings on data professionals and tech leaders and they talk about you know their careers and how they use ai and day-to-day life to to impact our world so um so with that neil why don't you go ahead and um uh and you know things should be good to go yeah thanks thanks a lot for the intro can can you all see my screen is that good yes we can okay perfect um so yeah hi everyone today i will give you an introduction to convolutional neural networks with tensorflow i really wanted to thank data science dojo for organizing this event and yeah hope you will find this useful and interesting so just a few words about me first i don't think you know me quite well so as you can hear i'm not from the uk i was born and raised in belgium but i moved to london seven years ago to study civil engineering at imperial college and then in 2019 i kind of realized that i was really interested in data science and machine learning so i did a master in london and there at the end of my master i did a research thesis on predicting the output of solar panels with satellite images and this is where i kind of went deep into convolutional neural networks because i had to deal with images and so that's also yeah how i got interested in this topic and that's why i'm giving this talk today as mentioned already i i'm a data scientist at iwaka which is a fintech startup and there i build machine learning algorithms for credit risk so we are a lending company we lend money to businesses and so we have algorithms which will decide who we are accepting for a loan how much money we are offering also some tools for losses prediction so this is what i'm working on and in my spare time i also host the ai stories podcast where i invite people from all around the world to talk about their career and how they use ai in their day to day season one is now over and season two will start again in september october so if you're interested yeah feel free to follow all right so let's start with the talk now um just a brief overview of what i want to talk about i'm first going to give a brief introduction like five minutes on ai and neural networks just to bring everyone on the same page then we're gonna go through the theory behind convolutional neural network and again this is just to show you the intuition behind how they actually work and finally we'll have the collab where i will show you how you can build your own cnn algorithm with python and tensorflow so hope you will find this useful and interesting let's start with the intro so in 2012 you may or may not know but there is this alex net paper which is very famous and this is where deep learning becomes well known to everyone and the alex net paper basically sets a new record on the imagenet challenge and imagenets so you have a picture here you don't see things very well but it's just an image classification problem you have loads of images i think 1000 different classes and you need to classify the images correctly and so you can see that on the graph at the bottom like in 2010 and 2011 the best algorithms had an error 
rate of 28% or 26%, but in 2012, suddenly, with this new AlexNet deep learning algorithm, the error decreases to 16%, and that was a big boom, a big impressive thing in the world of AI. Lots of researchers in computer vision who were doing non-deep-learning research actually moved to deep learning, because people realized that deep learning is very powerful and could actually be the future of computer vision.

Now, there is actually a paper from 1985 which trains a neural network using backpropagation, and convolutional neural networks have been around since the 1990s. So how come it's only in 2012 that we realized that neural networks, and especially CNNs, can be so powerful? I think there are two reasons for that. The first one is that we actually need a lot of computing power to train a neural network, and this wasn't really available at the time. The second reason is that we need a lot of data, and the AlexNet paper showed that with a lot of compute power and a lot of data you can actually train a neural network. Nowadays it feels easy to train a CNN, but back at the time it was much more difficult.

Because I don't know how familiar you are with this or what your background is, I'm just going to give a brief intro on neural networks before moving to convolutional neural networks, because neural networks are really the basics. A neural network is essentially a network of neurons: each dot here, each circle, is a neuron, and it's inspired by the brain. All those neurons are connected to each other by arrows, and each arrow represents a different weight. Neural networks have three different parts: the input layer, which is the input data; the output layer, which is essentially the prediction; and in between you've got the hidden layers. Here we've got two hidden layers of four neurons each, but you can have 10 hidden layers or 20 hidden layers with as many neurons as you want; there isn't really any rule for that, and this is just a toy example. Each arrow is a different weight, so here we've got three times four, so 12 weights in this layer, here we've got 16 weights, and here we've got 4 weights, so 32 weights in total. Using backpropagation you can train this neural network and find what the optimum set of weights is. I'm not going to go into backpropagation or deep into neural networks; if you're interested there are lots of readings on that, and I will provide some references at the end, but hopefully you get some idea of what a neural network is and how it works. I believe we're going to do questions at the end; I see we have a question or a hand now, but let's leave the questions for the end of the first part, because I might answer some of them in the coming slides. So that's how a neural network works, and you will actually see neural networks inside CNNs.
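To make the weight counting concrete, here is a minimal sketch (my own illustration, not from the talk) of a 3-4-4-1 fully connected network in Keras; the kernel shapes give the 12, 16 and 4 weights per layer mentioned above (Keras also adds one bias per neuron, which the slide's count ignores).

```python
import tensorflow as tf

# Toy fully connected network: 3 inputs, two hidden layers of 4 neurons, 1 output.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(4, activation="relu"),    # 3 x 4 = 12 weights (+ 4 biases)
    tf.keras.layers.Dense(4, activation="relu"),    # 4 x 4 = 16 weights (+ 4 biases)
    tf.keras.layers.Dense(1, activation="sigmoid"), # 4 x 1 = 4 weights (+ 1 bias)
])

# Print the shape of each kernel (weight matrix), ignoring the biases.
for layer in model.layers:
    kernel = layer.weights[0]
    print(layer.name, kernel.shape, "->", int(tf.size(kernel)), "weights")
```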
So let's move on to the interesting part of the talk: convolutional neural networks. As I described before, CNNs were used for image classification with the ImageNet paper, and an image is actually just a matrix, where each value within the matrix is a pixel, a number. Here you can see it's just a 28 by 28 matrix. So the question is: why do we actually need a CNN, and why not just a neural network like in the first part?

Right, an image is just a matrix, so what you could do is flatten this matrix into a vector and then feed that input into a neural network, and that's it, that's done. Why don't we do that? Well, first of all, as a human, let's say you want to classify dogs and cats. How do you do it? You're going to look at groups of pixels: you're going to look at eyes, at ears, at the color of the fur. And that's exactly what CNNs do: they take spatial dependencies into account and they look at groups of pixels, whereas a plain neural network will only look at individual pixels. If you just have a vector of pixels and need to identify dogs and cats, the task will be much more difficult, even for a human. So that's the first reason why we use CNNs, and the second reason is that they are actually more efficient at processing images.

Here you have a toy example of a CNN. The input is an image, and then the algorithm is divided into two parts. The first part, which is in blue, is the CNN, so we have convolutional layers and then pooling layers; and the second part, after the flatten layer, is just a neural network. The neural network is going to take the features from the CNN and use those features to make the classification, but the CNN is going to extract the useful features from the images. So we've got convolution and pooling layers, and I'm going to talk about this in more detail in the next slides, but for the moment just keep in mind that we've got a first part, which is convolution and pooling layers, and a second part, which is a neural network that hopefully you're familiar with. The intuition behind CNNs is that in the first few layers of the model the CNN will be able to capture tiny details within the image, like edges for example, and as you get deeper and deeper into the model the algorithm manages to extract more human-interpretable features like eyes, heads or flowers. So basically the first part of the network looks at details and the last part looks at higher-level features, in some sense.

So let's dive into convolutions. Here you have a very clear example of what a convolution actually is, and there is a note here, "A Comprehensive Guide to CNNs"; I really recommend you read it, I'm taking a lot of images from there, and it's very good if you want a solid understanding of CNNs. So, convolution: here we have a five by five image, which is represented in green, with some pixel values which are zeros and ones, and the convolution is actually a kernel, this three by three yellow square which slides around the image. That's a kernel, and that's the process of doing a convolution. You can see the red numbers on the image, the "times one" or "times zero"; those are the weights of the kernel, the weights that the neural network needs to learn in order to extract meaningful features from the image. By sliding this kernel around the image we get what we call the convolved feature, which is this pink box. The first value, for example the four here, is obtained by multiplying this kernel by the values in the top-left corner: here we've got one times one, here one times zero, here one times one, then zero, one, zero, zero, zero, one, so in total we've got four ones and five zeros, and that's why you get a four here. You repeat that and you get the convolved feature.
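Here is a small NumPy sketch of that operation (my own code, using the 5x5 binary image and 3x3 kernel from the blog post the talk references, so the exact values may differ slightly from the slide): the kernel slides over the image and each output value is the sum of the element-wise products.

```python
import numpy as np

# 5x5 input image of 0s and 1s, as in the example.
image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])

# 3x3 kernel; in a real CNN these weights are learned, here they are fixed.
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

# Slide the kernel with a stride of 1 and no padding: the output is 3x3.
out_size = image.shape[0] - kernel.shape[0] + 1
convolved = np.zeros((out_size, out_size))
for i in range(out_size):
    for j in range(out_size):
        patch = image[i:i + 3, j:j + 3]
        convolved[i, j] = np.sum(patch * kernel)  # element-wise multiply, then sum

print(convolved)  # the top-left value is 4, matching the example above
```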
So that's it, that's what a convolution is: just a kernel that slides around the image to extract meaningful information. Now you can again see a kernel here, and I want to describe two additional terms which are quite important: strides and padding. The stride is by how much you shift the kernel, and you can define the stride as you want. In this case the stride is two, because you can see that the kernel shifts by two squares to the right and by two squares from top to bottom, so the stride is two. The second thing that's important is padding. You can see here those dotted lines which pad out the image; that's what we call padding. Here we've got horizontal and vertical padding of one, because we're adding one square on each side of the image. The reason for padding is that sometimes you have interesting features in the corners of the image, and if you don't use padding the kernel might miss those features that are close to the edges; by using padding you can sometimes capture better information.

Finally, let's discuss pooling. If you remember, here we have a convolution, so we apply the kernel to the image, and here I'm just showing a single kernel, but there's no reason why you cannot apply 10, 20 or 32 kernels to this image; you will just get 32 different convolved features, and that's what we do in practice. That's why here you can see 24 by 24 by n1: it's just n1 kernels that we've applied to the image, and that's why you have different images, or convolved features, after the convolution. It is quite common practice to then apply this pooling operation, and all the pooling operation does is reduce the size of the convolved feature, that's it. So here you can see it's 24 by 24 by n1, and after the pooling operation the dimensions are 12 by 12 by n1, so we reduce the dimensions by half. How does this pooling operation work? You have different kinds of pooling: max pooling or average pooling. Max pooling is probably the most well-known technique and the one that usually works quite well. Essentially, here again you just have a kernel that slides around this image or convolved feature, and max pooling is very simple: all it does is take the maximum from a set of pixels or values. So here you've got four values; if you apply max pooling it takes the maximum, which is 20, and extracts the 20 here. We use a stride of two here, so if we move the kernel two squares to the right we have another max pooling, and here the highest value is 30, and that's how we get this 30 here. Average pooling is similar, but instead of taking the max we just take the average. We use pooling to reduce the dimension of the image, which makes it much easier to process, but it also helps to reduce overfitting because it's actually robust to rotations and translations. A short example: let's say that you have an eye in this square, and suddenly you rotate this eye; you will rotate the values within this square, but if you apply max pooling the largest value will still be 20, so the max pooling will still extract the feature even though you've rotated or translated the picture. That makes the CNN somewhat robust to rotations and translations.

And that's basically it: you have your input, you apply a set of kernels, the convolutions, you have this max pooling layer, and you can have another convolution followed by another max pooling operation.
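A minimal sketch of 2x2 max pooling with a stride of 2 (my own example values, not the exact ones from the slide): each 2x2 block of the convolved feature is reduced to its maximum.

```python
import numpy as np

# A 4x4 "convolved feature" with made-up values.
feature = np.array([
    [12,  20,  3,  0],
    [ 8,  12,  2, 30],
    [34,  70, 37,  4],
    [112, 100, 25, 12],
])

pool_size, stride = 2, 2
out = np.zeros((feature.shape[0] // stride, feature.shape[1] // stride))
for i in range(out.shape[0]):
    for j in range(out.shape[1]):
        window = feature[i * stride:i * stride + pool_size,
                         j * stride:j * stride + pool_size]
        out[i, j] = window.max()  # max pooling; use window.mean() for average pooling

print(out)  # [[20, 30], [112, 37]]
```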
Here we've got two convolutions, but you could have 10 or 20; there is no rule for that. After that you just flatten the output and feed everything to the neural network, which will then do the job for you. So hopefully that gives you some intuition about how CNNs work, and if it's still not clear, hopefully it will be much clearer in part two, where we're going to build our own algorithm using Python and TensorFlow. But first I believe we have some time for questions about the theory, and then we can go into the code.

So I'm seeing one question, and this is from Nahal: they want to know if you can share your source, the comprehensive guide to CNNs. Can you share that link with everyone? I know we'll share the slides with everyone along with the recording, so you'll be able to access it through there, but are we able to share that source now? So here I've got a few sources; I can just share the slides in the chat since I'm done with the presentation. Yeah, and then, Data Science Dojo team, if you can share the Google Colab with everyone now, that would also be the time to do it. Sophia, I know you have your hand raised; if you can ask your question in the Q&A tab I'd appreciate it. And then, Neil, whenever you're ready I have a second question for you. Yeah, the slides and the references are at the end, but I'm ready for the second question.

Okay, so this is from Mauricio. They're referring to a second step, I'm guessing in the neural network: why does it have so many layers in the second step? Can we clarify the question, which second step exactly? I'm not exactly sure; let's wait for clarification. Mauricio, I think you're on one of our live streams, but if you can just clarify for us which second step you mean, is it the neural network from this CNN that you're talking about? Then I would be able to answer.

And then Sophia has a question that I think will be a lot easier to answer right now: how do you determine the kernel, so the size of the kernel? That's a good question. Here, for example, we've got a 3x3 kernel that slides around the image, and the size of the kernel is actually a hyperparameter, so we don't know what the right size is; we need to try different things and see what works, which is usually how it goes for neural networks. Typical sizes are three by three, five by five and seven by seven, so you should try a bunch of those and see what works, but there isn't a clear rule on which kernel works best. If you have a very small image I would recommend trying smaller kernels; if you have a large image, try bigger kernels. That's usually how it works.

Okay, and then this one is from Brent, and they would like to know how to create the convolved feature; they had a hard time following. So if I just recap: here you have the kernel and you have some weights, the times one, times zero, times one. Those are weights that the neural network will learn; here they're just example zeros and ones, but in real life they are values that the neural network learns. Then you just slide this kernel around the image, and every time you move the kernel you get a value here on the convolved feature. How do you get that value? By multiplying the weights of the kernel, the zeros and ones written in red, by the pixel values in the image.
So to walk through the example I shared again: you would do one times one, because that's the first weight of the kernel, plus one times zero, which is zero, plus one times one, which is a one, so we have a value of two at the moment; plus zero times zero, which is zero, plus one times one, plus one times zero, plus zero times one, plus zero times zero, plus one times one. So in the end we've got four ones, and if we sum all of them together we get the value of four. We repeat that by shifting the kernel around the image to get the convolved feature. Hopefully that's clearer.

All right, and then, James, I want you to know I see your question; we're going to wait to answer that one at the end because I think it's going to take a little longer. So let's answer one more question just before we start the demo, and I want to go back to Mauricio, the question about the layers in the second step. They are saying yes, it's from the image in the slide, I think the one that you showed. Yeah, I see. So the intuition here is that you can actually have as many layers as you want. Basically, the convolutional part extracts the meaningful features from the image. For example, if you want to detect whether there is a face in the image or not (sorry, I know I haven't shared my screen, but there is nothing to really show; I'm going to move to the Colab after this), the first part of the CNN is basically extracting meaningful features from the data, and those features could be eyes, mouths, skin color, ears, things like that. But then you still need a neural network to detect all the fine patterns within those features. The CNN will find all the interesting features, but you still want the neural network to say, okay, if the person has a mouth, long hair and eyes then it's a girl, and if it has ears, short hair, eyes and a mouth then it's a boy. So the neural network still has to find patterns within those features, but the CNN finds the interesting features, and so the job of the neural network is much easier because the CNN does all the hard work of finding the interesting features.

Perfect. Okay, then let's start with the Colab. Hopefully now you've got some understanding of how CNNs actually work, so let's see how we can build one. Again, the Colab is divided into two parts: a very short intro to TensorFlow, which you'll see is very simple, very similar to NumPy, so if you know NumPy it's just going to be very similar; and then I will show you how you can train your own CNN using the Fashion MNIST dataset. If we go through the code, first I'm just installing TensorFlow and importing a bunch of libraries, nothing very interesting. I import tensorflow and tensorflow.keras; Keras is a library built on top of TensorFlow which makes it very easy to build neural networks, so I'm just going to use that for the tutorial. Here we've just got a bunch of cells which show you how TensorFlow works. The main idea behind TensorFlow is the tensor, which is very similar to a NumPy array. Here I create a tensor which is essentially a matrix of size 2x3, and if you then print this variable to the screen you can see what it looks like. It's basically very similar to NumPy: like NumPy you can create an array of ones, if you call tf.ones and specify the shape you get an array of ones; and like NumPy you can do operations between matrices, so here we've got two matrices and you can add them together, you can do an element-wise multiplication, you can do a matrix-matrix multiplication. All of this you don't need to know for the tutorial; it's just a recap to show you that TensorFlow is very similar to NumPy, but we use it to train neural networks because it works very well, it's computationally efficient, and it's built specially for that, and that's why we're using tensors instead of NumPy arrays. Like NumPy you can also manipulate the matrices: here, for example, I create a variable which has a shape of two by three, and then I reshape this variable to have a shape of one by six, so one row and six columns, and you can see that it worked, I get another tensor. Also, again like NumPy, you can concatenate different matrices on top of each other; here I build this variable and concatenate it on top of itself, so you get the same matrix twice, and the first dimension, the number of rows, is increased from two to four. Again, just play around with this if you're interested; it's nothing you need to know for today, but I found it interesting to show how you can create and manipulate tensors. So that's the first part, hopefully very easy; just play around with it in your free time.
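Here is a compact sketch of those tensor operations; the variable names are mine, but the operations mirror the ones described for the Colab.

```python
import tensorflow as tf

# Create a 2x3 tensor, the TensorFlow equivalent of a NumPy array.
a = tf.constant([[1., 2., 3.],
                 [4., 5., 6.]])
print(a.shape, a.dtype)              # (2, 3) float32

# Like NumPy, you can create an array of ones by specifying the shape.
b = tf.ones((2, 3))

# Element-wise operations between tensors of the same shape.
print(a + b)                         # element-wise addition
print(a * b)                         # element-wise multiplication

# Matrix multiplication needs compatible shapes: (2,3) x (3,2) -> (2,2).
print(tf.matmul(a, tf.transpose(b)))

# Reshape the (2,3) tensor into one row of six columns.
print(tf.reshape(a, (1, 6)))

# Concatenate the same tensor on top of itself: the rows go from 2 to 4.
print(tf.concat([a, a], axis=0))
```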
The second part is the fun bit, which is CNNs with the Fashion MNIST dataset, so let's go slowly through this code. The first thing is just to load the data: there is this command using Keras which loads the Fashion MNIST dataset, and I get training data, training labels, test data and test labels. Then here I'm just storing the names of the classes. It's an image classification problem and we have 10 different classes; you can see we've got t-shirt, trouser, pullover, dress, coat, sandal, so different kinds of clothes, which is why it's called Fashion MNIST. Here's just some visualization, so you can see that in the training data we've got 60,000 images of size 28 by 28, the labels are just 60,000 labels giving the class, and in the test data we have 10,000 images of size 28 by 28. So hopefully very simple. Here I have some code which you don't need to understand; it's just to show you what the dataset actually looks like so you can visualize it. We've got 16 images here, and you can see it's just different clothes. The images are 28 by 28, so the quality isn't great, but hopefully you can see that's a sandal, that's a sandal, that's a sandal, that's a pullover, a sandal again, a shirt, ankle boots, bags, different kinds of clothes. So it's a classic machine learning image classification problem.
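A sketch of that loading and inspection step; the class-name list follows the standard Fashion MNIST labels, and the plotting code is my own illustration rather than the exact Colab cell.

```python
import tensorflow as tf
import matplotlib.pyplot as plt

# Load the Fashion MNIST dataset that ships with Keras.
(train_images, train_labels), (test_images, test_labels) = \
    tf.keras.datasets.fashion_mnist.load_data()

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

print(train_images.shape, train_labels.shape)  # (60000, 28, 28) (60000,)
print(test_images.shape, test_labels.shape)    # (10000, 28, 28) (10000,)

# Quick look at a few examples with their labels.
fig, axes = plt.subplots(1, 4, figsize=(8, 2))
for ax, img, label in zip(axes, train_images, train_labels):
    ax.imshow(img, cmap="gray")
    ax.set_title(class_names[label])
    ax.axis("off")
plt.show()
```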
Once we've loaded the data, the next thing to do is to actually process it, and the processing is very simple, it only needs three steps. The first step is that I divide the data by 255, and that's just because all the values in the images are between 0 and 255 and I want values between 0 and 1, so I just divide everything by 255; that's very common practice. The second thing I'm doing is expanding the dimension of my training data, and the reason for that, well, you can see here I print, and sorry, this cell should probably go in front, but basically I'm expanding the dimension of the training data, and I will talk about this in a few seconds. The third thing I do is split my training set: remember we had 60,000 images in the training set, and I'm splitting it into a training set and a validation set. I don't touch the test set; I only look at the test set at the end of the process. I train my neural network on the training set, and I keep the validation set to see how well my algorithm is performing. So here, if you look at the shape of the training set, it's now 50,000 rows, because I removed the validation set, we have 28 by 28 images, and then we have this extra dimension here, which is just a one. This is what I've done here: I just expanded a final dimension and it is a one, and the reason I'm doing this is that the CNN will expect to get an input of this shape, so if you don't expand the dimension the CNN is just not going to work. With more complex images this dimension is usually three, because you've got three different channels for the red, green and blue colors; here it's simple, it's just a one, but that's why I'm doing this step. So the preprocessing is very simple, and we've processed the data.
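A sketch of those three preprocessing steps (scaling, adding the channel dimension, and carving out a validation set); I split off the last 10,000 training images as validation, which matches the 50,000/10,000 shapes shown, though the exact split used in the Colab may differ.

```python
import numpy as np
import tensorflow as tf

(train_images, train_labels), (test_images, test_labels) = \
    tf.keras.datasets.fashion_mnist.load_data()

# 1. Scale pixel values from [0, 255] down to [0, 1].
train_images = train_images / 255.0
test_images = test_images / 255.0

# 2. Add a channel dimension: (60000, 28, 28) -> (60000, 28, 28, 1).
#    Colour images would have 3 channels (RGB); grayscale has just 1,
#    but the Conv2D layers still expect that extra axis.
train_images = np.expand_dims(train_images, axis=-1)
test_images = np.expand_dims(test_images, axis=-1)

# 3. Split the 60,000 training images into 50,000 for training and
#    10,000 for validation; the test set stays untouched until the end.
val_images, val_labels = train_images[50000:], train_labels[50000:]
train_images, train_labels = train_images[:50000], train_labels[:50000]

print(train_images.shape)  # (50000, 28, 28, 1)
print(val_images.shape)    # (10000, 28, 28, 1)
```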
Now the final thing we need to do is build the network, so this is where I build my convolutional neural network. It might look quite complex at the beginning, but we're going to go through it slowly and you'll see it's actually very simple. For the model, I create this tfk.Sequential object, and it takes as input a list of layers, so each line of code here is going to be a layer of my convolutional neural network, each line is just a different layer. This algorithm is divided into five parts. The first part is the input layer; that's simple, you just need to specify the shape of the data that you're going to feed into the algorithm, so your model knows that it's going to get images of size 28 by 28 by one, and that's what I'm doing here, we've specified the input layer. Then, if we look at the comments and the code, we've got the first, the second and the third convolutional layer, each followed by a max pooling layer; that's exactly what I described on the slides, a convolutional layer followed by a pooling layer. Finally, at the end we have a neural network, just a classic neural network, first with 256 neurons and then with 10 neurons at the end, because we have 10 classes. Each neuron at the end is going to give the probability that the image belongs to a class; we take the maximum probability, and the class with that maximum probability is essentially the prediction.

If we go into the convolutional layers in more detail, the three layers here are exactly the same or very similar. First of all you have the convolutional layer, which is tfk.layers.Conv2D, and those are just kernels which are going to extract information from your image, as I showed in the slides. The 32 refers to the number of kernels that will be applied to the image; in the slide example we just applied one kernel with zeros and ones, but here we're going to apply 32 different kernels to extract more information from the image. The size of the kernel is going to be three by three, as I showed you before, and that's because the image has fairly small dimensions. We set the stride to one, which means that we move the kernel by one square at a time, and we have this activation function, which makes the CNN non-linear; if you're interested in activation functions there is lots of literature on that, I'm not going to go into too much detail here. Then all you do is have the pooling layer of size 2x2, which just reduces the dimensions of the convolved feature, it divides them by two, and that's it. You do that again with the second layer, except that here we want 64 convolved features instead of 32, and then you do that again with the third convolutional layer, so we have three layers here. At the end we feed this into the neural network and finally we get the prediction. So hopefully that's quite simple: all you need is an input layer, a bunch of convolutional layers with max pooling, and then finally the neural network, and that's it, that's your model. Here there is some reading on the softmax activation function that I'm using at the end of the network; if you're interested to learn why I'm using this activation function, there is a very good resource here with some formulas, but I'm not going to go into too much detail.
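A sketch of the model just described, written with the plain tf.keras names rather than the Colab's tfk alias; the filter count of the third convolutional block and the pooling after it are my assumptions, since they aren't spelled out above.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # Input layer: 28x28 grayscale images with a single channel.
    tf.keras.Input(shape=(28, 28, 1)),

    # First convolutional block: 32 kernels of size 3x3, stride 1, ReLU.
    tf.keras.layers.Conv2D(32, kernel_size=(3, 3), strides=1, activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),

    # Second block: 64 kernels.
    tf.keras.layers.Conv2D(64, kernel_size=(3, 3), strides=1, activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),

    # Third block (64 filters assumed here).
    tf.keras.layers.Conv2D(64, kernel_size=(3, 3), strides=1, activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),

    # Flatten the feature maps and hand them to a plain neural network.
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # one probability per class
])

model.summary()
```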
So remember what we've done, and don't worry about the maths there, you don't really need it: all we've done is load the data, process the data and define the model. We just need one last thing before we can train the model, and it's this cell here. We need to define an optimizer, which will help us find suitable weights for the algorithm, and we use Adam, which is very standard. We need a metric, and the metric will tell us whether the algorithm is performing well or not; here we use sparse categorical accuracy, which is very standard for classification, and it essentially gives a value of one if you've classified the image correctly and a value of zero if you didn't. Then we also want a loss function, because accuracy is not differentiable, so we need a loss function to train the neural network, and here we use the sparse categorical cross-entropy; again there is a reference here if you want to know why. I would say this is not really particular to CNNs, it's just the classic deep learning approach: you need an optimizer, a metric and a loss function, so read about that if you're interested. What you do after that is compile your model, so you set the optimizer, the loss function and the metric, and then all you need to do is train the model. Here I'm calling model.fit, fitting it to the training set, and you obviously need the labels because the model is learning, and I'm passing the validation data because I want to see how my model is performing on unseen data, to avoid overfitting. Here I've already trained the model; it doesn't take too long using Google Colab if you use a GPU, but I've trained it already, and you can see the loss values here. Basically, as we train the model we want the training loss to decrease, because we want the loss to be as small as possible, and we want the validation loss to decrease as well, so this tells me that my model is learning, the loss is decreasing.

Let's now see how our model performs on the test set. With this line of code I'm just evaluating my model on the test set that we've never seen before; it evaluates, and we can see that the accuracy is around 89%, which means that 89% of the time the model will classify the images correctly. If you want to play around with the code you can probably improve this by quite a lot: you might add additional layers to the model, or change the kernel size, or change the activation functions. You can play around with the model and add more layers if you want, and if you want to learn I would encourage you to try to beat this 89% score on the test set; I'm sure you can improve on what I've done here. We can also look at the predictions: again this is the dataset, on top you can see the prediction and at the bottom you can see the label, and we can see that the model is doing quite well, for most of these the prediction matches the label: sandal sandal, t-shirt t-shirt, trouser trouser, bag bag. So the model is doing quite well. We can also see, don't worry about this code, where the model actually fails. Here you can see examples where the model fails, and it's mostly around t-shirts, pullovers and coats, that's where the model seems to struggle: here it predicts coat when it's a dress, here it predicts a t-shirt when it's actually a shirt, and we can see that it's actually quite difficult; here it's a pullover, it predicts a pullover, but it's actually a coat. Anyway, you can see where the model is wrong, and if you want you can also print the confusion matrix to see where your model struggles the most and try to improve it. This just shows that we've basically trained a convolutional neural network on the Fashion MNIST dataset and the model seems to perform quite well, with around 89% accuracy on the test set. Feel free to try to beat this and improve on it in your free time, and I'm going to take some questions now; hope this was clear.
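A sketch of the compile, fit and evaluate steps, continuing from the model and data defined in the sketches above; the number of epochs and the prediction snippet are illustrative, not the exact Colab cells.

```python
import tensorflow as tf

# Optimizer, loss and metric, then training and the final evaluation.
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)

# Train on the 50,000 training images, monitoring the 10,000 validation images.
history = model.fit(
    train_images, train_labels,
    epochs=10,                     # the talk used 10 epochs to keep training short
    validation_data=(val_images, val_labels),
)

# Only now do we touch the test set.
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"Test accuracy: {test_acc:.2%}")   # roughly 89% in the talk

# Predicted class = the index of the highest softmax probability.
probs = model.predict(test_images[:4])
print(probs.argmax(axis=1), test_labels[:4])
```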
Okay, perfect. So we do have 16 questions and probably counting. Neil, I know yesterday we talked that we might go a little bit over time, is that still okay with you if we go a couple of minutes over? Yeah, I'm happy to stay 10 or 15 minutes longer to answer questions if people are still okay with that. Okay, perfect. I've been going through these as you've been talking, and I want to try to get the longest question out of the way first, and I'm hoping you're the best person to answer it. I would suggest opening the Q&A and reading it yourself; it's about a paragraph, so I'm going to read it out loud while you read it. This is from James: is there a way to align CNNs with digital twin simulations while updating real-time data in a Python pipeline? I'm currently using Unreal Engine and Unity to create dynamic, immersive VR digital twins while connecting them with CNN optimizers for real-time data input. Would you know which optimizer would work best? Also, what is the optimal number of epochs one should use to maximize the accuracy of the CNN?

Cool. So, first of all, I'm not very familiar with digital twin simulations, so I'm not sure. I actually have a twin brother, but I am also not familiar with digital twins. Okay, so even if you're a twin you're not familiar, fair enough. But I still think I can help with which optimizer would work best: Adam is the classic one, I would say, and I pretty much always use Adam because it's known to be quite good. There might be better optimizers for CNNs, I'm not a specialist, but I know Adam is pretty good, it deals with momentum and things like that, so I would recommend Adam, although if it doesn't work do some research, I'm sure you can find better ones as well. And the second question, what is the optimum number of epochs one should use to maximize the accuracy of the CNN: I cannot give you one number because it just depends on the problem, but you can see here that I've only used 10 epochs, just because I didn't want to train the CNN for too long; usually we would use more epochs. The validation loss here is decreasing, but at some point you will see that you actually start to overfit, and what overfitting means is that the training loss will keep going down, because you learn better and better on your training set, but the validation loss will start to go up. So what I recommend doing is looking at that, and when you see that your validation loss starts to go up after, I don't know, 10 epochs or so, you can stop training your model, because it means that your model is overfitting and won't be as good. That's how you can find the right number of epochs, I would say, and that's called early stopping; if you're interested, just look up early stopping.
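That "stop when the validation loss starts going up" idea can be automated with Keras's EarlyStopping callback; here is a minimal sketch, reusing the compiled model and data from the earlier sketches, with an illustrative patience value.

```python
import tensorflow as tf

# Stop training when the validation loss hasn't improved for 3 epochs,
# and roll back to the best weights seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=3,
    restore_best_weights=True,
)

# With the callback in place you can set a generous epoch budget;
# training will end early once the model starts to overfit.
history = model.fit(
    train_images, train_labels,
    epochs=100,
    validation_data=(val_images, val_labels),
    callbacks=[early_stop],
)
```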
And I saw James had a second question a little later on, so I'm going to ask that now as well: in this tutorial, are you using the softmax optimizer as a result of having 10 different class outputs? So, softmax: I'm basically using the softmax activation function at the end, and it's not really an optimizer, it's an activation function. Basically, you're going to have 10 neurons at the end of the network, and what the softmax does is transform those values into probabilities, so you're going to have 10 probabilities, each one being the probability that your image belongs to a class. The first neuron would be the probability that it belongs to the first class, the second neuron would be the probability that your image belongs to the second class, and so on, and then you can just take the highest probability to get the prediction. That's what the softmax does, so I'm using softmax as the activation function of the final layer of the CNN and the neural net, basically.

Okay, and then this one's from Paula: how do you determine the size of the padding? Again, that's the same as the kernel size, I would say; it's a hyperparameter. Here you can see there is no padding, I didn't use any padding at all, I think that's the default in Keras, so I didn't specify any padding here. But basically the size of the kernel, the size of the padding, the size of the stride, for all those things you need to try different values and see what works. Here no padding works well; sometimes you can use a padding of one or two, and I haven't seen much more than one or two, except if you're really expecting a lot of information close to the corners, in which case it could be higher. If you don't expect a lot of information close to the corners, I think zero, one or two to start with should be all right. You just need to experiment, there is no strict rule here.

All right, and this is from Pedro: what's the main difference between max pooling and average pooling in terms of performance? Usually max pooling performs better, and that's because max pooling extracts the dominant feature, it takes the maximum value, the highest activation in some sense, and it extracts that, whereas average pooling does some kind of smoothing. That's why I think max pooling is often better, but I don't think there is a strict rule; it could be that sometimes average pooling works best, but from my experience I pretty much always use max pooling, it's kind of the standard, because it extracts the dominant feature, whereas averaging smooths things out.

All right, and this one's from Anushia: how many images approximately do we need for training CNNs? I know it's subjective, but is there some type of guideline, or can you provide some type of idea? Very good question, and again I'm not really sure, but I will try to help. First of all it depends on the difficulty of the problem: if you just want to classify black and white images of numbers, for example, that's probably easier than classifying very complex, large images of, I don't know, different types of dogs. So it depends on the complexity of the problem, and it also depends on the number of classes that you have: if you have two different classes you need far fewer images than if you have 1,000 classes, because you want enough samples from each class. I would say probably a couple of thousand examples for each class, that would be my guess, so if you have 10 classes, at least 10,000 examples. But again it depends on the complexity of the problem, the size of the images and things like that; a couple of thousand per class at least.

All right, and this is from Naomi: is it possible that using too many convolutional layers could degrade the classification accuracy? I think it could be possible, but it probably depends on what you're doing. If you add additional layers with the wrong activation functions and things like that, I guess it can; you just need to try. It also depends on the problem: sometimes fewer layers might work better, it's not like increasing the size of the model is always better. But you will see that all those state-of-the-art algorithms are very, very big, so usually we would think that by increasing the size of the model we will get better performance. Then again, if you want a very big algorithm you're probably going to need lots of computing power and you're going to need to wait a long time to train your model, so it might not be worth it. So yes, I think it could be possible that adding more layers makes the performance worse, but usually more weights means better performance.

Okay, and sorry, I'm just filtering out questions as they get asked. This one's from Zach: do you need to optimize the hyperparameters, and what would you use, something like GridSearchCV? Yeah, exactly, you could use that, some kind of grid search: a grid search over different sets of hyperparameters like the kernel size, the number of layers if you want, the number of channels, so the number of kernels you're going to have in each layer. You can use some kind of grid search to find the best parameters; that's usually good practice. And remember that you would also want to look at the learning rate and the batch size, all of these more general deep learning hyperparameters, which would also help. I'm pretty sure there are other methods than grid search that you could use for smarter searches, which might be better, but grid search is exactly something that you can do.
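A hedged sketch of what such a search could look like: a plain loop over a small grid of kernel sizes and filter counts, keeping the configuration with the best validation accuracy. Tools like GridSearchCV (with a Keras wrapper) or KerasTuner can do the same thing more systematically; the grid values and the tiny model here are just illustrative, and it reuses the training and validation arrays from the sketches above.

```python
import tensorflow as tf

def build_model(filters, kernel_size):
    """Small CNN whose hyperparameters we want to tune."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(filters, kernel_size, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

best_acc, best_config = 0.0, None
for filters in [32, 64]:
    for kernel_size in [3, 5]:
        model = build_model(filters, kernel_size)
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        model.fit(train_images, train_labels, epochs=3,
                  validation_data=(val_images, val_labels), verbose=0)
        _, acc = model.evaluate(val_images, val_labels, verbose=0)
        if acc > best_acc:
            best_acc, best_config = acc, (filters, kernel_size)

print("Best validation accuracy:", best_acc, "with (filters, kernel_size) =", best_config)
```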
Okay, and then this is from Murad: can we use CNNs with other types of data, such as genetic data? I mean, can we classify numeric values, not only images, or will a CNN not work properly? So do we have to use images with a CNN, or can we use other types of data? You can use other types of data; there is some research on using CNNs for language, but they really work best and are well known for images. If you have tabular data I would not recommend using CNNs at all, because traditional algorithms like XGBoost, or even regression and random forests, will perform much better. So don't use CNNs on tabular data; on text there might be some research, there might be a few papers where they've been useful, but it's really not the standard. CNNs are well known for images, and unless you really have a good reason I would just use them for images or video, visual data, because that's what they've been built for.

Okay, and then this one is from Roya: sometimes TensorFlow takes a lot of time to run; is there any way we can decide better from the beginning how many layers to use? So one trick that I have, and I'm going to go back to this graph: you can see here that the training loss is going down, and forget the validation loss for a moment. When you start, make sure that that's the case, train a big enough CNN such that the training loss is going down, because this means that your algorithm can at least learn the training data. Once this is done, you have some techniques to avoid overfitting, like dropout, stronger regularization, things like that. But first make sure that your training loss is going down by the amount that you want, because I sometimes see people training a neural network and it doesn't work, and that's because if your model isn't big enough and cannot even learn the training set, it's never going to do well on the validation or the test set. So first of all make sure that your training loss is going down, and then you can start looking at your validation loss and reducing overfitting. There isn't a clear size, but just make sure your model is big enough to be able to learn from the training data, because if you can't overfit your training data then your model will never be able to do well on the test set. That would be my advice.

Okay, so we're going to try to get through two or three more questions before we sign off. This is from Kumar: how do we decide the initial weights used in the filter? That's all random. There is some initialization that you can set, I'm not sure what the default one is, but basically Keras does this for you and it's pretty much random. There might be some smarter ways to initialize the weights, I'm not an expert in that, but random should be fine, and then by training your model the weights will converge to their optimum values. Initially you don't need to worry about anything, you just set random values, and by looking at examples, this is a dog, this is a cat, this is a dog, this is a cat, and using backpropagation, your model will learn by itself what the weights should be. So here I believe it's just initializing things pretty much at random, because obviously your model doesn't know what the task is going to be, it doesn't know that it's going to be used to classify clothes, so in the beginning you just initialize things at random and let the backpropagation algorithm and the neural network do the job.
Okay, so I think this one's going to be our final question, and we're going back to Roya: can a CNN use a combination of an image pixel array and some other related variables from Excel, for instance merging an MRI pixel array with patient demographics from Excel? If so, can you please explain how, thanks. Patient demographics, so I assume that the first one is an image and the second one is tabular data. Yep. So yes, you can actually do that, and what you would do is train the CNN on the image data only, because you don't want to train the CNN on the tabular data. But remember that at the end you flatten the features from your CNN and you feed that into a neural network, so I don't see any reason why you cannot just add additional neurons in that input layer which will be your demographic data. The CNN should only be trained on the image, but at the stage of the neural network, instead of feeding it only the input from the CNN, you can also feed it additional inputs. So it's at the second step of the algorithm that I would feed those additional features, basically (a rough sketch of this kind of two-input model is included at the end of this transcript).

Okay, perfect, and I am going to cut questions off there because we are 10 minutes over and I want to make sure we have time to do a close-out. So thank you, Neil. I'm just going to share my screen really quick so we can talk about next week's webinar; give me one second, everyone. Next week will be July 27th, so seven days from today, and we're going a little earlier than we were today as well: this will be happening at 9:30 a.m. Pacific on July 27th. We're going to have a crash course on designing a dashboard in Tableau, and we'll have Irene Diome with us; she's a data analyst as well as a Tableau ambassador. We're going to be talking about what Tableau is, how to design a basic dashboard, how to include a bar chart in our dashboard as well as other things, and how to format it for good design practice. So if you want to learn some quick things about Tableau, join us next week along with Irene, and we'll see you all there. Neil, thank you so much for being here with us today, we really appreciate it. We'll make sure to get the recording of this up; it'll be on our YouTube channel as well as on our website, online.datasciencedojo.com. Also, if you are one of our newsletter subscribers, or if you RSVP'd on our website specifically for this event, we'll make sure to send you an email with the link to the recording, the slides, the Google Colab and all the good things Neil shared today. So, Neil, we really appreciate you being here. Everyone, thank you for joining, and I hope everyone has a good rest of their day. Thanks. Thank you, thank you very much.
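For the earlier question about combining an image with tabular features, here is a minimal sketch of that two-input idea using the Keras functional API; the shapes, layer sizes and names (an MRI-like image plus a five-value demographics vector) are made up for illustration, not from the talk.

```python
import tensorflow as tf

# Two inputs: an image (e.g. an MRI slice) and a small vector of
# tabular features (e.g. patient demographics). Shapes are assumptions.
image_in = tf.keras.Input(shape=(128, 128, 1), name="image")
tabular_in = tf.keras.Input(shape=(5,), name="demographics")

# CNN branch: the convolutions only ever see the image.
x = tf.keras.layers.Conv2D(32, 3, activation="relu")(image_in)
x = tf.keras.layers.MaxPooling2D(2)(x)
x = tf.keras.layers.Conv2D(64, 3, activation="relu")(x)
x = tf.keras.layers.MaxPooling2D(2)(x)
x = tf.keras.layers.Flatten()(x)

# Merge the flattened image features with the tabular features,
# then let a plain neural network make the final prediction.
merged = tf.keras.layers.Concatenate()([x, tabular_in])
h = tf.keras.layers.Dense(64, activation="relu")(merged)
out = tf.keras.layers.Dense(1, activation="sigmoid")(h)

model = tf.keras.Model(inputs=[image_in, tabular_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# Training would then look like:
# model.fit([image_array, demographics_array], labels, epochs=10)
```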