Transcript for:
Quantum Neural Network Overview and Example

Hello, welcome back to Quantum Computation Simplified. I'm Kadri Singh. In this video we'll learn about quantum neural networks, the steps involved, and then we'll see an example code. This is a follow-up to my previous video on quantum machine learning, where we saw the support vector machine.

Before we jump into the network itself, let's see the current approach, or status. Basically we are either trying to enhance a classical machine learning algorithm, in this case a neural network, with some piece of quantum computation for executing time-consuming calculations, or we are borrowing optimization algorithms from the classical machine learning architecture, because for neural networks we do not have quantum equivalents. From these two points you can say that there is no hundred-percent quantum approach for machine learning, and in particular for neural networks. That also gives us a lot of opportunity to come out with better quantum algorithms. The applications are whatever we have for classical neural networks, to name a few: speech synthesis, classification, categorization, NLP, financial modeling, molecular modeling; we can name many more.

As usual we have a drawback: the existing quantum hardware does not have sufficient logical qubits to solve real-life problems. We can solve something now too, but certainly not a problem large enough to show an advantage over classical algorithms. That doesn't really prevent us from coming out with good quantum machine learning algorithms, so that once the quantum hardware is available we will be able to make use of them quickly. That is exactly the reason for us to learn quantum machine learning.

Before going into the quantum neural network, we'll refresh our knowledge of the classical neural network. There are prerequisites for this video. One, you should have basic knowledge of quantum computing; if you do not, please go through my earlier videos on the subject, or other videos and textbooks that cover the basics. In addition, it's better if you know classical neural networks and a bit of Python coding, but if you do not, that's fine; that's what we are going to see now. Those who are familiar with classical neural networks can skip a couple of minutes of this video and move straight to the quantum part, but it's better to see the steps involved and the way we execute the neural network, so you can compare it with the classical one. I'll leave that to you.

We'll start with the steps involved in a classical neural network. We need input data for training and testing, so you will have one set of data for training and another for testing. For a classification problem the input consists of data and a label. Neural networks are not used only for classification; they have wide application, for example predicting a stock value based on the input and prevailing conditions, in which case you have a range of values rather than a label. To understand the steps, I'll make use of the MNIST dataset,
which comes from the National Institute of Standards and Technology, USA. It contains a large number of handwritten images and the corresponding digits, so we can call the digit the label and the image the data. The data is available in pixel format: every image is 28 by 28 pixels, and each pixel has a value. A white pixel is assigned 1, a black pixel 0, and the grayscale pixels in between have values between 0 and 1. That's the input we'll use in this example for learning the steps involved.

Typically the network has a structure of input layer, hidden layers, and output layer. The input layer is nothing but the data converted to network format. In this case the 28 by 28 pixels, each with a value, are flattened out to 784 neurons, so you have 784 neurons in the input layer. What should be the size of the output layer? Obviously you need 10 neurons, representing the digits 0 to 9. In between the input and output layers you have the hidden layers. How many hidden layers should you have, and how many neurons in each? That is the challenge, and it comes mostly from experience. Today we don't have any mathematical background to arrive at those numbers; hopefully we will one day, but otherwise people work on a trial-and-error basis. If you have many hidden layers it's called a deep neural network, and if you have few it's called a shallow neural network.

Now that the input neurons have values, we need to compute the values of the next layer. That is done with the help of weights and biases: from the activation values of one layer, plus weights and biases, we get the values of the neurons in the next layer, and from those the next, until we reach the output layer. How exactly the weights and biases are used we will see in a subsequent slide, but it is like a straight-line equation y = mx + c: the weight plays the role of m, the activation is x, and the bias is c. A purely linear relationship between layers is not enough, so we bring in non-linearity with the help of functions like ReLU (the Rectified Linear Unit), sigmoid, and many more. Initially you'll start with some random weights and biases for all the nodes.
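As a rough picture of that layer computation, here is a minimal NumPy sketch (my own illustration, not code from the video): the 784 flattened pixel values feed a hidden layer of 16 neurons through randomly initialized weights, a bias, and a ReLU non-linearity.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(784)                  # a flattened 28x28 image, pixel values in [0, 1]
W = rng.standard_normal((16, 784))   # random initial weights: one row per hidden neuron
b = np.zeros(16)                     # one bias per hidden neuron

def relu(z):
    return np.maximum(0.0, z)        # the ReLU non-linearity

hidden = relu(W @ x + b)             # "y = mx + c" per neuron, then non-linearity
print(hidden.shape)                  # (16,)
```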
Random weights obviously will not give the right output. So you find the error between the estimated value and the target value and define a cost function, which is again a function of that error; then you try to minimize it. In minimizing it you need to know whether each value should go up or down, and you find that out using the gradient descent method. You know that at the minimum the slope is zero, i.e. the first derivative is zero, the same thing we studied in our school days; it's very simple mathematics, and from it you know how much to correct to move toward the minimum value. You will not change the input values; you will be modifying the weights and biases to get a new set of values, and like that you go back until you have new weights and biases for all the nodes and connections. The input layer, of course, you don't change, because that is the input; all the other values change because of back-propagation. You go back and forth many times until you are satisfied with the output.

Typically, at the end, for a 9 you would expect the output neuron for 9 to be 1 and all other outputs to be 0, but that is only theoretical; in practice you will get some values, with the 9 neuron close to 1. For that to happen, one set of values has to appear in the first hidden layer and another set in the second, and that is what the repeated calculation converges to. For a 6 it may look different; what is shown on the slide is not an exact solution, just an illustration of how the neural network looks when you have two hidden layers with 16 neurons each.

All these figures are taken from two references. One is an extremely good series of videos on classical neural networks, the 3Blue1Brown series; please go through it if you want to understand classical neural networks better. The other is a free ebook written by Michael Nielsen. You know who Michael Nielsen is, right? He co-authored the bible of quantum computing, and his ebook is again a very good way to understand classical neural networks.

Now let's go slightly deeper. We have the input flattened to 784 neurons. To get one neuron of the first hidden layer you need 784 weights, one per input neuron; since this hidden layer has 16 neurons, you need 784 times 16 weights for the layer, and of course 16 biases are sufficient. So you take each input neuron's value, multiply it by its weight, sum over all the inputs, add a bias, and apply the non-linearity to get the value of one hidden neuron; you do that 16 times. Once you have values for all these neurons you can calculate the next layer similarly, and then the output layer with its 10 neurons. In total that is 784 x 16 + 16 x 16 + 16 x 10 = 12,960 weights, thousands of weights, but luckily only 16 + 16 + 10 = 42 biases.

When you start with random numbers, an image of a 9 may give you some arbitrary output, but the expected output is 1 for the 9 neuron and 0 everywhere else. So you need to subtract: you take the expected value minus whatever the network found, take the modulus, and square it, summed over the outputs. That is one of the cost functions; many cost functions can be defined, and again it's up to you to decide; once you become an expert you'll be able to make the right choice.
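To make those numbers concrete, here is a tiny check (again my own sketch, with a made-up network output): it counts the parameters of the 784 to 16 to 16 to 10 network and evaluates the quadratic cost for a single image whose label is 9.

```python
import numpy as np

# Parameter count for the 784 -> 16 -> 16 -> 10 network.
n_weights = 784 * 16 + 16 * 16 + 16 * 10   # 12960
n_biases = 16 + 16 + 10                    # 42
print(n_weights, n_biases)

# Quadratic cost for one image of a handwritten 9.
output = np.array([0.1, 0.0, 0.2, 0.0, 0.1, 0.0, 0.0, 0.1, 0.0, 0.7])  # made-up output
target = np.zeros(10)
target[9] = 1.0                            # expected: 1 for the 9 neuron, 0 elsewhere
cost = np.sum((target - output) ** 2)      # sum of squared differences
print(cost)
```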
Afterwards you have to find the slope. How do you do that? You take a derivative, but since we have two kinds of variables we need partial derivatives: once with respect to the weights w and once with respect to the biases b. That gives you the gradient, i.e. how much you have to move. Once you know that, you subtract it to head for the minimum: the new set of w's is calculated from the old set minus the gradient. There are many numerical methods available, so you don't have to break your head writing this code; you can make use of existing implementations. For a particular layer you take the original set of w's, subtract the gradient of w, and you get a different set which takes us to the optimized solution quickly.

From the cost function you first estimate the corrections to the weights and biases of the last layer, then you keep moving back; that's called back-propagation. And of course the input values you won't change, because that is the input. Like this you go back and forth; one full pass is typically called one iteration or, in machine learning terminology, one epoch, and you run as many as it takes to get satisfactory results.

Now, to calculate the activation values of a layer, you really don't do it one neuron at a time; you do a matrix multiplication. The input has 784 values and the output needs 16 values, so the weight matrix is 16 by 784: one matrix multiplication plus a vector addition gives you the whole layer, and with the same simple calculation you get the following layers. You should also remember that for each input image you get a gradient for every weight in every layer, and you need to take the average over the images. All of which shows that, although the operations are very simple, you need to do a lot of computation. After running a reasonably good number of epochs you get the right values of w and b, the model is ready, and in future, when you get a new handwritten image, it will classify it correctly.
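Here is a minimal sketch of those two ideas together (illustrative only; the gradient below is a random placeholder for what back-propagation would actually supply): the whole first hidden layer computed in one matrix multiplication over a mini-batch, followed by a gradient-descent update.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((784, 32))            # a mini-batch of 32 flattened images, one per column
W = rng.standard_normal((16, 784))   # first hidden layer: 16 x 784 weight matrix
b = np.zeros((16, 1))

A = np.maximum(0.0, W @ X + b)       # all 16 activations for all 32 images in one go

lr = 0.1                             # learning rate
dW = rng.standard_normal(W.shape)    # placeholder for backprop's batch-averaged gradient
W = W - lr * dW                      # new weights = old weights - learning_rate * gradient
```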
But you can see there are two drawbacks in this method. One: just for identifying the digits 0 to 9 we need so many variables; all those weights and biases are required. Is there any better way of doing it? Yes, there is, and that way also gives another advantage. The second drawback of this approach: suppose you have an image with a 2 in one place, and another image with the 2 somewhere else; the network doesn't really see the spatial location at all. You just flatten the image and it doesn't care. That is not a good approach, particularly when you have a larger picture and want to identify many things in it. For that, experts came out with something called the convolutional neural network.

What it tries to do is capture features from the input. The input is the same 28 by 28 image, and you define a feature; a feature is nothing but a small square matrix (square, obviously, for making the calculations simpler). In this example they have taken a 5 by 5, which tries to capture some piece of information from the picture. Typically, when you slide a 5 by 5 kernel over a 28 by 28 image, the next matrix size becomes 28 minus 5 plus 1, that is 24. And note that a 5 by 5 filter has only 25 weights, not the big 784-weights-per-neuron multiplication we had before. But we do need some more, because one particular 5 by 5 may find only one feature, and there may be many: one feature may look one way, another differently, and finally you put them together to form, say, a 3. So you need many filters; people call them kernels or filters, different names are used. In this example they have taken three features. This is called the convolution layer, because we have a special kind of multiplication here, not the normal matrix multiplication but a convolution: across rows and columns you multiply element by element and sum, at the end you add a bias, and then of course we need a non-linearity as well.

Now the problem size is reduced, but that by itself is not sufficient; we want to reduce the size still further, and that is done with the help of something called pooling. How is pooling done? Suppose we have a 24 by 24 map here. Depending on the window size we define (there is no hard and fast rule about it), in this example they have taken 2 by 2. Out of each block of four values, only the one with the largest value is kept, because we are trying to keep only the important one and neglect the rest. That is one way of doing it, called max pooling. Some people say no, we have to take account of all the values, and they do average pooling; like that, different techniques are available which help us reduce the size. In terms of depth, the number of feature maps doesn't reduce, but the two-dimensional size does.

For you to have a better understanding of convolution, let me show this example. It is not from the problem we are working on but from another one. We define the filter, or kernel, as a 2 by 2 of all ones; obviously that's for simplicity, and in reality they would all have different values. You take the first four values of the input and multiply them by the corresponding kernel entries: 1 x 1 + 1 x 2 + 1 x 4 + 1 x 5, which is nothing but 1 + 2 + 4 + 5, which is 12. Then you slide to the next position, and you keep doing that until you cover the entire area, going across as well as down. After doing that you get a reduced map, its size depending on the kernel you use. In neural network terms the step of the slide is called the stride; sometimes you don't go pixel by pixel but skip one in between, and then the stride value is different. Whether to skip or not again comes from experience and the output accuracy. For this too you can use the same ebook, and there is another paper which explains convolutional neural networks very nicely.
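That worked example translates directly into code. This sketch (mine, using the same all-ones 2 by 2 kernel on a small 3 by 3 input) reproduces the 1 + 2 + 4 + 5 = 12 window and then applies max pooling:

```python
import numpy as np

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
kernel = np.ones((2, 2))          # the all-ones 2x2 filter from the example

# Slide the kernel with stride 1: the output size is 3 - 2 + 1 = 2.
out = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        out[i, j] = np.sum(image[i:i+2, j:j+2] * kernel)  # element-wise multiply, then sum
print(out)                        # [[12. 16.] [24. 28.]]; top-left window: 1+2+4+5 = 12

# 2x2 max pooling keeps only the largest of each block of four values.
print(out.max())                  # 28.0; the whole 2x2 map pools down to one value here
```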
All right, now that we have some basics, we will run through the code; but before that we need to see how to correlate the classical neural network with the quantum neural network. These bullets are the same ones we saw for the classical network, and against each we see what changes are required for the quantum version.

Input data and labels for training and testing: yes, we will have two different sets of data. The only difference is that we may either use the data straight away in classical form and keep doing the training, or convert it into quantum states. People follow both approaches, so the way we handle the data can be either classical or quantum. As far as the structure is concerned, we again have input, hidden, and output layers, but the difference is that some layers can be quantum and some classical, or all layers can be quantum; that's possible too. Then, building models using weights and biases: here also we do that, but in the quantum neural network we don't call them weights and biases; instead we have the angle of rotation, theta, and we try to optimize the theta value. Non-linearity, scaling, optimization, sigmoid: they are all done through classical algorithms, because we do not have equivalent quantum algorithms for these operations. Then, learning through a cost function, stochastic gradient descent, and back-propagation is done with a combination of classical and quantum algorithms; some people do it fully classically, some use quantum wherever possible. In our example we will make use of quantum for the gradient descent. So as you can see, it's a combination of classical and quantum.

This picture is from Wikipedia. The type of algorithm can be classical or quantum, and similarly the type of data we handle can be classical or quantum, so four approaches are possible. Classical algorithm on classical data doesn't interest us, because we want to do something the quantum way; fully quantum on quantum data has not been reached, since we still need to depend on classical computers. The most widely used combination is a quantum algorithm on classical data, and the other possibility is quantum data with some mix of quantum and classical algorithms. Ours will be a kind of combination of these two.

As I told you, there are many possibilities, so let's see how each company is trying to do it. This is one of the approaches from IBM, and I repeat, one of the approaches, not the approach; in our code example we will be using it. We start with classical code, because we don't want to push everything into quantum states: we have a couple of classical neural network layers, and their output is used to determine the angle of rotation for one layer of quantum network. The output of that in turn is used to calculate the value and define the cost function; then you need gradient descent and an optimization algorithm, and you go back and forth. The next one is at a slightly higher level, given by PennyLane; the PennyLane team works a lot on quantum neural networks, so you can go to their website or see their videos. Roughly, to aid understanding, they show it as a combination: one part is quantum and the rest is classical. And this one is from Google. You know that Google already has nice classical machine learning algorithms in TensorFlow, so they make use of TensorFlow, but they start with quantum: in fact they transform the data into quantum states, then do some quantum manipulation in the form of a quantum neural network,
make the measurement, convert the data back to classical network form, then get the estimated value, compare it with the target value, define the cost function, try to minimize it, and then update the parameters. Again it's given at a high level, but one good thing is that they have functions for doing everything; you can choose what you need, and you don't actually have to write the code.

Before we go to the code, we'll try to understand one more bit. As we have seen, there are two approaches: enhancing the classical machine learning algorithm by outsourcing certain difficult calculations, or optimizing the quantum algorithms using a classical machine learning architecture; we are trying to do something in between. In the coding example we will again be using the MNIST database, but instead of having all the digits 0 to 9 we will work only on 0 and 1, to make the problem simpler. Then we will be defining a quantum class; why do I say class instead of circuit? You will see later. In this you may be surprised that we are going to use only one qubit, though of course not alone: we will also have some classical neural network layers. We will manipulate the state of the qubit with a Hadamard gate and a rotation gate. Normally the rotation is done with respect to the y axis or the x axis, and certainly not the z axis; I hope you know why: after the Hadamard, a z rotation only changes the relative phase, which doesn't affect the measured probabilities. Then we make a measurement, and from the measurement we find out the values and their probabilities; that's the data for the next stage.

As I told you earlier, loading the input data is done in a classical way, and some classical neural network layers from PyTorch are used; only one layer is quantum, the one we have just seen. The gradient descent in our example is also done through quantum. Normally you execute the circuit at theta; for the gradient, what you do is execute at theta plus s and theta minus s, where s is something called the shift, so you go one step above and one step below, and the difference gives you the gradient. That gradient you can use for optimizing, and for the optimization itself you will use PyTorch's built-in machinery. Even if the slides left you with some doubts, you may have a better understanding once you see the code, so let's go to the code.

This is the code, which you're familiar with by now; I have just copied whatever is available on IBM's website, and I have not written any code myself. Why? Basically to make my life easier when explaining; you can also use their website directly. As usual we need to import all the required functions from the Qiskit library. Two new things you can see here: transpile and assemble, which you will understand better when we use them, and, totally new, importing the PyTorch-related functions, specifically for the neural network. That's what is being done here, so let me run it.
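Before walking through IBM's version, here is a minimal one-qubit sketch of the idea in Qiskit (my own simplification, not IBM's class; it assumes qiskit and qiskit-aer are installed, and older Qiskit versions expose the simulator as qiskit.Aer instead): Hadamard, then RY(theta), then measurement, with the expectation value read off the counts, and the theta plus/minus s shift used to estimate the gradient.

```python
import numpy as np
from qiskit import QuantumCircuit, transpile
from qiskit.circuit import Parameter
from qiskit_aer import AerSimulator     # older Qiskit: from qiskit import Aer

class OneQubitLayer:
    """H then RY(theta) on one qubit; the measured expectation is the layer's output."""
    def __init__(self, shots=1024):
        self.theta = Parameter('theta')  # bound later, from the classical network
        self.circuit = QuantumCircuit(1)
        self.circuit.h(0)                # put the qubit in superposition
        self.circuit.barrier()
        self.circuit.ry(self.theta, 0)   # the trainable rotation about y
        self.circuit.measure_all()
        self.backend = AerSimulator()
        self.shots = shots

    def run(self, theta_value):
        bound = self.circuit.assign_parameters({self.theta: theta_value})
        job = self.backend.run(transpile(bound, self.backend), shots=self.shots)
        counts = job.result().get_counts()
        # Expectation of the measured bit: sum over outcomes of value * probability.
        return sum(int(state, 2) * n / self.shots for state, n in counts.items())

layer = OneQubitLayer()
s = np.pi / 2                            # the "shift"
theta = 0.3
grad = layer.run(theta + s) - layer.run(theta - s)   # parameter-shift gradient estimate
print(layer.run(theta), grad)
```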
The next step is building the quantum circuit, and we are not building it directly: we build it through a class. The reason is that we are going to combine both classical and quantum, so we define a class and later use an instance of it. The class has to be initialized, and in the initialization you actually prepare the circuit: as you know, depending on the number of qubits you create the circuit, set the qubits in the required state, then apply the Hadamard gate, with some barriers in between for better visibility, and then the rotation gate, a rotation with respect to y. The only difference you will see here is the parameter theta. The theta comes from elsewhere, so we do not know its value when we build the circuit; that value we will come to know from the classical network. Afterwards you make a measurement. Then there is the backend, same as what we used earlier; in this case we are going to use only a simulator, though later, if you want, you can run it on a quantum computer; and you set the number of shots. For running, too, we define a function instead of running straight away. This is where the new functions I mentioned come in: we put together everything we have defined, the circuit, the backend, the number of shots, and how to get the theta value. Once you have everything you run it; from the run you get the results, from the results the counts and states, and from those the probabilities and the value that gives us the expectation value. That value you'll be passing on to the next stage.

Let's see how the circuit looks; it's the same as what we saw on the slide. You start with the qubit in the zero state, apply a Hadamard, some barriers, then the RY gate whose theta value comes from outside, make a measurement, then evaluate; and that feeds the next operations.

Next is the class for the hybrid function. Basically you need to bring in the whole quantum part for the forward operations, then take the result and give it to PyTorch; and this is exactly what enables back-propagation. For back-propagation, the quantum way is used only for the gradient: as I told you, we need theta plus s and theta minus s. The expression in the code is a bit bigger because there's an array of thetas. You evaluate, which means running the quantum circuit, for theta plus s and theta minus s, then you subtract to get the gradient, and that gradient you pass on to the PyTorch function as a tensor. Then comes the class which really marries the quantum part with the PyTorch machinery; that module is a PyTorch nn.Module, and you can see the PyTorch documentation to understand it better. It defines functions for initialization and for calling the hybrid function we defined earlier. Let's run it.

Now we have finished the coding of the quantum side. You can see we have not written much code, other than calling some of the quantum library functions plus a bit of Python programming to marry it to the classical network.
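Here is a hedged sketch of that marriage (my own condensed version, in the spirit of IBM's tutorial rather than a copy of it; it assumes the OneQubitLayer class from the earlier sketch and a batch size of one, since .item() needs a single theta): a torch.autograd.Function whose forward runs the circuit and whose backward applies the theta plus/minus s shift, wrapped in an nn.Module.

```python
import numpy as np
import torch

class HybridFunction(torch.autograd.Function):
    """Forward: run the quantum circuit. Backward: parameter-shift gradient."""

    @staticmethod
    def forward(ctx, theta, circuit, shift):
        ctx.circuit, ctx.shift = circuit, shift
        ctx.save_for_backward(theta)
        expectation = circuit.run(theta.item())              # one theta at a time
        return torch.tensor([[expectation]], dtype=theta.dtype)  # shape (1, 1): batch of one

    @staticmethod
    def backward(ctx, grad_output):
        (theta,) = ctx.saved_tensors
        right = ctx.circuit.run(theta.item() + ctx.shift)    # one step above
        left = ctx.circuit.run(theta.item() - ctx.shift)     # one step below
        grad = torch.tensor([[right - left]], dtype=theta.dtype)
        return grad_output * grad, None, None                # no grads for circuit or shift

class Hybrid(torch.nn.Module):
    """The quantum layer as a PyTorch module, so it can sit inside a classical network."""
    def __init__(self, circuit, shift=np.pi / 2):
        super().__init__()
        self.circuit, self.shift = circuit, shift

    def forward(self, theta):
        return HybridFunction.apply(theta, self.circuit, self.shift)
```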
Before training, we get the data. As I told you, we are going to use the MNIST database, and we are not going to make use of all the digits 0 to 9, only 0 and 1. So that is what is being done here: identifying the indices of that data, then collecting the data and the target values and assigning them to the loader variable. Let's run it; now we have the training data. What's next? This cell is really not required for training the model, but just for you to get a feel of it, let's see how the data looks; you can see samples labeled 0, 1, 1, 0, 1. The next step is to get the testing data. For testing we take a smaller number of samples, again using only the data for 0 and 1, and that is assigned to the variable test_loader. Once we run it, we come to the key thing: the classical neural network.

You need an initialization function that defines the kernel size, how many layers you want to build, how many neurons, and so on; all those things are defined here. Then you have to define the forward function: how is the computation done? You take the convolution of the input data, then apply the non-linearity, then max pool; then the second convolution layer, non-linearity, max pool again; then something called dropout, which I won't explain here; if you want to understand dropout, see any textbook on neural networks. Then we need flattening, then more non-linearity, then some more layers of classical network, and finally we marry this with the quantum part: the output of this value goes to the quantum layer, and afterwards you get the value back and keep it in PyTorch form for categorization. Don't worry if you have not understood each and every function; there is nice documentation on PyTorch's website.

Now that we have defined all the code, what is next? We have to call those functions and start training the network. Let me give it a run before I start explaining, because it takes a lot of time. It has started running. We call the model network; then, among the different optimization tools or functions available, one such optimizer is called Adam, and that is what is being called here. Then you need a loss function, which is also taken from the built-in functions. The number of epochs, or iterations, is defined here as 20, which should be sufficient; you may get some improvement if you increase it slightly, but otherwise it works nicely. A loss list is also kept; why it is required we will see below.

This is where the training takes place. For each epoch the following steps run. For each batch index you take the data from the train loader; you initialize the gradients to zero when starting; then output = model(data) is the statement that really executes all the functions we have defined. The loss is computed from the output and the target value (the output we get from the model, the target value we know from the data) using the loss function defined in PyTorch. That loss we use for back-propagation, which is what is being done here, and once the back-propagation is done you need to optimize, that is, select the next set of weights. Afterwards the loss values are collected in the loss list, which is what we watch. If it is a good model you will get minus one. Why minus one? It comes from how this loss is defined: in IBM's example a negative log-likelihood loss is applied directly to the output probabilities, so a perfect prediction scores minus 1; see the PyTorch documentation for the details. And since we have given 20 epochs, each epoch is roughly 5 percent of the total training; we have reached 70 percent, so another 30 percent to go, but the learning looks reasonably good.
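Putting it together, here is a condensed sketch of the network and the training loop just described (again my own reconstruction under assumptions, not IBM's exact code: it reuses the Hybrid and OneQubitLayer sketches above, assumes train_loader and test_loader are the filtered 0/1 MNIST loaders with batch size 1, and applies NLLLoss directly to the output probabilities, which is why a perfect model scores minus one):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)    # 28x28 -> 24x24
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)
        self.dropout = nn.Dropout2d()
        self.fc1 = nn.Linear(256, 64)                  # 16 maps of 4x4 after pooling
        self.fc2 = nn.Linear(64, 1)                    # one value: the rotation angle theta
        self.hybrid = Hybrid(OneQubitLayer())          # the one-qubit quantum layer

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)     # convolution, non-linearity, pooling
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = self.dropout(x)
        x = x.view(x.shape[0], -1)                     # flatten
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        x = self.hybrid(x)                             # expectation value from the circuit
        return torch.cat((x, 1 - x), -1)               # the two class probabilities

model = Net()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_func = nn.NLLLoss()   # applied to probabilities here, so a perfect model gives -1

loss_list = []
for epoch in range(20):
    total = []
    for data, target in train_loader:
        optimizer.zero_grad()            # reset gradients
        output = model(data)             # forward pass, quantum layer included
        loss = loss_func(output, target)
        loss.backward()                  # back-propagation; parameter shift in the hybrid
        optimizer.step()                 # select the next set of weights
        total.append(loss.item())
    loss_list.append(sum(total) / len(total))
    print(f'epoch {epoch + 1}: loss {loss_list[-1]:.4f}')

# Evaluation (described next in the video): a single forward pass over the test data.
model.eval()
with torch.no_grad():
    correct = sum((model(d).argmax(dim=1) == t).item() for d, t in test_loader)
print(f'accuracy: {correct / len(test_loader):.1%}')
```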
We have to remember a couple of things here. The number of epochs is one thing, but that's not a big deal; the learning rate is going to be a big deal, since it really affects the learning, and of course so are the number of hidden layers and the number of neurons in each hidden layer. They are all going to be a challenge in getting the optimum solution. How do we say that we have a good model? You can see it by looking at the loss, and more than that, we will run it with some test data, which will give us the confidence.

Now the learning is done, and it's reasonably good; sometimes I have seen it go up to around minus 0.99. I hope this should be sufficient to get 100 percent prediction accuracy. So we are going to evaluate the trained model using the test data. It's the same approach, the only thing being that we won't go back and forth: we go through once, straight. We execute this alone, get the prediction, check whether the estimate is correct or not, compute the loss, and print it. The accuracy is really great; the loss is okay, around minus 0.96, but the accuracy is very nice. With one qubit we are able to get a good accuracy! Let us see some of the samples to check whether it has really worked. What we do is plot the image and give it a title with the predicted value. You can see here what it predicted; maybe I'll run it with another set. Last time they were all 1s, and now you can see it was able to detect them very nicely.

As you can see, this involves a lot of mathematics, there's no doubt about it, but all of it is very simple, not complicated, so I'm sure you'll be able to pick it up easily. If you need some more detail or explanation, you can go through the websites of IBM, Google, and PennyLane; I have tried to cover most of the essential things here. When you start coding I'm sure you will have some doubts, and at that time you can use those references. So we have learned how to do a quantum neural network. What can you do next? Explore different approaches: start with quantum data or classical data, or add some more layers of quantum. Similarly, you can see what Google and PennyLane are doing; Google uses TensorFlow, like IBM here uses PyTorch, and PennyLane, I think, has both options. That will help you learn more about quantum neural networks. I'll continue to learn and come out with more videos on quantum algorithms. I also request that if you like this video, give it a thumbs up, and tell me what more areas I need to cover; that will be very helpful to me. Thanks for watching, see you soon, bye bye.