Transcript for:
MLPs and Non-Linear Problems

Hi again! So maybe you just watched my previous videos about coding a perceptron, and now I want to ask the question: why not just stop here? Okay, so we had this very simple scenario where we have a canvas with a whole bunch of points in it, on a Cartesian plane or whatever we want to call it, we drew a line through it, and we were trying to classify the points on one side of the line versus the points on the other side.

That was a scenario with a single perceptron, the sort of processing unit we can call the neuron or the processor. It received inputs: x0 and x1 were the x and y coordinates of the point, and it also had this thing called a bias. Then it generated an output. Each one of these inputs was connected to the processor with a weight: weight one, weight two, and so on. The processor creates a weighted sum of all the inputs multiplied by the weights, and that weighted sum is passed through an activation function to generate the output.

So why isn't this good enough? Let's first think about what the limit is here. The idea is: what if I want any number of inputs to generate any number of outputs? That's the essence of what I want to do in a lot of different machine learning applications. Let's take a very classic classification problem: what if I have a handwritten digit, like the number eight, and I want all of the pixels of that digit to be the inputs to this perceptron, and I want the output to tell me a set of probabilities as to which digit it is? The output should look something like: there's a 0.1 chance it's a zero, a 0.2 chance it's a one, a 0.1 chance it's a two, and so on through three, four, five, six, and seven, and, oh, there's a 0.99 chance it's an eight and a 0.05 chance it's a nine. I don't think I got those to add up to one, but you get the idea.

So the idea here is that we want some type of processing unit that can take an arbitrary number of inputs. Maybe this is a 28x28 pixel image, so there are 784 grayscale values coming into the processor, which are weighted and summed and all this stuff, and we get an output with some arbitrary number of probabilities to help us guess that this is an eight. With this model, why couldn't I just have a whole bunch more inputs and a whole bunch more outputs, but still have one single processing unit?

The reason why I can't stems from a book published in 1969 by Marvin Minsky and Seymour Papert, AI luminaries, called Perceptrons. In that book, Minsky and Papert point out that a simple perceptron, the thing that I built in the previous two videos, can only solve linearly separable problems. So what does that mean, anyway, and why should you care?

Let's think about this. Over here is a linearly separable problem, meaning I need to classify this stuff, and if I were to visualize it all, I can draw a line between the stuff in this class and the stuff in that class. The data itself is separable by a line. In three dimensions I could put a plane through it, and that would still be linearly separable, because I can divide the space in half and understand it that way. The problem is, most interesting problems are not linearly separable.
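Before looking at those, here's why a single perceptron is stuck with lines at all, as a minimal JavaScript sketch. The names here (Perceptron, feedforward) and the sign activation are my shorthand for the previous videos' idea, not necessarily their exact code:

```javascript
// A single perceptron: a weighted sum of the inputs, passed through an
// activation function. Illustrative sketch, not the exact code from the
// previous videos.
class Perceptron {
  constructor(n) {
    // One weight per input, initialized randomly between -1 and 1.
    this.weights = [];
    for (let i = 0; i < n; i++) {
      this.weights[i] = Math.random() * 2 - 1;
    }
  }

  feedforward(inputs) {
    // Weighted sum: each input multiplied by its weight.
    let sum = 0;
    for (let i = 0; i < this.weights.length; i++) {
      sum += inputs[i] * this.weights[i];
    }
    // Activation function: the sign of the sum, +1 for one class, -1 for the other.
    return sum >= 0 ? 1 : -1;
  }
}

// A point (x0, x1), plus a constant 1 as the bias input.
const p = new Perceptron(3);
console.log(p.feedforward([0.5, -0.2, 1]));
```

The weighted sum w0*x0 + w1*x1 + w2*bias is literally a line equation, and the sign of that sum only ever reports which side of the line a point falls on. That is all a single perceptron can express.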
For example, there might be some data that clusters here in the center and belongs to one class, while anything outside of it belongs to another class, and I can't draw one line to separate that stuff. And you might even be thinking: but there's still so much you could do with linearly separable stuff! Well, I'm going to show you right now a particular problem called XOR, and I'm making the case for why we need to go a step further and make something called a multi-layered perceptron. Let me lay out that case for you.

You might remember my videos on conditional statements and boolean expressions. In those videos I talked about operations like AND and OR, which in computer programming syntax are often written as a double ampersand (&&) or two pipes (||). So let's make a truth table with two elements, A and B, where each can be true or false, and ask about A AND B. True AND true yields true: if I am hungry and I am thirsty, I shall go and have lunch. True AND false is false, false AND true is false, and false AND false is false. If I have a boolean expression A AND B, I need both of those things to be true in order to get true.

Interestingly enough, this is a linearly separable problem: I can draw a line right here, and true is on one side and false is on the other side. That means I could create a perceptron for it. That perceptron is going to have two inputs, which are boolean values, true or false, and I could train it to give me an output: if two trues come in, I should get a true; if one false and one true come in, I should get a false; if two falses come in, I should get a false. Great.

Or I could do the same thing with OR. What changes? Let me erase this dotted line. With an OR operation, A OR B, three of the four cells now become true, because I only need one input to be true in order to get true; only if both are false do I get false. And guess what: still a linearly separable problem. AND is linearly separable, OR is linearly separable; we could have a perceptron learn to do both of those things.

Now hold on a second. There is another boolean operator which you might not have heard of until this video, which would be really kind of exciting for me; it would make me very happy if somebody watching this has never heard of it before. It is called XOR. The X stands for "exclusive": it's "exclusive or," which means it's only true if exactly one input is true and the other is false. If one is true and one is false, the result is true; if both are true, it's false; and if both are false, it's false. This is exclusive or, a very simple boolean operation. However, I triple dog dare you, with a cherry on top, to draw a single line through here that divides the falses from the trues.
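Before you take the dare, it's worth seeing the AND and OR cases in code, because you don't even need training: you can pick the weights by hand. A minimal sketch, with my own illustrative weights, treating true as 1 and false as 0 and using a simple threshold activation:

```javascript
// A threshold perceptron: fires (returns 1) if the weighted sum of the two
// inputs plus a bias crosses zero. Inputs a and b are 0 (false) or 1 (true).
// The specific weights below are hand-picked for illustration.
function perceptron(a, b, wa, wb, bias) {
  return a * wa + b * wb + bias > 0 ? 1 : 0;
}

// AND: only true AND true (1 + 1 - 1.5 = 0.5) pushes the sum over zero.
const AND = (a, b) => perceptron(a, b, 1, 1, -1.5);

// OR: either input alone (1 - 0.5 = 0.5) is enough.
const OR = (a, b) => perceptron(a, b, 1, 1, -0.5);

for (const a of [0, 1]) {
  for (const b of [0, 1]) {
    console.log(`a=${a} b=${b}  AND=${AND(a, b)}  OR=${OR(a, b)}`);
  }
}
```

Each weight-and-bias combination is exactly one of those dividing lines through the truth table: a + b - 1.5 = 0 for AND, a + b - 0.5 = 0 for OR. Now, back to the dare.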
I cannot do it, and neither can you: XOR is not a linearly separable problem, and that is the point of all this rambling. I could draw two lines, one here and one here, so that all the trues are inside and the falses are outside, but a single perceptron, the simplest unit, cannot solve an operation as simple as this. This is what Minsky and Papert talked about in the book Perceptrons: the perceptron is an interesting idea conceptually, and it seems very exciting, but if it can't solve XOR, what are we supposed to do with it?

The answer, and you might have already thought of this yourself, is not too much of a leap. I kind of missed a little piece of my diagram here, so let's say this is a perceptron that knows how to solve AND, and this is a perceptron that knows how to solve OR. What if I took those same inputs and sent them into both? Then this output would give me the result of AND, and this output would give me the result of OR. Well, what is XOR, really? XOR is actually OR but NOT AND. AND is linearly separable, and NOT AND is also linearly separable. So what I want is for both of these outputs to go into another perceptron that computes AND. If this perceptron can solve NOT AND, and this perceptron can solve OR, and those outputs come in here, then this final output is true exactly when OR is true and NOT AND is true, and those are precisely the only two cases where XOR is true.

So the idea here is that more complex problems that are not linearly separable can be solved by linking multiple perceptrons together, and this is the idea of a multi-layered perceptron: we have multiple layers. And this is still a very simple diagram. You could think of it almost like designing a circuit, where you decide whether electricity should flow and these are switches. How could you have an LED turn on with exclusive or? You would actually wire the circuit in basically exactly this way.
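In fact, that wiring is easy to sketch in code by reusing the hand-picked threshold perceptrons from above; NOT AND is just AND with its weights and bias flipped. Again, these exact weights are my own illustration, not code from the videos:

```javascript
// The same threshold perceptron as before: 0/1 inputs, fires if the
// weighted sum plus bias crosses zero.
function perceptron(a, b, wa, wb, bias) {
  return a * wa + b * wb + bias > 0 ? 1 : 0;
}

const OR   = (a, b) => perceptron(a, b,  1,  1, -0.5);
const AND  = (a, b) => perceptron(a, b,  1,  1, -1.5);
const NAND = (a, b) => perceptron(a, b, -1, -1,  1.5); // NOT AND: flipped weights

// XOR: the two inputs feed both OR and NAND, and those two outputs feed
// a final AND. XOR is "OR but NOT AND".
const XOR = (a, b) => AND(OR(a, b), NAND(a, b));

for (const a of [0, 1]) {
  for (const b of [0, 1]) {
    console.log(`a=${a} b=${b}  XOR=${XOR(a, b)}`);
  }
}
```

Two perceptrons in the first layer (OR and NOT AND) feed a third (AND), and out comes XOR: a non-linearly-separable problem solved by linking three linearly-separable ones.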
So at some point I would like to make a video where I take that previous perceptron example and push it a few steps further to do exactly this. But what I'm going to do in the next videos is diagram out this structure of a multi-layered perceptron: how the inputs work, how the outputs work, how the feedforward algorithm works (where the inputs come in, get multiplied by weights, get summed together, and generate an output), and build a simple JavaScript library that has all the pieces of that neural network system in it.

Okay, so I hope this video gives you a nice follow-up from the perceptron and a sense of why this is important. I'm not sure if I'm done yet; I'm going to go check the live chat and see if there are any questions or important things that I missed, and then this video will be over.

Oh yeah, I'm back. There was one question which is important: somebody in the chat asked, "what about the hidden layer?" This is jumping ahead a little bit, because I'm going to get to it in more detail in the next video, but the way I drew this diagram is pretty awkward, so let me try to fix it up for a second. Imagine there were two inputs, and I actually drew those as if they were neurons (I know I'm out of the frame, but I'm still here). These inputs were connected to each of these perceptrons: each one was connected, and each connection was weighted. This is actually what's now known as a three-layer network. There is the input layer: those are the inputs, the trues and the falses. This is the output layer (that part is obvious), which should give us a result: are we true or are we false? And then the hidden layer is the neurons that sit in between the inputs and the outputs. They're called hidden because, as a user of the system, we don't necessarily see them: a user of the system is feeding in data and looking at the output. The hidden layer, in a sense, is where the magic happens; the hidden layer is what allows the network to get around this linearly-separable limitation.

The more hidden layers and the more neurons, the more complexity in the system: more weights, more parameters that need to be tweaked. We'll see that as I start to build the neural network library. The way I want that library to be set up, I want to be able to say: make a network with 10 inputs, three outputs, and one hidden layer with 15 hidden neurons, something like that (there's a rough sketch of that idea at the end of this transcript). But there could be multiple hidden layers, and eventually, as I get further and further down this road, we'll see that there are all sorts of other styles of how the network can be configured and set up. If the output feeds back into the input, that's something called a recurrent network; a convolutional network is when a set of image-processing operations happens early on as one of the layers. So there's a lot of stuff in the grand scheme of things to get to, but these are the fundamental building blocks.

Okay, so in the next video I'm going to set up the basic skeleton of the neural network library and look at all the pieces that we need, and then I'm going to have to keep going and look at some matrix math. That's going to be fun. Okay, see you soon!
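As a closing sketch, here's one rough guess at what that library skeleton might look like, based only on the description above. The class name, constructor arguments, and feedforward method are my assumptions, not the actual code from the next videos:

```javascript
// A hypothetical skeleton for the upcoming library, sketched from the
// description above: the constructor takes the number of input, hidden,
// and output neurons, and feedforward is the piece still to be filled in.
class NeuralNetwork {
  constructor(numInputs, numHidden, numOutputs) {
    this.numInputs = numInputs;
    this.numHidden = numHidden;
    this.numOutputs = numOutputs;
    // Weights between input->hidden and hidden->output would live here;
    // that's where the matrix math from the next videos comes in.
  }

  feedforward(inputs) {
    // inputs -> weighted sums in the hidden layer -> activation ->
    // weighted sums in the output layer -> activation -> outputs.
    // (To be implemented in the next videos.)
  }
}

// "10 inputs, three outputs, one hidden layer with 15 hidden neurons":
const nn = new NeuralNetwork(10, 15, 3);
```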