Hello, I'm Luis Serrano, and this video is about singular value decomposition. In this video I'll show you how some rotations and some stretchings are going to help us with some amazing applications, such as image compression. Let's begin!

But before we start, some announcements. This video has some code attached in my GitHub repo, and you're encouraged to take a look and play with the code; it's for the application at the very end. The GitHub handle is luisguiserano and the repo is called singular value decomposition; it's linked in the comments. I'd also like to announce that I have a book called Grokking Machine Learning; in the comments there's a link to find it, and there's also a discount code.

Okay, so let's get started, first with some transformations. Transformations in the plane or in space can be seen as functions that take points to other points. Some special cases are stretching and compressing: for example, we can stretch an image horizontally, or we can compress it; we can also stretch it vertically, and we can also compress it. And finally, we can rotate the image by some angle.

Now that we know how to stretch, compress, and rotate, let me give you a puzzle. Can you start with this circle on the left and turn it into the ellipse on the right, only by applying rotations and horizontal and vertical stretchings and compressions? By the way, we're going to start calling stretchings and compressions "scalings" from now on. Feel free to pause the video for a minute and give it a try. But notice something very important: you can only stretch and compress in the horizontal or the vertical direction, not at any other angle. Here's the way I do it: first I stretch in the horizontal direction, then I compress in the vertical direction, and finally I rotate counterclockwise to get the desired ellipse. That wasn't so hard, right?

Now let's go for a harder puzzle. In this one the circle is colored like this, and you're supposed to turn it into the ellipse on the right while respecting the colors. Let's try the same thing as before: first we stretch horizontally, then we compress vertically, and finally we rotate... and oh no, we didn't get the same thing. We got the shape right, but not the coloring. This problem is harder because we actually have to map every point of the circle to a point in the ellipse. You can think of it as a colored circle, or maybe a chain with colored beads that need to end up in the right location when you finish the process. So what can we do now? Well, don't panic; it's just a little harder, but we can do it. What we have to do is rotate first, before stretching, so the circle is in the right orientation. Now we can scale: we stretch horizontally like before, then we compress vertically, and then we rotate again to get the right orientation. And voila, now we got it. The moral of the story is that doing a rotation, then a scaling, and then another rotation is the way we can mimic every linear transformation.

But what do I mean by a linear transformation? A linear transformation is a matrix, and that's the point of this video: matrices. Every matrix defines a linear transformation on the plane, as follows. Take this matrix A, with entries 3, 0, 4, 5. It takes every point in the plane and moves it to a different point. For clarity we have two planes, the one on the left and the one on the right. If on the left we have a point with coordinates (p, q), then the matrix A sends it to the point with coordinates 3p + 0q (from the first row's entries) and 4p + 5q (from the second row's entries).
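As a quick sketch (my own, not code from the video), here's how you might check that mapping with numpy; the matrix A and the point (1, 0) are the ones from the example we're about to work through by hand.

```python
import numpy as np

# The matrix A from the example: it sends (p, q) to (3p + 0q, 4p + 5q).
A = np.array([[3, 0],
              [4, 5]])

p, q = 1, 0                    # try the point (1, 0)
print(A @ np.array([p, q]))    # [3 4]
```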
Let's do some examples. The point (1, 0), which is this blue point on the left, gets sent to the point (3, 4), this blue point on the right, because 3 times 1 plus 0 times 0 is 3, and 4 times 1 plus 5 times 0 is 4. Now we can do the other ones. The point (0, 1), the red one on the left, gets sent to the point (0, 5), the red one on the right. Next, the point (-1, 0), the green one on the left, gets sent to (-3, -4), the green one on the right. And finally, the point (0, -1), the yellow one on the left, gets sent to (0, -5), the yellow one on the right. We can map a few more points, like these four, which get mapped to these four. As you can see, on the left we can draw a unit circle, which gets mapped on the right to this ellipse over here. In general, every two-by-two matrix defines a linear transformation that sends the unit circle on the left to some ellipse centered at the origin on the right, and that ellipse characterizes the linear transformation.

Now, some matrices are special. For example, matrices that look like this: cos theta, sin theta, minus sin theta, cos theta, for some angle theta. They represent a rotation by the angle theta. And diagonal matrices, namely those that only have elements sigma 1 and sigma 2 on the main diagonal and zeros everywhere else, represent scalings: a horizontal stretching of the plane by a factor of sigma 1 and a vertical stretching of the plane by a factor of sigma 2. Notice that in this case the factor is bigger than 1, so we're doing a stretching; if it's less than 1, we're doing a compression; and if it's 1, we're doing nothing. Also, if it's a negative value, then this is a stretching followed by a reflection over the axis, but we're not going to worry much about that now.

Now, what does this have to do with matrices? Well, here is the main moral of this video. In the same way that in the puzzle at the beginning we used a rotation, then a horizontal stretching and a vertical stretching, and then another rotation to get from the circle to any ellipse we want, point by point, any matrix, any linear transformation, can be expressed as a rotation, followed by a horizontal and a vertical stretching, followed by another rotation. In matrix form this is a matrix multiplication. Here is the main equation for today: we can write our matrix as a product of three matrices, a rotation matrix, a scaling matrix, and another rotation matrix. We write our matrix as A, our two rotation matrices as U and V, and our scaling matrix as Sigma, so A = U Sigma V†. Notice that the V carries a dagger, standing for the adjoint, or conjugate transpose. This won't matter too much in this video, because most matrices we'll use are square and real, but it's an important thing to notice, and it is for notational and dimensional convenience that we use V† instead of V.

Now, how do we find this decomposition? There are mathematical ways, not too different from finding eigenvalues and eigenvectors, but I'll leave that mathematical reasoning for another video. For now I have two great ways to do it: one is Wolfram Alpha, one of my favorite tools, which is free online, and the other is the numpy package in Python, which has a very simple function called svd. So here is the decomposition we just found; let's interpret it as rotations and scalings. If you like floating-point numbers you can use this version, and if you like radicals and square roots you can use this one; they're both the same thing. We're going to use the unit circle on the left for reference.
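For example, here's a minimal numpy sketch (mine, not the repo's code) that computes this decomposition for the matrix A = [[3, 0], [4, 5]]; the exact signs in U and V† may differ from the matrices shown in the video, since the decomposition is only unique up to those sign choices.

```python
import numpy as np

A = np.array([[3, 0],
              [4, 5]])

# numpy returns U, the singular values, and V dagger (called Vt here).
U, S, Vt = np.linalg.svd(A)

print(S)    # [6.7082..., 2.2360...], i.e. 3*sqrt(5) and sqrt(5)
print(U)    # a rotation (up to sign choices)
print(Vt)   # a rotation by 45 degrees clockwise (up to sign choices)

# Multiplying the three pieces back together recovers A.
print(U @ np.diag(S) @ Vt)
```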
First, let's look at the matrix on the right, because when we multiply these matrices by a vector, the first one to act on the vector is the one on the right, so they read backwards. This matrix on the right happens to be a rotation by negative pi over 4, which is negative 45 degrees; you can work out that the cosine and the sine of negative 45 degrees actually give us this matrix, using the rotation matrix form we introduced before. So on the left we rotate our circle by negative 45 degrees, like this, which is the same as 45 degrees clockwise. Now let's look at the next matrix. It's a scaling matrix: the first entry scales everything horizontally by a factor of 3 root 5, which is bigger than 1, so it stretches, like this; the second entry scales everything vertically by a factor of root 5, which does this, and since root 5 is also bigger than 1, it stretches too. Finally, the matrix on the left is also a rotation matrix. It's harder to find the angle, but if you take the ratio between the entries of its left column you get the arctangent of 3, which is about 71.6 degrees, or 1.249 radians, and that's the rotation. So we rotate the plane by that angle counterclockwise, and voila, that's exactly what the original transformation looked like. We can summarize everything in this diagram: our original transformation A turned the unit circle into this ellipse, and that can be broken into one rotation, V†, then two scalings from Sigma, one horizontal and one vertical, and finally another rotation, U. That encompasses the equation in the middle, A = U Sigma V†.

Now let me show you how this is used in dimensionality reduction, or simplifying our matrix. Here is a slightly different matrix, with entries 1.8, 1.2, 4.4, and 4.6. Notice how it still turns the unit circle on the left into an ellipse on the right, but this time the ellipse is really skinny; it's almost like the points are forming a line. That means the matrix is close to being a degenerate transformation, a transformation that takes the whole plane on the left and crunches it into a line on the right. In other words, the matrix is close to being singular and having determinant zero, meaning it would have a smaller rank than its size. That's a lot of linear algebra terms, but we'll get to some of them later. For now, let's look at this transformation through the lens of the singular value decomposition to see where that skinniness is captured, and in particular let's look at the scaling matrix Sigma. The horizontal scaling is large, a factor of 6.71, and that's why the ellipse is long. But the vertical scaling is only by a factor of 0.44, which is very small, and this is why the ellipse is so skinny: we're compressing it vertically to almost a line. So the degeneracy of this transformation is really encoded in the 0.44.

So what if we said, hey, who are we kidding, this 0.44 is really small, let's turn it into a zero and see what happens? We turn it into a zero, which means we're compressing the ellipse into a line, and our transformation becomes one that turns the entire circle, the entire plane, into a line. As I mentioned before, that's called a degenerate transformation. And something very special happens: if we turn the 0.44 into a zero, then our original matrix changes, because the original matrix was the product of those three. The new matrix is 1.5, 1.5, 4.5, 4.5. Notice how strange this matrix is: its two columns are exactly the same. This is an example of a degenerate matrix.
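Here's a small numpy sketch (mine, not from the repo) of exactly that truncation: we zero out the small singular value of the matrix [[1.8, 1.2], [4.4, 4.6]] and rebuild it.

```python
import numpy as np

B = np.array([[1.8, 1.2],
              [4.4, 4.6]])

U, S, Vt = np.linalg.svd(B)
print(S)                    # one large and one small singular value

# Zero out the small singular value and rebuild the matrix.
S_truncated = S.copy()
S_truncated[1] = 0.0
print(U @ np.diag(S_truncated) @ Vt)   # approximately [[1.5, 1.5], [4.5, 4.5]]
```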
Now, not all degenerate matrices look like that, but they do have relations between the rows and relations between the columns, and as we mentioned, this matrix has rank one. Morally, it just feels like it doesn't carry that much information, right? The two columns of this matrix are exactly the same, so we can encode just one of the columns and then say, well, they're both the same, so you don't need to record all of the numbers. So let's recap: we approximated our original, almost degenerate matrix of rank two by a very nearby matrix of rank one; with a small modification we managed to replace a rank-two matrix by a rank-one matrix that is much simpler to store.

If that's not clear and maybe confusing, let's look at a bigger example. This is a rank-1 matrix. Stare at it for a moment and think about what's special about it; these are not random entries at all. If you look for a while, you notice that the rows are all very similar: they all look like (1, 2, 3, 4). The first row is 1 times (1, 2, 3, 4), the second row is -1 times (1, 2, 3, 4), the third one is 2 times (1, 2, 3, 4), and the fourth one is 10 times (1, 2, 3, 4). They're all multiples of (1, 2, 3, 4), so I don't need 16 entries to encode this matrix, I only need 8, because I only need the green and the blue entries: I can say that the rows are all multiples of the green one, and the multiples are the blue entries. That's the same as saying this matrix can be expressed as the product of these two vectors, and that's called an outer product.

So what's special here? The matrix on the left needs 16 numbers to encode, whereas the expression on the right only needs 8 numbers. Now, there's not a huge difference between 16 and 8, but imagine if this were a 1000-by-1000 matrix: on the left you'd have one million entries, and on the right only two thousand, because it's two vectors of length one thousand. So I've saved a lot of space, and that's what we mean by compressing, by dimensionality reduction: I have expressed a matrix that has more entries than it needs as a product of two smaller things.

Now, obviously I cannot do that with every matrix. Look at this matrix: there's no way I can find two vectors whose product gives me this matrix, because it's quite random. It's actually a rank-4 matrix; its determinant is not zero.
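Here's a tiny numpy sketch (not from the repo) of that outer product, using the multiples 1, -1, 2, 10 and the base row (1, 2, 3, 4) described above.

```python
import numpy as np

# Every row is a multiple of (1, 2, 3, 4); the multiples are 1, -1, 2, 10.
multiples = np.array([1, -1, 2, 10])
base_row = np.array([1, 2, 3, 4])

M = np.outer(multiples, base_row)   # a 4x4 matrix built from only 8 numbers
print(M)
print(np.linalg.matrix_rank(M))     # 1
```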
I need all 16 entries to describe this matrix, so I can't express it like that. But what we do with singular value decomposition is approximate the matrix as a sum of rank-1 matrices. So instead I can say, well, this matrix is this rank-one matrix, plus this one, plus so on. How many do I need? In this case I actually need four, but let's just say I need several of them, and then I decide how far I want to go. If this is a sum whose terms get smaller and smaller, then if I stop somewhere I can say, okay, I won't be able to express my matrix exactly as a sum of one or maybe two rank-one matrices, but it's close enough. That's what we do with singular value decomposition: we say our matrix is close enough to a sum of a few rank-1 matrices, so we're going to call it that sum.

Let me show you how we do that expansion using singular value decomposition. Say we have our matrix A and we express it as the product of a rotation matrix U, a diagonal matrix Sigma, and another rotation matrix V†. I'm going to label the columns of U as u1, u2, u3, u4, the diagonal elements of Sigma as sigma 1, sigma 2, sigma 3, sigma 4, and the rows of V† as v1, v2, v3, v4. Now we carry out the matrix multiplication, and it expands like this: the first term is sigma 1 times the outer product of u1 and v1, and the rest are similar terms, so A = sigma1 u1 v1† + sigma2 u2 v2† + sigma3 u3 v3† + sigma4 u4 v4†. That's literally what you get if you carry out the matrix multiplication, and each one of these terms is a rank-one matrix. Their sum is of course a higher-rank matrix, but the key is in the red numbers, and those numbers are the singular values. Say the first one is a large number, the second is medium, the third is small, and the fourth is tiny; they come ordered from large to small. And say the third and the fourth are tiny enough that we don't need to worry about them, so we can forget about them, and we've expressed our matrix A as a sum of two rank-1 matrices. That is a lot simpler than a rank-4 matrix: we've found a matrix that is very similar to A but has a smaller rank, so we can compress it, encode it using fewer entries. That's the idea. Granted, here we went from 16 entries to 4 + 4 + 4 + 4, which is still 16, but imagine if this matrix were a thousand by a thousand: we'd have expressed it with about four thousand entries, which is much better.

Now, as a quick example, let's work out the decomposition of the original matrix we were looking at. You can do this with Wolfram Alpha again, or with Python, and we've worked it out in the repo I mention in this video; it's this one over here. When we look at the components, the component with the largest singular value, which is 21.2, is this outer product, which gives us this rank-1 matrix. Notice that this matrix is already close to the original; it's actually the closest rank-1 matrix we can find to the original matrix. If we're not content, we can keep going and find the component with the second singular value, and if we add these two things we get this one. Notice that now we're awfully close to our original matrix; this is the closest rank-2 matrix we can find to the original. We can keep going and take the third component, which has a small singular value, and the sum of these three is this; now we're extremely close to the original, with a rank-3 matrix. This is again the closest rank-3 matrix to the original matrix.
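As a generic sketch (my own code, not the repo's), here's what that truncation looks like in numpy; the 4x4 matrix below is a hypothetical stand-in for the one in the video.

```python
import numpy as np

def rank_k_approximation(A, k):
    """Keep the k largest singular values: a sum of k rank-1 outer products."""
    U, S, Vt = np.linalg.svd(A)
    approx = np.zeros(A.shape)
    for i in range(k):
        # each term is sigma_i times the outer product of u_i and v_i
        approx += S[i] * np.outer(U[:, i], Vt[i, :])
    return approx

# A hypothetical 4x4 matrix standing in for the one in the video.
A = np.random.rand(4, 4)
for k in range(1, 5):
    print(k, np.linalg.norm(A - rank_k_approximation(A, k)))  # error shrinks
```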
But if we're not fully content, we can go all the way. The last component is tiny, because its singular value is tiny; it's just made up of small corrections that get us back to the original matrix. After four terms we always recover the original matrix exactly, because the original matrix has rank 4.

Now, I've noticed that I've used the word rank a lot without really explaining it, so let me give you an idea of what rank is. We'll keep the example of a 4x4 matrix, but you can imagine this in any size. Rank-one matrices are the most predictable ones: they may have a lot of entries, but the entries satisfy a lot of relations, and we capture this with this equation: we can write the matrix as the product of a tall, skinny matrix of width one and a long, flat matrix of height one, that is, a column vector times a row vector. So there's a lot of structure in the coefficients and a lot of correlation between them. A matrix of rank two we cannot express like that, but we can express it like this, as the product of a matrix of width two and a matrix of height two; it still has a lot of structure, a lot of correlation, it's still predictable, but less predictable than the rank-one matrix, just a little more complex. A matrix of rank three is not expressible like that, but we can express it as the product of a matrix of width three and a matrix of height three. And finally, a matrix of rank 4 is the most complex of all; in this case it's a full-rank matrix, because you cannot get any higher than rank 4 with a 4x4 matrix. In general, the highest possible rank is the smaller of the number of rows and the number of columns of the matrix. For this rank-four, full-rank matrix, there are simply no linear relations we can exploit like this. So, as you can see, the higher the rank, the higher the unpredictability among the entries, and the lower the rank, the more relations between them we can exploit. The goal of dimensionality reduction is to say: okay, we have a very big matrix, and it may have high rank, so it may be hard to express it as a product of two small matrices, but we can find a matrix that is very close to it and has a much lower rank, so we can express that one as a product of two small matrices and store it much more easily.

Now, notice that all our matrices were square. What happens if we have a rectangular matrix, say this four-by-six matrix? Well, it still works; this method is fascinating because it works for any shape of matrix. The way it works is: U is still going to be four by four, Sigma is not going to be four by four, but it's going to look like a four-by-four diagonal matrix padded with two extra columns of zeros to make it four by six, and V† is going to be a six-by-six matrix, and the product checks out. Notice that the vectors v5 and v6 never get touched in this product, because they get multiplied by those columns of zeros, but that's okay, it's still useful to have the entire matrix. So we can use this method on matrices of any shape we want, and that's super important.
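Just to illustrate those shapes, here's a small numpy sketch (mine, with a hypothetical random 4x6 matrix) showing the sizes that come out and how the padded Sigma fits together.

```python
import numpy as np

# A hypothetical 4x6 matrix, just to look at the shapes.
A = np.random.rand(4, 6)

U, S, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, S.shape, Vt.shape)   # (4, 4) (4,) (6, 6)

# Rebuild the full 4x6 Sigma: a 4x4 diagonal padded with two columns of zeros.
Sigma = np.zeros((4, 6))
Sigma[:, :4] = np.diag(S)
print(np.allclose(A, U @ Sigma @ Vt))   # True
```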
Now, finally, let me show you a really cool application to image compression. We're going to compress this image of a heart, and this is all done in the GitHub repo, so you can follow along with the code; my GitHub is luisguiserano and the repo is called singular value decomposition. First we're going to assign values to the pixels: a one for the black pixels and a zero for the white ones. This is normally done the opposite way in graphics, but I like this convention because it's better for the visuals. So basically, we're going to compress this matrix of ones and zeros over here.

We find the singular value decomposition using our favorite tool, whether it's Wolfram Alpha or numpy, and we get this decomposition. You may get a different one; they're the same up to certain transformations, so it's okay if yours looks slightly different, but we're going to work with this one. Depending on the tools you use, they may also give you matrices of slightly different sizes, which only means you may have to pad some zeros here and there, but in general the answers are consistent.

So how do we get the components? We're going to take outer products. Let's take the outer product of the first column of U and the first row of V†, and multiply it by the scalar in the top-left entry of Sigma. That gives us this image over here. Notice that it looks like a fuzzy version of the heart; it's a rank-one matrix, a very simple matrix, but it sort of captures the heart a little bit, and its intensity is the singular value, which is 4.74. Now let's go to the next component: it's the outer product of this green column on the left and this blue row on the right, multiplied by the factor 1.41. As you can see, it's a slightly lighter matrix, because it's multiplied by a smaller number, so it has less intensity. We keep going, finding each component, and since each singular value is smaller, each matrix is lighter than the one before; some of them are zero, once we start hitting zero singular values, and this poor column over here and this whole row over here never actually get used in the product, but that's fine.

So these are our components, and now all we have to do is add some of them, starting from the left. If we add the first one, we get this image. If we add the first two, we get this image, which is a little sharper. If we add the first three, we get this image, which is sharper still. And if we add the fourth, we get a perfect heart, and we can stop there, because the remaining ones have singular value zero. All this means is that this matrix has rank four, because it was expressed as four terms. So we can always decide how far we want to go: if we don't have a lot of storage, we can use the first or the second image, and if we have more, we can keep adding components. In general, the singular value decomposition lets us pick how sharp we want our image, and the price, obviously, is that the sharper we want the image, the more storage we're going to need.
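The real heart image and code live in the repo; as a self-contained sketch of the same idea (the small 0/1 "image" below is hypothetical, not the repo's heart), you could do something like this in numpy:

```python
import numpy as np

# A hypothetical small 0/1 image: 1 for black pixels, 0 for white ones.
image = np.array([[0, 1, 1, 0, 1, 1, 0],
                  [1, 1, 1, 1, 1, 1, 1],
                  [1, 1, 1, 1, 1, 1, 1],
                  [0, 1, 1, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1, 0, 0],
                  [0, 0, 0, 1, 0, 0, 0]], dtype=float)

U, S, Vt = np.linalg.svd(image)
print(S)   # singular values, largest first; the tiny ones barely matter

# Add the components one at a time, starting with the largest singular value.
approximation = np.zeros_like(image)
for i, sigma in enumerate(S):
    approximation += sigma * np.outer(U[:, i], Vt[i, :])
    error = np.linalg.norm(image - approximation)
    print(f"{i + 1} components, reconstruction error {error:.4f}")
```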
And that is it for singular value decomposition. If you liked this technique, I highly recommend other videos I have on similar dimensionality reduction topics: there's one on matrix factorization and one on principal component analysis, and the links are in the description. That's all, folks, thank you so much for your attention. I'd like to remind you that I have a book called Grokking Machine Learning, where I explain the algorithms of supervised learning and some interesting machine learning techniques with simple real-life examples and lots of Python code. You can get it from manning.com, and with the special discount code serranoyt you can get a 40% discount. So if you liked this video, please subscribe for more content, hit like, share it with your friends, or feel free to add a comment and tell me what you thought; I love reading your comments. Also, if you have ideas on what you'd like to see next, let me know in a comment. You can also tweet at me; my Twitter handle is LuisLikesMath. And on my page, serrano.academy, you can find all this information: the videos, the book, and any news. Thank you very much, that's all for now, and see you in the next video.