Hello. In this lecture we're going to delve into linear algebra for artificial intelligence. Artificial intelligence incorporates a vast amount of mathematical concepts: not only geometry but also linear algebra and statistics are among the mathematical elements heavily involved, and among these, linear algebra is considered one of the key mathematical theories in artificial intelligence. Without a solid understanding of linear algebra you may find it challenging to read AI research papers or to develop your own AI models, often requiring considerable effort and time. Having a basic understanding of linear algebra therefore enables an easier and more convenient approach to AI, which is why we're conducting this linear algebra lecture for AI. Linear algebra itself is a somewhat complex field of study with a long history, so covering the entire field would be quite difficult; instead, we'll review the linear algebra commonly used in AI and see how it's applied. I've listed three keywords here: vectors and matrices, their geometric meanings, and how they're used in AI. These three topics are not everything about linear algebra and don't cover the entire field, but they are essential for understanding AI technically. Our last topic, application to AI, is about how linear algebra is used in AI, so the first two topics are the more crucial ones when studying linear algebra.

Linear algebra is a study of numbers, but it's primarily focused on the operations of vectors and matrices. You might not be familiar with the terms vector and matrix, but you can think of vectors and matrices as arrangements, or collections, of numbers. A vector is a collection of numbers arranged either vertically like this or horizontally like this, representing a series of numbers in a row or a column. Matrices, on the other hand, are two-dimensional structures formed from vectors. We'll look into the functions of vectors and matrices and how operations involving them can be applied. In linear algebra, operations are crucial, with addition and scaling considered the most important; as a side note, we also have subtraction, which follows naturally from addition. Scaling, or scalar multiplication, is similar to the simple multiplication with natural numbers we learn in elementary school, which is why we call it scalar multiplication. The term may feel unfamiliar, but it's crucial to understand that this multiplication differs from the general concept of multiplication: keep in mind that we have the operations of addition and subtraction as well as scalar multiplication, and that scalar multiplication, unlike ordinary multiplication of numbers, is an operation for vectors and matrices.

Now let's explore the types of data to which addition, subtraction, and scalar multiplication can be applied, as well as how these operations apply to that data. I've listed different types of data here; essentially, most things in our world can be considered data. For example, images, languages, and numbers are all data. Let's see why. Images represent visual elements on a plane, but if you look at an image on screen more closely, it is composed of very small units called pixels, and when we talk about resolution, each pixel in the image is given values for colors like RGB.
Therefore an image essentially has dimensions of x and y, representing the length and width of the image, plus, for each pixel, the RGB values; so x representing the length, y representing the width, and the RGB values together make an image three-dimensional data. In the case of languages, my voice is recorded over time while I speak, and it forms time-series data. What about words? Consider the word scalar: it is composed of six letters, s, c, a, l, a, r, arranged in a row, and each letter requires memory, so words are data as well. Similarly, numbers like 3 or 6, dates like April 9th, and times like 12:15 are also forms of data. Therefore it's necessary to understand the forms this data can take. There are various forms, including the ones introduced earlier, but let's discuss a few key ones.

Let's talk about points first. Points are literally dots, and they are data defined by a single point. Let's think about this conceptually for a moment: a point is just a dot existing; in other words, it either exists or it doesn't. In the case of scalars, we're talking about one-dimensional data. We all know about one-dimensional data because we've learned about it in elementary school: we learned about the number line, where we place a zero at the center, and in the early years of elementary school we learned about moving forward to 1, 2, 3, and so on; as we progressed we learned about negative numbers, realizing that the number line also extends backward. So scalars belong to one-dimensional data, and they are the data that can be represented by some number on this number line.

Vectors, on the other hand, start increasing dimensions: they are two-dimensional data. You can think of two-dimensional data as having two number lines; we can draw a vertical number line crossing the horizontal one, and let's call this vertical one y. It also goes 1, 2, or -1, -2. Now how can you represent data here? In this case, data are points on the plane. A point has both x and y information at the same time, so it holds two pieces of information. For example, if I want to represent a certain data point and it's located over here, then its x information is -2 and its y information is -1, meaning this data has the values -2 and -1. In the case of a one-dimensional scalar, the data only had the x information of -2, but when it becomes a vector the dimensions increase like this. Dimensions can grow further: although we're talking about two dimensions, they can obviously increase even more. Let's think about a third axis coming out of the screen and call it z, so there's also a value coming out of the screen; suppose that value is three. If the point is at -2 and -1 and it comes out three units from the screen, we now have three pieces of information within the point: -2, -1, and 3. Earlier I said a vector is two-dimensional data, but that doesn't mean there are only two axes; the number of axes can grow larger. They can grow, but still all of the information can be represented as a single line of scalars, and that's why it's a vector. Vectors can be thought of as a list of scalars lined up one after another. Of course, the mathematical definition might be slightly different, but for now, if all the information is listed in a single line, we'll call it two-dimensional data. So what is a matrix, then? You can think of a matrix as a two-dimensional vector: what I mean by that is that in a matrix, vectors are lined up, just as scalars are lined up in a vector. In the same way, a tensor lines up matrices, and a higher-dimensional tensor lines up tensors again.
Strictly speaking, saying a vector is two-dimensional and a scalar is one-dimensional is not an exact mathematical definition, but to summarize: a scalar represents numbers like 1, 3, or 5, the numbers themselves; a vector has one more dimension, so it's a collection of scalars; and when these vectors go up one more dimension, vectors stack up and stack up, eventually showing this form, which we call a matrix. So you can think of the dimensions of data as going up one at a time.

Let's see how we represent this data. Scalars are numbers: numbers like 1, 3, or even pi are scalars, and these scalars belong to the real number domain; or if it's something like 1 + 3i, it's a complex number, so it belongs to the complex number domain. We can express a scalar as an element of a set like this. Scalars can belong to the real numbers, the integers, or even the natural numbers; if they belong to the integers we represent that with a capital Z, and if they belong to the natural numbers we represent that with a capital N. Thus scalars belong to a certain number system.

Let's think about vectors. Vectors can be thought of as a list of two scalars like this, or a list of three scalars, or even a list of five scalars like this. That's why I said earlier that it's not mathematically correct to say that vectors are two-dimensional; it's more accurate to describe vectors as data with one higher dimension than a scalar. So this one has two numbers, right? Since there are two real numbers, this vector is a collection of two real numbers; this one is a collection of three, and this one is a collection of five real numbers. So if a vector is an element of R^n, it means that inside the vector there are n real scalars, or real numbers; that's the correct way to describe the dimensions of vectors, and that's the basic representation of a vector.

Let's talk about matrices. As introduced earlier, the dimension of a matrix is one higher than that of a vector. I'll write a new matrix on the board; it contains three vectors written horizontally. Inside the matrix we have a couple of pieces of information to consider: the matrix is composed of three vectors, and each vector is a vector of four scalars. This is how we represent this matrix: the number of rows is three and the number of columns is four, and we write the number of rows first and then the number of columns; that's the size of the matrix. Suppose we have a matrix A: if the matrix belongs to the n-by-m matrices, it means that the vertical length of the matrix is n and the horizontal length is m, defining the size of the matrix. Now let's shift our perspective a bit. If you have data A from the n-by-1 matrices, it means that the vertical length of the data is n and the horizontal length is one, so it's a vector. What if our data is from the 1-by-m matrices? The vertical length of the data is one and the horizontal length is m, so it's a vector as well, just written horizontally. Even if our data look like matrices, they are still vectors; be careful not to get confused even when they look like matrices.

Lastly, let's talk about tensors. What are tensors? They're like matrices stacked up, so it gets a bit more complicated. Consider an image: say it has a resolution of 360 by 480, and each pixel has RGB data of three, so the dimension of the data is multiplied by three. In the case of tensors, for higher-dimensional data you just multiply on additional dimensions like this, and that's how we define the dimensions of tensors.
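To make these shapes concrete, here is a minimal sketch in Python with numpy (my choice of tool, not something the lecture prescribes); the 360-by-480 RGB image matches the example above, while the other values are arbitrary.

```python
import numpy as np

# A scalar is a single number from some number system.
s = 3.0

# A vector is a list of scalars; this one has three real numbers, so it is in R^3.
v = np.array([1.0, 2.0, 3.0])

# A matrix stacks vectors; this one has 3 rows and 4 columns, like the board example.
M = np.zeros((3, 4))

# An image-like tensor: height x width x RGB channels, as in the 360 x 480 example.
T = np.zeros((360, 480, 3))

print(v.shape, M.shape, T.shape)   # (3,) (3, 4) (360, 480, 3)
```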
So far we've defined the dimensions of scalars, vectors, matrices, and even tensors. Let's summarize: a scalar is just a number in a certain number system; a vector is a list of scalars, with one higher dimension; the dimension of a matrix is higher than the dimension of a vector by one; and lastly, tensors represent any data with higher dimensions, and they can go even higher. That's why the earlier descriptions of dimensions, like scalars being one-dimensional and vectors being two-dimensional, are not accurate; they were just meant to show the increase in dimensions. Scalars are single elements, while vectors have length, matrices have length and width, and tensors have length, width, and even depth. Please remember the meanings of dimensions and how we can describe them.

Let's explore vectors in more detail. As mentioned earlier, vectors are collections of scalars and they belong to R^n; again, they are collections of scalars, with n scalars inside them. One important concept related to vectors is linear independence, but before delving into linear independence, let's discuss what vectors do. We'll begin with vectors in R2, since they are easier to visualize. What are vectors in R2? They're basically coordinates on a plane, like (1, 3) or (2, 4). On the board we have a plane with two axes, x and y; we can draw (1, 3) and another point (2, 4), which we'll call vector a and vector b respectively. Coordinates provide a simple way to represent vectors on a two-dimensional plane.

Now let's delve into vectors on a two-dimensional plane. We can represent vectors with coordinates, but they have directions as well; in other words, vectors can be represented as points, but they can also be represented as arrows pointing to their coordinates. For instance, let's say we have a vector a pointing to a coordinate (a, b); we can say vector a is in R2 with coordinates (a, b). Let's draw another vector in the plane, called b, with values (c, d); then b is also a vector in R2, with coordinates (c, d). So far we've defined two vectors, a and b. Now let me explain what linear independence is. It helps to think of linear independence as how distinct vectors a and b are. Take vector a as an example: we start by extending the arrow, or vector a, in its current direction, adding straight lines both forwards and backwards; similarly, we can extend vector b in its own direction, both forwards and backwards. As a result, vectors a and b now have their own distinct spaces, represented by the lines extending from each vector. These lines lie on a two-dimensional plane, but each line represents R1, which is a space of scalars; therefore the lines extended from vectors a and b correspond to scalar spaces, or R1. In the two-dimensional plane we now have two lines representing R1.

Now imagine there's a person walking along these lines, and suppose this person can only move parallel to vector a and vector b, either in the same direction as these vectors or in the opposite direction. Can this person reach any point on the plane just by moving along these two directions? Put differently, can we find a path to any point on the plane by only following the directions of these two lines? This scenario provides an easy way to understand linear independence: it's about figuring out whether we can reach any given point on the plane by solely moving in the directions of these two lines. Let's draw a straight line on the plane; the line can run in two opposite directions, and we can define vector a along the line. Next we'll add another straight line on the plane and define vector b on it.
If we can find a path to any point on the plane by following either the same direction or the opposite direction as vectors a and b, then we say that the vectors a and b are linearly independent; however, if we cannot cover the entire plane using only these two vectors, we consider them to be linearly dependent. When do two vectors become linearly dependent? Let's draw a straight line on the plane and define vector a along it, and then define vector b along the same line, pointing in the opposite direction. When you try to combine them in various ways, like scaling each vector and merging them, there are points on the plane that remain unreachable; in such cases the two vectors are considered linearly dependent. For instance, take vector a with coordinates (4, 2) and vector b with coordinates (-2, -1): regardless of how you combine them, reaching certain points on the plane is impossible.

Now I'll give a more mathematical explanation of linear independence. Suppose we have vector a with coordinates (a, b) and vector b with coordinates (c, d). If you can construct any point on the plane using the form αa + βb, then vectors a and b are considered linearly independent; here α and β represent scalars. A simpler but less formal definition is that two vectors are linearly independent when we can reach any point on the plane by scaling and merging them. Let's test whether we can reach this point over here using vectors a and b: first we move along the direction of vector a by a certain distance, and then we can reach our destination by drawing a straight line starting from this point parallel to vector b. Vector addition is simply drawing a line in the direction of the first vector by a certain amount and then drawing a connecting line in the direction of the other vector. Suppose we have vector a, vector b, and vector c, which is the result of adding vectors a and b; how do we define vector c? Once more, we follow vector a until it reaches its end, and then, starting from the end point of vector a, we draw vector b; vector c is the resulting vector connecting the origin to the end point of vector b. It's important to recall that if we can span the entire plane through this kind of vector addition between two vectors, then those vectors are linearly independent.

Now let's define a basis. Consider two vectors a and b on a plane. Since they are linearly independent, they can construct the entire plane through vector addition; for instance, if we pick any arbitrary point on the plane, we can draw a straight line in the direction of vector b first and then draw a second line in the direction of vector a. Let's consider a basis for R^n. Just as we need at least two vectors to construct R2, we need at least n vectors to construct the space R^n; if we can find n vectors that construct an n-dimensional space, we refer to those n vectors as a basis of R^n, indicating that they span R^n. This is the essence of the definitions of basis and span. Imagine we have two arbitrary linearly independent vectors: can they form a basis of R2? Absolutely, and by adjusting the lengths of these vectors we can create infinitely many bases for R2; in the standard basis for R2, however, the vectors have length one. To summarize, a basis for R^n contains n vectors, and together they span the entire space R^n. As mentioned earlier, matrices can be seen as collections of vectors; similarly, a basis of R^n is a collection of vectors that span R^n. Let's consider two vectors, a with coordinates (2, 1) and b with coordinates (1, 2).
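As a quick numerical illustration of the span idea, here is a small sketch, again in numpy, that solves for the scalars α and β reaching an arbitrary target point from the two example vectors; the target point is made up for the illustration.

```python
import numpy as np

# Columns of B are the two example vectors a = (2, 1) and b = (1, 2).
B = np.array([[2.0, 1.0],
              [1.0, 2.0]])

target = np.array([5.0, -3.0])      # an arbitrary point on the plane

# Solve alpha * a + beta * b = target for the scalars alpha and beta.
alpha, beta = np.linalg.solve(B, target)
print(alpha, beta)                  # a unique combination exists for any target
print(np.allclose(alpha * B[:, 0] + beta * B[:, 1], target))   # True
```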
Since vectors a and b can span R2, they form a basis for R2. But how is their vector addition represented? As αa + βb. Using this form, the new x value becomes 2α + β and the new y value becomes α + 2β, and any coordinates in the space R2 can be expressed using this form. Let's arrange vector a and vector b in a matrix, writing each vector vertically: the first column represents vector a and the second column represents vector b, forming a 2-by-2 matrix with values 2 and 1 in the first column and 1 and 2 in the second column. Within this matrix we have two vectors that form a basis of R2, although their lengths may not be one, as it's not the standard basis for R2. Since we have two vectors in the matrix and these vectors span R2, we say that the rank of the matrix is two; we can construct any vector in R2 with the vectors in the matrix.

Now consider a 3-by-3 matrix. This matrix comprises three vectors, labeled a, b, and c. Can we span three-dimensional space with these vectors through vector addition? Let's introduce the third axis, z. By scaling each vector with α, β, and γ, they can represent any point in R3; thus the rank of this matrix is three. So is it always true that the rank of a matrix is the same as its width, the number of its columns? Not necessarily. Let's draw a straight line on a two-dimensional plane and place two points on the same line. The resulting matrix is a 2-by-2 matrix with values 2 and 1 in the first column and 4 and 2 in the second column. What's the rank of this matrix? The answer is one. Let me explain why: we cannot reach a point over here with a linear combination of these two vectors, as they have the same direction; therefore there are points on the plane that cannot be reached by these two vectors, resulting in a rank of one for the matrix.

Let's practice with another example. Is it possible for a matrix to have rank two in a three-dimensional space? If a matrix has a rank of two in a three-dimensional space, it means that the vectors in the matrix can only span a two-dimensional plane within the three-dimensional space; for instance, if we draw the axes x, y, and z, these vectors may only span a two-dimensional plane like this. The rank of a matrix in a three-dimensional space can be one as well, if the three vectors can only span a line. Here's an example of a matrix with a rank of one, with values 1, 0, 0, 2, 0, 0, and 3, 0, 0: no matter how you scale these vectors, their y and z values stay at zero and cannot be adjusted; you can only adjust the x values, so the span of these vectors lies along the x-axis. That's why the rank of the matrix is one. Let's also illustrate with an example of a matrix with rank two in a three-dimensional space, despite having three vectors within the matrix: leaving the last vector aside, if you scale the first vector by two you obtain the second vector, and consequently the rank of the matrix decreases by one, to two.

Let's summarize: when does an m-by-n matrix have a rank of n? An m-by-n matrix has rank n only when all of its n vectors are linearly independent; if the vectors are linearly dependent, the rank of the matrix will be less than n. Let's explore the mathematical conditions for linear independence. If you have vectors a and b that are linearly independent, each vector should extend along a different line; in other words, the lines extended from each vector should intersect at a point rather than overlap. Conversely, if we can construct vector a by scaling vector b, then the two vectors a and b are linearly dependent.
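Before moving to the algebraic condition, here is a small numerical check of the rank claims above, using numpy's matrix_rank; the matrices are the ones from the examples, with the vectors written as columns.

```python
import numpy as np

# Columns (2, 1) and (1, 2) point along different lines, so the rank is 2.
A_full = np.array([[2.0, 1.0],
                   [1.0, 2.0]])

# Columns (2, 1) and (4, 2) lie on the same line, so the rank drops to 1.
A_line = np.array([[2.0, 4.0],
                   [1.0, 2.0]])

# Three 3-D vectors whose y and z entries are zero can only span the x-axis.
A_axis = np.array([[1.0, 2.0, 3.0],
                   [0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])

print(np.linalg.matrix_rank(A_full))   # 2
print(np.linalg.matrix_rank(A_line))   # 1
print(np.linalg.matrix_rank(A_axis))   # 1
```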
Mathematically, when two vectors are linearly independent, a linear combination of them should not be able to reach zero except in the trivial way, because a nonzero combination equal to zero would imply that they lie on the same line; there should not exist a combination of α and β, other than both being zero, for which the linear combination of the vectors equals zero. That's the mathematical condition for linear independence.

Suppose we have a matrix, or a collection of vectors, denoted A. There are three vectors inside the matrix, and the vectors are (1, 2, 0), (0, 0, 1), and (-1, 1, 0); matrix A is a 3-by-3 matrix. Now let's determine whether the vectors in the matrix are linearly independent and capable of spanning three-dimensional space. We'll label the vectors a, b, and c respectively. To begin, we'll examine vectors a and b for linear independence, employing the form αa + βb. The resulting linear combination of vectors a and b yields (α, 2α, β). Is there any combination of α and β that results in zero? Yes, but only when both α and β are zero; if vectors a and b were linearly dependent, there would be another combination of α and β besides (0, 0) that yields zero. So we've established that vectors a and b are linearly independent. Now what about vectors a and c? When we combine vectors a and c we get (α - β, 2α + β, 0). Is there a way to make this combination equal to zero? Setting α - β and 2α + β equal to zero and adding the equations together gives us 3α = 0, which means that both α and β must be zero; consequently (0, 0) is the only combination of α and β that results in a linear combination of zero, confirming the linear independence of vectors a and c. Similarly, vectors b and c also prove to be linearly independent. Hence every pair of vectors in the matrix is linearly independent, and the matrix achieves a full rank of three.

Let's consider a new matrix A consisting of three vectors, and denote the first two vectors as a and b respectively. When we combine α times vector a with β times vector b, we obtain (α - β, 2α - 2β, 0). Can we get zero for each value? Yes: whenever both α and β take on the same value, whether it's three, four, or any other value, the resulting combination equals zero. Put simply, whenever α equals β the result is zero; consequently vectors a and b are linearly dependent and the rank of the matrix decreases by one, making it two. When the rank of a matrix decreases, it implies the existence of a vector that, when multiplied by the matrix, results in zero; in simpler terms, there's a vector that, when multiplied by matrix A, produces a result of zero. But before delving deeper into this vector, we need to know how matrix-vector multiplication operates, so let's look at the basics of matrix-vector multiplication.

Suppose we have a vector in R^n, which is essentially an n-by-1 matrix. We'll introduce a new n-by-n matrix, denoted A, and then multiply matrix A by vector x. Can you guess the dimensions of the resulting matrix? The outcome of multiplying the matrix by the vector is another n-by-1 matrix, which is also a vector. When we multiply two matrices A and B, where A is an a-by-b matrix and B is a b-by-c matrix, the resulting matrix C will be an a-by-c matrix, so the b's between a and c vanish. Remember that we can only multiply two matrices when the inner dimensions match, that is, when the number of columns of the first matrix equals the number of rows of the second. Since the inner dimensions of matrix A and vector x are both n, the resulting matrix is an n-by-1 matrix.
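Here is a tiny shape check of that rule in numpy; the sizes n = 5 and m = 3 are arbitrary, chosen just to show how the inner dimensions behave.

```python
import numpy as np

n, m = 5, 3

A = np.ones((m, n))      # an m-by-n matrix
x = np.ones((n, 1))      # an n-by-1 vector

y = A @ x                # inner dimensions (n and n) match, so the product exists
print(y.shape)           # (3, 1): the result is an m-by-1 vector

B = np.ones((4, 2))
# A @ B would raise an error: A has n = 5 columns but B has 4 rows,
# so the inner dimensions do not match.
```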
So when the inner dimensions are the same, multiplying a matrix by a vector yields a vector. Now let's delve deeper into how this operation functions and what implications the result holds. Let's explore matrix-vector multiplication within a two-dimensional plane, considering the equation Ax = 0. Let matrix A be defined by the vectors (1, 2) and (2, 4), and denote each vector as a and b. Let's plot vector a and vector b on a two-dimensional plane: this is vector a and this is vector b. When we plot these vectors on the same plane, we observe that we can obtain vector b by scaling vector a by two; therefore one way to obtain zero from the two vectors is to scale vector a by two and then subtract vector b. Consequently, we can satisfy the equation by multiplying the matrix by the vector (-2, 1), which is the same combination with the signs flipped, making the right-hand side of the equation zero. In other words, when α times vector a plus β times vector b equals zero, we can substitute -2 for α and 1 for β. Let's plot the new vector (-2, 1) on the plane and denote it as x. Vectors a and b span a straight line rather than the entire plane; now consider another straight line perpendicular to this line. Since the coordinates of vector a are (1, 2), this perpendicular line passes through the coordinates (-2, 1), which corresponds to vector x. Thus vectors a and b can only span a straight line inside the plane, while our new vector x allows them to span the entire plane; once more, vectors a and b only cover a line, leaving some points on the plane unobtainable, but vector x enables them to span the entire plane. The original straight line spanned by vectors a and b is referred to as the range space, while the remaining direction on the plane that the range space does not cover, the line defined by x, is called the null space. In summary, within our two-dimensional space there is a region we can reach by linearly combining the original two vectors, and we refer to the uncovered part as the null space; when combined, the null space and the range space account for the entire plane.

Let's revisit matrix A with values 1, 2 and 2, 4. The rank of the matrix is one. Now let's include vector x, with values -2 and 1, in the matrix; it is perpendicular to vector a. As a result, matrix A becomes a 2-by-3 matrix and its rank increases to two; in other words, the three vectors within the matrix can span the entire plane. Since vectors (1, 2) and (2, 4) are linearly dependent, removing one of them from the matrix still allows the vectors within the matrix to span the entire plane; therefore the remaining two vectors, (1, 2) and (-2, 1), serve as basis vectors for a two-dimensional space, capable of spanning the entire plane. Remember: if there's a nonzero vector x that satisfies Ax = 0, the space defined by such vectors is known as the null space, and the presence of linearly dependent vectors in the matrix leads to a reduction in the matrix's rank.
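Here is a short numerical confirmation of this example in numpy: the matrix has the dependent columns (1, 2) and (2, 4), the vector (-2, 1) lands in its null space, and appending that vector as an extra column brings the rank back to two.

```python
import numpy as np

# Columns (1, 2) and (2, 4) are linearly dependent: the second is twice the first.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

x = np.array([-2.0, 1.0])            # the vector perpendicular to (1, 2)

print(A @ x)                         # [0. 0.]  -> x lies in the null space of A
print(np.linalg.matrix_rank(A))      # 1: the columns only span a line

# Appending x as an extra column restores full rank: the plane is covered again.
A_extended = np.column_stack([A, x])
print(np.linalg.matrix_rank(A_extended))   # 2
```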
Therefore we can view a matrix as a linear operator. In a matrix-vector multiplication of the form Ax = b, where A is a matrix and x and b are vectors, matrix A acts as a linear operator that transforms vector x into vector b; again, matrix A acts as a linear operator that transforms vector x into vector b. Imagine there's an unknown box that takes vector x as input and produces vector b as output: inside this box, matrix A is multiplied by vector x, so the matrix itself functions as a linear operator.

Now let's delve into the concept of a linear operator and its operation on a two-dimensional plane. Consider a matrix composed of vectors a and b, where a is (1, 2) and b is (2, 1); the rank of the matrix is two, so it's a full-rank matrix. Let's multiply this matrix by a vector (x1, y1). Because we're scaling vector a by x1 and vector b by y1, the outcome of the matrix-vector multiplication is (x1 + 2·y1, 2·x1 + y1). If the vector (x1, y1) is (1, 1), the outcome of the equation is the vector (3, 3); in simpler terms, when the initial vector (1, 1) is the input of the matrix-vector multiplication, it yields the new vector (3, 3) as output. What does it signify that the original vector (1, 1) has shifted to its new coordinates (3, 3)? Let's examine this more closely on our two-dimensional plane. This vector is positioned at coordinates (3, 3); in other words, the point's coordinates are (3, 3) on a two-dimensional plane where the axes are labeled as the x-axis and y-axis. Now imagine that, instead of the x-axis and y-axis, the same plane is depicted using vector a and vector b as its new axes; once more, we are visualizing the plane from a perspective that uses vector a and vector b rather than the usual axes. To reach the point at coordinates (3, 3) on the plane, we start by moving along vector a, and from the end point of vector a we continue by moving along vector b. Therefore the coordinates of the point are (3, 3) on the plane when using the original x-axis and y-axis, but when using vector a and vector b as the axes, the new coordinates of the point become (1, 1). Let me elaborate on the significance of shifting coordinates from (3, 3) to (1, 1): if the two-dimensional space initially used vectors a and b as its axes, the coordinates of the point would be (1, 1), and the change in coordinates implies that when using the x-axis and y-axis, those coordinates are interpreted as (3, 3). Therefore multiplying a matrix by a vector signifies adjusting or rotating the original axes towards the vectors in the matrix.

Let's consider a matrix A with values 1, 2, 2, and 1. We'll draw lines passing through the coordinates in the matrix and the origin, which will serve as the new axes of the plane; denoting each vector in the matrix as a and b, we'll refer to the corresponding lines as a and b respectively. On line a we can designate one unit, and similarly on line b we can mark one unit; these units correspond to the vectors in the matrix. We can also label two and negative one on both lines a and b. With this setup the coordinate space has been transformed, allowing us to determine the new coordinates of points in the plane: matrix A has twisted the coordinate space, and we can identify new units within this modified coordinate system. Within this new coordinate space we can locate a point at coordinates (1, 1), which is positioned here; conversely, the coordinates of the same point become (3, 3) when viewed on the plane using the original axes, namely the x-axis and y-axis. This illustrates one way of comprehending matrix-vector multiplication: the vector multiplied by the matrix represents the coordinates of a point within the transformed coordinate system, whereas the resulting output vector gives the coordinates of the same point within the original coordinate space. Let's recap: when we apply matrix-vector multiplication to a vector on a two-dimensional plane, the vector undergoes a shift, resulting in new coordinates, as the coordinate space adjusts or twists. With this interpretation of matrix-vector multiplication we can now understand matrices from various perspectives; for example, let's take a rotation matrix.
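Here is the same example checked numerically in numpy: the matrix columns (1, 2) and (2, 1) act as the new axes, the coordinates (1, 1) measured along them correspond to (3, 3) in the usual axes, and solving the system goes back the other way.

```python
import numpy as np

# Columns a = (1, 2) and b = (2, 1) act as the "new axes" of the plane.
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

x_new = np.array([1.0, 1.0])       # coordinates of the point measured along a and b
x_old = A @ x_new                  # the same point measured along the usual x- and y-axes
print(x_old)                       # [3. 3.]

# Solving the system undoes the change of axes and recovers the (1, 1) description.
print(np.linalg.solve(A, x_old))   # [1. 1.]
```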
Suppose we have vector a positioned at coordinates (2, 0). If we rotate vector a by 90 degrees counterclockwise, we obtain a new vector at coordinates (0, 2), which we'll denote as vector b. Let's consider this rotation of a vector as a form of matrix-vector multiplication: we performed matrix-vector multiplication of an unknown matrix with the vector (2, 0), resulting in a shift of the vector's coordinates to (0, 2). Any guesses what this matrix might be? As mentioned earlier, (2, 0) represents the coordinates of a vector in an adjusted coordinate space, while (0, 2) represents the coordinates of the same vector in the original coordinate space. How could we adjust the axes of the plane so that the coordinates (0, 2) correspond to (2, 0)? One approach is to designate the vertical axis as the x-axis and the horizontal axis as the y-axis; consequently the x value of the vector would be two and the y value would be zero, so in this new coordinate system the vector's coordinates would be (2, 0), whereas in the original coordinate system they are (0, 2). So the vectors in the unknown matrix serve as new axes that are rotated 90 degrees counterclockwise. How can we express this 90-degree rotation as a matrix? Consider the values 0, 1, -1, and 0 as potential entries for the matrix, and denote its vectors as a and b: the coordinates of vector a are (0, 1), while the coordinates of vector b are (-1, 0); that's what vectors a and b look like. Given that the vectors within a matrix are treated as the new axes of a two-dimensional plane, the matrix comprised of these two vectors a and b rotates the coordinate space, or plane, by 90 degrees counterclockwise. This matrix is commonly referred to as a rotation matrix, and a rotation matrix is a good example that directly demonstrates how the coordinate space undergoes a transformation.

From now on we're going to explore the multiplication between two matrices instead of a matrix and a vector. Let's see what happens when we multiply two matrices. Consider multiplying two matrices A and B along with a vector x; once again, A and B are matrices and x is a vector. We'll begin by multiplying matrix B by vector x, resulting in a new vector; then we'll multiply matrix A by the product of matrix B and vector x, ultimately yielding another vector. To summarize, we initially have vector x, which we multiply by matrix B, and next we multiply matrix A by the product of matrix B and vector x. Let's approach this process conceptually: when we multiply matrix B by vector x, we tilt the initial coordinate space; subsequently, by multiplying the resulting product by matrix A, we tilt the already tilted coordinate space once more. Now let's explore matrix multiplication with BA instead of AB: first we multiply matrix A by vector x, then we multiply matrix B by the resulting product of matrix A and vector x. So does the order of matrices in matrix multiplication matter? Yes, because AB transforms the coordinate space with matrix B first, whereas BA transforms the coordinate space with matrix A first; hence the order of matrices in matrix multiplication is significant. For instance, rotating the coordinate space before flipping it can yield different coordinates compared to flipping it first and then rotating it. Once more, unlike ordinary multiplication involving natural numbers, changing the order of matrices in matrix multiplication can yield different results; therefore I want to emphasize that AB is not equal to BA, and AB minus BA does not always equal zero. As explained, the reason the product of matrices varies depending on their order can be understood by examining the order of transformations applied to the coordinate space.
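A small numpy check of both points above: the matrix with columns (0, 1) and (-1, 0) rotates (2, 0) to (0, 2), and combining that rotation with a reflection in the two possible orders gives two different products. The reflection matrix is my own example, chosen only to show that the order matters.

```python
import numpy as np

# 90-degree counterclockwise rotation: its columns (0, 1) and (-1, 0) are the rotated axes.
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])

print(R @ np.array([2.0, 0.0]))    # [0. 2.]  -> (2, 0) rotates to (0, 2)

# F flips the sign of the y coordinate (a reflection across the horizontal axis).
F = np.array([[1.0,  0.0],
              [0.0, -1.0]])

print(R @ F)    # [[0. 1.] [1. 0.]]
print(F @ R)    # [[ 0. -1.] [-1.  0.]]  -> different from R @ F, so the order matters
```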
If you have prior experience with linear algebra, you might recognize some of the terms on the board. Let A be a 2-by-2 matrix with values a, b, c, and d. The transpose of matrix A is obtained by interchanging its rows and columns: initially we wrote the vectors in the matrix vertically, but in the transposed matrix the vectors are written horizontally, with each row of the matrix becoming a vector. If matrix A is equal to its transpose, we call the matrix a symmetric matrix; note that symmetric matrices always have an equal number of rows and columns. On the other hand, if the transpose of matrix A is equal to its negative, we call the matrix a skew-symmetric matrix. A diagonal matrix is a square matrix where all elements outside the main diagonal are zero; the main diagonal of a matrix consists of the elements where the row index equals the column index. A triangular matrix is a special type of square matrix where all the entries above or below the main diagonal are zero; depending on whether the nonzero entries are above or below the main diagonal, we classify triangular matrices into two types, upper triangular and lower triangular.

So far we have learned that the multiplication between a matrix and a vector results in a transformation of a coordinate space, and we'll apply this concept to understand the fundamental principles of neural networks. Let's delve into the basic structure of neural networks. Initially we have data, which serves as the input, and this data passes through a hidden layer before eventually being returned as output. Let's denote the data as x, the weights and bias in the hidden layer as W and b, and finally the output as y hat; then the output y hat is determined by the activation function σ applied to Wx + b, that is, y hat = σ(Wx + b). Let's explore how multiplication between a matrix and a vector operates within this process. Say the data x is a vector in R^n. The weights and bias in the hidden layer also have their own dimensions; consider that the weights take the form of a matrix, and suppose the output y hat is a vector in R^m. Since the dimensions of y hat and x are m and n respectively, the weights W must be an m-by-n matrix and the bias must be a vector in R^m; to clarify, W is an m-by-n matrix and b is a vector in R^m, where the first letter here is m. Suppose the data x is an image and the output y hat is a list of labels for the image; in this case n is much larger than m. Let's revisit the equation from the previous slide, y hat = σ(Wx + b). The vector x is very long, since its dimension is n by 1; on the other hand, the vector y hat is shorter compared to x, with its dimension being m by 1; meanwhile W is an m-by-n matrix and b is also a vector in R^m, specifically an m-by-1 vector. In the equation we begin with vector x in R^n, and we first multiply the m-by-n matrix W by vector x. Do you remember what it means to multiply a matrix by a vector? The matrix-vector multiplication transforms the original coordinate space. Up to now we have only been familiar with transformations of a coordinate space by square matrices, such as a 2-by-2 matrix; however, in our equation the numbers of rows and columns of the weights W are different. The dimension of the input is n by 1 while the dimension of the output is m by 1, indicating that the information in the input has been compressed into lower-dimensional data; in other words, the input x exists in a high-dimensional space while the output exists in a lower-dimensional space.
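Here is a minimal sketch of that single layer in numpy, just to make the shapes visible; the sizes n = 8 and m = 3, the random values, and the choice of sigmoid are all my own illustrative assumptions, not anything specified in the lecture.

```python
import numpy as np

n, m = 8, 3                      # input dimension n is larger than output dimension m

x = np.random.randn(n)           # input data, a vector in R^n
W = np.random.randn(m, n)        # weights, an m-by-n matrix
b = np.random.randn(m)           # bias, a vector in R^m

def sigmoid(z):
    # one common choice of activation function
    return 1.0 / (1.0 + np.exp(-z))

y_hat = sigmoid(W @ x + b)       # y hat = sigma(Wx + b)
print(x.shape, y_hat.shape)      # (8,) (3,): the data is compressed to fewer dimensions
```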
Thus the weights matrix W not only distorts the original coordinate space but also reduces its dimensionality; stated differently, you can decrease the dimensionality of a space through a matrix-vector multiplication when the number of columns of the matrix is larger than the number of rows. Now let's discuss why we need an activation function in neural networks. Transformations of a coordinate space such as rotating and resizing are all linear transformations; however, sometimes we need to introduce curvature, or remove linearity from the axes of the space, to extract meaningful information from the data. That's why we apply an activation function to the product of the weights and the data. Some of the most popular activation functions include the sigmoid, ReLU, and hyperbolic tangent functions, and these functions introduce nonlinearity into the space. So the fundamental concept of neural networks is to extract the essential dimensions of the data by distorting and reducing the dimensions of a coordinate space, and the objective of training neural networks is to find weights W and bias b that transform the coordinate space and reduce its dimensions until we obtain the crucial information in the data.

Now let's look at the addition formulas for sine and cosine by using the transformation of a coordinate space. Some of you may already be familiar with these addition formulas, which facilitate the calculation of the sine and cosine of sums or differences of angles; let's explore how these formulas can be explained within a coordinate space. To begin, let's define two matrices: the first matrix rotates the coordinate space by α degrees and the second matrix rotates the space by β degrees. Now let's see what these rotation matrices look like. A rotation matrix comprising cos θ, -sin θ, sin θ, and cos θ rotates the coordinate space by θ degrees. Let me explain how this matrix rotates the space: when we multiply the matrix by a vector, it gives us a new vector. Suppose the input vector is (1, 0); after the multiplication, the output vector has the values cos θ and sin θ. If we plot this new vector on a two-dimensional plane, the angle between the two vectors is θ. Another way to think about this is that the coordinates of the same vector in the transformed space become (1, 0), while in the original space its coordinates are cos θ and sin θ, because the coordinate space has been rotated; so in this transformed coordinate space the vector's coordinates are (1, 0).

Let's revisit the addition formulas for sine and cosine. Earlier we had two rotation matrices; we'll call the first matrix, which rotates the space by α degrees, A, and the second matrix, which rotates the space by β degrees, B. Now imagine we have another rotation matrix, C, which rotates the space by the angle α + β. Matrix C is the result of multiplying matrices A and B; unlike most cases, where the order of matrices in multiplication matters, with rotation matrices it doesn't, so the results of the matrix multiplications AB and BA are the same. Let's manually multiply the two matrices, starting with matrix B. The entries of matrix B are cos β, -sin β, sin β, and cos β, and the entries of matrix A are cos α, -sin α, sin α, and cos α. The first entry in the product of the two matrices is cos α · cos β - sin α · sin β, and the next entry down the first column is sin α · cos β + cos α · sin β. Using the addition formulas for sine and cosine, these expressions simplify to cos(α + β) and sin(α + β); the remaining entries simplify to -sin(α + β) and, lastly, cos(α + β). The first column of the resulting matrix matches the addition formulas, and this example demonstrates how the multiplication of two rotation matrices can be understood through the lens of the sine and cosine addition formulas.
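This is easy to verify numerically; the sketch below builds the rotation matrix from the lecture and checks that composing rotations by α and β equals a single rotation by α + β (angles are in radians here, and the particular values are arbitrary).

```python
import numpy as np

def rotation(theta):
    # the rotation matrix [[cos t, -sin t], [sin t, cos t]] from the lecture
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

alpha, beta = 0.7, 0.4   # two angles in radians

# Composing the two rotations equals one rotation by alpha + beta,
# which is exactly what the sine and cosine addition formulas say entrywise.
print(np.allclose(rotation(alpha) @ rotation(beta), rotation(alpha + beta)))  # True

# For rotation matrices the order does not matter.
print(np.allclose(rotation(alpha) @ rotation(beta),
                  rotation(beta) @ rotation(alpha)))                          # True
```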
The determinant is another term frequently used in linear algebra; let's explore the determinant of a matrix A. Suppose we have a matrix A with values a, b, c, and d. We can calculate the determinant of matrix A by finding the value of a·d - b·c; here, however, we're going to discuss the geometric interpretation of the determinant. Let's plot the two vectors represented by matrix A on a two-dimensional plane, and suppose a and c are (2, 1) and b and d are (1, 2); these two points on the plane represent each vector. We'll use the vectors (2, 1) and (1, 2) as the new axes of the plane. Under this new coordinate system defined by the two vectors, we can plot a point at coordinates (1, 1), which was originally located at (3, 3) in the original coordinate space. Now let's calculate the area inside the parallelogram formed by these vectors. One straightforward method is to divide the parallelogram into smaller pieces like this: considering that the area of the square containing the parallelogram is 9 and the area of the space not covered by the parallelogram is 6, we find that the area of the parallelogram is 3. As before, the coordinates in the transformed coordinate space are (1, 1) and in the original coordinate space (3, 3); this illustrates what the matrix-vector multiplication looks like. Using the formula above, we can calculate the determinant of matrix A, which is 4 - 1 = 3. Let's interpret the value of the determinant conceptually: each vector in the matrix defines one unit in the transformed coordinate space, so in the transformed coordinate space the lengths of both vectors, and the area of the parallelogram they form, are one; in the original coordinate space, however, the area of the parallelogram is 3, as we measured. So the determinant of matrix A signifies that the area of the parallelogram increases by a factor of three in the original coordinate space, from 1 to 3.

Now imagine we increase the length of one side of the parallelogram to two; the coordinates of the corner vector change to (2, 1) in the transformed space and (5, 4) in the original space. Let's observe how this change affects the area of the parallelogram. We scale the vector (2, 1) by two, resulting in coordinates of (4, 2), and connecting the vector (1, 2) to this scaled vector forms a new parallelogram. To measure its area we again split the square containing the parallelogram: the total area of the square is 20, and with an uncovered area of 14, the parallelogram's area is 6. In the transformed space the parallelogram's area is two, while in the original space it increases to six; once again the area of the parallelogram increases by a factor of three, which corresponds to the determinant of the matrix. This method provides a straightforward way to picture the determinant of a matrix.
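Here is the same area calculation done with numpy's determinant; the second matrix doubles one side as in the example, and the printed values are only approximate because of floating-point arithmetic.

```python
import numpy as np

# Columns (2, 1) and (1, 2): the unit square of the new axes maps to a parallelogram.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
print(np.linalg.det(A))         # about 3: areas are scaled by a factor of 3

# Doubling the first side gives the parallelogram spanned by (4, 2) and (1, 2).
A_scaled = np.array([[4.0, 1.0],
                     [2.0, 2.0]])
print(np.linalg.det(A_scaled))  # about 6: an area of 2 in the new axes becomes 6
```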
Sometimes the value of a matrix determinant is less than zero. Consider a matrix A with entries 1, 2, 2, and 1; calculating the determinant using the formula gives us -3. This matrix contains the same vectors as the one on the previous slide, but their order has changed, and this change in order is why there's a negative sign in front of the three. Let's delve into what the negative sign signifies. On a typical two-dimensional plane we usually label the horizontal axis as x and the vertical axis as y; imagine we swap these labels. This switch results in a negative determinant, and here's why. To switch the labels of the axes we need a new matrix: when a vector's coordinates on the plane with switched axes are (1, 0), its coordinates in the original space become (0, 1). Let's check this transformation of coordinates as a matrix-vector multiplication: to achieve it we use the matrix with entries 0, 1, 1, 0, and this matrix has a determinant of -1. Therefore switching the order, or the labels, of the axes results in a negative determinant; in summary, a negative determinant indicates that the order of the axes has been reversed relative to the original space.

Let's practice calculating the determinant of a matrix with an example. As previously explained, the determinant of a 2-by-2 matrix can be calculated using the formula ad - bc. For a 3-by-3 matrix we often use a method known as cofactor expansion, or Laplace expansion, which you might have seen in linear algebra textbooks; it is sometimes referenced in AI research papers as well, so you might be interested in learning how to calculate the determinant of a 3-by-3 matrix. To calculate it, we split the matrix as follows. First, multiply the element a in the first row and first column by the determinant of the 2-by-2 matrix that remains after removing the row and column containing a. Next, multiply the element b in the first row and second column by the determinant of the 2-by-2 matrix that remains after removing the row and column containing b. Lastly, multiply the element c in the first row and third column by the determinant of the 2-by-2 matrix that remains after removing the row and column containing c. Therefore the determinant of this 3-by-3 matrix is computed as a times the determinant of the 2-by-2 matrix formed by the second and third rows of the second and third columns, plus c times the determinant of the 2-by-2 matrix formed by the second and third rows of the first and second columns, minus b times the determinant of the 2-by-2 matrix formed by the second and third rows of the first and third columns. The reason for the negative sign in front of b, while the signs in front of a and c are positive, relates to the rules of the determinant calculation, which alternate sign across the matrix: the 2-by-2 matrix associated with b in the expansion is formed by the elements d, f, g, and i, its determinant is di - fg, and the alternating sign rule puts a minus in front of that term. Written out, for a 3-by-3 matrix with rows a, b, c, then d, e, f, then g, h, i, the determinant is a(ei - fh) - b(di - fg) + c(dh - eg). So to calculate the determinant of a 3-by-3 matrix we split the matrix into three parts and sum the results from each.

The process is similar when calculating the determinant of a 4-by-4 matrix: we divide the matrix into four components. The first component consists of a and the remaining 3-by-3 matrix, and we multiply a by the determinant of that 3-by-3 matrix; the second component contains b and its own remaining 3-by-3 matrix, constructed in the same way, and we multiply b by the determinant of that 3-by-3 matrix. This process is repeated for each column, and the results are summed up with the same alternating signs. In most cases you don't need to perform this calculation by hand, so it's more important to understand the process than to memorize it for practical applications.
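As a sketch of that expansion in code, the function below implements the first-row formula a(ei - fh) - b(di - fg) + c(dh - eg) and compares it with numpy's built-in determinant; the test matrix is my own arbitrary example.

```python
import numpy as np

def det3(M):
    # Expansion along the first row: det = a(ei - fh) - b(di - fg) + c(dh - eg)
    a, b, c = M[0]
    d, e, f = M[1]
    g, h, i = M[2]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

M = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])

print(det3(M))             # -3.0
print(np.linalg.det(M))    # matches, up to floating-point error
```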
The determinant of a matrix can be positive, negative, or even zero. Let's explore what a zero determinant means. Consider a matrix A with elements a, b, c, and d where the determinant, calculated as ad - bc, equals zero; the meaning of a determinant of zero is not very intuitive. Previously we discussed a matrix A with values 1, 2, 2, and 1. We can set the determinant of matrix A to zero by changing a single value: let's change the bottom-right value from 1 to 4, and now the determinant of this matrix becomes zero. Next, let's plot the vectors from matrix A on a two-dimensional plane and notice that both vectors, (1, 2) and (2, 4), lie on the same line. This means the vectors cannot span the entire space, indicating they're linearly dependent; consequently the rank of the matrix is one, not two. This scenario also implies the existence of a vector x such that Ax = 0. All of these observations are interrelated and stem from the fact that the determinant is zero.

Now let's interpret the geometric meaning of a determinant of zero. Suppose we multiply matrix A by a vector x and the product is vector y. Under this transformation, any vector we multiply ends up on the same line as y; even vectors initially located outside this line ultimately align with it. Now consider an arbitrary shape or region, denoted S, in the plane. Initially, in the original coordinate space, we can measure the area covered by S; however, every point within S eventually maps onto the line, resulting in an area of zero. Once more, the determinant of a matrix represents the change in area; when the determinant is zero, it signifies that every point on the plane is transformed onto points along a line. Because every point on the plane transforms onto points along the line, and there are points unreachable by the vectors in the matrix, the determinant of the matrix becomes zero. When the determinant of a matrix is zero, it signifies the presence of linearly dependent vectors within the matrix; it also implies a decrease in the rank of the matrix, alongside the existence of a null space. Essentially, these concepts all express the same idea.

Now let's explore the concept of the inverse matrix. The inverse of matrix A is denoted by A with a -1 in superscript. Previously we learned that the order of matrices in matrix multiplication matters and that the final product can change depending on the order. Suppose we multiply matrices A and B and the product is an identity matrix. An identity matrix is a square matrix in which all the elements of the main diagonal are ones while all the other elements are zeros, and it's usually denoted by the symbol I. Each vector in an identity matrix represents an axis of the original coordinate space; in other words, when multiplying the identity matrix by a vector, the matrix does not alter any coordinates or the space. Let's explore what it means when the product of two matrices A and B results in an identity matrix. As I explained earlier, the products ABx and BAx can differ, because in the multiplication ABx we first alter the coordinates with matrix B and then with matrix A. When AB is the identity, multiplying by B first and by A next brings us back to the same coordinates, or the same coordinate space: in other words, we transform the coordinates twice and the final result remains unchanged. If we multiply matrix A by a certain matrix and obtain an identity matrix as the product, we call the matrix multiplied by matrix A the inverse matrix of A; this means that the inverse matrix of A restores the coordinate space altered by matrix A. We begin with a two-dimensional space defined by the axes x and y, and suppose A is a rotation matrix that rotates the space by 90 degrees counterclockwise. Let's visualize the space after it's transformed by matrix A: the inverse matrix of A restores the space back to its original state, where x represents the horizontal axis and y represents the vertical axis.
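Here is a short numpy illustration of both ideas: the inverse of the 90-degree counterclockwise rotation is the 90-degree clockwise rotation, and the dependent-column matrix from above has no inverse at all.

```python
import numpy as np

# 90-degree counterclockwise rotation.
R_ccw = np.array([[0.0, -1.0],
                  [1.0,  0.0]])

R_cw = np.linalg.inv(R_ccw)   # its inverse: the 90-degree clockwise rotation
print(R_cw)
print(R_cw @ R_ccw)           # the identity matrix: the space is fully restored

# A matrix with linearly dependent columns has determinant 0 and no inverse.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError as err:
    print("no inverse:", err)
```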
Restoring the original space requires a 90-degree clockwise rotation; therefore the inverse matrix of matrix A is the rotation matrix that rotates the space 90 degrees clockwise. For every rotation matrix there exists an inverse matrix that rotates in the opposite direction by the same angle. Similarly, matrices that stretch the space also have an inverse matrix, since the space can be restored to its original state by scaling its coordinates back. However, there are matrices that do not have an inverse. Consider the matrix from the previous slide that maps every point on the plane onto a single line; the vectors within that matrix are linearly dependent. After the transformation by the matrix, vectors on the plane lose their information and are compressed onto a single line, as two-dimensional data is reduced to one dimension. In other words, different vectors on the plane can be mapped to the same point on the line: for instance, two distinct coordinate pairs (a, b) and (c, d) can be mapped to the same coordinates (e, f). This transformation can be expressed as a matrix-vector multiplication Ax, where matrix A shifts every point on the plane onto a line. However, when we try to restore the transformed space back to the original space, we have lost track of where the coordinates (e, f) originally came from. This is referred to as information loss, and because of this information loss we cannot restore the space back to its original state; consequently, in this case the inverse of matrix A does not exist. Now let's state the condition for the existence of an inverse matrix: the determinant of the matrix must not be zero. This implies that every vector in the matrix is linearly independent, making the matrix a full-rank matrix. Earlier we learned about the relationships between linear independence, the rank of a matrix, and the existence of the null space; now we understand the connection between those conditions and the existence of an inverse matrix as well.

Let's return to vector operations. Previously we learned about operations such as addition and scaling; in addition to these, the dot product, or inner product, is another important operation. The inner product is an operation between two vectors that returns a scalar. Suppose we have two vectors a and b, and let's plot them on the plane; we can calculate the similarity of the two vectors using the dot product. Understanding the dot product might not be intuitive, so I'll explain it in more detail. Think of it this way: the dot product between two vectors indicates how much they can cooperate, or how much they cannot. Let's replot vectors a and b so that they are perpendicular to each other, with vector a on the x-axis and vector b on the y-axis; this shows that vectors a and b have different directions. Imagine two people, one on each of the vectors a and b, moving along their respective vectors. The person on vector a can only see the person on vector b when they cross at the origin, where the vectors intersect. Now let's introduce a new vector c onto the plane. Unlike with vector b, the person on vector a can partially see the movement of the person on vector c by observing the shadow of vector c cast onto vector a; similarly, the person on vector b can observe the movement along vector c by watching its shadow on vector b. However, the person on vector b cannot see the movement of the person on vector a at all. When two vectors cannot observe each other's movements through a shadow, the dot product of the two vectors is zero; when they can perfectly observe each other's movements, it is the product of the lengths of the vectors.
Let's define the dot product mathematically. Consider two vectors a and b with an angle θ between them; the dot product of vectors a and b is obtained by multiplying the lengths of both vectors by the cosine of θ. This formula allows us to calculate the dot product of two vectors. When the angle θ between the vectors is 90 degrees, they are perpendicular and cos θ equals zero, so the dot product also becomes zero; conversely, when the angle θ between the vectors is zero, cos θ equals one, and the dot product is simply the product of the lengths of the vectors. Hence the dot product can be defined using the lengths of the vectors and the angle between them. However, there's another way to define the dot product. Consider vectors a and b on a two-dimensional plane; both vectors have two components. Let a be (2, 1) and b be (1, 2). The dot product of vectors a and b equals the sum of the products of their corresponding components, resulting in a value of four.

Now let's delve into its significance. As previously mentioned, the dot product of vectors a and b represents the relative movement along b observed from the viewpoint of a. A can perceive the movement along b by drawing a perpendicular line, or shadow, connecting the end point of b down to a; a can then partially observe the movement along b, depending on the angle θ. That's why we calculate the dot product by multiplying the lengths of the vectors by cos θ: the length of the adjacent side of the right triangle is the product of the length of vector b and cos θ, and this product represents how much a person on vector a can observe of vector b. If the coordinates of vector a are (x1, y1) and the coordinates of vector b are (x2, y2), then we can define a vector in the direction of this perpendicular line as (-y1, x1); however, when computing a dot product we disregard this perpendicular direction and only consider the component parallel to vector a. As mentioned earlier, the dot product of two vectors is calculated by summing the products of their corresponding components, and when we sum those products the perpendicular components are eliminated, leaving only the parallel components. You can find the length of vector a by taking the square root of the sum of the squares of its elements x1 and y1, and similarly for the length of vector b; if you calculate the product of the lengths of vectors a and b and cos θ, you eventually arrive at the same value obtained by summing the products of the corresponding elements.

It's crucial to remember the geometric interpretation of the dot product of vectors: we calculate the dot product of two vectors when we want to measure the relative movement between them, and thus we can use the dot product itself as an index representing the similarity between the vectors. If the angle between the vectors is narrow, we get a high value for the dot product; if the two vectors are perpendicular, or the angle between them is wide, we get a low value. From the data perspective, when the vectors are far apart it means that they contain different types of information, and since they hold different types of information the similarity index has a low value; that's how we can use the dot product as a similarity index. When two vectors are perpendicular, the dot product of the vectors is zero, implying that they hold entirely different types of information.
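A quick check of the two definitions with the example vectors, in numpy; the angle is recovered from the coordinates so that the geometric form |a||b|cos θ can be compared with the component form.

```python
import numpy as np

a = np.array([2.0, 1.0])
b = np.array([1.0, 2.0])

# Component form: sum of the products of corresponding entries.
print(np.dot(a, b))   # 4.0

# Geometric form: |a| * |b| * cos(theta) gives the same number.
theta = np.arctan2(b[1], b[0]) - np.arctan2(a[1], a[0])
print(np.linalg.norm(a) * np.linalg.norm(b) * np.cos(theta))   # 4.0

# Perpendicular vectors cannot "see" each other, so their dot product is zero.
print(np.dot(np.array([2.0, 1.0]), np.array([-1.0, 2.0])))     # 0.0
```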
Because vectors hold various types of information and the dot product can measure similarities or differences between them, it is often employed in the attention mechanism. The attention mechanism is a concept in artificial intelligence frequently used in natural language processing. Let's consider a sentence — how about using the title of this slide as an example? We have four words in the title, and each pair of words has a similarity measure; together these words form a sentence, and when words form a sentence, the similarity measures between them are typically high. Now let's place a random word, "apple", next to the title and enumerate each word. The word "apple" is barely relevant to the other four words in the title, so its similarity measures are lower than the others', whereas the similarity measures among the title words are high. In particular, the similarity measure between "dot" and "product" is higher than the similarity measures between the other words.

Let's create a table with five rows and five columns, where each row and each column corresponds to one of these words. The similarity measure between a word and itself is very high, so I'll fill the diagonal boxes with the darkest color. The similarity measure between "dot" and "product" is also high, though not as high as the self-similarity. Suppose the similarity measure between "dot" and "attention" is also high, and similarly the similarity measure between "advanced" and "attention" is high. We refer to this table as a similarity map, or similarity matrix.

The self-attention mechanism in natural language processing builds exactly this kind of similarity matrix, where each entry holds a similarity measure that can be calculated using the dot product. In natural language processing, words are treated as data: each word in the title is not merely a group of letters but is represented as a vector in an n-dimensional space. If the similarity measure between two words is high, the dot product of the corresponding vectors also has a high value; if the similarity measure is low, the dot product has a low value. In the attention mechanism, encoding is achieved through this process, so the dot product is a crucial concept.
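As a rough illustration — the five 3-dimensional word vectors below are made-up toy values, not real learned embeddings — here is a sketch of how a similarity matrix like the one on the slide can be computed with dot products, in the spirit of scaled dot-product attention.

```python
import numpy as np

# Toy word vectors (hypothetical embeddings); in a real model these are learned
words = ["advanced", "dot", "product", "attention", "apple"]
E = np.array([
    [0.9, 0.1, 0.3],   # advanced
    [0.8, 0.6, 0.2],   # dot
    [0.7, 0.7, 0.2],   # product
    [0.9, 0.4, 0.3],   # attention
    [0.1, 0.2, 0.9],   # apple
])

# Similarity matrix: entry (i, j) is the dot product of word i and word j
S = E @ E.T

# Attention-style weights: scale by sqrt(dimension), then softmax each row
scaled = S / np.sqrt(E.shape[1])
weights = np.exp(scaled) / np.exp(scaled).sum(axis=1, keepdims=True)

for word, row in zip(words, np.round(weights, 2)):
    print(f"{word:>9}: {row}")   # "apple" gets low weight from the title words
```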
Let's summarize the concepts in linear algebra we've learned so far. Suppose we have an n × n matrix A; then A is composed of n vectors in R^n. If the n vectors within the matrix are linearly independent of each other, the following properties naturally follow: the rank of the matrix is n, there is no (nontrivial) null space, the determinant of the matrix is not zero, and, lastly, the inverse of the matrix exists. Now consider another matrix A composed of n vectors where at least one pair of vectors is linearly dependent. Then the following properties hold: there exists a null space — directions that cannot be spanned by the vectors — meaning there exists a nonzero vector x such that Ax = 0; the rank of the matrix equals the number of linearly independent vectors within the matrix, which is less than n; consequently the determinant of the matrix is zero; and, lastly, the inverse of the matrix does not exist.

Let's also define the cross product of vectors. Suppose we have two vectors a and b; the cross product of the vectors conveys directional information about them. Now consider a three-dimensional space with vectors x, y, and z along each axis. In this space there exists a vector perpendicular to both x and y: vectors x and y are already perpendicular to each other, and vector z is perpendicular to both of them. Calculating the cross product of the two vectors x and y amounts to finding a vector like z that is perpendicular to both x and y. Suppose the coordinates of x are (1, 0, 0), the coordinates of y are (0, 1, 0), and the coordinates of z are (0, 0, 1). By calculating the cross product of x and y, we can check whether we obtain the perpendicular vector z.

You can compute the cross product of vectors a and b by computing the determinant of a matrix: put the unit vectors i, j, k on the first row and arrange vectors a and b horizontally on the second and third rows. For the i component, calculate the determinant of the corresponding 2×2 matrix, y1·z2 − z1·y2; similarly, for the j component compute z1·x2 − x1·z2, and for the k component evaluate x1·y2 − x2·y1 from the corresponding 2×2 matrices.

Now let's delve into the conceptual understanding of the cross product. Consider two vectors a and b in a three-dimensional space. As previously mentioned, the cross product of a and b is a vector perpendicular to both. Since vectors a and b lie on the same plane, the vector perpendicular to both a and b must extend out of the screen.
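Here is a quick check of this (a minimal sketch; the sample vectors a and b are arbitrary): computing x × y with numpy indeed returns z, and the determinant-style component formula gives the same result as np.cross, with the output perpendicular to both inputs.

```python
import numpy as np

x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])

# The cross product of the first two axis vectors is the third axis vector
print(np.cross(x, y))        # [0. 0. 1.]

def cross(a, b):
    """Component formula read off the 3x3 determinant with i, j, k on top."""
    x1, y1, z1 = a
    x2, y2, z2 = b
    return np.array([y1 * z2 - z1 * y2,
                     z1 * x2 - x1 * z2,
                     x1 * y2 - x2 * y1])

a = np.array([2.0, 1.0, 0.5])
b = np.array([0.0, 3.0, 1.0])
print(cross(a, b))                       # matches np.cross(a, b)
print(np.cross(a, b))

# The result is perpendicular to both inputs: both dot products are zero
print(a @ cross(a, b), b @ cross(a, b))  # 0.0 0.0
```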
We've discussed various topics related to vectors and matrices, such as matrix multiplication and the rank of a matrix. Moving forward, we'll delve into the concepts of eigenvalues and eigenvectors. Some of you may already be familiar with the formulas for calculating eigenvalues and eigenvectors, and you can easily find the calculation methods in a linear algebra textbook and other resources, but in this lecture we'll primarily concentrate on their geometric interpretation.

Mathematically, an eigenvalue, often denoted by the Greek letter λ, satisfies the equation Ax = λx, where A is the matrix and x is the eigenvector. In other words, when you multiply matrix A by its eigenvector x, you get the same vector x back, just scaled by the eigenvalue λ. Let's expand the equation. Suppose A is a 2×2 matrix. The left-hand side, Ax, is simply the result of multiplying matrix A by vector x, and scaling x by λ is the same as multiplying x by a diagonal matrix with diagonal entries λ. Eigenvalues and eigenvectors are the values and vectors that satisfy this equation; these are the basic definitions commonly used in linear algebra.

Let's quickly review the method for calculating eigenvalues and eigenvectors. We'll rewrite the equation from the previous slide: after subtracting the right-hand side from both sides, we obtain the matrix with entries 3 − λ, 1, 0, and 2 − λ, multiplied by x, equal to zero. Now we need to find the values λ and vectors x. This equation can be rewritten as (A − λI)x = 0, and we're already familiar with this form: x is a vector that constructs a null space, and such an x exists only when the determinant of the matrix is zero. The determinant can be expressed as (3 − λ)(2 − λ) = 0, so λ is either 3 or 2. When λ is 3, A − λI becomes a matrix with entries 0, 1, 0, 0. An eigenvector is a vector that returns zero when multiplied by this matrix, so x = (1, 0) is an eigenvector for the eigenvalue 3.

Let's delve into the geometric interpretation of eigenvalues and eigenvectors. Imagine a two-dimensional plane displayed on the screen. Earlier we found that an eigenvector of matrix A is (1, 0) when the eigenvalue is 3. Now let's quickly determine the eigenvector of A for the eigenvalue 2: the entries of A − λI are 1, 0, 0, and 0, so the eigenvector is (0, 1) — that is, the eigenvector of A is (0, 1) when the eigenvalue is 2.

What is the geometric interpretation of these values? Let's plot the two vectors (3, 0) and (1, 2) on the plane, denoting them as vectors a and b, and transform the current coordinate space with matrix A; consequently, the coordinates of arbitrary points on the plane shift to new coordinates. For example, the coordinates (1, 1) will be shifted to (4, 2), and we can determine the new coordinates through matrix-vector multiplication. Now let's examine why (1, 1) is shifted to (4, 2): multiplying A by (1, 1) adds the two column vectors, so from the endpoint of vector a we can draw vector b, which points exactly to the coordinates (4, 2). In other words, the point at coordinates (1, 1) in the transformed space is located at coordinates (4, 2) in the original coordinate system. What about the coordinates (1, 0)? They will be shifted to (3, 0). Using the matrix, we can transform a coordinate space and compare how coordinates on the plane change.

There is something important to pay attention to here: some vectors on the plane remain on the same line even after their coordinates have been shifted by the matrix. Let's see which vectors behave like this. For example, the vector (1, 0) has been shifted to (3, 0), and both vectors lie on the same line with the same direction. Does the vector (0, 1) behave in a similar manner? If we multiply matrix A by the vector (0, 1), we get the vector (1, 2) — oh, these two vectors don't lie on the same line. It is highly likely that I made a mistake, and (0, 1) is not an eigenvector of matrix A. The entries of A − λI for λ = 2 are in fact 1, 1, 0, and 0; multiplying this matrix by the eigenvector for the eigenvalue 2 must give zero, so the eigenvector for the eigenvalue 2 is (1, −1). The coordinates (1, −1) are in the transformed space, and we can determine their original coordinates through matrix-vector multiplication: the original coordinates of (1, −1) are (2, −2). By plotting these coordinates on the plane, we can see that they lie on the same line and maintain the same direction. So do you see why the vector (0, 1) is not an eigenvector of matrix A? When the coordinates of an eigenvector are transformed by the matrix, they remain on the line spanned by that eigenvector.

Let's consider another set of coordinates on this line: the coordinates (−1, 1) will be transformed to (−2, 2) by matrix A. In summary, any coordinates on the line spanned by an eigenvector are transformed to coordinates still on the same line. Furthermore, the eigenvalues determine the extent of the transformation along the line. Since 2 is the eigenvalue for the eigenvector (1, −1), both the coordinates (1, −1) and (−1, 1) have been scaled by a factor of two along this line, and another set of coordinates on the line, (−2, 2), is shifted to (−4, 4), again by a factor of two. Conversely, the coordinates (1, 0) have been transformed by a factor of three, as the corresponding eigenvalue is 3. These explanations provide a geometric insight into eigenvalues and eigenvectors.

Let's explore why these lines spanned by the eigenvectors are crucial. During a matrix transformation, the coordinates of points are relocated to new positions somewhere in the transformed space. While points outside these lines move off to new positions elsewhere, the points on these lines smoothly shift along the line to their new coordinates. In other words, while the shapes of arbitrary lines can be altered by the transformation, the lines spanned by the eigenvectors remain unchanged. This highlights the significance of eigenvectors and eigenvalues in linear transformations.
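As a sanity check on this picture (a minimal sketch using the 2×2 example matrix from above), numpy confirms the eigenvalues 3 and 2, and shows that points on the lines spanned by (1, 0) and (1, −1) stay on those lines, scaled by the corresponding eigenvalue, while a point such as (0, 1) gets pushed off its line.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

# Eigenvalues 3 and 2 (order may vary); numpy normalizes eigenvectors to
# length 1, so the columns are scalar multiples of (1, 0) and (1, -1)
vals, vecs = np.linalg.eig(A)
print(vals)
print(vecs)

# Points on the line spanned by an eigenvector stay on that line
print(A @ np.array([1.0, -1.0]))   # [ 2. -2.]  = 2 * (1, -1)
print(A @ np.array([-2.0, 2.0]))   # [-4.  4.]  = 2 * (-2, 2)
print(A @ np.array([1.0, 0.0]))    # [ 3.  0.]  = 3 * (1, 0)

# A point off these lines changes direction: (0, 1) is mapped to (1, 2)
print(A @ np.array([0.0, 1.0]))    # [1. 2.]
```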
Let's summarize the geometric significance of eigenvalues and eigenvectors. Consider a matrix A with its eigenvalues and eigenvectors. When you multiply A by one of its eigenvectors, the direction of the resulting vector remains unchanged, although its length may vary, and the length of the resulting vector is determined by the eigenvalue corresponding to that eigenvector. It's important to remember this geometric interpretation.

However, eigenvalues aren't always real numbers such as the 2 and 3 from the previous example. For instance, take a rotation matrix. If we consider a rotation matrix R that rotates the space counterclockwise by an angle θ, it has entries cos θ, −sin θ, sin θ, and cos θ; this matrix rotates the space by θ degrees. Let's find the eigenvalues of this rotation matrix. If R rotates the space by 90°, its entries become 0, −1, 1, and 0. Subtracting λI from the matrix gives R − λI with entries −λ, −1, 1, and −λ. The determinant of this matrix is λ² + 1, and to make the determinant zero, λ must be either +i or −i; thus λ is an imaginary number. When an eigenvalue is an imaginary number, the corresponding eigenvector also contains imaginary components, which means there are no real vectors that maintain their direction after the transformation by the matrix. Let's see why: the rotation matrix R rotates the space by 90°, which means the direction of every vector — every line crossing the origin — is changed. Thus imaginary eigenvalues imply that there are no vectors on the plane that keep their direction after the transformation. So remember: if you obtain an imaginary number as an eigenvalue of a matrix, there are no vectors that maintain their direction after the transformation.

Let's quickly go over some useful formulas related to eigenvalues. The trace of a square matrix is the sum of its diagonal elements and also equals the sum of its eigenvalues, while the determinant of the matrix is the product of its eigenvalues.

Matrix diagonalization is a key concept that underscores the importance of understanding eigenvalues and eigenvectors, so let's delve into the diagonalization process. For each eigenvalue of a matrix there exists a corresponding eigenvector; if A is an n × n matrix, there can be up to n pairs of eigenvalues and eigenvectors. If v_n is an eigenvector of matrix A, then there exists an eigenvalue λ_n such that A v_n = λ_n v_n. Since v_1 through v_n are vectors in R^n, we can represent these n equations as a single matrix multiplication. The first matrix in the multiplication is simply matrix A, while the second matrix comprises the n eigenvectors written vertically as its columns — essentially, the second matrix forms a vector space with the eigenvectors as its basis vectors. Both of these matrices are n × n matrices. The resulting matrix consists of the n eigenvectors scaled by their corresponding eigenvalues, and we can also express it as the product of the matrix of eigenvectors and a diagonal matrix whose diagonal elements are the eigenvalues. Let's denote the matrix of eigenvectors as P and the diagonal matrix as D; the matrix of eigenvectors on the left-hand side is also P. We can now write the equation compactly as AP = PD, and by multiplying both sides of the equation by the inverse of P we get A = PDP^-1. Remember, if you multiply a matrix by its inverse, you get an identity matrix as the product.
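Here is a small numerical sketch of this decomposition (again using the example matrix from before; it is only a verification): building P and D from the eigenvectors and eigenvalues, checking that P D P^-1 reproduces A, checking the trace and determinant formulas, and confirming that the 90° rotation matrix really has eigenvalues ±i.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

vals, P = np.linalg.eig(A)     # columns of P are the eigenvectors
D = np.diag(vals)              # diagonal matrix of eigenvalues

# Diagonalization: A = P D P^-1
print(np.allclose(A, P @ D @ np.linalg.inv(P)))   # True

# Trace = sum of eigenvalues, determinant = product of eigenvalues
print(np.trace(A), vals.sum())          # 5.0 5.0
print(np.linalg.det(A), vals.prod())    # 6.0 6.0

# A 90-degree rotation matrix has imaginary eigenvalues +i and -i
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])
print(np.linalg.eigvals(R))             # approximately [0.+1.j 0.-1.j]
```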
Let's explore the meaning of matrix diagonalization. Matrix A can be represented as a product of three matrices: the first matrix is composed of the eigenvectors of A, the second matrix holds the eigenvalues as its diagonal entries, and the third matrix is the inverse of the matrix composed of eigenvectors. In other words, we can decompose matrix A into these three parts — earlier we labeled them P, D, and the inverse of P — where D holds only the eigenvalue information. Matrix diagonalization is this process of decomposing a matrix.

Now let's explore how linear algebra can be applied to principal component analysis. PCA is frequently used in data analysis and artificial intelligence, and it's one of the methods of dimension reduction. First of all, let me explain why we need dimension reduction in artificial intelligence. First, we compress high-dimensional data to lower dimensions; additionally, by reducing the dimensionality of the data we eliminate noise within it. In a sense, we reduce the data dimensions from n to m, where m is less than n. However, we cannot randomly eliminate just any dimension in the data; instead we retain m meaningful dimensions while discarding n − m meaningless ones. For example, if the data is three-dimensional, we keep two meaningful dimensions and eliminate one unnecessary dimension. When eliminating unnecessary dimensions, though, instead of simply discarding them we transform the coordinate space before the elimination. PCA is a method of dimension reduction that eliminates unnecessary dimensions within the transformed coordinate space.

So how do we determine which dimension is meaningful and which is not? First, we start with normalization. Suppose we have data X; to normalize the data, we first subtract the mean of the data and then divide by the standard deviation of X. Why do we need to normalize data? If we examine the dimensions of the data, we may find that some dimensions range between 0 and 100, while others might range between 0 and 1, or between −100 and 100. Yet a dimension with a range between 0 and 1 might be more important than one with a wider range. By normalizing the data, we standardize each dimension to a range consistent with the standard normal distribution.

Step two involves calculating the covariance matrix; we use the formula displayed on the screen to derive this matrix from the normalized data of step one. When we examine the normalized data more closely, we find that some data fluctuates with short cycles while other data changes slowly over longer periods, and some may fluctuate even more rapidly still. We regard rapidly fluctuating data as meaningful, while data with slower fluctuations are deemed unnecessary, as they convey less information. The covariance matrix not only contains information about each dimension's own variance but also about its relationships with the other n − 1 dimensions. For instance, consider data number one and data number two and visualize their variances: given that both data sets fluctuate with the same cycle, we might question the necessity of retaining both of them rather than using just one.

Suppose we have a covariance matrix V. After we find the eigenvalues and eigenvectors of this matrix, we can decompose it using matrix diagonalization. In the matrices P and D, which contain the eigenvectors and eigenvalues respectively, we can find information about the variances of the data, and since the covariance matrix shows how the data relate to each other, as mentioned earlier, we can identify pairs of data that exhibit similar patterns using the covariance matrix.
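Putting these steps together, here is a minimal PCA sketch in numpy (the random data and the choice of keeping m = 2 components are purely illustrative; it also anticipates the selection and projection step described next): normalize the data, compute the covariance matrix, diagonalize it, and project onto the eigenvectors with the largest eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))          # 200 samples, n = 5 dimensions
m = 2                                  # number of dimensions to keep

# Step 1: normalization (subtract the mean, divide by the standard deviation)
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix of the normalized data
V = np.cov(Z, rowvar=False)            # n x n

# Step 3: diagonalize the covariance matrix, sort eigenvalues by magnitude
vals, vecs = np.linalg.eigh(V)         # eigh, since V is symmetric
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

# Step 4: keep the top-m eigenvectors and project the data onto them
X_reduced = Z @ vecs[:, :m]            # 200 x m
print(V.shape, X_reduced.shape)        # (5, 5) (200, 2)
```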
Using the eigenvalues from matrix D, we can determine which data is important and which is not: a high eigenvalue signifies important data, while a low eigenvalue suggests less significance, indicating that we might consider dropping the corresponding data. In the matrix D obtained through matrix diagonalization, we have n eigenvalues from λ1 to λn, and when arranging them in D we order them by magnitude. Within matrix D we retain the top m eigenvalues while discarding the rest, as higher eigenvalues indicate greater significance. We also have the matrices P and P^-1 surrounding matrix D; similarly, we preserve only the first m eigenvectors, those with the highest eigenvalues, from these matrices while discarding the rest. This process reduces the data's dimensionality by retaining only the most important components.

Now let's see how the dimensionality of the data is reduced. The dimension of the reduced matrix P^-1 is m × n, while the dimension of our original data matrix A is n × n; by multiplying the reduced P^-1 by the data matrix A — an (m × n) times (n × n) product — we obtain a new data matrix of reduced dimension. To summarize, PCA is one of the important examples of applications of eigenvalues and eigenvectors, and it's frequently used in data processing, so remember how eigenvectors and eigenvalues are applied to fields outside of linear algebra.

Now we're going to explore the matrix exponential. Although matrix exponentials don't appear frequently in image processing or natural language processing, they are significant for understanding random walks or Markov decision processes, which deal with decision making, so it would be beneficial to learn about them. Suppose we have a matrix with entries 1, 2, 3, and 4. The exponential of this matrix is not a matrix with entries e^1, e^2, e^3, and e^4: the definition of the exponential of a matrix A resembles a Taylor series, and the mathematical definition written on the board is entirely different from exponentiating each individual entry.

Matrix exponentials have the following properties; let's explore each in detail. The first two properties are similar to the properties of scalar exponentials, but the third property is especially important. Suppose the function f(x) is e^x; then the derivative of f(x) is also e^x. This property holds for matrices as well: when A is a matrix, the derivative of e^(At) with respect to t is simply A · e^(At). Because of this simple property, matrix exponentials appear often in AI papers. The last property is also very important: when we decompose a matrix exponential with matrix diagonalization, every matrix stays the same except for the central one, meaning the eigenvectors don't change.

We often use matrix exponentials when we want to find the solution to a linear system. Suppose the values of x(t) vary with time and their rate of change is determined by x itself, so that dx/dt = A x(t). Let's try defining x(t) as a constant vector x0 times the exponential of At, plus a constant b; then the derivative of x(t) is A · x0 · e^(At). If we substitute x0 · e^(At) + b into x(t) in the system above, we get A · x0 · e^(At) + A · b, so A · b must equal zero and we can drop b from x(t). Therefore x(t) = x0 · e^(At). To summarize, when we want to observe how values change over time, the solution of the linear system often involves a matrix exponential.
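A minimal numerical sketch of these properties (the 2×2 matrix is arbitrary, and scipy.linalg.expm is used to compute the matrix exponential): the matrix exponential is not the element-wise exponential, it satisfies e^A = P e^D P^-1 under diagonalization, and x(t) = e^(At) x0 behaves as a solution of dx/dt = A x.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# The matrix exponential is NOT the element-wise exponential
print(expm(A))
print(np.exp(A))                       # a different matrix

# Diagonalization property: exp(A) = P exp(D) P^-1, eigenvectors unchanged
vals, P = np.linalg.eig(A)
print(np.allclose(expm(A), P @ np.diag(np.exp(vals)) @ np.linalg.inv(P)))  # True

# Solution of the linear system dx/dt = A x:  x(t) = exp(A t) x0
x0 = np.array([1.0, 0.0])
x = lambda t: expm(A * t) @ x0

# Numerical derivative of x(t) matches A @ x(t)
t, dt = 0.5, 1e-6
print((x(t + dt) - x(t)) / dt)
print(A @ x(t))                        # approximately equal
```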
The last topic we'll cover today is the pseudo-inverse matrix. Let's consider a matrix A and a vector b whose relationship is represented by the matrix-vector multiplication Ax = b. If we multiply both sides of the equation by the inverse of matrix A, we get x = A^-1 · b. However, if A is not a square matrix — meaning it's a rectangular matrix — it doesn't have an inverse. For instance, if vector x is in R^n and vector b is in R^m, then matrix A is an m × n matrix and it doesn't have an inverse. Even though the rectangular matrix A doesn't have an inverse, we can still determine its pseudo-inverse. Instead of multiplying both sides by the inverse of A, we multiply both sides by the transpose of A. Initially, A is an m × n matrix, x is a vector in R^n, and b is a vector in R^m; the dimension of the transpose of A is n × m, so we can write (n × m)·(m × n)·(n × 1) = (n × m)·(m × 1). The dimension of the product A^T A is n × n, so we can determine an inverse of this product. This time we multiply both sides of the equation by the inverse of A^T A: the dimension of this inverse is n × n, the dimension of A^T is n × m, and the dimension of b is m × 1, so the dimension of the final product, x = (A^T A)^-1 A^T b, is n × 1.

Does this final product provide a complete recovery of vector x? It depends on the sizes of m and n. Suppose m = 3 and n = 100. Through the matrix-vector multiplication, the dimension of x has been decreased from 100 to 3, which means the information contained in x has been compressed to a lower dimension, and during that process some information can be lost; complete recovery of the information is therefore impossible, and x will be recovered only partially — as much as b can explain. However, when m is much larger than n, b contains much more information than x, meaning x can recover its information completely. Pseudo-inverse matrices are commonly used in AI research papers, so it's worth remembering this view of the information-recovery process with pseudo-inverse matrices.

So far we've covered various topics in linear algebra, but there is one very important key point you must remember: the geometric meanings of the concepts. Throughout the lecture we explored the geometric interpretations of matrices, inverse matrices, rank, determinants, matrix-vector multiplication, and so on, and we also learned about their applications in neural networks and the attention mechanism. A significant part of linear algebra revolves around matrix multiplication — multiplication between matrices and vectors — which is commonly used when building AI models. We also learned about the transformation of space through matrix multiplication: with matrix multiplication we can not only rotate or tilt space but also increase or decrease dimensions, and if the inverse matrix does not exist, information can be lost during the transformation process. Understanding these geometric interpretations is crucial when developing your own AI models or reading AI research papers. While you can review linear algebra by solving problems in textbooks, I highly recommend picking an AI research paper and trying to understand how the geometric meanings of the concepts we've learned in this lecture are applied. That's it for today's lecture. Thank you so much for listening, and see you at the next lecture.