Transcript for:
Understanding Inner Products on Matrices

Hey folks, my name is Nathan Johnston, and welcome to Lecture 18 of Advanced Linear Algebra. Remember, last class we were introduced to inner products: we saw the standard inner products on the vector spaces R^n and C^n (the dot product) and on the vector space of continuous functions (where the standard inner product was an integral), and we showed that there are a bunch of weird inner products as well. But we haven't yet introduced any inner product, not even a standard one, on the space of matrices. So that's the goal of today's lecture. To be able to do that, we first have to introduce a new function on matrices called the trace, so that's what we're going to do first here.

The trace is about the simplest function on matrices there is. Suppose you've got some square matrix A. Then the trace of A, which we denote tr(A), is the sum of the diagonal entries of the matrix: you just take a_11, a_22, up to a_nn, and add them up.

Okay, just a quick example to make sure that we understand what's going on here. For the 2-by-2 matrix with rows (1, 2) and (3, 4), you look at the diagonal entries, which are 1 and 4, add them up, and the trace of that matrix is 5. And this works no matter what the size of the matrix is, as long as it's square. Another quick example: for the 3-by-3 matrix with rows (1, 2, 6), (0, -4, 8), and (3, 1, 2), again you just look at the diagonal entries, which are 1, -4, and 2, add those up, and you get -1. So that's the trace of that particular 3-by-3 matrix.

Okay, so at first glance the trace seems like a really arbitrary and silly function. Why would adding up the diagonal entries of a matrix be useful? The answer is basically the following theorem, which tells us why the trace is interesting. Remember that matrix multiplication in general is not commutative: AB is usually not the same as BA. But if you put a trace around things, then you do get the same answer. In other words, if you have any two matrices A and B such that both of the products AB and BA make sense (A and B don't have to be square themselves, just as long as those products are each square), then tr(AB) = tr(BA). This is really nice because it lets us treat matrix multiplication as if it were commutative in some situations, even though it's not actually commutative.

Okay, so first off, let's see how we prove this theorem. Where does it come from? Well, you just work from the definition of everything. We want to compute the trace of AB and the trace of BA, so let's write those two things down explicitly in terms of the entries of A and B. The way we do that is to first compute the diagonal entries of these two products, starting with the (i, i) entry of the matrix AB, which just means the i-th diagonal entry of AB. The way you compute that is just from your matrix multiplication rule.
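(The expressions being written on screen aren't captured in the transcript; presumably, taking A to be m-by-n and B to be n-by-m — sizes assumed here from the sums described below — they read as follows.)

$$[AB]_{ii} = \sum_{j=1}^{n} a_{ij}\, b_{ji} \quad (1 \le i \le m), \qquad [BA]_{jj} = \sum_{i=1}^{m} b_{ji}\, a_{ij} \quad (1 \le j \le n).$$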
Remember, the way matrix multiplication works is that you sum over matching inner subscripts, and the outer subscripts, which are both i here, come from whichever entry of the product you're trying to compute. And then similarly, if we want to compute the j-th diagonal entry of BA, the (j, j) entry, then this time we sum over i, the matching inner indices, and both outer indices are j, because that's the entry we want.

Okay, so those are the diagonal entries of these two matrices. First off, note that the diagonal entries themselves do not have to be the same: the i-th diagonal entry of AB does not in general equal the j-th diagonal entry of BA. These matrices might even have a different number of diagonal entries, since AB doesn't have to be the same size as BA. So the diagonal entries themselves can be different, but the trace, the sum of those diagonal entries, will still be the same.

And how do we show that? Well, now just add everything up. The trace of AB equals the sum over i of all of those (i, i) entries, so I just take that expression and throw a sum over i out in front. Similarly, the trace of BA is just the sum of all of its diagonal entries, so I take that expression and put a sum over j in front of it. And how do I show that these two expressions are equal to each other? I look at them really hard. One expression has a bunch of a_ij's times b_ji's; the other has b_ji's times a_ij's, and I can swap the order of each product, because those are just real or complex numbers, numbers from a field, that I'm multiplying together. One has a sum over j from 1 to n, and so does the other; one has a sum over i from 1 to m, and so does the other. They're the same number, just written in a different way: I've just swapped a bunch of things around. You can swap sums, you can swap the order of number multiplication, and so on. So these two expressions are equal to each other, and that's it. That's the whole proof.

Okay, so that's why we care about the trace. The trace is really nice because of that kind of commutativity property. It also has a bunch of other nice properties that you can prove straight from the definition, and these are actually a bit easier to see. For example, tr(A + B) = tr(A) + tr(B). And why is this? Well, it's just saying that if I add up two matrices and then add up the diagonal entries of the result, I get the same answer as if I add up the diagonal entries of one matrix, add up the diagonal entries of the other, and then add those two numbers together. And of course I do: you're just doing entrywise calculations in different orders, adding up the same numbers in a different order, so of course you get the same answer. Similarly, for the next one, if I scalar multiply a matrix and then add up its diagonal entries, I get the same thing as if I add up its diagonal entries and then scalar multiply. Well, yeah, of course you do. Again, you're just doing entrywise operations in different orders, so you get the same answer. Okay, and these two properties taken together tell us that the trace is a linear transformation.
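Here's a quick numerical sanity check of everything so far using NumPy (this isn't from the lecture; the specific matrices are just illustrative):

```python
import numpy as np

# The two examples from earlier: the trace is the sum of the diagonal entries.
print(np.trace(np.array([[1, 2],
                         [3, 4]])))          # 1 + 4 = 5
print(np.trace(np.array([[1,  2, 6],
                         [0, -4, 8],
                         [3,  1, 2]])))      # 1 + (-4) + 2 = -1

# tr(AB) = tr(BA), even for non-square A and B, as long as both products make sense.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))              # 2x3, so AB is 2x2 and BA is 3x3
B = rng.standard_normal((3, 2))              # 3x2
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))             # True

# The trace is linear: tr(C + D) = tr(C) + tr(D) and tr(cC) = c tr(C).
C = rng.standard_normal((3, 3))
D = rng.standard_normal((3, 3))
print(np.isclose(np.trace(C + D), np.trace(C) + np.trace(D)))   # True
print(np.isclose(np.trace(2.5 * C), 2.5 * np.trace(C)))         # True
```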
And when you combine that with the commutativity property, we're sort of saying that the trace is, in a sense, the nicest linear transformation acting on matrices; it's kind of the nicest linear transformation that takes in a matrix and spits out a number. All right, and then one final property of the trace that's nice to know is that the trace of the transpose of a matrix just equals the trace of the matrix itself. And again, of course it does: when you transpose a matrix, all you're doing is flipping its entries across the diagonal, so the diagonal entries don't change at all, and the sum of the diagonal entries doesn't change either. Okay, so the proofs of all three of those statements are straightforward, and that's why we're not dwelling on them.

Okay, for our immediate purposes in this particular class, the reason we care about the trace is that it lets us define an inner product on the space of m-by-n matrices. In fact, it lets us define the standard inner product on this space, and here's what it is: the standard inner product between two matrices A and B is the trace of A*B. And again, remember the A* notation: it means the conjugate transpose of A. So A* is the matrix that you get if you take A, transpose it, and then put a complex conjugate on every single entry; A* is the complex conjugate of A transpose. All right, and what this is saying is that if I take A*, multiply it by B, and then take the trace of the resulting product, that gives me an inner product on the space of matrices.

Okay, and we could prove this straight from the definition, right? Remember, a couple of pages ago we saw the definition of an inner product, and what we've been doing to show that a function actually is an inner product is just going through its three defining properties and showing that they hold. We could do that; that'd be one way to go. But I'm going to do it a slightly different way this time, because there's another way of looking at this inner product that's maybe a little bit more enlightening and that really shows what's going on here.

Okay, so I'm going to take the expression tr(A*B) and write it down really explicitly in terms of the entries of A and B, just like we did up above when we were proving that commutativity theorem. If you do that, a little bit of a calculation will show you that tr(A*B) equals the sum, over all i and j, of the complex conjugate of each entry a_ij of A times the corresponding entry b_ij of B. Okay, and now think about this a little bit. What we're doing is taking every entry from A and multiplying it by the corresponding entry from B. That's a lot like the dot product on R^n or C^n, right? Remember, for the dot product on C^n in particular, you take the complex conjugate of an entry from the first vector, multiply it by the corresponding entry of the second vector, and then add up all of those products. We're doing the exact same thing. The only difference here is that the entries are arranged in a matrix rather than in a one-dimensional list, rather than in a vector.
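And a similar sanity check for the standard inner product (again, not from the lecture; complex random matrices are used just so the conjugate actually matters): the trace formula, the entrywise sum, and the dot product of the "flattened-out" matrices all give the same number.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
B = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))

# The standard (Frobenius) inner product: <A, B> = tr(A* B),
# where A* is the conjugate transpose of A.
via_trace = np.trace(A.conj().T @ B)

# The same thing written out entrywise: the sum of conj(a_ij) * b_ij over all i, j.
via_entries = np.sum(A.conj() * B)

# The same thing once more, forgetting that A and B are matrices and treating them
# as vectors with mn entries (np.vdot conjugates its first argument).
via_dot = np.vdot(A.flatten(), B.flatten())

print(np.isclose(via_trace, via_entries))   # True
print(np.isclose(via_trace, via_dot))       # True
```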
Okay, so another way of looking at this expression is that it's just the usual dot product of the coordinate vector of A and the coordinate vector of B. A and B live in a vector space, the vector space of matrices, so if we fix some basis of that vector space, in particular the standard basis, then we can represent them as coordinate vectors, and that's all I'm doing here. If you take the dot product of those coordinate vectors, you get exactly this sum. Another way of thinking about it: just take those matrices A and B, forget that they're matrices, list all of their entries in a vector with mn entries instead, and then take the dot product of those vectors; this sum is what you get. And because we already know the dot product is an inner product, we find that this expression is one as well; it really is an inner product. You can just leech all of the properties off of the dot product on C^{mn} to see that this really is an inner product on M_{m,n}.

So that's a nice way of looking at it. Really, this inner product is the dot product on the space of matrices, so it's the standard inner product on that space. A couple of other names that you'll see for it if you look at other books and encounter it in the future: sometimes it's called the Frobenius inner product (we'll use that name occasionally in this course), and sometimes it's called the Hilbert-Schmidt inner product. We're not going to use that name in this course, but just be aware that if anyone uses that name, that's what they're talking about. Okay, so that will do it for today, class. I will see you all in lecture 19.