Transcript for:
Understanding Skewness and Moments

So what is the topic of skew, and what is it going to involve? All right: skewness and moments. It turns out that to discuss skewness, we must also talk about moments. As we already know, the expected value and the variance of a random variable are particular cases of quantities known as moments of that variable. Moments are a quantitative measure of the shape of a function. Not "the moment." Aha, pun intended. We don't know how, but that's okay.

This concept is closely linked to physics. I'm quite fond of physics as a subject; I happen to think physics is a beautiful application of mathematics. So let's go over some general ways physics relates to this, in case you have seen these in a physics course. If a function represents the density of a body (those who somewhat remember Calc 2 may recall there's typically at least one example of such a density function there; if you were in my version of Calc 2, I didn't make a big deal of it, because it's really the focus of a physics course), then the zeroth moment is the total mass, the first moment divided by total mass is the center of mass, and the second moment is the rotational inertia. And what would be a famous example of a rotational inertia situation? Is there a classic device often used for discussing rotational inertia? Yes: a gyroscope, which is spelled almost exactly like "gyro," that delicious, delicious sandwich combining beef and lamb with some tzatziki sauce. You spin that sandwich around and around, and then you experience its rotational inertia as it resists changing orientation along its axis. I presume that's what a gyroscope is, or that it might be called a gyroscope, but it's spelled almost exactly like the sandwich. So, hey, why not?

Okay, so let's define these moments. We're going to define a more general version of a moment than what we've had occasion to use so far. The kth moment around (or "about") a point c of a PDF f(x) for a random variable X is given by one of two cases, discrete or continuous:

μ_k = Σ_i (x_i − c)^k · f(x_i)   (discrete case)

μ_k = ∫ (x − c)^k · f(x) dx   (continuous case)

The only difference is whether you use a discrete or a continuous presentation. If c = 0, we rewrite the formulas with μ_k′, where the prime doesn't mean derivative, just "different":

μ_k′ = Σ_i x_i^k · f(x_i)   or   μ_k′ = ∫ x^k · f(x) dx

These instances where c = 0 are what are referred to as the raw moments. Okay, so that is a thing, isn't it? That is a thing for sure. Now, if k were equal to zero, what would a raw moment be? With k = 0, the x_i (or the x) is raised to the zeroth power, which makes it just what value? One.
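To make the definitions concrete, here's a minimal numerical sketch in Python. The fair-die example and the helper name `moment` are my own illustration, not something from the lecture:

```python
# Sketch: the kth moment of a discrete PMF about a point c.

def moment(xs, probs, k, c=0.0):
    """kth moment about c: sum of (x_i - c)^k * f(x_i)."""
    return sum((x - c) ** k * p for x, p in zip(xs, probs))

# A fair six-sided die: f(x_i) = 1/6 for x_i = 1, ..., 6.
xs, probs = [1, 2, 3, 4, 5, 6], [1 / 6] * 6

print(moment(xs, probs, k=0))  # zeroth raw moment (c = 0): 1.0, the total probability
print(moment(xs, probs, k=1))  # first raw moment: 3.5, the expected value
```

The k = 0 output is the "it's really just one" observation above, and the k = 1 output previews where we're headed next.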
And if you're talking about the raw moment with k = 0, let's just look at the formula for the continuous case. It would be ∫ f(x) dx. What's that equal to? Well, remembering that f is the probability density function for a random variable, the whole integral is equal to what? One. Everybody agrees: the entire integral of a probability density function for a random variable is equal to one. Awesome. That's a good observation. Good job, us. High five.

Okay, another observation we can make is that E[X] = μ_1′ = ∫ x · f(x) dx. So the first raw moment is the expected value. Cool, we'd agree with that. Now, what is E[X²]? E[X²] = μ_2′. And E[X³] = μ_3′, etc., etc., etc. So when we previously defined moments, we were really defining the raw moments. And that's okay; those are good and relevant moments. We're just defining more general notions of moments right now, which is why we need this extra notation.

Well, if c = E[X], then μ_k is given by two formulas, one for the discrete case and one for the continuous case:

μ_k = Σ_i (x_i − E[X])^k · f(x_i)   (discrete case)

μ_k = ∫ (x − E[X])^k · f(x) dx   (continuous case)

We refer to this as the kth central moment of the random variable X. Well, it just so happens that the variance of X is equal to μ_2, the second central moment, because that's exactly the expected value of (X − E[X])². Cool. So this, again, generalizes some of those notions of moment we had previously seen, and that's a very good thing.

Now, there are ways of connecting central moments to raw moments, but we're not going to get into that, because it isn't essential for what we're covering today. In other words, the two concepts can be related. For one thing, hopefully it's kind of obvious that the central moments are just a shift of the raw moments: all we're doing is a translation by the mean.

Now, in probability and statistics theory, a standardized moment of a PDF is a moment, typically a higher-order central moment, that is normalized. Normalization here means division by a power of the standard deviation. Okay, so what's our extremely famous example of standardization? Let's call to mind our favorite z-score, which is z = (x − μ)/σ: x minus the mean, over the standard deviation. Whoa! That's the general process, and it turns out that specific example reflects what's true in general. So we can say the standardized moment of order k is given by ν_k = μ_k / σ^k, where σ is the standard deviation and μ_k is our good old-fashioned kth central moment. Awesome. Now, this kind of normalization means that standardized moments are dimensionless quantities: because we're dividing by whatever units are involved, they don't have a dimension.
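Continuing the numerical sketch, central moments are just the c = E[X] case, and standardizing is one extra division. Again, the die PMF is my own illustrative example:

```python
# Sketch: central and standardized moments for the fair-die PMF.
xs, probs = [1, 2, 3, 4, 5, 6], [1 / 6] * 6

def moment(k, c=0.0):
    """kth moment of the die PMF about c."""
    return sum((x - c) ** k * p for x, p in zip(xs, probs))

mean = moment(1)                    # first raw moment: E[X] = 3.5
var = moment(2, c=mean)             # second central moment = Var(X)
print(var, moment(2) - mean ** 2)   # ~2.9167 both ways: matches E[X^2] - (E[X])^2
sigma = var ** 0.5
print(moment(2, c=mean) / sigma ** 2)  # nu_2 = 1.0 exactly, for any distribution
```

That last line is a nice sanity check on the definition: the second standardized moment is always exactly one, since it's the variance divided by itself.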
Now, the third and fourth standardized moments are widely used in probability and statistics, but moments of orders higher than the third and fourth have extremely little value. At least at the moment. This is one of those situations where, when we make a comment like that in mathematics, we want to be sure to say "at the moment," because math is an ever-growing, ever-developing field. It's just that at the moment they don't have a practical purpose. They could, I don't know, 100 years from now, end up having an actual practical purpose.

The third standardized moment, ν_3, is often referred to as the skewness. Just to be clear, ν_3 = μ_3 / σ³. Skewness is a measure of the asymmetry of a PDF about its mean, and we typically denote it Skew(X). So that's one way of writing what ν_3 is: we just refer to it as the skew of X. Now, Skew(X) can be positive, it can be negative, it can be zero, or, and this is the exciting one, undefined. That one's quite fun.

Okay, so let's make some observations about the possibilities here. I'm going to draw sketches and simultaneously describe them. Imagine two side-by-side comparison charts, with two distributions: one is a wave leaning towards the right, and the other is a wave leaning towards the left. Now, for these two waves, you're going to imagine (wow, that pen really flipped) that the way you got them is that you started with a normal distribution, which is bell-shaped and symmetric, and then you tilted everything. By tilting everything, what would normally be the midline has tilted as well. If things were perfect, exact normal distributions, then everything is bell-shaped and symmetric, meaning evenly sided with respect to the mean. However, if you tilt everything to the right, then that center line curves off to the right as well; and if everything is tilted to the left, the center line curves off to the left. So if you still wanted to specify two halves of the graph, the two halves are now tilted. They're askew. Ooh, that's a good name. Is that the reason we call it skew, because things are askew? Yes. It turns out this is not a name that's distinct to mathematics; we're just recycling a word we'd use in normal English to say that things are askew. And to measure the amount of askewness, you have what's called the skew of the graph. (I'm presuming you know that referring to something as askew is a way of saying it's tilted.)

So when we have two tilted graphs like this, the amount of the values to the right differs from the amount to the left. When you tilt a bell curve, you're weighting stuff on one side, which means the other side looks thinner. That thinner, tapering side is what you'd reference as the tail, and it provides a visual means for determining which of the two kinds of skewness a distribution has.
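Here's a small sketch of how the sign of ν_3 picks up which way a distribution leans, using two made-up PMFs, one with a long thin tail stretching right and one stretching left. The numbers are mine, chosen only to illustrate:

```python
# Sketch: the sign of nu_3 tells you which way a PMF leans.

def skew(xs, probs):
    """nu_3 = mu_3 / sigma^3 for a discrete PMF."""
    m = sum(x * p for x, p in zip(xs, probs))
    mu = lambda k: sum((x - m) ** k * p for x, p in zip(xs, probs))
    return mu(3) / mu(2) ** 1.5

right_tailed = ([0, 1, 2, 3, 10], [0.4, 0.3, 0.2, 0.08, 0.02])
left_tailed = ([-10, 0, 1, 2, 3], [0.02, 0.08, 0.2, 0.3, 0.4])
print(skew(*right_tailed))   # positive: long thin tail stretching to the right
print(skew(*left_tailed))    # negative: long thin tail stretching to the left
```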
So when things taper more towards the left, we say Skew(X) < 0. And when there's more of a taper towards the right, we say Skew(X) > 0. So for a negative skew, the left tail is longer and the mass of the distribution is concentrated on the right of the figure; the distribution is said to be left-skewed, left-tailed, or skewed to the left. You can think of the skew direction as the side where the distribution is stretched out, if that's a helpful way of imagining it. For a positive skew, the right tail is longer, the mass of the distribution is concentrated on the left of the figure, and the distribution is said to be right-skewed, right-tailed, or skewed to the right.

Now, if a distribution is completely symmetric about its expected value, then the skewness equals zero. But the converse is not true in general, meaning what? An asymmetric distribution might have a skewness of zero, for instance if one of its tails is long and thin and the other is short but fat. So zero skewness does not guarantee symmetry.

Now, if a distribution has a finite expected value, which is a fairly reasonable assumption for a lot of different types of distributions, and it also has a finite standard deviation, then

Skew(X) = ν_3 = (E[X³] − 3·E[X]·σ² − (E[X])³) / σ³.

In other words, there's a single nice way of determining the skew for a distribution, so long as it has finite mean and finite standard deviation. That also means that for lots of different discrete distributions, we can find nice, succinct formulas for the skew. For instance, for the geometric distribution, we know there's a standard formula for the mean and for the standard deviation, so we could substitute those in and come up with a single nice formula for the skew. I'm not going to work that out, since it isn't super important for our purposes; what is important is knowing that a formula is available for discrete distributions. For continuous distributions, it becomes a matter of working things out for each particular distribution. If your distribution is a normal distribution, it's a plain old plug-things-into-the-formula computation.

So if, for instance, you happen to be talking about a standard normal distribution, that means μ = 0 and σ = 1, and we can investigate this ν_3 by looking at the formula. Okay, so what's E[X³] for the standard normal? That's the integral of x³ times the standard normal density; work it out (every σ that appears is actually a one), and the skewness ends up being exactly zero at the end. Cool. That's also because we happen to know that distribution is completely symmetric about its mean. Awesome.
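As a quick numerical check of that raw-moment formula on a distribution that is not symmetric, here's a sketch using the Exp(1) distribution and scipy for the integrals. This example is my own; the lecture only works the standard normal case:

```python
# Sketch: the raw-moment skew formula, checked on Exp(1) by numerical integration.
import numpy as np
from scipy.integrate import quad

pdf = lambda x: np.exp(-x)                         # Exp(1) density on [0, inf)
raw = lambda k: quad(lambda x: x ** k * pdf(x), 0, np.inf)[0]

m = raw(1)                                         # E[X] = 1
sigma2 = raw(2) - m ** 2                           # Var(X) = 1
skew = (raw(3) - 3 * m * sigma2 - m ** 3) / sigma2 ** 1.5
print(skew)                                        # ~2.0: positive, right-skewed
```

The value 2 agrees with the known skewness of the exponential distribution, and the positive sign matches its long right tail.

Okay, so now there's another value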
that is similar to the skew, which we want to talk about as well. The fourth standardized moment, ν_4, of a random variable X is called the kurtosis. And for those who are mega fans of Greek and Latin roots, this comes from the Greek kurtos, which means "curved." And no, I didn't just happen to automatically know that; I looked it up. (There are some I actually do know from memory.)

Kurtosis is a measure of the, and I'm going to invent a word here, "tailiness" of the PDF. I'm fairly certain tailiness is not an actual word; I'm spelling it T-A-I-L-I-N-E-S-S, and I don't think that's a word. But it's the amount of tail of a PDF: the value of the kurtosis describes the thickness of the distribution's tails.

If we calculate the kurtosis of a normal distribution, we find that its value equals three, no matter what parameters the distribution has. So that gives us a reference example to help us imagine what kurtosis is going to be: three is the standard, as it were, the magic number for a normal distribution. Most mathematicians, and as a result statisticians, prefer to compare things to the normal distribution, and so three is sometimes subtracted from the kurtosis value to standardize it. This is where many would say that

Kurt(X) = ν_4 − 3 = μ_4 / σ⁴ − 3,

which is called the excess kurtosis. It's the amount of extra beyond a normal distribution. (That fan is incredibly loud in this room, by the way. If someone told me we were secretly on a jet plane and that fan was the engine, I'd kind of believe it if I zoned out for a couple of moments.) So what is the excess kurtosis of a normal distribution? Zero. The excess kurtosis of a normal distribution is zero.

Now, distributions with zero excess kurtosis are called mesokurtic, and I've almost exclusively heard it pronounced "meso." I do enjoy some good miso soup with my sushi, but I'm pretty sure that's pronounced differently. The most famous example of a mesokurtic distribution is the normal distribution. Some other distributions can be mesokurtic: the binomial distribution can be mesokurtic, though it is not always; there are two possibilities for when it can be.

If Kurt(X) is positive, then we call the PDF leptokurtic, and lepto- means slender. So spelled out, it means "slender-tailed." Now, in terms of what it actually looks like graphically, a leptokurtic distribution has a fatter tail than the normal distribution. And we say, well, why is it called slender, then? The "slender" is used in a different sense there (it's commonly taken to describe the slim central peak rather than the tails). We'll look at some examples. The exponential distribution is an example of a leptokurtic distribution. What does an exponential distribution look like? It starts at a particular value, like one, and then decays down, down, down, approaching a horizontal asymptote at zero.

If Kurt(X) is less than zero, then we call the PDF platykurtic, and platy- means broad.
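Here's a quick numerical aside checking both claims: the normal distribution's kurtosis is 3 no matter the parameters, and the exponential is leptokurtic. The specific parameter values and helper are my own sketch, not the lecture's:

```python
# Sketch: kurtosis by numerical integration of the fourth central moment.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm, expon

def kurtosis(pdf, mean, sd, lo, hi):
    """nu_4 = fourth central moment divided by sigma^4."""
    mu4 = quad(lambda x: (x - mean) ** 4 * pdf(x), lo, hi)[0]
    return mu4 / sd ** 4

# Normal with arbitrarily chosen parameters: kurtosis ~3, excess ~0 (mesokurtic).
print(kurtosis(lambda x: norm.pdf(x, 5, 2), 5, 2, -np.inf, np.inf))
# Exp(1): mean 1, sd 1; kurtosis ~9, so excess ~6 > 0 (leptokurtic).
print(kurtosis(lambda x: expon.pdf(x), 1, 1, 0, np.inf))
```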
Now, if you want a really nice way of remembering that: can you think of an Australian animal whose common name shares that Greek root, one that could help us think of "broad"? What animal might help us remember platykurtic? Yes, a platypus. I would agree with that, most certainly. And of course, the most famous platypus is Perry, Perry the Platypus. From, wow, my mind is blank. What show is that? Oh yeah, Phineas and Ferb. Thank you.

So one example of a platykurtic distribution is a Bernoulli distribution with one half as the probability involved. Another example of a platykurtic distribution is the uniform distribution.

Now, there is also a raw-moment formula for this. So, Kurt, or pardon me, not Kurt: ν_4 is given by

ν_4 = (E[X⁴] − 4·E[X]·E[X³] + 6·(E[X])²·E[X²] − 3·(E[X])⁴) / σ⁴.

And what about the excess kurtosis? For the excess kurtosis, you just subtract three from that. (There's a small numerical sketch of this formula at the very end of the transcript.)

Okay, so what's the point of kurtosis and skew? The point of kurtosis and skew is gaining the ability to numerically describe the shapes of distributions without actually having to look at their graphs. That's the point of it. But it also gives us a comparison point for saying how different distributions are from the normal distribution, which again is our main distribution, the one we're going to reference a lot of the time.

So that's it. That's the subject of skew, and that's also the subject of kurtosis. And we've now learned three additional terms: mesokurtic, leptokurtic, and platykurtic. Fun phrases, by the way. Very fun phrases. And all of that came from just a little bit of a tweak on the notion of a moment, which, it turns out, we already knew about: we already knew about standard deviations and expected values, and for the purposes of this course we also already knew about the moment-generating function. That's really the raw-moment-generating function, and you can extend it to more general notions of moments, etc., etc., and it gives you more things.

Okay, so next time, what are we going to do? Well, next time we're going to continue from where we had left off and see from there. So have a good one. Enjoy, enjoy.
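One last numerical sketch, referenced above: the raw-moment kurtosis formula applied to the two platykurtic examples the lecture mentions, Bernoulli(1/2) and a uniform distribution. The check itself is my own, and I've used a discrete uniform PMF (a fair die) since the formula below is the discrete one:

```python
# Sketch: excess kurtosis via the raw-moment formula for a discrete PMF.

def excess_kurtosis(xs, probs):
    """mu_4 / sigma^4 - 3, with mu_4 expanded in raw moments."""
    E = lambda k: sum(x ** k * p for x, p in zip(xs, probs))
    m = E(1)
    sigma2 = E(2) - m ** 2
    mu4 = E(4) - 4 * m * E(3) + 6 * m ** 2 * E(2) - 3 * m ** 4
    return mu4 / sigma2 ** 2 - 3

print(excess_kurtosis([0, 1], [0.5, 0.5]))            # Bernoulli(1/2): -2.0
print(excess_kurtosis(list(range(1, 7)), [1/6] * 6))  # discrete uniform: ~ -1.27
```

Both come out negative, hence platykurtic. For comparison, the continuous uniform distribution has excess kurtosis of exactly −1.2, close to the discrete version shown here.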