Transcript for:
Understanding Sample Means and Distributions

In this video we're going to be exploring sample means. We're going to be covering all of this. It's a lot of work; it's 8 hours' worth, apparently, but it's not going to take us 8 hours. At the start of this video it's going to feel a little bit wishy-washy, a little bit airy-fairy, because we've got to do things like "examine" and "simulate". Don't be tempted to fast-forward to the end of the video, because you are going to need this stuff. Having seen external exams in the past, you need to be able to think about these things; they put them in multiple-choice questions a lot.

All right, let's take a look at it. What I've done here is simulate a population. One person is 160 centimeters tall, another person is 180 centimeters tall, and I've done this all the way down. There are exactly 600 people in this town. Let's just number them: person number one, person number two, person number three. Now let's take a look at what the distribution of these heights looks like. It shouldn't surprise you to see that the heights of the population are approximately normally distributed; you've learned about that in Maths Methods. So far, then, we have a population that is normally distributed.

Now what I'm going to do is take a sample, a random sample, of people from this population. Here's my random sample. I'm stressing that word "random" because it's important. I've taken 10 people, and what I can do is find the average height of those 10 people. There's my average height: 144.5. We call this average the sample mean. "Mean" is just another word for average, and this is a sample of our population.

There's no reason we couldn't take another sample. So this was sample one, and now this is sample two: just ten random people from the same population. What we can see is that the sample means are different. This sample mean is 144.5; this sample mean is 145.8. Of course, I could do that a whole bunch more times, and you can see that I have, a
whole bunch more times. I did it 200 times. Hopefully you're getting the gist that the sample mean is something that we're really interested in: 144.5, 145.8, 145.4, 148.6, one of those is 151.5, another 152.7, but we can see that they're all somewhere in the 145 to 150-ish area.

Now we can create a distribution of the sample means. So this is my distribution of the sample means. Look at the shape of it: it's kind of a normal distribution. I might just make it refresh a couple of times, because these random samples are redrawn every single time I refresh my browser, and you can see that, oh, that's a normal distribution. The sample mean is approximately normally distributed. We get some really high values and some really low values, but we get this bunching in the middle here. We'll do it a few more times, just to see that it is roughly normally distributed each time we do it.

Now, I'm doing samples of size 10, but I could do larger samples as well, and that's exactly what I've done here. Instead of having 10 people, I've chosen a sample size of 30, and you can see this is looking much more like a normal distribution than the previous examples. Let's just run the simulation a couple of times: all very, very normal-looking distributions here.

There's something else that's interesting about this graph as well. Look at the size of the spread of this normal distribution: 144.6 to 145.6 is right at one end here, and 153 to 154 is at the other.
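
A minimal Python sketch of the simulation described so far, for anyone who wants to rerun it outside the spreadsheet. It is an illustration, not the presenter's actual spreadsheet: the population is assumed to be 600 heights drawn from a normal distribution with mean 150 and standard deviation 10 (the values stated later in the video), samples are assumed to be drawn without replacement, and the names `simulate_sample_means`, `means_10` and `means_30` are hypothetical.

```python
import random
import statistics


def simulate_sample_means(population, sample_size, num_samples=200, seed=0):
    """Draw num_samples random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))  # no replacement
        for _ in range(num_samples)
    ]


# Assumed town: 600 heights, roughly normal, mean 150 cm, SD 10 cm.
town_rng = random.Random(42)
population = [town_rng.gauss(150, 10) for _ in range(600)]

means_10 = simulate_sample_means(population, sample_size=10, seed=1)
means_30 = simulate_sample_means(population, sample_size=30, seed=2)

for label, means in (("n = 10", means_10), ("n = 30", means_30)):
    print(
        f"{label}: lowest sample mean {min(means):.1f}, "
        f"highest {max(means):.1f}, "
        f"SD of sample means {statistics.stdev(means):.2f}"
    )
```

Changing the seeds plays the same role as refreshing the browser in the video: the individual sample means move around, but the samples of size 30 should stay more tightly bunched than the samples of size 10.
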
Let's compare that to the samples of size 10. The lower value is something like 142.6, and the upper value is all the way up at 157.6, as opposed to 154.6. This one is much less neat in terms of normality, and it's also far more spread out, because my sample size wasn't as large, so each sample isn't as representative of my actual population. Compare that with this one here, where you can see we have a much neater normal distribution. As I said, we're getting a feel for this at the moment; we're going to write all of this down and formalize it in a minute.

Now, the sample means are all over the place a little bit: 150, 151, 149. I wonder what the actual mean of my town of 600 people is, if I did a census, if I measured them all. You probably could have guessed this: my population mean is 150. And you can see my sample means (let's get that graph back up here) are really centering right around 150. The mean of my sample means is this number here, 149.886. So the mean of my sample means is approximately the mean of my actual population. That just feels like it makes sense: if you're taking a bunch of samples of a population and you average them all out, you would expect to get the average of the actual population.

So now we've pretty much done exactly what this dot point says: examine the concept of the sample mean, x bar, as a random variable whose value varies between samples. A sample mean is a random variable. It changes depending on the roll of the dice, so to speak (the sample that you take), and it varies across samples. You take a sample, you'll get one sample mean; you take another sample, you'll get another sample mean. "Whose value varies between samples, where X is a random variable with mean mu and standard deviation sigma": not much more to say there, and no real notes to take, because we're examining something.

Let's move on to our second dot point. We're going to do this dot point in stages. First bit: simulate repeated random sampling from a variety of distributions. Now, I've already simulated random sampling from this
distribution. This distribution is a normal distribution, and when I took my sample means from that normal distribution, I got a normal distribution. What would happen, though, if the distribution of the population was not normal?

What I've done here is, instead of creating a town of 600 people, I've created a school of 600 people. Let's look at what the population of the school looks like: grade ones, grade twos, grade threes, grade fours, grade fives. You can see it's not a normal distribution; it's more like a uniform distribution. Now, when I take my sample means here, you can see what you would expect: the sample mean is 5.7 right here, and then 6.77, 6.2, 5.8, 6.9. That makes sense; that's around what our mean year level is going to be.

I wonder what the distribution of the random variable, the sample mean, is. It might surprise you. Hey, what's this? It's a normal distribution, normally distributed around the mean of the population, this value here. That's maybe surprising to you, but also maybe not so surprising. If we take a sample from this, we would expect the sample mean to be pretty close to the population mean: we get some year twos, we get some year fours, we get some year sixes. Over the long run you would mostly expect to get these representative samples, but also, over the long run, in very rare cases you would accidentally get almost all year ones or almost all year 12s. So even if the population is not a normal distribution, the sample mean distribution will be approximately a normal distribution.

But that's only two: we did a normal population and we did a uniform population. I've done one more population here, which I'm calling a bimodal population. The best I can come up with is that it's basketballers: able-bodied basketballers, who are around 175 centimeters tall, and wheelchair basketballers, who, measured sitting in their wheelchairs, are around 125 centimeters tall. So half of them are wheelchair
basketballers and half of them are able-bodied basketballers. Now, when we take our sample mean distribution, what's that going to look like? Let's move this one over here. Sample mean distribution: oh, it's also going to be normally distributed. This one is a funny one, because we've got people that are 120 centimeters tall and 130 centimeters tall, and we've got people that are 175 centimeters tall, but the mean height of our population, a height that no one actually is, is approximately 154 centimeters. So that is about what our sample means end up being. Only a very small number of our sample means look like an average of 120, because the samples include some taller basketballers, and only a small number have averages of about this.

So what does all this mean? The dot point says that we should simulate repeated random sampling from a variety of distributions to illustrate properties of the distribution of x bar across samples of a fixed size n. What are those properties that they want us to talk about? There are three properties, and we've just talked about one: its approximate normality if n is large. What do they mean by "if n is large"? n is my sample size. I've been calling this approximately normal, and it is approximately normal, but the larger my sample size, the closer to normal it would be. We've shown that before, going from a sample size of 10, where we have a roughly normal distribution, to a sample size of 30, where the distribution becomes more closely normal.

So I guess we can tick off this bit as well: "and a range of sample sizes". Simulate repeated random sampling from a variety of distributions: we've done that with our three different populations. And a range of sample sizes: we looked at a sample size of 10 and we looked at a sample size of 30.
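
The two non-normal populations can be sketched in Python in the same spirit. Everything specific here is an assumption for illustration: year levels are taken to run from 1 to 12 with equal chance, each group of basketballers is given a spread of 5 cm around its typical height, and the helper names `sample_means` and `share_within_one_sd` are hypothetical. As a rough stand-in for looking at the histogram, the sketch reports the share of sample means lying within one standard deviation of their own mean, which is about 68% for a normal distribution.

```python
import random
import statistics


def sample_means(population, sample_size, num_samples, rng):
    """Mean of each of num_samples random samples (drawn without replacement)."""
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]


def share_within_one_sd(values):
    """Fraction of values within one standard deviation of their own mean."""
    centre = statistics.mean(values)
    spread = statistics.stdev(values)
    return sum(1 for v in values if abs(v - centre) <= spread) / len(values)


rng = random.Random(7)

# Assumed school: 600 students, year levels 1 to 12 equally likely (uniform).
school = [rng.randint(1, 12) for _ in range(600)]

# Assumed bimodal group: 300 heights near 175 cm and 300 near 125 cm.
basketballers = [rng.gauss(175, 5) for _ in range(300)] + [
    rng.gauss(125, 5) for _ in range(300)
]

results = {}
for name, population in (("school", school), ("basketballers", basketballers)):
    means = sample_means(population, sample_size=30, num_samples=200, rng=rng)
    results[name] = (
        statistics.mean(population),   # population mean
        statistics.mean(means),        # mean of the sample means
        share_within_one_sd(means),    # near 0.68 if roughly normal
    )
    print(name, [round(value, 3) for value in results[name]])
```

Neither population looks anything like a bell curve, yet in both cases the sample means should cluster around the population mean with roughly the 68% share expected of a normal distribution.
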
Now, we've also illustrated properties, but the only property we've illustrated at the moment is its approximate normality if n is large. We need to do this part: "including its mean and its standard deviation".

The mean one is simple. The sample means are 145, 153, 148.7, and if we take the average of all of the sample means, we get 149.985, which is almost exactly 150. We can say that the mean of the sample means is the mean of the population, and you would expect that. It also holds for our year level one: the expected mean, the mean of the actual population, is 6.44333, and the value we actually got was 6.664. You can see that they're very close together. We're dealing with probability here, so any one simulation only gets close, but we can say that the mean of the sample means is equal to the mean of the population. That means we can tick this off.

Finally, we've got to do a little bit of a calculation. We're back here at my population of people's heights, with our sample means here. This is a normally distributed population, and we can talk about its mean, which we know is 150, but we should also talk about its standard deviation. The standard deviation of this population is 10, so you might be tempted to say that the standard deviation of the sample means is also 10. Well, let's look at that. The standard deviation of the sample means is not 10.
It's only 1.7691, and you might again feel that that makes sense. The distribution of your population counts each individual, so you're going to end up with a big spread of people, from a height of 115 all the way up to 182.2. But when you take a sample of 30 people and you average them out, in that sample you're going to get some short people and some tall people, and when you average them out, that distribution is going to squeeze in narrower than the distribution of the population itself. So the population had a standard deviation of 10, and the standard deviation of the sample means is much, much smaller.

That standard deviation of the sample means is going to change based on how large you make your sample. If you sample 100 people, it's going to become smaller again. If you sampled only 10 people, like I did over here, then that standard deviation is going to be higher. In this one, where I only sampled 10 people, the standard deviation of my sample means was 3.1, but in the example where I sampled 30 people, instead of 3.1 the standard deviation was only 1.9.

So let's go back and look at the formal formula for that. The standard deviation of our random variable, the sample mean, is going to be equal to the standard deviation of the population over the square root of the number of people we sampled. In the particular case that we just looked at, the population's standard deviation was 10 and we took a sample of 30 people, so we can put those numbers into that formula right there: 10 over the square root of 30. We can come up with a number for that, and that number is about 1.83. And you can see that as we increase the sample size, we're going to be increasing the size of that denominator, which means that we're going to further decrease that standard
deviation. It might seem like we're all over the place here, but let's look at the dot point and I think we'll know what we're talking about. If we simulate repeated random sampling from a variety of distributions, whether the population is a normal distribution, a uniform distribution or a bimodal distribution, we know that the shape of the sample mean distribution is going to be approximately normal if n is large enough. If we use a range of sample sizes, the larger the sample size, the smaller the standard deviation is going to be, and the more likely we are to land close to the mean of the actual population.

That brings us to our last dot point, and we're going to be able to knock this one over faster than the other two. Simulate repeated random sampling from a variety of distributions (we've done that) and a range of sample sizes (we've done that) to illustrate the approximate standard normality of this formula here for large samples, n greater than 30, where s is the sample standard deviation. We've actually done most of that. What we've done can be read as this: simulate repeated random sampling from a variety of distributions and a range of sample sizes to illustrate the approximate normality of the random variable x bar, the sample mean, for large samples, n greater than 30. So the bits that we need to add into our picture are this word here and this bit here. We're not just looking for approximate normality anymore; we're looking for approximate standard normality, and you know that a standard normal distribution has a mean of 0 and a standard deviation of 1.

So let's go back to one of our spreadsheets. We should be very familiar with this one now: we've got a population, and we've got a sample size of 30.
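
For reference, the spoken results can be written out in symbols. This is a reconstruction from the wording of the transcript and not a copy of the on-screen notes: mu and sigma are taken to be the population mean and standard deviation, n the sample size, and s the sample standard deviation mentioned in the dot point. The video's spreadsheet uses the known population value of 10 in place of s.

```latex
% Mean and standard deviation of the sample mean
\[
  E(\bar{X}) = \mu,
  \qquad
  \mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}}
\]

% Worked case from the video: sigma = 10, n = 30
\[
  \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{30}} \approx 1.826
\]

% Standardized sample mean, approximately standard normal for large n
\[
  \frac{\bar{X} - \mu}{s / \sqrt{n}}
\]
```
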
Okay, now we need to use that formula. This is not a formula you need to memorize or anything, so don't worry about it; it's just about seeing how it works. This formula should remind you of standardization from Maths Methods: we take the sample mean, subtract the actual population mean from it, and then divide by the standard deviation of the sample means, which is calculated as 10 divided by the square root of the sample size. Again, you don't need to worry too much about this. You're going to see a picture, and that's what matters, this picture that's going to appear in a minute.

All right, we have this. What we've done is standardize the sample mean using the mean and the standard deviation. We stretch that across all 200 of our samples (sorry about that, there we go) and what we're going to get is a nice, neat little graph. I'll have to insert it. This is the graph we get, and you can see that it's approximately a standard normal distribution: standard as in a mean of zero. I don't know if you can see that very small number, negative 0.06. So there's our middle value, which is about zero, and you can see that it's stretching out to negative 2.66 and positive 3.05. That's what you would expect of a standard normal distribution: you would expect almost all of it to sit between negative three standard deviations and positive three standard deviations.

So that's that dot point covered. By doing that, we've shown that simulating repeated random sampling from a variety of distributions and a range of sample sizes illustrates the approximate standard normality of this formula here, which you've met before in Maths Methods when you're standardizing things. Going to a standard normal distribution: that's where we end up.

All right, that's 20 minutes long, and we've covered all of that stuff. What you really need to walk away with is a sense of what's happening when you're taking sample means from a population: you're ending up with that nice normal distribution. There are a couple of things that are going to come after this video, where we sort
of answer some questions and do some mathematics. I am sorry that we've talked about all this at great length, but I do hope that seeing those simulations has sort of ticked this box for you.
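
The standardization step from the last dot point can also be sketched in Python. As before, this is an assumed reconstruction and not the presenter's spreadsheet: the town is simulated as 600 heights with mean 150 and standard deviation 10, each of the 200 samples has 30 people drawn without replacement, and the variable names are hypothetical. Following the video, the population standard deviation is used in the denominator where the dot point has the sample standard deviation s.

```python
import math
import random
import statistics

rng = random.Random(2024)

# Assumed town: 600 heights, roughly normal, mean 150 cm, SD 10 cm.
population = [rng.gauss(150, 10) for _ in range(600)]
mu = statistics.mean(population)       # population mean
sigma = statistics.pstdev(population)  # population standard deviation

n = 30
standard_error = sigma / math.sqrt(n)  # SD of the sample mean, sigma / sqrt(n)

# Standardize each of 200 sample means: (x_bar - mu) / (sigma / sqrt(n)).
z_scores = []
for _ in range(200):
    sample = rng.sample(population, n)
    x_bar = statistics.mean(sample)
    z_scores.append((x_bar - mu) / standard_error)

z_mean = statistics.mean(z_scores)
z_sd = statistics.stdev(z_scores)

print(f"sigma / sqrt(n) = {standard_error:.3f}")
print(f"mean of standardized sample means: {z_mean:.3f}")
print(f"SD of standardized sample means:   {z_sd:.3f}")
print(f"range: {min(z_scores):.2f} to {max(z_scores):.2f}")
```

The standardized values should come out with a mean near 0 and a standard deviation near 1, with nearly all of them between about negative 3 and positive 3, which is the picture shown in the video. Because the samples are drawn without replacement from only 600 people, the standard deviation tends to land very slightly below 1.
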