Transcript for:
Normal Distribution Applications

Okay, let's have a look at the last part of statistics for year two, which is the normal distribution (Applied Year 2, Chapter 3). As always, this is a summary of everything that has come up in the teaching videos, so I'm not going to go into loads of teaching here; I'm more just showing you the kind of skills that you would require.

The first thing we need to know how to do with the normal distribution is find probabilities and use the inverse function on the calculator. I use the graphics calculator, but I think it's pretty similar with the ClassWiz as well. So we have this Y which is normally distributed: the mean is 25 and the standard deviation is 3, because the notation goes mu, then sigma squared, that is, mu and then the variance. First we're going to find the probability that Y is less than 20. Make sure that you've got your mean as 25 and your standard deviation as 3. I want it to be less than 20, so the upper value will be 20; for the lower value just type in a very negative value, like -10,000 (even -1,000 would be enough). I'm going to give all my answers to four decimal places, and this is 0.0478.

The reason I've included the next one is to show that if the inequality sign is "or equal to", it really doesn't matter. We just put the lower as 24 and the upper as 27; it does not matter whether the "or equal to" is there or not, because it's a continuous distribution. To four decimal places this is 0.3781.

For this one, when Y is exactly equal to 24.5, the probability is zero. This is because it is a continuous distribution, and the probability of it taking an exact value of 24.5 is zero. If you put the lower as 24.5 and the upper as 24.5 you will also see that you get zero. Now these
three are going to require the inverse function on your calculator. My graphics calculator can do the left tail or the right tail, but I'm going to answer these questions as though they were for the ClassWiz as well. Let's see if I can do a better sketch of what a normal distribution looks like. We know that the mean comes in the middle, so in this case our mean is 25. In the first question we're trying to find the value of a such that the probability that Y is less than a is 0.8, and a sketch can be quite helpful for this. In other words, we are looking for the point where the area to the left is 0.8, so I'm expecting a value that is greater than 25. Jump to the inverse normal on your calculator. On mine I can say the tail is to the left; for the ClassWiz the tail is always to the left. The area is 0.8, and I've still got the same mu and sigma from before. I'll give this to three significant figures: a is 27.5 (the calculator shows 27.5248... if you want the longer version).

For the next part, the probability that Y is greater than b is 0.95. Let's use the sketch as a visual tool again; you may not need to do this. Y being greater than a value: there's where b is, and the area to the right of it is 0.95. If you have the ClassWiz you always need the area to the left, which is 0.05, and I'm expecting b to be less than 25. So I will change the area to 0.05; on a graphics calculator you could set the tail to the right and use 0.95. The answer is less than 25, as expected: it is 20.1 to three significant figures, or 20.065 if you want more decimal places.

This last one definitely calls for a diagram. It's maybe the most complicated of the three inverse questions, because we have like an inequality
where it's between two values. (I really struggle with drawing those lines smoothly; that's the best I can do.) The probability that Y is between 26 and c is 0.2. There's 25, there's 26, so c must be even bigger, and this area between them is 0.2. There are multiple ways of doing this. I'm going to start by finding the area to the left of 26 so that I can then add the two together. For the probability that Y is less than 26, go to the other part of the calculator, not the inverse but the Normal CD. The lower is -10,000 or some other very negative number, the upper is 26, and that area is 0.630558..., so about 0.63. Now I can change the question: instead of the probability that Y is between 26 and c, I can say the probability that Y is less than c is the total area, which is the 0.630558 plus the 0.2, so it becomes 0.830558. I can find c by going back to the inverse mode on the calculator. The area is still to the left, and I'm just going to type in 0.83058, which should be enough decimal places. We get that c is bigger than 26, which we were expecting: it is 27.9 to three significant figures, or 27.869... if you want more figures.

Okay, now we're going to talk about when you have a missing mu or sigma. If you don't know mu or sigma, we can't use the calculator to find a probability directly, because we always have to type in a mu and a sigma. So instead we standardize to the standard normal distribution, which is Z. Z has a mean of zero and a standard deviation of one, and Z is found by taking X, subtracting its mean and dividing by the standard deviation. What you do is use the inverse to find a z value, in other words a specific observation from this standardized normal
distribution. You then standardize your x value using this process of subtracting the mean and dividing by the standard deviation, set the two equal to get an equation, and then you solve. If both mu and sigma are unknown, you have to do it twice and solve the two equations simultaneously. So I've got one example where just one is missing and then one where both are missing. Let's see if we can put this into practice.

We don't know the mean here. The random variable X has a normal distribution with mean mu and variance 25. Be careful: sigma squared is 25, so we'll have to put sigma as 5, the square root, when we use the calculator. We've been told that the probability that X is less than 11 is 0.3, and we need to find the value of mu to one decimal place. The first thing we're going to do is use the inverse to find a z value. What I mean is, the probability that Z is less than something has got to be 0.3, and I will use the inverse to find that something. Remember the area is to the left, so we are looking for a z value that is definitely going to be negative, because the mean of Z is zero. So I'm in the inverse mode, with the area as 0.3, sigma as 1 and mu as 0, and unsurprisingly I get a negative: to four decimal places, which will be enough, it's -0.5244.

Now that we have this value, we standardize the x value (by the way, -0.5244 is the z value and 11 is the x value) and make the two equal. In other words, when you take 11, subtract the mean and divide by the standard deviation, which is the process of standardizing, you get the z value, -0.5244: that is, (11 - mu) / 5 = -0.5244. What we need to do now is solve this equation. On my other calculator I'm going to multiply up by 5, so I get -0.5244 multiplied by 5
and then I'll rearrange so the mu goes on to the right-hand side, and we just get 11 plus 5 times 0.5244. So 11 + 5 × 0.5244, and we get that mu is equal to 13.6 to one decimal place. If you wanted to, you could go back to the question and check that it's correct, so I'm going to do that very quickly. On my normal distribution I'll say that X is less than 11, sigma is 5 and the mean is 13.6, and we get 0.3015: not exactly 0.3, but that's because we did some rounding.

The next one says the weights of a particular type of tomato can be modeled by a normal distribution. I'm still going to use X for the weights of the tomatoes (I could have used W if I wanted to), with mu and sigma squared both unknown. We're given that 25% of the tomatoes weigh more than 40 g, so let's write that down: the probability of more than 40 g is 0.25. We also get told that 10% of them weigh less than 20 g, so the probability of less than 20 g is 0.1.

I'm going to do the exact same process as on the previous one. If you have the non-graphics calculator, I'd switch the first statement around so that we have the probability that X is less than 40 is 0.75; you wouldn't need to do that if you had the graphics calculator. So I'm going to the inverse on the calculator, with sigma as 1 and mu as 0. I want the area to be 0.75, and the z value is 0.6745 to four decimal places. Remember, when the x values are standardized they will be equal to these z values. For the second one I'll go straight to the probability for Z: I just switch the area to 0.1 and, unsurprisingly, it's negative: -1.2816 to four decimal places.

So let's start with the yellow one: 40 minus the mean, divided by the standard deviation, is 0.6745. A little bit of rearranging: I multiply up by sigma and then add on mu, giving 40 = 0.6745 sigma + mu. Then I'll do the same for the green one: 20 minus the mean, divided by the standard deviation, is -1.2816, so 20 = -1.2816 sigma + mu. These two statements in blue are two simultaneous equations, so we jump to the simultaneous equation solver on the calculator and put in the coefficients: 0.6745 for sigma, 1 for mu, and 40; then -1.2816 for sigma, 1 for mu, and 20. From my calculator, sigma is 10.2244 and mu is 33.1036.

Okay, I didn't even finish reading the question; it's so classic for me to just start doing it. The interesting thing is that the question actually says: calculate the proportion of tomatoes that weigh between 25 and 35 g, giving your answer to the nearest percent. So they didn't tell us to find mu and sigma, but obviously we can't answer a question about 25 and 35 g until we have found out what mu and sigma are. Now that we've found them, we can go into the calculator and find the probability that a tomato is between 25 and 35 g. So much jumping around in my calculator! Back to the stats mode, normal distribution: the lower is 25, the upper is 35, and for sigma I'm going to type in 10.2244. There's been a lot of rounding in this question, but at the end it's all getting rounded to
the nearest percentage point. Mu is 33.10, so this is 0.35956. If you got something slightly different it doesn't really matter, because it should still round to 36%. So 36% of the tomatoes are between 25 and 35 g.

Okay, approximating a binomial distribution is the next part. We can approximate a binomial using the normal distribution if n is large and p is close to 0.5. By n being large, I mean 5 or 10 is not large enough. It's difficult to give an exact cut-off; we normally say around 50, I think, but they would make it clear in the question. 100 is obviously large; 10 is not large enough. And p close to 0.5 means something like 0.4-something up to 0.5-something is probably going to be good. Then we can use a normal distribution: you find the mean by taking the n and p from the binomial and multiplying them, and to find the variance you take this np and also multiply by 1 minus p. But remember, the standard deviation is the square root of all of that, so make sure you're not putting np(1 - p) into your calculator for the standard deviation; you're putting in its square root. This is in the formula booklet.

Then there's something called continuity corrections. This is because a binomial distribution is discrete whereas a normal distribution is continuous, so because we're going from a discrete to a continuous distribution we need to do these continuity corrections. There are plenty of ways of doing this; this is my foolproof method that always works. You rewrite any binomial probabilities using the "greater than or equal to" or "less than or equal to" signs, and then you enlarge the range by 0.5. You'll see what I mean when we do an example.

So let's have a look at this one. X has a binomial distribution with 200 as n and 0.55 as p. Using a normal approximation, find these
probabilities. I'll start by setting up my approximation. Y is the letter we usually use for the normal and X is usually what we use for the binomial, but you can see I've been using X for the normal as well, so it's not something we need to be too worried about. We know that the mean is np and the variance is np(1 - p). So n times p is 200 times 0.55, which is 110. I keep that 110 on my calculator and times it by 1 minus 0.55, which is 0.45, and this gives us 49.5. Do make sure that when you put sigma in your calculator you use the square root of 49.5; I always write that separately to remind me to type it in correctly.

So let's do some continuity corrections. The first one is the probability that X is less than 113. Written with the "or equal to" sign, that's saying X is less than or equal to 112. Now, to turn it into the normal distribution, I enlarge the range by 0.5, so the 112 becomes 112.5. On my calculator I put a very negative value, -10,000, as the lower and 112.5 as the upper (I was in completely the wrong mode, so let me get that sorted); the mean is 110 and sigma is the square root of 49.5, and I get 0.6388 to four decimal places.

Part B: it's between 111 and 115. Let's rewrite this: greater than 111 is actually greater than or equal to 112, and the other end is already in that form. When I change it to Y I make the range bigger. The lower end is at 112 with a greater-than sign, so I go down to 111.5, and at the top I go up to 115.5. Notice how I've made the range bigger by 0.5 down here and 0.5 up here. So on the calculator I'll switch the lower to 111.5 and the upper to 115.5, and I
get 0.1984 to four decimal places.

Then our last one, part C: it's not got any inequalities, it just has an equals sign. When I switch it I make it bigger by 0.5 in each direction, so I go down to 108.5 and up to 109.5, because this is a continuous representation of a discrete thing. So that's 108.5 to 109.5, and that's a small probability of 0.0561 to four decimal places.

I'm also going to do this in a worded context, and then I'll talk about this exam tip afterwards, because it's not directly related to this question. An archer hits the bullseye with probability 0.45. In a training session they shoot 300 arrows. Explain why a normal distribution would be a suitable approximation to model this scenario. We can say here that n is large and p is close to 0.5, and it's obviously a binomial distribution. Then it says: using a suitable approximation, find the probability that more than half of the arrows hit the bullseye. For the binomial distribution, n is 300 and the probability is 0.45. When I change this to a normal distribution, I do 300 times 0.45 to get 135 for the mean, and then I multiply by 0.55 for the 1 minus p part of np(1 - p), which gives me 74.25. Again I'm reminding myself to put sigma in as the square root of 74.25. The question says more than half of the arrows hit the bullseye, so that's the probability that X is more than 150. I change that to an "or equal to": X is greater than or equal to 151. And now, changing into the Y language, I increase the range: if X is greater than or equal to 151, I say that Y is greater than 150.5. So on the calculator the lower is 150.5, the upper is
just a big value, sigma is the square root of 74.25 and mu is 135, and we get 0.0360 to four decimal places. So it's very unlikely that they get more than half; with a 0.45 chance on each of 300 arrows, hitting the bullseye with more than half of them would be quite unusual. Of course you could do this with the binomial directly, but the question is forcing us to use an approximation. That's because the binomial used to be quite difficult to calculate for big values of n, whereas the normal was easier; now, though, calculators are pretty powerful.

Exam tip: there will often be multiple skills within one exam question for the normal distribution, and identifying which skill you're using is very important. It might even get blended with a binomial question. So this is not how they're going to come up in the exam; these are just the skills that you need for the normal distribution.

We're going to finish off with hypothesis testing for the mean, and then that's all of stats year two done. Here is testing for the mean in a nutshell. We assume the mean is as stated: that the mean for the population is what is told to us in the question. Using this assumption, we calculate the probability of our observed sample mean (which I'll talk about in a second), or something more extreme, occurring. If this probability is very small, lower than the significance level, it gets us thinking maybe our original assumption wasn't true. If the probability is not that small, then we have no reason to doubt our assumption.

So how do we actually calculate the probability for our sample mean? If X has a normal distribution like this, and we take a sample of size n from X (the slide should say "from X", not "from Y", so I'll make sure that's changed in the PDF), then the distribution of the sample mean has the same mean, but the variance changes: the variance is divided by the sample size. So this is weird: we're talking about a normal distribution of the means of the
samples taken from the distribution. Now, we've got some different possibilities for the hypotheses. The null hypothesis is that the mean is just what's told to us in the question. If the alternative hypothesis is that we think the mean is less than that, or greater than that, then it's a one-tailed test. If the alternative hypothesis is just that the mean is not what was told to us in the question, then it's a two-tailed test, and we remember that means half the significance level in each tail.

Some notation: X is the original distribution. Capital X bar is the distribution of the means of samples of size n. So it's quite weird: we're no longer talking about the distribution of, say, the amount of drink in a can of Coca-Cola; we're now talking about the distribution of the sample means, which is something slightly different. Then small x bar is a specific mean taken from the distribution of all of the possible sample means.

You could do this whole thing using critical regions, but as far as I'm concerned you don't need to know how to do critical regions explicitly; judging from mark schemes, you can pretty much always do it the way I'm doing this question here.

So: the trading standards agency believes that a fizzy drinks producer has been misleading its customers by filling cans with less than the stated volume on the label of 330 milliliters. An inspector is sent to investigate, takes a sample of 30 cans and carefully measures the volume of drink in each can, finding that the mean value is 327.5 milliliters. Given that the standard deviation of the volume of drink in a can is 8 milliliters, carry out a suitable test to assess the trading standards agency's belief that the producer is underfilling cans. State your hypotheses clearly and use a 5% significance level. In this question, the null hypothesis is that the company is not lying and that the average
amount of liquid in each can is 330, so mu equals 330. The alternative hypothesis is the trading standards agency's: they believe that the cans are being filled with less than that amount, so it's that mu is less than 330. What we're going to do is assume that the null hypothesis is true, so we assume that mu equals 330. Should we go with X or Y? It doesn't really matter; we'll go with Y for this one. Y has a normal distribution with a mean of 330 and a standard deviation of 8, which was given in the question, where Y is the amount of drink per can.

Now we're going to calculate the probability of our observed sample mean occurring, which means we need to come up with the distribution for the sample mean. Y bar has a normal distribution with 330 for the mean, but the variance changes: we're taking a sample of 30 cans, so n is 30, and I divide the variance, 8 squared, by 30. Careful when you put this in your calculator: for sigma I take the square root of the numerator and the square root of the denominator, so 8 over the square root of 30. I'm going to calculate the probability that our observed sample mean, or something more extreme, occurs; in other words, what's the probability that they got a sample mean of 327.5 ml or less? Because if it was lower than 327.5 ml, that would also make us think that the cans have been underfilled. So on the calculator the lower is something like -10,000, the upper is 327.5, sigma is 8 divided by the square root of 30, and the mean we have assumed is 330. This probability is 0.0435, which is less than the significance level of 0.05. Because it's less than 0.05, we're in this version: hmm, this is very unlikely; it gets us
thinking maybe our assumption was not true. So this is actually supporting the trading standards agency's belief that the producer is underfilling cans. I can say: hence we have evidence to reject H0. And we always put it in context, otherwise we don't get the mark: the agency's belief that the producer is underfilling cans is supported, because there's only about a 4% chance that this would have occurred with the mean still being 330. So that makes them go, hmm, we might need to do a bit more investigation.

This next one I've done with no context, because I think it's quite interesting; maybe it feels easier, maybe it feels harder. A random sample of size 10 is taken from a population X which has a normal distribution with a mean of 42 and a standard deviation of 2.5. Given that x bar, the particular observation, is 44, test the hypotheses below at the 1% significance level. (Remember, alpha is what we often use for the significance level.) The hypotheses are that mu is 42, and the alternative is that mu is not 42. This immediately tells us it's a two-tailed test, so I'm going to halve the significance level: half of 0.01 is 0.005, and this is the one I'm going to use.

They've said that x bar is 44, so I'd better quickly come up with the distribution of X bar. It has the same mean, 42, and the variance is 2.5 squared divided by 10, so that when I type the sigma into the calculator it will be the square root of the numerator over the square root of the denominator. I'm not going to write out the extra language about assuming the null hypothesis this time; let's get straight in with the calculation. We want the probability for our X bar. Now 44 is bigger than the mean, so we're going to say 44 or even bigger, because obviously if something is like 45 or 46, we're going further and further away from the mean, so it
should give us evidence to reject H0. On the calculator, for the probability that X bar is greater than 44, the lower is 44, I put a big upper value, sigma is 2.5 divided by the square root of 10 and mu is 42. It comes up in standard form as 5.706 × 10^-3, which is 0.005706. Compared to the halved significance level of 0.005, it is actually bigger, so we don't have enough evidence. It's very, very close, so it almost rejects, but it's not enough evidence for us to reject. So I can say: hence there is not enough evidence to reject H0; we would still believe the mean to be 42.

Now, I did say you could use critical regions for this. What I mean is you would do something like: the probability that X bar is greater than a has got to be equal to 0.005. Then if 44 is bigger than that value of a, it's in the critical region, so we would reject H0; if 44 is less than that value, it's not in the critical region, so we would not reject H0. So you could use critical regions if you want to, but I find the method I've used is pretty foolproof and always works.

This has been a long video, because the normal distribution is a big topic and there are a lot of different things here. But I hope that by splitting it up into the different skills, you'll feel a little bit more equipped to work out what each question is asking you to do. Well done for studying all of statistics now, year one and year two. In the next videos we'll be doing some stuff with mechanics, and I hope to see you in one of those videos soon.
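
For anyone who wants to check the final two-tailed example away from the calculator, here is a minimal sketch in Python. It is not part of the video: it assumes Python's standard-library statistics.NormalDist stands in for the calculator's Normal CD and inverse normal modes, and the variable names are only illustrative. It works out both the tail probability used in the video and the upper critical value mentioned at the end, using the numbers from the question (mean 42, standard deviation 2.5, sample size 10, observed mean 44, 1% significance level).

```python
from math import sqrt
from statistics import NormalDist

# Numbers from the final worked example (names are illustrative only).
mu0 = 42.0      # population mean assumed under H0
sigma = 2.5     # population standard deviation
n = 10          # sample size
x_bar = 44.0    # observed sample mean
alpha = 0.01    # significance level for the two-tailed test

# Under H0 the sample mean is normal with the same mean
# and standard deviation sigma / sqrt(n).
sample_mean_dist = NormalDist(mu=mu0, sigma=sigma / sqrt(n))

# Probability of a sample mean of 44 or something more extreme (upper tail).
p_upper = 1 - sample_mean_dist.cdf(x_bar)

# Upper critical value a such that P(X bar > a) = alpha / 2.
critical_value = sample_mean_dist.inv_cdf(1 - alpha / 2)

# Two-tailed test: compare the tail probability with half of alpha,
# or equivalently compare the observation with the critical value.
reject_by_probability = p_upper < alpha / 2
reject_by_critical_region = x_bar > critical_value

print(f"P(X bar >= {x_bar}) = {p_upper:.6f}")
print(f"upper critical value = {critical_value:.3f}")
if reject_by_probability:
    print("evidence to reject H0")
else:
    print("not enough evidence to reject H0")
```

Both routes should agree: the tail probability comes out at roughly 0.0057, just above 0.005, and the upper critical value at roughly 44.04, just above the observed 44, so H0 is not rejected either way.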