Hello and welcome to the video. Today I'm going to teach you everything in A-level statistics in under 45 minutes. This video covers every topic you need and applies to all exam boards, although there are a few things you'll need to check against your own board. It includes some examples as well. I'll drop the completed notes in the description if you want to check them out, and if you're new around here I'd really appreciate it if you gave this video a thumbs up, subscribed, and shared it with a friend if it helps you out. It really helps me out. So let's get into it.

Data collection. I'm not going to read everything out to you, but the notes will be in the description so you can go through them at your own pace. There are different types of data: qualitative data, which is descriptive and falls into categories, and quantitative (numerical) data, which is values and numbers.

Sampling techniques. Sometimes we have the whole population and sometimes only a sample. In simple random sampling, each member of the population has an equal chance of being selected. That's really important: if some members are more likely to be selected than others, it isn't simple random sampling. Systematic sampling uses a sampling frame: if the data is numbered 1, 2, 3, 4 and so on, you randomly select a starting point and then take every kth item in the list, for example every fifth person. A stratified sample ensures that the subgroups, or strata, of a population are each adequately represented within the whole sample. The sample size for each subgroup is the size of the whole sample divided by the size of the whole population, multiplied by the size of the
subgroup. Quota sampling: the sample is selected to fill quotas based on specific criteria, for example age group. Convenience and opportunity sampling: for example, the first five people who enter the leisure centre, or teachers in a single primary school surveyed to find information about working in primary education across the UK. Self-selecting sample: people volunteer to take part in the survey, either remotely or in person.

Processing and representation. There are different ways of representing data. Qualitative data can be shown with pie charts, bar charts, compound or multiple bar charts, dot charts and pictograms, and the modal class is used as a summary measure. Numerical (quantitative) data can be represented using frequency diagrams, histograms, cumulative frequency graphs, and box and whisker plots.

Measures of central tendency are the mode, median and mean. If the mean is calculated from grouped data it will be an estimated mean, because we don't know how the data points are spread within each group, so we assume each value lies at the midpoint of its group (equivalently, that the values are evenly spread within it). Measures of spread are the range, interquartile range and standard deviation.

To find the quartiles, first put the data in order. If n is odd, find the median, which is a data point; in the example on screen it's 7. Take that 7 out, which leaves two halves of the data, then find the median of each half: those are your lower quartile and upper quartile. If n is even, again find the median; in this example it's 6, which isn't a data point. Split the data in half and find the median of each half: here that's 4.5, which is your lower quartile, and 8.5, which is your upper quartile.

For standard deviation, depending on your exam board you might be given these formulas, and on the whole you are. Please do check whether your board divides by n or by n − 1, because it differs between boards. After that it's just knowing how to apply them.
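As a rough check on three of the calculations in this part, here is a minimal Python sketch: the stratified-sampling allocation (subgroup size divided by population size, times the total sample size), the median-of-halves quartile method, and standard deviation with a choice of divisor, n or n − 1. The function names and the small data sets are made up for illustration, and rounding each stratum size is an assumption: if the rounded sizes don't add up to the sample size you would need to adjust them.

```python
import math


def stratified_allocation(strata_sizes, sample_size):
    """Sample size per stratum: (stratum size / population size) * sample size, rounded."""
    population = sum(strata_sizes.values())
    return {name: round(size / population * sample_size)
            for name, size in strata_sizes.items()}


def median(sorted_values):
    """Median of an already sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method.

    If n is odd, the median itself is left out of both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 == 1 else data[half:]
    return median(lower), median(data), median(upper)


def standard_deviation(values, sample=False):
    """Divide by n (sample=False) or by n - 1 (sample=True); check your exam board."""
    n = len(values)
    mean = sum(values) / n
    sxx = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sxx / (n - 1 if sample else n))


print(stratified_allocation({"Year 12": 120, "Year 13": 80}, 20))  # {'Year 12': 12, 'Year 13': 8}
print(quartiles([1, 3, 4, 5, 7, 8, 9, 10, 12]))  # odd n: (3.5, 7, 9.5)
print(quartiles([2, 4, 5, 5, 7, 8, 9, 11]))      # even n: (4.5, 6.0, 8.5)
print(standard_deviation([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```

The two quartile examples are chosen to mirror the on-screen ones: a median of 7 that is removed when n is odd, and a median of 6 that is not a data point when n is even.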
In the standard deviation formulas, x̄ is the mean, n is the sample size and x is each data point.

Bivariate data. "Bi" means two, so bivariate data is data with two variables, and we use it to investigate correlation between those two variables. The independent variable is usually plotted on the horizontal axis. A numerical measure of correlation can be calculated using Spearman's rank correlation coefficient or the product moment correlation coefficient, r. These lie between −1 and 1: perfect positive correlation gives 1, perfect negative correlation gives −1, and no correlation gives a value at or near 0. Take care when interpreting a correlation coefficient and always look at the scatter graph. You might have two distinct groups giving a misleading r value; you might have r close to zero even though there is a relationship, it's just not linear, like the quadratic in the middle graph; or there might be an outlier distorting the r value.

Cleaning the data. You need to remove outliers and anomalies, and there are two common rules. The first is to remove any value more than 1.5 times the interquartile range below the lower quartile or above the upper quartile. Say the upper quartile is 8 and the lower quartile is 3: the interquartile range is 5, and 1.5 × 5 = 7.5. Adding 7.5 to the upper quartile gives 15.5, so anything greater than 15.5 is removed, and subtracting 7.5 from the lower quartile gives −4.5, so anything less than −4.5 is removed. The second rule uses the standard deviation: remove anything more than two standard deviations above or below the mean. Say the standard deviation is 5 and the mean is 100; two standard deviations is 10, so we remove anything greater than 110 or less than 90.
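Both outlier rules and the product moment correlation coefficient can be sketched in a few lines of Python. The helper names are hypothetical; the 1.5 × IQR and 2 × standard deviation multipliers match the two rules just described, but a question may specify different ones. The last example uses y = x² on symmetric x values to show r = 0 even though there is an exact, non-linear relationship.

```python
import math


def iqr_fences(q1, q3, k=1.5):
    """Values below q1 - k*IQR or above q3 + k*IQR are treated as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def sd_fences(mean, sd, k=2):
    """Values more than k standard deviations from the mean are treated as outliers."""
    return mean - k * sd, mean + k * sd


def remove_outliers(values, lower, upper):
    """Keep only the values inside [lower, upper]."""
    return [x for x in values if lower <= x <= upper]


def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(iqr_fences(3, 8))   # (-4.5, 15.5)
print(sd_fences(100, 5))  # (90, 110)
print(remove_outliers([-5, 0, 10, 16], *iqr_fences(3, 8)))  # [0, 10]
print(pmcc([1, 2, 3, 4], [2, 4, 6, 8]))             # 1.0, perfect positive correlation
print(pmcc([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))     # 0.0, yet y = x^2 exactly
```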
Probability. An outcome is a result that can happen in an experiment, and the sample space is a list of all possible outcomes of an experiment. Independent events are events that don't affect each other: if one happens, it doesn't change the probability of the other happening. Mutually exclusive events are two events that cannot happen at the same time; for example, if I roll a dice once I can't get a 1 and a 6.

Some notation. A ∩ B (A intersect B) means A and B both happen; for independent events, P(A ∩ B) = P(A) × P(B). A ∪ B (A union B) means A or B or both happen; for any two events, P(A ∪ B) = P(A) + P(B) − P(A ∩ B). A′ (the complement of A) means A does not happen, and P(A′) = 1 − P(A). Mutually exclusive events have no overlap, so the probability that both happen is 0 and the probability that either happens is just the sum of their probabilities: P(A ∪ B) = P(A) + P(B).

Here's a sample space example, on the left. This is all about fractions and understanding what's going on. Find the probability of picking a female: there are 53 females out of 100 people, so it's 53/100, or 0.53. Picking a junior male: there are 15 junior males out of 100, so 0.15. Not picking a junior male: that's 1 − 0.15 = 0.85. Picking a junior first and then a senior when two members are selected at random: there are 35 juniors out of 100, and because I've already picked one junior, there are 65 seniors out of the remaining 99, so I multiply 35/100 by 65/99, and
that equals 91/396. (If the order doesn't matter, you also need the senior-then-junior case, which has the same probability, so the answer doubles to 91/198.)

Next example. On his way to work Josh goes through two sets of traffic lights. The probability that he has to stop at the first set is 0.7, the probability that he has to stop at the second set is 0.6, and we're assuming independence. Find the probability that he has to stop at only one of the sets of lights. We can ignore stop-stop and go-go; we want stop-go and go-stop. Stopping at the first and not the second is 0.7 × 0.4, and not stopping at the first but stopping at the second is 0.3 × 0.6, so we add those: 0.7 × 0.4 + 0.3 × 0.6 = 0.28 + 0.18 = 0.46. You can draw a tree diagram here if you feel more comfortable doing that, but you shouldn't need to. I'm just using the "and" and "or" rules: "and" means multiply, and because there are two options, "or" means add.

Conditional probability. When the outcome of the first event affects the outcome of the second event, the probability of the second event happening is conditional on the first event happening. P(B | A), the probability of B given A, is the probability of them both happening divided by the probability of A: P(B | A) = P(A ∩ B) / P(A). I can't stress enough how important this is to remember. If the probabilities needed aren't stated clearly, a tree diagram or Venn diagram may help.

In this example we want the probability that a chocolate is not wrapped given that it's milk. I've drawn a Venn diagram for you, but you could draw it yourself or you may not need one at all. In a box of dark and milk chocolates there are
20 chocolates. Twelve of them are dark, which instantly tells me 8 are milk. Three of the dark chocolates are wrapped, and there are 5 wrapped chocolates in the box, so 2 milk chocolates are wrapped. Given that the chocolate chosen is milk, what's the probability that it is not wrapped? Since I know I've taken a milk chocolate, it's out of 8, and 8 − 2 = 6 of those are not wrapped, so the probability is 6/8, which simplifies to 3/4.

Probability distributions. A probability distribution shows the probabilities of the possible outcomes, which means the sum of all the probabilities is equal to 1. To calculate the value of y, which is hopefully straightforward, add all the probabilities up and set the total equal to 1. That gives 5y = 0.5, so y = 0.1. To calculate the expected value E(X), multiply each value by its probability and add the results: with y = 0.1 the probabilities of 1 and 2 are 0.3 and 0.2, so E(X) = 0 × 0.5 + 1 × 0.3 + 2 × 0.2 = 0.3 + 0.4 = 0.7.

Binomial distribution. "Bi" again means two: there are two possible outcomes, normally called success and failure. The probability of success is p and the probability of failure is 1 − p, and since there are only two outcomes it has to be one of them. There is a fixed number of trials, n, and the trials are independent, so for example getting five successes doesn't make me less likely to get a success in the next trial. E(X) = np, and the probability of getting r successes out of n trials is P(X = r) = nCr × p^r × (1 − p)^(n − r). Depending on your exam board you might be given this formula or not, but if you can remember it, the binomial distribution shouldn't be that difficult.

Research has shown that approximately 10% of the population are left-handed. A group of eight
students are selected at random. What is the probability that fewer than two of them are left-handed? There are a couple of ways to do this, but it's basically P(X = 0) + P(X = 1). Here n = 8 and p = 0.1, since 10% is 0.1, so 1 − p = 0.9. P(X = 0) = 8C0 × 0.1^0 × 0.9^8, which is just 0.9^8. P(X = 1) = 8C1 × 0.1^1 × 0.9^7. Adding these together gives P(X < 2), which is what we're trying to find, and when you type it into the calculator you get 0.813.

You can also find this using your calculator, or cumulative tables if you're given them. I'm not going to go through the calculator method because there are loads of YouTube videos on it; I might post a link in the description. One important tip that not everyone highlights: cumulative tables and the calculator give P(X ≤ x), so if you want P(X < 5), look up P(X ≤ 4), and if you want P(X ≥ 4), do 1 − P(X ≤ 3). Hopefully that makes sense, but if it doesn't, have a look at the link in the description if I put one there.

The normal distribution. X ~ N(μ, σ²), where μ is the population mean, σ² is the variance and σ is the standard deviation. It's a symmetrical distribution about the mean such that
approximately two-thirds (about 68%) of the data is within one standard deviation of the mean, approximately 95% is within two standard deviations of the mean, and approximately 99.7% is within three standard deviations of the mean. Bringing in our pure knowledge here, the points of inflection on the normal curve lie one standard deviation either side of the mean. Occasionally there are some pretty nasty questions where you need to remember these facts without using your calculator. I think they're a bit mean, but if you remember them you'll be able to handle it.

The standard normal distribution has a mean of 0 and a standard deviation of 1, and we get to it using Z: take your data point, subtract the mean and divide by the standard deviation, Z = (X − μ)/σ. That gives Z values, which follow the standard normal distribution, Z ~ N(0, 1).

Calculating probabilities. Probabilities can be calculated either using the normal distribution function on your calculator or by transforming to the standard normal distribution and using the tables if you're given them. Always sketch the graph and shade the region you want; that will give you a good idea of what you need to do.

Let's look at this example. IQ is normally distributed with a mean of 100 and a standard deviation of 15. What percentage of the population have an IQ less than 120? First, X ~ N(100, 15²); it's 15² because the distribution is written with the variance. I want P(X < 120), which means the probability that Z, the standard normal variable, is less than (120 − 100)/15,
which is P(Z < 1.333). You can look this up in the tables or use your calculator; again, I'm not going to show you the calculator method because there are loads of videos on it. The calculator gives 0.909, which means 90.9% of the population has an IQ less than 120.

Calculating the mean, standard deviation or a missing value uses the inverse normal. Again you can do this on the calculator; I'm not going to go through that, but I'll talk through this example, which I've already written out for you. The time, X minutes, to install an alarm system may be assumed to be a normal random variable such that P(X < 160) = 0.15 and P(X > 200) = 0.05. Determine, to the nearest minute, the values of the mean and standard deviation of X. This gives us simultaneous equations. On the left graph the area below 160 is 0.15, and on the right graph the area above 200 is 0.05. Using the calculator or tables, the z values with 0.15 below them and 0.05 above them are −1.0364 and 1.6449. Setting Z = (X − μ)/σ equal to these gives (160 − μ)/σ = −1.0364 and (200 − μ)/σ = 1.6449, and solving simultaneously gives σ ≈ 15 and μ ≈ 175 to the nearest minute. If you've got any questions about this, pop them down in the comments and I'll happily answer them; I'm just trying to save a bit of time here.

Using a normal distribution to approximate a binomial distribution. This is only valid in some cases, and we need to know the conditions under which it is valid. If X ~ B(n, p), then np has to be greater than 5 and n(1 − p) has
to be greater than 5, so either p needs to be close to a half or n needs to be large; normally it's n being large. If the conditions hold, X can be approximated by the normal distribution N(np, np(1 − p)), with np as the mean and np(1 − p) as the variance. One more point: please do check your exam board, because some boards need you to use a continuity correction and some don't. I'm not going to go into the differences here, but check with your teacher or your exam board whether you need one. If you've never heard of it before you probably don't need it, but it's worth checking.

A dice is rolled 180 times, and the random variable X is the number of times a three is scored, so the probability of getting a three is 1/6. Use a normal distribution to calculate P(X < 27). First of all, X ~ B(180, 1/6).
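The discrete probability and binomial calculations from the last few examples can be checked with Python's standard library: `fractions.Fraction` gives exact answers, and `math.comb` (Python 3.8 or later) gives nCr. This is a sketch for checking your working, not a replacement for your calculator in the exam. It also computes the exact binomial probability P(X < 27) for X ~ B(180, 1/6), which is the value the normal approximation that follows is estimating.

```python
from fractions import Fraction
from math import comb

# Sample space example: a junior first, then a senior, without replacement
junior_then_senior = Fraction(35, 100) * Fraction(65, 99)
print(junior_then_senior)  # 91/396

# Traffic lights: stop at exactly one set, assuming independence
p_stop_1, p_stop_2 = Fraction(7, 10), Fraction(6, 10)
exactly_one = p_stop_1 * (1 - p_stop_2) + (1 - p_stop_1) * p_stop_2
print(exactly_one)  # 23/50, i.e. 0.46

# Conditional probability: P(not wrapped | milk) = P(milk and not wrapped) / P(milk)
not_wrapped_given_milk = Fraction(6, 20) / Fraction(8, 20)
print(not_wrapped_given_milk)  # 3/4


def expectation(distribution):
    """E(X) = sum of x * P(X = x) over the distribution {x: P(X = x)}."""
    return sum(x * p for x, p in distribution.items())


def binom_pmf(r, n, p):
    """P(X = r) = nCr * p^r * (1 - p)^(n - r)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


def binom_cdf(r, n, p):
    """P(X <= r), the cumulative probability that tables and calculators give."""
    return sum(binom_pmf(k, n, p) for k in range(r + 1))


print(expectation({0: 0.5, 1: 0.3, 2: 0.2}))  # approximately 0.7
print(round(binom_cdf(1, 8, 0.1), 3))         # P(X < 2) = P(X <= 1) = 0.813

# "Less than" tip: P(X < 27) = P(X <= 26) for the dice, X ~ B(180, 1/6)
exact_dice = binom_cdf(26, 180, 1 / 6)
print(round(exact_dice, 3))
```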
B(180, 1/6) can be approximated by a normal distribution because np and n(1 − p) are both greater than 5 (they're 30 and 150, since n is so large). The approximating distribution is N(30, 25): the mean is np = 180 × 1/6 = 30 and the variance is np(1 − p) = 30 × 5/6 = 25. You can then get P(X < 27) straight from the calculator, which is 0.274. With a continuity correction it would be P(X < 26.5), which gives 0.242 to three significant figures.

Moving on to sampling. If you're working with the mean of a sample of several observations from a population, for example calculating the probability that the sample mean X̄ is less than a specified value, then you must use the distribution X̄ ~ N(μ, σ²/n), where n is the sample size, μ is the population mean and σ² is the population variance. Let's do this example. Alex spends X minutes each day looking at social media websites. X is a random variable that can be modelled by a normal distribution with mean 70 minutes and standard deviation 15 minutes. Calculate the probability that, on five randomly selected days, the mean time Alex spends on social media is greater than 85 minutes.
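Python's `statistics.NormalDist` (Python 3.8 or later) can reproduce the normal-distribution answers in this part of the video. This sketch checks the IQ probability, solves the alarm-installation simultaneous equations, repeats the dice approximation with and without a continuity correction, and computes the probability for the Alex question just posed using X̄ ~ N(70, 15²/5). Note that `NormalDist` takes the standard deviation rather than the variance, so a variance has to be square-rooted first.

```python
import math
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# IQ ~ N(100, 15^2): P(X < 120)
iq = NormalDist(mu=100, sigma=15)
print(round(iq.cdf(120), 3))  # 0.909

# Inverse normal: P(X < 160) = 0.15 and P(X > 200) = 0.05
z1 = z.inv_cdf(0.15)  # about -1.0364
z2 = z.inv_cdf(0.95)  # about 1.6449
# Solve (160 - mu) / sigma = z1 and (200 - mu) / sigma = z2 simultaneously
sigma = (200 - 160) / (z2 - z1)
mu = 160 - z1 * sigma
print(round(mu), round(sigma))  # 175 15

# Normal approximation to B(180, 1/6): mean np = 30, variance np(1 - p) = 25
n, p = 180, 1 / 6
approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
print(round(approx.cdf(27), 3))    # no continuity correction: 0.274
print(round(approx.cdf(26.5), 3))  # with continuity correction: 0.242

# Sample mean: X-bar ~ N(70, 15^2 / 5), so the standard deviation is 15 / sqrt(5)
xbar = NormalDist(mu=70, sigma=15 / math.sqrt(5))
print(round(1 - xbar.cdf(85), 4))  # P(X-bar > 85) = 0.0127
```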
So first, n = 5 and X̄ ~ N(70, 15²/5): 15² because that's the population variance, divided by 5 because that's the sample size. Using that distribution I want P(X̄ > 85), which is just what the question asks for. Type that into your calculator and you get 0.0127 to three significant figures.

Hypothesis testing. This is a really big topic; it comes up every year, so it's really important that you learn it. It's normally a seven-mark question, and you don't necessarily need much understanding: if you memorise the structure, then even if you can't get all seven marks you should definitely be able to get at least half of them.

For binomial hypothesis testing, first set up the hypotheses. The null hypothesis, H₀, says the probability is equal to some value a, which will be in the question. The alternative hypothesis, H₁, is p < a for a one-sided test, p ≠ a for a two-sided test, or p > a for a one-sided test, and the question will tell you which. State the significance level as a percentage; the lower the value, the more stringent the test. State the distribution, or model, used in the test; in this case it'll be B(n, p). Calculate the probability of the observed result, or a more extreme one, occurring under the assumed model; all of this information will be in the question. Compare the calculated probability to the significance level and, based on that, either reject H₀ or do not reject it. Then write a conclusion, and this is the important bit: it must be in the context of the question. Examiners always look for context, and if you don't write it you can't get that mark. If you reject H₀, the conclusion will say something like: there is sufficient evidence to suggest that there is an
underestimation or overestimation. If you don't reject H₀, you say there is insufficient evidence to suggest an increase or decrease, therefore we cannot reject the null hypothesis, H₀: p = a.

Let's have a look at this example. The probability that patients have to wait more than 10 minutes at a GP surgery is 0.3. One of the doctors claims that there has been a decrease in the number of patients having to wait more than 10 minutes. She records the waiting times for the next 20 patients, and 3 wait more than 10 minutes. Is there evidence at the 5% significance level to support the doctor's claim? First things first, H₀: p = 0.3, and the alternative hypothesis is H₁: p < 0.3. The significance level is 5%. Then we need to be really clear about the context: X is the number of patients waiting more than 10 minutes, and X ~ B(20, 0.3). Using the tables or your calculator, P(X ≤ 3) = 0.107, which is 10.7%. As 10.7% is greater than 5%, there is insufficient evidence to suggest that the waiting times (there's my context) have reduced, so we do not reject H₀: there is no evidence that p is less than 0.3. Like I said, just do loads of questions on hypothesis testing and learn the mark schemes and what they're looking for. It's normally about seven marks, and they should be easy marks on the day.

Critical values and regions. Take the doctor's waiting times example above: X ~ B(20, 0.3) at the 5% significance level. We're basically trying to find the point at which H₀ is rejected. For x = 0,
1, 2 and so on, we've got all the probabilities from the calculator or tables. The critical region is X ≤ 2, because for any of those values we would reject H₀, and the critical value is 2, the boundary of the critical region.

I'm not going to go through this next example in full because the answers are there, but let's read it. A sweet manufacturer packs sweets of which 70% are fruit flavoured and the rest mint flavoured. They want to test whether there has been a change in the ratio of fruit to mint flavours at the 10% significance level, which is a less stringent test than the last one. To do this they take a sample of 20 sweets. What are the critical regions? We do basically the same thing. X is the number of fruit sweets, and X ~ B(20, 0.7), with 20 because that's the sample size. H₀ is p = 0.7, and this is a two-tailed test because they want to know whether the ratio has changed, so the alternative hypothesis is p ≠ 0.7. At the 10% significance level that's 5% in each tail. For the lower tail, using your calculator or the tables, P(X ≤ 10) = 0.0480, which is 4.8%, so the lower critical region is X ≤ 10 and the critical value is 10. For the upper tail, P(X ≥ 17) is above 5%, but P(X ≥ 18) is about 3.5%, which is less than 5%, so the upper critical region is X ≥ 18 and the critical value is 18. So the critical regions are X ≤ 10 or X ≥ 18.
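Here is a sketch of the binomial test and the critical-region search from this section, using only Python's standard library. It assumes the usual rule that the probability in a critical tail must not exceed the significance level (or half of it for a two-tailed test); a question may specify a different rule, and the function names are hypothetical.

```python
from math import comb


def binom_cdf(r, n, p):
    """P(X <= r) for X ~ B(n, p); returns 0 when r < 0."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(r + 1))


def lower_critical_region(n, p, alpha):
    """Largest c with P(X <= c) <= alpha, so the region is X <= c (None if none)."""
    c = None
    for r in range(n + 1):
        if binom_cdf(r, n, p) <= alpha:
            c = r
        else:
            break
    return c


def upper_critical_region(n, p, alpha):
    """Smallest c with P(X >= c) <= alpha, so the region is X >= c (None if none)."""
    for c in range(n + 1):
        if 1 - binom_cdf(c - 1, n, p) <= alpha:
            return c
    return None


# GP surgery: H0: p = 0.3, H1: p < 0.3, X ~ B(20, 0.3), observed x = 3
p_value = binom_cdf(3, 20, 0.3)
print(round(p_value, 3))                     # 0.107 > 0.05, so do not reject H0
print(lower_critical_region(20, 0.3, 0.05))  # 2, i.e. X <= 2

# Sweets: two-tailed test at 10%, so 5% in each tail, X ~ B(20, 0.7)
print(lower_critical_region(20, 0.7, 0.05))  # 10, i.e. X <= 10
print(upper_critical_region(20, 0.7, 0.05))  # 18, i.e. X >= 18
```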
Normal distribution hypothesis testing. We set up the hypotheses: H₀ says μ is equal to a given value μ₀. For the alternative hypothesis, H₁: μ < μ₀ is a one-sided test saying the mean has decreased; H₁: μ ≠ μ₀ is a two-sided test saying the mean has changed; and H₁: μ > μ₀ is a one-sided test saying the mean has increased. If H₁ says the mean has decreased, the critical region is the lower tail with probability α. For a two-tailed test you have to split it, so each tail has probability α/2. If H₁ says the mean has increased, we want the upper tail with probability α.

Next we investigate the value you're working with, and there are a couple of methods. Either see whether your observed value lies in the critical region, and reject H₀ if it does; or calculate the probability, the p-value, of getting the observed value or something more extreme (less than or equal to it if testing for a decrease, greater than or equal to it if testing for an increase) assuming H₀ is true, and reject H₀ if that probability is less than the significance level. Then write a conclusion. Don't just state "accept" or "reject H₀": you need to say there is sufficient or insufficient evidence, depending on the situation. If you're not rejecting H₀, you say there is insufficient evidence to suggest that the mean of whatever it is (putting it into context) has changed, therefore we cannot reject the null hypothesis, H₀: μ = μ₀. If you're rejecting it, you say there is sufficient evidence to suggest that the mean has changed, and based on the results conclude that the mean has increased, has decreased, or is not equal to μ₀. I hope that makes sense; we've got two different methods there.

Again, I'm not going to write out this question, but I'll talk about it. The test results of a large group of students are thought to follow a normal
distribution with mean 90 and variance 80. A random sample of 20 students is found to have a mean of 94. Test at the 5% significance level the claim that the mean has increased. H₀ is μ = 90, and the alternative hypothesis is μ > 90, because the claim is that the mean has increased. Then X̄ ~ N(90, 80/20), because n is 20 and 80 is the variance; since we're working with a sample mean, we need to adjust the normal distribution like this. Again there are two methods here. I'll use the left-hand one, but it's completely up to you. For a critical region at 5%, find 1.6449 on the calculator and set it equal to the Z value, (x̄ − 90)/√(80/20), where √(80/20) = 2, to find x̄. You get x̄ = 93.3, which is the critical value. As 94, which comes from the question, is greater than 93.3, it is in the critical region, so there is sufficient evidence to suggest that the mean has increased, indicating an improved performance in the test.

Correlation coefficient testing. This investigates whether the linear relationship represented by r, calculated from a sample, is strong enough to use as a model for the relationship in the population. Here r is the correlation coefficient calculated from a sample of size n, and ρ (rho) is the unknown population correlation coefficient. The test checks whether ρ is close to zero or significantly different from zero. This is another type of hypothesis test: the null hypothesis is ρ = 0, meaning there is no correlation between the two variables, and the alternative hypothesis is ρ ≠ 0, ρ > 0 or ρ < 0. Again there's an example here with the answer already written out. The length of service and current salary are recorded for 30 employees in a large company, and the product moment correlation coefficient, r, for the 30 employees is 0.35. Test the
hypothesis that there is no correlation between an employee's length of service and current salary, at the 5% significance level. We've done a handful of these now, so set it up: H₀ is ρ = 0 and the alternative hypothesis is ρ ≠ 0, so it's a two-tailed test. Here n = 30, and at the 5% significance level that's 2.5% in each tail. The critical value from the tables or calculator is 0.3610, giving the critical region r < −0.3610 or r > 0.3610. As r = 0.35 is not in the critical region, there is insufficient evidence to show that the correlation is significantly different from zero, so we do not reject H₀.

And that is the end of the video, guys. I hope this helped, and good luck in your exams. Give us a like and subscribe if you made it this far; it really helps the channel.