Hey guys, welcome back to Bison Maths. My name is Mr Bison and this is the third part in the series of videos on everything you need to memorize for your A level maths. This part is going to be on statistics. If you haven't checked out part one on pure and part two on mechanics, go to my channel homepage and have a look at them. If you're new to my channel, have a look at some of the other things I've got there. I've got hundreds of videos of my real-life lessons with students in my school, and they should be really useful for your revision or for learning any topics you may not have understood so well. Okay, so this third video is about statistics, and I've found from my experience that students tend to get either really high marks or really low marks. The students who get really high marks aren't naturally better at statistics. I think they have just memorized a lot of content and realized that statistics is quite formulaic in what they expect you to do. As for the people who get low marks, I don't think they've spent enough time with the content, and I don't think they really know all of the little tricks that I'm about to show you. So without further ado, let's get on with the video and I'll show you everything you need to memorize for statistics. Okay, so as you can see, there's quite a lot of stuff you need to memorize for statistics, a little bit more than for mechanics, and we're going to get started by having a look at data collection. This is an area people often tend to leave out a bit, but there are lots of valuable marks we can get here, so let's get started straight away. First of all, some key words. When you see the word census, if they ask what that means: a census is something which measures every member of a population. The little pluses and minuses I've written here refer to advantages and disadvantages. The advantage of a census is that it should give an accurate result, and the disadvantage is that it's probably going to be very
expensive or time-consuming, and sometimes the testing may destroy the units that you're talking about. I think there's an example in the textbook where they talk about testing all of the avocados to see how ripe they are. That would destroy the avocados, so it's not a very good thing to do to all of them. The sampling units are the individuals of the population, the things that you might want to sample: they could be people, they could be avocados. The sampling frame is a list of the sampling units, so a sampling frame might be something like a list of all the students in the school, or a list of all the residents in a particular area. So what types of sampling have we got? Well, for random sampling there are three types. We have simple random sampling, and the question mark here is to ask: what is it? Well, it's where every possible sample of the chosen size has the same chance of being selected. And how do we do this? For a simple random sample we can either use random numbers, usually from the random number generator on a calculator, or we can do something called lottery sampling. Lottery sampling is where you put all of the names of the people, or whatever it might be, written on slips in a hat or a bowl, and you just pick them out randomly, a bit like in The Hunger Games where they pick out the tributes. What's the advantage of this? Well, it's free of bias because it's completely random. But the disadvantage is that you need to have a sampling frame; in other words, you need to know all of the possible things that you might be picking from, and that's not always going to be available. Systematic sampling sounds a bit like what it is: following a particular system. What you do is take every kth unit, so it might be every third unit or every 100th unit, and k is found by taking the size of the population and dividing it by the size of the sample that you're looking for. But what you also need to
remember is that to get started you need to pick a random number between 1 and k, and that is your starting point. So maybe your starting point is number three and you go up in fives, so you'd take three, eight, thirteen and so on. The advantage of this is that it's really quick to use, but a disadvantage, again, is that you need to have a sampling frame. I'm trying to keep as many of the advantages and disadvantages similar to each other as I can, so that you can reuse them for several types of sampling. Then we've got stratified sampling, and this is where the sample represents the groups of a population, which are called the strata (strata just means groups; one group is a stratum). The way you do this is you take your sample size and divide it by the population size to get the proportion that you want, and you multiply this by the size of each stratum to find out how many people you need from that group. Then within each stratum you pick them using a random technique, so that could be lottery sampling or it could be random numbers. The advantage is that it reflects the population, which is why you might use it for something like political polling, where you want the sample to represent the population. But the disadvantage is that your population must be classified into strata, and that's not always going to be obvious: it might not be clear what groups you want to put the population into. Then we move on to non-random sampling, and there are two types of this we need to know about. The first one is called quota sampling. This one is a bit like stratified sampling, but instead of people being picked randomly, the quota for each stratum is filled by an interviewer or a researcher, so the groups get filled up in a different kind of way. The advantage of this is that you don't actually need to have a sampling frame.
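If it helps to see the systematic and stratified procedures as steps, here is a minimal Python sketch. The function names and the example frame are made up for illustration, and it assumes k = N ÷ n is rounded down when it isn't a whole number. The stratified part uses the proportion n ÷ N times each stratum size, then picks randomly inside each stratum, as described above.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Every k-th unit from a sampling frame, with k = N // n and a random start from 1 to k."""
    k = len(frame) // n                 # assumption: k is rounded down if N/n isn't whole
    start = rng.randint(1, k)           # random starting position between 1 and k (1-indexed)
    positions = range(start, start + k * n, k)
    return [frame[p - 1] for p in positions]

def stratified_allocation(strata_sizes, n):
    """How many to take from each stratum: (n / N) * stratum size, rounded to a whole number."""
    N = sum(strata_sizes.values())
    return {name: round(n * size / N) for name, size in strata_sizes.items()}

def stratified_sample(strata, n, rng=random):
    """strata maps each group name to its list of units; pick randomly within each group."""
    sizes = {name: len(units) for name, units in strata.items()}
    allocation = stratified_allocation(sizes, n)
    return {name: rng.sample(units, allocation[name]) for name, units in strata.items()}

frame = list(range(1, 101))             # hypothetical frame: units numbered 1 to 100
print(systematic_sample(frame, 20))     # every 5th unit, e.g. 3, 8, 13, ... if the start is 3
print(stratified_allocation({"Year 12": 120, "Year 13": 80}, 20))  # {'Year 12': 12, 'Year 13': 8}
```

Rounding each stratum separately can leave the totals one off the intended sample size, so in an exam you'd adjust by hand.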
So you might put out an advert to say "we are looking for this type of person", so you don't actually need a list of people; you can get them coming to you in a different kind of way. But the disadvantage is that it's not random any more, so there's potential for bias. Then we're going to look at opportunity sampling. This is the one where you're outside a supermarket and you pick five people to try a product: it's just taking advantage of whoever is in front of you. This time the sample is filled by those available at the time, like I was just saying, at a supermarket or something. The advantage is that it's really easy to do and it's pretty cheap. You could probably just do this in your school, going up to people and asking them some things. But the disadvantage is that it's unlikely to be representative: you may end up getting a particular type of person if you're stood at a particular type of supermarket, for example. I've written a note here: there are other advantages and disadvantages, but I've just picked one of each to try and minimize the number of things you need to memorize. The last part we need to know about is different types of data. If something is qualitative, that means it's non-numerical data, so usually words, like a color, something like that. Quantitative means numerical data, and numerical data is either discrete, meaning it can only take particular values, or continuous, meaning it can take any value in a particular range. Now, I've tried to condense the large data set as much as possible, but there are always going to be more things you need to know about it, so if you're interested, I've got a video about the large data set and I can link that in the description too. The first thing you need to be able to do is name all of the weather stations in the UK. So here's my map of the UK. Now
I start at the bottom and they go in alphabetical order, so this one at the bottom, which is on the coast, is Camborne. Actually, they don't quite go in alphabetical order; I need to take that back. These two down here are Heathrow and Hurn, but everyone knows that Heathrow is the one that's in London, and this one down here is Hurn. Then, if you can remember the next ones, we have Leeming and we have Leuchars, and again you can see that it's going in alphabetical order as we go upwards. The only difference is that Heathrow and Hurn have switched around. Okay, so let's talk a little bit more about the UK stations. I've drawn some symbols to try and help us think about what these might mean. The stations that are windier and rainier are going to be the ones which are coastal, so any of the coastal stations are going to be windier and rainier. Then I've got another symbol down here for the stations that are warmer and get more sunshine in the day: that's generally going to be the weather stations in the south. They tend to be warmer and they tend to have more sunlight as well. So when was this large data set recorded? Well, it covers just six months, May to October, in 1987 and 2015.
So you need to make sure that you know that. It's a bit of a disadvantage to only have six months; you're never going to get a picture of the whole year. Now, there are three international stations, and I've drawn some symbols to try and help us think about what makes them unique. For this first one I've got that summer and winter are switched around. Hopefully we know that the one where summer and winter are switched around is Perth, which is in Australia. Because it's in the southern hemisphere, it has different seasons to us: when we have summer they have winter, and vice versa. Perth is also very hot in the summer, so that's something else to note. For the second place, I've said that it's really hot and also really rainy in the summer, and in the winter it gets very, very cold, so it's more extreme, more seasonal. This is Beijing, which is in China. Now, we know what the last one should be, but I just wanted to talk through what I've got written here. Generally this place is very, very warm and it's also prone to hurricanes. I know this symbol looks like a tornado, but I think it makes us think of somewhere windy. This is Jacksonville, which is in (sorry, not Miami) Florida, in the US, and there were two hurricanes: one in October 1987 and one in October 2015.
So this is Jacksonville, which is in Florida, USA, and when I think of Florida and Miami I think of places that are warm in the summer and generally stay quite warm throughout the year. There are a few things you need to know about the kinds of data you've got here. First of all, when you're looking at rainfall, the letters tr come up in the data. This means trace, which is less than 0.05 millimeters of rain, and you treat it as zero in any calculations. If it ever says n/a, it means that the reading is not available, and this is going to be a problem if you're taking a sample, because you can't use it in a sample. Okay, then you need to know about cloud cover. Cloud cover is measured in oktas, and these are discrete values: integers from zero to eight, so there are actually nine different options. It's just measuring how many eighths of the sky are covered in cloud. Last of all, we're going to talk about the maximum gust. This is the strongest speed the wind reaches on a particular day. It's measured in knots, and one knot is equivalent to about 1.15 miles per hour. You also might want to note that there was the Great Storm in the UK on the 15th and 16th of October 1987, so you might get some data about October 1987.
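As a small illustration of those data-cleaning rules, here is a Python sketch. It assumes the rainfall cells arrive as text exactly like "tr" and "n/a" (how they look in your copy of the spreadsheet may differ), and the function names are hypothetical. It applies the rules above: trace counts as 0, n/a is left out, knots are multiplied by roughly 1.15 to get miles per hour, and cloud cover must be a whole number of oktas from 0 to 8.

```python
KNOTS_TO_MPH = 1.15   # 1 knot is roughly 1.15 miles per hour

def clean_rainfall(readings):
    """Turn raw daily rainfall entries (strings, in mm) into numbers for calculations."""
    cleaned = []
    for raw in readings:
        value = raw.strip().lower()
        if value == "n/a":
            continue                      # not available: leave it out of the sample
        if value == "tr":
            cleaned.append(0.0)           # trace (< 0.05 mm) is treated as zero
        else:
            cleaned.append(float(value))
    return cleaned

def knots_to_mph(knots):
    """Convert a maximum gust in knots to miles per hour."""
    return knots * KNOTS_TO_MPH

def valid_okta(value):
    """Cloud cover must be a whole number of eighths from 0 to 8."""
    return isinstance(value, int) and 0 <= value <= 8

rain = clean_rainfall(["2.4", "tr", "n/a", "0.0", "11.2"])
print(rain)                     # [2.4, 0.0, 0.0, 11.2]
print(sum(rain) / len(rain))    # mean daily rainfall of the usable readings
```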
It's worth remembering that there was a great storm in the UK at that time. Okay, so we're going to move on to the next part, which is about location. Location is about where the data is, what size the numbers are, and I've written here: know your calculator. I'm not going to be including that in this video, but you need to know how to find all of these things using your calculator. To get started, you need to know that x bar, which means the mean, is the sum of the x values divided by n. If the data is in a grouped table, it's the sum of fx divided by the sum of f, because the sum of the frequencies is just how many pieces of data there actually are. Now, if you have listed data, to find the position of the lower quartile, which is Q1, you take the number of pieces of data and divide it by four. For the median you divide it by two, and the upper quartile is three quarters of the way along, so you have 3n over 4. When you do that, if you get a decimal, you round up and then you find that position, and that will correspond to Q1, Q2 or Q3. If it's a whole number, you find the midpoint of that value and the next one. This is only for listed data. If you have grouped data, to find the position of Q1 you still do n divided by four, and you do n divided by two and 3n divided by four. If it's percentiles and you wanted to find the 57th percentile, which you'd call P57, you do 0.57 multiplied by n: you're going 57 percent of the way along. Deciles split the data into 10 chunks, so D3 is 30 percent of the way along, which is just 0.3 multiplied by n. So that's how you find the positions. For grouped data you don't do any of the rounding I mentioned above; instead you use linear interpolation, and I think the best way to show this is to do an example of linear interpolation. What I've got here is some weights which have been measured to
the nearest kilogram, and then I've got something called true class limits. True class limits means you take the rounded values in the first column and say what they actually represent. So the first class actually represents from 9.5 all the way up to 12.5; we can't have any gaps. The next one goes from 12.5 to 15.5, and the last one goes from 15.5 all the way up to 18.5. Then, to find the median, we work out the cumulative frequencies: we have 5, add on the 8 to get 13, add on the 7 to get 20. For the second quartile, which is the median, you divide 20 by 2, so it's going to be the 10th value. To do linear interpolation, we can clearly see that this lies in the middle group, because that group takes us from the 5th to the 13th value. So along the top you write that it starts at 5 and goes up to 13, and we are looking for the 10th value; I've put it roughly in the right place. Along the bottom, that group goes from 12.5 up to 15.5, and we're looking for the median, which is here. To find the median we need to keep the proportions the same: I want to start at 12.5 and add on the same fraction of the way along. On the top, the gap from 5 to 10 is 5 and the gap from 5 to 13 is 8, so I want to go 5/8 of the way along the bottom gap, and you can see that the whole width of that is 3. So I'm going to go 5/8 of 3 along from 12.5. On my calculator, 12.5 plus 5/8 of 3 is 14.375, so the median is 14.375 kilograms. It's all about keeping the proportions the same. Okay, we're now going to move on to have a look at spread. Again I've said here you really need to know what
your calculator can do. I've put a little f with a heart here: the f means it comes in the formula book, and the heart says you should try and learn it off by heart anyway. The interquartile range is just Q3 minus Q1, and I've put a plus here: the advantage is that it isn't affected by extreme values. For the interpercentile range, they might ask for the 10th to 90th interpercentile range, which is just P90 minus P10; they can give you different percentages for that. Then we're moving on to variance. The variance is found by doing the mean of the squares minus the square of the mean, and that's what I've written here as "msmsm". So I'm just going to write it out: it is the mean of the squares minus the square of the mean. Now I'm going to do it for grouped data: it's the sum of fx squared divided by the sum of f, minus the square of the mean. So this first part is the mean of the squares, and this is the square of the mean. I've also written a note that the sum of f times x squared is not the same as the sum of (fx) squared: you square the x values before multiplying by the frequency, not afterwards. Okay, we're going to move on to coding. If I have coded my data so that y equals ax plus b, then y bar is a multiplied by x bar, plus b, so the mean is affected by both of those values. But the standard deviation of y is just a multiplied by the standard deviation of x (strictly, the size of a, if a is negative). It's not related at all to the b part; adding or subtracting doesn't matter. Also, probably worth noting here: the standard deviation is just the square root of the variance, because the variance is sigma squared, so the standard deviation is just sigma. Okay, we're now going to have a look at representation. There are a few different types of diagram here, and I'm going to start off with cumulative frequency and then move on to box plots. So if you have a cumulative frequency diagram you can
use it to draw a box plot. Halfway up the cumulative frequency axis is the median, and you can see that when I draw this line down, it corresponds to the median on the box plot. Then halfway between the start and Q2, so a quarter of the way up, is Q1, and you can see that corresponds to Q1 on my box plot. You can then go three quarters of the way up for Q3, and that corresponds to that end of the box. The top of the curve corresponds to the highest value, and notice how the bottom corresponds to the lowest value as well. So for the box plot we've got here, this represents the minimum, these represent Q1, Q2 and Q3, this is the maximum, and this is what we call an outlier. The length of the box is the interquartile range. For the outlier boundaries, you might do Q1 minus 1.5 times the interquartile range at the bottom, and Q3 plus 1.5 times the interquartile range at the top. That's a standard way of finding where the outliers might be, but they would usually give you the rule in the question, so it might not always be 1.5. It's always going to be some multiple of the interquartile range below the lower quartile or above the upper quartile. Okay, so now we can have a think about histograms. A histogram should only be used for continuous data, so if they ask why you should use a histogram, it might be because the data is continuous; you'll notice there are no gaps in the diagram. To calculate the frequency density, you take the frequency and divide it by the class width. Now, the area isn't necessarily equal to the frequency like it was at GCSE; the area is equal to the frequency multiplied by some constant k, which you will probably need to find and use during the question. If you ever get asked to compare two diagrams, you need to compare two things: one measure of location and one measure of spread. Usually you should put these into context, so if it's asking about the time
that someone took to run a race, you want to talk about the times rather than just quoting the values of location and spread. Okay, so we're going to move on to the next part, which is about regression and correlation. Again I've said: know your calculator. You need to be able to find the product moment correlation coefficient, the PMCC, r, on your calculator. r can vary between minus one and one, and it measures the strength and direction (positive or negative) of the linear correlation. A regression line is basically just a posh name for the line of best fit. It will usually be in the form y = a + bx, and in context, a is the value of y when x equals zero (they might ask you to interpret this), and b is how much y changes when x increases by one. Interpolation is when you estimate inside the data range, and I've put a tick next to that because interpolation is usually reliable. Extrapolation, estimating outside the data range, has a cross because it is not reliable. They love to ask questions like "why has someone done this wrong?", and it's usually because they've gone outside the data range, which means they've been extrapolating. You may have some exponential or other non-linear models where you want the line of best fit to be curved or a different kind of shape. If you have something like y = ab^x or y = ax^n, what you need to do is take logs. I'm going to skip a step, and you can take natural logs or logs to base 10, whichever you want. For the first one you should get ln y = ln a + x ln b (again, you can use log instead of ln), so ln y against x is a straight line with gradient ln b. For the second one you get ln y = ln a + n ln x, so ln y against ln x is a straight line with gradient n. Then you can go ahead and do the same things as before. So now let's look at probability, and Venn diagrams in
particular. So here I want to do an example of shading A′ ∪ B, which is the union of the complement of A with B. The complement of A just means not A. For unions, you shade all of both regions involved. So I'm going to start off by shading not A, which is everything outside of A, and then you also need to shade all of B, so you've got this bit here as well. So that shading is "not A or B". Intersections are a little bit different: for intersections, which is this ∩ shape, you just shade the overlap. So for A′ ∩ B you think about the one place where it's not A and it is B. Not A is everything outside of A, and B is this circle here, so the only place where they overlap is this section: the part of B that's outside A. Okay, tree diagrams. You probably know tree diagrams from GCSE: as you go along the branches, you multiply. There's a little bit of new notation, though. The branches at the end here are: B given that A has happened, not B given that A has happened, B given that A has not happened, and not B given that A has not happened. You can use all your same skills from GCSE for tree diagrams. Now I'm going to talk about some other things to do with probability. If A and B are mutually exclusive, the probability of A and B is zero, because they can't happen at the same time. The probability of A or B is then just the probability of A plus the probability of B, and on a Venn diagram the two circles don't overlap. Independence is something they love to ask about. These are in the formula book; I haven't put a heart by them because you may just like to refer to them in the exam. If A and B are independent, the probability of A and B is just the same as the probability of A
multiplied by the probability of B, and the probability of A given B is just the same as the probability of A. A mistake that lots of my students make is thinking you can tell whether two events are independent just by looking at a Venn diagram, but you can't; you have to check using one of these tests. Conditional probability is in the formula book, but not in this form, and I prefer this form. The probability of B given A is the probability of A and B divided by the probability of A, and you'll notice that the second letter, the event you're given, is the one we divide by. If you're looking at a Venn diagram for this, just reduce the whole space to the event you're given: here I've reduced it by zooming in on just this area, and then you deal with all the probabilities and numbers within that space. Again, if you're not sure what I mean, I've got a few videos that might help you. Last of all, the addition law is in the formula book, so you may not want to memorize it, but it's just the probability of A plus the probability of B minus the probability of A and B. Okay, we've got two more columns to go, so let's see what we've got next. We're going to look at a very, very small bit, which is the discrete uniform distribution. I've seen this pop up and it's easy marks. The discrete uniform distribution is where all of the outcomes have equal probability. So for example, going back to the large data set, if the discrete random variable X is cloud cover, which is measured in oktas, then your uniform distribution will look like this: X can take the integer values between 0 and 8.
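To check the probability rules above with exact fractions, here is a minimal Python sketch that uses the cloud cover distribution as the sample space: each of 0 to 8 oktas with probability 1/9. Treating cloud cover as uniform is just the textbook-style model described here, not a claim about real weather, and the events (overcast, even and so on) are made up for illustration. It applies the conditional probability formula, the independence test P(A and B) = P(A) × P(B), and the addition law.

```python
from fractions import Fraction

OUTCOMES = range(0, 9)                  # cloud cover in oktas: integers 0 to 8
P_EACH = Fraction(1, len(OUTCOMES))     # discrete uniform: 1/9 for each of the nine values

def prob(event):
    """P(event) under the discrete uniform model; event is a set of outcomes."""
    return sum(P_EACH for x in OUTCOMES if x in event)

def conditional(b, a):
    """P(B | A) = P(A and B) / P(A): divide by the probability of the given event."""
    return prob(a & b) / prob(a)

def independent(a, b):
    """Independence test: P(A and B) == P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)

def union(a, b):
    """Addition law: P(A or B) = P(A) + P(B) - P(A and B)."""
    return prob(a) + prob(b) - prob(a & b)

overcast = {6, 7, 8}        # hypothetical event: at least 6 oktas
even = {0, 2, 4, 6, 8}      # hypothetical event: an even number of oktas
low = {0, 1, 2}
multiple_of_3 = {0, 3, 6}

print(prob({4}))                        # 1/9
print(conditional(overcast, even))      # P(overcast | even) = 2/5
print(independent(overcast, even))      # False
print(independent(low, multiple_of_3))  # True
print(union(overcast, even))            # 2/3
```

The last two independence checks show why the shape of a Venn diagram can't settle independence: both pairs of events overlap, but only one pair passes the test.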
We talked about that earlier on: there are nine different options, which means the probability of each of them is a ninth. It's discrete because it can only take the integer values from 0 to 8, and it's uniform because they all have the same probability. That's actually one of the questions that students performed worst on, and I've got a video about the top five hardest questions, which I'll link for you to have a look at. Okay, now we're going to have a look at the binomial distribution. We would say that X is distributed binomially, with a number of trials n and a probability of success p. This next part is in the formula book, but it's worth trying to know it as well: the probability that X equals r is the binomial coefficient n choose r, times p to the power r, times one minus p to the power n minus r. It looks confusing, but when you do it with an example it tends to make sense. So when can you use a binomial distribution? Again, this is easy marks if you just memorize it, and I've put FFIT to help us remember. The first F is that there is a fixed number of trials, n. The second F is that there is a fixed probability of success, p. The I stands for the trials being independent. And the T stands for two outcomes in the experiment, which are just success or failure. Now, with cumulative probabilities: the probability that X is less than five can't include five, so X could be 0, 1, 2, 3 or 4. So actually we would find, using our calculator, the probability that X is less than or equal to four, because five isn't included and our calculators can only do it with the less-than-or-equal-to sign. I've put know your calculator, so make sure you know how to do this. Now if you wanted to find
the probability that X is greater than 3, well, we're really asking for 4, 5, 6, 7 and so on, so we do one minus the probability of X being less than or equal to 3: we're subtracting all of the ones that are 3 or less. And if we want X between 6 and 10, where 6 isn't included, we just have 7, 8, 9 and 10. So you start off with all of the values less than or equal to 10, and you remove all of the values where X is less than or equal to 6, to be left with that range. I like writing the numbers out to help me visualize what's going on. If you're going to be doing inverse probabilities, use the tables; they're better than the calculator for this. Again, you can have a look at my videos about how to do that. So we're coming up to the normal distribution. The normal distribution is for continuous variables, so here we have a continuous random variable Y. I'll use the letter X for the binomial and the letter Y for the normal. So Y is distributed normally with a mean of mu and a variance of sigma squared. This next thing is mentioned in the specification and barely mentioned anywhere else: there are points of inflection, where the curve changes between convex and concave, at two points, mu minus sigma and mu plus sigma. You also need to know how much of the data lies within particular numbers of standard deviations of the mean: about 68% of the data is within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations of mu. For finding probabilities and using the inverse normal, like finding this kind of probability or finding a in this kind of situation, you just need to know how to use your calculator. For the standard normal distribution we use the letter Z: Z is normally distributed with a mean of zero and a variance of one squared. To do the coding to make something
become a standard normal variable, you take the Y value, subtract the mean and divide by the standard deviation. If you're missing mu, sigma or both of them in a particular distribution, you use that coding: if you're missing just one of them it'll be one equation to solve, and if you're missing both of them it'll be simultaneous equations. What you can also do is approximate binomial distributions with normal distributions. You can only do this if n is large (how large isn't clear; it could be 30, 40, 100, but usually they'll tell you it's large) and if p is close to 0.5: 0.4 or 0.6 is probably okay, but 0.2 or 0.8 probably isn't. For this approximation the mean is np, the number of trials multiplied by the probability, and sigma is the square root of np(1 − p). Then continuity corrections: if you're approximating a binomial with a normal, you're going from something discrete to something continuous, so you need to adjust the probabilities you're looking for. If you want the probability that X is greater than or equal to 5, that becomes the probability that Y is greater than 4.5. You could put the equals sign on the Y inequality as well, but it won't change the probability, because Y is continuous. (If it were X greater than 5, that starts from 6, so you'd use Y greater than 5.5.) And if X was between 3 and 11 inclusive, you'd say that Y is instead between 2.5 and 11.5. So you basically add on a half at either end to expand out that range. Okay, so we're moving on to the last section, and I've put all of the hypothesis testing together here. For hypothesis testing we need to know some definitions. First of all, the null hypothesis we write as H0, H nought, and this is what we assume to be true. Even if you can't do the whole question, get yourself some free marks by writing out your hypotheses. The alternative hypothesis is H1, and it's what would be true if H0 is wrong. Then we have the significance level. The
significance level is usually written with the letter alpha, and it's the threshold probability you're given for deciding whether a result is unlikely enough to reject H0. This usually makes more sense when actually doing a question. You do a one-tailed test when the alternative hypothesis is something like the probability is greater than k, or the probability is less than k. A two-tailed test is when your alternative hypothesis is that the probability is not equal to k, and in this situation you halve the significance level for each end. So just make sure you know the difference: if it's a two-tailed test, you halve the significance level. Okay, so correlation testing, which goes back to your correlation and regression bits. Your null hypothesis is that the population correlation coefficient, rho, is zero: we assume there is no correlation. The alternative hypothesis will be clear from the question. It will either be that there is positive correlation or that there is negative correlation, and these two are both one-tailed; or it will be that there is some kind of correlation, without saying positive or negative, and that's your two-tailed test. What you do here is compare your value of r with the critical value from the table. If your r is more extreme, you reject H0, and you need to put that in context: say that there is evidence of correlation, so there is a relationship between these variables, and actually name what the variables are. If your r is less extreme than the value in the table, then there is no evidence to reject H0. That doesn't mean H0 is definitely true; it just means there's no evidence to reject it. With binomial testing, the test statistic is the number of successes observed, and the null hypothesis is that the probability is equal to some value k, which is usually given in the context of the question. The alternative hypothesis for a one-tailed test will
either be that the probability is greater than k or that the probability is less than k, so these are my one-tailed tests, or it might be that the probability is not equal to k, which would make it a two-tailed test, and you're going to have to halve that significance level. For the third step, you assume that H0 is true, i.e. that X is binomially distributed, X ~ B(n, k), where k is the probability. You then find the probability that X is greater than or equal to, or less than or equal to, the value that has been observed in the question, going in the more extreme direction. What I mean by that is: if you would expect something to happen five times, and in the question they've observed it happening seven times, you would find the probability that X is greater than or equal to seven, P(X ≥ 7), so you're doing seven or more. If the probability you've just worked out is less than α, the significance level, then you reject H0: you think it's quite rare for that to have happened, and so there is evidence that the probability is different from what the question suggests. If the probability is greater than α, it's just saying there's a reasonable chance it could have happened randomly anyway, so there is no evidence to reject H0.

Note that you should also be familiar with critical regions, so you need to be able to find critical regions as well. The critical region is the set of values of X for which you would reject H0, and the actual significance level is the probability of X falling in the critical region when H0 is true, which you work out in the same way as the probability above.

We are now on to the very last bit: normal hypothesis testing of the sample mean. If the population is normally distributed with mean μ and variance σ², then the sample mean X̄ is normally distributed with mean μ, and this time the variance is σ²/n, where n is the sample size. The null hypothesis will be that μ is equal to k, and when I say k, I mean it could be k centimeters or k kilograms or k millimeters, etc.; it's actually
the mean that we're talking about here. The alternative hypothesis will be that the mean is actually greater than k or that the mean is less than k, which are going to be your one-tailed tests, or it might just be that the mean is not equal to k, which is therefore going to be a two-tailed test. So you assume that H0 is true, i.e. you assume that X̄ is normally distributed with a mean of k and a variance of σ²/n, X̄ ~ N(k, σ²/n). Then, a bit like the binomial one, you find the probability that X̄ is greater than or equal to, or less than or equal to, the mean of the sample taken; again, which one will be clear from the question, and you go in the more extreme direction. If this probability is less than α, then as before you reject H0: if it's very rare for that to have happened, you might think, hang on, I do actually think that the mean has changed, that it's either bigger than k or smaller than k, because there's evidence to suggest that. If the probability is greater than α, there is no evidence to reject H0.

And so that is everything that you need to memorize for statistics. Now, there's an awful lot of stuff there, but if you can get it into your head, I think it's going to help you be one of those students who gets the high marks and not one of those who gets the low marks. If you liked the video, please do hit the like button and consider subscribing to my channel. Go and check out part one on pure and part two on mechanics, and I've also got some on core pure for Further Maths as well. All of these videos are for Edexcel, and they should really help you prepare for your mock exams and for the real exams as well. Go and subscribe, check out my Instagram page at bisonmaths as well, and good luck!
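If you want to check the binomial and normal testing steps from this section numerically, here is a minimal Python sketch. It is not from the video: the function names (`binomial_tail`, `binomial_test`, `upper_critical_region`, `lower_critical_region`, `normal_mean_test`) and the `h1` strings `">"`, `"<"` and `"!="` are hypothetical choices for illustration. It uses only the standard library (`math.comb` and `statistics.NormalDist`, Python 3.8+), and it assumes a critical region is chosen so that its probability does not exceed α. The correlation test is left out because it relies on the printed table of critical values.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_tail(n: int, p: float, x: int, upper: bool) -> float:
    """Return P(X >= x) if upper is True, else P(X <= x), for X ~ B(n, p)."""
    ks = range(x, n + 1) if upper else range(0, x + 1)
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in ks)


def binomial_test(n: int, k: float, observed: int, alpha: float, h1: str):
    """Test H0: p = k against H1: p > k ('>'), p < k ('<') or p != k ('!=').

    Returns (probability of a result at least as extreme, whether to reject H0).
    """
    if h1 == "!=":
        alpha = alpha / 2              # two-tailed: halve alpha for each tail
        upper = observed > n * k       # go in the direction the data went
    else:
        upper = h1 == ">"
    prob = binomial_tail(n, k, observed, upper)
    return prob, prob < alpha


def upper_critical_region(n: int, k: float, alpha: float):
    """Smallest c with P(X >= c) <= alpha, plus the actual significance level."""
    for c in range(n + 1):
        prob = binomial_tail(n, k, c, upper=True)
        if prob <= alpha:
            return c, prob             # critical region is X >= c
    return None, 0.0                   # no region fits inside alpha


def lower_critical_region(n: int, k: float, alpha: float):
    """Largest c with P(X <= c) <= alpha, plus the actual significance level."""
    best = (None, 0.0)
    for c in range(n + 1):
        prob = binomial_tail(n, k, c, upper=False)
        if prob > alpha:
            break                      # P(X <= c) only grows from here
        best = (c, prob)               # critical region is X <= c
    return best


def normal_mean_test(k, sigma, n, xbar, alpha, h1):
    """Test H0: mu = k, assuming Xbar ~ N(k, sigma^2 / n) under H0."""
    # NormalDist takes the standard deviation, so use sigma / sqrt(n)
    xbar_dist = NormalDist(mu=k, sigma=sigma / sqrt(n))
    if h1 == "!=":
        alpha = alpha / 2              # two-tailed: halve alpha for each tail
        upper = xbar > k
    else:
        upper = h1 == ">"
    prob = 1 - xbar_dist.cdf(xbar) if upper else xbar_dist.cdf(xbar)
    return prob, prob < alpha


if __name__ == "__main__":
    # Assumed n = 10, p = 0.5 (so five expected); seven observed, so find P(X >= 7)
    print(binomial_test(10, 0.5, 7, 0.05, ">"))        # (0.171875, False)
    print(upper_critical_region(10, 0.5, 0.05))        # (9, 0.0107421875)
    # Hypothetical numbers: sigma = 4, n = 16, sample mean 51.8, H0: mu = 50
    print(normal_mean_test(50, 4, 16, 51.8, 0.05, ">"))   # reject (p about 0.036)
    print(normal_mean_test(50, 4, 16, 51.8, 0.05, "!="))  # do not reject at 2.5%
```

With the assumed n = 10 and p = 0.5, P(X ≥ 7) = 0.171875, which is above 5%, so there is no evidence to reject H0; the upper critical region at 5% is X ≥ 9, with an actual significance level of 11/1024 ≈ 1.07%. The normal example shows why halving matters: a probability of about 0.036 is significant for a one-tailed test at 5%, but not for a two-tailed test, because 0.036 > 0.025.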