Transcript for:
Basic Statistics for SAT Math

Hi everyone, welcome to my channel to test prep. Today we'll be going over ento number nine, basic statistics. The goal of the ento course is to teach students everything they need to know about the digital SAT Math exam. Please see the link in the description to download this video's worksheet for free. Let's get started.

Okay, so the first thing we need to talk about is sample size. We've talked about this in another video before, but here's a quick overview. For sample size we use lowercase n, not to be confused with population size (capital N). n is the number of data values in your data set, and it's also the number of participants in your study. Those first two are pretty self-explanatory. What lots of people tend to forget is that n is also the sum of the frequencies, so for some problems we'll have to add up frequencies to figure out n, as you'll see.

Okay, these are probably some of the most important things in this entire video: the mean and median equations. To calculate the mean, we do the sum divided by n, and to help us find the median, we use the position-of-median formula: the position of the median is equal to (n + 1) / 2. Make sure you know both of these like the back of your hand going into your exam.

All right, skill number one: mean of a list. First I count, and I see there are 1, 2, 3, 4, 5, 6 data values, so n is equal to 6. Next we use the equation mean = sum / n. Our sum is 4 + 7 + 9 + 10 + 12 + 24 = 66, and n is 6, so if you put that in a calculator you should get 66 / 6 = 11. Understanding this is important for the problems we're going to do later, but keep in mind that a question this simple you could do straight away on Desmos very easily.

Okay, skill number two: median of a list. The first thing we need to do, and this is really important guys, is put the data in order. I think it's 5, 8, 12, 16, 100. I see that there are five numbers, so n = 5. Now we use the formula (n + 1) / 2: (5 + 1) / 2 = 3, and the 3 tells us that the third data value is our
median. So now we just count to the third number, as long as the data is in order: 1, 2, 3. The third data value is 12, so the median is 12. That's what you do for an odd data set.

For an even data set, first, once again, we put the numbers in order. We count, and I think this one has eight, so n is equal to 8. Then we do (n + 1) / 2: (8 + 1) / 2 gives you 4.5. Now this is really important. Think of it this way: what two numbers is 4.5 directly in between? It's directly between 4 and 5, so we take the average of the fourth and the fifth values. Now we count 1, 2, 3, 4: the fourth value is 24 and the fifth value is 29. To average two numbers, you add them up and divide by two, so the median is (24 + 29) / 2, which is 26.5.

And of course you could do these problems very easily on Desmos. This was just to teach you how these equations work, so that when we do harder ones that we can't just do right away on Desmos, you know how to do them. So practice doing it by hand, and practice doing it on Desmos as well. Notice how Desmos saved us a ton of time, because we didn't even have to put the numbers in order; we could type them straight into Desmos in the order they were in.

Okay, large data sets. Take a look at this list of grades on a math test. I'm going to put it in this chart. It's not a bar chart, but the values are on the x-axis and the frequencies are on the y-axis, so this is called a histogram. Whenever I see a histogram, the first thing I do is write the frequencies on top of the bars, so 2, 3, 5, 1, 3, 2, just so I don't have to keep reading the graph over and over again. As we said, the x-axis is always going to be the value, and the y-axis is going to be the frequency. So why am I doing this with you guys? Because this is confusing for lots of people.
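Going back to skills one and two for a moment, the by-hand steps (mean = sum / n, and the median at position (n + 1) / 2 after sorting) can be written out as a short Python sketch. This is only a minimal sketch for checking your hand work, not something you'd use on the test (there you'd use Desmos), and the function names are my own.

```python
def mean(values):
    """Mean = sum / n, where n is how many data values there are."""
    return sum(values) / len(values)


def median(values):
    """Median using the position-of-median formula, (n + 1) / 2."""
    ordered = sorted(values)        # step one: put the data in order
    n = len(ordered)
    position = (n + 1) / 2          # 1-based position of the median
    if position.is_integer():       # odd n: one middle value
        return ordered[int(position) - 1]
    # even n: position ends in .5, so average the values on either side of it
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


print(mean([4, 7, 9, 10, 12, 24]))     # 11.0  (sum 66, n = 6)
print(median([16, 5, 100, 8, 12]))     # 12    (the skill-two numbers, shuffled)
print(median([4, 7, 9, 10, 12, 24]))   # 9.5   (average of the 3rd and 4th values)
```

The shuffled call gives the same median as the sorted list because the function sorts first, which is the step you must never skip by hand.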
So those frequencies are not numbers in the data set. Take the 2: there's no 2 in the data set. Look at the list on the left; there's no 2 in it. The 2 is telling you how many 70s there are. So there are two 70s, three 75s, five 80s, one 85, three 90s, and two 95s. That is what the histogram is telling you, and that's what a frequency represents. If you get a median of 5 or something for this one, you know that's wrong, because there's no 5 in the data set.

Okay, great. So that's a histogram, and if you wanted to find n, you could add up all of those frequencies (the red numbers) and you would get 16. If you wanted to, you could also write out the positions cumulatively like I did on the bottom: the first bar (frequency 2) holds positions 1 to 2, the next bar (frequency 3) holds positions 3 through 5, the next one (frequency 5) holds positions 6 to 10, and you could go through and label all of them like that. Some tutors like to do that. I really don't; I'm just putting it here for the sake of teaching. I'm going to show you an easier way (at least I think so) to keep track of the position when we do an example in a minute, so that green stuff I wrote there is not that important.

Okay, another way to display a large data set that you will see all the time on the SAT is a frequency table. Not always, but the vast majority of the time, a frequency table will have the value in the first column and the frequency in the second column. Regardless, remember we talked about how the frequency is like the number of participants: number of students, number of voters, number of turtles, number of dogs, whatever you're studying. Whatever sounds like that is going to be your frequency, and the other column is probably going to be your value. As I said, usually the value is on the left, but not always, so just pay close attention. Anyway, what I want you guys to understand is that this is just
displaying the exact same information; nothing is different. Everything I'm about to teach you for calculating the mean and the median works the same for a histogram and a frequency table. It's the same information, just displayed differently, so don't let either of these confuse you. What's nice about the histogram is that you have a picture to look at, which can help you make sense of how the data is shaped, which the frequency table maybe doesn't give you. Again, that's not important here; we're just going to do calculations.

Okay, skill number three: mean of a large data set. Oftentimes (not always, but a lot of the time) they'll tell you the sample size in the question. Do you see how it says 16 students? We didn't need to go and add up all those red numbers; we could just write down n = 16 because they told us. If you notice that, it'll save you some time. Anyway, mean is sum / n, and n is 16; that part's easy. The sum is the part that confuses everyone. You're basically going to multiply each value by its frequency (the height of its bar) and add them up, kind of like finding the areas of all these bars. Watch what I do: 2 × 70 + 3 × 75 + 5 × 80 + 1 × 85 + 3 × 90 + 2 × 95. That's what you type in for your sum, and you can do it all in one go on Desmos. You don't have to put parentheses, but I highly recommend you do, because it makes it easier to read what you're typing. And of course, 16 goes on the bottom. That sum, if I did it right, should come out to 1310, and 1310 / 16 = 81.875, so the mean for this one is 81.875. So you see, you just multiply, add, multiply, add, multiply, add.

Okay, now we'll do the mean of a frequency table. For this one they don't tell us what n is, so we really have to go and add up all those frequencies. Remember, n is the sum of the frequencies, so 4 plus 3 plus 2 plus
6 plus 1 plus 5 plus 2, which gives us 23. Then we do the same thing: mean = sum / n, and n is 23. For the sum, we multiply and add, multiply and add, pairing each value with its frequency: 5 × 4 + 8 × 3 + 9 × 2 + 14 × 6 + 16 × 1 + 19 × 5 + 23 × 2. You should get a sum of 303, and 303 / 23 gives you approximately 13.17.

Great, let's do the median now, starting with the histogram. Same thing: n is 16, and we use the position-of-median formula (n + 1) / 2. (16 + 1) / 2 gives you 8.5, and 8.5 is smack between 8 and 9, so the median is going to be the average of the 8th and 9th data values. Now, you don't need to go and write out the positions like I showed you before. We can quickly figure out where those numbers are just by doing addition: we cumulatively add the frequencies until we reach or exceed the position we're looking for. We're looking for the 8th number, so I'm going to add until we hit or exceed 8. Watch: 2, okay, that's not 8, keep going. 2 + 3 is 5, still not 8, keep going. 2 + 3 + 5 is 10. Ding ding ding, we stop, because we're past 8, and as a matter of fact we're also at or past 9, so both the 8th and the 9th numbers are 80. We kind of got lucky: the average of two identical numbers is just that number, so the median is 80. You'll notice that for even data sets on the SAT, a lot of the time the two middle values happen to be in the same bar and you won't have to do any work. But be careful, because you might see one where, say, the 8th value was the last 80 and the 9th value was the 85. In that case they're different, so you actually have to add them up and divide by two. We didn't have to do that here because they were in the same bar.

Okay, great. For the frequency table, we're going to do the exact
same thing. n is 23, just like we talked about before, and (n + 1) / 2 = (23 + 1) / 2 = 12, so we cumulatively add the frequencies until we reach or exceed 12. 4 is too small. 4 + 3 is 7, still too small. 4 + 3 + 2 is 9, still too small. 4 + 3 + 2 + 6 is 15. Okay, we stop there, because we've passed 12. And it's not 6, guys. There is no 6 in this data set, and even if there were, that's not what the frequency is telling you. We always get the mean and median from the values, so the median is 14. You would hate to do all that work and put down 6, which is wrong. The median is 14; you've got to look in the value column.

Okay, great, let's keep going. Something you need to know about histograms, and you're going to see these on the SAT, is that the bins don't have to represent a single value; they can represent a range of values. So we're going to talk about how to read histograms like that and make sense of them, because they've been asking some rather abstract questions that require you to know how to do this. I call this the include-left, exclude-right convention. The values don't have to be integers, but a lot of the time the SAT will tell you that the data values are integers, because they don't want to overcomplicate things. So let's just say these are integers. Do you notice how, for some of the numbers in the middle, like 75, 80, and 85, we're like, wait, which box do those go in? Are those in two boxes? That's where this convention comes in. Let's label these boxes one through six. For box one, we include the left and exclude the right, so box one could have 70, 71, 72, 73, or 74, and we exclude the right side, 75. Now we go to box two: include the left, exclude the right, so we pick up at 75, 76, 77, 78, 79, and then right
before we get to 80, we stop, because 80 is not in that box. Then the next box picks up from there. Hopefully you can see the pattern now: box three includes the left, so 80, 81, 82, 83, 84, and excludes the right. You can do that for all these boxes. This is going to be important for some questions we're about to do. The College Board will tell you all this in the exam, but it's going to be two or three sentences that just aren't going to make much sense, so it's really better if you understand this going into the test. Even though it'll tell you in the question, it's really not going to help you, I can pretty much assure you.

All right, take a look at these two histograms. They're arranged exactly the same; the only difference is that data set B starts at a bigger number, so everything's bigger. Instead of starting at 30, it starts at 40. Distribution-wise and shape-wise, it looks exactly the same, just with bigger numbers. Now take a look at this question. It's exactly what I was talking about: there are two or three sentences that do not help you at all. They're just explaining the include-left, exclude-right convention, so you don't even need to read them. The question is asking: what is the largest possible difference between the mean of data set A and the mean of data set B? We need to do something very similar to what's called optimization, so let me help you understand what's going on in this problem with a trivial example. Let's say Adam has $5 to $8 and Betty has $9 to $12, and we want to know the largest possible difference between the amounts of money they have. Take a second, pause the video, and see if you can figure it out. The largest possible difference in the amounts of money they have is 7: 12 - 5 = 7. You probably just used intuition to figure that out, but
essentially what we did is give the person with more money as much money as possible, and give the person with less money as little money as possible. That way the numbers are as far apart as possible on the number line, and then we subtracted them. You see, 12 and 5 are the farthest-apart numbers we can get on the number line. So we maximized B (we took the biggest number possible) and minimized A (we took the smallest number possible). This is analogous to the problem we're doing; I just gave the people names that match the data sets. So for this problem, we're going to maximize data set B and minimize data set A.

One of the most important problem-solving skills you can develop, in anything in life, is to think of a simpler problem. If you're not sure what these types of questions are asking, come up with a tiny data set or a silly little example like this until you can figure out what to do, and then extrapolate whatever you come up with to the harder problem. So: think of a simpler problem.

Okay, let's label all the frequencies on top of the bars (that's always helpful). We have to minimize A and maximize B. To minimize A, we take the smallest number in each box, which is the left edge: in the first box it's 30, in the next box it's 40, in the next box it's 50, and in the next box it's 60. To maximize B, remember that we include the left and exclude the right, so it's not 50. If the values didn't have to be integers, it could be something like 49.99, but because they're integers, it's 49. The next box is 59, the next is 69, and the last is 79. Now that we've figured out what to do, all we have to do is calculate, basically everything we were doing in the previous skills: we calculate the mean of each of these. I'm not really going to go through it; I just did it on Desmos really quickly, so take a look at this. Hopefully I did it right.
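Skill three, the cumulative-frequency trick for the median, and the include-left, exclude-right bins can all be sketched the same way. This minimal sketch assumes the values are listed in increasing order (as they are in the histogram and the frequency table) and that the binned data are integers. The function names are my own, and because the transcript doesn't give the bar heights for data sets A and B, the heights used for them below are hypothetical.

```python
def weighted_mean(values, freqs):
    """Mean from a histogram or frequency table: sum of value * frequency, over n."""
    n = sum(freqs)                                    # n is the sum of the frequencies
    total = sum(v * f for v, f in zip(values, freqs))
    return total / n


def value_at(position, values, freqs):
    """Data value at a 1-based position, found by adding frequencies cumulatively.

    Assumes `values` is sorted in increasing order.
    """
    running = 0
    for v, f in zip(values, freqs):
        running += f
        if running >= position:                       # reached or passed the position
            return v
    raise ValueError("position is larger than n")


def freq_median(values, freqs):
    """Median from a frequency table via the (n + 1) / 2 position formula."""
    position = (sum(freqs) + 1) / 2
    if position.is_integer():
        return value_at(int(position), values, freqs)
    lower = value_at(int(position), values, freqs)        # e.g. the 8th value
    upper = value_at(int(position) + 1, values, freqs)    # e.g. the 9th value
    return (lower + upper) / 2


# Skill 3: the math-test histogram (values and bar heights from the video)
scores, counts = [70, 75, 80, 85, 90, 95], [2, 3, 5, 1, 3, 2]
print(weighted_mean(scores, counts))   # 81.875
print(freq_median(scores, counts))     # 80.0

# The frequency table from the video
vals, freqs = [5, 8, 9, 14, 16, 19, 23], [4, 3, 2, 6, 1, 5, 2]
print(round(weighted_mean(vals, freqs), 2))   # 13.17
print(freq_median(vals, freqs))               # 14

# Include-left, exclude-right bins of width 10 with integer data.
# HYPOTHETICAL bar heights; A and B share them because their shapes match.
bar_heights = [2, 5, 4, 1]
a_left_edges = [30, 40, 50, 60]
b_left_edges = [40, 50, 60, 70]
smallest_a = weighted_mean(a_left_edges, bar_heights)                  # every A value at its bin's left edge
largest_b = weighted_mean([e + 9 for e in b_left_edges], bar_heights)  # right edge excluded, so 49, 59, 69, 79
print(round(largest_b - smallest_a, 6))   # 19.0
```

Whatever bar heights you plug in, the gap comes out to 19, because each B value can sit 19 above its matching A value (49 - 30, 59 - 40, and so on). That's why the answer doesn't depend on the heights.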
You should get a mean of 43.75 for data set A, and for data set B I got 62.75. Then we subtract them. We want the largest possible positive difference, so put the bigger one in front: 62.75 - 43.75 = 19, and that would be the answer to this question.

Great. Another thing they ask about is algebra on the mean equation. The equation we've been working with this whole time is mean = sum / n, and we're going to get sum and n by themselves. If we multiply both sides by n, we get sum = n × mean, and if we then divide both sides by the mean, we get n = sum / mean. You don't have to memorize these; just remember that you can do algebra on this equation to quickly solve for the other two quantities. Personally, I find that doing this at the beginning of the problem and then plugging and chugging makes it a lot easier. The big problem is that so many kids don't think of the mean as an equation. They think of it as, oh, let me just add up all the numbers and divide by how many numbers there are. That's not helpful; you need some kind of shorthand, so that when you come across an algebra question, you know what to do. So we're going to do some problems relating all three of these things.

Oh, also, combining data sets. Combining data sets means you take all the numbers from one data set and all the numbers from another data set and put them together into one big data set. The sample sizes are additive: if you had, let's say, 10 people and then 50 people and you combined them, you'd have 60 people. Similarly, sums are also additive: if the sum of the first data set was 200 and the sum of the second data set was 800, you could add them together and get 1,000. Now, really important, guys: means are not additive when you combine. You have to do some
calculation using the formulas from the previous slide. So just make sure you understand that when you combine data sets, you can add up the sample sizes to get the total sample size and add up the sums to get the total sum. That's the idea; these aren't really equations you need to memorize.

All right, let's look at this problem. It's adapted pretty closely from a DSAT math question; I just changed it up a bit. We've got two data sets, and we're going to combine them. I see that they gave us n and the mean, so finding the sum is probably going to be helpful, since that's the thing we're missing. So we do sum = n × mean for both of them. Data set A has 45 caterpillars with a mean of 9, so 45 × 9 = 405. For data set B we've got 55 × 12, which gives us 660. And remember what I said, guys: for the combined sum, you just add them together, so the sum of data set C is 1065. They told you that n was 100 (it says the lengths of the 100 caterpillars), but they didn't even need to tell you that; you could have done 45 + 55 and figured it out. Just keep that in mind. They saved us some work there. So we get 1065 / 100, which is 10.65, so the mean of data set C is 10.65, and that would be our answer.

Okay, let's look at another one. This one gives a sum and a mean, so I probably want to find n, and we use the equation we talked about before: n = sum / mean. 847 / 11 gives us 77, and 588 / 6 gives us 98. So data set A has 77 caterpillars and data set B has 98 caterpillars. If we combine all the caterpillars from data sets A and B, we can just add those two: 77 + 98 = 175, so there would be 175 caterpillars in data set C. We could also go and find the mean of C if we wanted to; we'd just need the total sum.

Great. Okay, this one's really confusing. I wasn't really sure where
to put this, but I figured this was a good place since we're talking about equations. It says we have eight integers, but data set A actually has nine integers, so one is missing. I'm going to call the missing one x. We can write an expression for the mean using sum / n, and you should get (544 + x) / 9. If you read the question, it says the integers have to be less than 79, so x has to be less than 79. It also says the mean is an integer greater than 68. The current mean of the eight known integers is 544 / 8 = 68, and we want it to be bigger, so the number we add must be bigger than 68. So x is somewhere between 68 and 79, and the potential candidates are 69 all the way up to 78. How are we going to figure out which one it is? Notice they said the mean of the data set has to be an integer, so we just guess and check these candidates until one gives us a whole number. If you do that, you should find that the only one that works is 77: (544 + 77) / 9 = 621 / 9 = 69, which is an integer. And 77 is bigger than all the other numbers in the data set, so the largest integer is 77. I went over this in my Desmos video: for these questions you can use Desmos to graph the missing value, and then I basically just guessed and checked means (that's the y). Mean is 69, mean is 70, mean is 71, and you can see that once the mean gets to 70 or 71, the line isn't going to cross the red in that little blue box, so it's just not going to happen. It's got to be a mean of 69 and a missing value of 77. Anyway, I think knowing how to do all this is situationally helpful, but not too important.

All right, so we've already talked about two big statistical measures, mean and median. The mean is what you think of as the average;
that's probably the word you use in school. The median is the middlemost number. But there are also two measures of spread that we haven't really talked about up to this point: range, which is just max minus min (the biggest value minus the smallest value), and standard deviation, which is, put maybe a little too simply, roughly the typical distance from the mean. So let's talk about problems we might see with those two.

Okay, range. Remember, we said it's max minus min, and that's an equation you definitely want to write down. It's often just a vocabulary question: if you know this word, you get it right, and if you don't, you get it wrong. The biggest value is 23 and the smallest value is 5, so we do 23 - 5, which is 18. The range for this data set is 18. That's it.

Okay, standard deviation. Again, put very simply, it measures roughly how far the values are from the mean on average. If the numbers are far away from the mean on average, the data set has a high standard deviation, and if they're close to the mean on average, it has a low standard deviation. Let me show you an example. I don't even need to do any calculations: this data set is symmetrical, and the mean is 5. By the way, this is called a dot plot. It's very, very similar to a histogram; instead of a bar, you've got dots representing how many data points there are. I see that the mean is 5, so I'm going to put a little pivot right where 5 is. You see that a bunch of those points, those three there, are zero from the mean; they're very close to it. If you look at these guys, they're one away from the mean, so not too far, and these guys on the outside are two from the mean. This just gives you a rough idea of what's going on when standard deviation is computed. You don't need
to worry about this too much (we'll talk about what you do need to worry about in a second), but you can see that this has a relatively small standard deviation, because so many of those data points are either right on top of the mean (zero away) or just one away. They're almost as close as they can reasonably be, so this probably has a small standard deviation.

So do you need to calculate standard deviation? No. Here's the Desmos command in case you want to know it, because I figure someone's going to ask me in the comments: it's stdev. But you're never going to need it on the exam, and I don't really think you should bother learning how to calculate it by hand. They don't expect you to calculate standard deviation; they only want you to know how to compare it, usually between two data sets. It'll just be a multiple-choice question where you figure out whether one data set's standard deviation is higher, lower, or equal. That's it. So let's go over how to do that now. If a data set is more spread out, it has a larger standard deviation; if a data set is less spread out, it has a smaller standard deviation. That's it. Don't even think about distance from the mean; just try to eyeball the spread. And if two data sets have the exact same spatial arrangement, meaning they look exactly the same, then they have equal standard deviations.

Okay, let's look at these two data sets. Do you see how, even though the green values are bigger, they're arranged spatially exactly the same? You've got one dot, then right next to it two dots, then three dots, then two dots, then one dot. Because they're arranged exactly the same, they're not spread out differently, so these have the same standard deviation. So this is an example of two
having equal standard deviations. This is what I mean by having exactly the same arrangement: they literally look exactly the same.

Okay, let's look at another one. I see that these are roughly symmetrical, so just for the sake of teaching, let's mark the mean with a pivot at about 5. Data set A has a lot of values either right on that mean or really close to it, but B has a bunch more that are farther away, like the ones on 3 and the one on 7. So B is more spread out, which means data set B has a larger standard deviation than data set A.

Okay, let's keep going. For this one, my picture isn't perfect (it was kind of hard to fit this all on one slide), and at first glance these might look the same. But do you see how A is actually going by twos? It's got one dot, then nothing at the next grid marker, then two dots, then nothing at the next grid marker, whereas B's values are one right after the other. So data set A is actually more spread out, and data set A has a larger standard deviation than data set B.

Okay, let's look at this one and this one. I do think these look exactly the same: we've got three, then two, nothing, one, nothing, nothing, one. They look exactly the same, so they have the same standard deviation.

Great, let's keep going. When they ask you these questions, they'll almost always give you a dot plot or a histogram, and as you saw, all we really have to do is look at it. They could give you frequency tables or lists, in which case you've got to visualize in your head where the numbers are a little bit. But regardless, just remember spread as you're answering the question: how spread out are they? Whichever one looks more spread out has the higher standard deviation.

Okay, honestly, maybe I should have done this at the beginning, but we're going to go over what's called a
bar chart. A bar chart is any statistical graph that displays categorical data using bars. This is not the same thing as a histogram, and there are no frequencies; these are essentially just glorified lists, where each set of bars represents a list. In this one, it's not values on the x-axis and frequencies on the y-axis: the values are on the y-axis and the x-axis is categories. So that's just a list, and we would write it down as 32, 23, 24 (you see what I'm saying?) and then do mean and median like we did at the very beginning of the video. This is a more complicated one, but you can notice the common theme here: if you see categorical variables, in other words words, on one of the axes, it's probably a bar chart and not a histogram.

Okay, let's keep going. Side-by-side bar charts: that's actually the kind we were just looking at on the right. This is basically just two lists, so the blue list would be something like 18 1 7, all with commas, and the red list would be 1 4 3 2. The question might ask you to compare the means or the medians or something like that. There are a bunch of things they could ask you about this, but it's somewhat self-explanatory.

Okay, look at this. I made this graph from real data I found online; it's about movie theater chains and how many screens they have. I'm going to write it down as a list before I answer the question at all, because I feel like that's helpful: 11, 5, 2, 6, 7. Hopefully you're noticing these are not in order. A very common mistake is that people go, what's the median? Oh, it's the one in the middle, 2. But you can see now that that's not correct: remember, when we're finding the median, we have to put the values in order first. Anyway, you can use skills one and two from the beginning on your own, but just for the sake of simplicity, I'm going to type these into Desmos. You can type them in using square brackets to make a list, which kind of
saves you some typing: if you name the list A, then instead of typing 11, 5, 2, 6, 7 twice, you can just write mean(A) and then median(A). The mean is 6.2 and the median is 6; the values are in thousands of screens, so that's a mean of 6,200 screens and a median of 6,000 screens. Great, that's it for bar charts.

What is an outlier? An outlier is a data value that differs significantly from the other values. That's a little subjective, but the SAT is always going to make it super obvious. If you were looking at this chart, you can see that those two black dots on the edge are really far away from the majority of the values. Those are outliers, and we don't need to do any calculation to verify that; they make it obvious for us. Just like you don't need to know how to compute standard deviation, you don't need to know how to test whether something is an outlier; it will just be obvious. For the sake of simplicity, I'm going to call these guys small outliers, because they're smaller than most of the numbers, and these guys big outliers, because they're bigger than most of the numbers.

You'll get questions where they add or remove an outlier. If we add a small outlier, like that number there, the mean will go down considerably, and the median will either go down a little bit or stay the same. You don't know which it's going to be, so you actually have to check. But you don't have to compute the mean to know that it's going to go down, because outliers pull the mean toward them. Now for adding a big outlier: let's say we have this data set and we chuck in this number. The mean will go up considerably, because that outlier pulls the mean toward it, and the median will either just barely go up or stay the same; you have to check. If these are lone outliers and they become the new max or the new min, the range will go up a ton. Think about it: the range of that compact data set is from about 45 to 60, and then on the left
one, it was like 19, so now it's going from 19 to 60. That is way bigger; that's like two or three times as big. So the range is going to get affected a lot by outliers if they do indeed become the biggest or smallest number. Okay, great. Now, this is kind of redundant, but if you take away a small or big outlier, the same things happen, just in the opposite direction. If you remove a small outlier, your mean will go up considerably and the median will go up a little or stay the same. If you remove a big outlier, the mean will go down considerably and the median will either go down a little or stay the same; again, you have to check. And the range will go down a ton if that outlier was the biggest or smallest value, so that the max or min changes by a lot. So yeah, it's pretty much the same as what we just did on the last slide, just the opposite. Okay, let's look at this example: the list above gives the heights of six children. A seventh child with a height of 70 inches will be added to the group. Which of the following correctly describes how the mean and median of the group will change? So let's put that guy in there: 70. He's a big outlier, and we added him, so the mean is going to go up for sure. Anything that says the mean is not going up, we can cross out, and that one and that one both say the mean will decrease. Now we have to figure out whether the median will increase or stay the same, and for this one I just used Desmos to do it really quickly. The median did go up, from 52 to 53, so we're going to select answer choice A. Notice, guys, it's not asking you whether it went up a lot; it's just asking whether it went up at all. So if it went up from, like, 52.4 to 52.6, that is increasing, so that would count. So this one's answer is choice A. All right, another question you could see: if you add or remove a lone outlier, one at the very edge that is the max or min, kind of like all the ones we've been doing, which statistic changes the most?
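One way to check that ranking on a concrete case is a quick computation. Below is a minimal Python sketch, standing in for the Desmos checks used in this video. The data set is made up (a compact set from about 45 to 60 plus one lone small outlier at 19, echoing the range example above but not taken from any chart), `summarize` is a helper name invented for this illustration, and the standard deviation is the population version from Python's `statistics` module.

```python
import statistics


def summarize(values):
    """Measures of spread and center discussed in this video."""
    return {
        "range": max(values) - min(values),
        "stdev": statistics.pstdev(values),  # population standard deviation
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


# Made-up compact data set, then the same data with one lone small outlier added.
compact = [45, 48, 50, 52, 55, 58, 60]
with_outlier = compact + [19]

before = summarize(compact)
after = summarize(with_outlier)

# How much each statistic moved, listed from largest change to smallest.
changes = {name: abs(after[name] - before[name]) for name in before}
for name in sorted(changes, key=changes.get, reverse=True):
    print(f"{name:>6}: {before[name]:6.2f} -> {after[name]:6.2f} (change {changes[name]:.2f})")
```

On this example the range moves by 26, the standard deviation by about 7, the mean by about 4.2, and the median by only 1. That is one illustration rather than a proof for every data set.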
The range changes the most, then the standard deviation, then, not far behind that, the mean. And you've probably noticed at this point, guys, that the median changes the least. So if I chuck in an outlier and ask you what increased the most, you would say the range, and if I ask what increased the least, you would say the median. The range is heavily influenced by lone outliers, and the median is very resistant to change. Okay, next up we have something similar but kind of different; it's called modifying statistics. There are probably a bunch of different things they could ask you about changing statistics, and you'd have to think on the fly, but I'm just going to show you what I think are the three most common and important cases, at least out of what we haven't already covered. Something I've already seen on the DSAT from taking it myself is adding or subtracting a positive number. So let's look at an example: we have 1, 2, 3, 4, 5, 6, and the question says we're going to add 8 to each number. So our new data set is 9, 10, 11, 12, 13, 14, and you might notice that all we did was shift it: if we had a dot plot, we just moved it over. So the spread is going to stay the same, but the center, the mean or median, will increase or decrease depending on whether we added or subtracted. Because we added 8, the mean and median both increased. You could do the calculation if you want, but I didn't need to; I can figure that out from my knowledge of statistics. All right, let's go on to the next one. Another case you might see is multiplying each number by a positive number. Let's say we had 1, 2, 3, 4, 5, 6 and we're going to multiply each number by 7. We now have 7, 14, 21, 28, 35, 42, and as you might be able to tell, if you made a dot plot, these numbers are now pretty far apart, so the
spread would increase, and the center, the mean or median, would increase as well. They would decrease if, say, you multiplied by a fraction: if you took all the numbers in data set A and multiplied them by 1/2, they would be closer together and the mean and median would be smaller. Great. Okay, let's look at a problem. It says data set B is created by adding 13 to each of the values in data set A. So what did we say about adding a number to every value? We said that the spread will stay the same and that the center will increase or decrease; because we're adding 13, not subtracting, the median will increase. So we can get rid of B and D, because they both say that the range is changing, and we just established that the range is staying the same. We want the one that says the median of data set B, the new data set, will be greater, so that's answer choice C. You could go and do the calculation, but you see you don't have to: if you just understand that adding or subtracting a number to every value does not affect the spread, but multiplying every value by a number does, you're good. All right, another case that I've seen once or twice and that I feel is worth mentioning is subtracting a number from every value to the left of the median and adding it to every value to the right of the median. Let's say we have 6, 7, 8, 9, 10, 11, 12. The median is 9, and the question says to subtract 4 from every number to the left of the median and add 4 to every number to the right of the median. So we now have 2, 3, 4, 9, 14, 15, 16. These numbers are now more spread out, so the spread will increase. As for the center, it's still symmetrical and we have the same sum, so we have the same mean and the same median; the center will stay the same. Okay, so in this question, each number less than the median is reduced by five and each number greater than the median is increased
by five. Which of the following correctly compares the medians and standard deviations? As we talked about, the center is going to stay the same, so C and D are both wrong, because they say the median is getting bigger. And for the spread, because we're basically just taking the numbers and pushing them apart, the spread will increase, so we want the one that says the standard deviation is greater. That's answer choice B. Again, there are a bunch of different tiny variations of this they could ask you, but outliers and those three cases we just covered are probably the vast majority of what you're going to see. All right, now we're going to do a box plot; take a look at this data set. A box plot is also called a box-and-whisker plot. Really, all you need to know is that the leftmost tick, the end of the left whisker, is the min, the end of the right whisker is the max, and the line in the center of the box is the median. That is all you need to know. I'm going to tell you what the other lines are called in case you're curious, but it's not going to be on the SAT: the vertical line on the left side of the box is Q1, and the vertical line on the right side of the box is Q3. Like 95% of the time you see a question like this, they're just going to ask you for the range and the median. For the median, you just look at it: it's 17. It's literally a fact-check question; if you know that the line in the middle is the median, you're good. The range, as we talked about, is max minus min, so for this one it's 28 - 5, which is 23. Okay, the other thing you might want to know is that each piece of the chart contains 25% of the data values. So let's look at this example: if I put the numbers into those chunks of the chart, there are eight numbers, and there are two in each section.
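Here is a minimal Python sketch of that idea. The eight values are made up (chosen so that the min, median, and max match the 5, 17, and 28 read off the chart; they are not the slide's actual data), and the helper names are invented for this illustration. The quartiles use the median-of-halves convention, which is one common convention among several; as mentioned, Q1 and Q3 themselves aren't tested.

```python
def median(values):
    """Median of a list (assumes at least one value)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves convention.

    Assumes at least two values; the middle value of an odd-length list is
    left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


data = [5, 9, 11, 14, 20, 22, 26, 28]  # made-up values, not the slide's data
low, q1, med, q3, high = five_number_summary(data)

# Sort each value into one of the four pieces of the box plot. This split is
# clean here because no value sits exactly on Q1, the median, or Q3.
pieces = [
    [x for x in data if low <= x < q1],   # left whisker
    [x for x in data if q1 < x < med],    # left half of the box
    [x for x in data if med < x < q3],    # right half of the box
    [x for x in data if q3 < x <= high],  # right whisker
]
for piece in pieces:
    print(piece, f"{len(piece) / len(data):.0%}")
```

Each of the four pieces holds two of the eight values, which is the 25% per piece the chart encodes.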
Well, two divided by eight, that's 25%, so you see each of these pieces does contain 25%. I've almost never seen this, but I could see a question asking, for example, what percent of the data points are between 10.5 and 28, and you would go, okay, that's three pieces, so 75%. That's it. So those are really the only three things you need to know: the median, the range, and that each piece of the chart makes up 25% of the data. So that right there, between those two purple lines, would be 75%. Okay, a common misconception: they'll often have trap answers talking about the mean or standard deviation. Box plots don't tell you the mean or standard deviation. You can maybe guesstimate how spread out a data set is by looking at its box plot, but you're not going to be able to come up with anything conclusive. So if you see an answer choice for a box plot question talking about mean or standard deviation, just cross it off right away; it's probably not right. So here's an example. What they'll do to make this a little bit harder is give you two of them, a side-by-side box plot. For the median, we look at the line inside the box: the median of the top one is 12 and the median of the bottom one is 17. For the range, we look at the very ends of the whiskers: the top one is 28 - 3 = 25 and the bottom one is 28 - 5 = 23. So maybe you'll get a question asking which one has the bigger median or which one has the bigger range, or maybe you'll have to find the two medians and they'll ask for the difference between them, and you would do 17 - 12 = 5. Again, don't let that confuse you; just tackle each box plot individually and then answer the question. That's what I would do. Okay, that's pretty much it for the slides. I just want to give a quick disclaimer: we still haven't covered everything that's on statistics. You see, this
takes forever, right? So in the third video we did confidence intervals and margin of error, and in this video we did basic statistics. There are still two more topics we haven't done: table probability, which will be a short one we'll do in the future, and statistics conclusions, meaning any statistics conclusion that does not have to do with margin of error, for example experiments and cause and effect, stuff like that. That seems to be very, very rare on the DSAT anyway. So we're not done yet, and there's more to come, but we've definitely covered the brunt of what you'll see, so that's good. Okay, it's time for the final review. Remember to download the free final review worksheet using the link in the description. Try to complete the worksheet yourself before watching the rest of the video, and I highly encourage you to ask relevant questions about this topic in the comment section. Okay, let's switch to the whiteboard for the final review. For this first question we're going to do mean equals sum over n. I count 1, 2, 3, 4, 5, 6 numbers, so n is 6. 9 + 26 + 11 + 3 + 4 + 7, that should be 60, and 60 divided by 6 gives us 10, so that would be the answer for that one. Now this next one asks for the median of the same list. First thing, if we're going to do it totally by hand, we have to put it in order: 3, 4, 7, 9, 11, 26. Again there are six numbers, so we're going to do position of median equals (n + 1)/2, which is (6 + 1)/2 = 3.5, so we have to take the average of the third and fourth numbers. Counting 1, 2, 3, 4, that's these two, so the median is (7 + 9)/2, which is 8, and I believe that is the answer. Let's go over to Desmos and see how we could have done it there as well. Okay, let's type it in as a list: a equals square bracket 9, 26, 11, 3, 4, 7, and if you guys
didn't know, you can drag this to make the box wider, which makes it easier to see what you're typing (I can't do it much here because I have mine set up really weird). Okay, so I'm going to do mean of a, and I get 10. I'm going to do median of a, and I get 8. Seems like we did a good job; let's move on to the next question. Okay, this is the mean of a frequency table, so we're going to do mean equals sum over n, and I'm going to type that all into the calculator in a minute. n is 2 + 5 + 7 + 3 + 8 + 4 + 1, which is 30. So again, guys, we add up the frequencies to get n. For the sum, we multiply each value by its frequency and add those up, so why don't we go to Desmos for that. I'm going to make this as big as I possibly can: 3 × 2 + 7 × 5 + 12 × 7 + 14 × 3 + 17 × 8 + 24 × 4 + 51 × 1, all over 30, and we get a mean of 15. So let's write that down; that'll be our answer for question three. Okay, let's go back over here. Now we're going to do position of the median equals (n + 1)/2, and we said n was 30, right? So (30 + 1)/2, which gives us 15.5, so we need to take the average of the 15th and 16th values. Now I'm going to add up the frequencies to find what those values are. 2: that's too small, I've got to keep going. 2 + 5: too small, keep going. 2 + 5 + 7, that's 14: still too small. 2 + 5 + 7 + 3, that's 17, so both the 15th and 16th values are in this row, which means the median is 14. Okay, great. What is the minimum? That's just the smallest value, so that is 3. The maximum is just the biggest value, so that's 51. And for the range, we talked about max minus min, so that's 51 - 3, which is 48. I think that's our answer for that one. Looks good. Okay, now we've got a histogram asking very similar questions. To make it a little easier, let's first label the tops of the bars of the histogram: 1, 2, 4, 1, 2, 4, and then this one's 1. If you added up all those red numbers, you should get 15, but you didn't need to do
that, because they told us that there are 15 students, so that is your n. I just write that down immediately on my scratch paper: n is 15. Now we're going to do mean equals sum over n, so all that multiplying and adding over 15, and we'll just do that on Desmos to save me some stress here. Let me make this as big as I possibly can. Okay: 1 × 74 + 2 × 77 + 4 × 78 + 1 × 79 + 2 × 80 + 4 × 81 + 1 × 82, all over 15, which gives me a mean of 79. Great, let's go back over here. So we get a mean of 79. Now for this one we're going to do position of the median equals (n + 1)/2, so (15 + 1)/2 = 16/2, which is 8, so we have to find the 8th value; that's going to be our median. So we go back up here and add cumulatively until we hit 8. 1: no. 1 + 2: no. 1 + 2 + 4: no, that gives me 7, not quite. 1 + 2 + 4 + 1: okay, that is right here. So the 8th value is a 79, so the median is also, coincidentally, 79. That's our answer for that one. And for the range, we're going to do range equals max minus min. The maximum is 82 and the minimum is 74, so we do 82 - 74, and that gives us 8. Okay, great. All the stuff about the include-left, exclude-right convention we don't need to worry about; we know what that is. The question says 16 integers, and these are just the same charts we used before, but it's a different question. We want the smallest possible value of b over a. When you have a fraction, guys, and you want to make it small, hopefully this makes intuitive sense: you want the top to be as small as possible and the bottom to be as big as possible. Think about it: dividing by a big number makes the result really small, right? If you don't believe me, check it by coming up with an example with some numbers. All right, and it's asking about means, not medians. So we're going to maximize a and we are going to minimize b. Okay, so, a:
the biggest number in the first bar is 39, and in the next bars it's 49, 59, and 69. Then for b we want the smallest, which is the leftmost value in each bar, so 40, 50, 60, and 70. For both of these, n is 16. Okay, now we just need to figure out b and a, so let's go to Desmos. Let me pull the chart up. Okay, so a is... oh, I didn't write the numbers on top of the bars: 3, 6, 5, 2 and 3, 6, 5, 2. Great, we're in business. Okay: a is (3 × 39 + 4 × 49 + 5 × 59 + 2 × 69) over n, which is 16, and b is (3 × 40 + 6 × 50 + 5 × 60 + 2 × 70) over 16. So a is 46.625 and b is 53.75. The question asks for the smallest possible value of b over a, so let's do b over a. Okay, I think I might have done something wrong; let me just double-check. Yep, I typed it in wrong: I had a 4 where I should have had a 6, like up here. So a is actually 52.75, b is 53.75, and b over a comes out to about 1.019, which rounds to 1.02 at the hundredths place; we round up because of the 8. The reason I asked you to round here is that if you wrote this as a fraction, 215 over 211, it would not fit in the Bluebook app; it's too many characters. So we go to the hundredths, 1.02, since the question told us to do that. If it didn't tell us to do that, we would just keep typing digits until we ran out of room. Okay, let's go back to the board: we get 1.02. That is our answer. Great, let's keep going. Okay, mean is equal to sum over n, so the sum is equal to n × mean. For this question they give us n and the mean, so we probably need to figure out the sum; that just kind of makes intuitive sense. So the sum for the first data set is 42 × 25, which is 1050, and for data set B, n is 38 and the mean is 45, so 38 × 45 gives us 1710. For data set C, the combined data set, the sums and the sample sizes are additive, so the sum is 1050 + 1710, which gives us 2760.
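Here is that sum-based approach as a minimal Python sketch, using the numbers from this question (42 games with a mean of 25 and 38 games with a mean of 45); `combined_mean` is a helper name invented for this illustration. It also shows why you can't simply average the two means: the groups have different sizes, so the plain average of 25 and 45 gives 35, which is not the combined mean.

```python
def combined_mean(groups):
    """groups: list of (n, mean) pairs. Sums and sample sizes add; means do not."""
    total = sum(n * mean for n, mean in groups)  # sum = n * mean for each group
    count = sum(n for n, _ in groups)            # combined sample size
    return total / count


groups = [(42, 25), (38, 45)]          # (n, mean) for data sets A and B
print(combined_mean(groups))           # 34.5, the mean of the combined data set
print(sum(m for _, m in groups) / 2)   # 35.0, the "average the means" trap
```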
Then for n, we're also going to add the n's: 42 video games and 38 video games, which gives you 80. Now, to find the mean for data set C, we do sum over n, 2760 over 80, and we get 34.5. If you were actually writing it in real life in America, you would say $34.50, but they don't have you enter dollar signs and stuff like that in the Bluebook app, so you don't need to worry about that. So the answer is 34.5. Great, let's keep going. Okay, this one wants us to compare the standard deviations. I'm not even going to look at the answer choices yet. I see that these values are more spread out, going from about 2 to 8, and the other data set is more packed together, from 3 to [unclear]. More spread out means a higher standard deviation, so I want an answer choice that says data set B has a higher standard deviation. I think that is this one: the standard deviation of data set B is larger than that of data set A. That looks good to me. They could also have said, conversely, that the standard deviation of data set A is smaller, so just be very careful: they might phrase it differently from how you phrased it in your head, so keep an eye on that. I don't think I've ever seen one of these questions where this answer choice is the correct one; I've never seen that be right. You can almost always tell just by looking at the picture they provide. Great, let's keep going. Okay, this is real data, by the way; I'm giving you an insider's look. We're going to come back and talk about this, because some of these numbers are shocking to me, but let's answer the question first. First things first, I notice that each big tick is 10 and there are five spaces, so 10 divided by 5 is 2, which means each little tick is 2. Okay, so 20, 22, 24: that's 24 thousand. That one's also 24. That's 30, 32, 34, and it looks like it's just past 34, so this is 35. This is 10, 12, 14, and then one past that, so 15, and then
this is 30, 32, 33. And again, guys, when you go to find the median, it's not just the one in the middle; you have to write the values down and put them in order. We already did that by hand at the beginning of this worksheet, so if you want to do it by hand on your own, I leave that to you; I'm just going to use Desmos. Notice, too, that it says "in thousands," so we don't need to change the values to 24,000, 35,000, and so on. If it didn't say that, we would need to convert by multiplying by 1,000, but because it does, we don't need to worry about the units: they match what's in the chart. Okay, let's go to Desmos. Let's make a list and call it t: bracket 24, 24, 35, 15, 33. We're going to do the mean of t and the median of t. For the mean we get 26.2 thousand, and for the median we get 24 thousand. You see, a lot of people, if they saw that median question, would have just looked at the one in the middle, the USA, and said 35, but that is way wrong. If you're going to do it by hand, you've got to put them in order first; they're not in order. Okay, guys, these are my lifetime views by country as of New Year's Eve. Everyone in these top five countries is great, but I just want to point out something that shocked me. I looked up the populations of these countries, and Nepal has a way smaller population than basically every other country on this list. And how many of the views in the US column are my friends and colleagues and myself going into the comment section? So I don't even think the US has the most views; if you think about it, I think Nepal does. I've met many people from a lot of these top countries through the community, and it's been amazing, so thank you. But special shout-out to Nepal: you guys are putting in work watching my channel.
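Going back to the bar-chart question, here is a minimal Python sketch of mean = sum / n and of the median via the position formula (n + 1)/2, applied to the chart values (in thousands) and to the list from the first review question. Python here only stands in for Desmos's `mean` and `median`; the helper functions are written out so the sorting step is visible. Taking the middle entry of the unsorted list gives 35, which is exactly the trap described above.

```python
import math


def mean(values):
    return sum(values) / len(values)           # mean = sum / n


def median(values):
    """Median via the position formula (n + 1) / 2, applied to the SORTED data."""
    ordered = sorted(values)                   # step 1: put the data in order
    position = (len(ordered) + 1) / 2          # 1-based position of the median
    # If the position ends in .5, floor and ceil pick the two neighbors to
    # average; if it is a whole number, both pick the same value.
    lower = ordered[math.floor(position) - 1]
    upper = ordered[math.ceil(position) - 1]
    return (lower + upper) / 2


views = [24, 24, 35, 15, 33]                   # bar heights, in thousands
review_list = [9, 26, 11, 3, 4, 7]             # list from the first review question
print(mean(views), median(views))              # 26.2 24.0
print(views[len(views) // 2])                  # 35: the unsorted "middle" trap
print(mean(review_list), median(review_list))  # 10.0 8.0
```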
I don't know what it is, why you guys have such a high view count per capita, but you do, so thank you guys so much for watching. Okay, question 16. For this one we might have to do some calculations, so let me first write the numbers on top of the bars (I did a terrible job trying to fit this on the screen here): 1, 4, 3, 2, 3, 4, 2, 1. If we added up all those red numbers, 1 + 4 + 3 + 2 + 3 + 4 + 2 + 1, we would get 20, but it kind of doesn't matter, because they told us there are 20 students in the class anyway. They tell you that the student with the grade of 96 is getting kicked out, so we're removing a big outlier. Let me put an X through this guy. Remember, guys, we talked about how when you remove a big outlier, you don't have to do any calculation: no matter what, the mean is going to go down. So anything that says the mean is going up, we cross out. This one says the mean is going up, and this one says the mean is going up, so A and C are both out. For the median, remember, guys, we can't just use intuition; we have to check, because sometimes it will stay the same and sometimes it will just barely change. So we first calculate it including that guy on the right, with n = 20. If I do that, I get position of the median = (20 + 1)/2, which is 10.5, so we have to find the average of the 10th and 11th values. Now I'm going to cumulatively add the frequencies like we've been doing, and stop when we hit or exceed the position, so greater than or equal to. 1 + 4: too small. Plus 3, that's 8: too small. Now I'm looking at the fourth bar: plus 2, that's 10, so the 10th value is in this bar. If we add the next bar, 10 + 3, we get 13, so the 11th value must be in the next bar. So the 10th value is a 79 and the 11th value is an 80, and the median is those two added up and divided by two: (79 + 80)/2, I get 79.5.
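The cumulative-frequency method used here (and earlier for the frequency table and the histogram) can be sketched in Python as below. Not every bar value of the question 16 histogram appears in this transcript, so the sketch reuses the two fully stated tables instead: the frequency table from review question 3 and the 15-student histogram. The function names are invented for this illustration.

```python
import math


def frequency_mean(table):
    """table: list of (value, frequency) pairs. n is the sum of the frequencies."""
    n = sum(freq for _, freq in table)
    return sum(value * freq for value, freq in table) / n


def value_at(table, k):
    """k-th smallest value (1-based), found by adding frequencies cumulatively."""
    running = 0
    for value, freq in sorted(table):
        running += freq
        if running >= k:  # stop when the running total hits or passes k
            return value
    raise ValueError("position is larger than n")


def frequency_median(table):
    n = sum(freq for _, freq in table)
    position = (n + 1) / 2  # if it ends in .5, average the two neighboring values
    lower = value_at(table, math.floor(position))
    upper = value_at(table, math.ceil(position))
    return (lower + upper) / 2


table_q3 = [(3, 2), (7, 5), (12, 7), (14, 3), (17, 8), (24, 4), (51, 1)]
histogram = [(74, 1), (77, 2), (78, 4), (79, 1), (80, 2), (81, 4), (82, 1)]
for table in (table_q3, histogram):
    print(frequency_mean(table), frequency_median(table))
```

This reproduces the mean of 15 and median of 14 for question 3, and the mean and median of 79 for the histogram.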
So that's the old median. Now we do it with that guy removed. Let me also just erase this. n is now 19, so the position of the median is (19 + 1)/2, which is 10. If we add up the frequencies again, we see that the 10th value is still in this bar, so we get a median of 79. So we got a little unlucky here: the median did just barely go down. If there had been more values in those bars in the middle, it wouldn't have changed at all. The median just happened to go down for this one as well, which is why you have to do the calculation. It went down just a tiny bit, so both the mean and the median will decrease, though the answer totally could have been D for a slightly different histogram. Both the mean and median will decrease. Great, let's keep going. Okay, this is a modifying statistics question: we're going to subtract 6 from each of the values. When you add or subtract, the spread stays the same, and because we're subtracting, the center gets shifted to the left, so the center will decrease. Are there any answer choices that say the range changes? B says that (the range is a measure of spread), and D says that, so those are out; we want one that says the spread, or the range, is equal. Looking at answer choice A, it says the median of data set B is equal, so that is wrong. So I think it's got to be C: the median of the new data set is smaller, and the range stays the same, because when you add or subtract a number to all the values, all you're doing is shifting them left or right. The center moves, but the spread is unaffected. Great, that looks good to me. And even if you weren't sure, just write a simple data set like 10, 11, 12, 13, compute its statistics with Desmos, perform the transformation, and then compute them again. All right, I think there are some more great box
plots. Okay, let's look at this: what is the positive difference between the medians? For the medians, we look at the line inside each box. The median for the first one, the red one, is 15. For the green one, I know it's kind of hard to tell, guys, but I'm pretty sure it's 18, so let's write median equals 18. For the positive difference, just do big minus small: 18 - 15, which is 3. Now we're going to do the ranges: that end, that end, that end, and that end. The red one has a range of 30 - 4, which is 26, and the green one goes from 7 to 24, so 24 - 7 is 17. It's asking for the product of the ranges, so we'll just multiply them together: 17 × 26 gives us 442. And then it asks, for the red box plot, what percentage of the data lies between 4 and 15? From 4 to 15 is this marker to this marker, which is two pieces of the box plot. Each piece is 25%, so two pieces would be 50% of the data. Okay, let's go back to the slides. Thank you so much for watching my video, everybody; I worked really hard on this one. Please comment below with your thoughts on the video and worksheet: did you find them helpful? Be sure to like the video and subscribe for more digital SAT Math content. And as always, if you're interested in my tutoring services, my website will be linked in the description. I tutor SAT Math and all math subjects from about 7th grade up to AP and early college level. Thanks for stopping by, and good luck studying.
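As suggested in the subtract-6 question, you can check the modifying-statistics rules by writing a simple data set such as 10, 11, 12, 13, applying the transformation, and computing the statistics before and after. Here is that check as a minimal Python sketch standing in for Desmos. It covers shifting every value, multiplying every value by a positive number, and the push-apart-from-the-median case using the 6 through 12 example; `describe` and `push_apart` are names invented for this illustration, and the standard deviation is the population version, rounded to three decimals.

```python
import statistics


def describe(values):
    """Center and spread of a data set; stdev is the population SD, rounded."""
    return {
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "stdev": round(statistics.pstdev(values), 3),
    }


def push_apart(values, k):
    """Subtract k from values below the median and add k to values above it."""
    m = statistics.median(values)
    result = []
    for x in values:
        if x < m:
            result.append(x - k)
        elif x > m:
            result.append(x + k)
        else:
            result.append(x)  # the median itself stays put
    return result


data = [10, 11, 12, 13]          # simple data set suggested in the video
shifted = [x - 6 for x in data]  # subtract 6 from every value
scaled = [x * 7 for x in data]   # multiply every value by 7
for name, values in [("original", data), ("minus 6", shifted), ("times 7", scaled)]:
    print(name, describe(values))

odd = [6, 7, 8, 9, 10, 11, 12]
spread_out = push_apart(odd, 4)  # becomes [2, 3, 4, 9, 14, 15, 16]
print(statistics.mean(odd), statistics.mean(spread_out))
print(describe(odd), describe(spread_out))
```

Subtracting 6 lowers the median but leaves the range and standard deviation unchanged; multiplying by 7 increases both the center and the spread; pushing values away from the median keeps the mean and median at 9 while the range grows from 6 to 14.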