Transcript for:
Summary Statistics Overview

Hi everyone, this is Matt Teachout with Intro Stats. Today we're going to finish up the unit we've been going through on how to collect data and analyze basic categorical and quantitative data. We're going to look at the last section, which goes over some of the common summary statistics used with quantitative data.

If we go to my website, matt-teachout.org, and click on Statistics, we're going to use my book today. These are the chapters of the book. It's really more like units, and each chapter has a bunch of different sections. If we go to Chapter 1, we've been working through this material in our videos and in the problems. We're going to look at Section 1G, which is the last section, on the summary statistics that we use with quantitative data. If I open up 1G and scroll down almost to the end of the section, there it is on page 94, and it says Summary Statistics. This is just a list of some common statistics that you find with quantitative data.

The main thing is that they're broken up into groups, starting with centers. We said a center is a type of average. We want averages to be close to the center of our histogram or dot plot, so center goes with averages. Then we also have measures of spread and measures of position. We're going to look at each of those.

We're going to start with four measures of center. You can think of these as four different types of averages. We've gone over some of these already. The mean was the balancing point in terms of distances. It measures the center or average when the data set is normal; it's not accurate when the data is not normal. That's one of the things we learned about the mean. The median is our most accurate average. This is the center of the data in terms of order: you put the numbers in order and go to the center. So this is also a type of average, and we've talked about this one.

But here are a couple of new ones for you. The mode: some of you may have heard of it in the past. Maybe you took a math class and they said mean, median, and mode, and I hate the mode. The mode is the number that occurs most often in a data set. Now, the one thing about the mode is that not all data sets even have a mode. If all the numbers in the data set appear only once, there is no mode. So sometimes if you go to a computer program and push mode, you'll see "none". It'll just write the word "none" because all the numbers appeared only once. Sometimes there's only one number that appears most often. Maybe most of the numbers appeared once, but one number appeared three times, so that would be the mode. You could also have multiple modes: two or three numbers that are tied for appearing most often. Maybe I had three numbers and they each appeared four times. The computer would list the three numbers, and then it would give a count for the mode, which just means how many times each mode appeared. So that's a good one to remember. That's a type of average or center.

The midrange is also a good one. It's an old one, and not many people use the midrange much anymore, but it's actually a really quick measure of average. If you get handed some data and don't have time to put it in a computer and calculate the mean or the median, you can calculate the midrange pretty quickly without a computer. But it's not very accurate. The midrange is a quick and easy average to calculate, but the mean and median are much more accurate. The midrange is calculated by going halfway between the max and the min: you take the biggest number in the data set plus the smallest number in the data set, and then divide by two. So we have four measures of center or average: mean, median, mode, and midrange.

Now we're looking at four measures of spread, the measures of variability. Again, some of these we've gone over in the past, but we're going to have a couple of new ones. Standard deviation, we said, is how far typical values are from the mean in a normal data set. Remember, the standard deviation was our most accurate spread calculation for normal data. This is a very famous measure of spread in statistics, but again, remember that it's only accurate if the data is normal.

Another one that you'll sometimes see is the variance. Variance is sometimes denoted s squared, the square of the standard deviation. If you square the standard deviation, you get the variance. Or think of it this way: if you take the square root of the variance, you get the standard deviation. It's very famous in ANOVA testing. Toward the end of the class we'll get into ANOVA testing a little bit, and you'll hear this word variance pop up. Just remember that because it's the square of the standard deviation, it is a measure of how far numbers are from the mean. So it is a measure of spread about the mean, and it's only accurate when the data is normal or bell-shaped.

Another very common measure of spread is called the range. Just range. The regular range is a very quick, easy-to-calculate measure of spread. It's basically just the max minus the min. It's also very famous; a lot of people out in the world would have an idea in their head of what the range of a data set is. But it's not actually super accurate, in the sense that it doesn't measure typical values. It just uses the max and the min, so it can be a little bit misleading. I much prefer standard deviation or interquartile range as my measure of spread over the regular range. The regular range is easy to calculate but, again, does not really give me typical values. So just remember: range is max minus min. That's how to calculate the range.

We also did the interquartile range, which is down here at the bottom. We found that this was the most accurate measure of spread for non-normal data. So standard deviation is what we use for normal data, and interquartile range is what we use for skewed or non-normal data. Again, it measures how far typical values are from each other in skewed or non-normal data. It's calculated as quartile 3 minus quartile 1. So again, four measures of spread: standard deviation, variance, range, and interquartile range.

And now we have the measures of position. Measures of position aren't necessarily measures of spread; they're more like markers that you compare to. For example, the minimum value is the smallest number in the data set, the number that all the other numbers are greater than. The maximum value is the largest number in the data set, the number that all the other numbers are less than. We also learned two more measures of position. The first quartile, Q1, is the number that approximately 25% of the data values are less than, and the third quartile, Q3, is the number that approximately 75% of the data values are less than. We said that in non-normal, skewed data, the typical values are between Q1 and Q3, so these are also used to find typical values in non-normal or skewed data sets.

One last statistic that is actually useful is the frequency or sample size, n. That is how many numbers are in the data set. Sometimes it's called total frequency, sometimes sample size. I've seen some computer programs that say total number of trials. There are all different words for it, but sample size is usually denoted by a lowercase n, which tells you how many numbers are in your data set. That's always important information. If you remember from our study of bias, one of the sampling biases is that your data set could be too small, so that's why it's always good to know how many numbers are in your data set.

All right, I'm hoping this was useful for you. This was Matt Teachout with Intro Stats. I'll see you next time.
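
As a rough companion to the list of statistics described in the video, here is a minimal Python sketch that computes each one with the standard library. The function name `summarize`, the dictionary keys, and the sample data are made up for illustration; they are not from the video or the book. The sketch assumes sample (not population) standard deviation and variance, and it uses the default quartile rule of Python's `statistics.quantiles`, so Q1 and Q3 from another program may differ slightly.

```python
from collections import Counter
import statistics


def summarize(data):
    """Return the summary statistics listed in the video for a list of numbers.

    Hypothetical helper for illustration; assumes at least two values.
    """
    values = sorted(data)
    smallest, largest = values[0], values[-1]

    # Mode: the value(s) that occur most often.
    # If every value appears only once, there is no mode (empty list).
    counts = Counter(values)
    top = max(counts.values())
    modes = [] if top == 1 else sorted(v for v, c in counts.items() if c == top)

    # Quartiles: software packages use slightly different rules,
    # so these are one reasonable choice, not the only one.
    q1, _, q3 = statistics.quantiles(values, n=4)

    return {
        # Measures of center
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": modes,
        "midrange": (largest + smallest) / 2,
        # Measures of spread
        "std_dev": statistics.stdev(values),      # sample standard deviation, s
        "variance": statistics.variance(values),  # s squared
        "range": largest - smallest,
        "iqr": q3 - q1,
        # Measures of position
        "min": smallest,
        "max": largest,
        "q1": q1,
        "q3": q3,
        # Sample size
        "n": len(values),
    }


if __name__ == "__main__":
    for name, value in summarize([2, 4, 4, 5, 7, 9, 11]).items():
        print(f"{name}: {value}")
```

Returning an empty list for the mode mirrors the "none" that some programs display when every number appears only once, and a list with several entries covers the case of multiple modes.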