Transcript for:
Data Analysis with StatKey and StatCato

Hi everyone, this is Matt Teachout with Intro Stats, and today we're going to look at how to use software to analyze normal quantitative data. We've seen the theory behind that: we want to use the mean and the standard deviation when the data looks normal. But again, we don't want to calculate this stuff by hand. We want a computer program to do the heavy lifting for us, so let's take a look at an example.

This again is my website, matteachout.org, and we need some data. I'm going to go to the Statistics tab and click on Data Sets. The data we're going to look at today is the health data. It's right here, Health Data, and I'm just going to open the Excel file, and you get this. Now, this file actually has a lot of data. It has random sample data from 40 women and 40 men. The data is separated, so you can see the men's ages, heights, and weights, and the women's ages, heights, and weights, and things like that. It also has the combined data, with all 80 of the men and women. So it is quite a large data set. When you're looking for something, just keep scrolling to the right until you find what you're looking for.

The data I'm going to look at today is women's wrist circumference in inches. They measured how far around the wrist is for these 40 randomly selected women, in inches. Again, whenever you're working with data, it's a good idea not to mess up the raw data, so I'm just going to copy this. We learned last time that you can hold your cursor above the column so it turns into a downward arrow, then left-click, and it'll highlight the entire column for you. Then push Ctrl+C, or Command+C if you're on a Mac, and we're going to go ahead and paste that into a new data set here. So we have a new data set
here. There we go, it's right there. There's the new data set.

Now, if I want to have the computer calculate, I'm going to use StatCato and StatKey to find the mean and standard deviation. We'll start with StatKey. Again, you go to lock5stat.com; that's where StatKey is hosted. You don't need to save it; you can just open it and click on the StatKey button. This was one quantitative data set, so we're going to click on One Quantitative Variable. It's under Descriptive Statistics and Graphs, on the top left.

Whenever you're pasting data into StatKey, you usually click the Edit Data button. Right now this looks kind of weird. You'll see the numbers, the quantitative data, but there is also a word next to every number. That's sometimes called an identifier. These were how long certain animals live, in years, and the identifier shows which animal each number came from. Most of my data sets don't have an identifier, so what I'm going to do is just delete this data. You can click and drag down to select it if you want, but I like to push Ctrl+A. Ctrl+A will highlight everything, even if you have a really huge data set, and then you just push Delete. So: Ctrl+A, Delete.

Now I want to paste in the wrist data from the 40 randomly selected women, so I'm going to do that now. If you look, this data does have a title. A title means "header row" in StatKey, so you leave the box that says Header Row checked. But it does not have an identifier, a word next to every number, so you want to uncheck that one, and then just push OK. And there we go. Notice right away it calculated that there were 40 women; that's the sample size. We see the mean was 5.067, and we see the standard deviation is 0.331, but
again, we also have to know what the shape of this data set looks like to know if we can use the mean and standard deviation. What we said was that it needs to be normal, or bell-shaped. We can kind of see from the dot plot that it does look sort of normal: we have more dots in the middle, and as we get away from the middle we have fewer and fewer dots. But it's better to look at the histogram, so let's take a look at the histogram.

Now, the number of bars is a lot for a data set this small. We only have 40 women in this data set, so having 10 bars, or 10 bins, or, as StatKey calls them, buckets, is a lot for this small of a data set. So I like to decrease the number of buckets. You can play around with this and move it back and forth a little bit, but you can see right away that it looks pretty normal. Also the mean, which is the average, 5.067, is pretty close to the highest bar; it's sort of right on the highest bar. This is what we call normal, or bell-shaped: the highest bar is in the middle, and the left tail is about symmetric with the right tail. That's how you know normal when you see it.

You can decrease the number of buckets, the number of bins, with this slider. Here's five bars, or even three bars. Three bars works pretty well, especially if the data set is very small. But this data set looked normal even at five bars or at seven bars. We definitely have the highest bar in the middle, and it looks pretty symmetric, so this is normal. What we learned in our theory is that that's the only time the mean actually is an accurate average, so we are allowed to use the mean.

One thing to take a look at is that the mean and the median are actually very close. When we get to our discussions about skewed data,
the mean sort of gets pulled in the direction of the skew. But if the mean and the median are very close, that's usually a good sign that the mean is a fairly accurate average. Also, standard deviation is going to be our spread for normal data. So we know the mean is 5.067 and the standard deviation is 0.331.

Now, I could have found this on StatCato as well. Whenever you're copying and pasting in StatCato, you always want to make sure the title, again, is in the gray. So I'm going to paste in that data, and there it is: the women's wrist data. By the way, if you put the cursor in between where it says C1 and C2, you'll see it turns into a sideways double arrow. If you click, you can drag the column open and see the title better.

Let's take a look at our graph. If we want to graph this and see what the shape is, I would go to Graph and then Histogram. All right, there's our histogram, and then you just click on the column that you want to make a graph of. This button down here, Show Legend, is a good button. It puts a title on the graph for you, which is kind of nice; I always like my graphs to have titles. Then there's the number of bins. I know StatKey called them buckets, but StatCato calls them bins. Ten bins is quite a few, so I'm going to decrease this. Three bars would probably work pretty well, or five or seven; three usually works pretty well if you have a small data set. And we can see it does look normal: the highest bar is definitely in the middle, and it looks pretty symmetric.

We can also make a dot plot if we want: Graph, then Dot Plot, and just click on the column you want. Again, Show Legend will put a title on it, and there's our dot plot. You can see even the dot plot looks kind of normal.

Now, how do we find statistics on StatCato? It's under the Statistics menu, so you go to Statistics, Basic Statistics,
Descriptive Statistics. So again: Statistics, Basic Statistics, Descriptive Statistics. If you just want to calculate some basic statistics for a quantitative data set, that's where you'd look. It will ask you for an input variable; it wants the column. Sometimes you can click on the column, but this one doesn't let you, so we have to type it in. C1 stands for column one. It's asking what variable, what column, your data is in.

Notice you have all kinds of different statistics that it will calculate. Here are a few that I think are important. What I always do is find the shape first, and then identify which of these statistics I want based on the shape. Since we already know this data was normal, we're going to want the mean and the standard deviation, and there they are, right there. It's also nice to have the min and the max, because those might be outliers. I also like to know how many numbers there were, so I like this button, N total. If you remember, the letter n stands for sample size, or how many numbers are in your data set. When I have a normal data set, these are the statistics I like to find. Then I just push OK, and there we go, it calculated them for us.

Notice it got the same numbers that StatKey gave us: the mean was 5.067 and the standard deviation is 0.331. The min was 4.2, the max was 5.8, and for the N total, we had 40 women in the data set.

What I want to show you, though, is what these numbers mean in terms of the theory we learned last time. Last time we learned that if you subtract the standard deviation from the mean and add the standard deviation to the mean, you get the typical values. So 5.067, the mean, minus 0.331, the standard deviation, gives us 4.736, and the mean plus the standard deviation gives us 5.398. If you
remember last time, I mentioned that that's about the middle 68% of the data. Now I just want to show you where that falls, since it's good to have a visual of the data. The one thing about this is, again, this data is not in order, so what I'd like to do is put it in order. I'm going to highlight this data set, go to Home, and right here you'll see Sort, and I'm going to sort smallest to largest. There we go. That just makes it easier for me to see this visually. Anything between 4.736 and 5.398 is going to be considered typical. So 4.7 would be too low; it has to be between 4.736 and 5.398. All right, so this would be considered typical. So there's our 4.8.
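
For anyone who wants to double-check the typical-values arithmetic outside of StatKey or StatCato, here is a minimal Python sketch. It only uses the summary numbers read out in the video (mean 5.067, standard deviation 0.331) and does not reproduce the actual wrist data. The function names `typical_range` and `is_typical` are made up for this illustration, and treating the endpoints as included in the typical range is an assumption.

```python
# Summary statistics as read out in the video (women's wrist circumference, inches).
MEAN = 5.067
STD_DEV = 0.331


def typical_range(mean, std_dev):
    """Return (low, high): one standard deviation below and above the mean."""
    return mean - std_dev, mean + std_dev


def is_typical(value, mean, std_dev):
    """True if value lies between mean - std_dev and mean + std_dev.

    Assumption: the endpoints themselves count as typical.
    """
    low, high = typical_range(mean, std_dev)
    return low <= value <= high


low, high = typical_range(MEAN, STD_DEV)
print(f"Typical values: {low:.3f} to {high:.3f} inches")

# Values mentioned in the video: the min (4.2), 4.7, 4.8, and the max (5.8).
for value in (4.2, 4.7, 4.8, 5.8):
    label = "typical" if is_typical(value, MEAN, STD_DEV) else "not typical"
    print(f"{value} inches is {label}")
```

The first line printed is `Typical values: 4.736 to 5.398 inches`, matching the hand calculation. Of the four values checked, only 4.8 falls inside the range, which agrees with the point that 4.7 is just below the cutoff.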