Transcript for:
Excel Basics for Statistical Analysis

for I'm trying to have it reasonably line up of where every time we do something I provide for you how to do it in Excel as well as we move along so that way then you kind of end up with a collection of how to do it so that's going to be my goal at least so the next thing that we have in addition to mean is that we have the median so if we're talking about the median of a collection of numbers what we really think of is that if you were to imagine a group of people and this is indeed the exact same example that I give if you were imagine a group of people and you line them up by height the person who's the middle height that's the median now I mean middle in terms of the available Heights so that means that some people could be the same height some people obviously will presumably be different in height but whoever happens to be at the middle spot that person is the median so then as a business example so imagine a ret retail store has seven products with the following prices so you have 5 8 12 15 20 25 and 30 what is the median price well all you have to do is you have to just line them up and that's what's done right here is that you just arrange them in order and then you identify what's the middle number well that's the middle number so median is very easy to do as well now you'll notice that between the mean which is kind of like the fair distribution and the median which is just the middle of an arrangement those are different concepts now similarly if I wanted to calculate an Excel a median all I have to do is type median and then input the what happens to be the range of values so what I'm going to do then is I'm going to go back to Excel and I'm just going to use those exact same numbers from before A1 colon A5 and that gives me the middle number which is 2,000 out of that range so it was arranged in order so it's just the middle number right there next next we have what's referred to as the mode and the mode is the number that appears the most frequent number of times in any collection of values so and it's helpful if I click back on that so you can actually see what I'm doing is this large enough to to see as text I mean I'm making the presumption you have it also in front of you but is that visible okay so because I I want to know for when I make future ones whether I need to go larger size text so just let out a holler if you think it's too small I guess I could do that also that might be helpful okay so for a business example uh so imagine that we have the numbers which happen to be the number of units sold for different products of five s 7 8 10 7 and 12 now you'll notice that those numbers are not arranged in order and it doesn't matter whether or not they're arranged in order for when I'm finding the mode all I have to do is find the number that appears the most number of times which happens to be the number seven cool so far all of this sounding like stuff we've heard at one point okay now similarly it's very very easy for us to find things like modes in Excel so here I give you the formula for it now once something starts to pop up you're very welcome to click on it in Excel so in other words it was making a suggestion I can go with it suggestion that's the one I want to do now before I click enter here what's your guess of what this answer is going to be what's the most frequent number that occurs and the list of numbers that I'm using I'm going to also tell you it's a trick question there isn't one okay so let's see what it says not applicable I would agree with that now what do you suppose happens if I change this the number 1,00 okay well now it's a th okay but you also notice it updated the mean it also updated the median kind of it updated the median turns out 20 still the median next on our list of things that we need to know about let me bounce back correctly to here is we need to know about the variance now the variance is different than the mean median and mode and that it's measuring a different thing what the variance is really measuring and it's as it says here is that it's measuring the amount of spread of the data but in particular we can be more precise in that it's measuring how far on average values are different from the mean so let's go back to that example of where we have five friends and those five friends I mean obviously we have those particular data value you know particular example with money that we look at just a second but let's suppose that we had those five friends and they find the $100 bill and rather than it it being split in terms of five amounts of 20 for each person let's suppose we don't do that what if the first person gets no money the next person gets 40 the next person gets 50 we only have $10 left right then the next person gets 10 and the next person gets zero again so it's zero 40 50 and 10 not a very fair way to get out the money right when seemingly all five friends found it at the same time well when we're then talking about this question that means that we know what the fair amount is the fair amount is 20 how far off was the first person from their Fair dividend their fair amount of money they're 20 off it's kind of like negative 20 in a sense of where it's like hey you know they did not get the $20 that they deserve then we come to the next person the next person they got 50 or no I think I said 40 so the next person they got 40 how far off are they from the fair amount they're 40 off in the positive and then how about the next person they're 50 off in the positive how about the next person they're 10 off in the postive and then the next person they're kind of like - 20 off because they should have gotten 20 and they didn't get 20 so person who got 50 was 30 off the person who got 40 was 20 off the person who got zero was 20 off so it's how far off are you from the fair amount that tells us something but now we then say okay well now since we knew how far off each person was from the fair amount what's the average amount of unfairness that quantity is what we call the variance the average amount of unfairness from what would be equal and that measures how bad we were or how good we were at Distributing things meaning giving out money so let's look at this actual particular example here so calculate the variance for the monthly sales figures 1,00 1500 2,000 2500 and 3,000 so the first first thing that we have to do is we have to find the mean well we've already done that for this these list of numbers we know that the mean is 2,000 we did that before actually so we know what the fair distribution is the fair distribution is 2,000 then what we want to do is we then want to figure out okay well how much is each person's off from the fair amount so how far off is the 1,000 from the fair amount of 2,000 well it's negative 1 ,000 off that's what this calculation is right here how far off is the 1500 from the fair amount of 2000 well this would be -500 right here how far off is this amount from the fair amount well 2,000 is zero amounts off so this will be zero how far off is this from it well this is positive 500 off how far off is this this is going to be uh positive 1000 off now the next thing that we do is is some of these numbers were positive and some of these numbers were negative right now here's the thing this number is the number negative 1000 agreed this number is the number positive 1000 if we just took the the numbers right now and tried to look at how well did we pass out the money these two numbers would cancel and these two numbers would also cancel do you see why so don't pay attention to the additional writing of the squares that it have but this one is A500 this is a positive 500 this one is a positive 1,000 this is a negative 1,000 so everything would cancel out so it would look like we were saying we turned out the money fairly but we know that's not the case right the money wasn't passed out fairly so what we do is is that we Square all of those answers we found to get rid of negatives when we Square them all the negative numbers disappear and when all the negative numbers disappear that means then we're getting just the total amount off we are from how we passed out the money then we divide by how many numbers we used yes so are we also squaring the posi or we're also squaring the positives we do it to all of the numbers we take all the numbers we have that are the differences and we Square them and then we divide by the total number of numbers that we have so right here then because this is an amount of 1,000 1,00 SAR is 1 million this number is 5- 500 squared which is positive 250,000 this number is zero when we Square it it's still zero this number is 500 and when we Square it we get 250,000 this number is 1,000 when we Square it we get 1 million and so that's where we're getting then these numbers and then we divide by five that gets us 500,000 500,000 is the number that stands for the variance which is how much spread we have and how we passed out the money the average amount of offness and how we passed out the money that's how we think of it it's a measurement of that now when we click here this is a very easy thing to calculate using Excel now one of the things I highlighted or one of the things I commented it on on our first day of class was my expectation for how you are going to do problems in this course is I am expecting you to do it using things like Excel and stuff like that I'm not expecting you to do it by hand to understand what it means you have to know how you would do it by hand but I'm not looking for you to do it by hand do you get the difference between that okay so let's go to Excel and let's see exactly how we would do this don't need that so I'm just going to say variance and I type just simply equals V.P and I'm going to say A1 po A5 and that's the variance of those numbers now you'll notice it's not the answer we got before the reason it's not the answer we got before is because this is not the list of numbers from before if I wanted to exactly make it match with the example I was just working through I have to change that number back and now we see that the variance is indeed what we did on our example cool how we feeling so far any questions yes the significance of the number being so high is that that means that for this particular problem we we very poorly oh so there's two ways to look at it the the numbers that are there are very far from the mean now what do I mean by that well this particular number right here so let me scroll like that this number 1,00 1,000 is pretty far from the mean number the mean number is 2,000 1,000 is 1,000 units off that's very far off this number is 500 off from the mean this number is 500 off from the mean this number is 1,000 off from the mean the fact that this variance is so high means that the numbers tend to be very far from the mean number let's do a different example and find out what happens okay so let's let me uh let me do a little thing here okay okay so I have currently hidden what the variance is right okay so let's see if we can figure out what should happen when I do the following okay now I covered up the variance right let's see if we can take a guess do you think the Vari went up or down down why do you think it went down they're much closer to the mean okay so let's let's now do the dramatic reveal oo okay now that variance is still a big number right but let's remember what was it before 500,000 is this much much much less yes this is much much much less okay so now let's then see okay so before I click enter what's our guess what do you think's going to happen to the variance much much much smaller right now what happens if I change one of the values let's change this just this value to I think that's 1 million what should happen to the variance should go up do you think it'll go up by a lot or a little that's a huge number okay so anytime in Excel you see an e it means times times 10 raised to the power that's after the E so that's a 10 uh 10 raised to the 11 which means it's a it means that there are 11 zeros uh that you have to move the decimal place 11 times to the right so the decimal isn't really here it's move it over 1 two 3 four five 6 7 8 9 10 11 so imagine adding as many zeros as you need to write that that's a pretty big number right it's a pretty big number okay so that means that the variance what it does is is that it explains how close are all of the numbers to each other that help a little bit okay now in addition to the variance we also have this thing called the standard deviation now before I explain the standard deviation I want to I want to use an analogy to explain it okay so I'm going to I'm going to come over to the following window this window is very close to being a square correct it's not exactly a square okay but would you humor me to pretend that it's an exact Square okay so let's pretend it's an exact square if I wanted to measure this window there's two ways that I could do it in particular if you wanted to tell me how big this window is and it's not a very large window there's two main categories of how you could explain it to me the one category is that you could tell me what the width of this window is again it's basically a square it's not exactly a square but it's basically a square so if you gave me the width you'd be basically also giving me the height right and that's that's good but a different thing that you could do is is you could tell me what the area is of this window okay well now let's let's imagine the following if for this window which is basically like a square what happens if I shrink the area if I shrink the area what has to be happening to the width it shrinks as well what happens if I make the width bigger what happens to the area it increases okay so we just made an observation about this window that's a a true principle in general which is that areas and widths are connected to things in particular if I tell you something's happening to a width it's like I'm telling you what's happening to an area correct okay okay so now we're going to now then go over what the definition of the standard deviation is the standard deviation as it says is the square root of the variance okay now I'm going to go back over here to this window to now explain through analogy what that means the variance is the area of this window and the standard deviation is the width now what do I mean by that well if I take the standard deviation which is like a width and I multiply it by itself which is like the height I get an area okay so now I have to ask the question okay for talking about the size of this window could you see what I mean if I were to say the following sentence talking about the width and talking about the area actually talk about the same size see what I mean by that so if I tell you that the width is small well I've told you that the area is small if I tell you that the area is small I have really told you that the width is small so it's really describing the same thing now let's come back over to this thing called the standard deviation the standard deviation we find by taking the square root the variance why do we do that the reason we do that is do you remember how in the formula for the variance we squared everything to get rid of negatives taking a square root at the very end of that calculation is an attempt to undo that process because what will happen is is that right here you'll notice there are no units on these amounts there should be unit on these amounts and what should be their units well all of these were dollars so really there should be a dollar unit here but what does it mean to have a dollar squared I don't know I've never paid anybody in a dollar squar I mean if you happen to have a success at p somebody in a dollar squar please tell me what it meant okay so this kind of makes a problem we got rid of the problem of there being negatives in the problem but we added a new problem so when we come down to the standard deviation all we do is is for that problem up above using the exact same numbers of where we found that 500,000 to find the standard deviation we take a square root and that turns it back into the units that we started with when we started the problem which means up there that 500,000 if we were to try to come up with a unit for it it would be $500,000 squared which means I don't know what but this this has an actual nice unit to it this actually is for this number this would have the units of dollars in it again and is it talking about something different the answer is no because again it's just like talking about the size of this window as either width or area so when we now come over here we're going to go to our practical example that we've been working out in Excel and if we want to know how do we find the standard deviation in Excel well it turns out all we have to do is write stdevp so let's go to Excel and standard Dev for standard deviation and then I type A1 colon A5 there we go now for this example it turns out that the variance is a large number and the standard deviation is a large number well that's like what we said when the width of the window is large the area of the window is large now are they the same size of large and the answer is no because it's just like like for that window the area will be bigger than the width but it's because what we're trying to aim for is we're aiming for zero for these numbers to signify everything is close and they're not zero which means the numbers aren't close so now let's go back and then let me hide this and let's see if we can guess what's going to happen to the standard deviation so what should happen to the standard deviation it should decrease should it be a big number or a small number it's a small number or at least a relatively small number would I be happy if someone gave me $12 I'd be happy if somebody gave me $12 I'd be happy if somebody gave me a Dollar by the way I'm not saying incentivize your grade by paying me money I don't mean that I'm saying it's not zero but it is still what we could consider a small number a comparatively small number what should happen to the two numbers of variance and standard deviation they should both go down now did the standard deviation go down by a lot no but did it go down it did go down okay so turning to somebody in the room explain to them what we've done about standard deviation and variance use your own words explain what you think they mean go for it par spe okay so would somebody like to using their own words tell me how they would explain what variance and standard deviation are what is it that the variance and the standard deviation are measuring yeah variance is the total distance that every point is from the mean and the standard deviation is the average of all those distances from yep very good okay so let's go to skewness so skewness is where we start hitting terms of where it it might be a little bit less likely that you've seen it before so what is skewness so the formal definition as I give you there is the measure of asymmetry so for that to have any sense to you you need to know what symmetry is symmetry means that things look the same on both sides sides okay now really what that means is is that if I go like this with my hands so nobody's hands are perfectly exact okay but if I go like this with my hands I can fold them over and they match up correct like my thumbs are pointing in the same direction if I go like this are my hands symmetrical my hands are repeated like they're the same pattern but they're not symmetrical because look what happens okay my thumb are in two different directions so I can't fold it over now I can fold it over fold it over they match up so if I now go like this are they symmetrical they are not and we go wait the thumbs are pointing in the same direction absolutely but there's stuff that doesn't line up so that's what symmetry means symmetry means that it's the same split down the center now when we say split down the center what we really mean is split along the mean which is the average so right here my hands are split along the mean which is the line in between the two so skewness is a way of describing that something leans one way or the other so when we have skewness and we want to reference it as being leaning to the left of the mean that's what we would reference as negative skewness uh or well that's prob me what we would reference as left skewed and then we would have right skewed as where things are more to the right of the mean now skewness is something that's very easy to compute in Excel all you have to do is literally just type skew okay so let's go to right here we say okay so for these numbers here and maybe this isn't obvious but which way would you say it's skewing so here's the mean the mean is 1994 and where are these numbers okay so this number is bigger than the mean right this number is bigger than the mean this number is bigger than the mean this number is bigger than the mean let's find out what the skewness is ooh negative value interesting interesting okay so now let's change some things let's change back to we're going to cover this [Music] up okay so what's your guess so here's the mean as a for that problem what's your guess of what the skewness will be o correct guess want to say it zero why do you why do you think it's zero yeah everything is spaced evenly from the mean so let's check oh very nice very nice indeed okay so now let's so now we just added a lot of weight and now we have a positive skewness let's add a lot of weight in a different way so what do you suppose negative skewness means negative yes yeah it means like there's a lot of weight of the numbers to the left of the median what do you suppose right SK uh right or positive skewness means excuse above the mean okay curtosis this is a fun one so this is probably going to be where we end off for today so curtosis measures what we would refer to as the tailedness of the distribution now I'm going to probably add a topic in here to help with the concept of curtosis making more sense but I'm going to explain it in the following way so imagine that we lined a bunch of people up in the room and we lined them up just kind of in a row of where we would have then the person who was uh the mean height in the center and then we would have the tall folks over here and the short folks over here okay so it goes shorter the taller and then right at the center then we have then uh the person who's the mean which is the average height okay well perosis is a measurement of when we look at these extreme people over here who are the really tall folks are there a few tall folks really tall folks or there are a lot of tall folks and then similarly down here are there are there a few shorter people are there a lot of shorter people curtosis is a way of measuring how many people are at the ends of things and if your curtosis is Big you have a lot of people who are at the ends and if your curtosis is small you have a very small number of people at the ends which is a way of kind of talking about how things are kind of tucked in towards the center now that's different from variant because it's possible that you have people who are spread far and wide but if there's not a lot of them who are out there at the end then you have a small curtosis and the curtosis is something that's very easy to find using Excel as indicated right here so next time what are we going to do well next time we're going to kick off from here and we would like to finish with uh this particular module topic by continuing on distribution so before though I continue one thing I wanted to ask was is so I'm trying this out as a presentation method was this okay to follow along with did we like this so do you so I know you haven't used it yet but do you suspect that this is going to be helpful to have little Excel formulas on all the okay so I'm going to attempt to keep this up if there are things you'd like to see in this like you you'd like more whatever let just let me know yes yes I'd like to try to add that I was trying to figure out so uh I'm still figuring out how to do that in Google sites I have some pictures I want to add yeah and I'm what I might do is is that I might link to other things rather than just try to write because I wrote this in HTML so but thank you I I want to do that and if you if I don't put them in please remind me and I'll add them in yeah so have a good one