Transcript for:
Choosing Measures of Center and Spread

Section 3.4 is now going to bring it all together. So many times we want to ask a question: What will typically happen? So many times we want to ask a question: How do these two data sets vary? To be able to answer those questions, we need center and spread. And yet the thing is, you've got to figure out which pair you're going to use: median and IQR, or mean and standard deviation. And what we have discovered after studying 3.1 and 3.3 is that the first thing you need to do is look at the shape of your graph. Why? Because the shape of your graph is going to directly dictate what center and what spread we are going to use. In particular, when your graph is skewed, whether skewed left or skewed right, your center will be the median and your spread will be the IQR. Versus if you identify your graph as having a symmetric shape, your center will be the mean and your spread will be the standard deviation. Those will always be the steps you need to take: ask yourself, what is the shape of my graph? Because the shape will directly lead to what center and spread you will use.

Now let's talk one more time about why. Why with symmetric data are we using mean and standard deviation? Well, I want you to look at the symmetric graph, and I want you to see how the mean is in the middle. I want you to see that the mean and the median are practically identical. See, the mean with symmetric graphs will be in the center, so that means the median and the mean are practically interchangeable. So if the mean and median are interchangeable, you can use either. Why use the mean? It's because the mean uses all the data values to calculate it. We saw that earlier, right? After the break, you added all five of the salaries to calculate the mean. That's powerful because it means that in that calculation, every data value is represented. Same thing with standard deviation. In standard deviation, all the data values are used to calculate it.
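The decision rule described here (look at the shape first, then pick the center and spread) can be sketched in Python. This is only an illustration, not the course's method: the lecture judges shape by looking at the graph, while the sketch substitutes a numeric stand-in, Pearson's second skewness coefficient, 3 × (mean − median) / standard deviation. The function name `choose_center_and_spread`, the 0.5 cutoff, and both data sets are made up for the example.

```python
import statistics


def choose_center_and_spread(data, skew_cutoff=0.5):
    """Pick (center, spread) from the shape of the data.

    Assumption: shape is judged numerically with Pearson's second
    skewness coefficient instead of by eye, and the 0.5 cutoff is an
    arbitrary illustrative choice.
    """
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.stdev(data)  # sample standard deviation
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles
    iqr = q3 - q1

    # Positive when the mean sits above the median (right tail),
    # negative when it sits below (left tail).
    skew = 3 * (mean - median) / sd if sd > 0 else 0.0

    if abs(skew) < skew_cutoff:
        # Symmetric: mean and median nearly agree, so use the pair
        # that uses every data value.
        return {"shape": "symmetric",
                "center": ("mean", mean),
                "spread": ("standard deviation", sd)}

    shape = "skewed right" if skew > 0 else "skewed left"
    # Skewed: use the pair built from the middle of the data.
    return {"shape": shape,
            "center": ("median", median),
            "spread": ("IQR", iqr)}


# Hypothetical data sets, invented for this sketch.
symmetric_data = [2, 3, 4, 4, 5, 5, 5, 6, 6, 7, 8]
skewed_data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]

symmetric_result = choose_center_and_spread(symmetric_data)
skewed_result = choose_center_and_spread(skewed_data)
print(symmetric_result)
print(skewed_result)
```

In practice you would still draw the histogram or dot plot and judge the shape yourself; the numeric check is just a way to make the same decision visible in code.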
So when it comes to symmetric data, because the mean and median are interchangeable, let's use the form that's ultimately going to use all the data values. Whereas skewed data is going to have these tails, skewed data is going to have these outliers, skewed data is going to have these values that are ultimately going to draw the mean away from the median. Notice how, when we are looking at a skewed right graph, the mean is pulled towards the tail on the right. Notice how, when we have a skewed left graph, the mean is pulled towards the tail on the left. What's the issue with that? Well, quite literally, if you look at where the mean is, it's nowhere near where the majority of the data is. And so because of that, when you have skewed data, which will most likely have outliers, the mean is not going to work, because the mean is affected by those outliers. So how do we deal with that? We use the middle. And again, what's the middle for center? The median. What's the middle for spread? The IQR. That is the reason why with skewed data you use median and IQR.

Now at the end of the day, I want to remind you guys that both the mean and the median will represent center. They'll both represent the typical value. And so because of that, the mean and median are going to have the same interpretation, alright? They're both going to have the same interpretation of "the typical value is blah, blah, blah," alright? However, I want you to see that standard deviation, which is talking about the majority, roughly 68% of the data within one standard deviation of the mean when the graph is bell-shaped, and IQR, which is the middle 50%, have vastly different interpretations. These are very different interpretations.
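To see the mean getting pulled towards the tail while the median stays put, here is a small Python sketch. The five salaries (in thousands of dollars) and the one extreme salary are invented for illustration; they are not the values used earlier in the course.

```python
import statistics

# Hypothetical salaries in thousands of dollars.
salaries = [40, 45, 50, 55, 60]
# The same salaries plus one extreme value in the right tail.
salaries_with_outlier = salaries + [500]

mean_before = statistics.mean(salaries)
median_before = statistics.median(salaries)

mean_after = statistics.mean(salaries_with_outlier)
median_after = statistics.median(salaries_with_outlier)

print("without outlier: mean =", mean_before, " median =", median_before)
print("with outlier:    mean =", mean_after, " median =", median_after)

# How far each measure of center moved once the outlier was added.
mean_shift = mean_after - mean_before
median_shift = median_after - median_before
print("mean moved by", mean_shift, "and median moved by", median_shift)
```

With the outlier included, the mean lands above every one of the original five salaries, so it no longer describes where most of the data is, while the median barely moves.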