Transcript for:
Histogram Shape Analysis

Let's recap: What are we doing here? We're looking at numerical variables, whether it is the number of visits or birth weight. These are both numerical variables; therefore, we can graph them as a histogram and ultimately ask three basic questions about their shape. The first is about symmetry: Is it symmetric? Skewed left? Skewed right? The second is about modes: Is it unimodal? Bimodal? Multimodal? And the third is about outliers: Are there any outliers? You can only pick one for each. I want you guys to look at the graph on the left and answer those three basic shape questions. Do the same thing on the right.
First graph, number of visits: What is the shape of the graph? Symmetric, skewed left, or skewed right? You're right! It's skewed right because the shortest rectangles trail off to the right, so the tail is on that side. What is the mode here? Is it unimodal, bimodal, or multimodal? It's unimodal because if you trace the outline of this histogram, you can see there's only one distinct peak. Outliers: Does it look like the left histogram has an outlier? Yes, the outliers in this graph are obvious; we can see them out in this right tail.
Let's keep going, guys! Let's go to the graph on the right. What is the shape of the graph on the right? Skewed left. While the tail is not as pronounced as in my other example, we do see a little bit of a tail here on the left, so I would say that this is probably a left skew.
All right, cool! Let's keep going. Let's talk about mode. Now, in this case, what is the mode of this right-hand histogram? I love doing this example because this is where I can always clear up a little confusion. We're pretty 50/50 on unimodal and bimodal. The answer here is unimodal. Let's discuss why. Remember, unimodal versus bimodal comes down to one question: if I look at the outline of this graph (yes, that orange outline), how many peaks do I have? One.
So, a common misconception with bimodal is that you'll see this peak being made by two bars, and you'll think two bars must mean bimodal. That's not the case! It's not about the bars' heights; it's about the overall peaking of this graph, and we see it's only going up and down once. That one peak means unimodal. The bars don't necessarily need to be the same height to create the peak; it's about the outline.
All right, now, a question that's being asked is, when it comes to bimodal, do the two peaks have to be the same height? No, but they need to be higher than the rest of the bars. Do you see that with this first graph there? The reason why this graph here in orange is ultimately unimodal is that these two bars are next to each other; there's no gap between them.
Outlier: This is one where the graph is not going to help us answer the question, because even this tail is still pretty close to the rest of the data. Again, that's where section 3.5 comes in: I'm going to teach you guys a mathematical formula that draws a line on this number line and says every number past it will be an outlier. So, yeah, this one was a little inconclusive.
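The peak-counting idea from the mode discussion (trace the outline and count how many times it goes up and then comes back down) can be sketched in a few lines of Python. This is only an illustration, not a method from the lecture: the function names `count_peaks` and `describe_modes`, the `min_drop` threshold of 20% of the tallest bar, and the bar heights in the examples are all assumptions made up for this sketch.

```python
def count_peaks(heights, min_drop=0.2):
    """Count distinct peaks in a histogram outline.

    heights  : bar heights, left to right (hypothetical data).
    min_drop : assumed threshold; the outline must fall (or rise) by at
               least this fraction of the tallest bar before we treat it
               as really coming down from (or climbing to) a peak.
    """
    if not heights:
        return 0
    delta = min_drop * max(heights)
    # Pad with zero-height bars so the outline starts and ends at the axis.
    padded = [0] + list(heights) + [0]

    peaks = 0
    running_max = running_min = padded[0]
    climbing = True  # True while we are still heading toward a peak
    for h in padded:
        running_max = max(running_max, h)
        running_min = min(running_min, h)
        if climbing:
            # The outline has dropped far enough below its high point:
            # that high point counts as one peak.
            if h < running_max - delta:
                peaks += 1
                running_min = h
                climbing = False
        else:
            # The outline has risen far enough above its low point:
            # we are climbing toward a new, separate peak.
            if h > running_min + delta:
                running_max = h
                climbing = True
    return peaks


def describe_modes(heights, min_drop=0.2):
    """Turn the peak count into the vocabulary used in the lecture."""
    n = count_peaks(heights, min_drop)
    if n == 0:
        return "no peak"
    if n == 1:
        return "unimodal"
    if n == 2:
        return "bimodal"
    return "multimodal"


# Two tall bars side by side, no gap between them: still one peak.
adjacent_tall_bars = [2, 5, 14, 15, 9, 4]
# Two humps of different heights separated by a dip: two peaks.
two_humps = [3, 12, 5, 2, 6, 16, 4]

print(describe_modes(adjacent_tall_bars))
print(describe_modes(two_humps))
```

The first example mirrors the misconception discussed above: two neighboring tall bars form a single up-and-down in the outline, so they count as one peak. The second shows that two peaks do not need to be the same height to count as bimodal, as long as the outline clearly dips between them. Changing the assumed `min_drop` value changes how small a dip counts as a real gap.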