Transcript for:
Effective Graphs in Statistics

When it comes to statistics, the first step, in general, is always making a graph, so that we can make an initial observation about what our data is like: How will it look? What will typically happen? How is it going to vary? It's always the first step. And just like in anything you do, you want to make sure that first step is right. So what I want to discuss in section 2.5 is the ways graphs can be done wrong. I want you to understand what they are, why they are wrong, and how to correct them, so that every time you do your first step of making a graph, you're making an accurate graph.

All right, so what are the ways graphs can be misleading? The first way a graph can be misleading is if the vertical axis does not start at zero. And I need to emphasize, it's the vertical axis. Let me explain why with this little story. Let's say that I am the governor of Gotham City and I want to run again for governor. So I start throwing this graph all over town to emphasize to the Gotham residents: 'Look at how much I brought down crime! I am such a great governor because look at how little crime we have now! You want to vote for me!'

Why is this misleading? It's misleading because it makes the drop in crime over the four years this governor was in office look very large. But here's the thing: if you were a statistician living in Gotham City, you would say, 'No, no, no, no, that's wrong. It's wrong to start the vertical axis at four. Rather, Governor, let me correct the graph for you and have the vertical axis start at zero.' And notice that when you do that, you make it clear that while crime has gone down, the difference is not as great as the misleading graph made it look. So what's the bottom line? It's wrong to start your vertical axis at any number other than zero.
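To put rough numbers on the governor's graph, here is a minimal Python sketch. The crime figures are hypothetical (the lecture's graph values are not given in this transcript), and the helper name `apparent_ratio` is made up for illustration. It compares how many times taller the "before" bar is drawn than the "after" bar when the vertical axis starts at four versus at zero.

```python
def apparent_ratio(before, after, axis_start=0.0):
    """How many times taller the 'before' bar is drawn than the 'after' bar
    when the vertical axis starts at axis_start."""
    if axis_start >= min(before, after):
        raise ValueError("axis_start must be below both values")
    # A bar is only drawn from the axis start up to its value.
    return (before - axis_start) / (after - axis_start)


# Hypothetical crime figures, chosen only to illustrate the effect.
crime_before = 5.0
crime_after = 4.5

truncated = apparent_ratio(crime_before, crime_after, axis_start=4.0)
honest = apparent_ratio(crime_before, crime_after, axis_start=0.0)

print(f"Axis starting at 4: 'before' bar looks {truncated:.2f}x as tall")
print(f"Axis starting at 0: 'before' bar looks {honest:.2f}x as tall")
```

With these made-up numbers, the axis starting at four makes crime look like it was cut in half, while the zero-based axis shows a drop of about ten percent.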
What's the second misstep when it comes to working with graphs? The second misstep is using symbols other than bars for your graphs. When we looked at histograms and bar charts, we used bars. But sometimes people like to use pictures to emphasize frequency. What's the issue with using something other than bars? We literally cannot identify what the frequency is. Is the frequency the top of the box? Is the frequency the top of the house? Is it the area of each of these squares? See, when you use graphics, it becomes very difficult to actually analyze what it is you are trying to see. And so, how will you correct that? You will only use bars of equal width. Again, it's exactly what we've seen in all of our previous graphs, and it applies to histograms with numerical data as well as bar charts with categorical data. You always use bars of equal width.

What is the last misstep? The last misstep is using bars of unequal width. We just talked about that in the correction above. But ultimately, why do you need to use equal-sized bars? It's because when you have unequal-sized bars, it makes one category, or one range of values, look more heavily represented than it really is.

Another way this kind of unequal emphasis comes about is 3D pie charts. And I mentioned this already: 3D pie charts are the worst. If you ever see them, throw them away. Why? Because in a 3D graph, whichever pie piece is in front of your face will seem larger than it really is. Notice how this tan slice seems huge in this 3D graph, yet in reality, that tan slice is not as much of the pie chart as it appears. So the bottom line is 3D graphs are the worst. 3D graphs are the worst. Rather, the good graphs are two-dimensional pie charts. And really, in general, all your graphs should be two-dimensional graphs.
All your graphs should be two-dimensional.
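Going back to the second and third missteps, the arithmetic behind them can be sketched in a few lines of Python. This is an illustration under stated assumptions, not something taken from the lecture's slides: the frequencies are hypothetical, the function names are made up, and the pictogram is assumed to be scaled in both height and width in proportion to frequency.

```python
def pictogram_area_ratio(freq_a, freq_b):
    """Area ratio of two pictures when each is scaled in BOTH height and
    width in proportion to its frequency (an assumed drawing convention)."""
    linear_scale = freq_b / freq_a
    # Doubling height and width quadruples the area.
    return linear_scale ** 2


def bar_area_ratio(freq_a, freq_b, width_a=1.0, width_b=1.0):
    """Area ratio of two bars whose heights are the frequencies."""
    return (freq_b * width_b) / (freq_a * width_a)


# Hypothetical frequencies: category B occurs twice as often as category A.
freq_a, freq_b = 10, 20

picture = pictogram_area_ratio(freq_a, freq_b)
equal_bars = bar_area_ratio(freq_a, freq_b)
unequal_bars = bar_area_ratio(freq_a, freq_b, width_a=1.0, width_b=2.0)

print(f"True frequency ratio:        {freq_b / freq_a:.1f}")
print(f"Scaled picture, area ratio:  {picture:.1f}")
print(f"Equal-width bars, area:      {equal_bars:.1f}")
print(f"Unequal-width bars, area:    {unequal_bars:.1f}")
```

Only the equal-width bars show an area ratio that matches the true frequency ratio. The scaled picture and the wider bar both make category B look four times as large when it is only twice as frequent.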