Colors: blue, purple, yellow. Brand: Pepsi Cola. Shapes: circle, triangle. What I want to emphasize with categorical data is that they all use words, they all use labels. So, when it comes to graphing categorical data, there is actually a lot that is similar between graphing categorical variables and numerical variables, with one distinct difference. So let's get to that right now.
There are two ways of graphing categorical data: bar charts and pie charts. In particular, when we look at a bar chart, two things are going to be identical to what we saw with numerical data. First, the vertical axis, once again, is going to represent frequency. So, once again, we are going to make bars where the height of each bar represents how many times that observation came about. That idea is still exactly the same. And you know what's also still the same? The horizontal axis, once again, is going to represent my variable. So, in a lot of ways, categorical and numerical graphs are the same: you use bars with frequencies on the vertical axis, and you put the variable on the horizontal axis. So, what is the big difference? The big difference is that categorical variables use words. That means the labeling of each bar on your horizontal axis will now be words instead of numbers on a number line. But outside of that, everything about graphing with bar charts is exactly the same: you count how many people have green eyes, and you draw a rectangle whose height represents how many people have green eyes. So, in a lot of ways, bar charts and histograms have many similarities. And if you're wondering what the differences are, hang tight, we're going to talk about that in about two minutes.
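To make the counting step concrete, here is a minimal sketch in Python of building a frequency table and a rough text version of a bar chart. The eye color sample and the helper names `frequency_table` and `text_bar_chart` are made up for illustration; they are not the data or the tools used in the lecture.

```python
from collections import Counter

# Hypothetical sample of eye colors (made up for illustration, not the lecture's data).
eye_colors = ["brown", "blue", "brown", "green", "hazel",
              "brown", "blue", "brown", "hazel", "blue"]

def frequency_table(observations):
    """Count how many times each category (label) appears."""
    return dict(Counter(observations))

def text_bar_chart(freqs):
    """Build one row per category; the run of '#' stands in for the bar's height."""
    width = max(len(label) for label in freqs)
    return [f"{label:<{width}} | {'#' * count} {count}"
            for label, count in freqs.items()]

freqs = frequency_table(eye_colors)
for row in text_bar_chart(freqs):
    print(row)
```

Each row is labeled with a word rather than a position on a number line, which is the one difference from the numerical case described above.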
Alright, so this is a bar chart: you list the categories on the horizontal axis, and you create rectangles whose heights represent frequency. Now, when it comes to looking at bar charts, a lot of times with categorical data, we are going to be interested in the category with the highest frequency. We're going to give that a name in section 2.4, but for now, can you tell me which category has the highest frequency? Brown. Alright, and similarly, when it comes to rank, which category had the highest frequency? Private. Again, we're going to give a name to these categories in just a little bit. But particularly when it comes to looking at graphs of categorical data, it's this category, the category with the highest frequency, that we are usually most interested in. And now, the beauty of working with words, with labels, is that order doesn't matter. Whether Brown occurred first, second, third, or fourth, it would still be the category with the highest frequency. Because of that, when it comes to categorical data and making bar charts, we can swap around the locations of the bars as much as we want. And so, many times, we will create a Pareto chart. A Pareto chart is ultimately a bar chart where we make a point of putting the category with the highest frequency first. So, notice how the category with the highest frequency, literally the tallest rectangle, is now first, and each subsequent rectangle has a lower and lower frequency. I want to emphasize that the bar chart above and the Pareto chart below are the same exact data, and that you can swap the order of the bars so that they run from highest to lowest frequency. When you do that, that makes a Pareto chart. And so, we already mentioned in yellow that bar charts and histograms for numerical data have a ton of similarities. The big question then is, okay, Shannon, then where are they different? Where are bar charts different from histograms?
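Before getting to that question, the Pareto reordering described above can be sketched in a few lines of Python. The counts below are hypothetical (not the lecture's data), and `pareto_order` is a name invented for this illustration.

```python
# Hypothetical frequency table (made-up counts, not the lecture's data).
freqs = {"blue": 3, "brown": 4, "green": 1, "hazel": 2}

def pareto_order(freqs):
    """Return (category, frequency) pairs sorted from highest to lowest frequency."""
    return sorted(freqs.items(), key=lambda item: item[1], reverse=True)

pareto = pareto_order(freqs)

# The first entry is the category with the highest frequency.
top_category = pareto[0][0]

for label, count in pareto:
    print(f"{label:<6}{'#' * count} {count}")
```

Sorting changes only the order of the bars. The counts themselves are untouched, which is why the bar chart and the Pareto chart show the same data.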
First and foremost, the obvious: bar charts are used for categorical data, whereas histograms are used for numerical data. That's obvious, but why is it so important? Well, let's go back and remember: when you are looking at a bar chart, the horizontal axis is going to be words and labels, whereas when you're looking at a histogram, the horizontal axis is a number line. So, inherently, one of the big differences is that bar charts use words, whereas histograms use numbers. And acknowledging that is going to let us understand the next three rows. For starters, as we mentioned, with bar charts, with categorical data, the order of the words does not matter. That's the reason why we were able to make our Pareto chart and just swap around the order of the bars: there's no natural sense of order with words in general, whereas numbers have a very clear order. You do one, then two, three, four. There is an order to numbers, so you have to keep those numbers in their proper order. Second, in bar charts, the width of the bars has no meaning. If we look at this graph about eye color, the width of these purple bars means nothing. The bars still need to be the same width, but there's no meaning behind that width. Whereas with histograms, let's go back and remember, the width of the bars had meaning: you increase by one. Later on, we saw you can increase by 21. The width of the bars in a histogram has meaning, and the bars need to be the same size. Lastly, I don't know if you noticed, but when we made our bar charts, there were distinct spaces between the bars. That is one thing you always do with bar charts: you always leave a gap between the bars to really emphasize the individual categories. With histograms, I mentioned this already, the bars touch.
And that's because numbers occur continuously, so those bars need to touch to emphasize all of the numbers within that range.
The other type of graph for categorical data is the pie chart. The majority of you have probably seen a pie chart before, where each category gets its own slice. So, we can see here that the different eye color categories each get their own slice, and the size of the slice is important because it is proportional to that category's frequency. What do I mean by that? Well, we took how many brown eyes there were and divided that by the total, and that gave me 47%. We took how many green eyes there were and divided that by the total, and that gave me 11% for green eyes. When it comes to making a pie chart, what makes a good pie chart is including the percentages. So, what I want you to see is that the graph on the right, of military rank, is actually a bad graph because there are no percentages. While we can clearly see that private is, in fact, the biggest category, the most well-represented category, it's hard to tell whether corporal and major are the same or different, and by how much. So what makes this right-hand graph bad is that there are no percentages. We will talk more about pie charts a little bit later on, but one other thing I want to emphasize is that a good pie chart is two-dimensional. Three-dimensional pie charts are complete garbage. Don't ever trust them, and we'll talk about why a little bit later.
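The percentage calculation behind each slice (the category's count divided by the total) can be sketched as below. The counts are hypothetical, not the lecture's eye color data, and `pie_slices` is a name invented for this illustration. The slice angle, 360 degrees times the category's share, is included as an assumption about how the proportional slice size would be drawn.

```python
# Hypothetical counts (made up for illustration, not the lecture's data).
counts = {"brown": 4, "blue": 3, "hazel": 2, "green": 1}

def pie_slices(counts):
    """Map each category to (percentage of total, slice angle in degrees)."""
    total = sum(counts.values())
    slices = {}
    for label, count in counts.items():
        percent = round(100 * count / total, 1)  # count divided by the total
        angle = round(360 * count / total, 1)    # slice size proportional to frequency
        slices[label] = (percent, angle)
    return slices

slices = pie_slices(counts)
for label, (percent, angle) in slices.items():
    print(f"{label:<6}{percent:>5}%  {angle:>6} degrees")
```

Printing the percentage next to each label is what lets a reader tell two similar-looking slices apart.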