Transcript for:
Visualizing Statistics: Frequency and Histograms

Hello and welcome to lesson number three in our statistics class. Today we are going to move away from the vocabulary of statistics and we are going to start talking about what it means for us to be able to visualize statistics. And we're going to talk about two special types of graphs.

We're going to start off by looking at something that we call a frequency distribution and then we're going to talk about how we can turn it into a graph. So let's get started. To turn a frequency distribution into something that we can picture and visualize, we're going to turn it into something that we call a histogram. So for lesson number three, we're going to learn about different ways in which we can visualize statistics using, first of all, something that we call a frequency distribution.

And then when we're done talking about frequency distributions, we're going to talk about how we can use frequency distributions to construct something that we call a histogram. And a histogram gives us a shape or gives us information regarding what the shape of the data is going to look like. So let's begin.

When we talk about a frequency distribution, it is a list, a table, or a graph that displays the frequency of various outcomes in a sample. Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval that we call classes. It gives us a quick snapshot of how the data is distributed.

So we're going to work backwards first. I found a good example of a frequency distribution on the Internet that I liked. Notice I changed this value and I'll talk about that in a minute. But what I'd like for you to do now is just kind of take a look at this data here.

So here is some data. Here's a frequency distribution.

Now, I'm not telling you what it's for. See if you can just look, right, we have this column over here, which is annual income. This column over here, which says frequency. This one, which says relative frequency.

This one says percent. I really want to focus on what's going on here. We'll define these two a little later.

Okay. So annual income, $1 to $20,000, frequency is 5. $20,001 to $40,000, frequency is 3. $40,001 to $60,000, frequency is 33. $60,001 to $80,000, frequency is 6. $80,001 to $100,000, frequency is 3. This is an example. Now, again, we're not defining it so we can get a little bit creative with this if we want to. But this is an example of a graph over here, or a table.

We can call it a table over here, where we have a total frequency of 50. So maybe we asked 50 adults, what is your annual income, right? So one way that we could think about this table is let's ask a total of 50 people what their annual income is. Now, the first thing that I want you to notice here is that for the annual income, we have one, two, three, four, five, we have five discrete bins or groups that we break this up into.

And each of these is what we call a class. Okay, this is what we call a class.

So a class is a bin or range of values that we break a frequency distribution up into. Okay.

Notice here, we have five classes. We have five different bins, right? Our first bin is, well, how many people earn between $1 and $20,000 a year?

Our second bin is between $20,001 and $40,000 a year. Our third bin is between $40,001 and $60,000. Our fourth bin is $60,001 to $80,000.

Our fifth bin is $80,001 to $100,000. So classes represent a bin or a range of values that we break a frequency distribution up into. Notice that these classes all have the same range, that these ranges are all the same in each one of them. And that's because of the next definition.

We have something that we call the class width. So let's talk about what the class width is. The class width is the difference between any two consecutive lower limits in a class.

Ooh, so what does that mean? The difference between any two consecutive lower limits in a class. Now let's talk about what that means.

Okay, so I'm going to go to class limits and then we're going to come back to class width. Let's talk about class limits and then come back to class width. Notice that in each one of these classes, they consist of two different numbers. There's a small number and a big number. There's this smaller number in each one of these classes.

Here's the smaller number, here's the bigger number. Here's the smaller number, here's the bigger number. Here's the smaller number, bigger, smaller, bigger.

In each class, we have something that we call class limits. We have something that we call the lower limit or the lower class limit, and that represents the smallest value of the class.

And then we have what we call, guess what, if one's the lower limit, the second one is called the upper limit. The upper limit represents the largest value of the class. Okay, so each class has an upper limit and a lower limit.

All right. So class one, this first class over here, has a lower limit of 1 and an upper limit of 20,000. Okay.

Quick quiz. What is the upper limit of class four? Upper limit of class four, the upper limit of class four.

So we're going to go to class four. We look at the largest value that's represented in that class, which is 80,000. And that represents the upper limit of class four.

Okay. So let's use that definition of class limits to help us out with class width. So the definition of class width is the difference between any two consecutive lower limits in a class. So let's look at consecutive lower limits in a class.

And consecutive means next to. So consecutive 1 and 20,001. These are two lower limits that are next to each other.

Consecutive 20,001 and 40,001. Consecutive 40,001 and 60,001. Consecutive 60,001 and 80,001.

Now when you hear the word difference, difference means subtraction. So if we subtract any two consecutive lower limits in a class, okay so let's try it. So 80,001 minus 60,001 gives us, when you subtract these two, what do we get?

Well correct, we get 20,000. Interesting, hold on to that number. Now what about when we subtract 40,001 from 60,001? So 60,001 minus 40,001. Interesting.

20,000 again. Okay, well, what about from 40,001 to 20,001? Okay, well, the difference between those two consecutive lower limits is once again 20,000. Interesting. And one more consecutive pair, between class two and class one.

So 20,001 minus 1 gives us 20,000. So this idea of a class width is fixed. That number will always be the same.

That is, if you take the difference between the lower limits of any two consecutive classes, the class width will always be the same. So the class width of this frequency distribution is 20,000. And you don't have to do it four times if you have five different classes.

You only have to do it once. And that value will always be the same. OK. So once again, we have this idea of classes, and classes are the total amount of bins or range of values.

We have five classes over here. We have the class width, which is the difference between any two consecutive lower limits, and that's constant. That value stays the same in each frequency distribution. We have the class limits, right: we have a lower limit, which is the smallest value in the class. And then we have an upper limit, which is the largest value in the class.
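As a rough sketch of the class width idea in Python (this is an illustration added alongside the lesson, and the variable names `lower_limits`, `differences`, and `class_width` are made up for it), we can subtract each lower limit from the one that follows it and see that the answer never changes:

```python
# Lower class limits read off the annual income table in the lesson.
lower_limits = [1, 20_001, 40_001, 60_001, 80_001]

# Subtract each lower limit from the lower limit of the class after it.
differences = [b - a for a, b in zip(lower_limits, lower_limits[1:])]
print(differences)  # [20000, 20000, 20000, 20000]

# Every difference is the same, so a single subtraction gives the class width.
class_width = differences[0]
print(class_width)  # 20000
```

Since all four differences agree, doing the subtraction once is enough, which is the point made above.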

Each one of these classes has a lower limit and an upper limit. Now we have something that we call the frequency. The frequency of a class is the number of data entries in the class.

Well, what does that mean that the frequency of the class is the number of data entries in the class? Well, let's take a look over here. If I were to say to you, what is the frequency of the first class? Well, over here it says five.

What does that mean? That means out of the 50 people that I surveyed. Five of those people told me that they earn between $1 and $20,000 a year.

Maybe these five people, maybe one person earns $5,000 a year. Maybe another person earns $19,943 a year. Maybe another person earns $2,500 a year. Maybe another person earns $7,000 a year.

Maybe another person earns $12,643 a year. I really like that 43 today, I guess. But anyway, all five of these values are values between 1 and 20,000. Now, the thing that we don't like about frequency distributions is they don't give you the exact data value. We don't know what those five are.

We could guess what those five are, but we don't know what they are. But what we do know is that five people earn between $1 and $20,000 a year. If I were to say to you, what is the frequency of the third class? The question I'm asking you is how many people earn between $40,001 and $60,000 a year?

And the answer to that is 33. If I were to say to you, what's the frequency of the fifth class? The fifth class, $80,001 to $100,000.

How many people earn between $80,001 and $100,000 a year? The answer to that would be three. Now remember, we don't know exactly what those numbers are, but we know that three people earn between here.

Maybe we could say one person earns $85,000, one person earns $99,600, and one person earns, I don't know, $93,489. We don't know what those three values are, but we do know that they're between 80,001 and 100,000. So that's the definition of frequency.

All right, so what other definitions are we going to learn about here? Now, when we talk about constructing a frequency distribution, this is really what a frequency distribution looks like. It gives us the lower limit and upper limit of each class, and it gives us the frequency of each class as well.

But we can break it down a little further. Maybe we want some more information about what's going on in this class. Another way that we can break this down is into something that we call relative frequency.

Let's talk about what the relative frequency is. Now, relative frequency, anytime you guys hear relative, relative frequency, I want you to think about this in terms of being a decimal or percent. You're saying to yourself, what percent of the data ended up in this class? OK, so the question you're asking yourself when we talk about the relative frequency is what percent of the data ended up in this class.

And the way that you find it is this. You take the frequency of each class and you divide it by the total frequency. For example, over here, if we take a look at the first class, the frequency of the first class was five. The total frequency was 50. Five out of 50 people earn between $1 and $20,000 a year.

If you get on your calculator and take five and divide that by 50, guess what you get? You get 0.10. So that's the relative frequency. And we can talk about relative frequency in terms of being a decimal or in terms of being percent.

Remember, to convert a decimal into percent, you pick up the decimal and move it two places to the right. So 0.10 is the same thing as saying 10%. If I were to say to you, what is the relative frequency of the second class?

Right. So, again, let me just number these classes over here so we don't forget that. The relative frequency of the second class.

OK, so that's the same thing as saying three over 50. Three because there's three in this class, 50 because there's 50 altogether. If you take three and divide it by 50 on your calculator, you'll get 0.06. To convert that into a percent, pick up the decimal and move it two places to the right, and that's 6%.

Okay, let's do one more class. If I were to say to you, what is the frequency of the third class? Right, there's 33 people. A lot of people earn between $40,001 and $60,000 a year. In fact, 33 out of 50 total people earn there, earn between these two amounts.

So if you take 33 and divide it by 50, you'll get 0.66. Pick up the decimal, move it two places to the right, and that gives you the percent, 66%. Okay, so the relative frequency is found by looking at the frequency of the class and dividing it by the total frequency. Let's take a few other definitions that go hand in hand with this.
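Here is a small Python sketch of that same division, added as an illustration rather than as part of the original lesson. The frequencies come from the income table; the names `frequencies`, `as_decimals`, and `as_percents` are just made up for this example.

```python
# Frequencies of the five income classes from the table in the lesson.
frequencies = [5, 3, 33, 6, 3]
total = sum(frequencies)  # 50 people altogether

# Relative frequency = class frequency divided by total frequency.
as_decimals = [f"{f / total:.2f}" for f in frequencies]
# Moving the decimal two places to the right is the same as formatting as a percent.
as_percents = [f"{f / total:.0%}" for f in frequencies]

print(as_decimals)  # ['0.10', '0.06', '0.66', '0.12', '0.06']
print(as_percents)  # ['10%', '6%', '66%', '12%', '6%']
```

The first three entries match the values worked out by hand above: 0.10, 0.06, and 0.66.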

Now we also have something here that's called the midpoint. So let's talk about what the midpoint is. Now the midpoint, or the definition of the midpoint, represents a single value in which you can talk about each class. The midpoint is the point halfway between the lower limit and the upper limit of each class. So instead of using two values to represent the class, let's just say, you know what, I don't want to use two values to represent this class.

Instead of using two values, I want to use one value to represent that class. The value that we would use is called the midpoint. So let's talk about how we find the midpoint.

I'm going to do this. We're going to practice with this in a second, but I'm just going to put midpoint over here. So I want you to think about how do you think we would find the halfway point between 1 and 20,000?

You guys know how to do this. You've done this before. Here's how you find the midpoint. It's an average. So you add the lower limit to the upper limit.

Remember how you find the average of two things: you add them together and then you divide by two, and that gives you the point that's exactly halfway between. So for each one of these, if we want to find the midpoint, so to find the midpoint of class one, I'm going to take the lower limit, which is 1, add that to the upper limit, which is 20,000, hit equals on my calculator, then divide by two. So the midpoint of this first class is going to be 10,000.5. Again, lower limit plus upper limit divided by two. Let's take a look at the midpoint between a few other classes.

So the lower limit over here is 20,001. The upper limit is 40,000. Add those two together, we get 60,001.

Divide by two, we get 30,000.5. Okay, so the midpoint of the second class is 30,000.5.

Let's do two more midpoints. Let's try to do the midpoint between these two, 40,001 and 60,000. Again, the midpoint is just that point that's halfway in between the lower limit and the upper limit. Add those two values together. So 40,001 plus 60,000 divided by two, that gives us 50,000.5.

And for the last two, I'll let you guys do the last one on your own, but for the fourth class over here, 60,001 plus 80,000 gives us 140,001. Divided by two, we're going to get 70,000.5. Okay, see if you can find the midpoint of this last one real quick. And hopefully you end up getting 90,000.5. All right, so that's the definition of the midpoint. It's the point that's halfway between the lower limit and the upper limit.
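If you would like to check all five midpoints at once, here is a minimal Python sketch (an added illustration, with the made-up names `class_limits` and `midpoints`) that applies lower limit plus upper limit divided by two to each income class:

```python
# (lower limit, upper limit) for each income class in the lesson's table.
class_limits = [
    (1, 20_000),
    (20_001, 40_000),
    (40_001, 60_000),
    (60_001, 80_000),
    (80_001, 100_000),
]

# Midpoint = (lower limit + upper limit) / 2, one value per class.
midpoints = [(lower + upper) / 2 for lower, upper in class_limits]
print(midpoints)  # [10000.5, 30000.5, 50000.5, 70000.5, 90000.5]
```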

Now, one more thing that I'd like to talk to you about is this thing that we call cumulative frequency. Cumulative frequency. And so what do we mean when we talk about this idea of cumulative frequency?

Here is what cumulative frequency means. I'm going to scroll down and then I'll scroll back up again. When we talk about figuring out the cumulative frequency of each class, it is the total frequency of the class you are on and all classes that come before it. You know, I want you to think about it. We hear sometimes in the weather, you'll hear about a cumulative snowfall, right?

And so what they're doing is they're talking about the total snowfall, the snow that was already on the ground, plus the extra snow that they added on from that new snowfall, right? They'll say the cumulative amount of inches of snow on the ground is 14. That means that the new snowfall that you had dumped on a couple extra inches, and now it's a total of 14. It's exactly the same thing. When we talk about constructing the cumulative frequency, excuse me, I'm trying to erase this, it's exactly the same thing when we talk about trying to construct the cumulative frequency of a frequency distribution. So here's how it works. Hold on one second.

No, don't do that. Let me see if I can get back on here. Okay.

Okay. So we have the frequencies, right? We have these frequencies over here.

Let's put one more little bit on here and I'm going to call this last guy the cumulative frequency. I was trying to erase this. There we go. All right, here's how the cumulative frequency works. You take a look at each class.

You take the frequency of that class and add the frequency of any class that comes before it. All right, so let's take a look at the frequency of the first class. Well, the frequency of the first class is five.

There are no classes that come before it, right, because it's the first class. So the cumulative frequency of that class is five. Now the cumulative frequency of the second class: you take the frequency of that class, which is just three, and you add on the frequency of any class that comes before it, which is 5. And 3 plus 5 gives us 8. So the cumulative frequency of the second class is 8. The cumulative frequency of the first class is 5. Let's go on to the third class. Well, the frequency of the third class is 33. Add on all the classes that come before it, so we have plus 3 plus 5. When we add together 33 plus 3, that gives 36, plus 5, that gives us 41. So the cumulative frequency of the third class is 41. The cumulative frequency of the fourth class: well, the frequency of that class is 6. Add on all the classes that come before it, 33, 3, and 5. Okay, when you add together 6 plus 33 plus 3 plus 5, we are going to get 47. And then for our last class, we have a frequency of 3. Add all those classes that come before it, 6, 33, 3, 5. Sorry, I should have left more space there, plus 5. When you add up all those numbers, we're going to get 50. Now, for the last class, the cumulative frequency should always be the total frequency, right? Because you're done there.

So when you add them up, you should always get that. At the last class, you should always get the total frequency there. So here are the different parts of a frequency distribution.
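The running total described above can be sketched in Python like this. It is an added illustration, not something from the lesson itself; it uses `itertools.accumulate` from the standard library, and the names `frequencies` and `cumulative` are made up for the example.

```python
from itertools import accumulate

# Frequencies of the five income classes from the table in the lesson.
frequencies = [5, 3, 33, 6, 3]

# accumulate() keeps a running sum: each entry is the frequency of that
# class plus the frequencies of every class that comes before it.
cumulative = list(accumulate(frequencies))
print(cumulative)  # [5, 8, 41, 47, 50]

# The last cumulative frequency is always the total frequency.
print(cumulative[-1] == sum(frequencies))  # True
```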

Now, the next thing that we're going to do, instead of deconstructing a frequency distribution, which is what we just did, is we're going to construct one. So let's take a look at the directions for how we construct a frequency distribution based on all of the data that we get when we start collecting it. So let's talk about what it means for us to construct a frequency distribution. Now the first thing that we have to do is we have to decide how many classes that we want our data to have.

And normally when you define how many classes that you want to have, it's between 5 and 20: 20 classes if you have like tens of thousands of data pieces in there, and 5 classes if we have a handful of pieces of data in there. Now a handful could mean, you know, as in the example we had up above, 50 pieces of data. So the smaller amount of data you have, the fewer classes you use; the larger amount of data you have, the more classes. Normally for your homework, we'll tell you how many classes we want you to use. But just keep in mind: a ton of data, you use a lot of classes.

Small amounts of data, you use a small number of classes. So the first thing we have to do is we have to decide on how many classes we want to use.

And then we're going to find the class width. Here's how we find the class width. We start off with the largest data value, subtract from it the smallest data value, and then divide by the number of classes.

So you take the range of your data and divide it by the total number of classes that you want to use. In the example that we do, and in the examples you're going to see in your homework, we're going to tell you, so in this example I'm telling you I want you to use seven classes. Now once you get your class width, you always round up.

What does that mean? That means even if you get 14.1 as your class width, we're going to ignore any type of rounding rules that we know, and we're always going to round up to the next integer value. So we're going to call that 15. If we get 16.99, we're going to round that up to 17. If we have a class width of 13, even if you get a whole number as your class width, you always round up.

We're going to call that 14. Otherwise, you won't capture all of your data. So keep in mind, ignore regular rounding rules and always round up to the next integer data value if we're dealing with integer data, which we will be in this class. Now the next thing we do is we start with the smallest data value, and then we add the class width until you have all of your lower class limits. Okay, so you're going to start with your smallest data value. You're going to look in here and determine what your smallest data value is.

Add the class width until you have all of your lower class limits. And then you're going to figure out what your upper class limits are using one integer value smaller than the lower class limit of the class that comes after it. Now, this is a lot of blah, blah, blah for a concept that's not too bad, and we'll talk about that in a second. And the last thing that we do is we take tallies or we take frequencies.

So the last thing we're going to do is take tallies or frequencies. So, OK, let's take a look here. First thing I'm going to do is I'm going to ask myself, well, OK, let's take a look at the data. Using the data below, construct a frequency distribution for the daily high temperatures in New York City, in Central Park, during the month of May in 2021. So this is real data that I got my hands on.

And these are the daily high temperatures in May 2021 in Central Park. OK, so here's our data. And I want to use seven classes.

So we're telling you how many classes to use. So knowing, first of all, how many classes we use, I'm going to create this chart and I'm just going to start writing in. OK, well, I want to use seven classes.

One, two, three, four, five, six, seven. So I'm going to leave seven blanks over here.

I'm going to make this giant table with seven blanks. Now remember, this is the frequency distribution, the classes, the class limits, and the frequencies. This is just all bonus extra stuff that we're going to work on.

Okay. So we determine how many classes, we'll call that part zero, number of classes. Now I have to find the class width. I'm going to take the largest data value, subtract from it the smallest data value, and divide by the number of classes. So let's take a look on here.

What's the largest data value? When was the temperature the warmest? What was the high temperature? Maybe I have an 89 over here? I don't see anything that's larger.

So the 89 was the largest data value. Okay, now what's the smallest data value? I see some in the 60s, 50s, so we know it's going to be in the 50s, none in the 40s so far.

So the smallest one in the 50s looks like it's 51. Okay, so we're going to take the largest data value, which is 89. We're going to subtract from that the smallest data value, which is 51. And then we're going to divide by the number of classes, which is 7. Okay, so I'm going to get on my calculator. I'm going to do 89 minus 51 equals, divide that by 7, and I end up getting 5.428. Remember, always round up.

Always round up to the next integer value. So my class width is going to be six. Six is my class width.
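Here is one way that rounding rule could be sketched in Python. This is an added illustration, and the function name `class_width` is made up. Notice that it uses floor plus one rather than an ordinary ceiling, because the rule stated in this lesson bumps even a whole number up to the next integer (13 becomes 14).

```python
import math

def class_width(largest, smallest, num_classes):
    """Range divided by the number of classes, bumped up to the next integer."""
    raw = (largest - smallest) / num_classes
    # floor(raw) + 1 always moves up, even when raw is already a whole
    # number, which is the rule given in this lesson.
    return math.floor(raw) + 1

# Temperature example: (89 - 51) / 7 is about 5.43, so the width is 6.
print(class_width(89, 51, 7))  # 6

# A made-up case where the division comes out to exactly 13.
print(class_width(91, 0, 7))  # 14
```

An ordinary `math.ceil` would leave 13 at 13, so it would not match the rule as it is stated here.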

Okay so my class width is going to be six. Start with the smallest data value. Add the class width.

Okay so what's our smallest data value? We know the smallest data value is 51. So that's going to be my lower limit for the smallest class.

Don't start at zero. Don't make up a number to start with. The lower limit of the smallest class is the smallest data value that you have. So I'm going to call this 51. Now I'm going to figure out what all of the lower limits of my classes are. And I'm going to get that by adding the class width to each one of them.

So we're going to add six. So 51 plus six, right? 51 plus six gives me 57.

57 plus 6 gives me 63. 63 plus 6 gives me 69. 69 plus 6 gives me 75. 75 plus 6 gives me 81. 81 plus 6 gives me 87. All right. So again, you start with the smallest data value that you have, which is 51. And then you add the class width to it, which is six. And this will give you the lower limit of all of your classes. This is a lower limit of all of our classes right here.
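The repeated adding of six can be written as a short Python sketch. This is an added illustration; the names `smallest`, `width`, `num_classes`, and `lower_limits` are made up for it, and the numbers are the ones from the temperature example.

```python
smallest = 51     # smallest daily high temperature in the data
width = 6         # class width found in the previous step
num_classes = 7   # number of classes we were told to use

# Start at the smallest value and keep adding the class width.
lower_limits = [smallest + i * width for i in range(num_classes)]
print(lower_limits)  # [51, 57, 63, 69, 75, 81, 87]
```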

Next, we need to... All right. So let's go back up here.

Sorry. So we took care of the number of classes. We found the class width.

We're always rounding up. We started off with the smallest data value and added the class width until we had all of our lower class limits. Now we have to find our upper class limits.

Here's how we find our upper class limits. Okay, let me erase this so this isn't confusing. Remember, we got all these values by adding the class width, whatever the class width is, onto each one of these.

Now, to find the upper class limit, because remember, each class is a range of values. It's a small value and it's a large value. Here's how we find the upper limit. Notice that I have a lower limit of 51 and I have an upper limit of I don't know.

But in the class that follows it, in class 2, I have a lower limit of 57. So if class 2 begins at 57, that means class 1 has to end right before that, which is at 56. So to find the upper limit of your class, you go to the lower limit of the class that comes after it and subtract 1 from it. 63 minus 1 is 62, right?

If this starts at 63, we don't know where it ends, but the class after it has to start at 69, so that means this guy has to end at 68. Okay, same thing. This guy begins at 69. We're not sure where it ends, but we know this guy begins at 75, so it has to end at 74. This guy starts at 75. We're not sure where it ends, but this guy begins at 81, so that means 81 minus 1 is 80. This guy starts at 81. We're not sure where it ends, but we know that the class after it begins at 87. 87 minus 1 is 86. And the question is, well, how do you know where that last class is going to end? Right.

How do I know where this guy's going to end? Now, remember that the distances between these values are constant, right? Because we added 6 to each one of those, which means that the distance between the lower limit and the upper limit is going to be constant as well. Think about it. How do you get from 51 to 56?

What do you have to add to 51 to give you 56? That's five. Right.

Does that work here? Well, 57 plus five gives you 62. 63 plus 5 gives us 68. 69 plus 5 gives us 74. 75 plus 5 gives us 80. 81 plus 5 gives us 86. And to figure out what this one is, we would do 87 plus 5, which gives us 92. Okay, so this is constant. You don't have to do this. If you just do it once or twice, you figure out what the pattern is, then you'll be able to figure out where that ends.
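Both ways of getting the upper limits can be sketched in Python. This is an added illustration with made-up names (`upper_by_neighbor`, `upper_limits`); the lower limits and the width of 6 are from the temperature example.

```python
lower_limits = [51, 57, 63, 69, 75, 81, 87]
width = 6

# Way 1: one less than the lower limit of the class that comes after it.
# This covers every class except the last, which has no class after it.
upper_by_neighbor = [nxt - 1 for nxt in lower_limits[1:]]
print(upper_by_neighbor)  # [56, 62, 68, 74, 80, 86]

# Way 2: lower limit plus (width - 1), here plus 5. This works for the
# last class too.
upper_limits = [lower + width - 1 for lower in lower_limits]
print(upper_limits)  # [56, 62, 68, 74, 80, 86, 92]
```

The two lists agree on the first six classes, and the second way fills in the 92 for the last class.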

Now that we have the classes and the class limits, now we're going to take frequencies. Well, how do we take frequencies?

Well, here's how we take frequencies. I'm just going to erase all the superfluous information here that we don't need. Here's how we take frequencies.

Okay. So 65, right? 65 belongs in which class? Well, 65 is between 63 and 68. So that's going to go there. I'm going to tally it for one.

I'm going to take tallies here. 82 goes between 81 and 86. Tally it for there. 66. 66 goes between 63 and 68. Tally it for there.

73 goes between 69 and 74. Tally it for there. 62. 62 is between 57 and 62. 66. 66 again. 55. 65. Oh, I did five, so I'm going to do like that, right?

Because now I know I have five pieces of data there. It's easy to do it in groups of five. You don't have to if you don't want to. 60, 67, 68, 72, 76, 79, 75, 78. 82, 86, 76, 79, 89, 88, 68, 73, 86. 81, 69, 51, 51, 70, and we got it.
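Tallying can also be sketched in Python. This is an added illustration, not the method used in the lesson: the helper `class_index` is made up, and the `sample` list holds only the first five temperatures read out above (65, 82, 66, 73, 62), not the full month of data.

```python
lower_limits = [51, 57, 63, 69, 75, 81, 87]
width = 6

def class_index(value):
    """Return the 0-based position of the class a temperature falls into."""
    # Distance from the first lower limit, counted in whole class widths.
    return (value - lower_limits[0]) // width

# Only the first five values read aloud in the lesson, not all 31 days.
sample = [65, 82, 66, 73, 62]

tallies = [0] * len(lower_limits)
for temperature in sample:
    tallies[class_index(temperature)] += 1

print(tallies)  # [0, 1, 2, 1, 0, 1, 0]
```

So 65 and 66 both land in the third class (63 to 68), 62 lands in the second, 73 in the fourth, and 82 in the sixth, which matches the tallies made by hand.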

So here are the frequencies. Now again, you know, you can write them here as tallies. I'm going to write them as numerals.

So in class one, we had a total frequency of three. In class two, we had a frequency of two. In class three, we have a frequency of five, six, seven, eight. In class four, we have a frequency of five. In class five, we had a frequency of seven.

In class six, we had a frequency of four. And in class seven, we had a frequency of two. So if you add up all these frequencies, it should give you the total frequency or the total number of days that we took the data for.

So let's add these up. Three plus two plus eight plus five plus seven plus four plus two, which gives us 31, which makes sense because there is a total of 31 days in May. So a total of 31. All right, now again, this right here, this is a frequency distribution, but frequency distributions have multiple parts. So let's talk about the midpoint, and then let's talk about the relative frequency and the cumulative frequency. Remember what the midpoint represents.

The midpoint represents the point that's halfway through the class limits. So to figure out the midpoint, you add the lower limit with the upper limit and divide by two. Each class is going to have its own midpoint.

So 51 plus 56 divided by 2 gives us 53.5. 57 plus 62 divided by 2, 59.5. 63 plus 68, divide that by 2, 65.5. 69 plus 74 divided by 2, 71.5. Right, and then you'll get the general idea, 77.5, then it's going to be 83.5, and then the last one, 87 plus 92 divided by 2, 89.5.

Okay, so remember the midpoints of all these you would get by adding the class limits together and dividing it by 2. And it's if we want to represent the class as a single value. So class one, single value is 53.5. It has a frequency of three.

Now the relative frequency, remember the relative frequency is your frequency in that class. So class one has a frequency of three divided by the total frequency, which is 31. Okay, so three divided by 31. As a decimal, it gives us approximately 0.097. If it asks for this as a percent, that's the same thing as calling it 9.7 percent.

Okay, so the relative frequency for the second class, 2 divided by 31. 2 divided by 31 is approximately equal to 0.06, uh, 0.065, let's round it to three decimal places. As a percent, that's the same thing, pick up the decimal, move it two places to the right, 6.5 percent. This guy has a frequency of 8, so 8 out of 31. You get the general idea.

I'll just do this here. 5 out of 31, 7 out of 31, 4 out of 31, and then 2 out of 31. We already know what 2 out of 31 is, we did that right there. That's the same thing as saying 0.065 or 6.5 percent. 4 out of 31 as a decimal, that's the same thing as saying 0.129.

As a percent, 12.9%. So 12.9% of the days in May in 2021 were between 81 and 86 degrees, right? That's another way for you to think about it. Between 75 and 80 over here, 7 out of 31 gives 0.226, rounded to four decimal, excuse me, rounded to three decimal places, or 22.6%. Five out of 31, 0.161 or 16.1%.

Remember, pick up the decimal, move it two places to the right. And then last but not least, eight out of 31. That's the same thing as saying 0.258, which is 25.8% of the days in May. So looking at this, if you come to visit New York in May, or if you did come to visit New York in May, the temperature probably wouldn't be too hot, because if we look, the highest percent over here is 25.8%. More days are going to be between 63 and 68 degrees than in any other class.

A small amount of the days are going to be really hot in May. Only 6.5% of the days in May are going to be between 87 and 92 degrees. But you get the general idea for relative frequency.
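As a sketch of the same calculation in Python (an added illustration; the names `frequencies` and `relative` are made up), here are the relative frequencies of all seven temperature classes, using the class frequencies stated in the lesson and rounding to three decimal places:

```python
# Frequencies of the seven temperature classes as stated in the lesson.
frequencies = [3, 2, 8, 5, 7, 4, 2]
total = sum(frequencies)
print(total)  # 31, one value for each day in May

# Relative frequency = class frequency / total, rounded to three places.
# Note that 3 / 31 is 0.0967..., which rounds to 0.097.
relative = [round(f / total, 3) for f in frequencies]
print(relative)  # [0.097, 0.065, 0.258, 0.161, 0.226, 0.129, 0.065]
```

The largest entry is the 0.258 for the third class, 63 to 68 degrees, and the smallest is 0.065, which shows up for both the second and the seventh class.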

And lastly, we have cumulative frequency. And remember what cumulative frequency is. You start off with the frequency in your class and add the frequencies of any classes that come before it. Well, we just have a three.

There's no classes before it, so the cumulative frequency of the first class is just three. The frequency of the second class is two. The class that comes before it is three.

And two plus three gives us a total of five. The frequency of the third class is eight. Classes that come before it have a frequency of two and three, right? So the cumulative frequency of this class is 13. Box it off once we have the answer.

The frequency of the fourth class is 5. The classes that come before it are 8 plus 2 plus 3. But notice also that you could just say 5 plus 13 because that's the sum of those values. So either way will work. And when we do 13 plus 5, we get 18. The frequency of the fifth class is 7. All the classes that come before it are 5, 8, 2, and 3. And that gives us a total of, what is that, 25? Now the cumulative frequency of the sixth class.

Let's see. So 4 plus 7 plus 5 plus 8 plus 2 plus 3. Okay, when we add all those guys together, we get 29. And remember, the cumulative frequency of the last class has to be the total frequency.
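The running totals worked out above can be sketched with Python's standard `itertools.accumulate`, which adds each frequency to the sum of everything before it. The list is the same seven class frequencies from the temperature table.

```python
from itertools import accumulate

# Frequencies of the seven temperature classes, in class order.
frequencies = [3, 2, 8, 5, 7, 4, 2]

# Each entry is the class frequency plus all the frequencies before it.
cumulative = list(accumulate(frequencies))

print(cumulative)  # [3, 5, 13, 18, 25, 29, 31]

# The last cumulative frequency should equal the total frequency.
print(cumulative[-1] == sum(frequencies))  # True
```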

So it has to be 31, and you can see it here because we have a total of 29. If we add the last class's frequency of two onto it, you're going to get 31. Or two plus four plus seven plus five plus eight plus two plus three gives you that total frequency of 31. Okay. So here's how we construct a frequency distribution.

Now the nice thing about getting our data in terms of a frequency distribution is we can easily convert it into something called a histogram. The histogram is a diagram consisting of rectangles whose area is proportional to the frequency of the variable and whose width is equal to the class interval.

Ooh, what does this mean? Professor Hirsch, what are you making us do? I'm going to make you do a lot of fun things.

So the idea is that we are going to be able to visualize the data that we just talked about over here. We're going to be able to see what this data looks like in graphical form rather than in numerical form using these three components: classes, class limits, and frequencies. So in order for us to construct a histogram, we need the basic components of what we talked about from the frequency distribution.

So we're going to start building this histogram using the frequency distribution, but notice there's this class boundaries column on here which we didn't have. In frequency distributions we already had the classes, we had the class limits, and we had the frequencies.

So we're adding this extra column that we call class boundaries and we build the class boundaries using the class limits and we're going to talk about that in a second. For now, let's just start like we did before. Let's talk about how we can construct a frequency distribution with this data. It's data that I just made up.

Let's convert the frequency distribution into a histogram. Let's use eight classes. Okay, so remember the first thing that we need to do when we're constructing a frequency distribution is we have to figure out how many classes that we want, and I'm telling you here that we wanted to use eight classes, so we're going to stick with that. Okay, let me label these classes.

One, two, three, four, five, six, seven, eight. Now, the next thing we do is we figure out what the class width is. Remember, for the class width, we take the maximum value in our data, subtract from it the minimum value, and divide by the number of classes. So what's our largest value in this data set? Let's look for the largest value in the data set.

I am quickly scanning it and finding a 93 is my largest value in the data set. So the maximum is 93. The minimum or smallest value in my data set here is one. So minimum is one.

And remember, for the number of classes we want to use eight. Right, and again, on your homework I'll normally tell you how many to use. If you're building your own data, or if you're running your own report using your own data, you can determine how many classes you want.

So 93 minus one gives us 92 and then 92 divided by eight gives us 11.5. So it's equal to 11.5. Remember we always round up.

We always round up. So 11.5 rounded up to the next integer value is 12. So our class width is 12. Okay, now we're going to construct the class limits. So remember what we're doing is we're building a frequency distribution.
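As a small sketch of the class width calculation, here is the same arithmetic in Python. The function name `class_width` is just for illustration, and `math.ceil` is used for the "always round up" step. One thing to be aware of: `math.ceil` leaves a quotient that is already a whole number unchanged, which is a case this lesson does not cover.

```python
import math


def class_width(minimum, maximum, num_classes):
    """(maximum - minimum) / number of classes, rounded up."""
    return math.ceil((maximum - minimum) / num_classes)


# Values from the example: maximum 93, minimum 1, eight classes.
raw = (93 - 1) / 8
width = class_width(1, 93, 8)

print(raw)    # 11.5
print(width)  # 12
```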

So once you have the number of classes, then you have your class width, which we have is 12. Now we're going to find all of our lower class limits. Okay, so the smallest value that we have in our data set is 1. I'm going to start with that. And then we're going to add the class width onto it.

So 1 plus 12 gives us 13. 13 plus 12 gives us 25. 25 plus 12 gives us 37. 37 plus 12 gives us 49. 49 plus 12 gives us 61. 61 plus 12 gives us 73. 73 plus 12 gives us 85. Okay, so we have all of our lower limits for our classes. These are the smallest numerical values that are going to be contained in each one of these classes. Now we have to find the upper limit. Now remember, in order to find the upper limit, we just go to the lower limit of the class that comes after it and subtract 1. So 13 minus 1, 25 minus 1, 37 minus 1, 49 minus 1, 61 minus 1, 73 minus 1, 85 minus 1. Okay, and then remember to figure out what the upper limit is for your last class, right?

Remember, these guys all have a constant width apart. Each upper limit is 11 above its lower limit here: 1 plus 11, 13 plus 11, 25 plus 11. It's going to be constant.

85 plus 11 gives us 96. Okay, all right. So we have the class numbers. We have the class limits.
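The lower and upper class limits built above follow a simple pattern, which can be sketched like this. The function name `class_limits` is made up for this sketch, and it assumes integer data, so each upper limit sits exactly one below the next class's lower limit.

```python
def class_limits(minimum, width, num_classes):
    """Return (lower, upper) limits for each class, assuming integer data."""
    limits = []
    for i in range(num_classes):
        lower = minimum + i * width  # keep adding the class width
        upper = lower + width - 1    # one less than the next lower limit
        limits.append((lower, upper))
    return limits


# Minimum of 1, class width of 12, eight classes.
limits = class_limits(1, 12, 8)

for number, (lower, upper) in enumerate(limits, start=1):
    print(number, lower, upper)
# First class is 1 to 12, last class is 85 to 96.
```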

Now let's take the frequencies of each one of these classes. Okay, so we're going to do exactly what we did before. I am going to go to, and we'll worry about class boundaries in a second.

I'm going to go to each one of these and make a tally in their appropriate box. And I suggest right now that you, you know, if you printed this out or copied it down on a separate sheet of paper, do the same thing on your end. So let's make sure that we have the same answers.

So you can pause it now and then check when we're done with the frequencies, or you can follow along as I do it. 61, 88, 40, 5, 12, 12, 18, 23, 1, 15, 1, 6, 81, 15, 21, 27, 5, 13, 24, 22, 1, 5, 1, 32, 12, 23, 93, 38, 29, 16, 36, 2, 71. Okay, so regarding what our frequencies are, let's tally up the frequencies in each one of these. Counting the tally marks in the first class: 5, 10, 12. In the second class: five, nine. Then four, two, one, two, one, two. So here are what the frequencies look like.
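The tallying step can be sketched as a loop that drops each value into the class whose limits contain it. The `sample` list below is a handful of made-up values chosen only to illustrate the idea; it is not the full data set from the lesson, and the function name `tally` is hypothetical.

```python
def tally(values, limits):
    """Count how many values fall inside each (lower, upper) class."""
    counts = [0] * len(limits)
    for v in values:
        for i, (lower, upper) in enumerate(limits):
            if lower <= v <= upper:
                counts[i] += 1
                break  # a value belongs to exactly one class
    return counts


# The eight classes from the example: 1-12, 13-24, ..., 85-96.
limits = [(1 + 12 * i, 12 + 12 * i) for i in range(8)]

# Hypothetical values, for illustration only.
sample = [5, 12, 13, 24, 40, 93]

counts = tally(sample, limits)
print(counts)  # [2, 2, 0, 1, 0, 0, 0, 1]
```

Notice that 12 lands in the first class and 13 lands in the second, because the class limits are inclusive on both ends.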

All right, so keep that in mind. Now, something that we have to be careful of is this idea of class boundaries. Last class, when we talked about the definition of different values, we talked about how data can be discrete or data can be continuous, right?

So what I want you to notice is that this data right now, which is discrete, could possibly be continuous. So we didn't talk about what these numbers represent, but what if we had a 12.5? Or what if we had 24.3?

Or what if we had 60.2? Where would we put those decimal values? We don't have a place to put them right now.

And that's why we talk about class boundaries, because class boundaries will force this data to be continuous. It will put it on a continuous scale rather than a discrete scale with little jumps in it, right? There's a jump between 12 and 13. There's a jump between 24 and 25. There's a jump between 36 and 37. So how do we do that? Well, assuming that you have integer values, which we have here, you take the lower class limit and you subtract one half from it.

You take the upper class limit and you add one half to it. And that will force these guys to be on a continuous scale with no jumps. So 1 minus one half is one half, or 0.5. 12 plus one half is 12.5.

13 minus one half is 12.5. 24 plus one half is 24.5. Again, I'm subtracting one half from each one of the lower class limits. And I'm adding one half to each one of the upper class limits.

Now I want you to see this. We're creating a continuous scale. This goes right from 0.5 to 12.5.

And 12.5 and 12.5 are back up against each other. Right? And then from 12.5 to 24.5.

24.5 and 24.5 are back up against each other. And then 36.5. 37 minus 1 half.

48 plus one half, 49 minus one half, 60 plus one half. So it's forcing this to be on a continuous scale. There aren't going to be any jumps in our data.

And this will also help us when we go to sketch the histogram because the histogram, all of the data values are going to be back to back with each other. 61 minus one half, 72 plus one half, 73 minus one half. 84, sorry, this should be 72.5. Sorry about that.

84 plus one half, 85 minus one half, 96 plus one half. Now, we only need the class boundaries if we are constructing a histogram. We don't need them for frequency distribution.
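The class boundary rule described above, subtract one half from each lower limit and add one half to each upper limit, can be sketched as follows. The function name `class_boundaries` is made up for this sketch, and it assumes integer class limits like the ones in this example.

```python
def class_boundaries(limits):
    """Turn integer (lower, upper) limits into continuous boundaries."""
    return [(lower - 0.5, upper + 0.5) for lower, upper in limits]


# The eight classes from the example: 1-12, 13-24, ..., 85-96.
limits = [(1 + 12 * i, 12 + 12 * i) for i in range(8)]
boundaries = class_boundaries(limits)

for low, high in boundaries:
    print(low, high)
# First class runs from 0.5 to 12.5, last class from 84.5 to 96.5.

# Each class's upper boundary equals the next class's lower boundary,
# so there are no jumps between classes.
no_gaps = all(
    boundaries[i][1] == boundaries[i + 1][0]
    for i in range(len(boundaries) - 1)
)
print(no_gaps)  # True
```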

So this is just what we add on because we're dealing with histograms. So for histograms, here's the data that we have. Here's the data we need for histograms.

We need classes, we need class boundaries, and we need frequencies. So we need these three, and you can get the class boundaries by subtracting one half from the lower class limit and adding one half to the upper class limit. OK, so how does this help us? Well, let's do this.

We are going to graph this data that we see right here. We're going to convert it into a histogram. Now, to construct a histogram, you need the general components of a frequency distribution, which we have: the classes, the upper and lower class limits, the frequencies, and the class boundaries. Class boundaries force the classes to be continuous like we just talked about. So here's how we construct a histogram. The vertical axis of a histogram, this over here, represents frequencies.

The horizontal axis represents whatever continuous data you're putting into your classes, and it consists of the class boundaries. Well, what do we mean when we say it consists of the class boundaries? So the continuous data, which comes from the class boundaries, is going to be graphed on the x-axis, or on the horizontal axis. So 0.5, 12.5.

Okay. So this is the first class. The first class goes from 0.5 to 12.5.

The second class goes from 12.5 to 24.5. So it starts here at 12.5 and then it goes to 24.5. Try to keep them about the same interval apart because remember the class width is the same.

Okay, the third class goes from 24.5 to 36.5. So here's class one, here's class two, here's class three. The fourth class goes from 36.5 to 48.5. The fifth class, so again first class, second class, third class, fourth class, let's see the fifth class goes from 48.5 to 60.5.

Okay, the sixth class goes from 60.5 to 72.5. The seventh class goes from 72.5 to 84.5, and the eighth class goes from 84.5 to 96.5. Okay, so again, the spaces in between each one of these represent the classes.

This is class one, class two, class three, class four, class five, class six, class seven, class eight. Now a histogram has continuous data, which is represented by the class boundaries on the horizontal axis, on the x-axis. You guys remember this is an x-axis. And the vertical axis contains the frequencies.

Now what that means is I have to make sure. Now here are my frequencies, right? The highest frequency is 12 and the lowest frequency is 1. If we look here, we have 12, 9, 4, 2, 1, 2, 1, 2. So we have to make sure that we partition it correctly, that we space these guys apart in a good manner.

And remember, the largest frequency we have here is 12. So you can scale it however you want. I'm just going to do it by twos. So 2, 4, 6, 8, 10, 12. Okay, so you make sure your maximum frequency is included and everything smaller is on here.

So my first class should have a frequency of 12. That means I should have a bar that's 12 units high. So here is my first class. Okay, I have 12 data values that are between 0.5 and 12.5. Now one thing also we have to make sure of in a histogram is that all the bars touch each other.

And you'll see that's another reason we need these class boundaries. There can't be any jumps in them. All of these bars are going to touch each other. Let's take a look at the second class.

The frequency of the second class was nine. So that means that this, here's my second class over here between 12.5 and 24.5. I have to have a height of nine, right? Nine is halfway between eight and 10. You can even write the heights in here if you want. So I can write a nine in here and I can write a 12 in here.

Now my third class, what was the frequency of my third class? The frequency of my third class was four. So this guy has to be four. Okay, the frequency of my fourth class was a two.

So I have to make sure that we have a two in our fourth class over here. The frequency in our fifth class was a one, so I should have one data value between 48.5 and 60.5. Let's put a one down here.

Let's color that guy in first. The frequency of my sixth class was two. Frequency of my seventh class was one and the frequency of my eighth class I believe was two again.

Let's just go double check. Frequency of my eighth class was two again. So this is a histogram.
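To see the same shape without drawing it by hand, here is a rough text version of the histogram using only standard Python. It is a sketch, not a proper plot: the bars run sideways, so the class boundaries go down the left side and each `#` stands for one data value. The function name `text_histogram` is made up for this sketch.

```python
# Frequencies of the eight classes from the example.
frequencies = [12, 9, 4, 2, 1, 2, 1, 2]

# Class boundaries: 0.5, 12.5, 24.5, ..., 96.5 (nine edges for eight classes).
edges = [0.5 + 12 * i for i in range(9)]


def text_histogram(edges, freqs):
    """Build one text line per class: boundaries, a bar of #, and the count."""
    lines = []
    for i, f in enumerate(freqs):
        label = f"{edges[i]:>4}-{edges[i + 1]:<4}"
        lines.append(f"{label} | {'#' * f} {f}")
    return lines


lines = text_histogram(edges, frequencies)
for line in lines:
    print(line)
# The first line printed is:  0.5-12.5 | ############ 12
```

Because every class ends at the boundary where the next one starts, the bars sit right next to each other with no gaps, which is the point of using class boundaries instead of class limits.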

A histogram gives you an idea as to what the shape of your distribution is, right? So if you take a look here, you can tell that I have a lot of data in my lower classes, right? In my classes one, two, and three, that's where it's most densely populated. And as my classes increase, I have less and less data over here.

So this shape gives us some information about the data. Now let's talk about the different types of shapes that we can get.

There's a ton of different types of shapes of histograms that we can get, but here are some of the bigger ones. First of all, we have something that's symmetric and unimodal. Data that's symmetric and unimodal has a histogram that starts low, goes high, and comes back low again.

Now this is approximate, so we have this low, high, low shape. This is symmetric unimodal, also known as normal or bell-shaped. Okay, normal or bell-shaped has that low, high, low distribution to it.

So these guys, this normal or bell-shaped curve over here, a lot of data in the real world has this distribution. Well, like what? Well, one thing that we can talk about that has this kind of continuous data scale here is IQ. Here's low IQ.

Remember, we start small and then we go, here's high IQ. The idea is that not too many people have a low IQ. A lot of people have a standard IQ, an IQ that's right in the middle, and not too many people have a very high IQ.

Not so many geniuses, and not so many people at the very low end either. Most people are right here, right around normal. What else has this normal or bell-shaped curve?

Weight does. Not too many people are very skinny. Not too many people are very fat. Most people are here right in the middle.

Height does. Not too many people are very short. Not too many people are very tall. Most people have a height that is right here in the middle.

Real-world data tends to have this normal or bell-shaped curve. What does this guy look like over here? Let's take a look at the data we just looked at.

This data has a skewed right distribution, skewed right, which means it starts off high, and then it tends to go lower, lower, and then lowest. So skewed right, also known as right-tailed. Right-tailed, think of a mouse whose tail goes off to the right.

That's another way to think about it. So this is right tail data. Now the opposite of that is data that's skewed to the left.

Data that's skewed to the left, over here, is when you have a tail of data that's going to the left. Well, what are some things that are skewed left or skewed right? Let's think about it. Something that's skewed right is housing. If you think about the price of houses, most houses are not so expensive, right over here.

So these are the cheaper ones. Remember that it's going to be low to high. And as the pricing of houses increases, there are fewer of them available.

So there are fewer ridiculously priced things available and a lot more lower priced things available if you think about selling houses, if you're in the real estate market. There are other examples of distributions when we put something on a histogram or when we think about scaling something on a histogram. We also have this uniform distribution. A uniform distribution is something that doesn't change very much. It has a standard rectangular shape.

We have a uniform distribution. A bimodal distribution. Bimodal distribution is when you have something that peaks about twice. You have two different peaks.

We have a peak here in terms of frequency and a peak here in terms of frequency. An interesting example of a bimodal distribution is the cost of books. So something interesting, and I read about this. Let's write down some costs. Let's say $0, let's say $5, let's say $10, let's say $15, let's say $25, let's say $30, $40, so on and so forth.

So why do books have this bimodal distribution over here? Well, the more expensive books, hardcovers, right? You can buy hardcover books, but hardcover books tend to be in the 20s.

And most books that are soft covered tend to be down here, anywhere in the seven to twelve dollar range. So we have this bimodal distribution if we think about the price of books. We also have a multimodal distribution.

And that is if we have a few different peaks in it. And there's an example of three different peaks in it. And one of my students came up with a really good example.

He said, what about eating times? Right. So maybe you wake up here at eight o'clock a.m. This is nine o'clock a.m.

So here's breakfast, right? And again, on a continuous scale. So here's lunch. Here's the time you would eat lunch or how many hours have passed since you've woken up to eat lunch.

And here is dinner. So this is the amount of time that you spend eating, frequency. So more people are going to eat here around this time for breakfast. More people are going to eat here around this time for lunch. And more people are going to eat here around this time for dinner.

And here's the snackers. Here's me. Here's all the people that snack in between.
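One rough way to make "how many peaks" concrete is to count the classes whose frequency is higher than both of their neighbors. This is only a sketch under stated assumptions: the three frequency lists are invented for illustration, the function name `count_peaks` is hypothetical, and real data is usually noisy enough that small bumps would be counted as peaks too.

```python
def count_peaks(freqs):
    """Count classes whose frequency is strictly higher than both neighbors."""
    padded = [0] + list(freqs) + [0]  # treat the outside of the graph as 0
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


# Invented frequency lists, for illustration only.
bell = [1, 3, 7, 3, 1]                # low, high, low
two_peaks = [2, 8, 3, 1, 6, 2]        # like soft covers and hardcovers
three_peaks = [1, 6, 2, 7, 2, 5, 1]   # like breakfast, lunch, dinner

print(count_peaks(bell))         # 1
print(count_peaks(two_peaks))    # 2
print(count_peaks(three_peaks))  # 3
```

A perfectly flat list such as `[4, 4, 4, 4]` gives 0 here, since no class is strictly higher than its neighbors, which fits the idea that a uniform distribution has no real peak.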

So all of these shapes are shapes that give us information about the distribution of what we're looking at. And particularly, we're really going to focus on this normal or bell-shaped curve, because this is a curve that we can run a lot of interesting statistics on. If we have data that has this normal or bell-shaped distribution, that will give us a lot of information and a lot of statistics that we can run on that data. Now going back, something that had that normal distribution or that normal bell-shaped curve, was this first example of frequency distributions that we talked about.

If you look at it, class one is a frequency of five. Class 2 is a frequency of 3, 33, 6, and 3. So we go to that low, high, low. And again, it's approximate. It doesn't have to be exactly.

But that gives us information about the distribution, and it gives us information about what we can do to the distribution, what we will be doing to the data in the future. So as a recap, we talked about how to construct a frequency distribution. We talked about how to turn a frequency distribution into a histogram. We talked about all those extra things that go in the frequency distribution like midpoint, relative frequency, and cumulative frequency.

And last but not least, we talked about some different shapes that histograms can have, and some interesting, creative examples of those shapes. The next class that we talk about together, we're going to introduce some other ways for us to visualize statistics.