Hello everyone. Today we're going to continue our talk on graphs, and we're going to look at graphs for interval and ratio level data. Okay? And I want you to keep one thing in mind: area means something. If there's one lesson I want you to take away from all this talk on graphs, it's that area means something.
So now we're talking about interval and ratio level data. Not only does order matter, but distance matters as well. So now distance has a meaning. Because of this, when we draw a histogram, the distances on the x-axis mean something, and one thing you're going to see is the bars touching each other.
You're going to see bars touching each other in a histogram, which is for interval and ratio level data. You don't see the bars touching in a bar graph, because there distance doesn't mean anything. This is one of the things we learn about histograms.
They touch, again, because distance has meaning. If they touch, that means there's no distance between them. So, let's look at an example.
And all of a sudden it's very noisy here. I'm hoping you can hear me okay. If you ever wonder why I'm so close to the camera, it's because I lean in to speak loudly enough to overcome a lot of the noise. So we have people of different heights.
Let's say 60 to 62 inches tall, 62 to 64, 64 to 66, all the way to 76 to 78 inches tall. I also have the percentages of people in each category. We could use the raw frequency, or we could use the relative frequency.
I'm not too concerned about that, but just realize that if you're given the frequency, you can easily convert it into a relative frequency. If you see there's a certain number of people in a certain category, you should be able to figure out what the proportion is, what the percentage is.
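As a quick sketch of that conversion: divide each category's count by the total to get the proportion, then multiply by 100 to get the percentage. The counts below are hypothetical (the lecture doesn't give raw frequencies), and `relative_frequencies` is just an illustrative name.

```python
def relative_frequencies(counts):
    """Convert raw counts per category into percentages of the total."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}

# Hypothetical counts of people per height category (not from the lecture).
counts = {"60-62": 4, "62-64": 8, "64-66": 12, "66-68": 16}
percents = relative_frequencies(counts)
print(percents)  # {'60-62': 10.0, '62-64': 20.0, '64-66': 30.0, '66-68': 40.0}
```

Whichever version you plot, the bars keep the same relative heights; only the numbers on the y-axis change.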
And let's look at our histogram here. Again, along the x-axis we have inches tall, and along the y-axis we have percentages. Again, we could either have the percentages, which are relative frequencies, or we could have the raw numbers. We could have the actual number of people in each category. Either way, the graph is going to look the same.
What I'm more concerned about right now is the look of the graph. You might also note that I've labeled the endpoints of the bars. Historically, it's more common to label the midpoint, but a lot of computer programs now tend to label the endpoints.
The labels aren't the same, but it doesn't really matter. It's just labeling.
Do you want relative frequencies or actual frequencies along the y-axis? Do you want to label the midpoints, or do you want to label the endpoints?
This is just fashion. It's just what you choose, right? It doesn't affect the graph.
One thing that you might be interested in is the number of bars. Now, some of you might have been taught certain formulas to figure out how wide the categories should be. Don't worry about that for this class.
You have to have enough bars to make it interesting, right? If you only had five categories of height, that probably wouldn't be terribly interesting. If we had everyone down to a quarter inch or a tenth of an inch, that might be too many bars. It's too busy.
So again, visually, somewhere between 10 and 20 bars tends to make a good graph. Again, this is not a concern of this class. I'm not teaching you so much how to draw the perfect graph.
I'm more interested in you understanding certain properties of the graph, okay? And what it's going to look like. So, again, notice we have 60 to 62, and notice the distance from 62 to 64 is the same.
It's 2 inches. 64 to 66. 66 to 68. All of the categories have the same width. This is important in a histogram: we need to have the categories the same width.
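Here is a small sketch of that check: given the category endpoints, compute each width and confirm they all match. The helper names are hypothetical. The height edges run from 60 to 78 inches in steps of 2, as in the example; the unequal age edges come from the example coming up next.

```python
def bin_widths(edges):
    """Width of each category, given its sorted endpoints."""
    return [right - left for left, right in zip(edges, edges[1:])]

def has_equal_widths(edges):
    """True if every category has the same width."""
    widths = bin_widths(edges)
    return all(w == widths[0] for w in widths)

height_edges = list(range(60, 80, 2))  # 60, 62, ..., 78 inches
print(bin_widths(height_edges))        # nine categories, each 2 inches wide
print(has_equal_widths(height_edges))  # True
print(has_equal_widths([0, 5, 15, 30, 65]))  # False: widths 5, 10, 15, 35
```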
We'll get there in just a minute. Okay, so we want our categories the same width. Note that those two spaces are blank. That's because there's no one there. If we see a gap, that means there's no one there.
You wouldn't want to push this category up against that one, because that empty space means something. Again, distance means something. It means we have an interval where there aren't any people. So gaps have meaning now, right?
That's why we have the bars touch: if we see a gap between the bars, it means there's no one there. Okay?
Now, simple histogram, right? The categories have to be the same width.
You can either label the midpoint or the endpoint. You can either put it in frequency or relative frequency. And that's what most histograms look like. I want to talk about some other aspects though, right? Some other aspects that people tend to do incorrectly.
First of all, let's go down here. Say we had 0 to 5, 5 to 15, 15 to 30, and 30 to 65; maybe those are our categories. Maybe we're talking about the variable age, and we're looking at one group of very young children between 0 and 5, then younger people between 5 and 15, then, once you're more of an adult, another group from 15 to 30, and then another group between 30 and 65. So it's conceivable that we might have groups of unequal width.
You absolutely do not want to graph it like this, where 0, 5, 15, 30, and 65 are evenly spaced. When I say "equal," I don't mean that the categories physically look equal on the graph; that's a consequence. I mean that the distance between 0 and 5 isn't the same as the distance between 5 and 15. The x-axis has to represent distance, right? So if this category is 5 units wide and this one is 10 units wide, the distance between 5 and 15 has to be twice as large as the distance between 0 and 5. It's just like a number line.
You have 0, 5, 15, 30, and then 65 is way down there. Distance means something. If you ever draw a graph where the distances between the marked values aren't correct, you would lose points on a test.
It would be a bad graph. So, again, we have to make sure that distance means something on the x-axis.
Distance has to mean something. Right?
So again, this would be 0, 5, this would have to be labeled 10, 15 would be over here, and 20 would be over here. Because the physical spaces between those marks are equal, the numerical distances have to be the same as well. Okay?
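One way to see the number-line rule is to compute where each age boundary belongs on a drawn axis. This sketch assumes a hypothetical 13 cm axis covering ages 0 to 65 (so 0.2 cm per year); `axis_position` is an illustrative name. Each mark's physical position is proportional to its value, so the 5-to-15 stretch comes out twice as long as the 0-to-5 stretch.

```python
def axis_position(value, lo, hi, length):
    """Place a value on a drawn axis of the given physical length, like a number line."""
    return (value - lo) * length / (hi - lo)

age_edges = [0, 5, 15, 30, 65]
# Hypothetical 13 cm axis spanning ages 0 to 65.
positions = [axis_position(a, 0, 65, 13.0) for a in age_edges]
print(positions)  # [0.0, 1.0, 3.0, 6.0, 13.0]

gaps = [b - a for a, b in zip(positions, positions[1:])]
print(gaps)       # [1.0, 2.0, 3.0, 7.0] cm, proportional to widths 5, 10, 15, 35
```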
Don't ever do anything like that, right? Now, let's say we have a category where we don't have equal width. Up here, I decided to take these three categories and merge them into one.
So we have something between 72 and 78 inches tall. Now, we have 0%, 15%, and 0% for a total of 15%, right? So there's 15% of the people in that category. But what happens now?
Because we have a category that's wider, how do we draw it? Pay no attention to the numbers on the y-axis right now. Assume they're still 5%, 10%, 15%, 20%.
If that were the case, we'd want to draw this bar 15% high, because 15% of the people are in that category. But if we did that, look at the size of that box. That box is approximately 35% of the total area of the graph. And remember, area always means something.
So we look at that, and visually it tells us that more than a third of the people are between 72 and 78 inches tall. It doesn't matter that if we look across, the axis says 15 percent. It doesn't matter what the numbers say.
Visually, it's lying to us. That's not how we treat graphs. Area has to mean something. Okay?
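Where does "approximately 35%" come from? A short worked check, assuming the categories' percentages add up to 100: every bar other than the merged one is 2 inches wide and together they hold 85%, so their total area is 2 × 85; the merged bar is 6 inches wide but drawn only 15% tall.

```python
# Bar heights drawn on a plain percentage scale (the misleading version).
other_area = 2 * 85    # all other bars are 2 inches wide and hold 85% in total
merged_area = 6 * 15   # the 72-78 bar is 6 inches wide, drawn 15% tall
share = merged_area / (other_area + merged_area)
print(f"{share:.1%}")  # 34.6%
```

So the wide bar takes up about a third of the ink even though it holds only 15% of the people.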
So this is also incorrect, and that's why the categories have to be the same width.
If you're using a frequency scale or a relative frequency scale on the y-axis, you have to have categories that are of the same width. Otherwise, the concept of area meaning something is compromised. Okay? So what do we do?
Well, what we do is use a density scale. Now, a density scale isn't something you're going to read about in many texts. You'll find a few texts that talk about them.
But most don't talk about them directly. By a density scale, I mean changing the scale on the y-axis.
So again, our textbook isn't going to have anything about this, and you're not going to see it very frequently. But density scales are, in fact, used all the time.
You probably all know what a normal curve looks like, and you might even be able to think about what the numbers on the x-axis are. But what are the numbers on the y-axis?
What do they mean? You can all draw your normal curve, and you might be able to put values of your variable on the x-axis, but what's on the y-axis? It's a density scale.
A density scale is percent per unit. We take one unit of whatever is on the x-axis, and the density is the percentage of people per unit. So we might have an inch as a unit.
What percentage of the people are there per inch in that category? It's similar in concept to the idea of population density, right?
Population density is how many people there are per square mile or per square kilometer, right? Some areas are very dense. They might have 50,000 people per square mile, or even more.
Some places are very sparse, right? They might have half a person per square mile, or three people per square mile. And that's what we have here.
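On a histogram, the density-scale height of a bar works the same way: divide the percentage in the bar by its width in x-axis units. A minimal sketch; `percent_per_unit` is just an illustrative name, and the example bars are ones worked through in this lecture (5% over 2 inches, 10% over 2 inches, and 15% over 72 to 78 inches).

```python
def percent_per_unit(percent, width):
    """Density-scale height: percent of people in the bar divided by its width."""
    return percent / width

print(percent_per_unit(5, 2))    # 2.5 percent per inch
print(percent_per_unit(10, 2))   # 5.0 percent per inch
print(percent_per_unit(15, 6))   # 2.5 percent per inch

# Going back the other way: height times width gives the percentage (the area).
print(percent_per_unit(15, 6) * 6)  # 15.0
```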
We have a density scale. What state in the U.S. is the most dense? What state has the most people per square mile?
And a lot of you are probably thinking of New York because of New York City. Yeah, but New York has a lot of empty area upstate, right?
It's not New York State, but it's similar: it's New Jersey. New Jersey is more developed across its entire area, and it has large centers of dense population. New Jersey averages about 1,200 people per square mile. But New Jersey is very small; the units along this axis are square miles. So New Jersey is very small, but very dense.
California is actually not nearly as dense, because California has huge areas where there are no people living, or very few people living. California has only about 246 people per square mile on average. So California is a lot less dense. But California is 21 times the size of New Jersey. Okay, now the question is: who has the most people?
Well, the most people isn't determined by width, nor is it determined by height; height is how dense the state is. The state with the most people is the one whose rectangle has the biggest area. So I'm basically asking: which of these rectangles is bigger, this little one for New Jersey, or this short but very wide one for California?
The one for California. Its area is bigger. The area in the graph refers to the proportion of people in that portion of the graph. Okay? Area is important. The reason I'm emphasizing this is that, although it doesn't come up so much with histograms, I want to make sure you never draw a histogram incorrectly, like this. Right? I want us to understand area under a graph.
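The rectangle comparison is just width times height. Since the lecture gives California's size only relative to New Jersey's, this sketch measures width in "New Jerseys" (New Jersey = 1, California = 21) and uses the quoted densities as heights. The resulting numbers are in odd units (people per square mile times one New Jersey), so only their comparison is meaningful.

```python
# Width = land area in units of New Jersey's area; height = people per square mile.
states = {
    "New Jersey": (1, 1200),
    "California": (21, 246),
}
rect_area = {name: width * height for name, (width, height) in states.items()}
print(rect_area)                          # {'New Jersey': 1200, 'California': 5166}
print(max(rect_area, key=rect_area.get))  # California
```

The tall, narrow rectangle loses to the short, wide one: California's rectangle is a bit more than four times the area of New Jersey's.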
Not width, not height, but area tells us how many people are in that section. And this is what a normal curve is all about. You look at the area. We're going to be talking about probability distributions later on. We have to look at the area.
Area means something. So anyway, how do we fix this problem? Well, let's take a look at that.
Sorry, my chalk fell on the ground. And let's make it a density graph. Let's make the vertical axis a density scale.
And that's what I've done here. The vertical axis is in percent per unit, what percent of the people are there per unit, and I've abbreviated that PPU. Now, each of these bars is two units wide. If this bar holds five percent of the people and it's two units wide, then five divided by two is 2.5 percent per unit.
Two units at 2.5% per unit: 2.5 times 2 is 5%, right? In this next one, we have 10% of the people.
10%, again, two units wide. That's 5% per unit. So far I'm drawing the exact same graph, because these bars are all two units wide, so it's going to look the same. But the axis now reads 2.5%, 5%, 7.5%, 10% per unit.
Okay? Now, what's going to change is when we get to this last box. We now have a box of unequal width that goes between 72 and 78. So it's six units wide.
It has a total of zero plus 15 plus zero, so 15% of the people. 15% divided by six units is 2.5% per unit.
So on a density scale, it looks a little different than it does above, right? In fact, it's lost some of the finer detail. Up above, we can see that all of those people are, in fact, between 74 and 76. But if we don't know that, and we only know that 15% of the people are between 72 and 78, we're going to draw it like this, in percent per unit.
If you have categories of unequal width, you have to use a density scale. And now that box right there has 15% of the area, not the misleading 35% it had before. So now we have 15% here, 15% here, 20% here.
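The whole procedure fits in a few lines: for each bar, divide its percentage by its width, and the bar's area (width times height) gives back the percentage. This is a sketch with hypothetical helper names, applied only to the numbers from the lecture: the three 2-inch bars from 72 to 78 holding 0%, 15%, and 0%, and the merged 72-to-78 bar holding 15%.

```python
def density_histogram(edges, percents):
    """Return (left, right, percent_per_unit) for each bar defined by sorted edges."""
    return [
        (left, right, pct / (right - left))
        for (left, right), pct in zip(zip(edges, edges[1:]), percents)
    ]

def total_area(bars):
    """Sum of width * height over the bars: the total percentage they represent."""
    return sum((right - left) * ppu for left, right, ppu in bars)

fine = density_histogram([72, 74, 76, 78], [0, 15, 0])
merged = density_histogram([72, 78], [15])
print(fine)    # [(72, 74, 0.0), (74, 76, 7.5), (76, 78, 0.0)]
print(merged)  # [(72, 78, 2.5)]
print(total_area(fine), total_area(merged))  # 15.0 15.0
```

Merging loses the detail that everyone is between 74 and 76, but the area, and therefore the 15%, is preserved either way.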
Area still means something. Okay, and this is what's called a density histogram. Again, you're not going to see them in USA Today or anything like that. They're something you have to do if the categories aren't the same width. But just make the categories the same width and everything will be okay.
Okay, so again, density histogram. We're not learning about it so we can build these graphs to put in papers; we're learning this concept so that when we start talking about probability distributions later on, the idea of area meaning something sticks with us. It's area, not height, not width. Area tells us how many people are in a certain region.
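The same idea carries over to the normal curve mentioned earlier. A sketch, assuming a standard normal curve (mean 0, standard deviation 1): its height is a density, so adding up many thin rectangles (height times width; the midpoint rule is just one convenient choice) between -1 and 1 gives the proportion of the distribution in that region. Python's `math.erf` gives the exact value to compare against, about 68.27%.

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    """Height of the normal curve at x: a density (proportion per unit of x)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def area_under(pdf, a, b, n=10_000):
    """Approximate the area between a and b by summing n thin rectangles (midpoint rule)."""
    width = (b - a) / n
    return sum(pdf(a + (i + 0.5) * width) * width for i in range(n))

area = area_under(normal_pdf, -1, 1)
exact = math.erf(1 / math.sqrt(2))  # exact proportion within 1 SD of the mean
print(round(area, 4), round(exact, 4))  # 0.6827 0.6827
```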
Right? That's it for this lecture. There's going to be one more lecture.
Bye-bye.