Continuing our study of tables and graphs in chapter 2, this lesson is all about histograms. A histogram is a graph consisting of bars of equal width drawn adjacent to one another, unless there are gaps in the data. For example, we could take this frequency distribution and construct the following histogram to display the same data in a different way. On a histogram, the horizontal scale represents classes of quantitative data values.
We can label these classes with either class boundaries or class midpoints. On this example histogram, the horizontal axis represents age in years, which is quantitative. Also notice that the center of each bar is labeled. These are the midpoints of each class. The vertical scale represents frequencies.
It tells us how many individual data values are in each class. In other words, the height of each bar equals the frequency of its class. Let's take a look at an example.
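To connect bar heights back to raw data, here is a minimal Python sketch that counts how many values fall in each class. The function name `class_frequencies` and the ages are hypothetical, made up for illustration. Class limits are treated as inclusive on both ends, matching whole-number classes like 0 to 19 and 20 to 39 used in this lesson.

```python
def class_frequencies(data, lower_limits, upper_limits):
    """Count the data values in each class; limits are inclusive, e.g. 0-19."""
    return [sum(1 for x in data if lo <= x <= hi)
            for lo, hi in zip(lower_limits, upper_limits)]

# Hypothetical ages (not the lecture's data)
ages = [3, 15, 22, 27, 35, 41, 44, 58, 63]
freqs = class_frequencies(ages, [0, 20, 40, 60], [19, 39, 59, 79])
print(freqs)  # [2, 3, 3, 1]: each value is the height of one bar
```

This assumes whole-number data. A value such as 19.7 would fall between two classes and not be counted, which is one reason histograms are often labeled with class boundaries instead of class limits.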
Given the following frequency distribution, let's create a histogram using class boundaries. On the previous page, we had a histogram that used class midpoints, so this example shows what a histogram labeled with class boundaries looks like. Let's first find each of the class boundaries. A class boundary is the center of the gap between two consecutive classes.
The first class ends at 19 and the second begins at 20. The center of the gap between those two classes is 19.5. You can also find that by adding 19 and 20 together and dividing by 2. Similarly, the second class ends at 39 while the third begins at 40. The center of that gap is 39.5. Let's continue in this way and find the class boundaries between each of the classes.
Now we can continue this pattern to find the class boundary before the first class and after the last class. Since consecutive class boundaries are 20 apart, we must keep that same width when determining the first and last boundaries. Our first boundary needs to be 20 less than 19.5, or negative 0.5. And our last one needs to be 20 more than 119.5, or 139.5.
Now that we have our class boundaries, let's begin making our histogram. Our horizontal scale will represent our x values. The vertical scale represents frequencies.
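The boundary calculation above can be sketched in Python as follows. The function name `class_boundaries` is hypothetical, and the sketch assumes at least two classes of equal width. The classes 0 to 19 through 120 to 139 are reconstructed from the boundaries stated in this example (negative 0.5 up to 139.5).

```python
def class_boundaries(lower_limits, upper_limits):
    """Class boundaries for classes of equal width (at least two classes).

    Inner boundaries are the centers of the gaps between consecutive classes;
    the outer two are one class width before the first and after the last."""
    inner = [(upper + next_lower) / 2
             for upper, next_lower in zip(upper_limits, lower_limits[1:])]
    width = lower_limits[1] - lower_limits[0]  # assumes all classes share this width
    return [inner[0] - width] + inner + [inner[-1] + width]

# Classes 0-19, 20-39, ..., 120-139, reconstructed from the boundaries above
lower = list(range(0, 121, 20))
upper = [lo + 19 for lo in lower]
print(class_boundaries(lower, upper))
# [-0.5, 19.5, 39.5, 59.5, 79.5, 99.5, 119.5, 139.5]
```

Seven classes produce eight boundaries: one between each pair of neighboring classes plus one at each end.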
We need to label the horizontal scale using the class boundaries that we just found, beginning with negative 0.5 and working our way to 139.5. With the vertical axis representing frequencies, we need a scale that can accommodate the largest frequency, which is 12. I'll create a scale that goes up to 15. Now we can finally create our bars with the right height for each class.
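Choosing the axis scale and the bar heights can be sketched in plain Python, drawing each bar sideways as a row of `#` characters instead of plotting it. The names `axis_top` and `text_bars` are hypothetical, and rounding the largest frequency up to a multiple of 5 is an assumed rule of thumb; the lecture simply picks 15. Only the first three frequencies (5, 7, 10) are stated in this example, so the sketch uses just those classes.

```python
import math

def axis_top(max_freq, step=5):
    """Round the largest frequency up to a multiple of `step` for the axis."""
    return step * math.ceil(max_freq / step)

def text_bars(boundaries, freqs):
    """One text row per class: boundary range, then a bar of `f` characters."""
    return [f"{left:g} to {right:g} | {'#' * f}"
            for left, right, f in zip(boundaries, boundaries[1:], freqs)]

print(axis_top(12))  # 15, the scale chosen above
# First three classes from the lecture (frequencies 5, 7, 10)
for row in text_bars([-0.5, 19.5, 39.5, 59.5], [5, 7, 10]):
    print(row)
```

Each bar's length equals its class frequency, and neighboring rows share a boundary, mirroring how adjacent bars in a histogram touch.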
The first class, 0 to 19, has a frequency of 5, so our first bar needs to have a height of 5. Our second class has a frequency of 7, so our second bar needs to have a height of 7. Our third class has a frequency of 10, so our third bar needs to have a height of 10. Continuing in this way, we can construct the rest of our histogram. Now that we have our completed histogram, we can see that this data is approximately normal: the graph has an approximate bell shape.
Now that we know what a histogram is, we can construct a relative frequency histogram. A relative frequency histogram has the same shape and horizontal scale as a histogram, but the vertical scale uses relative frequencies instead of frequencies.
Now let's use the following table of data to create a relative frequency histogram. Since in our last example we used class boundaries, this time let's use class midpoints. The two things we need to calculate before we can construct our relative frequency histogram are the relative frequency and the class midpoint for each class. To find the class midpoint for each class, we add the lower limit and the upper limit together and divide by 2. So for the first class, we add 0 and 9 and divide by 2. The first class midpoint is 4.5.
Similarly, if we add 10 and 19, we get 29, and dividing by 2 gives 14.5. Continuing in this way, we can find the class midpoint for each class. Now, to find the relative frequency for each class, we need to take the frequency for each class and divide by the total frequency.
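The midpoint rule can be written as a one-line Python function. The name `class_midpoints` is hypothetical; the inputs are the first two classes of this example, 0 to 9 and 10 to 19.

```python
def class_midpoints(lower_limits, upper_limits):
    """Midpoint of each class: (lower limit + upper limit) / 2."""
    return [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# First two classes from this example: 0-9 and 10-19
print(class_midpoints([0, 10], [9, 19]))  # [4.5, 14.5]
```

Consecutive midpoints differ by the class width (here 10), which is a quick way to check the rest of the midpoint column.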
Adding up the frequencies for all classes, we have a total of 99, so we need to take each class frequency and divide by 99. For our first class, we have a relative frequency of 20 over 99, which is about 20 percent. For the second class, we divide 5 by 99, which is approximately 5 percent. For the third class, we have a relative frequency of 10 over 99, which is approximately 10 percent.
For the next class, we take 10 and divide that by 99, and we get approximately 10 percent. Continuing in this way, we can find the relative frequency for each class. Now we have everything we need to construct a relative frequency histogram.
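Here is a minimal Python sketch of the relative frequency step. The names `relative_frequencies` and `to_percent` are hypothetical. Because only some of this example's class frequencies are stated (20, 5, and 10 out of a total of 99), the full-table demonstration uses made-up frequencies instead.

```python
def relative_frequencies(freqs):
    """Proportions: each class frequency divided by the total frequency."""
    total = sum(freqs)
    return [f / total for f in freqs]

def to_percent(proportions, digits=1):
    """Convert proportions to percentages, rounded to `digits` decimals."""
    return [round(100 * p, digits) for p in proportions]

# The lecture's individual calculations, with total frequency 99
for f in (20, 5, 10):
    print(f"{f}/99 is about {round(100 * f / 99)}%")

# A hypothetical complete table (not the lecture's)
props = relative_frequencies([2, 3, 3, 1])
print(to_percent(props))  # [22.2, 33.3, 33.3, 11.1]
```

`relative_frequencies` returns proportions (the decimal form), and `to_percent` only rescales them by 100, so a histogram drawn from either list has the same shape. The proportions always sum to 1, apart from floating-point rounding.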
First, we will label the horizontal scale with our class midpoints. Then we will label our vertical scale with relative frequencies. Since the highest relative frequency we have is 20%, our scale needs to reach at least 20%. Now we can begin constructing our bars.
Our first class has a relative frequency of 20%, so our first bar needs to have a height that corresponds with 20%. Next, our second class has a relative frequency of 5%, so our second bar needs to have a height corresponding with 5%. And continuing in this way, we can construct a bar for each class.
To finish this relative frequency histogram, I need to label the horizontal and vertical axes. Our horizontal axis corresponds with our x values, and our vertical axis corresponds with relative frequencies, which we calculated as percentages. We could also have constructed a relative frequency histogram using proportions.
That would have meant using the decimal form of the relative frequencies rather than the percentage form. Either way, the relative frequency histogram would have this same shape. Looking at this relative frequency histogram, we can tell that it definitely does not have the shape of a normal distribution.
It is not even close to being approximately normal. Speaking of shapes, let's look at some standard shapes that we will often see with histograms. We've already talked about a normal distribution, which has an approximate bell shape.
Another type of distribution is one where all of the bars of the histogram have approximately the same height; this is called a uniform distribution. In other words, the different data values or classes occur with approximately the same frequency. A property that sometimes shows up in histograms is skewness. A distribution of data is skewed if it is not symmetric and extends more to one side than to the other.
There are two types of skewness: skewed to the right and skewed to the left. Skewed to the right is also called positively skewed; it has a longer right tail.
Notice that the distribution in this image has almost a bell shape, except that it extends much farther to the right than to the left. Skewed to the left, on the other hand, has a longer left tail: the curve trails off farther to the left than to the right.
This is also called negatively skewed. One way to remember skewed to the right and skewed to the left is to think about the toes on your feet. On your right foot, your toes get smaller as you go to the right, much like the longer right tail of positively skewed data. On your left foot, your small toes are off to the left, much like the longer left tail of negatively skewed data. And that concludes this lecture video on histograms.