Transcript for:
2.3 Graphs that Enlighten or Deceive

In this lesson we're going to look at some different types of tables and graphs, some of which will tell us very meaningful information about the data while others will tend to be misleading. We'll begin with what's called a frequency polygon. A frequency polygon uses line segments connected to points that are located directly above class midpoint values. Unlike our histograms, where we drew a bar with a height for each class, with a frequency polygon we simply plot a point for each class. It's important that we use class midpoints and that the height corresponds to the class frequency. A variation of this is the relative frequency polygon, where we change the vertical axis from frequency to relative frequency (proportions) or percentages.

Here are some important features of frequency polygons. Heights correspond to class frequencies. Line segments are extended to begin and end on the horizontal axis. That means that after we've plotted a point above each class midpoint, we must connect our polygon down to the horizontal axis, one class width to the left of the first midpoint and one class width to the right of the last midpoint. Also, we can put two or more polygons on the same graph, which can really help us compare two sets of data.

Let's try out an example. Given this frequency distribution, we're going to construct a frequency polygon. We first need the class midpoints, and we will label these class midpoints on the horizontal axis. Now let's label the vertical axis. Since our frequencies only go from 0 to 5, let's let each line on the grid represent one unit. Now we're ready to plot points for each class. The first class has a midpoint of 4.5 and a frequency of 1, so we'll first plot the point above 4.5 with a height of 1. The second class also has a frequency of 1, so above the midpoint 14.5 we will plot a point with a height of 1.
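The vertices of the frequency polygon in this example can be generated with a short Python sketch. It assumes, as in this example, that the midpoints start at 4.5 and are evenly spaced by the class width of 10; the function name `frequency_polygon_points` is hypothetical. The ten frequencies are the ones used in this walk-through: 1, 1, 2, 0, 0, 4, 2, 4, 5, 4.

```python
def frequency_polygon_points(first_midpoint, class_width, frequencies):
    """Return the (x, y) vertices of a frequency polygon, left to right.

    One point sits above each class midpoint at the class frequency, and an
    extra zero-height point is added one class width beyond each end so the
    polygon begins and ends on the horizontal axis.
    """
    # Midpoints are assumed to be evenly spaced by the class width.
    midpoints = [first_midpoint + i * class_width for i in range(len(frequencies))]
    points = list(zip(midpoints, frequencies))
    left_end = (midpoints[0] - class_width, 0)
    right_end = (midpoints[-1] + class_width, 0)
    return [left_end] + points + [right_end]


# Class frequencies from the example, in class order.
freqs = [1, 1, 2, 0, 0, 4, 2, 4, 5, 4]
for x, y in frequency_polygon_points(4.5, 10, freqs):
    print(f"({x}, {y})")
```

Connecting these points from left to right reproduces the polygon, including the endpoints at -5.5 and 104.5. Dividing each frequency by the total (here 23) would give the vertices of the relative frequency polygon instead.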
Our third class has a frequency of 2, so above 24.5 we will plot a point with a height of 2. Our next two classes each have a frequency of 0, so above 34.5 and 44.5 we plot points with a height of zero, which lie on the horizontal axis. The next class has a frequency of 4. Next we have a frequency of 2, followed by 4, then 5, and lastly 4. Now that we have our points located directly above the class midpoint values, we can connect them with line segments. Working from left to right, we connect consecutive points with straight lines.

The last thing we need to do is connect the endpoints back down to the horizontal axis. It's important not to simply draw a line straight down to the horizontal axis; instead, we step out by the same increment that separates the class midpoints. On this graph our class midpoints line up with the grid, so I'll go one unit (one grid line) to the left and place a point with a height of 0, and I'll do the same on the right. I'll label these two points using the width between any two consecutive midpoints. Between 4.5 and 14.5 there is a width of 10, so to label the first point I go 10 below 4.5, to -5.5. On the right I go 10 above 94.5, so the last point corresponds to an x-value of 104.5. Finally, we connect these two endpoints to the rest of the polygon. Now we have our frequency polygon, with the horizontal axis corresponding to our x-values and the vertical axis corresponding to class frequencies.

Unfortunately, there are many ways to be deceptive with our graphs. One very common way to deceive the reader is to use a non-zero axis: a vertical axis that starts at some value other than 0, which can greatly exaggerate the difference between two values. Take a look at this graph. The horizontal axis corresponds to years and the vertical axis corresponds to interest rates.
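The reason a non-zero axis exaggerates differences can be written out algebraically. As a general sketch (the symbols here are generic, not values read from the graph), suppose two values y1 < y2 are drawn as bars on a vertical axis that starts at a baseline c, with 0 < c < y1. Each bar's drawn height is its value minus c, so the picture compares y2 - c with y1 - c instead of y2 with y1:

```latex
\[
\text{apparent ratio} = \frac{y_2 - c}{y_1 - c},
\qquad
\text{true ratio} = \frac{y_2}{y_1}.
\]
% Multiply both sides by y_1 (y_1 - c) > 0, which preserves the inequality,
% and use c > 0 in the last step:
\[
\frac{y_2 - c}{y_1 - c} > \frac{y_2}{y_1}
\iff y_1 (y_2 - c) > y_2 (y_1 - c)
\iff -c\,y_1 > -c\,y_2
\iff y_2 > y_1 .
\]
```

So whenever the baseline is above zero, the taller bar always looks relatively larger than it really is, and the distortion grows as c gets close to y1. This is how one bar can look twice as tall as another even when the underlying interest rates are nearly equal.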
Looking at this graph, it appears that interest rates increased rapidly from 2008 to 2012. It also appears that from 2009 to 2010 interest rates doubled, because the bar for 2010 is twice as tall as the bar for 2009. Now look at the same data on a different graph. These are the exact same values, but it looks quite different, doesn't it? On the first graph, notice that the vertical axis starts at 3.14 instead of 0. On the second graph it starts at zero. Now you can see that from 2008 to 2012 interest rates barely changed. By using a non-zero axis, the first graph grossly exaggerates the difference between two years.

Another type of graph that can be very deceptive is a pictograph. Whenever data are represented by drawings in two or three dimensions, we can really distort one-dimensional data. Take a look at this pictograph that compares how much Halloween candy was collected by Shayna and Michael. Looking more closely, it appears that Shayna collected about 18 pieces of candy while Michael collected 36. So Michael did collect twice as much candy as Shayna, but because the candy corn in this pictograph is drawn to look three-dimensional, the picture exaggerates the difference: a figure that is twice as tall and drawn in proportion is also twice as wide and twice as deep, so it can look about eight times as large.

Another type of graph is a scatterplot. A scatterplot (also called a scatter diagram) is a plot of paired (x, y) quantitative data with a horizontal x-axis and a vertical y-axis. The horizontal axis is used for the first variable (x) and the vertical axis is used for the second variable (y). Let's create a quick scatterplot using the following data. Our first point needs to correspond to the (x, y) pair (1, 2), so beginning at (0, 0) we move one unit to the right and two units up. The second point needs to correspond to the ordered pair (3, 5), so we move three units to the right and five units up.
Next we need a point that corresponds to the ordered pair (5, 9), so we move 5 to the right on the x-axis and up 9 in the y direction. Next we have the point (7, 10), then (9, 14), and last (11, 15). Unlike a frequency polygon, with a scatterplot we do not connect the points with line segments.

With a time-series graph we are interested in quantitative data that have been collected at different points in time. Let's look at this table of values and construct a time-series graph. The first step is to label the horizontal axis. Our x-values in this case correspond to time given in days, so I'll label the horizontal axis accordingly. The corresponding y-values are the amounts of rainfall given in millimeters, so I'll use that information to label the vertical axis. The second step is to determine an appropriate value for each unit on the vertical scale. Looking at the rainfall amounts, the highest value we have is 45, so let's label each unit on the vertical scale in steps of 5. After labeling the days on the horizontal axis, we are ready to plot points on the grid that correspond to the data values in the table. On day 1 there were 45 millimeters of rainfall, so we'll plot a point above the 1 with a height of 45. For day 2, we need a point with a height of 20. For day 3 we need a height of 40, day 4 a height of 38, day 5 a height of 42, day 6 a height of 15, day 7 a height of 10, and finally day 8 a height of 22. The final step is to connect these dots with line segments in order of time. Notice that unlike a frequency polygon, we do not connect this graph back down to the horizontal axis.

Next, let's look at what's called a stemplot. With a stemplot we are again representing quantitative data, but we separate each value into two parts: the stem and the leaf. For example, take a look at this data set.
We can represent this data set in a stemplot by letting the stem be the tens digit and the leaf be the ones digit of each data value. For example, the data value 13 is represented on our stemplot in the row with stem 1 and leaf 3. Since there are two 13s in this data set, you see two 3s on this row. Similarly, the data value 9 is represented on the row with stem 0 and leaf 9, because the tens digit of 9 is 0 and the ones digit is 9.

A variation of the stemplot is a back-to-back stemplot (or back-to-back stem-and-leaf plot). Let's create a back-to-back stemplot using these two sets of data: weights of dogs and weights of cats. We'll begin with the weights of dogs. The first data value in this row is 48, so in the row with stem 4 (tens digit 4) I will list an 8. Next, we have a 13, so I'll put a 3 in the row with stem 1. Next we have a 15, so I'll put a 5 in that same row: the tens digit is 1 and the ones digit is 5. For the 22, I'll go to the row with a tens digit of 2 and list a 2 for the ones digit. We have another 48, so I need another 8 in the row for stem 4. For 56, I'll put a 6 in the row for stem 5. For 62, I'll put a 2 in the row for stem 6. For 73, I need a 3 in the row for stem 7. For 52, we need a 2 in the row for stem 5. For 71, we need a 1 in the row for stem 7, and for 66, we need a 6 in the row for stem 6. To clean up this left side a little, I'm going to write these leaves in order from least to greatest. Now we can move on to the other side. For the cats, I'll repeat the same process, but this time I'll put all the leaves on the right side. Now I'll clean this up slightly by replacing the leaves 3, 2, 3 with 2, 3, 3. Having them in increasing order makes the plot slightly easier to read.
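The stem-and-leaf split described above (tens digit as stem, ones digit as leaf) can be sketched in Python with `divmod`. The function name `stem_and_leaf` is hypothetical, and the sketch also lists empty stems (such as 3 for the dog weights) so gaps in the data stay visible; that is a common stemplot convention the lesson does not discuss explicitly. The input is the list of dog weights read off in the back-to-back example.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: sorted leaves}, using the tens digit as the stem and the ones digit as the leaf.

    Assumes non-negative whole numbers. Every stem between the smallest and largest
    is included, even if it has no leaves, so gaps in the data remain visible.
    """
    rows = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)  # e.g. 48 -> stem 4, leaf 8
        rows[stem].append(leaf)
    lo, hi = min(rows), max(rows)
    return {stem: sorted(rows.get(stem, [])) for stem in range(lo, hi + 1)}


# Dog weights from the back-to-back stemplot example.
dog_weights = [48, 13, 15, 22, 48, 56, 62, 73, 52, 71, 66]
dog_rows = stem_and_leaf(dog_weights)
for stem, leaves in dog_rows.items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```

For a back-to-back stemplot, the same function could be applied to the cat weights (not reproduced in this transcript), with the dog leaves printed to the left of a shared stem column and the cat leaves to the right.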
The benefits of a stemplot are that we can see the shape of the distribution of the data; we retain all of the original data values rather than grouping them into classes; the sample data are sorted, or arranged in order; and when we put two sets of data side by side we can easily compare them.

Now let's talk about bar graphs. This is the first graph we've looked at for qualitative (or categorical) data. A bar graph uses bars of equal width to show frequencies of categorical or qualitative data. The bars may or may not be separated by small gaps. Let's try an example. In a survey, 1004 adults were asked to identify the most frustrating sound that they hear in a day. In response, 279 chose jackhammers, 388 chose car alarms, 128 chose barking dogs, and 209 chose crying babies. To construct this bar graph, we first need to label the vertical axis. The vertical axis corresponds to the frequency for each of these categories, so we need to make sure that our vertical scale goes high enough; since the largest frequency is 388, let's make it go up to 400. We need a bar above jackhammers with a height of 279. Since 388 chose car alarms, we need a bar above car alarms with a height of 388, and similarly for barking dogs (128) and crying babies (209). Now we have a complete bar graph. Notice that the major difference between a bar graph and a histogram is that a bar graph displays qualitative data, while a histogram displays quantitative data.

Let's end this lesson by looking at a pie chart. A pie chart is a very common graph that depicts categorical data as slices of a circle. The size of each slice is proportional to the frequency count for that category. Let's create a pie chart using the data from the last example. Here are the same values from that example. There are two ways to think about constructing a pie chart: we can look at relative frequency in the form of percentages, and we can also turn each relative frequency into a corresponding number of degrees of the circle.
I'll show you both of those here. With the total frequency (the total number of people) being 1004, we can change each of these values to a relative frequency. We do this by dividing each frequency by the total. To find the corresponding number of degrees, consider that all the way around the circle is 360 degrees, so we multiply 360 degrees by the relative frequency for each category. In other words, for jackhammers we can find the appropriate number of degrees by multiplying 0.2779 by 360 degrees, which gives us about 100 degrees. For car alarms, we multiply 0.3865 by 360 degrees, which gives us approximately 139 degrees. Doing the same for barking dogs and crying babies, we get approximately 45.9 degrees and approximately 75 degrees.

Now let's think about how we could use relative frequency or degrees to construct this pie chart. Let's start with the category that has the largest frequency, car alarms. We need a slice of the pie that corresponds to about 38.6% of the circle, or about 139 degrees. Speaking in percentages, half of the circle is 50% and a quarter of the circle is 25%, so we can place 38.6% appropriately between 25% and 50%. In a similar way, we can think of half of a circle as 180 degrees and a quarter of a circle as 90 degrees, and approximate the location of 139 degrees between them. Halfway between 90 and 180 is 135 degrees, which is pretty close to where we want to be, so we can mark roughly 140 degrees just past that point. And we have the first slice of our pie. Our next largest wedge needs to correspond to the 279 people who said that jackhammers were the most frustrating sound. This wedge needs to correspond to 27.8% of the circle, or approximately 100 degrees. In terms of degrees, consider that three quarters of the way around the circle would be 270 degrees.
Our last wedge ended right around 140 degrees, and we need to go another 100 degrees around the circle, to about 240 degrees. That's 30 degrees less than 270, which I'll approximate to be right about here. Now we have the second slice. Continuing in this way, we can sketch the last two slices, which correspond to crying babies and barking dogs. Now we have our completed pie chart, the last graph for this lesson. That concludes this lecture video on various tables and graphs, some of which were very enlightening while others were a little deceptive.
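The relative frequencies, degrees, and wedge boundaries from the pie chart example can be checked with a short Python sketch. The counts are the survey values from the lesson; the function name `pie_slices` is hypothetical. Slices are laid out largest first, starting at 0 degrees, matching the order used in the construction above.

```python
# Survey counts from the "most frustrating sound" example (total 1004).
counts = {
    "jackhammers": 279,
    "car alarms": 388,
    "barking dogs": 128,
    "crying babies": 209,
}


def pie_slices(counts):
    """Return (category, relative frequency, degrees, start angle, end angle), largest slice first."""
    total = sum(counts.values())
    slices = []
    start = 0.0
    for category, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        rel = count / total          # relative frequency = frequency / total
        degrees = 360 * rel          # share of the full 360-degree circle
        slices.append((category, rel, degrees, start, start + degrees))
        start += degrees             # next wedge begins where this one ends
    return slices


for category, rel, deg, start, end in pie_slices(counts):
    print(f"{category:14} {rel:6.2%} {deg:6.1f} deg  ({start:5.1f} to {end:5.1f})")
```

Because the code keeps full precision instead of rounding each relative frequency to four decimals first, a few values differ slightly from the hand calculation (for example, about 74.9 rather than 75 degrees for crying babies). The wedge boundaries come out near 139, 239, and 314 degrees, consistent with the estimates used when sketching the chart.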