Transcript for:
Stem-and-Leaf Data Visualization

A stem-and-leaf display is a graphical method of displaying data. It is particularly useful when your data are not too numerous. In this section we will explain how to construct and interpret this kind of graph.

As usual, an example will get us started. The top of the display shows the number of touchdown passes thrown by each of the 31 teams in the National Football League in the 2000 season. A stem-and-leaf display of the data is shown below. The left portion of the stem-and-leaf display contains the stems. They are the numbers 3, 2, 1, and 0, arranged as a column to the left of the bar. Think of these numbers as tens digits. A stem of 3, for example, can be used to represent the tens digit of any of the numbers from 30 to 39. The numbers to the right of the bar are leaves, and they represent the ones digits. Every leaf in the graph therefore stands for the result of adding the leaf to 10 times its stem.

To make this clear, let's examine the display more closely. In the top row, the four leaves to the right of the stem 3 are 2, 3, 3, and 7. Combined with the stem of 3, these leaves represent the numbers 32, 33, 33, and 37, which are the numbers of touchdown passes for the first four teams. The second row has a stem of 2 and contains 12 leaves. Together they represent 12 data points, namely two occurrences of 20 touchdown passes, three occurrences of 21 touchdown passes, three occurrences of 22 touchdown passes, one occurrence of 23 touchdown passes, two occurrences of 28 touchdown passes, and one occurrence of 29 touchdown passes. I will leave it to you to figure out what the third row represents. The fourth row has a stem of 0 and two leaves. It stands for the last two entries in the data, namely 9 touchdown passes and 6 touchdown passes. The latter two numbers may be thought of as 09 and 06.

One purpose of a stem-and-leaf display is to clarify the shape of the distribution. You can see many facts about touchdown passes more easily in the stem-and-leaf
display than from the raw numbers. For example, by looking at the stems and the shape of the plot, you can tell that most of the teams had between 10 and 29 passing touchdowns, with a few having more and a few having less. The precise numbers of touchdown passes can be determined by examining the leaves.

We can make our figure even more revealing by splitting each stem into two parts. The display on the right shows how to do this. The top row is reserved for numbers from 35 to 39 and holds only the 37 touchdown passes. The second row is reserved for the numbers from 30 to 34 and holds the 32, 33, and 33 touchdown passes made by the next three teams. You can see for yourself what the other rows represent. The figure on the right is more revealing than the figure on the left because the latter figure lumps too many values into a single row. Whether you should split stems in a display depends on the exact form of your data. If rows get too long with single stems, you might try splitting them into two or more parts.

There is a variation of stem-and-leaf displays that is useful for comparing distributions. The two distributions are placed back-to-back along a common column of stems. The result is a back-to-back stem-and-leaf graph. Here you see such a graph. It compares the numbers of touchdown passes in the 1998 and 2000 seasons. The stems are in the middle, the leaves to the left are for the 1998 data, and the leaves to the right are for the 2000 data. For example, the second-to-last row shows that in 1998 there were teams with 11, 12, and 13 touchdown passes, and in 2000 there were two teams with 12 and three teams with 14 touchdown passes.

There are two things about the football data that make them easy to graph with stems and leaves. First, the data are limited to whole numbers that can be represented with a one-digit stem and a one-digit leaf. Second, all the numbers are positive. If the data include numbers with three or more digits, or contain decimals, they can be rounded to two-digit
accuracy.

The data shown here are from a study on aggressive thinking. Each value is the mean difference, over a series of trials, between the times it took an experimental subject to name aggressive words (like "punch") under two conditions. In one condition, the words were preceded by a non-weapon word such as "bug." In the second condition, the same words were preceded by a weapon word such as "gun" or "knife." The issue addressed by the experiment was whether a preceding weapon word would speed up, or prime, pronunciation of the aggressive word compared to a non-weapon priming word. A positive difference implies greater priming of the aggressive word by the weapon word. Negative differences imply that the priming by the weapon word was less than for a neutral word.

You see that the numbers range from 43.2 to -27.4. The first value indicates that one subject was 43.2 milliseconds faster pronouncing aggressive words when they were preceded by weapon words than when preceded by neutral words. The value -27.4 indicates that another subject was 27.4 milliseconds slower pronouncing aggressive words when they were preceded by weapon words.

The data are displayed with stems and leaves in the lower right. Since stem-and-leaf displays can only portray two whole digits, one for the stem and one for the leaf, the numbers are first rounded. Thus the value 43.2 is rounded to 43 and represented with a stem of 4 and a leaf of 3. Similarly, 42.9 is rounded to 43. To represent negative numbers, we simply use negative stems. For example, the bottom row of the figure represents the number -27. The second-to-last row represents the numbers -10, -10, -15, etc. Observe that the figure contains a row headed by 0 and another headed by -0. The stem of 0 is for numbers between 0 and 9, whereas the stem of -0 is for numbers between 0 and -9. For
example, the fifth row of the table holds the numbers 1, 2, 4, 5, 5, 8, and 9, and the sixth row holds 0, -6, -7, and -9. Values that are exactly 0 before rounding should be split as evenly as possible between the 0 and -0 rows.

Although stem-and-leaf displays are unwieldy for large data sets, they are often useful for data sets with up to 200 observations. This stem-and-leaf display portrays the distribution of populations of 185 US cities in 1998. To be included, a city had to have between 100,000 and 500,000 residents. Since a stem-and-leaf plot shows only two-place accuracy, we had to round the numbers to the nearest 10,000. For example, the largest number, 493,559, was rounded to 490,000 and then plotted with a stem of 4 and a leaf of 9. The fourth highest number, 463,201, was rounded to 460,000 and plotted with a stem of 4 and a leaf of 6. Thus the stems represent units of 100,000 and the leaves represent units of 10,000. Notice that each stem value is split into five parts: 0 to 1, 2 to 3, 4 to 5, 6 to 7, and 8 to 9.

Whether your data can be suitably represented by a stem-and-leaf graph depends on whether they can be rounded without loss of important information. Also, their extreme values must fit into two successive digits, as the previous data fit into the 10,000 and 100,000 places for leaves and stems, respectively. Deciding what kind of graph is best suited to displaying your data thus requires good judgment. Statistics is not just recipes.
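
The construction described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used to draw the figures in the video: the function names stem_and_leaf and signed_stem_leaf are made up for this example, and the data list contains only the touchdown values read aloud above (the row with stem 1 is left for the viewer, so it prints empty here).

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Build display rows for whole numbers from 0 to 99, highest stem first."""
    rows = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)  # v == 10 * stem + leaf
        rows[stem].append(leaf)
    lines = []
    for stem in range(max(rows), min(rows) - 1, -1):
        leaves = "".join(str(leaf) for leaf in sorted(rows.get(stem, [])))
        lines.append(f"{stem}|{leaves}")
    return lines


def signed_stem_leaf(x):
    """Round x to a whole number, then split it into a signed stem and a leaf.

    Negative values get a negative stem, so -6 becomes stem "-0", leaf 6.
    Assumption: a value that rounds to exactly 0 goes to the "0" row here;
    the transcript suggests splitting such values between the 0 and -0 rows.
    Python's round() sends exact halves to the nearest even integer.
    """
    n = round(x)
    sign = "-" if n < 0 else ""
    stem, leaf = divmod(abs(n), 10)
    return f"{sign}{stem}", leaf


# Only the touchdown counts stated in the transcript (stem 1 values not given).
passes = [37, 33, 33, 32,
          29, 28, 28, 23, 22, 22, 22, 21, 21, 21, 20, 20,
          9, 6]

for line in stem_and_leaf(passes):
    print(line)

print(signed_stem_leaf(43.2))   # stem "4", leaf 3
print(signed_stem_leaf(-27.4))  # stem "-2", leaf 7
```

Splitting each stem into two parts, as in the display on the right, would amount to grouping on the stem plus whether the leaf is below 5 rather than on the stem alone.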