1.2 Types of Data

Welcome back to Chapter 1, our Introduction to Statistics. In this lesson, we're going to discuss various types of data. We'll begin our discussion by talking about the difference between a parameter and a statistic. A parameter is a numerical measurement describing some characteristic of a population, while a statistic is a numerical measurement describing some characteristic of a sample. Let's look at these two examples and determine if they are each a parameter or a statistic.

First, using property records for the entire city of Smalltown, USA, it was found that the median home value is $200,000. The numerical measurement here is the median, $200,000. We need to determine if this measurement describes the population or a subset of the population, a sample. Since the statement says that they used property records for the entire city, we assume that they looked at every home value, so we would call this a parameter. In contrast, let's look at Part B. According to a sample of 500 members of a gym, 40% are extremely satisfied with the variety of equipment at the gym.

This time, we are told that they only talked to 500 members of the gym, a sample. Since they didn't ask every member of the gym, we have a statistic, not a parameter. One way to remember the difference between a parameter and a statistic is to think "population parameter, sample statistic": the matching first letters pair each term with its group. Now let's talk about the difference between quantitative data and categorical data.
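The parameter and statistic distinction above can be sketched in Python. This is only an illustration: the home values are randomly generated stand-ins, not real property records, and the population size of 2,000 is an assumption. The same calculation (a median) is a parameter when it uses every value and a statistic when it uses only a sample.

```python
import random
import statistics

# Hypothetical population: a home value for every property in a small town.
# These numbers are made up for illustration only.
random.seed(0)
population = [random.randint(100_000, 300_000) for _ in range(2_000)]

# Uses every value in the population, so the result is a parameter.
population_median = statistics.median(population)

# Uses only a subset of 500 values, so the result is a statistic.
sample = random.sample(population, 500)
sample_median = statistics.median(sample)

print("parameter (population median):", population_median)
print("statistic (sample median):", sample_median)
```

The two medians will usually be close but not identical, which is why it matters to say which group a number describes.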

Categorical data is also sometimes called qualitative data. Quantitative data consists of numbers that represent counts or measurements. Here we're referring to attributes that can be measured by a magnitude or a size. It's quantifiable. It's important to use appropriate units of measure for this kind of data.

For example, IQ would be quantitative data because it's a measurement of somebody's intelligence. Height is another example, as is energy use.

Any numerical value that is a count or a measurement is quantitative data. Categorical data, on the other hand, or qualitative data, consists of names or labels. These are not numbers that represent counts or measurements. Often these types of variables have a limited number of possibilities. In other words, values fall within one of a few given categories.

Some examples of this type of data would be eye color, assigned gender, city of residence, and any other name, label, or category that doesn't represent a count or a measurement. Under the category of quantitative data, which is our numbers that represent counts or measurements, we can separate this type of data into two other categories, discrete and continuous. Discrete data is quantitative data that can only take on certain values, and the number of values is finite or countable. Finite means that there's only a limited number of values, like 10 or 320. Countable means that there can be infinitely many values, but the collection is countable if it's possible to count them individually.

Often, we can think of there being a natural next. Let's think through a few examples. Consider this data value: the number of coin tosses before tails.

In other words, how many times do you have to flip a coin before it lands on tails? Think about how this data can only take on certain values. Maybe you only have to toss the coin one time before it lands on tails. But maybe you have to toss the coin 10 times before it lands on tails.

The options are 1, 2, 3, 4, and so on. They're all whole numbers. Now is this finite or countable? This is actually countable. There's no limit to how many times you may have to flip a coin before it finally lands on tails.
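As a rough sketch of the coin-toss example, the simulation below counts tosses until the first tails, including the toss that lands on tails. The function name `tosses_until_tails` and the fair-coin assumption are ours, not part of the lesson. Every result is a whole number of at least 1, and no upper limit is built in.

```python
import random

def tosses_until_tails(rng):
    """Count the tosses needed to see the first tails, including that toss."""
    count = 0
    while True:
        count += 1
        if rng.random() < 0.5:  # treat this outcome as tails (fair coin assumed)
            return count

rng = random.Random(1)
results = [tosses_until_tails(rng) for _ in range(10)]
print(results)  # a list of whole numbers, each at least 1
```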

So even though there's an infinite number of possibilities, we can count each one individually, and there's a natural next. 3 comes after 2, 4 comes after 3, and so on. Another example of discrete data that is infinite but countable could be thermometer readings on a digital thermometer. A digital thermometer typically reads values to the nearest tenth or possibly the nearest hundredth, so there is a natural next.

Consider the standard digital thermometer that rounds to the nearest tenth. What comes after a temperature of 98.6 degrees? The next one would be 98.7.

So even though these aren't whole numbers, there's still a natural next because the digital thermometer doesn't read anything between 98.6 and 98.7. Now for an example of discrete data that's finite. How about the number of physical exams done by a physician over a given week? This is an example of discrete data that can only take on certain values and is finite. There's only a certain number of patients that a single doctor can see in one week.

There is not an infinite number of values that this can take on. Now, if quantitative data is not discrete, that is, it's not finite and it's not countable, we have what's called continuous data. Continuous data is numerical data, so it's quantitative, that can take on any value within a range. There are infinitely many possible quantitative values where the collection of values is not countable.

Let's think of some examples of continuous data. First, let's talk about length. How about all lengths between 5 and 15 centimeters?

Even though I'm limiting the lengths between 5 and 15 centimeters, there's still an infinite number of possible values between 5 and 15 centimeters. It's not countable.

Even if we get a ruler and measure something to be 7.2 centimeters, a more accurate ruler might show that it's actually 7.21 centimeters. And what would be next? What's the next length above 7.2 centimeters? Is it 7.3 or 7.24? For any two given lengths, there's a length in between.
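The idea that any two lengths have another length between them can be sketched with exact fractions. The helper name `length_between` is hypothetical, and taking the midpoint is just one convenient way to find an in-between value. Each pass through the loop finds a length closer to 7.2 centimeters that is still above it, so no candidate for the "next" length survives.

```python
from fractions import Fraction

def length_between(a, b):
    """Return the midpoint of two lengths, which lies strictly between them."""
    return (a + b) / 2

low = Fraction("7.2")   # the measured length, in centimeters
high = Fraction("7.3")  # a first guess at the "next" length
for _ in range(5):
    high = length_between(low, high)
    print(float(high))  # always above 7.2, and closer to it each time
```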

There's not a natural next, so it's not countable. Another example would be thermometer readings on a mercury thermometer. Unlike a digital thermometer that's limited to the nearest tenth or hundredth, a mercury thermometer works much more like a ruler, and the temperature can take on any value within a given range. When deciding if given quantitative data are discrete or continuous, think about whether you obtain the data by counting or by measuring. Counting usually points to discrete data, and measuring usually points to continuous data.

Finally, we're going to turn our attention to different levels of measurement. It's important that we only do calculations or statistical analyses that are appropriate for a given set of data. For example, if your data set is phone numbers, it doesn't make any sense for you to find the mean of the phone numbers. That is absolutely meaningless because phone numbers are not the right kind of data to calculate something like the mean. Our most basic level of measurement is nominal.

Data are at the nominal level of measurement if they cannot be arranged in some order, such as low to high, least to greatest, best to worst, because the data consist of labels or categories only. Some examples might include eye color or survey questions where your options are yes, no, or undecided. Another example is a coded survey on a phone where it says press 1 for yes, 2 for no, 3 for I don't know. Even though the responses are the numbers 1, 2, and 3, they don't have a meaning that allows us to put them in some logical order, because the 1 represents the word yes and the 2 represents the word no.
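To illustrate the coded phone survey, here is a small sketch with made-up responses. Python will compute a mean of the codes without complaint, but the result does not correspond to any answer. Counting how often each label occurs is a calculation that suits nominal data.

```python
from collections import Counter
from statistics import mean

# Hypothetical responses from a phone survey: 1 = yes, 2 = no, 3 = I don't know.
labels = {1: "yes", 2: "no", 3: "I don't know"}
responses = [1, 1, 2, 3, 1, 2, 1, 3]

# This runs, but the result describes nothing: no response means "1.75".
meaningless_mean = mean(responses)

# Counting how often each label occurs is appropriate for nominal data.
counts = Counter(labels[code] for code in responses)

print(meaningless_mean)
print(counts)
```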

Our next level of measurement is one step above nominal. Unlike nominal, data are at the ordinal level of measurement if they can be arranged in some order, but differences, or subtraction between numbers, are meaningless or cannot be determined. For example, think about rankings.

U.S. News and World Report will often rank colleges in the United States. So you have some schools that rank higher than other schools, and we can put them in an order from low to high, least to greatest, best to worst. But the difference between two rankings doesn't mean anything.

So we are at the ordinal level of measurement. Another example might be course letter grades.

We can put these in some order, A to F or even F to A, but what's the difference between A and B or B and C? The difference between an A letter grade and a B letter grade really cannot be determined. But that does take us to our next level of measurement. Interval is similar to ordinal.

Data are at the interval level of measurement if they can be arranged in order and, in addition, differences between any two data values are meaningful. But what the interval level lacks is a natural zero starting point that indicates none of the quantity is present. Some examples might include years. We can list years like 1982, 1997, 2024 in a logical order. And differences actually do have meaning.

The year 1990 is 10 years after the year 1980. The difference between those two data values is meaningful. However, with years, there's no natural zero that indicates none. The year zero in our system does not indicate no time or no years. Another example would be body temperature.

And you could think of that in degrees Fahrenheit or degrees Celsius. We can list temperatures in order. One can be larger than the other. And differences have meaning. 42 degrees is 10 degrees more than 32 degrees.

But there is no natural zero that indicates none. Even if we have zero degrees Fahrenheit or zero degrees Celsius, that does not indicate that no temperature is present. But that does bring us to our final level of measurement. Ratio is one step above interval.

It's similar to interval in that data can be arranged in order and differences can be found and are meaningful, but there is also a natural zero, indicating that none of the quantity is present. And because of this property, differences and ratios are both meaningful. In other words, we can say that one value is twice that of something else.

Examples of data that are at the ratio level of measurement include heights, lengths, volume, an amount of time, and so on. To determine if data is at the ratio level of measurement, we can do two tests, the ratio test and the zero test. In order to do the ratio test, think about one value being twice as much as another value. Let's talk about temperature.

With body temperature, we can't say that 40 degrees is twice as hot as 20 degrees, so temperature fails the ratio test. In contrast, it does make sense to say that someone is twice as tall as someone else, or weighs twice as much, or that you have two times as much water in your glass as I do. And in terms of amount of time, we could say that one class is twice as long as another class.
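One way to see why the ratio test fails for temperature is to change units. The sketch below assumes the 40 and 20 degrees are in Celsius (the lesson does not say) and uses 180 and 90 centimeters as made-up heights. For an interval-level quantity the "ratio" changes with the unit, while for a ratio-level quantity it stays the same.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

def cm_to_inches(cm):
    """Convert a length from centimeters to inches."""
    return cm / 2.54

# Temperature (interval level): the "ratio" depends on the unit chosen.
ratio_c = 40 / 20                                                # 2.0
ratio_f = celsius_to_fahrenheit(40) / celsius_to_fahrenheit(20)  # 104 / 68

# Height (ratio level): the ratio is the same whatever the unit.
ratio_cm = 180 / 90
ratio_in = cm_to_inches(180) / cm_to_inches(90)

print(ratio_c, ratio_f)
print(ratio_cm, ratio_in)
```

Because zero centimeters and zero inches both mean no length, converting units only rescales the values and leaves ratios alone. The Celsius and Fahrenheit zeros sit at different temperatures, so the conversion shifts the values and the ratio changes.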

For the zero test, we check for meaning of zero, meaning none of the quantity is present. We talked about how years and body temperature do not have a natural zero because a zero doesn't mean that there's no time or no temperature. But think about height.

A zero height does represent that no height is present. A zero length means that no length is present. And a zero volume means that no volume is present. Or zero time means that there is no time.

When we're determining the level of measurement for given data, begin with nominal and work your way up to ratio. All data will satisfy the nominal level of measurement. And as we move up to ordinal, interval, and ratio, we add requirements.

So first check and see if the data satisfy the ordinal level. That is, they can be arranged in a certain order. Then check to see if it's at the interval level of measurement by seeing if differences are meaningful. Finally, check to see if we're at the ratio level of measurement by looking for a natural zero and check using the ratio test and the zero test. That concludes this lecture video where we talked about all different types of data.
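The checklist from this lesson can be sketched as a small function. The function name and its three yes/no inputs are hypothetical; deciding the answers to those questions is still a judgment about the data, which the code does not make for you.

```python
def level_of_measurement(can_be_ordered, differences_meaningful, natural_zero):
    """Walk up from nominal, stopping at the first requirement that fails."""
    if not can_be_ordered:
        return "nominal"
    if not differences_meaningful:
        return "ordinal"
    if not natural_zero:
        return "interval"
    return "ratio"

# Examples from the lesson, with the answers to each question filled in by hand.
print(level_of_measurement(False, False, False))  # eye color
print(level_of_measurement(True, False, False))   # course letter grades
print(level_of_measurement(True, True, False))    # years
print(level_of_measurement(True, True, True))     # heights
```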
