Transcript for:
Sampling Distributions Overview

Assuming we are gathering good samples, we are now going to develop, from all of these samples we are taking, a sampling distribution. So, yeah, guys, here comes yet another definition. All right, sampling distribution. It seems like a really big word, all right, but let me break it down for you. The idea of a sampling distribution is that you are taking multiple samples, all right? I want you to see that inside this pink word "sampling distribution" is literally the word "sample." What you're doing in a sampling distribution is taking a sample, and not just one sample: you're taking multiple samples, plural. And what we're doing in a sampling distribution is just making a probability distribution, which you're like, "Oh my gosh, Hannon, did I learn that word before?" And the answer is yes, all right? You learned the term "probability distribution" back in 6.1. And so if you're like, "Oh shoot, I don't remember what that is," you're going to need to go back to Chapter 6 and refresh your memory.
But just as a quick reminder, a probability distribution, and therefore a sampling distribution, is simply a table. It's a table where one side lists your outcomes and the other side lists your probabilities. But the big question is, what are the outcomes of this probability distribution table? We're going to come back to this table in just a little bit, because I want to discuss with you guys how we're going to construct its outcomes.
Now, particularly in Chapter 7, what we're studying is sample proportions. Chapter 7 is all about looking at some sort of categorical variable. And remember, categorical variables are summarized by a proportion, and we symbolize the sample proportion using the symbol "p-hat." So, in particular, let's look at a categorical variable like, "What face did your coin land on? Did you land on a head or did you land on a tail?" I want you to note that, yeah, we are definitely looking at a categorical variable here, because your response would be a word, "head" or "tail." And what I want to do is ask the question, "What's the probability you will land on a tail?"
Now let's go back and remember that when it came to probability, you had two choices. You could either do a theoretical probability or an empirical probability. Remember, in Chapter 5, theoretical probability was formed by simply thinking about your object of interest and forming the probability from that, while empirical probability was practically running trials, flipping coins over and over again. So I'll have you guys give me a hand. Let's start with theoretical. I've got my coin, my fair coin, two sides, and I want to ask the probability of landing on a tail. What fraction was that? No flipping involved. You're just looking at this coin and asking yourself, "What fraction will represent the probability of landing on a tail?" Yeah, it's 1/2, or 0.5. We will call this the population proportion. That's going to be "P." Why? Because theoretical probability represents the true probability. It's what should happen. It's what we are aiming to look for. Theoretical probability: no coin was flipped.
Empirical probability is then where we practically flip some coins. So one semester, I ended up bringing 25 coins to my classroom. I passed them out to 25 students, and I had these 25 coins be flipped. Flip, flip, flip, flip, flip. And we found that in that first round, 13 of the 25 coins landed on a tail. Everyone gathered their coins again, and I told them to flip again. So everyone did a second flip. 25 coins clanging everywhere. And we found that 10 out of the 25 coins landed on a tail. We did a third trial. Flip, flip, flip, flip, flip. And we found that 14 out of 25 coins landed on a tail. And in the fourth trial, 11 out of 25 coins landed on a tail.
I decided to do five trials total. So flip, flip, flip, flip, flip. In this last trial, 12 out of 25 coins landed on a tail. See, each one of these trials is representing a sample, a sample of what happens when I flip 25 coins. And each of these fractions we just created is a proportion. These are the sample proportions, hence "p-hat," the sample proportions of landing on a tail. Now, my first sample proportion, 13 out of 25, became the decimal 0.52. My second sample proportion, 10 out of 25, is equal to 0.40. My third sample proportion, 14 out of 25, is 0.56. My fourth sample proportion, 11 out of 25, is 0.44. Lastly, my fifth sample proportion was 12 out of 25, 0.48. I want you to note that each of these decimals is a sample proportion. And the sample proportions "p-hat" are then the outcomes that we list in this probability distribution table. So my sample proportions are 0.40, 0.44, 0.48, 0.52, and 0.56.
The idea of the sampling distribution is that you are creating a probability distribution table whose outcomes are the sample proportions. And if you were to do, instead of five trials, 100 trials or 5,000 trials, you would have repeats of different proportions. Using those repeats, we can then calculate the probability of getting any particular resulting proportion. What we just formed here in this table, after having done all of these samples, all of these trials, is then the probability distribution. And if you guys go to the next page, that's what we're looking at. What we see from this previous example is that we are taking a sample proportion "p-hat," and by taking multiple "p-hats," we created a probability distribution of "p-hat," called a sampling distribution. And that's the definition of a sampling distribution. Let's try constructing one more sampling distribution.
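For anyone who wants to see the coin table built step by step, here is a minimal Python sketch. It assumes the five classroom counts quoted in the lecture (13, 10, 14, 11, and 12 tails out of 25). The function names sampling_table and simulate_tail_counts, the seed, and the choice of 5,000 simulated trials are illustrative assumptions, not something from the course materials.

```python
from collections import Counter
import random


def sampling_table(tail_counts, n):
    """Map each observed sample proportion (p-hat) to its relative frequency."""
    # One p-hat per trial: number of tails divided by the number of coins.
    p_hats = [round(count / n, 2) for count in tail_counts]
    repeats = Counter(p_hats)
    # Relative frequency = how often a p-hat occurred / number of trials.
    return {p_hat: repeats[p_hat] / len(p_hats) for p_hat in sorted(repeats)}


def simulate_tail_counts(trials, n, p=0.5, seed=0):
    """Hypothetical helper: simulate `trials` rounds of flipping n fair coins."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) for _ in range(trials)]


# Classroom data from the lecture: tails out of 25 coins in each of five trials.
classroom_counts = [13, 10, 14, 11, 12]
classroom_table = sampling_table(classroom_counts, n=25)

# With many more trials, proportions repeat and the probabilities differ.
simulated_table = sampling_table(simulate_tail_counts(5000, 25), n=25)

print("p-hat  probability (five classroom trials)")
for p_hat, probability in classroom_table.items():
    print(f"{p_hat:.2f}   {probability:.2f}")
```

With only five trials and no repeated proportion, every outcome in the classroom table gets the same probability, 1/5. The simulated table is where repeats start to matter, which is the point made about running 100 or 5,000 trials.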
Let's now look at rolling a fair six-sided die. Can you guys remind me of the fraction, the theoretical probability of rolling a five? What is the probability? 1/6, about 0.167. That will be my population proportion. Theoretical again: you're just looking at the object. But again, what we can do is practically take multiple samples of what it means to roll a die. So again, one day during the semester, I ended up bringing 20 dice with me to class, and I had us roll the dice over and over and over again. So first sample, first trial, 20 dice are rolling all over the place, and 4 out of 20 dice end up landing on a 5. Everyone gathered their dice, and I told everyone to roll again. 20 dice are rolling around, and 2 students rolled a 5 out of the 20 dice that got rolled. Trial 3 happens. Roll, roll, roll. Three out of 20 students rolled a 5. Trial 4 happens. Roll, roll, roll. A different set of four students rolled a 5. Just for good measure, I had everyone do five trials total. Roll, roll, roll, roll, roll. And a different set of four students rolled a 5 again.
What I want to emphasize is that each one of these trials represents a different sample: sample 1, sample 2, sample 3, sample 4, sample 5. And each of these fractions represents a sample proportion: p-hat one, p-hat two, p-hat three, p-hat four, p-hat five. And each of these fractions can be written as a decimal: 0.2, 0.1, 0.15, 0.2, and 0.2. And again, what I want to emphasize is that when it comes to making a sampling distribution, what we are doing is ultimately creating a table. We are creating a table where the left-hand column represents the outcomes, the proportions we can get. And notice, we can get 0.1, we can get 0.15, we can get 0.2.
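The dice table can be sketched the same way. This short Python example assumes the five counts from the lecture (4, 2, 3, 4, and 4 fives out of 20 dice). The variable names are illustrative. Unlike the coin example, one proportion repeats here, so the probabilities in the right-hand column are not all equal.

```python
from collections import Counter

n_dice = 20
# Number of dice showing a 5 in each of the five classroom trials.
fives_per_trial = [4, 2, 3, 4, 4]

# One sample proportion (p-hat) per trial.
p_hats = [round(fives / n_dice, 2) for fives in fives_per_trial]

# Count repeats, then divide by the number of trials to get probabilities.
repeats = Counter(p_hats)
dice_table = {p_hat: repeats[p_hat] / len(p_hats) for p_hat in sorted(repeats)}

print("p-hat  probability")
for p_hat, probability in dice_table.items():
    print(f"{p_hat:.2f}   {probability:.2f}")
```

The outcome 0.20 shows up in three of the five trials, so it gets probability 3/5, while 0.10 and 0.15 each get 1/5.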
The idea here when it comes to making a sampling distribution is that we are trying to identify all of the possible outcomes we might get and list them in a table. Why do we want to do this? Well, again, the idea is that each sample is a good sample, and so each sample should represent my population. So if I gather multiple samples, a sample over here, a sample over there, what I can do then is look at all of their sample proportions and use them to estimate my population proportion. The question is how. And the how comes down to the fact that when you create a sampling distribution, you are literally creating proportions, which are decimals, which are numbers. We've actually just created a set of data that is numbers. Notice that all of these sample proportions are literally numbers. Why is that so powerful? Well, it means that we can describe and analyze this sampling distribution using everything we learned in Chapter 3 regarding numbers. I know this seems like so long ago, but remember, Chapter 3 was all about looking at numerical data and asking how we can summarize it. And we said you can summarize numerical data by using these three ideas: shape, center, and spread.
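As a small illustration of treating sample proportions as numerical data, here is a Python sketch that computes a center and a spread for the five coin proportions from earlier in the lecture. Using the mean for center and the sample standard deviation for spread is an assumption about which Chapter 3 summaries apply here. Shape would need a dot plot or histogram, which this sketch leaves out.

```python
import statistics

# The five coin-flip sample proportions (p-hats) from the lecture.
coin_p_hats = [0.52, 0.40, 0.56, 0.44, 0.48]

# Theoretical probability of a tail on a fair coin (the population proportion).
population_proportion = 0.5

# Center: the mean of the sample proportions.
center = statistics.mean(coin_p_hats)

# Spread: sample standard deviation and range of the sample proportions.
spread = statistics.stdev(coin_p_hats)
value_range = max(coin_p_hats) - min(coin_p_hats)

print(f"center (mean of p-hats): {center:.2f}")
print(f"spread (sample standard deviation): {spread:.3f}")
print(f"range: {value_range:.2f}")
print(f"distance of center from population proportion: {abs(center - population_proportion):.2f}")
```

With these five samples the center works out to 0.48, close to the population proportion of 0.5, which fits the idea of using many sample proportions to estimate the population proportion.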