Discrete Data and Probabilities

Let's remember: discrete numerical data is what we have when the outcomes of a variable are counting numbers. The first thing, and probably the most important thing, I want to point out is that in a lot of ways categorical variables are very similar to discrete numerical variables. Why? Because in both of them we can list the outcomes. With discrete numerical variables, you literally can list the outcomes as zero, one, two, three, or four, and with categorical variables you can also list the outcomes, like blue eyes, green eyes, brown eyes. So a lot of what we do in this next example will feel almost identical to what we saw with the categorical variable example of song type. Instead of listing words for my outcomes, we are now going to list numbers.
When we're working with discrete data, the probability distribution will once again be a table, where the first row or column lists all the possible outcomes and the second row or column lists their probabilities. So, in this example, we see a probability distribution, which again is just a fancy way to say a table that lists my outcomes, the number of books checked out by a patron (0, 1, 2, 3, or 4), and their corresponding probabilities.
What I want you to see here is that the variable of interest is the number of books checked out by a patron per visit. When you go to a library, you check out one book, you check out two books. You do not check out 3.5 books; you do not saw that fourth book in half and take home only half of it. Number of books is a discrete variable because its values are counting numbers. So, what I want us to do first is make sure we have a valid probability distribution.
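For reference, here is one way the distribution table for this example could be written out. The probabilities are the ones quoted for each outcome in this lesson; the name `book_dist` is just a label made up for this sketch.

```python
# Probability distribution for the number of books checked out per visit.
# Keys are the outcomes; values are the probabilities quoted in the lesson.
# `book_dist` is a hypothetical name chosen for illustration.
book_dist = {0: 0.05, 1: 0.60, 2: 0.20, 3: 0.10, 4: 0.05}

# Print the table: one row per outcome, with its probability beside it.
print("books  probability")
for books, prob in book_dist.items():
    print(f"{books:>5}  {prob:.2f}")
```

A dictionary fits here because, just like the table, it pairs each listed outcome with exactly one probability.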
Again, what is the first thing a valid probability distribution needs? Every probability has to be between zero and one. That's the first rule we always need to check. But honestly, the second rule is more important: the probabilities have to total one. If you add up all the probabilities here, 0.05 + 0.60 + 0.20 + 0.10 + 0.05, you see that the total is in fact one. Why do you need to check that? Only when both rule one and rule two hold can you find probabilities using this table.
What is the probability a patron does not check out a book? What number of books are we looking at here? Yeah, we're looking at zero. So you go to your table, look for the outcome of zero, and find its corresponding probability, 0.05.
It does get a little more complicated when you look at multiple outcomes, so let's try another one. What is the probability a patron checks out two or more books? Notice that "two or more" is asking about multiple outcomes. Which outcomes from my table are we looking at? Yeah, we're looking at two, three, or four. When you're asked about multiple outcomes, you look up the probability of each outcome and then add them together. So here we take the probability of two books being checked out, the probability of three books, and the probability of four books, and add them: 0.20 plus 0.10 plus 0.05 gives us 0.35. That's the probability. What's the rule of thumb here, guys? If you are looking at multiple outcomes, you need to add the probabilities.
It's also really nice that we can actually graph discrete data.
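The two validity rules and the "add the probabilities" rule of thumb can be sketched in a few lines of Python. This is only an illustration using the library numbers from this lesson; the names `book_dist`, `is_valid_distribution`, and `prob_of` are made up for the sketch.

```python
import math

# Outcomes (number of books) mapped to the probabilities from the lesson.
book_dist = {0: 0.05, 1: 0.60, 2: 0.20, 3: 0.10, 4: 0.05}

def is_valid_distribution(dist):
    """Rule 1: every probability is between 0 and 1. Rule 2: they total 1."""
    rule_one = all(0 <= p <= 1 for p in dist.values())
    # isclose allows for tiny floating-point rounding in the sum.
    rule_two = math.isclose(sum(dist.values()), 1.0)
    return rule_one and rule_two

def prob_of(dist, outcomes):
    """Look up each listed outcome and add the probabilities together."""
    return sum(dist[x] for x in outcomes)

valid = is_valid_distribution(book_dist)
p_none = prob_of(book_dist, [0])                 # no books checked out
p_two_or_more = prob_of(book_dist, [2, 3, 4])    # "two or more" books

print(valid)
print(round(p_none, 2))         # 0.05
print(round(p_two_or_more, 2))  # 0.35
```

The results are rounded before printing because decimal probabilities stored as floats can pick up tiny rounding errors when added.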
Again, you can use histograms like we saw in the past, where a single block represents a single outcome, but a lot of times with discrete data we'll draw these line graphs instead, really to emphasize that each outcome stands on its own. So, say we're looking at number of stars: notice how each outcome has a line shooting up from it to represent the probability that outcome will occur. When you're looking at the graph of a probability distribution, the x-axis will still be all the outcomes, but now the y-axis is going to be probability. That's the big difference. We're used to having the variable on the horizontal axis; what changes is that the vertical axis now represents the probability.
So, let's try graphing the probability distribution of these books. We're going to draw vertical and horizontal axes, where the horizontal axis is once again the number of books, and we'll draw equally spaced tick marks to represent the five possible numbers of books a patron can check out from the library. Having your variable on the horizontal axis is exactly what we've seen since chapter 2. What's new is that the vertical axis represents the probability. But the beautiful thing is that these distribution tables literally give you the probability, right? So there are no calculations you need to do. All you need to do is look for the largest probability, which is 0.6, and make sure that however you draw your tick marks, that largest probability fits on the vertical axis. So, for instance, if we increase by 0.1 from tick mark to tick mark, we just want to make sure the axis reaches at least 0.6, so that when we draw each of the vertical lines, say for zero having a probability of 0.05, that line can be fully drawn.
When we're looking at one book being checked out, which has a 60% probability, we draw its line up to that height. It's pretty straightforward. It's pretty much like what we've seen with histograms by this point; it's just using lines and dots instead. I want you guys to feel like, "Man, this kind of feels like chapters one and two all over again."
So then what about chapter three? Well, chapter 3 was all about finding that summary value. But here's the thing: when it comes to discrete data, the idea of taking a mean is going to be a little different. The mean of a discrete probability distribution isn't simply adding up all of my outcomes and dividing by five. Why? Because the probability, the weight, of each outcome is different. Notice how there is a much higher chance someone's going to check out one book than four books. So when the outcomes have different probabilities of occurring, we have to calculate that idea of center in a different way, and we do that by finding the expected value. For any of you guys who enjoy playing casino games, poker, roulette, or video games: the way those games are designed is based on the idea of expected value, because expected value accounts for outcomes that occur more often and for outcomes that occur less often.
So let's talk about how we calculate expected value. You take every outcome and multiply it by its probability. So, for instance, we take zero, which has a probability of 0.05, and multiply those two numbers together, and then add to this the product of the next outcome times its probability: 1 * 0.6. You probably guessed it.
We're going to continue this trend, taking the next outcome of two and multiplying it by its probability of 0.20. We'll keep doing that for three times its probability of 0.10 and four times its probability of 0.05. And you're quite literally going to type this into your calculator exactly as you see it and find a number. You should get 1.5. What 1.5 is telling us is what the librarian can expect on a typical day: on average, one and a half books checked out per patron visit. Now of course, the librarian knows no one's going to cut a book in half, so really the idea of one and a half is just saying a patron will typically check out somewhere between one and two books per visit. So the idea of expected value, much like the mean, is that it gives us one number to hang on to, to understand the general trend, the typical number of books being checked out.
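If you'd rather check the calculator work in code, the same "outcome times probability, then add" recipe can be sketched like this. It uses the library numbers from this lesson; `book_dist` and `expected_value` are names made up for the illustration.

```python
# Outcomes (number of books) mapped to the probabilities from the lesson.
book_dist = {0: 0.05, 1: 0.60, 2: 0.20, 3: 0.10, 4: 0.05}

def expected_value(dist):
    """Multiply each outcome by its probability, then add the products."""
    return sum(outcome * prob for outcome, prob in dist.items())

ev = expected_value(book_dist)

# For contrast: the plain average of the outcomes, (0+1+2+3+4)/5,
# which ignores how likely each outcome is.
plain_average = sum(book_dist) / len(book_dist)

print(round(ev, 2))   # 1.5
print(plain_average)  # 2.0
```

The plain average comes out to 2.0, while the expected value is 1.5, because one book is far more likely than three or four and so pulls the center down.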
