Hi, welcome back. We're starting chapter 12, 12A, in-class activity 12A today. We're going to be talking about Airbnb. So, and it's good because we're close to the end of the semester. So I hope you're thinking about going away and having some fun.
And this will get us started with that. Wouldn't it be great if at this point in the semester, I could just, instead of teaching new material, I could just kind of go over all the old material. That would be great.
We can't do that, but we can do the next best thing. Up until now, everything we've been doing has been about proportions. We've learned about the sampling distribution for p-hat, and we've used the central limit theorem to help create confidence intervals for a proportion, and also for the difference of two proportions.
We've done that. And it's all been in the context of proportions, proportions, proportions. Well, we're now going to do almost the exact same thing for averages.
And that means we could have taught chapter 12 before we taught chapter 11. So if something didn't make sense, maybe it'll make sense the second time around. So thoroughly engage in chapter 12, and then if you had any problems with chapter 11, go back and maybe it'll all make sense now. So let's go to it. Get out your handy dandy worksheet.
Yay. Okay. So Airbnb, right? I'm going to assume we all know what Airbnb is.
Way of getting really cool places to stay rather than corporate hotels. And we're going to be looking at Airbnb housing prices for staying in New York City. But we're limiting it to the listings, the Airbnb listings under $500.
So we're limiting it to not the super high-end places to stay. So the average Airbnb listing in New York City is $130 per night. Okay. And that's footnoted. I think this is 2018. What does it say?
2019. So it might be a little bit higher now, especially with inflation. Which of the following would be more surprising: a single randomly selected listing that costs $200 per night, which is $70 more than the mean, or an average listing price of $200 per night for a random sample of 25 listings?
So which would be more surprising? So pause and write that out. Tell me what you think and explain.
Okay, usually I say, oh this is just opinion. You could, this part, you know, these openers like this, you could just list your own opinion. There is a right answer to this one.
The one that would be more surprising would be the average of 25 listings. Oops, that was not what I wanted to do. This situation, having an average of $200 for 25 listings,
would be more surprising, because to get that average of 200 you would have to have all of them, or at least most of them, significantly higher than the average. Maybe not all 25, but maybe 10, and then nothing could be low to balance out the high ones. So it would be more surprising, because to yield that average, most of the 25 listings would have to be higher than average. And you might, if you're super smart, which I know a lot of you are, you might go, well, maybe there was one outlier.
Maybe there was one hotel that cost, you know, $2,000 a night. Nope, because we limited it. We made sure there weren't extreme outliers like that.
So that's what makes this situation more surprising. Could you randomly select one listing that's that high? Well, actually, it didn't say that prices followed a normal distribution or anything, but we know there are probably high values out there, and it's possible that when you do a random sample, you pluck one really super high value out. But when you pluck 25, chances are the average of those 25 is going to be very close to the true average. So this is what we're doing today.
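One way to put numbers on that intuition is to ask how many standard deviations $200 sits above $130 in each situation. This is only a sketch: it borrows the population standard deviation of 85.1 that the data tool reports later in this session and the "divide by the square root of n" rule that is introduced later, and the helper name `z_score` is made up for illustration.

```python
import math

mu = 130.0     # population mean for listings under $500 (from the activity)
sigma = 85.1   # population standard deviation (reported by the tool later on)

def z_score(value, center, spread):
    """Number of standard deviations that `value` sits above `center`."""
    return (value - center) / spread

# Situation 1: one randomly selected listing costs $200.
z_single = z_score(200, mu, sigma)

# Situation 2: the average of 25 randomly selected listings is $200.
# Sample means vary less than individuals: spread = sigma / sqrt(n).
n = 25
z_mean = z_score(200, mu, sigma / math.sqrt(n))

print(f"single listing: {z_single:.2f} standard deviations above the mean")
print(f"mean of 25:     {z_mean:.2f} standard deviations above the mean")
```

The single listing lands less than one standard deviation above the mean, while the average of 25 lands more than four standard deviations above it, which is why the second situation is the surprising one.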
We're going to be looking at sampling distribution for averages as opposed to proportions. But let's break it down. As sample size increases, the sampling distribution for the sample mean, and the letter we're going to use for sample mean is X bar.
So everything we did for p-hat, we're now going to do for x-bar. The sampling distribution becomes closer to a normal distribution. Huh, what does that sound like? Check, check: the bigger the sample size, the closer the distribution of p-hat gets to normal.
And as the sample size increases, variability of the sample means decreases. So... We had that exact same conversation about p-hats.
And when you take larger samples, so instead of maybe 25 you take 100, the mean of the sampling distribution of sample means still equals the population mean. So we did that for p-hats. Center stayed the same.
It was the true proportion. Now it's going to be the true average.
And the language is a little confusing, because the average of all the p-hats was the true proportion p, but the average of the averages is an average. So we're going to be using the words average and mean a lot, interchangeably.
So it equals the population mean, regardless of sample size, and we're going to have an exploration, just like we looked at simulations to discover that in chapter nine. We're going to do the exact same thing today for averages. So it's going to feel very deja vu. We're going to look at the central limit theorem, which I love, and see how it applies to means. So just like the central limit theorem was very powerful for p-hats, it's going to be really powerful for x-bars.
And if our sample size is large enough, we're going to be able to use normal approximations to calculate probabilities for the sample means, i.e. x-bars. Okay, so it's going to be just the same old thing, which is really nice, especially at this time of the year. So when you looked at the in-class activities in sections 9B and 9C, you were introduced to the idea of sampling variability,
that every p-hat could be slightly different from the others. And you were introduced to the idea of the sampling distribution, which at that point was just the distribution of p-hat. Now it's going to be for x-bar, too.
So it's a probability distribution of the sample statistic. It was p-hat. Now it's going to be sample means. So that was all introduced to you.
And then in in-class activity 9B, we really kind of hit it home for what the sampling distribution of the sample proportion looked like, and we justified everything. And it says what you will explore in this activity, so we'll pick a pretty green color for it.
This is the new stuff. We're going to explore sampling distributions for sample means of varying sample sizes, just like we did in chapter nine. And we're going to discover the central limit theorem as it applies to means.
So let's get to it. I can't wait. So let's do a couple of simulations now. So we're going to go to this link right here, and you can cut and paste that directly into your browser.
I'm going to find it, though, because I like to find it. That's too big. Okay, so it is a sampling distribution tool. You see here it's got sampling distribution, so I'm going to look right here, and it's continuous.
This one was what we did for p-hats, and this is what we're going to do for x-bars. Okay, so let's open that up. I think that's going to be it. Continuous, there we go. And sure enough, that matches this down here, so we're good about that. So you will use this tool to simulate samples of different sizes from the Airbnb listings under $500, so no huge outliers, where the mean price is calculated for each sample. It's going to be the simulated mean.
So we're going to randomly select in the simulation. So step number one, select real population data. Well, let me erase this. It's not relevant.
It's already at real population data, but it's set at college debt. So if you want to really depress yourself, you can look at that one. We're now going to pick New York Airbnb prices. So that's right here. We'll click on that.
New York Airbnb prices. There we go. Wow. That's interesting. I'm looking at the data window right here and there's 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 places in New York City in 2019 where you can stay for free.
If I was your mother, I would say don't do that. So here we go. Well, there's no, they've said they're not looking at super high prices. It looks like there's no lower bound. Okay.
That's interesting, but I really can't tell the distribution until I look at the actual graph of it, until we get some kind of a dot plot or histogram. So for number two, examine the population distribution for the Airbnb listings displayed in the data tool. What is the shape of the distribution?
So this is the data right here. Your window might look slightly different.
You might be able to see the distribution. I'm going to have to go down here and there it is. And just like I suspected, I'm going to draw it over here.
So annoying, but I don't know why it's doing that, if anybody knows. I'd like you to draw this in your notes. So we have it.
So I don't know why it keeps changing. It's very weird. When I tap it, it just disappears. So I'm going to, okay.
So the range goes from zero to 500. So there's zero. Oh, that's not doing it. Okay.
500. So that's the range. And it's kind of a wiggly thing. I'm not doing a perfect job, but that's kind of what it looks like. And right here is the true
population average, which is 130. They don't have that marked. What they have marked is 50, 100, 150. So my scale is not going to be perfect, but you see that. I'm doing the best I can. It's good enough. What I'd like you to do is make sure to take down that mu, the average of all of the listings, is 130. And what's nice is they've also calculated the standard deviation.
And this really is a parameter, because we have all the listings from, I guess it was 2019. I was just looking at the footnote. So that's what it looks like. It actually is a little more skewed than I drew it, but I'm not going to fix it right now. And then let's make sure to label this.
X equals, oh, that's kind of tiny, but X equals individual cost per night. And because I'm running out of room, I'm going to put the title up here: Airbnb costs in New York.
So I got a title up there. And but most importantly, I want you to make note of what the mean and the standard deviation is. So true or false, can something that is not normally distributed have a standard deviation?
Yes, it can. And so we have a standard deviation. So the mean is there. And that means the average price
is $130, give or take $85, well, $85.10. So if we go this direction and this direction, we've captured our typical values within that range. So it's a special range of the innermost, most common, typical values. So I've got that on my graph now, and I can remember that.
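As a quick check of what "give or take" means here, the range of typical values is just the mean minus and plus one standard deviation. A minimal sketch using the two numbers read off the tool:

```python
mu = 130.0     # mean price per night, in dollars
sigma = 85.1   # standard deviation of the individual prices, in dollars

# "Give or take" one standard deviation on either side of the mean
low = mu - sigma
high = mu + sigma

print(f"typical prices run from about ${low:.2f} to about ${high:.2f}")
```

So the typical individual listing falls somewhere between roughly $45 and $215 a night.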
And I really want to stress that these are individual listings. They're just one house after another, after another. So this is not a sampling distribution, and it is not a probability distribution, because I'm willing to bet that what is listed up here would be the counts.
I don't know. I can't see.
If I divided the counts by the total number of listings, then it would be a probability distribution, but I'm guessing it's not. Okay, so that is the individual prices. It has a pretty skewed distribution.
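The difference between a histogram of counts and a probability distribution is only that division by the total. Here is a small sketch; the bin labels and counts are made-up toy numbers, not the values in the data tool.

```python
# Toy histogram: number of listings in each price bin (hypothetical counts)
counts = {"$0-99": 5, "$100-199": 3, "$200-299": 1, "$300-499": 1}

total = sum(counts.values())

# Dividing each count by the total turns counts into proportions
proportions = {label: count / total for label, count in counts.items()}

for label, share in proportions.items():
    print(f"{label}: {share:.2f}")

# The proportions add up to 1, which is what makes it a probability distribution
print("sum of proportions:", sum(proportions.values()))
```

The shape of the picture does not change when you do this; only the vertical scale does.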
So for number three, we're now going to look at two individuals at once and average them. So we're not going to look at a single cost.
We're going to look at two costs. We're going to randomly select two costs from our database. We're going to average them together.
And what we will calculate is an X bar. And I want you to draw a picture of that. And I want you to, I'm going to put the range, we've got to pay attention down here. I want to keep that same range. So before I even look at the distribution, I'm going to draw this.
And I know the smallest possible average would be zero. And the largest possible average would be 500. And now, just like I have here that x is the individual cost per night, I'm going to write that x-bar equals the average, or the mean. What are they calling it?
Mean, average, it's the same thing. Oh, by the way, how many listings are in the data set? Does it say? We could look. It's going to be a lot. Look at that, you see that going by? Oh, here it is. If I look here, I can see it was 47,666 listings. If you go way over to the right, you can see it in the bolded print. Okay, so we're going to grab two at a time, so n is 2, and we're going to average them, and we're going to do that a thousand times.
So x-bar is the mean with n equals 2. So we're not looking at individual prices anymore, and let's see what we get. So does it tell us how to do this? I know I should know this off the top of my head.
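The procedure of grabbing two prices, averaging them, and repeating a thousand times can be sketched in a few lines. This is not the data tool and not the real Airbnb data: the population below is a made-up right-skewed set of prices under $500 (gamma-distributed, an assumption chosen only so the mean lands near $130), and the function names are hypothetical.

```python
import random
import statistics

def make_population(size=10_000, seed=1):
    """Build a stand-in population of right-skewed prices under $500."""
    rng = random.Random(seed)
    prices = []
    while len(prices) < size:
        price = rng.gammavariate(2.3, 56.0)  # skewed right, mean near 130
        if price < 500:                      # mimic the under-$500 limit
            prices.append(price)
    return prices

def sample_means(population, n, reps=1000, seed=2):
    """Draw `reps` random samples of size `n` and return each sample's mean."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, n)) for _ in range(reps)]

population = make_population()
xbars = sample_means(population, n=2)  # one x-bar per sample of two prices

print("number of x-bars:", len(xbars))
print("mean of the x-bars:", round(statistics.mean(xbars), 1))
print("population mean:", round(statistics.mean(population), 1))
```

Each entry in `xbars` is one dot in the sampling distribution; a histogram of that list is the kind of picture the tool draws.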
So if you were to take a thousand random samples of Airbnb listings, what shape would you expect the distribution of the sample means to have? Oh, wait a minute. We're not doing the simulation yet. We want you to guess.
Do you think the shape would be this shape? Oh, I didn't answer two either. What kind of shape is that? It's not a beautiful symmetrical shape, and it's a little more skewed than I drew it.
It's skewed right, is the answer, right? What do you think the shape would be if you were looking at two at a time? And you know what?
This is actually driving me nuts. I think I have to fix this. The truth is that it's a little more skewed than I drew it. And I do want it to be accurate.
So it goes up and then it gets kind of skinny. So it's a little more skewed. It has a bigger tail.
So that's a little better. Okay. So what's your guess on what the shape would be? If you were
to take a thousand random samples of size two, what shape would you expect the distribution of the sample means to be? So shape, you guess. Guess it.
I'm not saying. And then what if you were instead to look at 10 at a time? So if you were to draw the distribution, it would be a similar thing, but it would be x-bar equals the mean and n equals 10. What would this shape look like?
What would you guess? So guess away. And if it were 50, x-bar is the mean and
n equals 50. What would it look like? What would the shape be? So you're just guessing. And we're going to end with, oh, it's a different question: how would you expect the variability to change?
So variability: what would happen to sigma as the sample size increases? As n increases, I guess that the spread, the variability, and you could use IQR, you could use standard deviation or standard error (not
exactly the same thing; one's the estimate of the other), or you could use range, any of those. What do you think would happen? I'm guessing the spread would blank: increase, decrease, or stay the same.
You choose. It's a guess. Okay.
How would you expect the average of the sample mean listing prices to change as the sample size increases? So again, see, I've taught this class now many times, so it would be unfair for me to guess.
But as n increases, I guess the mean, and we're going to call it mu... and remember, the mean might not even be at the center of the distribution,
as you can see here with mu... I guess the mean would A, get bigger, B, get smaller, or C, stay the same. And there is no wrong answer. And we're going to come back and look at those guesses again when we actually have a simulation. This right here is the mama distribution, and it's clearly skewed right.
Are all the shapes going to be skewed right, or will they change over time? Will the variability stay the same? Will it get bigger?
Will it get smaller? Will the center stay the same or will it get bigger or will it get smaller? Okay. So now what we're going to do, now that you have your own personal guesses here, we're now actually going to simulate.
And so just like I wanted you to be really careful here, and I wanted you to draw the actual distribution of the individual scores. I'm going to want you to do the exact same thing. And it's really important to me that you have a picture and that that picture for each situation, it's important that the scale is the same on the X axis. So we're going to have an X axis here.
We're going to have an X axis here. And we're going to have an X axis here. So those are a little bit wonky, aren't they?
But oh well, I tried to draw them straight. And so we're going to have one for n equals 2. So we're going to be averaging two.
We're going to randomly choose two prices and average them. Two prices and average them. We're going to do it a thousand times. So use the tool to generate samples for each of the following sample sizes.
Select the show normal approximation box to overlay the normal distribution for each. Sketch the resulting graph of the sample means. So we're going to sketch them. We want to make sure the scale is the same.
And we want to write down the mean of the sample means. And we want to write down the standard deviation. I'm going to do the standard deviation in green and the mean in red.
Okay, so for the first one, and I'm going to label my axes too: x-bar is the sample mean, you don't really need to say that, but n equals two. So each one is an x-bar, and the smallest possible value is zero and the largest possible value is 500. And there are values out there that are 500, but it would be kind of shocking to get an average of 500, because if you look at the parent, you would have to get two
observations that are super high and average them together. But it could happen. So I'm going to make sure I do everything they say. So for the sample size, I'm going to slide it to two.
Okay. And I don't need to enter the numerical value. And they asked us to show, where does it say that? I didn't do this before. It says, select the show normal approximation box.
So the normal approximation box is right here. So I'm going to go ahead and check that. Okay.
And that'll give us a nice overlay to see how normal our shape is. And it says to do a thousand samples, even though you're taking two at a time. And that two does not look like a two. I hope that looks more like a two. Oh, that looks even worse.
Let's get rid of that. Put a real two in there. There we go.
That's a healthy two. So just to remind myself what I'm doing, I'm going to do one sample and hit draw. And the two values that it selected were $37 a night and $298 a night. And we only have one sample mean.
If you look down here, that's the sampling distribution. And that is so not interesting. So I can do it again.
Let me do it 10 more times. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. So for the last one, I averaged 250 and 100. So that's going to be 350 divided by 2, which is 175. So there's our sampling distribution down here. And it's kind of a mess, because I only did it 10 times.
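For the record, here is the arithmetic behind those two draws, as a tiny sketch. The helper name `x_bar` is just for illustration.

```python
def x_bar(sample):
    """Sample mean: add up the prices and divide by how many there are."""
    return sum(sample) / len(sample)

first_draw = [37, 298]   # the first pair of prices the tool selected
last_draw = [250, 100]   # the last pair from the ten extra draws

print(x_bar(first_draw))  # 167.5
print(x_bar(last_draw))   # 175.0
```

Each of those averages is one dot in the sampling distribution for n equals 2.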
So that was just to get a feel. So I'm going to go ahead and reset everything. And I'm going to say, I want it to stay sample size of two. I want to do 1000 of these.
So I'm going to click 1000, and nothing's happened yet. I'm going to hit draw samples. Hopefully it'll show me as I do it. There we go. This one, remember, is the last sample that you saw.
But here is the sampling distribution down here. And the blue is the actual distribution, and then we have the normal curve over it. Does that look normal to you?
No, it's jiggity-jaggity, right? So I'm going to try to draw it, but I do notice that the range is between, it looks like maybe 35 and 350. So let's keep the scale the same. So this is 50, 100, 150, 200, 250, 300. And the last thing I see, so I didn't make it to 500. So it's not a perfect scale.
I really should have done that. Oh, we'll redo it. Sorry, guys.
It just needs to be squashed in a little bit. So if that's 500, this is 100, 200, 300, 400. Okay, that's good. So this is 100, so that's 50. And it starts a little bit before 50. That's our first little jiggity-jaggity. And then it goes up from there. So the mean is about there, and that's 150, and this is 200. Scale is going to be really important here.
So the high point is before 100. So I'm just doing the best I can at sort of remembering what it looked like, and yes, I'm kind of giving up on trying to make it accurate. But the last place, it really peters out at 300. So it looks like that. So I know that's terrible, but that's good enough.
And I've got a mean of 131. Interesting. And I have a standard deviation of 58. So we would expect the average of the averages to be right about here. There it is.
That's our mean of 131. And the standard deviation, meaning most data points can be found between here and here. But it's by no means normal.
It's not a normal curve. It's pretty skewed, just like its mama. So here's the mama graph, and it's pretty close to being that. So that was for n equals two. Let's hit reset.
It's very important that you hit reset. And oh, you would have had different results than me, because I just did a simulation, right? But your results should still be pretty similar. Even with a sample size of two, your results are probably pretty close to mine.
Maybe not perfect, but so now we're going to do a sample size of 10 and we're going to see what that looks like. So when you want to do a sample size of 10, we're going to take this marker and we're going to slide it to 10. There it is. And I want to make sure I hit reset so that I have a clean slate.
There's the mama distribution, and the sample mean and everything else is there. So I'm going to do, again, a thousand samples, and I'm going to draw samples. And what you see, if we look at the frequency, is that this is actually a much taller distribution.
So I've got x-bar, n equals 10. Here is zero, there's 50, there's 100, 150, 200. The peak is still at about 130, so that's my highest point. And it goes down like this, jiggity-jaggity, all the way down, and it ends just before 50. And then the other way,
it's kind of the same thing, and it ends a little bit past 200. So I can see that my data is closer in. Pay attention. What do they say the mean is?
I got a mean of 131. Looking down here, it's right there. Mean of 131. You might have gotten a slightly different mean, but I bet it's not far off from that. So I hope you're doing this with me.
And then for standard deviation, the spread, I got 26.8. So is the standard deviation staying the same or did it change? It changed.
It's a little bit tighter. What do you think is going to happen when we go now to an n of 50? So I'm starting: here's zero, there's 50, I'm making sure the scale lines up nicely. There's 100, 150, 200. And I'm going to make a little mark; my mean is still the same. Okay, so now I'm going to do it again for 50. So I'm going to come back here.
And I hope you're doing this, because now comes the punchline. I actually know the answer to this.
Don't tell the data center I'm hiding their logo, but I just needed all the real estate. Okay, so come back up here, and we're now going to change the sample size.
Oh, when I wipe that out, it's more normal. But can I say for this one that it's actually normal? No, with all that jiggity-jaggity, it's not normal. But even with the sample size of 10, it's getting pretty close to normal.
It's way better than the sample size of 2. But we're now going to change the sample size right here to 50. So that means we're randomly selecting 50 prices from 2019 and averaging them together. That's one x-bar, and we're doing it again and again and again, and we do it a thousand times, which would exhaust us if we were not using this beautiful distribution tool. So I'm going to hit the thousand.
And, oh, there it is. Look at that. Oh, that is a thing of beauty. So the range looks like it starts just a little bit before 100, and then it ends a little bit after 150. That's my range. It kind of peters off right here. And my mean and your mean could be different, because these are simulations. My mean was 130 and my standard deviation is 12.3, and I bet if you're doing this with me, your standard deviation is 12.3, maybe it's 12.2, maybe it's 12.1, but it's not a lot different. It's actually a much taller distribution, because it should have the same area. And it is kind of jiggity-jaggity, but not terribly.
It's looking very, very, very normal. And you can see the black curve is a way better fit. So let's make a note that this is x-bar, n equals 50. And these are the averages of 50 prices at a time. So what's the pattern here? Okay.
Now you'll compare your results from question four, whatever you got in the simulation, to your predictions, your guesses from three. So whatever you had guessed here, how does it compare to what actually happened here?
And it's really hard for me to answer that because I've done this a couple of times now. But let's ask specific things. As the sample size increased, so from here to here to here, did the shape of the sampling distribution of the mean listing prices, the x-bars, change?
Does it match the pattern you predicted? Well, I hope you can see that if you look here and here and here, it's becoming more and more normal. So I'm going to slap that down as an observation. The shape of the x-bar distribution,
and it is a sampling distribution because it's a picture of a statistic, is getting more and more normal as n increases. Okay, so that's shape. So the next thing is, what's happening to the standard deviations? What's happening to the spreads? So here was the spread of the first one.
It's about 60. So if we're going to average two at a time, the average of all the averages is going to still be 131. But give or take, 60 bucks. So for the next one, if we averaged 10 at a time, the average of the averages is still 131, give or take $26 and 80 cents. So there's less variation.
And when we go down here, it's give or take $12 and 30 cents when you're averaging 50 at a time. So what's happening to the standard deviation? I don't know what your expectation was, but as n increases, the variability, in this case I'm using standard deviation, decreases. Okay, and the last question: as the sample size increases, how does the mean of the sampling distribution of the mean prices change? Does it match your prediction?
So what I notice here is basically all the same. It doesn't matter. So the shape's inconsistent, maybe.
The spread is inconsistent, but you see the trend that it's getting more normal. The spreads are getting tighter. But the center of all these sampling distributions for x-bar is the same, and it actually equals the true average of the individual prices. And we call that average mu.
That is the average of the original data set. And you can see it here. The average of the original data set was 130. And it did actually get more precise. It was identical at the end.
There's a little bit of fluctuation. That's because it was a simulation. But that's $131 versus $130, pretty close.
And so all of these turn out to be centered at the true population mean, or average, just like for p-hat the center was the true population proportion. So everything is just as it's always been.
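The three patterns (shape, spread, center) can be checked on a stand-in population. As before, this is a sketch and not the data tool: the prices are made-up gamma-distributed values under $500, and `skewness` is a hand-written helper where values near 0 mean symmetric and positive values mean a right tail.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: about 0 if symmetric, positive if skewed right."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

rng = random.Random(42)

# Stand-in population: right-skewed prices under $500 (an assumption)
population = []
while len(population) < 10_000:
    price = rng.gammavariate(2.3, 56.0)
    if price < 500:
        population.append(price)

pop_mean = statistics.mean(population)
summary = {}
for n in (2, 10, 50):
    # 1000 sample means for this sample size
    xbars = [statistics.mean(rng.sample(population, n)) for _ in range(1000)]
    summary[n] = (statistics.mean(xbars), statistics.pstdev(xbars), skewness(xbars))

print(f"population mean: {pop_mean:.1f}")
for n, (center, spread, skew) in summary.items():
    print(f"n={n:>2}: center={center:.1f}  spread={spread:.1f}  skewness={skew:.2f}")
```

Running it should show the center staying near the population mean, the spread shrinking as n grows, and the skewness drifting toward zero, which is the same story the three sketches tell.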
So we got this pattern from simulations. There's no math. It's just pattern recognition. But the central limit theorem actually, well, I'm getting ahead of myself.
You can take it to the bank that mathematical formulas also work. If you know the true population mean, the true average of everybody's Airbnbs, it's going to be the same value for the sampling distribution. And if you know
the true standard deviation, the true spread, the new one can be calculated. It's not the same, the way the mean is the same. So the way I say this verbally is: the new standard deviation equals the original
chopped down by the sample size. So it's the original one divided by, oh, the square root of the sample size. Don't forget that.
So we're going to practice these math formulas in just a minute. The nice thing is that we did the proportions first, because that math formula was way worse than this one. Remember that the variability was the square root of p times 1 minus p, all over n. So I haven't mentioned the central limit theorem yet. This is true regardless of the central limit theorem.
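Written out in symbols, the two formulas for sample means sit next to the ones for sample proportions like this. The subscript notation for the standard deviations is an assumption about how the worksheet writes them; "mu x-bar" is the name used later in this session.

```latex
% Sample means (this chapter)
\mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}

% Sample proportions (earlier chapters), for comparison
\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p\,(1-p)}{n}}
```

In both cases the center of the sampling distribution is the population value, and the spread shrinks as the sample size n grows.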
And I know that I think right now it's probably going right over your head. But once we practice it, it's going to be really clear. Examine again the population. So we're going to practice these formulas right here. And I'm going to give you a picture.
And that's really going to solidify it. Examine again the population distribution of New York City prices displayed in the data tool. What was mu,
the true average, the true population mean? And what was sigma, the true population standard deviation? If you look at this picture, you can see that mu was 130. So I'm looking here, and I'm looking at the title. And the true standard deviation was 85.1.
And if you go back, you can see that in the paragraph there. So that is from the mama distribution of all individuals. So if we look here, because I can't draw on that, it's from this distribution right here.
It's from this and this. It's from the original individual values. Okay, so we're about to use the math formulas.
I'm just, I better stop for a minute. I'll be right back. I can get my screen to let me. Okay, so on to number seven. Let's use the mathematical formulas to answer number seven.
So we're not going to look at the graphs to answer number seven. So I think what I'm going to do is I'm going to write, I'm going to make a nice little chart here, kind of separate out everything. Let's give it a title. Okay.
So I can imagine that this is floating in your head right now, but it's going to make sense in just one minute. So calculate the mean and the standard deviation of the sample mean listing prices for each of the following sample sizes, using the mathematical formulas previously given. So these formulas right here.
So there's not much to do in terms of the mean. The mean of the distribution of the sample means is equal to the original mean. So what was the original mean? 130. So we have no work to do. It's going to be mu x-bar: the average of all the sample means of size two is going to equal 130, and mu x-bar where we have a sample size of 10 is going to equal 130, and mu of the sampling distribution where you have a sample size of 50 is also equal to 130. No math formulas there, we're just using the information. But the standard deviation is going to be a little bit more work.
So for the standard deviation of x-bar, the standard deviation of the sampling distribution, we know it's this formula, and it's a much nicer formula than the one for proportions. It's right here. The new standard deviation is equal to the original standard deviation chopped down by the square root of the sample size.
So you grab the original standard deviation and chop it down by the square root of the sample size. So in this case, we know that the original standard deviation is right here. So we'll pop in
85.1. And then we know that the sample size is two. So when you run that through the calculator, and make sure to get the order of operations right, you're going to get 60.17. So that's for a sample size of two.
So if you have a sample size of 10, the standard deviation equals the old standard deviation chopped down by the square root of the sample size. Nothing changes on top: the original standard deviation for Airbnb is 85.1. And the sample size in this case is 10. So with the bigger sample size, I get 26.91.
And the last one, the standard deviation, the spread of your sampling distribution. We're going to have a sample size of 50.
We're going to pop the 50 in there, but everything else stays the same. And when you work that out, you're going to get a new spread for that sampling distribution: 12.03. And so what this is telling you is if you're looking at sample sizes of two, you'll expect the center of your distribution to be the same center, but give or take $60.17.
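The three calculator steps above can be sketched in a few lines of Python. This is only an illustration: the numbers 130 and 85.1 come from the activity, while the function name `sampling_sd` is made up for this sketch.

```python
import math

# Values from the activity: population mean and standard deviation
# of New York Airbnb listing prices.
MU = 130.0
SIGMA = 85.1

def sampling_sd(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n).
    (Hypothetical helper name, used only for this sketch.)"""
    return sigma / math.sqrt(n)

results = {n: sampling_sd(SIGMA, n) for n in (2, 10, 50)}
for n, sd in results.items():
    # The mean of the sampling distribution stays at MU for every n.
    print(f"n = {n:>2}: mean = {MU:.0f}, sd = {sd:.2f}")
```

The square root is taken before dividing, which is the order-of-operations point made above.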
If you look at bigger sample sizes of 10, the center is still $130, give or take $26.91. So the variation got chopped down a bit. And then if you have a sample size of 50, the center doesn't change, but the new standard deviation, the give or take, is $12. So there won't be a lot of fluctuation in the sample means.
So for number eight, compare the simulated means and standard deviations to the sample means listed in question four. So if I look at the simulated means, and I wonder if this is going to get too small.
The simulated means that I got: 131, 131, and 130. And the simulated standard deviations, they are right here. Boom, boom.
Is that right? I'm going a little blind here. Yep, that is right. I'm right there. Wow.
They're pretty close. So mine are 58.8 and 26.8, and yours are going to be ever so slightly different, because simulations always have fluctuations. So how does that compare? Well, pretty darn spot on. $12.03 was from the mathematical formula; $12.30 was from the simulation. So it's only off by pennies, really. $26.91 came from the math formula; from the simulation, it was $26.80. So it fluctuates a little bit, but it's pretty darn good. From the math formula, if you have a sample size of two, you're going to expect a fluctuation, a give or take, of $60.17. From the simulation, it was 58.
So it's really close. So simulations are very powerful and the math formulas are very powerful. But another thing that is the most powerful thing of all is a picture. So I'm about to draw a picture that's going to tie all of this together. In class, from 9C, you saw the central limit theorem work for sample proportions.
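The comparison between simulation and formula can be tried at home with a rough sketch like the one below. The real listing data is not available here, so this assumes a right-skewed gamma distribution with mean 130 and standard deviation 85.1 as a stand-in for the population. That is an assumption, not the simulation tool used in class, and the numbers it prints will differ a little from the ones above.

```python
import math
import random
import statistics

MU, SIGMA = 130.0, 85.1  # population values from the activity

# ASSUMPTION: a gamma distribution stands in for the real, right-skewed
# listing prices. Shape and scale are chosen so that the stand-in
# population has mean MU and standard deviation SIGMA.
SHAPE = (MU / SIGMA) ** 2
SCALE = SIGMA ** 2 / MU

def simulate_sample_means(n, reps, rng):
    """Draw `reps` random samples of size n and return the list of their means."""
    return [
        statistics.fmean([rng.gammavariate(SHAPE, SCALE) for _ in range(n)])
        for _ in range(reps)
    ]

rng = random.Random(12)  # fixed seed so the run is repeatable
summary = {}
for n in (2, 10, 50):
    means = simulate_sample_means(n, reps=5000, rng=rng)
    summary[n] = (statistics.fmean(means), statistics.stdev(means))
    formula_sd = SIGMA / math.sqrt(n)
    print(f"n = {n:>2}: simulated mean = {summary[n][0]:.1f}, "
          f"simulated sd = {summary[n][1]:.2f}, formula sd = {formula_sd:.2f}")
```

Changing the seed changes the simulated values slightly, which is the same fluctuation described above.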
In this activity, you witnessed the central limit theorem working for sample means. So for sample proportions, what we learned, because I really am a picture person, I hope that's big enough. Can you see that?
Is that too light? Maybe I should make it just a little bit darker. So this is in the past. What the heck, I'll make it purple. For sample proportions, you're getting a bunch of p-hats.
As long as n is big enough, you know that p-hat is going to have this lovely shape, and that the center of your p-hat distribution is the true population proportion, whatever that is, and the standard deviation, sigma, is equal to the square root of p times 1 minus p, over n. And the bigger n gets, the more normal the shape and the smaller your standard deviation. That came from the central limit theorem, and it was covered in the last midterm. Today, you've got this picture going on, and I'm going to read it to you, but I want to draw the picture first. So, the distribution of the sample means, as long as n is big enough.
What's the shape of this distribution? Big surprise. It's called a normal distribution because it's really common.
You get big enough sample sizes, all the data settles down to being this. Nice bell-shaped curve. The center is, we use the Greek letter mu, and it is the true population mean or average.
Those are completely interchangeable. And the standard deviation is sigma for x bar. So just like this was sigma for p hat, this is sigma for x bar.
The spread, the give or take, is going to be equal to the original standard deviation from the population, chopped down by the square root of the sample size. That's it. So the only real difference is what big enough means.
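Written out as formulas, the two pictures look like this. The notation is a sketch using the symbols named in the lecture: p and mu for the population proportion and mean, sigma for the population standard deviation. The last line plugs in the Airbnb values for a sample of 50.

```latex
\begin{align*}
\mu_{\hat{p}} &= p, & \sigma_{\hat{p}} &= \sqrt{\frac{p(1-p)}{n}} && \text{(sample proportions)} \\
\mu_{\bar{x}} &= \mu, & \sigma_{\bar{x}} &= \frac{\sigma}{\sqrt{n}} && \text{(sample means)} \\
\mu_{\bar{x}} &= 130, & \sigma_{\bar{x}} &= \frac{85.1}{\sqrt{50}} \approx 12.03 && \text{(Airbnb, } n = 50\text{)}
\end{align*}
```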
So for proportions, what did I have you do? I had you do a check, check. n times p is greater than or equal to 10. And n times 1 minus p also has to be greater than or equal to 10. So you've got to do a check, check.
But for this one... That is an amazing color that we can pick here. I guess pink is just so amazing. n big enough. For x bar, it's so much nicer. n just needs to be greater than or equal to 30. Oh my gosh, there's no check check that you have to do.
You just have to make sure your sample size is big enough, and the threshold, the rule of thumb, is 30. So that's what this paragraph says. I think a picture is worth a thousand million words, so this picture right here, oh, I'll keep it pretty, mint green, this picture right here is what this box is saying right here. So let's read it. When taking many random samples, so the samples have to be random, just like before, you can't just pick your friends and study them, of sample size n from a population distribution with some given mean mu and a given standard deviation, the mean of the distribution of the x bars, the sample means, is also going to be mu. And that's what we're saying right here: same center. And n can be any size if you know your original distribution is normal. Then n can be 5. It can be anything you want.
But if you don't know, or if the original distribution is not normal, like this one right here, this is definitely not normal, then you want to have a threshold of 30. The central limit theorem states that the distribution for x bar is approximately normal as long as you get 30 or more. So if you're doing an experiment, make sure you get at least 30 replications, and things will settle down.
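The two "big enough" rules can be put side by side in a short sketch. The function names here are invented for illustration, and the thresholds of 10 and 30 are the rules of thumb from this course.

```python
def proportion_check(n, p):
    """The "check, check" for p-hat: n*p >= 10 and n*(1-p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

def mean_check(n, population_is_normal=False):
    """Rule of thumb for x-bar: n >= 30, or any n if the
    population itself is known to be normal."""
    return population_is_normal or n >= 30

# A sample of 50 listings from the skewed price distribution is fine.
print(mean_check(50))
# A sample of 5 is only fine when the population is known to be normal.
print(mean_check(5))
print(mean_check(5, population_is_normal=True))
# For proportions both counts have to reach 10; here n*p is only 8.
print(proportion_check(40, 0.2))
```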
Okay. So this picture kind of says it all right here. n of at least 30 works all the time.
n equal to five works if you know your original distribution is normal. Okay. So for number nine, suppose you're planning a vacation to Los Angeles instead of New York. So you're going to smoggy LA.
Oh, I'm sorry. Beautiful LA. And you'd like to learn more about Airbnb there. You're like, oh, I bet it's just like New York. You take a random sample and the mean is 152. So this is your observation.
So this 152, that's your X bar, very much like you used to have to deal with p-hats. This is your sample mean. Assume the population of all LA Airbnb listings has the same mean and standard deviation as New York.
So that's an assumption. Okay. Use the mean and standard deviation.
you found in problem seven to calculate the z-score of this observation. And I know this might seem a little overwhelming, but if I just tell you you've done this before and you believe me, then good. You're going to be good.
So, oh, z-score. What the heck is that? The z of anything is the observation minus the center of the distribution that it came from, over the spread of that distribution. That's what a z-score is. So in this case, the observation is going to be your x bar. And since you're dealing with a sampling distribution, the center is going to be the true population mean.
We know the center, right? Smack dab at the center is the true population mean, mu. And the spread is not going to be sigma, because you're dealing with a sampling distribution.
Your spread is going to be sigma over the square root of n, according to the central limit theorem. So it's not the original population spread, it's the original population spread chopped down by the square root of the sample size. So this is the formula for the z-score.
So we just need to pop in everything we know and we'll be good. Why don't you pause it and work this out. So I get 152, and I'm basically standardizing it, minus, if New York and LA are the same, the mu was 130. The average was 130. And the spread is going to be whatever the population spread was over the square root of my sample size, a sample of 50. And my original spread, they say it's in question seven. Let me go back up here. The original spread.
Oh, we worked it out. You know, the answer actually is 12.03. So it's going to be 85.1 over the square root of 50. But why do all that work? Why not just plug it in? Let's be lazy.
So from number seven, I know that the standard deviation for that sampling distribution is 12.03. So I'm going to pop that in, and what do I get? Okay, for that z-score I got 1.8287, dot, dot, dot. Do yourself a favor. For z-scores, go to just two places past the decimal: 1.83. There's your z-score, and that is z of x bar.
That's your z-score for that particular observation. And in particular, if you want to be super particular, it's 152. Okay. So did I answer the whole question?
It said to calculate the z-score. So I went ahead and did that. But then it also said to interpret it, interpret the value.
So I've got to do that too. x-bar equal to 152 is almost, but not quite, two standard deviations above the center of the sampling distribution, which is the population average, if LA is just like New York regarding Airbnb. Okay, so sorry that bled into that one, but I think I thoroughly answered that now. So it's the same, but it's also different.
Same, but different, same, but different. The formula, the only difference really is that you want to make sure that your standard deviation is this new formula. All right.
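As a quick check on the z-score arithmetic, here is a small sketch. The function name is made up for this example, and the inputs are the values from the activity, under the stated assumption that LA has the same mean and standard deviation as New York.

```python
import math

def z_score_of_mean(x_bar, mu, sigma, n):
    """z = (x_bar - mu) / (sigma / sqrt(n)), the z-score of a sample mean."""
    return (x_bar - mu) / (sigma / math.sqrt(n))

# Observed LA sample mean of 152 from a sample of 50 listings,
# assuming mu = 130 and sigma = 85.1 as in New York.
z = z_score_of_mean(152, 130, 85.1, 50)
print(round(z, 2))
```

The only change from an ordinary z-score is the denominator: sigma over the square root of n instead of sigma.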
Using the normal approximation, find the probability of observing that sample mean listing price or higher from a random sample of 50. Well, is n big enough? Do I have to do a check check? n is 50, so it's bigger than 30. So I can assume it's on the normal distribution. So the probability that my x-bar is, it says 152 or higher, so greater than or equal to 152, is going to equal the area of the tail on my normal distribution.
So where do z-scores live? They live on normal zero one. So I have a choice. I can do that, or I can do the probability that z is greater than or equal to 1.83. You choose, you can do either one.
I think this one is probably going to be easier. There's less to do. Z-scores live on the standard normal, and I know that the observation is going to be a little bit less than two standard deviations away, maybe right here, and that tail is going to be right there.
So I've got a sketch. You don't have to sketch it, but I'm going to sketch it. And now I'm going to go to, um, I'm going to go to my lovely tools and I know, well, why not?
I know it's a normal distribution. Why don't I just go to the normal distribution and use my Z score? It's less work.
So find probability, normal. The center and spread for z-scores, you don't change those. It's not going to be a lower tail, it's going to be an upper tail, so switch it around, and then put in your z-score, 1.83, and we get that 2.5% of x bars will be equal to or greater than $152. Okay. If you don't want to do that, if you want to use the original observation on the unstandardized x-bar distribution, you could do that too. So I'll just change it for that distribution.
I know that my center is 130. So I'll put that right here. And I know that my spread, let's obliterate that, my spread is 12.03. So I'll put that right here. But in my opinion, that is just more work. 12, come on, 12. 12.03. I hope now that's a comma.
12.03. So look at, yeah, I'm looking at the picture. Looks good.
And my observation is 152. And it's a tiny bit different. I think that's because of, oh, I think it's because it's reading, for some reason, it's reading 132 instead of 152. Interesting. I think that my website is not working, is my guess.
It's just not picking it up. Here it is in my single, but there's, there's. Hmm.
Very weird. I have a feeling that it's not speaking to the, I'm not getting a signal, but for some reason Zoom is just going to pause this for one second. Okay, well, there's some weird thing in the space out there.
I don't know what it was, but I think it was seeing a comma. So I'm going to back up for just a minute. Okay, it was having trouble reading my writing.
So right here, we can get the z-score, and the picture is fine. But there are some errors here that we've got to fix. Our z is 1.83, and it lives on the standard normal. So I'm going to back that up, because that was not correct. We're going to type it in, because that's probably where the error is. So the center is zero and the spread is one. And we've got the right tail.
So I'm trying to recreate this picture right here. And our z-score is 1.83. So I will pop that in.
That's my cut point, 1.83. And I think it was seeing a comma instead of a decimal. I have no idea. But that means that now it's 3.37%. So I'm going to scratch this out right here. I'll do it with a big black pen or red pen.
That wasn't right. And now I'm looking at the picture here, and you probably saw it too, that the true area is 3.37%. Okay. So that worked. Well, that was me doing the z-score.
Just to convince ourselves, we could consider a different distribution. So choose that one, or choose the original, which would be a center of 130, a spread of 12.03, and a similar cut point, but the observation is 152. And what would this area be? Question mark. I'm not going to use my handwriting, because that's where that got messed up. So the center is 130, the spread is 12.03, and the observation is 152. And presto bingo, we get the same area. So the chance of getting that x-bar, or observations that are even more expensive, is 3.37%.
So we did it twice. You don't have to do it twice. And I think moving forward, I want to just do it once.
I want to just do the Z-score because the area associated with the Z-score is equivalent to the area associated with the X-bar. Okay. So what's next?
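Both routes to the tail area can be sketched with Python's standard library instead of the applet used in class. `NormalDist` is part of the `statistics` module; the variable names are just for this illustration. The two answers agree up to the rounding of the z-score, at roughly 3.4%.

```python
from statistics import NormalDist

# Route 1: upper tail of the standard normal beyond the rounded z-score.
p_from_z = 1 - NormalDist(0, 1).cdf(1.83)

# Route 2: upper tail of the x-bar distribution (center 130, spread 12.03)
# beyond the observed sample mean of 152.
p_from_xbar = 1 - NormalDist(130, 12.03).cdf(152)

print(f"from z-score: {p_from_z:.2%}")
print(f"from x-bar:   {p_from_xbar:.2%}")
```

The `1 - cdf(...)` step is the "switch it around to the upper tail" move described above.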
That was the C. Now we're going to make some decisions. Based on the answers in part A and B, do these data provide evidence that the mean Airbnb listing in LA is higher than the mean Airbnb listing in New York? Well, I want to have an alpha level.
What's my line in the sand? I mean, I do think, generally speaking, this tells me this observation, this x-bar, is unusually high, expensive. So I would be inclined to say that LA is more expensive.
There is evidence to suggest LA is more expensive. So if you want to just be informal about it, you can say that.
If you want to be formal about it, if this were a hypothesis test, I'd want an alpha. I'd compare an alpha level to my p-value, and this really is a p-value: the chance of seeing that observation if LA really is just like New York.
So my p-value is 3.37%. That's the chance of seeing that or more extreme.
So if my alpha level... well, we always do it the other direction, don't we? I learned that. So let me put the alpha over here.
If my alpha level were a normal level, and the most famous alpha level is 5%, well, then my p-value is smaller. So my decision is going to be: p-value is smaller than alpha, so reject the assumption that LA is just like New York.
That's what we were assuming when we went through this. But what if we had the same observation, everything, so our p-value is not going to change, but our alpha level is maybe 1%? Like maybe you really, really want to hold on to the idea that LA is just like New York.
Maybe you want to go there and you don't want to hear it from your partner, who you're going with, that LA is too expensive. So you're not going to change your mind unless you have overwhelming data. So here, same observation.
And in this case, the p-value is bigger than alpha. So that means it's not unusual enough. Remember, the p-value measures how likely you are to see this observation, or one more extreme, if H0 is true. And here, it's not surprising enough.
A bigger p-value means not surprising, pretty likely. That's confusing, because the p-value hasn't changed. It's pretty small, 3.37%, but your decision depends on the threshold, your alpha level. So in this case you would say the p-value is not small enough to reject H0. So you would keep the idea that LA is just like New York. Okay.
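The two decisions above come from the same comparison, just with a different line in the sand. Here is a minimal sketch; the function name and the wording of the returned strings are made up for illustration.

```python
def decide(p_value, alpha):
    """Compare the p-value to the alpha level and return the decision."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"

P_VALUE = 0.0337  # the tail area found above

for alpha in (0.05, 0.01):
    # Same p-value both times; only the threshold changes.
    print(f"alpha = {alpha:.0%}: {decide(P_VALUE, alpha)}")
```

With alpha at 5% the result is "reject H0", and with alpha at 1% it is "fail to reject H0", matching the two cases worked through above.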
So unless I'm given an alpha level, I don't know what decision to make. But I think I want to go with this one, because it seems more natural to me.
Yeah, that's the one I'm going to go for. So if I'm going with the top one, which I am, yay, I've got a small p-value, then my conclusion in context is going to be, and here's that template that I kind of botched up, I think, in the last one. So I wanted to have it ready for you. Template for the conclusion: there is, or there is not, evidence that, and then you put HA in there. And your alternative was that LA is more expensive.
So I am going to be very clear that I have an alpha level of 5%, so I'm going to cross this one out. We didn't know which one it was.
But if this is the alpha level that we have, then we're going to reject H naught. So that means there is evidence to reject H naught. So there is evidence that...
And now, when you're answering this question, you want to mention the parameter. You want to mention the population of interest. You've got to fit all that in there. So the population of interest is LA Airbnb listings, and the parameter is the average price.
So there is evidence that the true mean price of Airbnb stays per night in LA is greater than, you can say New York, but I like to put a solid number in here, greater than $130 a night. Okay, yay, we're done. So let's just check back and see how we did.
As the sample size increases, the sampling distribution of the sample mean will become more normal. So let's just erase. As the sample size increases, x-bar becomes more normal. We saw that happen.
The next one: as the sample size increases, the variability of the sample mean, which, by the way, we now have a formula for, sigma over the square root of n, gets smaller. And you can actually see it in the mathematical formula: if your denominator gets bigger, your whole calculation gets smaller. When we're taking a large number of samples, the mean doesn't change. It's whatever your true population mean is, regardless of the sample size.
Okay. And we're going to look at the central limit theorem. We did that.
And what we discovered is that you can use your friendly normal distribution to calculate the probabilities of events happening. And the highly recommended thing to do is to find your z-score. Find the z-score of the observation, in this case x-bar, and plot it on normal zero one. And we abbreviate that like that, where this is the mean and that's the standard deviation.
So the first one that pops up, normal zero one. And then I wouldn't have all the problems with the decimal point. Oh, I'm still having it. Oh, it's turning.
It's just muting on me. So, and I guess I have to type instead of handwriting. Okay, so we're done with the first section of 12A.
It was a lot of discovery. Use those mathematical formulas and be a little comfy with the simulations when you go ahead to do the practice. All right, and I'll see you next time.