So in this module we're going to start talking about testing hypotheses about a single mean. Remember that in statistics our goal is to make an inference about a population parameter given a sample that we've collected from that population. Here our goal is to compare a population mean mu to some known or hypothesized mean mu naught. So mu naught is some known or theorized value that we're going to compare to the population mean mu.
The assumptions here are that we have a continuous variable x, so we have continuous data, and that x is normally distributed with a mean mu and a variance sigma squared. Our design possibilities are a single sample where the population variance is known, in which case we would use a z-test, or a single sample where the population variance is unknown, in which case we would use a t-test.
In most cases you will not know what the population variance is, so generally you'll be using the t-test. But in some cases, particularly when you're comparing to a hypothesized value, you may use a z-test. The forms of our hypotheses can be one-sided or two-sided.
So in the two-sided case, our null hypothesis is that the population mean mu is equal to the hypothesized value mu naught, versus the alternative that mu is not equal to mu naught. In other words, it doesn't matter whether it's greater than or less than; we're just looking for a difference between those two values. The one-sided alternative will depend on whether you're testing if the population mean is less than the theorized value or greater than it. If you're testing whether the population mean is less than the theorized value, your null hypothesis is that mu is greater than or equal to that value. And if you're testing whether the population mean is greater than the theorized value, your null hypothesis is that mu is less than or equal to that value.
And so note that the null hypothesis always has the opposite direction from the alternative hypothesis. Our test statistic in the situation where the population variance is known will be a z-test, and the z-statistic is computed as shown here. The sample mean x bar is computed from the data.
So that's our estimate of the population mean. And of course mu naught is the value of the population mean when the null is true. In other words, the theorized value.
So that difference between the observed and the expected, in other words our sample mean minus the theoretical value mu naught, that's our signal or effect size. And then we're dividing that by the variation in the data, the noise. And here we're dividing by the standard error of the mean.
So sigma represents our standard deviation. In this case, because the population variance is known, remember the population variance sigma squared is the variance, and the square root of that is our standard deviation sigma. We convert that to a standard error by dividing by the square root of our sample size. So we're dividing the signal by the noise, the standard error. A z-statistic, and in fact most of the statistics we're going to talk about, eventually breaks down into a signal-to-noise ratio as shown here. The z-statistic that you compute has a standard normal distribution if the null is true, and we can determine a p-value from that test statistic.
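Here is a minimal sketch of that signal-to-noise calculation in Python, assuming scipy is available; the function name one_sample_z is just an illustrative choice, not part of the course software.

```python
import math
from scipy.stats import norm

def one_sample_z(xbar, mu0, sigma, n):
    """One-sample z-test when the population variance is known.

    xbar  : sample mean (the observed "signal")
    mu0   : hypothesized population mean under the null
    sigma : known population standard deviation
    n     : sample size
    """
    se = sigma / math.sqrt(n)          # standard error of the mean (the "noise")
    z = (xbar - mu0) / se              # signal divided by noise
    p_two_sided = 2 * norm.sf(abs(z))  # area in both tails of the standard normal
    return z, p_two_sided
```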
So let's look at an example. Let's say we're curious as to whether the mean IQ of LA residents is different from the national average of 100, assuming IQ is normally distributed with a variance of 225. In this case the population variance is known; it's 225. We can compute the population standard deviation as 15, the square root of 225. And we're going to compare that to the known national mean IQ of 100. So mu is the true mean IQ of LA residents, and mu naught in this case is 100. Our null hypothesis is that mu is equal to 100 versus the alternative hypothesis that mu is not equal to 100, a two-sided test.
Right. And we get that because we stated the question as the IQ of LA residents just being different from the national average. We didn't say it was less than or greater than the national average; we just said it was different.
So there's no direction indicated here. So two-sided test. For our significance level we're going to set alpha equal to 0.05. And let's say we have data from seven LA residents where we've done an IQ test, and these are their scores. If we take the average of these seven values we get a mean value of 99.57143.
So this is our estimate of the population mean IQ mu, and we're going to use it to do our z-test. So here's our test statistic: z is equal to our observed sample mean, 99.57143.
We're subtracting off the known or theoretical value of 100, and we're dividing by the standard error. The known population variance was 225, so we take the square root of that to get the population standard deviation sigma, which is 15, and we divide by the square root of 7, our sample size, the number of residents that we tested. That gives us the standard error, and our z-statistic is minus 0.07559. So we can use the Bognar website to determine the two-sided p-value for a z-statistic of minus 0.07559. If we go to the Bognar website, we choose the normal distribution.
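To double-check the Bognar result, a quick sketch with scipy (again just an illustrative choice of tool) reproduces the same numbers:

```python
import math
from scipy.stats import norm

xbar, mu0, sigma, n = 99.57143, 100, 15, 7
z = (xbar - mu0) / (sigma / math.sqrt(n))   # about -0.0756
p_two_sided = 2 * norm.sf(abs(z))           # about 0.9397
print(z, p_two_sided)
```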
Make sure when you go to the normal distribution that mu, the population mean, is set to 0 and that the population standard deviation is set to 1. That gives you the standard normal distribution. Then you enter your z-statistic, minus 0.07559. We're doing a two-sided test, so make sure you choose the two-sided option.
And once you do that, the Bognar website will spit out the p-value, which in this case is 0.93974. The website also gives you a graphic that shows how the distribution looks in this situation: the blue bars mark the z-value on either side, and the area we're computing is the area beyond them, which together constitutes our 0.93974.
So again, just showing you the z-statistic. Our two-sided p-value from the Bognar website is 0.93974. Remember, the Bognar website only gives you five decimal places.
If you manually calculated the p-value, the precise p-value is actually 0.939742. Because this p-value is greater than our alpha level of 0.05, we do not reject the null hypothesis. We accept the null.
And therefore we would conclude that these data show that the mean IQ of LA residents of 99.6 is not statistically significantly different from the national average of 100, with a p-value of 0.94. When you report p-values you don't need to put all the decimals; it's reasonable to report just a couple of decimal places.
Obviously if the p-value was smaller, say something like 0.001, you want to be able to report that as well. So be reasonable with the reporting of p-values. Here, 0.94 is not significant, so you don't need to report four or five or six decimal places; two decimal places gets the point across.
Now, conclusion statements, when you've done a statistical analysis and you're drawing a conclusion from it, are very, very important. In particular, there are certain key phrases that should always be included in your conclusion statement. First of all, you want to restate what the test really is; so again, we're testing whether the mean IQ of LA residents is different from the national average.
Okay, so again you see that here in this statement. It's useful to report the sample statistic that you're testing; in this case it was 99.6. You need to state whether the result you obtained was or was not statistically significant, so the phrase "not statistically significant" is important; make sure you include it in your conclusion statement. Indicate the direction or sidedness of your result; in this case we're doing a two-sided test, so we're just saying whether it's different or not. If you were doing a one-sided test, you would include statements like "was not statistically significantly greater than" or "less than," depending on how the hypothesis was stated. Then state what the comparator value was, the national average of 100, and report the p-value. Now, many of you may have been taught that you can get away with just saying p less than 0.05 or p greater than 0.05, because 0.05 was the alpha level we used for our test. But reporting p-values in that manner doesn't give the reader a sense of how significant, or how not significant, your result was.
So in this particular case, for example, our p-value is 0.94, which is very close to 1; it's very far from significant. If I just said that our p-value was greater than 0.05, you wouldn't know whether my result was 0.051 or 0.94. So we always encourage you to report what we call precise p-values, something that gives the reader a sense of what your result really looked like.
Okay, let's look at another example. In the fall of 2016, we did an experiment in the PM510 class and asked students to theoretically flip a coin.
So in other words, we asked students to sit down and randomly write out heads and tails as if they were flipping a two-sided coin; they were supposed to do 100. Then, from those runs of heads and tails they wrote out, they counted how many times they had consecutive heads or consecutive tails. And this is basically showing you the distribution of that result from the fall 2016 class, shown in blue.
So the number of streaks of three, four, five, or six heads or tails that they produced, relative to what should have been observed theoretically, which is shown in green. Our question here is: is the mean longest streak of heads or tails in 100 coin flips by PM510 students pretending to flip coins different from what it should be if they were really flipping coins? So here we're going to let mu be the true mean longest streak by PM510 students pretending to flip coins. The theoretical distribution, the one shown here in green, has a population mean of 6.977432 with a population standard deviation of 1.7926.
So our null hypothesis is that mu is equal to 6.977432 versus the alternative that mu is not equal to 6.977432. This is again a two-sided hypothesis test because, the way we stated the question, we're just asking whether the result from PM510 students is different from the theoretical distribution.
Again, our alpha level is going to be 0.05, and in the fall of 2016 we had 62 PM510 students, and the average longest streak was 5.112903. Because we know the population standard deviation, we can again use a z-statistic. So here is our sample mean from our PM510 students.
Here's the theoretical value from the theoretical distribution, and here's the population standard deviation from that theoretical distribution. And again we're computing the standard error by dividing that standard deviation by the square root of the sample size for this class, the fall 2016 PM510 class, which was 62 students.
And when we do that we get a z-statistic of minus 8.189956. If we go to the Bognar website and plug in all the numbers, it's a very small p-value. It goes beyond the digits that are displayed on the Bognar website, so it's a p-value less than 0.00001.
And in fact, a reminder that when you use the Bognar website, if the p-value exceeds the number of digits shown on the screen, you have to report it as less than 0.00001, or whatever the number of displayed digits allows. If you compute the precise p-value, if I go and manually calculate it, the precise p-value is actually 2.6 times 10 to the minus 16, a very, very small p-value. Okay, so because the p-value is less than our alpha level of 0.05, we reject the null hypothesis and we accept the alternative. So here we would conclude that these data show that the mean longest streak of heads or tails in 100 coin flips by PM510 students pretending to flip coins, 5.11, is statistically significantly different from what would be expected from actual flips, with a p-value less than 0.00001. Now, alternatively, I personally would probably have reported 2.6 times 10 to the minus 16 here.
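A short sketch of how that precise p-value could be obtained outside the Bognar website, using scipy as an illustrative tool and the numbers from this example:

```python
import math
from scipy.stats import norm

xbar, mu0, sigma, n = 5.112903, 6.977432, 1.7926, 62
z = (xbar - mu0) / (sigma / math.sqrt(n))   # about -8.19
p_two_sided = 2 * norm.sf(abs(z))           # about 2.6e-16, far below Bognar's display limit
print(z, p_two_sided)
```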
But again, in this class, because you're using the Bognar website, you would report less than 0.00001 because that's what Bognar can give you. In real life, you really should report the precise p-value. Okay, so again, going through this conclusion statement.
Sort of restating the scientific question, right? So comparing longest streaks in PM510 students, the observed value from our data, 5.11, a statement of significance, statistically significantly different. Again, the direction: this "different" tells us that we're doing a two-sided test as opposed to a one-sided test. And what we're comparing to: what we'd expect from actual flips.
And then the level of significance, our p-value. Now again, in most cases, the population variance is unknown, and you're going to have to estimate the population variance from your sample data. So in that case, rather than using a z-statistic, we're going to use a t-statistic.
And you can see here that computing the t-statistic looks very, very similar to how we computed the z-statistic. The only difference is that we're substituting the sample standard deviation for the population standard deviation.
But everything else is the same. So here's our sample mean from our data and the theoretical value mu naught, and we're dividing by the sample standard deviation over the square root of the sample size. This t-statistic has a distribution that follows Student's t distribution with n minus 1 degrees of freedom.
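A minimal sketch of the one-sample t-statistic, mirroring the z version but with the sample standard deviation in the denominator; the function name is illustrative and scipy is assumed.

```python
import math
from scipy.stats import t

def one_sample_t(xbar, mu0, s, n):
    """One-sample t-test when the population variance is unknown.

    s is the sample standard deviation, used in place of sigma.
    """
    se = s / math.sqrt(n)                    # estimated standard error of the mean
    t_stat = (xbar - mu0) / se
    df = n - 1                               # degrees of freedom
    p_two_sided = 2 * t.sf(abs(t_stat), df)  # both tails of Student's t
    return t_stat, df, p_two_sided
```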
Again, remember, n is our sample size, so the t-statistic has n minus 1 degrees of freedom if the null is true. Now what's the difference between the t and the z distribution? The two are related; in fact, the z is essentially a limiting case of the t distribution.
The way they differ is that the t distribution has fatter tails, and the fatness of those tails depends on the sample size. So here we're just showing you the z distribution, which is our standard normal distribution with a mean of zero and a standard deviation of one, and then the equivalent t distribution in the case where the sample size, not the degrees of freedom, is equal to 12, so the degrees of freedom would equal 11, because it's n minus 1 degrees of freedom. You can see that the t distribution has wider tails, and that's related to the fact that we're estimating the population standard deviation based on the sample that we've collected. We compute the sample standard deviation and use it as an estimate of the population standard deviation. That creates additional uncertainty, and that's how the t-distribution takes it into account.
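To see the fatter tails numerically, here is a tiny sketch comparing the upper-tail area beyond 2 (an arbitrary cutoff chosen just for illustration) for the standard normal and for a t distribution with 11 degrees of freedom, i.e., a sample size of 12:

```python
from scipy.stats import norm, t

print(norm.sf(2))       # about 0.023 for the standard normal
print(t.sf(2, df=11))   # about 0.035 for t with 11 df -- more area in the tail
```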
So depending on your sample size, these tails of the distribution are going to change, and that reflects the uncertainty from using the sample standard deviation to estimate the population standard deviation. So let's look at an example of this. Let's say our question is: is the mean CD4 level for HIV-positive patients different from 400? And 400 is sometimes used as the cutoff for an abnormal CD4 level. Okay, so this is our theoretical or population value, 400. And we're going to assume that CD4 levels are normally distributed.
So again, mu, the true mean CD4 level in HIV-positive patients, is what we're going to compare to 400. So our null is that mu is equal to 400 versus the alternative that mu is not equal to 400. Again we're doing a two-sided test, because we stated the question as: is it different from 400? We didn't specify greater than or less than. And let's say we went out and collected data from 10 HIV-positive patients.
And we got a sample mean of 305.5 for the CD4 level and a sample standard deviation of 100. Again, we're going to set our alpha level to 0.05. And here's our t-statistic. Walking through this: the sample mean from our data, 305.5; our theoretical value of 400; our sample standard deviation of 100 from the sample we collected; and we compute the standard error by dividing the standard deviation by the square root of our sample size, which is 10. We had 10 HIV-positive patients.
And when we do that, we get a t-statistic of minus 2.98835. Again, remember our degrees of freedom for a t statistic is n minus 1, so here n is equal to 10, so our degrees of freedom equals 9. If we go to the Bognar website, we're going to pull up the t distribution. With the t distribution, you have a box here where you enter the degrees of freedom. You then enter your t-statistic, so again, minus 2.98835.
Again, we're doing a two-sided test, so make sure you choose two-sided. When you do that, you get your p-value of 0.01524. And you can see on the graphic that it's showing you plus 2.98835 and minus 2.98835, and we're computing the area that's more extreme than those values, the area here and the area here, which makes up about 1.5 percent of the total area. So our two-sided p-value is 0.01524; if you manually computed it, the precise p-value is 0.0152416. Again, because the p-value is less than our alpha level of 0.05, we can conclude that these data show that the mean CD4 of HIV-positive patients, 305.5, is statistically significantly different from 400, with a p-value of 0.015.
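As a check on this example, the same t-statistic and two-sided p-value can be reproduced from the summary numbers, assuming scipy:

```python
import math
from scipy.stats import t

xbar, mu0, s, n = 305.5, 400, 100, 10
t_stat = (xbar - mu0) / (s / math.sqrt(n))   # about -2.988
p_two_sided = 2 * t.sf(abs(t_stat), n - 1)   # about 0.0152 with 9 degrees of freedom
print(t_stat, p_two_sided)
```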
Let's look at another example where we're looking at white blood cell count, WBC, in leukemia. And our research question is, do children with newly diagnosed leukemia have elevated white blood cell counts? So is the mean white blood cell count in these children greater than 7,500?
And again, 7,500 is a typical cutoff used for white cell counts. The population parameter mu is the mean white blood cell count in children with newly diagnosed leukemia.
And because we're questioning whether they have elevated white cell counts, this is a one-sided alternative: we're testing whether the population mean is greater than 7,500. So our null is that mu is less than or equal to 7,500, which I've written out in scientific notation as 7.5 times 10 to the third cells per microliter of blood, versus the alternative that mu is greater than 7,500 cells per microliter of blood. And again our alpha level is set to 0.05.
So we go out and we collect data for white cell counts for 43 children with newly diagnosed leukemia. So here you see their data. It's written in scientific notation.
So you notice that it says times 10 to the 3 down at the bottom. And over on the right, I'm showing you a histogram of the data, and clearly these data are not normally distributed.
Remember that with a one-sample test, whether we're doing a z-test or a t-test, the data must look normally distributed. Therefore, we could not use this test on the data as displayed here, because they are not normally distributed. However, if we transform the data, for example by taking the logarithm, we can convert them to something that looks more normally distributed.
So if I take the log base 10 of my white cell counts and do a histogram again, the log-transformed data look much more normally distributed than the original data, so I can do my z- or t-test; in this case, we're doing a t-test.
Now I can do it on the log-transformed data. We'll talk about this in a little more detail in a future lecture, but you can do these types of transformations to take non-normally distributed data, make it look normally distributed, and then use tests that require the data to be normally distributed, as in this case.
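A sketch of what that transformation step could look like in Python; the white-cell values below are made-up placeholders, not the class data, and matplotlib is assumed just to eyeball the histograms:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical, right-skewed white-cell counts in units of 10^3 cells/uL -- NOT the class data
wbc = np.array([4.2, 9.8, 35.0, 2.1, 7.6, 48.3, 3.3, 12.4, 1.9, 22.7, 5.5, 6.1])

log_wbc = np.log10(wbc)        # log base 10 transform

fig, axes = plt.subplots(1, 2)
axes[0].hist(wbc)              # skewed on the original scale
axes[1].hist(log_wbc)          # closer to normal on the log scale
plt.show()
```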
So this is much better. But we have to rethink, or restate, our hypothesis, because now we're working with the log base 10 of the data.
So our new population parameter, which we're going to call mu star, is the mean of the log base 10 white blood cell count. Remember, we recorded those data in units of thousands, so 7,500 in scientific notation is recorded as 7.5. So we're taking the log base 10 of 7.5, and our new cutoff is 0.875.
So restating our hypothesis: mu star, the mean log base 10 white cell count, is less than or equal to 0.875, which represents our cutoff of 7,500, versus the alternative that mu star is greater than 0.875. Again our alpha level is 0.05. This is just showing you the analysis using SPSS.
So if you do a one-sample test in SPSS, you get a brief summary of your data: it shows the log of white cell count, tells you the sample size, 43, and that the mean value of those 43 observations was 0.6765, with a standard deviation of 0.74226 and a standard error of the mean of 0.11319. The middle panel shows your one-sample test. It's very important to note that it says "test value" here; this is the value you're testing against, and notice that it says 0.875. This is something that you enter manually when you set up the one-sample test. In a lot of cases, you're testing against zero.
And so it's preset to zero. So make sure in this case, because we're using this log-transformed value of 0.875, that you enter 0.875 in the options so that it does the right comparison. So here you see the t-value, minus 1.753, and the degrees of freedom; remember, n minus 1 degrees of freedom.
We have 43 observations, so our degrees of freedom is 42. SPSS gives you both a one-sided and a two-sided p-value for this result: the one-sided p-value it reports is 0.043 and the two-sided is 0.087.
It also tells you what the average difference is. In other words, it's taking the difference between each of the observations and the comparison value, 0.875.
So that average, the average of those differences, is minus 0.19847. And then over here it gives you the 95% confidence interval for this mean difference. So the mean difference of minus 0.19847 has a 95% confidence interval that goes from minus 0.4269 to positive 0.03.
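Those SPSS numbers can be reproduced from the summary statistics alone; here is a sketch assuming scipy, with illustrative variable names:

```python
import math
from scipy.stats import t

n, mean, sd, mu0 = 43, 0.6765, 0.74226, 0.875
se = sd / math.sqrt(n)                     # about 0.1132, the standard error SPSS reports
t_stat = (mean - mu0) / se                 # about -1.753
df = n - 1                                 # 42
p_two_sided = 2 * t.sf(abs(t_stat), df)    # about 0.087
ci = (mean - mu0 - t.ppf(0.975, df) * se,  # 95% CI for the mean difference,
      mean - mu0 + t.ppf(0.975, df) * se)  # roughly -0.427 to 0.03
print(t_stat, p_two_sided, ci)
```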
Now below that are some calculations for what are known as the effect sizes. This is a default output from SPSS. You do not need to worry about that.
We're not going to cover this in this class, but just know that it's there. Now remember, let's go back. We're doing a one-sided alternative: we want to test whether our observed data are greater than the cutoff value of the log of 7.5, or 0.875. Okay, so if you're doing a one-sided alternative, you cannot automatically say, "SPSS gave me a one-sided p-value, so my p-value must be 0.043." Unfortunately, that's not how things work, and this is a very misleading thing that SPSS does; it's convenient in one sense and misleading in another. Okay, so let's walk through what should happen. For one-sided tests, the easiest thing to do is to first find the p-value assuming the test is two-sided; in this case, that was 0.087. You then divide that p-value by 2, because you're doing a one-sided test.
So the sidedness gives you half the two-sided test; that's where the 0.043 comes from. It's basically 0.087 divided by 2, which is roughly 0.043.
So you divide the two-sided p-value by 2, and we're going to call that p-star. So 0.043 is our p-star.
Now, the one-sided p-value will either be that 0.043, p-star, or 1 minus p-star. To help you determine which one you should use, think about it in terms of the little table we're showing here.
Here we're showing you the two versions of the alternative hypothesis: you could have said that the alternative is that mu is greater than mu naught, or that mu is less than mu naught. So this is the actual statement of the alternative hypothesis. Then, when you do your test, you want to see whether your actual sample mean is consistent with the way the alternative was stated.
So in other words, if your sample mean is less than the cutoff value, as shown here, and you stated the alternative hypothesis as mu being less than the cutoff value, then you would report p-star. If the observed sample mean was actually greater than mu naught, your cutoff, and you had stated the alternative as less than, then you're going to report 1 minus p-star.
So again, you cannot take this one-sided p-value at face value. You need to walk through exactly what happened in your data and decide whether you're going to report p-star or 1 minus p-star.
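Here is a small sketch of that decision rule; the function name and arguments are illustrative, and the values in the final comment are the ones from this class example:

```python
def one_sided_p(p_two_sided, xbar, mu0, alternative):
    """Convert a two-sided p-value to the appropriate one-sided p-value.

    alternative : 'greater' if Ha is mu > mu0, 'less' if Ha is mu < mu0
    """
    p_star = p_two_sided / 2
    # Report p_star only when the sample mean falls on the side named by the alternative;
    # otherwise the correct one-sided p-value is 1 - p_star.
    if alternative == 'greater':
        return p_star if xbar > mu0 else 1 - p_star
    else:
        return p_star if xbar < mu0 else 1 - p_star

# For the white-cell example: one_sided_p(0.087, 0.6765, 0.875, 'greater') -> about 0.9565
```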
Okay, so in our example the alternative is stated as mu star being greater than 0.875; that's our mu naught, 0.875. Our sample mean was 0.67653, so let's go back for a second. Our mean value here, 0.6765, is actually less than 0.875, our cutoff value. So if we look at that table: we stated the alternative as mu being greater than the cutoff value.
But what we observed is that our sample mean was actually less than the hypothesized value. Therefore we're going to report 1 minus p-star, which is 1 minus 0.0435.
So our p-value is 0.9565. Okay, so that's our one-sided p-value here. Showing you that on the Bognar website: first you get the two-sided p-value from the t distribution, so you type in the degrees of freedom, 42, and here's our t-statistic of minus 1.753; so let's go back here, that's our t-value.
Minus 1.753. And here we've selected the two-sided p-value, which gives us 0.0869. Okay, now you have to choose which one-sided test you're going to do.
So again, our observed mean is less than the hypothesized value, so we're going to choose 1 minus p-star, or the right tail of the test. So I'm choosing P(X > x) for our minus 1.753, and that gives us a p-value of 0.95655. So we're computing this side of the distribution to get our p-value.
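Equivalently, the one-sided p-value is just the right-tail area of the t distribution, which is what that P(X > x) option computes; a one-line check assuming scipy:

```python
from scipy.stats import t

p_one_sided = t.sf(-1.753, df=42)   # P(T > -1.753) with 42 df -> about 0.956
print(p_one_sided)
```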
Okay, so please carefully walk through the one-sided versus two-sided p-value procedure to make sure you're very clear about how that's done, and then move on to the next module.