Module 23: Hypothesis Test for a Difference in Two Population Means (3 of 4)

The general steps of the hypothesis test for a difference in two population means are the same as always. As expected, the details of the conditions for use of the test and the test statistic are unique to this test, but they are similar in many ways to what we have seen before.

Step one: determine the hypotheses. The hypotheses for a difference in two population means are similar to those for a difference in two population proportions. The null hypothesis, H0, is again a statement of no effect or no difference. In this case, the null hypothesis is μ1 - μ2 = 0, which is the same as saying that μ1 = μ2.

The alternative hypothesis, Ha, can be any of the following: the difference is less than zero (μ1 - μ2 < 0), the difference is greater than zero (μ1 - μ2 > 0), or the difference is not equal to zero (μ1 - μ2 ≠ 0). These translate to μ1 < μ2, μ1 > μ2, and μ1 ≠ μ2, respectively. Let's see how we get that. If we start with μ1 - μ2 < 0 and add μ2 to both sides, we get μ1 < μ2. Similarly, adding μ2 to both sides of μ1 - μ2 > 0 gives μ1 > μ2, and adding μ2 to both sides of μ1 - μ2 ≠ 0 gives μ1 ≠ μ2. These are the possibilities for our alternative hypothesis.

Step two: collect the data. As usual, how we collect the data determines whether we can use it in the inference procedure. We have our usual two requirements for data collection: samples must be random to remove or minimize bias, and samples must be representative of the population in question. We use this hypothesis test when the data meet the following conditions: the two random samples are independent, and the variable is normally distributed in both populations. If it is not known whether the variable is normally distributed, samples of more than
30 will have a difference in the sample means that can be modeled adequately by the t-distribution. As we discussed previously, t-procedures are robust even when the variable is not normally distributed in the population. If checking normality in the populations is impossible, then we look at the distribution in the samples. If a histogram or dotplot of the data does not show extreme skew or outliers, we take it as a sign that the variable is not heavily skewed in the populations, and we use the inference procedure. Note: this is the same condition we used for the one-sample t-test previously.

Step three: assess the evidence. If the conditions are met, then we calculate the t-test statistic. The t-test statistic has a familiar form:

t = (sample statistic - hypothesized population parameter) / estimated standard error

We will not expect you to calculate it; instead, we will use StatCrunch. If you are curious, here is the formula for the t-statistic:

t = ((x̄1 - x̄2) - 0) / √(s1²/n1 + s2²/n2)

As we learned previously, the t-distribution depends on the degrees of freedom. In the one-sample and matched-pairs cases, df = n - 1. For the two-sample t-test, determining the correct degrees of freedom is based on a complicated formula that we do not cover in this course. For this reason, we will use StatCrunch for all of these calculations.

Step four: state a conclusion. To state a conclusion, we follow what we have done with other hypothesis tests: we compare our p-value to a stated level of significance. If the p-value is less than or equal to α, we reject the null hypothesis in favor of the alternative. If the p-value is greater than α, we fail to reject the null hypothesis; we do not have enough evidence to support the alternative hypothesis. As always, we state our conclusion in context, usually by referring to the alternative hypothesis.

Next, let's take a look at an example we'll call "context and calories."
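Before turning to that example, steps three and four can be sketched in Python. This is a minimal sketch under stated assumptions, not StatCrunch's method: it assumes the unpooled standard error √(s1²/n1 + s2²/n2) and uses the Welch-Satterthwaite approximation for the degrees of freedom, the kind of formula this course leaves to software. The function names `two_sample_t`, `welch_df`, and `decide` are hypothetical.

```python
import math


def two_sample_t(xbar1, s1, n1, xbar2, s2, n2, hypothesized_diff=0.0):
    """Step three: (sample statistic - hypothesized parameter) / estimated standard error."""
    # Unpooled estimated standard error of xbar1 - xbar2 (assumption: Welch form)
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return ((xbar1 - xbar2) - hypothesized_diff) / se


def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate degrees of freedom; usually not a whole number."""
    v1 = s1 ** 2 / n1
    v2 = s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


def decide(p_value, alpha=0.05):
    """Step four: reject H0 when the p-value is less than or equal to alpha."""
    if p_value <= alpha:
        return "reject H0 in favor of Ha"
    return "fail to reject H0"
```

For instance, `two_sample_t(10, 2, 4, 8, 2, 4)` equals √2 ≈ 1.414, and with equal standard deviations and equal sample sizes `welch_df(2, 4, 2, 4)` is 6, the same as n1 + n2 - 2. Turning t and df into a p-value still requires the t-distribution itself, which is the part we hand to StatCrunch.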
Does the company you keep impact what you eat? This example comes from an article titled "Impact of Group Settings and Gender on Meals Purchased by College Students." In this study, researchers examined this issue in the context of gender-related theories in their field. For our purposes, we look at this research more narrowly.

Step one: state the hypotheses. In the article, the authors make the following hypothesis: "The attempt to appear feminine will be empirically demonstrated by the purchase of fewer calories by women in mixed-gender groups than by women in same-gender groups." We translate this into a simpler and narrower research question: Do women purchase fewer calories when they eat with men compared to when they eat with women?

Here the two populations are women eating with women (population 1) and women eating with men (population 2). The variable is the calories in the meal. We test the following hypotheses at the 5% level of significance. The null hypothesis is always μ1 - μ2 = 0, which is the same as saying that μ1 = μ2. In this case, the alternative hypothesis is μ1 - μ2 > 0, which is the same as μ1 > μ2. Here μ1 represents the mean number of calories ordered by women when they are eating with other women, and μ2 represents the mean number of calories ordered by women when they are eating with men.

Note: It does not matter which population we label as 1 or 2, but once we decide, we have to stay consistent throughout the hypothesis test. Since we expect the number of calories to be greater for women eating with other women, the difference is positive if women eating with women is population 1. If you prefer to work with positive numbers, choose the group with the larger expected mean as population 1. This is a good general tip.

Step two: collect data. As usual, there are two major things to keep in mind when considering the collection of data:
Samples need to be representative of the population in question.
Samples need to be random in order to remove or minimize bias.

Representative samples: The researchers state their hypothesis in terms of women, and we did the same. But the researchers gathered data by watching people eat at the Hub Rock Cafe 2 on the campus of Indiana University of Pennsylvania during the spring semester of 2006. Almost all of the women in the data set were White undergraduates between the ages of 18 and 24, so there are some definite limitations on the scope of the study. These limitations will affect our conclusion and the specific definition of the population means in our hypotheses.

Random samples: The observations were collected from February 13, 2006, through February 22, 2006, between 11 a.m. and 7 p.m. We can see that the researchers included both lunch and dinner. They also made observations on all days of the week to ensure that weekly customer patterns did not confound their findings. The authors state that since the time period for observations and the place where they observed students were limited, the sample was a convenience sample. Despite these limitations, the researchers conducted inference procedures with the data, and the results were published in a reputable journal. We will also conduct inference with this data, but we also include a discussion of the limitations of the study with our conclusion. The authors did this also.

Do the data meet the conditions for use of a t-test? The researchers reported the following sample statistics:
In a sample of 45 women dining with other women, the average number of calories ordered was 850, and the standard deviation was 252.
In a sample of 27 women dining with men, the average number of calories ordered was 719, and the standard deviation was 322.

One of the samples has fewer than 30 women. We need to make sure that the distribution of calories in this sample is not heavily skewed and has no outliers, but we do not have
access to a spreadsheet of the actual data. Since the researchers conducted a t-test with this data, we will assume that the conditions are met. This includes the assumption that the samples are randomly selected and independent.

Step three: assess the evidence. We use StatCrunch to conduct the two-sample t-test. The output, "Two sample T summary hypothesis test," lists μ1 as the mean of population 1, μ2 as the mean of population 2, and μ1 - μ2 as the difference between the two means. Our null hypothesis is H0: μ1 - μ2 = 0, meaning that the difference is zero, and our alternative hypothesis is Ha: μ1 - μ2 > 0, meaning that the difference is greater than zero. StatCrunch then reports the hypothesis test results.

Note: We will not require you to calculate the test statistic by hand or to use an applet to find the p-value. If you are curious, here is how the t-statistic is computed. Population 1 is women eating with other women, so x̄1 = 850, the sample standard deviation is s1 = 252, and the sample size is n1 = 45. Population 2 is women eating with men, so x̄2 = 719, s2 = 322, and n2 = 27. If we wanted to compute the t-statistic by hand, we would use the formula

t = ((x̄1 - x̄2) - 0) / √(s1²/n1 + s2²/n2) = (850 - 719) / √(252²/45 + 322²/27) ≈ 131 / 72.47 ≈ 1.81

The formula for determining the degrees of freedom is even worse. We will leave these calculations to StatCrunch and focus instead on drawing conclusions.

Step four: state a conclusion.
Generic conclusion: The hypotheses for this test are H0: μ1 - μ2 = 0 (the difference between the means is zero) and Ha: μ1 - μ2 > 0 (the difference between the means is greater than zero). Since the p-value is less than the significance level, we reject the null hypothesis in favor of the alternative.
Conclusion in context: At Indiana University of Pennsylvania, the mean number of calories ordered by undergraduate women eating with other women is
greater than the mean number of calories ordered by undergraduate women eating with men. (Here the p-value is 0.0385, which is less than α = 0.05.)

A comment about conclusions: In the conclusion above, we did not generalize the findings to all women. Since the samples included only undergraduate women at one university, we included this information in our conclusion, so our conclusion is a cautious statement of the findings. The authors see the results more broadly, in the context of theories in the field of social psychology. In the context of these theories, they write: "Our findings support the assertion that meal size is a tool for influencing the impression of others. For traditional-age, predominantly White college women, diminished meal size appears to be an attempt to assert femininity in groups that include men."

This viewpoint is echoed in the following summary of the study for the general public: "Both men and women appear to choose larger portions when they eat with women, and both men and women choose smaller portions when they eat in the company of men, according to new research published in the Journal of Applied Social Psychology. The study, conducted among a sample of 127 college students, suggests that both men and women are influenced by unconscious scripts about how to behave in each other's company. And these scripts change the way men and women eat when they eat together and when they eat apart."

Should we be concerned that the findings of the study are generalized in this way? Perhaps. But the authors of the article address this concern by including the following disclaimer with their findings: "While the results of our research are suggestive, they should be replicated with larger, representative samples. Studies should be done not only with primarily White, middle-class college students, but also with students who differ in terms of race/ethnicity, social class, age, sexual orientation, and so forth."

This is an example of good statistical practice. It is often very
difficult to select truly random samples from the population of interest. Researchers therefore discuss the limitations of their sample design when they discuss their conclusions.

In the following activities, you will have the opportunity to practice parts of the hypothesis test for a difference in two population means.
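The context-and-calories example can be reproduced from its reported summary statistics alone. The Python sketch below is illustrative and rests on assumptions: it uses the unpooled (Welch) form of the test with the Welch-Satterthwaite degrees of freedom, and it computes the one-sided p-value by integrating the t-density with Simpson's rule so that only the standard library is needed. The helper names `t_pdf` and `upper_tail_p` are hypothetical, and nothing here claims to mirror how StatCrunch computes its output internally.

```python
import math

# Summary statistics reported in the study.
# Population 1: women dining with women; population 2: women dining with men.
XBAR1, S1, N1 = 850, 252, 45
XBAR2, S2, N2 = 719, 322, 27
ALPHA = 0.05


def t_pdf(x, df):
    """Density of the t-distribution; df may be non-integer (as Welch df usually is)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail_p(t, df, steps=2000):
    """P(T >= t) for t >= 0: 0.5 minus a Simpson's-rule integral of the density over [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# Step three: unpooled standard error, t-statistic, and Welch-Satterthwaite df
v1, v2 = S1 ** 2 / N1, S2 ** 2 / N2
t_stat = (XBAR1 - XBAR2 - 0) / math.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1 ** 2 / (N1 - 1) + v2 ** 2 / (N2 - 1))

# Ha: mu1 - mu2 > 0, so the p-value is the right-tail area beyond t
p_value = upper_tail_p(t_stat, df)

# Step four: compare the p-value to alpha
print(f"t = {t_stat:.2f}, df = {df:.1f}, p-value = {p_value:.4f}")
print("reject H0" if p_value <= ALPHA else "fail to reject H0")
```

Run as a script, this should give a t-statistic of about 1.81, matching the hand calculation, degrees of freedom just above 45, and a one-sided p-value below 0.05, so H0 is rejected as in the example; the last digits of the p-value may differ slightly from StatCrunch's. Simpson's rule is used only to avoid a dependency; with SciPy installed, `scipy.stats.t.sf(t_stat, df)` gives the same right-tail area.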