Two-sample tests of hypothesis, part two. This is the series of the three t-tests. The first type of t-test is the pooled t: you're testing two samples coming from different populations where the population standard deviations are unknown, so we use the sample standard deviations, but we make an assumption that they are in fact equal. The second type is the test of unequal variances: this is where the population standard deviations are unknown but also unequal. The last type is called the paired t-test: this is when we test dependent samples.

So, comparing population means with equal but unknown population standard deviations. Again, this is called the pooled t-test. You will not have to distinguish between when the population standard deviations are equal or unequal for this course. However, there is a procedure using something called the F statistic where you actually check statistically whether the standard deviations are equal or unequal.

Let's read the assumptions of the pooled test. In this case the t distribution is used, and if one or more of your samples has fewer than 30 observations, the required assumptions are: both populations must follow a normal distribution, which is an assumption you don't necessarily have to verify, since that's what was assumed when we look at the problems; the populations must have equal standard deviations; and the samples are from independent populations. The method does not require that we know the standard deviation of the population, which gives us a great deal more flexibility when investigating the difference in the sample means. So there are two major differences between this test and the previous tests. Here we just assume that the population standard deviations, although unknown, are equal, and because of this assumption we pool the sample standard deviations and we use a t statistic as the test statistic.

All right, so how do we find the t statistic for the pooled
t-test? Well, the first thing we need to calculate is the pooled variance. The pooled variance, s-p squared (s squared for variance, p for pooled), is found by taking n1 minus 1 times the variance of sample one, plus n2 minus 1 times the variance of sample two, all divided by n1 plus n2 minus 2. This denominator, n1 plus n2 minus 2, is in fact the degrees of freedom for this problem, so when you look at your t table, the degrees of freedom you're going to be using for the pooled t-test is n1 plus n2 minus 2.

Now the t-test statistic, which is effectively step five of the problem: t is x-bar 1 minus x-bar 2, divided by the square root of the pooled variance times the quantity 1 over n1 plus 1 over n2. The next series of slides will go through an example of this process.

All right, so the first question: comparing population means with equal, unknown population variances, so the pooled t-test. Again, we're making a few assumptions about the general format of these standard deviations, but we're going to assume that the population standard deviations are equal. Our question is: is there a difference in the mean mounting times between the two different techniques? We're going to use a 10 percent level of significance. The next slide will go through a full solution of the pooled t-test for this problem in lots of detail.

The pooled t-test, example one. Let's start with our data set and our information. Our question here is: is there a difference in the mean time to mount an engine on the frames of the lawnmowers? We have two techniques that we're observing, the Wells method and the Atkins method. This company wants to know if there's a better way to mount the engines, or whether it really doesn't matter which way you mount them because the times are no different. We're going to be testing this at the 10 percent level. Let's start off by setting up our hypotheses. So, step one, the
null and alternate. Remember, the question here is based on the alternate hypothesis. "Is there a difference" means that the alternate says the mean for Wells is not equal to the mean for Atkins, which means the null will be that the mean for Wells is equal to the mean for Atkins. Since we're not given a direction, this is going to be a two-tailed test.

Let's go on to step two. Step two: this is an alpha of 0.10.

Step three: this is going to be a t-test. Why is it a t-test? Because, first of all, you don't know the population standard deviations, so sigma 1 and sigma 2 are unknown. We're going to make an assumption in this question that they are equal. There's not a way to know this just by looking at the data set itself. You have to conduct what's called an F test, where you find the F statistic by taking the variance of one over the variance of two and then compare it to a critical value. But we're not going to be doing this in this class, so you will be told if the standard deviations are equal or unequal.

All right, let's go on to step four, our decision rule. The decision rule is set up effectively the same way as it always was. This is a t-test, so we're going to have a t critical value based on the degrees of freedom, alpha, and the number of tails. For the pooled t-test, the degrees of freedom are found by taking n1 plus n2 minus 2.
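As a rough sketch of the pooled variance and pooled t statistic described above, here is a small Python version that uses only the standard library. The function name `pooled_t` is made up for illustration. The summary numbers in the usage lines are the ones this lecture's mounting-time example arrives at (means of 4 and 5, standard deviations of 2.9155 and 2.0976, sample sizes of 5 and 6), and the critical value 1.833 is assumed to be read from a t table rather than computed.

```python
import math


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Pooled-variance t-test from summary statistics (hypothetical helper)."""
    # Degrees of freedom for the pooled test: n1 + n2 - 2
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    # Test statistic: difference in means over its pooled standard error
    t = (mean1 - mean2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return sp2, t, df


# Summary statistics from the mounting-time example (Wells first, Atkins second)
sp2, t_stat, df = pooled_t(4, 2.9155, 5, 5, 2.0976, 6)

# Two-tailed critical value for alpha = 0.10 and 9 degrees of freedom,
# taken from a t table (an assumption of this sketch, not computed here)
t_critical = 1.833
reject_null = abs(t_stat) > t_critical

print(f"pooled variance = {sp2:.3f}, t = {t_stat:.3f}, df = {df}")
print("reject H0" if reject_null else "fail to reject H0")
```

Working from summary statistics keeps the sketch aligned with the hand calculation; with the raw observations available, the same steps would simply start by computing each sample's mean and standard deviation.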
There's a document that I uploaded called the five two-sample tests, and in it you'll see a breakdown of all the different types of tests and the degrees of freedom that each one has. In this case n1 is equal to 5 and n2 is equal to 6, so the degrees of freedom for this problem are 5 plus 6 minus 2, or 9 degrees of freedom. As we already established, alpha is equal to 0.1 and this is a two-tailed test, therefore our t critical value in this case is going to be 1.833.

Let's draw our diagram. These are t diagrams, so we have negative 1.833 and positive 1.833, and here are our rejection regions. If we get a test statistic between negative 1.833 and positive 1.833, we fail to reject, and when we fail to reject in this case we're basically saying that we don't have evidence that the two methods are different from each other. If we reject, we're saying that the methods are different from each other.

So we have our regions set up. Now we go on to step five. Step five is the t-test, but before we can do that we need to set up the pooled variance, and in fact what we need to find first is the mean and standard deviation of each of our methods. If we look at our data set and enter that into our calculator, we find that the mean, x-bar, for Wells is equal to 4 and x-bar for Atkins is equal to 5. When we calculate the standard deviation using the standard deviation equation, we get that the standard deviation for Wells is 2.9155 and the standard deviation for Atkins is 2.0976.

Now we need to calculate the pooled variance, so let's fill this in where we were working. We have x-bar Wells is 4, x-bar Atkins is 5, s for Wells is 2.9155, and s for Atkins is 2.0976. Now we need our pooled variance, and the pooled variance is, remember, n for Wells minus 1 times the variance of Wells, so the standard
deviation of Wells squared, plus n for Atkins minus 1 times the standard deviation of Atkins squared, which is basically the variance, and the denominator is n1 plus n2 minus 2. So s-p squared in this case will be 5 minus 1 times 2.9155 squared, plus 6 minus 1 times 2.0976 squared, all divided by 5 plus 6 minus 2, and the pooled variance is 6.222.

Let's put that into the test statistic equation. t equals x-bar Wells minus x-bar Atkins, and that's going to be 4 minus 5, divided by the square root of 6.222 times the quantity 1 over 5 plus 1 over 6, and that gives us a value of negative 0.662. Now if we go to our diagram up top with our rejection regions, negative 0.662 is in the fail-to-reject region, so we fail to reject the null and we conclude that there is not enough evidence that Wells and Atkins have different times for assembly. So for the alternate hypothesis question here, is there a difference in the mean mounting times, we would say no, there is not a difference.

Now don't forget we do have a step six, and step six is our p-value. We need to find the p-value for this question. It'll be a fairly large number, since we're going to fail to reject the null. Remember, degrees of freedom is 9, so we're going to be looking at 9 degrees of freedom on our t table, and we're going to be looking at where we would find 0.66 in that table. If you look at your table at the two-tailed section, you have 0.2, you have 0.1, you have 0.05, and when you go to 9 degrees of freedom, the first number there is 1.383, the second number is 1.833, et cetera. What you'll notice is that this 0.66 is going to come somewhere before this 1.383, therefore what you find is that the p-value is going to be greater than 0.2, and if we compare that
to our level of alpha, which was 0.1, p is greater than alpha, and therefore we fail to reject the null: there's not enough evidence to reject the null and conclude that there's a difference in the mean assembly times.

So as we saw in this question, the decision is not to reject the null hypothesis, because negative 0.662 falls within the region between negative 1.833 and positive 1.833. Therefore there is not enough evidence that there is a difference in the mean times to mount the engine.

All right, comparing population means with unknown and unequal population standard deviations. This is the second type of t-test. The difference between this test and the pooled test we just did is, first off, the formula that goes into step five. Notice that it's not a pooled variance in the denominator; instead, the standard deviations are kept separate. The other really important feature is that the degrees of freedom has a fairly complex formula. This formula is meant to adjust the degrees of freedom downward. It's almost like an average between the two sample sizes, but this is the formulation you would use when you're doing this type of process.

Let's look at our first sample question. Here we're comparing paper towel absorbencies. The store wishes to compare a set of store brand towels versus name brand towels, so they've selected a random sample of nine store brand towels, represented here, and 12 name brand towels. Our alternate hypothesis question is: using the 10 percent level, is there a difference in the mean amount of liquid absorbed by the two types of paper towels? The next slide will go through a tutorial video solving this full problem.

Unknown but unequal standard deviations. All right, let's look at this question. This question is asking us to figure out the paper towel
absorbency. Here we have a series of store brand paper towels, so this is store brand and this is name brand. Think of this as like Vons brand or Ralphs brand, and this is like Bounty. Our question in this case is: using the 10 percent level, is there a difference in the mean amount of liquid absorbed by the two types of paper towel? We want to know whether there really is a difference between store brand and name brand paper towels.

Let's start setting this up. The real big difference that we're going to see going from the pooled t-test to the unequal-variance t-test is that you're going to have a different degrees of freedom equation, and you're not going to need a pooled variance in this case. But let's set up the problem.

Step one, null and alternate. The alternate hypothesis in this case says there is a difference, so you'd say the mu of the store brand is not equal to the mu of the name brand. The null would be mu s is equal to mu n. Again, since there's no direction (we're not saying one is better or more absorbent than the other), this is going to be a two-tailed test.

Step two: alpha is going to be 0.1. I know the last two problems had an alpha of 0.1, but really it could be anything that's on the t table.

Step three: this is going to be a t-test, and specifically an unequal variance, or unequal standard deviation, t-test, because sigma 1 and sigma 2 are unknown and not assumed to be equal. So we're going to have to solve for the sample standard deviations.

Step four, the decision rule. In this case there is a little bit of difficulty, and that is in the equation for the degrees of freedom. I'm going to write this out for you as it's a little bit tricky; hopefully I get it right. Let's start with the numerator: we have the variance of 1 divided by n1, plus the variance
of 2 divided by n2, and then this whole thing is squared. Now the denominator: we have s1 squared over n1, all of that squared and divided by n1 minus 1, plus s2 squared over n2, also squared, and that gets divided by n2 minus 1.

What we need before we move on and solve this part is the means and standard deviations for this problem, so let's solve for those here on the bottom. You have the data; I recommend you pause this video, solve them yourself, and check your work. Let's start with the mean. x-bar for the store brand is of course the sum of all the store brand values divided by the number of observations, which is nine. In this case we get a mean of 6.444, and we get a standard deviation for the store brand of 3.321. Our name brand paper towels have a mean of 9.417 and a standard deviation of 1.621.

Now our degrees of freedom, once we plug in all those fun values. Let me see if we can do this fairly quickly. In the numerator we get 3.321 squared divided by 9, plus 1.621 squared divided by 12, and remember this whole thing gets squared. The denominator is going to be 3.321 squared over 9, all of that squared and divided by 9 minus 1, plus 1.621 squared over 12, with that whole thing squared and divided by 12 minus 1. If you go through the whole process with the order of operations done correctly, you should get 10.86. What you're going to do with this is round it down. No matter what the decimal is, you always round down, so in this case we're going to use a degrees of freedom of 10.
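Because the order of operations in this degrees of freedom formula is easy to get wrong by hand, here is a minimal Python sketch of it using only the standard library. The function name `unequal_variance_df` is hypothetical, and the inputs are the standard deviations and sample sizes from the paper towel example above.

```python
import math


def unequal_variance_df(s1, n1, s2, n2):
    """Degrees of freedom for the unequal-variance t-test (hypothetical helper)."""
    v1 = s1 ** 2 / n1  # variance of sample 1 divided by n1
    v2 = s2 ** 2 / n2  # variance of sample 2 divided by n2
    numerator = (v1 + v2) ** 2
    denominator = v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1)
    return numerator / denominator


# Store brand: s = 3.321, n = 9; name brand: s = 1.621, n = 12
df_exact = unequal_variance_df(3.321, 9, 1.621, 12)

# Always round down, whatever the decimal is
df_used = math.floor(df_exact)

print(f"df = {df_exact:.2f}, rounded down to {df_used}")
```

Naming the two "variance over n" pieces first means each one is computed once and reused in both the numerator and the denominator, which is where most hand-calculation slips happen.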
What this equation actually does is take the two sample sizes, n1 and n2, and shrink the degrees of freedom below the n1 plus n2 minus 2 we would use for the pooled test, and rounding down keeps us on the conservative side. So now we have a degrees of freedom of 10. To find our t critical value, remember it depends on degrees of freedom, alpha, and the number of tails. Degrees of freedom is 10, alpha is 0.1, and we have two tails, so our t critical value in this case is going to be 1.812.

Let's draw the distribution. Since it's two-sided, we have negative 1.812 and positive 1.812, and here are our rejection regions. Again, if we get a value greater than positive 1.812 we reject, if we get a value less than negative 1.812 we reject, and of course in the middle we fail to reject.

Now we can finally go on to step five, and step five is purely our t-test. t is equal to x-bar 1 minus x-bar 2, so it's 6.444 minus 9.417, divided by the square root of the variance of one over n1, 3.321 squared divided by 9, plus the variance of two over n2, 1.621 squared divided by 12. This gives us a value of negative 2.474. If we go to our distribution with our critical values, negative 2.474 is clearly in our rejection region, so we reject the null. We conclude that there is a difference between the name brand and store brand paper towels, and therefore we accept the alternate hypothesis that the mean of the store brand is not the same as the name brand. So if we go to the question that started this series, is there a difference in the mean amount of liquid absorbed by the two types of paper towels, the answer would be yes.

Now we need the p-value, so step six. Remember, our degrees of freedom is 10, so we look at our t table for 2.474 in the row for degrees of freedom of 10.
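The step five calculation and the decision rule for the paper towel example can be checked with a short Python sketch, standard library only. The function name `unequal_variance_t` is made up for illustration, and the critical value 1.812 is assumed to come from a t table (10 degrees of freedom, alpha of 0.10, two tails) rather than being computed.

```python
import math


def unequal_variance_t(mean1, s1, n1, mean2, s2, n2):
    """t statistic with the two sample variances kept separate (hypothetical helper)."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Store brand first, name brand second, using the lecture's summary values
t_stat = unequal_variance_t(6.444, 3.321, 9, 9.417, 1.621, 12)

# Two-tailed critical value from a t table (assumed, not computed here)
t_critical = 1.812
reject_null = abs(t_stat) > t_critical

print(f"t = {t_stat:.3f}")
print("reject H0" if reject_null else "fail to reject H0")
```

Comparing the absolute value of t against the positive critical value covers both rejection regions of the two-tailed test in one check.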
Again, our two-tailed values are 0.2, 0.1, 0.05, 0.02, et cetera. If we look at this chart, we find that 2.474 is somewhere between the 0.05 and 0.02 columns: it's between the values of 2.228 and 2.764. So what we would say is that the p-value is between 0.02 and 0.05, which are clearly smaller than our alpha of 0.10. Once again we reject the null, and we can conclude that the absorption rates of name brand and store brand paper towels are not the same.

The two-sample tests of dependent samples. What we see here is that we get a new test statistic. The test statistic is found by taking d-bar, the mean difference, and dividing it by the standard deviation of the differences (between the two time periods, or the two prices, et cetera) over the square root of n. This looks very similar to our z equation from chapter 7. The other difference in this section is that the degrees of freedom for a dependent sample is n minus 1.

The next slide will set up a problem. We have this question: Nickel Savings and Loan wishes to compare the two companies it uses to appraise the value of residential homes. Nickel Savings selected a sample of 10 residential properties and scheduled both firms for an appraisal. The results are reported in thousands of dollars, which doesn't matter in this case, and we're going to test the following at the 5 percent level: can we conclude there is a difference in the mean appraised value of the homes? The next slide will go through a tutorial and solve this question in its entirety.

So this is when we're testing dependent samples. Remember, dependent samples means we're testing the same object or subject across time, or by different entities, et cetera. For example, if you started a new diet and you were interested to know if it was effective, you would be weighed at the beginning of the diet and then
at the end, and therefore you'd be the same subject being weighed at different time periods, and that's when you would use a dependent t-test.

Let's look at this question. Nickel Savings and Loan wishes to compare the two companies it uses to appraise the value of residential homes. Nickel Savings selected 10 residential properties and scheduled both firms for an appraisal. The results are reported in thousands of dollars; we don't really have to worry about that. Our question: at the 5 percent level, can we conclude that there is a difference in the mean appraised value of the homes? So there is our alternate hypothesis.

I'm going to set up the entire problem, and then we'll see where we make our modifications. Again, the modifications are going to be in the degrees of freedom and in our last step, where we calculate our test statistic.

Let's start with our null and alternate hypotheses. H1 is that mu Schadek is not equal to mu Bowyer, and the null is that mu Schadek is equal to mu Bowyer. Again, since there is no direction implied, this is a two-tailed test. There's also an alternative way to show the hypotheses for this type of problem, which I don't really want to go through as it makes things a little more confusing, but I'm going to write it down just in case. The alternate way of showing the null and alternate for a dependent test looks like this: if you're looking for a difference, you set up the alternate as the mean difference is not equal to zero, implying there's a difference, and the null would be that the mean difference is equal to zero, stating that there is no difference between the two groups. That's the alternate way of writing the null and alternate for dependent samples.

Step two: alpha is 0.05, given in the problem.

Step three: as we're looking at the same object appraised by two different entities, this is going to be a dependent t-test. Any time you see the words "before and after", or
you're looking at the same subject or object over time, it's your dependent t-test.

Step four is our decision rule. As we saw, the degrees of freedom vary from test to test. In this case the degrees of freedom for a dependent sample are simply n minus 1, because you're only looking at these 10 homes. That's your entire sample, and so the degrees of freedom are just n minus 1, because that's the number of observations that are free to vary when calculating the mean. In this case we have 10 minus 1, or 9 degrees of freedom. To get our t values, we have a t critical value based on 9 degrees of freedom, an alpha of 0.05, and a two-tailed test, and that gives us a t critical value of 2.262. So we have negative 2.262 and positive 2.262, and here are our rejection regions: reject and reject, and of course the middle area is our fail-to-reject region.

Step five is a little bit different now, and we need to do a little bit of work to calculate the statistic. Remember, the t statistic is d-bar over the quantity s-d divided by the square root of n. d-bar is the mean difference between the two groups, so in this case the mean difference in appraisal value between S and B, and s-d is the standard deviation of the differences between the two groups. What we need to do first is calculate the mean and the standard deviation of the differences before we can plug them into that equation.

Let's have a look at our information. Here we have our data for Schadek and Bowyer. In order to calculate our differences, we take the difference between the two for each home, so the difference is S minus B, we'll call it. The first one is 235 minus 228, so that difference is 7; the second is 5; 231 minus 219, that's 12;
2; then we have 7, 7, 4, negative 5, 3, and 4. So there are our differences. Remember, to get the mean we have to add up all the differences and divide by n. We add up this column and get 46, divide that by the ten observations, and find that the mean difference is 4.6.

Now for the standard deviation, just like we calculated standard deviation before, we take d minus d-bar, squared, for each difference. We have 5.76, 0.16, 54.76, 6.76, 5.76, 5.76, 0.36, 92.16, 2.56, and 0.36, so the sum of this column is 174.4. But remember, to get the standard deviation we take the square root of 174.4 divided by n minus 1, so 10 minus 1. The standard deviation is 4.402.

So we have our mean and our standard deviation, and now we can plug them into our t equation. t is equal to 4.6 divided by 4.402 over the square root of 10, and this gives us a t statistic of 3.305. When we look at our critical values, 3.305 is out there in the upper tail, and therefore we reject the null and find that there is a difference in the appraised values of the homes between the two companies, so S is not equal to B. That's our finding. Just to go back to the question and answer it: at the 5 percent level, can we conclude there is a difference in the mean appraised value of the homes? The answer would in fact be yes.

And for our p-value, step six, we have 9 degrees of freedom, and 3.305 is somewhere between the 0.01 and 0.001 two-tailed columns. So the p-value is going to be between 0.001 and 0.01, very small, and we know that p is smaller than alpha, therefore we again reject the null. So that concludes our three different types of t-tests. In this case we found that there is a difference between the mean appraised values from the two companies.

One last component here: when we're thinking about dependent versus independent samples, how do we know which we have? Well, one type of dependent sample is characterized by a
measurement, followed by an intervention of some kind, and then another measurement. This could be a before and after. The other type of dependent sample is characterized by matching or pairing observations. So why do we prefer dependent samples to independent samples? Well, by using dependent samples we are able to reduce the variation in the sampling distribution.

All right, that concludes chapter 11. Please make sure you do the practice problems and the Connect problems, and of course we will do another set of practice problems during our next Zoom session.
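For practice, the dependent (paired) t-test from the appraisal example can be sketched in Python with only the standard library. The list below holds the ten S minus B differences read out in the lecture, the variable names are chosen for illustration, and the critical value 2.262 is assumed to come from a t table (9 degrees of freedom, alpha of 0.05, two tails).

```python
import math
import statistics

# Differences in appraised value (S minus B), in thousands of dollars
differences = [7, 5, 12, 2, 7, 7, 4, -5, 3, 4]

n = len(differences)
d_bar = statistics.mean(differences)  # mean difference
s_d = statistics.stdev(differences)   # sample standard deviation (divides by n - 1)

# Test statistic: d-bar over (s_d divided by the square root of n)
t_stat = d_bar / (s_d / math.sqrt(n))
df = n - 1

# Two-tailed critical value from a t table (assumed, not computed here)
t_critical = 2.262
reject_null = abs(t_stat) > t_critical

print(f"d-bar = {d_bar:.1f}, s_d = {s_d:.3f}, t = {t_stat:.2f}, df = {df}")
print("reject H0" if reject_null else "fail to reject H0")
```

Note that `statistics.stdev` is the sample standard deviation, which matches dividing the sum of squared deviations by n minus 1 as done by hand; `statistics.pstdev` would divide by n and give a different answer.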