Hello and welcome to our lecture today. Let us briefly recap what we discussed in the last lecture, which was analysis of variance, or ANOVA. ANOVA allows us to probe the contributions of different factors to the variability in our measurements.
When you do an experiment, you have the experimental unit, which is the object on which you make your measurements. You have factors: for example, gender is a factor, or height and age can be factors, and so on. You have settings, which are the gradations of a factor. And there is also something called the treatment.
Sometimes the treatment is the same as the setting. Let us say your only factor is gender, men versus women; in that case your treatment is the same as your setting. However, your treatment can also be a combination: women with age less than 40 and greater than 40, and similarly men with age less than 40 and greater than 40. In this case you have 2 x 2 = 4 treatment combinations. If, on top of that, I introduce height as another factor with two settings, then I have 2 x 2 x 2 = 8 combinations.
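As an aside, this multiplication of settings can be checked mechanically. Here is a minimal sketch; the factor names, and in particular the short/tall split for height, are only illustrative choices, not part of the lecture's example:

```python
from itertools import product

# Illustrative factors and settings: gender x age group x height group.
# The "short"/"tall" split for height is a hypothetical gradation.
factors = {
    "gender": ["man", "woman"],
    "age": ["<40", ">=40"],
    "height": ["short", "tall"],
}

# Each treatment is one combination of settings, so the counts multiply.
treatments = list(product(*factors.values()))
print(len(treatments))  # 2 x 2 x 2 = 8
```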
So, the treatment may or may not be equal to the setting, and finally the response is what you measure. ANOVA allows you to understand the role, or effect, of each of these individual factors on the overall variability in the population. ANOVA is based on one assumption: each population is normally distributed with a common variance. So if you have more than one population, each of the populations is normally distributed with the same variance sigma squared.
In other words, each population is normally distributed and the variance is the same for every population; the means may or may not be equal to each other. Let me give an example of how that might look. This is slide 2. Imagine you have two sets of samples randomly selected from two populations,
each pair with identical sample means. Let me draw one particular case. In scenario A, one sample gives you values tightly clustered around its mean, and the other sample gives you values tightly clustered around its own mean.
So this is your x1 bar and this is your x2 bar. In the other scenario, the sample means x1 bar and x2 bar are the same as before, but the values within each sample are spread out much more widely.
So, these are your two situations. What you observe is that in scenario A the variability within the groups is much less than the variability between the groups, whereas in scenario B the variability within groups is much greater than that between groups. This is where the ANOVA approach is important, because ANOVA can be used to quantify exactly this comparison.
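The contrast between the two scenarios can be sketched numerically. The group values below are made up purely to illustrate the idea; `pvariance` from Python's standard library measures the spread within a list:

```python
from statistics import mean, pvariance

def within_between(groups):
    """Return (average within-group variance, variance of the group means)."""
    within = mean(pvariance(g) for g in groups)
    between = pvariance([mean(g) for g in groups])
    return within, between

# Scenario A: tight clusters around well-separated means -> between dominates.
a = [[10.0, 10.2, 9.8], [15.0, 15.1, 14.9]]
# Scenario B: same two group means, but widely spread values -> within dominates.
b = [[5.0, 10.0, 15.0], [10.0, 15.0, 20.0]]

wa, ba = within_between(a)
wb, bb = within_between(b)
print(wa < ba, wb > bb)  # True True
```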
So ANOVA can be used to compare two means, it can also be used to compare more than two means, and it can determine the effects of various factors. As we said, let us take an example. There are various ways in which you can draw the samples.
One of the experimental designs is called the completely randomized design, where random samples are selected independently from each of k populations. In this particular case the number of factors is equal to 1, since you just have the population as your variable, but the factor has k levels, because you have k such populations. So you can ask the question: are all the population means the same?
It is possible to use Student's t-test, but then you would have to test several hypotheses. You would test H0: mu1 = mu2, then another hypothesis H0: mu2 = mu3, then H0: mu1 = mu3, and so on.
So if you have k different populations, kC2 is the number of tests of hypothesis you would have to perform, and increasing the number of tests increases the probability of error. ANOVA provides you the ability to come to the conclusion with a single test. So what do we do in ANOVA? Let us set it up.
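To see why running many t-tests inflates the error, here is a rough sketch. It assumes, for simplicity, that the kC2 tests are independent, which is not exactly true for pairwise comparisons, so the number is only indicative:

```python
from math import comb

k = 5            # hypothetical number of populations
alpha = 0.05     # significance level of each individual t-test
m = comb(k, 2)   # number of pairwise tests, kC2

# If all k means are really equal and the tests were independent,
# the chance of at least one spurious rejection grows quickly with m.
familywise = 1 - (1 - alpha) ** m
print(m, round(familywise, 3))  # 10 tests, ~0.401 overall error rate
```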
So here you have populations with means mu1, mu2, mu3, ..., muk, and the variances of all these populations are the same. Anyone would want to test the hypothesis that the means are equal.
Let us say your sample sizes are n1, n2, n3, ..., nk, and xij is the jth measurement from the ith sample. You want to test the hypothesis that mu1 = mu2 = mu3 and so on. So your null hypothesis is H0: mu1 = mu2 = ... = muk, and the alternative hypothesis is that at least one of the means is different.
So what do you do? As I said, xij is your jth measurement from the ith sample. You have samples of sizes n1, n2, ..., nk, and you define n as the sum of all the ni. You calculate the following quantities. TSS is the total sum of squares, and x bar is the grand mean, (sum over all i, j of xij) / n.
TSS is defined as the sum over all i, j of (xij - x bar) squared. If you expand this, you can show that it is nothing but the sum of xij squared minus a term called the correction for the mean,
or CM. So CM is (sum of all xij) squared divided by n, and this you can write as G squared / n, where G represents the sum of all the measurements, the grand total.
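The shortcut TSS = (sum of xij squared) - CM can be verified numerically on arbitrary data; the values below are made up purely for illustration:

```python
# Check that the shortcut formula TSS = sum(x**2) - CM, with CM = G**2 / n,
# matches the definition TSS = sum((x - xbar)**2).
data = [8.0, 7.0, 9.0, 13.0, 10.0, 12.0, 16.0, 14.0]  # illustrative values
n = len(data)
xbar = sum(data) / n

tss_definition = sum((x - xbar) ** 2 for x in data)

g = sum(data)      # grand total G
cm = g ** 2 / n    # correction for the mean
tss_shortcut = sum(x ** 2 for x in data) - cm

print(abs(tss_definition - tss_shortcut) < 1e-9)  # True
```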
So, in ANOVA you distribute the TSS into two fragments: TSS = SST + SSE, where SST stands for the sum of squares for treatments and SSE is the sum of squares for error.
You define SST as the sum over i of ni (xi bar - x bar) squared, where xi bar is the sample average for the ith sample and ni is the sample size of the ith sample. This you can again expand, and you can show that SST is the same as the sum of Ti squared / ni minus CM,
where Ti is the total of all the measurements in the ith sample. SSE is given by the sum over i, j of (xij - xi bar) squared, and you can show that TSS is equal to SST plus SSE. Now, what can I write about the degrees of freedom for each of these terms? For TSS, you have n terms which you square and add up, with one constraint from the grand mean, so TSS has n - 1 degrees of freedom. For SST, you have k terms.
For SST you sum over all k terms with one constraint, so the degrees of freedom for SST is k - 1. For the errors, SSE = (n1 - 1) s1 squared + (n2 - 1) s2 squared + ... + (nk - 1) sk squared, so SSE has a contribution of (n1 - 1) + (n2 - 1) + ... + (nk - 1) degrees of freedom, where the minus 1 appears k times. This you can simplify: the sum of the ni is n, and you subtract
1 a total of k times, so this is nothing but n - k. Given that TSS = SST + SSE, the degrees of freedom are consistent: n - 1 = (k - 1) + (n - k).
So the degrees of freedom also add up, and corresponding to these terms you can define the mean squares. MST, the mean square for treatments, is SST divided by its degrees of freedom, that is, MST = SST / (k - 1), and MSE, the mean square for error, is given by MSE = SSE / (n - k).
When you put all of these together you generate an ANOVA table. A typical ANOVA table will look something like this:

Source       df       SS     MS     F
Treatments   k - 1    SST    MST    MST/MSE
Error        n - k    SSE    MSE
Total        n - 1    TSS

The columns are source, degrees of freedom, sum of squares, mean squares, and F value. The sources correspond to each of the treatments: if you have two treatments you will have two such rows, if you have four you will have four of these conditions, and you always have a contribution from error. The degrees of freedom are k - 1 and n - k, the sums of squares are SST and SSE, the mean squares MST and MSE, and you will get some F value.
So again, to recap: TSS = sum of xij squared minus CM, where CM is given by (sum of xij) squared / n; SST = sum of Ti squared / ni minus CM, where Ti is the total of sample i; and SSE = TSS - SST.
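The formulas collected so far can be wrapped in a small helper. This is a sketch of the one-way ANOVA computation from raw samples, not any particular library's API; it allows unequal group sizes:

```python
def anova_table(groups):
    """One-way ANOVA quantities from raw samples (a sketch of the
    formulas in the lecture; group sizes may differ)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)         # G
    cm = grand_total ** 2 / n                         # correction for the mean
    tss = sum(x ** 2 for g in groups for x in g) - cm
    sst = sum(sum(g) ** 2 / len(g) for g in groups) - cm
    sse = tss - sst
    mst = sst / (k - 1)                               # mean square for treatments
    mse = sse / (n - k)                               # mean square for error
    return {"df": (k - 1, n - k), "SST": sst, "SSE": sse,
            "MST": mst, "MSE": mse, "F": mst / mse}
```

For example, `anova_table([[1, 2, 3], [2, 3, 4]])` returns df = (1, 4) with F = 1.5.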
So let us do a case. Imagine you are doing an experiment where you wish to study the effect of nutrition on attention spans. In other words, you want to see whether there is a difference between students who have a full breakfast and then come to class, whether they pay more attention compared to students who have a light breakfast or no breakfast.
So I have a set of attention times, and I have three categories, three treatments: students who did not have any breakfast (no breakfast),
students who had a light breakfast, and students who had a heavy breakfast. For each of these cases I can calculate Ti, the total of sample i. You have n1 = 5 as the sample size for condition 1, n2 is also 5, and n3 = 5, so n, the sum of the ni, is 15. Then you can sum up the values in each condition to get the totals Ti.
T1 comes out to 47, T2 is 70, and T3 I can find to be 65. What is k? The k value is the number of populations; in our case k = 3, since you have 3 different populations. The grand total, the sum of all xij, is 47 + 70 + 65 = 182. So your CM, the correction for the mean, is (sum of xij) squared / n = 182 squared / 15, roughly 2208.27.
SST is given by the sum of Ti squared / ni minus CM, which you can calculate as 47 squared / 5 + 70 squared / 5 + 65 squared / 5 minus CM, and you will get a value of roughly 58.53.
For TSS you have to add up the squares of all the individual measurements: TSS = 8 squared + 7 squared + 9 squared and so on, squaring every term in each of the conditions, and then subtract CM. This comes out to a value of 129.73. From there you can calculate SSE = TSS - SST.
SSE = 129.73 - 58.53 gives you a value of 71.2. Based on these values we can fill in the ANOVA table. Your sources are your meal, or breakfast, and the error; the columns are df, SS, MS, and F. For breakfast the degrees of freedom is 2, because k = 3; for the error, n - k with n = 15 gives you a value of 12; and the total is 14. The SS for breakfast we calculated as 58.53,
and the SS for error as 71.2, with a total of 129.73. You can accordingly calculate MST = 58.53 / 2, roughly 29.27, and MSE = 71.2 / 12, roughly 5.93, and the totals:

Source       df    SS        MS       F
Breakfast     2     58.53    29.27    4.93
Error        12     71.2      5.93
Total        14    129.73
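The whole breakfast calculation can be reproduced from the summary quantities quoted above: the treatment totals, the sample sizes, and the sum of squared observations. Two assumptions are flagged in the code: the value 2338 for the sum of squares is recovered from TSS + CM and is only as precise as the quoted TSS of about 129.7, and the critical value 3.89 is the tabulated F for alpha = 0.05 with (2, 12) degrees of freedom:

```python
# Breakfast example from summary statistics quoted in the lecture.
totals = {"none": 47, "light": 70, "heavy": 65}
n_i = 5                                   # observations per treatment
k = len(totals)                           # 3 treatments
n = k * n_i                               # 15 observations total
g = sum(totals.values())                  # grand total G = 182
cm = g ** 2 / n                           # correction for the mean
sum_sq = 2338                             # sum of xij**2 (assumed: TSS + CM)

tss = sum_sq - cm
sst = sum(t ** 2 / n_i for t in totals.values()) - cm
sse = tss - sst
mst = sst / (k - 1)
mse = sse / (n - k)
f_stat = mst / mse

# Decision step: tabulated F_0.05 with (2, 12) degrees of freedom is
# about 3.89, so an F statistic near 4.93 rejects H0.
f_crit = 3.89
print(round(sst, 2), round(sse, 2), round(f_stat, 2), f_stat > f_crit)
```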
So this is your ANOVA table. Now, what do you need to do to test your hypothesis? Your H0 is mu1 = mu2 = ... = muk.
If these means were all the same, then the sample means would all come from one distribution: hypothetically you could have x1 bar here, x2 bar close by, and x3 bar close by, and in that case you might agree that H0 is true.
However, if your case was one where the sample means fall far apart, then your H0 is false. Now, about sigma squared: your assumption is that sigma squared is the common variance for all k populations. Your MSE, which is SSE / (n - k), is an estimate of sigma squared whether or not H0 is true. And if H0 is true, your MST, which is SST / (k - 1), should also give you an unbiased estimate of sigma squared.
So you can use the test statistic F = MST / MSE, and as before you can use the F test: calculate F alpha, the critical value with k - 1 and n - k degrees of freedom, and then check whether your F value is greater than F alpha. If it is, you know your null hypothesis is not true. With that I conclude my talk today, and so you get an idea of how ANOVA can be used: instead of repeatedly using Student's t-test to check whether means are the same, you can
come to the conclusion using a single approach, which is your ANOVA. You create your ANOVA table by distributing the total sum of squares, or TSS, into SST, the sum of squares for treatments, and SSE, the sum of squares for error, which accounts for the random error. From there you calculate the mean square for error, MSE, and the mean square for treatments, MST, and then use the statistic MST / MSE to find out whether your means are the same or distinct.
Thank you for your attention.