Transcript for:
Sample Size and Power Analysis Guide

In this module we're going to introduce the idea of estimating sample size and of statistical power. Sample size estimation is also known as power analysis or power calculations, and the idea is that you're trying to determine how much statistical power you have, or how many samples you need to achieve a certain level of statistical power, for any given analysis that you're going to perform. So basically what we're doing is estimating the minimum sample size that gives us a high probability of detecting a meaningful difference when that difference actually exists. Here a meaningful difference can be something like a clinically significant difference or a socially meaningful difference; you can define meaningful in a variety of different ways.

You can also frame this as finding the minimum detectable effect size for a given sample size. In other words, if you're restricted in the number of participants that you can recruit, you can flip the question and say: if I can recruit 20 people, what's the minimum effect size, or minimal difference, that I can detect given that specific sample size? So in this statistical world, size does matter.

There are several common ways to define a meaningful difference: you can derive it by looking at the scientific literature, you can get an expert opinion, or you can verify differences found in a pilot study. The idea here is that you want to identify how you're going to define what meaningful means. In the clinical sense, you can think about it as a clinically significant change in some biologic parameter. For example, if you're looking at cholesterol level, what's a meaningful difference, health-wise, in cholesterol level that one might want to detect?

So when we do sample size determinations, there are four things that we have to specify. First is the significance level of our statistical test, our alpha level. Of course we've generally been using 0.05 for this, but again there's nothing magical about
0.05, and you can change the alpha level to fit the situation that you want to be testing. Second is the type of test: are you doing a one-sided or two-sided alternative in the study that you're proposing? You decide that ahead of time. Third is the level of power, 1 minus beta; this is the probability of detecting an effect of the size you've chosen if it really exists. Typically we set power to be 80 percent, but again there's nothing magical about that number. If you want to increase the power to 90 percent or lower it to 70 percent, that's appropriate; you just need to be able to justify why you're setting the power where you are. And then finally, the effect size: this is how large a difference you want to be able to detect.

Sample size calculations, or power calculations, are extremely important when you do grant applications. If you write a grant and you don't include a sample size or power calculation, you can pretty much guarantee you won't get the grant. This is something very fundamental that goes into every grant application.

So again, if we go back to the general idea of our decision tree, we want to maximize making a correct decision, and we want to minimize our false positives and our false negatives. We want to set our alpha, our type 1 error level, to a level that's appropriate, trying to avoid a false positive, and we want to set our beta, our type 2 error level, our false negative rate, also to a level that's appropriate. And again, as a reminder, 1 minus beta is the power, our statistical power.

So let's look at a paired t-test situation. When we do a paired t-test, we have to specify five variables in order to compute a sample size. First is our alpha level, our level of significance, again typically 0.05. Second is our alternative hypothesis: are we doing a one-sided or two-sided alternative? Third is our level of power, 1 minus beta. Fourth is Delta, which is the difference we wish to detect; this again is
usually a clinically or biologically significant difference. Remember, when we talked about t-tests, we talked about the fact that we're looking at a signal-to-noise ratio, so this is the signal. And then fifth is the standard deviation: this is the assumed standard deviation of the paired differences. Remember, with a paired t-test we're looking at the differences in the two measurements and at the average difference that we observed, so what we're looking for is the standard deviation of those differences. That's the level of noise. You can usually get the standard deviation through pilot data, where you might do a pilot study to see what kind of variation you might observe, or you can get that number from published work or previous work that you've done. The important note here is that Delta and the standard deviation of the paired differences determine the effect size.

So there are two equations to compute sample size, based on whether you're doing a one-sided alternative or a two-sided alternative, and you can see that the two equations are very, very similar. They only differ by this Z value. If you're doing a one-sided alternative, you're plugging in Z of 1 minus alpha, and if you're doing a two-sided alternative, you're plugging in Z of 1 minus alpha divided by two. And that makes sense, because with the one-sided alternative we're only looking at one side of the distribution, while with the two-sided alternative we have to take into account that the differences might be positive or negative.

You notice that the brackets here are not square brackets; they're missing the little bottom tag. That's the notation for the least integer, the ceiling; in other words, it says to round up. So whatever we compute for this equation, we're going to round the number up to get our sample size.

So this is just a small table to help you with your
sample size calculations. These are common percentiles from the normal distribution that we use to get these critical values. If we go back and just focus on the one-sided alternative, we have to plug in a Z of 1 minus alpha, which represents the proportion of the distribution that's related to the type 1 error rate, and then a Z of 1 minus beta, which represents the proportion of the distribution related to our power. This table is just pulling values straight out of a standard normal distribution. You can get these values by going to the Bognar website and typing in the right values, but we're providing this table to make it a little easier for you. You'll notice we've bolded some numbers, and those are the common ones that are used.

So let's start with a one-sided test. If we're trying to determine our Z of 1 minus beta or our Z of 1 minus alpha, all we have to do is look up the appropriate value here. Let's say alpha is 0.05. Here's our column for alpha; I'm going to go down to 0.05, there it is, and so my Z of 1 minus alpha is 1.64485. Similarly, if I'm trying to look up my Z of 1 minus beta, let's say we've set power to 80 percent, so beta is 0.2. You notice we have a column here for beta, and we also have the corresponding column for 1 minus beta. If we come down to a beta of 0.2, that corresponds to a power of 80 percent, 0.8, so my corresponding Z of 1 minus beta is 0.84162. Now, similarly, if I was looking for my Z of 1 minus alpha over 2 for my two-sided alternative, again with alpha at 0.05, then my corresponding Z is 1.95996.

So let's look at an example. Let's say we're testing whether a new laser treatment is effective in reducing intraocular pressure, IOP, in glaucoma
patients. Patients will have their IOP measured before and after treatment, so again this is a paired situation: we're doing measurements before and after laser treatment. How many subjects should be enrolled in order to have an 80 percent chance of detecting a drop of five millimeters of mercury at the 0.05 significance level? Note that a drop of five millimeters of mercury is considered to be meaningful by the ophthalmologists.

So, walking through our requirements. We have our alpha, 0.05. The type of alternative is one-sided. Let's go back: we said we're looking to reduce intraocular pressure, and in the question we said we're trying to detect a drop of five millimeters of mercury, so it's a one-sided alternative. We want to look up a Z of 1 minus alpha, and because alpha is 0.05, that gives me 1.64485. Power we're going to set to 80 percent, so my beta is 0.2, and therefore my Z of 1 minus beta is 0.84162. For Delta, we're looking for a drop of five millimeters of mercury, so we have a minus five to indicate the direction, going down. For the standard deviation, let's assume we have some pilot data of six subjects who had changes in IOP that are shown here. If we compute the sample standard deviation, it is 9.6, so that's our estimate of the standard deviation.

So we plug that all in. Here's our standard deviation of 9.6, here's my Z of 1 minus alpha, my Z of 1 minus beta, and my Delta squared, and I get 22.79129. And again we're rounding up; we're always rounding up with this notation, so that means that my sample size is 23.
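
The paired calculation just described can be reproduced with a short sketch. This is an illustration under assumptions, not the course's own code: the slides with the equations are not part of this transcript, so the formula below (variance of the differences times the squared sum of the two Z values, divided by Delta squared, then rounded up) is inferred from the numbers in the worked example. The function name `paired_sample_size` and its parameters are hypothetical, and the Z values come from Python's standard normal quantile function rather than from the lookup table.

```python
import math
from statistics import NormalDist


def paired_sample_size(delta, sd_diff, alpha=0.05, power=0.80, two_sided=False):
    """Approximate number of pairs for a paired t-test (normal approximation).

    Hypothetical helper: the formula is inferred from the worked example.
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    # One-sided uses Z(1 - alpha); two-sided uses Z(1 - alpha / 2).
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)  # power is 1 - beta
    n_raw = sd_diff ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2
    return n_raw, math.ceil(n_raw)  # always round up


# IOP example: drop of 5 mmHg, SD of differences 9.6, alpha 0.05, power 80%.
n_raw, n_pairs = paired_sample_size(delta=-5, sd_diff=9.6)
print(f"raw value about {n_raw:.2f}, so enroll {n_pairs} subjects")
```

The sign of Delta does not affect the result because it is squared; it only records the direction of the one-sided alternative.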
And so, in order to have 80 percent power and to be able to detect a five millimeter of mercury drop with an alpha level of 0.05, I would need to study 23 people. Now, that's the minimum number that's required. In reality, when you do a study, people drop out: they start, and then they decide they don't want to participate anymore. So you might want to increase your sample size to take expected losses into account. Just as an example, for studies that I perform, based on previous experience, we typically see about a 10 percent loss in participants, so I always increase the sample size by 10 percent to take that into account.

Another important thing is that in this particular calculation we did a pilot study to get our sample standard deviation, our estimate of the standard deviation. We do not include the six pilot subjects as part of the 23 subjects for the study; we have to recruit 23 new participants to do our formal study.

Now let's look at the independent samples situation. What we just went through was for paired samples. The independent samples t-test is very similar. Again we have to set our alpha level, we have to decide on our alternative, whether it's one- or two-sided, and we have to set our power, typically 80 percent. Again we have to look at the difference, and here the difference Delta is going to be the difference in the means of the two groups that we're testing; again, that difference should be clinically or biologically significant. We then need to have the variances in both groups. This is important because, remember, in the independent samples t-test situation the variances in the two groups may be equal or may not be equal. So we need to specify the variances in the two groups; they can be equal, but you can also specify different variances for the two groups. And again the combination of
Delta and the two variances is going to determine your effect size.

So here is the equation, again for the one-sided and the two-sided alternatives. Like before, they only differ by the Z value, the critical value associated with our alpha level. You can see that these equations are very similar to the ones for the paired t-test. The only other difference relative to the paired t-test situation is that now we have to take into account the variances in the two groups, as opposed to the paired situation where we had the variance of the differences in the measurements. So, a very similar situation.

Let's look at an example. Let's say we're testing whether the mean blood pressure of oral contraceptive users, which is going to be our group one, is different from the blood pressure in OC non-users, group number two. So in the design we have two groups, two independent samples: one group are OC users, the other are not, and we're measuring their blood pressures. How many OC users and non-users do we need in order to have a 90 percent chance of confirming the difference in mean blood pressure found in our pilot study, at a 0.05 significance level? Notice here we're saying a 90 percent chance, so we've bumped up the power of our study from 80 percent to 90 percent.

So again, we're setting our alpha level at 0.05. The type of alternative is a two-sided alternative, so I'm going to look up a Z of 1 minus alpha over 2, which is 1.95996. For our power, we've set 90 percent power, so beta is 0.1, and the Z of 1 minus beta for 90 percent power is 1.28155. Then from our pilot study we found a mean plus or minus standard deviation of 133 plus or minus 15 millimeters of mercury for the OC users and 127 plus or minus 18 millimeters of mercury for the OC non-users. If I look at the difference in the means, 133 minus 127, I'm looking for a six millimeter of mercury difference based on my pilot data. And based on these two samples that I've
collected, I have estimates of the variances in the two groups: 225 and 324 respectively, based on the standard deviations of 15 and 18. So, plugging those into the equation, we plug in our variances for the two groups, our Z of 1 minus alpha over 2, our Z of 1 minus beta, and our Delta of six. That comes out to 160.23765, and again we're always rounding up, so that means we need 161 subjects.

This calculation differs from the paired situation, because the paired calculation tells you how many pairs you need, while here you're computing the sample size per group. So that means I need 161 subjects per group, and because we have two groups, I actually need a total of 322 participants, 161 times 2. A study like this, with equal numbers of subjects in each group, is what we call a balanced design, and this calculation is for balanced designs only. There is a way to compute power and sample size for unbalanced designs, but we're not going to introduce that in this class.

Now, when you do power and sample size calculations, especially when you're presenting them for a grant application, it's usually useful to provide a range of values to help a reviewer evaluate your grant. So here are just two examples. You might do a power table that shows how many participants you need at different levels of power. Another way is to do a power curve, like the one shown here on the right, where you plot the number of subjects against the power. Then you can show, for example, that if you wanted to achieve 80 percent power, here's 80 percent, you have to get about 100 and whatever participants. The power curve is nice because it gives the reviewer a visual representation of how quickly the power changes, because you could have power curves that look like this, or a power curve that can look like this, and so it
gives them a much better idea of what you're dealing with.

Now, there are common software packages that do just sample size or power estimation, and these include things like nQuery and PASS. SAS also has a series of power and sample size functions built into it, SPSS and other packages have ways to do power and sample size as well, and there are resources available online to do these types of calculations.

I just want to note that all the sample size formulas that we've introduced in this lecture use an approximation, and software like nQuery, PASS, SAS, or SPSS does these power calculations in a more sophisticated way. So if you try to compare the hand calculation that we're doing in these lectures with something that's done in one of these software packages, you might get slightly different answers, just because of the way the calculations are being done.

All right, so that concludes this module, and that also concludes the series of modules for this week. You can move on to next week's lecture material.
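
As a rough check on the oral contraceptive example from earlier in this module, here is a minimal sketch of the independent samples calculation for a balanced design, followed by a small power table of the kind a reviewer might want to see. Everything named here is an assumption for illustration: the function `two_sample_size_per_group` is hypothetical, and the formula (sum of the two variances times the squared sum of the two Z values, divided by Delta squared, then rounded up) is inferred from the numbers in the worked example, since the slide equations are not in the transcript. It uses the same normal approximation as the hand calculation, so dedicated power software may give slightly different answers.

```python
import math
from statistics import NormalDist


def two_sample_size_per_group(delta, var1, var2, alpha=0.05, power=0.80,
                              two_sided=True):
    """Approximate per-group sample size for an independent samples t-test.

    Hypothetical helper for balanced designs only (equal group sizes);
    the formula is inferred from the worked example.
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)  # power is 1 - beta
    n_raw = (var1 + var2) * (z_alpha + z_beta) ** 2 / delta ** 2
    return math.ceil(n_raw)  # always round up


# OC example: means 133 vs 127, SDs 15 and 18, two-sided, 90% power.
n_per_group = two_sample_size_per_group(
    delta=133 - 127, var1=15 ** 2, var2=18 ** 2, power=0.90
)
print(f"{n_per_group} per group, {2 * n_per_group} in total")

# A small power table: per-group sample size at several power levels.
for power in (0.70, 0.80, 0.90):
    n = two_sample_size_per_group(6, 225, 324, power=power)
    print(f"power={power:.2f}  n per group={n}")
```

Plotting the same values with the number of subjects on one axis and power on the other would give the power curve described in the lecture.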