Hey guys, one other frustrating thing about learning A/B testing is how you calculate the sample size. You might often read literature that gives you some sample size formula, but how is that formula actually derived? That is something you have to build intuition on if you want to succeed in product data science and also prepare for interviews. So hopefully in the next couple of minutes I can provide the information you need in order to understand how sample size is calculated.

Let's start with the definition of sample size calculation. What you're trying to do is calculate the sample size based on the underlying statistical model, the significance level, the statistical power, and the minimum detectable effect (MDE). When you look at the literature on sample size calculation, you often see this formula: n equals the Z critical value of the significance level plus the Z critical value of your statistical power, with that sum squared, multiplied by the variance of the point estimate, and divided by delta squared. In symbols, n = (z_alpha + z_power)^2 * variance / delta^2.

If I were to label what each parameter means, what you have is the following. This n represents the sample size per group; it's not the sample size required for the entire experiment. When you're running an A/B test you have a control and a treatment condition, so this calculation gives you the sample size required for one of those two groups, and you have to multiply it by two to get the sample size required for the entire experiment. The first Z critical value is the one for the significance level, which is something we're going to talk about in a bit more detail, and the second is the Z critical value of the statistical power. We also need to estimate what the variance is, which is not a straightforward process, so that's also something we're going to do more of a deep dive on. This numerator is divided by delta squared. Keep in mind that this
delta squared is not the same thing as lift, and that's something I will clarify later.

Now let's start by understanding the significance level. The significance level is essentially the threshold below which the p-value of an observed effect is considered statistically significant. But what does that really mean? Basically it means the following. Say you ran an A/B test and you see an improvement in your KPI of interest, let's just say sign-up rate, and it is better for the treatment than what you saw in the control. The question is whether the difference you observed between the two happened because of random chance or because of an underlying difference between the treatment and the control. So you need to evaluate the statistical significance of that effect, and the way you gauge that is by comparing the p-value, the probability of observing the effect or something more extreme under the assumption that the null hypothesis is true, against the significance level. If this p-value is less than the significance level, the alpha, which defines your rejection region, then you deem that effect to be statistically significant.

Now, the null hypothesis has a corresponding distribution, the null distribution, and you're calculating the probability of observing the effect or something more extreme under this null distribution, because remember, you're assuming that the null hypothesis is true. When you evaluate it using a statistical test, this null distribution is normalized to the standard normal distribution. What that means is you're shifting and scaling your data in such a way that the mean becomes zero and the standard deviation becomes one.

In most cases the statistical test is a two-tailed test. What that means is that the effect may be in the negative direction, meaning the treatment might be worse than the control, or it might be in the positive
direction, meaning that the treatment is better than the control. Because this is a two-tailed test, you have to split the alpha: the region on the left side gets alpha divided by two, and the region on the right gets alpha divided by two as well. The threshold on this scale that creates each rejection region is the Z critical value, or the z-score. On the left side it is the z-score at alpha divided by 2, and on the right side it is the z-score at 1 minus alpha divided by 2. You can think of the subscript as the percentile under this probability distribution. So let's presume that alpha is 0.05. On the left side, you're trying to find the z-score of the 2.5th percentile of this null distribution, whereas on the right side 1 minus alpha divided by 2 gives you 0.975, so you're trying to find the z-score that represents the 97.5th percentile of this distribution.

One thing I want to tell you is that when you are interviewing for a product data science role, questions about how to calculate the sample size are fair game, and they might require you to recall the corresponding Z critical value for a given alpha. So I want to list them out for you, because I think it's really helpful. The common alphas you see in A/B testing, or just statistical testing in general, are 0.01, 0.05, and 0.10, and oftentimes the statistical test is going to be two-tailed. The corresponding Z critical values are 2.58 for an alpha of 0.01, 1.96 for an alpha of 0.05, and 1.64 (1.645 to three decimals) for an alpha of 0.10. Notice that this z-score is the one on the right side, because the subscript is 1 minus alpha divided by 2.
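If you'd rather check those critical values than memorize them, here is a minimal sketch in Python. It assumes only the standard library's `statistics.NormalDist`; the function name `two_tailed_z_critical` is just a label I made up for illustration.

```python
from statistics import NormalDist


def two_tailed_z_critical(alpha: float) -> float:
    """Right-side Z critical value for a two-tailed test.

    This is the z-score at the (1 - alpha / 2) percentile of the
    standard normal distribution (mean 0, standard deviation 1).
    """
    return NormalDist().inv_cdf(1 - alpha / 2)


if __name__ == "__main__":
    # The common alphas mentioned above.
    for alpha in (0.01, 0.05, 0.10):
        z = two_tailed_z_critical(alpha)
        print(f"alpha = {alpha:.2f} -> z at 1 - alpha/2 = {z:.3f}")
```

The only design choice here is splitting alpha in half before looking up the percentile, which is exactly the two-tailed split described above.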
What if you're trying to get the z-score for the left side? Because the standard normal distribution is symmetric, whatever z-score you have on the right side, you just take its negative for the left side.

Let's proceed to the next concept we need to cover, which is statistical power. Statistical power is the probability of detecting an effect given that the effect exists. What is the underlying difference between the significance level and statistical power? The significance level gives you the threshold of statistical evidence you need in order to reject the null hypothesis or not, with the null hypothesis often claiming that the two distributions are the same. However, it doesn't give us a sense of the probability of detecting an effect if the effect really exists.

So how do we get the power in this case? You have the null distribution with its Z critical value, and you have the alternative distribution, which is centered on the true effect. You look at where the Z critical value from the null distribution falls on the alternative distribution. The region of the alternative distribution to the left of this critical value is your beta, the type 2 error rate, which is the probability of failing to reject the null hypothesis when the alternative hypothesis is true. The region to the right of this critical value is your statistical power, the 1 minus beta.

Just like I covered the values you need for the significance level, let's also think about the critical values you need to know for statistical power. The common values of statistical power are 0.80, 0.90, and 0.95, and the corresponding critical values are 0.84, 1.28, and 1.64. So this is something you definitely want to know when it comes to thinking about the statistical power values you need in order to
calculate the sample size.

Now let's talk about calculating delta. There's often a misconception that the delta in this formula is the lift. When you think about the minimum detectable effect, keep in mind that it is usually expressed in terms of lift, meaning a percent improvement from the baseline parameter to the new parameter of interest. Let's take a very simple example. Let's presume that the sign-up rate of the current email form is 50% and the treatment sign-up rate is 55%. In terms of lift, which is the way the MDE is expressed, that is a 10% lift from the control to the treatment, because going from 50% to 55% is a 10% gain relative to the control sign-up rate. However, in terms of the absolute difference it is not 10; it is 5 percentage points. That is what the delta expresses: the difference in percentage points. So it's important that when you calculate the sample size you don't plug the MDE expressed as a lift into this delta squared, because they're not the same thing. Delta squared is the squared absolute difference between the two parameters, denoted by theta 2 minus theta 1, squared.

Another thing I want to quickly mention is that the direction in which you subtract the theta parameters does not matter, because the difference is squared. Oftentimes when we're calculating the sample size we're evaluating either a mean or a proportion. For comparing two means, delta squared is mu 2 minus mu 1, squared; for proportions, it is p2 minus p1, squared.

Now let's see how we calculate delta squared in action. Suppose we're given the following problem: the MDE as a relative lift is 20% and the baseline CTR is 15%. What is delta squared? In order to calculate delta squared we need to get the treatment CTR, and we
can derive this from the MDE and the CTR of the baseline. The MDE is 20 and the baseline CTR is 0.15. To get CTR 2, the CTR of the treatment, we apply the following formula: CTR 2 equals CTR 1 times 1 plus MDE divided by 100. The reason we divide the MDE by 100 is that the MDE is in percent form, so dividing by 100 gives us the fraction form. Then we add one and multiply this term by the CTR of the baseline, which is 0.15, to get CTR 2. So what is CTR 2? It is 0.15 times 1 plus 20 divided by 100, which is 0.15 times 1.2, which equals 0.18. Now that we know CTR 2 and CTR 1, what is delta squared? Delta squared equals CTR 2 minus CTR 1, squared, and when we plug in the values it is 0.18 minus 0.15, squared, which equals 0.0009. So this is an example of how we translate our MDE into delta squared, which is the form required for the sample size calculation.

There's one other parameter we need to cover, which is variance. Variance is not that straightforward, because you can't simply plug in known values; you have to think about how you're estimating it. To do so, it makes sense to first understand the variance in the one-sample case. What is the variance formula for a one-sample proportion? It is p times 1 minus p, where p is the success rate and 1 minus p is the failure rate. What about the two-sample case? There it is the pooled variance, which equals p2 times 1 minus p2 plus p1 times 1 minus p1. The subscript represents the group, 1 for group one and 2 for group two, and pooled variance here means that you're summing the variances of the two groups.

Now what about the mean case? When you're trying to estimate the population variance,
you need the sample variance, which is s squared. It equals the summation from i equals 1 to k, with k representing the sample size, of x i minus x bar, squared, where x bar is the sample mean, and this sum is divided by k minus 1. The reason you divide by k minus 1 is that it gives you an unbiased estimate of the population variance.

When you're calculating the sample size required for two-sample means, the formula is the following: you simply take your baseline variance, which is s1 squared, and multiply it by two to get the pooled variance for the two-sample case. You might be wondering why you can't calculate the sample variance of group 2. The reason is that you don't know the sample variance of group 2, which is usually the treatment, until you've actually run the experiment. Think about the sequence of designing an experiment: before you gather any data for your treatment you have to calculate your sample size, and the sample size calculation requires you to know your variance. So it is a bit of a chicken-and-egg problem. To know the sample size required for your A/B test you need to know the variance, yet unless you've actually run the experiment you don't know the variance for the treatment. That's why it's important to note that this variance is ultimately an estimate, an approximation of what you expect the variance to be, which assumes the treatment has roughly the same variance as the baseline. You take the sample variance of the baseline condition, which you can gather from current performance, and multiply it by two to get the pooled variance that you plug into the sample size formula.

There's one final point I'd like to make about the sample size formula. Oftentimes you might see
it in the following form: the sample size per group is approximately 16 times the variance divided by delta squared. This is an approximation of the sample size per group in a test of two-sample means. But where does the 16 come from? That's something I want to clarify. The 16 comes from assuming the common parameter values: an alpha of 0.05 and a beta of 0.2, meaning your statistical power is 0.80. And because this is the two-sample mean case, the variance term is the pooled variance, two times the sample variance, as I mentioned a minute ago. When we plug in these values, the Z critical value at alpha equals 0.05 is 1.96, the Z critical value when the statistical power is 0.80 is 0.84, and the pooled variance in the two-sample mean case is 2 times the sample variance, all divided by delta squared. The sum 1.96 plus 0.84 is 2.8, which squared is 7.84, and times 2 that is 15.68. So when we simplify, we get 15.68 times the sample variance of your baseline condition divided by delta squared, which is approximately 16 times the variance divided by delta squared. Hopefully this provides additional context on a popular formula you often see when it comes to sample size calculation.

All right, now that you have an understanding of how to calculate the sample size, let's apply this in an actual A/B testing example. Suppose that in an A/B test the desired MDE in a two-tailed test is 10%, at a significance level of 0.05 and a statistical power of 0.80. The baseline mean is 10 and the variance is 20.
What is the total sample size required for the experiment? Assume that there are two groups.

This question is a bit of a mouthful, so we're going to deconstruct it. What we know about this test is that it is a two-tailed test comparing two means. We have a baseline and a treatment, the significance level is 0.05, the statistical power is 80%, and the MDE is 10%. What this means is that with the baseline mean at 10 we want to see a 10% improvement in the treatment, and we want to validate that with statistical significance. The main question is the total sample size required for the experiment. Keep in mind that we're calculating the total sample size, not just the sample size per group. We have two groups, the control and the treatment, so we find the sample size required on a per-group basis and then multiply it by two, because the sample sizes for the control and the treatment are the same.

We start with the formula we discussed before: n equals the Z critical value of alpha plus the Z critical value of the statistical power, squared, times the pooled variance, divided by delta squared. We're going to divide and conquer: first figure out the critical values, the delta, and the variance, and then get the sample size required on a per-group basis. What are the critical values? The critical value for the alpha is 1.96, and the critical value for the statistical power is 0.84. What about delta squared? Our baseline mean is 10 and our MDE is 10%, which means we want to see the treatment mean at 11. Given that the baseline mean is 10 and the treatment mean is 11, delta squared is 11 minus 10, squared, which just gives us 1.
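The step from a relative MDE to an absolute delta is where people usually slip, so here is a minimal sketch of that conversion in Python. The function name `absolute_delta` and its parameter names are my own labels for illustration; the arithmetic is the same "baseline times 1 plus MDE divided by 100" step used for the CTR example and for the mean example.

```python
def absolute_delta(baseline: float, mde_percent: float) -> float:
    """Convert a relative MDE (lift, in percent) into an absolute difference.

    treatment = baseline * (1 + mde_percent / 100)
    delta     = treatment - baseline
    """
    treatment = baseline * (1 + mde_percent / 100)
    return treatment - baseline


if __name__ == "__main__":
    # CTR example: baseline CTR 0.15 with a 20% relative lift.
    ctr_delta = absolute_delta(0.15, 20)
    print(f"CTR delta = {ctr_delta:.4f}, delta squared = {ctr_delta ** 2:.6f}")

    # Mean example: baseline mean 10 with a 10% relative lift.
    mean_delta = absolute_delta(10, 10)
    print(f"mean delta = {mean_delta:.4f}, delta squared = {mean_delta ** 2:.4f}")
```

Note that the function returns delta, not delta squared, so the squaring stays visible at the point where you plug it into the sample size formula.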
What about the pooled variance? It equals two times the sample variance, and we know that the sample variance for the baseline is 20, so we simply take 2 times 20, which gives us 40. Now that we know the critical values, the delta, and the pooled variance, we have all the values we need to calculate the sample size per group. It is 1.96 plus 0.84, squared, which is 7.84, times 40, divided by 1, which equals 313.6. We round that up, because you can't have a fraction of a sample, so the sample size we need per group is 314. The question ultimately asks for the total sample size required for the experiment, so we take 314 and multiply by 2, which gives us 628. That is the total sample size required for the experiment.

So there you have it. Hopefully this provides comprehensive detail on how you calculate the sample size per group in an A/B experiment. If you want to learn more about this, make sure you check out data entity.com, where there's an A/B testing course that covers more in-depth details along with coding exercises on A/B testing. Please give a like if you found this video helpful, and I'll see you in the next one.
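As a recap of the worked example, here is a minimal sketch of the whole calculation in Python for the two-sample mean case. It is not from the video: the function names are hypothetical, and it assumes equal group sizes and a treatment variance equal to the baseline variance, which is the same "two times the baseline variance" estimate used above. It looks up unrounded critical values with `statistics.NormalDist`, so the intermediate result is about 313.96 instead of 313.6, but it rounds up to the same 314 per group.

```python
import math
from statistics import NormalDist


def sample_size_per_group(alpha: float, power: float,
                          baseline_variance: float, delta: float) -> int:
    """Per-group sample size for a two-tailed test comparing two means.

    n = (z_{1 - alpha/2} + z_{power})^2 * (2 * baseline_variance) / delta^2
    Assumes both groups share the baseline variance (an estimate).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for power = 0.80
    pooled_variance = 2 * baseline_variance
    n = (z_alpha + z_power) ** 2 * pooled_variance / delta ** 2
    return math.ceil(n)  # round up: a fraction of a sample is not possible


def rule_of_thumb_per_group(baseline_variance: float, delta: float) -> int:
    """The '16 times variance over delta squared' shortcut (alpha 0.05, power 0.80)."""
    return math.ceil(16 * baseline_variance / delta ** 2)


if __name__ == "__main__":
    # Worked example: baseline mean 10, 10% MDE -> delta = 1, variance 20.
    n = sample_size_per_group(alpha=0.05, power=0.80,
                              baseline_variance=20, delta=1.0)
    print(f"per group: {n}, total: {2 * n}")
    print(f"rule of thumb per group: {rule_of_thumb_per_group(20, 1.0)}")
```

The rule-of-thumb function is there only to show how close the 16 shortcut lands to the full formula; it is slightly conservative because 16 is rounded up from 15.68.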