Lecture on A/B Testing Sample Size Calculation

Jul 17, 2024

Introduction

  • A common challenge in A/B testing: calculating the required sample size.
  • Relevant to product data science work and interview preparation.

Sample Size Calculation

  • Definition: Sample size is calculated from the statistical model, significance level, statistical power, and minimum detectable effect (MDE).
  • Common Formula:
    • $n = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2 \cdot \text{Variance}}{\Delta^2}$
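
As a minimal sketch, the formula can be written in Python using only the standard library: `statistics.NormalDist().inv_cdf` looks up the Z critical values instead of a table. The function name `sample_size_per_group` is hypothetical, rounding up with `ceil` is our assumption, and `variance` is taken to be the pooled variance of the two groups.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(alpha: float, power: float, variance: float, delta: float) -> int:
    """Hypothetical helper: per-group n for a two-tailed test.

    n = (Z_{1-alpha/2} + Z_{1-beta})^2 * variance / delta^2,
    where `variance` is the pooled variance and `delta` the absolute difference.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = std_normal.inv_cdf(power)          # critical value for power (1 - beta)
    n = (z_alpha + z_power) ** 2 * variance / delta ** 2
    return ceil(n)  # round up so the test is not underpowered


print(sample_size_per_group(alpha=0.05, power=0.80, variance=40.0, delta=1.0))
```

Doubling the result gives the total experiment size. Because exact quantiles (1.95996..., 0.84162...) are used rather than the rounded table values, results can differ slightly from hand calculations.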

Key Components

Parameters in the Formula

  • n: Sample size per group (double it for the total experiment size).
  • Z critical values: one for the significance level (α) and one for the statistical power (1 - β).
  • Variance: Must be estimated, typically from baseline data; covered in the Variance section below.
  • Delta: Absolute difference between the treatment and control parameters; not the same as lift.

Significance Level (α)

  • The p-value threshold below which an effect is considered statistically significant.
  • p-value: Probability of observing an effect at least as extreme as the one observed, assuming the null hypothesis is true.
  • Two-tailed test: α is split evenly, with α/2 in each tail of the distribution.
  • Common two-tailed Z-scores ($Z_{1-\alpha/2}$):
    • α = 0.01 -> Z = 2.58
    • α = 0.05 -> Z = 1.96
    • α = 0.10 -> Z = 1.64
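
The table values can be reproduced with the standard normal quantile function. A minimal sketch using Python's `statistics.NormalDist` (the helper name `z_for_alpha` is ours), assuming a two-tailed test so that the quantile taken is 1 - α/2:

```python
from statistics import NormalDist


def z_for_alpha(alpha: float) -> float:
    """Two-tailed critical value: alpha/2 sits in each tail, so use the 1 - alpha/2 quantile."""
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f} -> Z = {z_for_alpha(alpha):.2f}")
```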

Statistical Power (1 - β)

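  • Definition: Probability of detecting an effect (rejecting the null hypothesis) when a true effect of the assumed size exists.
  • Regions under the alternative-hypothesis curve: the area on the non-rejection side of the critical value is the Type II error rate (β); the rest is the statistical power (1 - β).
  • Common Z-scores for power (one-tailed, $Z_{1-\beta}$):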
    • Power = 0.80 -> Z = 0.84
    • Power = 0.90 -> Z = 1.28
    • Power = 0.95 -> Z = 1.64
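
The power Z-scores come from a one-tailed quantile, because β lies entirely on one side of the critical value. A sketch in the same style (the helper name `z_for_power` is hypothetical):

```python
from statistics import NormalDist


def z_for_power(power: float) -> float:
    """One-tailed critical value for power (1 - beta): the `power` quantile of N(0, 1)."""
    return NormalDist().inv_cdf(power)


for power in (0.80, 0.90, 0.95):
    print(f"power = {power:.2f} -> Z = {z_for_power(power):.2f}")
```

Power = 0.95 and a two-tailed α = 0.10 share the Z-score 1.64 because both correspond to the 0.95 quantile of the standard normal.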

Delta Calculation

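  • Misconception: Delta ≠ Lift. Delta is the absolute difference; lift is the relative difference (Delta divided by the baseline).
  • Worked example:
    • Baseline rate: 50%, Treatment rate: 55%, Lift = 10% -> Delta = 0.05.
    • Formula for Delta²: $\Delta^2 = (\theta_2 - \theta_1)^2$, where $\theta_1$ is the control parameter and $\theta_2$ the treatment parameter.
  • In practice, θ is a mean ($\Delta = \mu_2 - \mu_1$) or a proportion ($\Delta = P_2 - P_1$).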
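
A small sketch of converting between lift and Delta, using the numbers from the example above. The helper names are illustrative only.

```python
def delta_from_lift(baseline: float, relative_lift: float) -> float:
    """Absolute difference implied by a relative lift: theta2 - theta1 = theta1 * lift."""
    return baseline * relative_lift


def lift_from_delta(baseline: float, delta: float) -> float:
    """Relative lift implied by an absolute difference: delta / theta1."""
    return delta / baseline


delta = delta_from_lift(baseline=0.50, relative_lift=0.10)  # 50% baseline, 10% lift
print(f"Delta = {delta:.2f}, Delta^2 = {delta ** 2:.4f}")
print(f"Lift = {lift_from_delta(0.50, 0.05):.0%}")
```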

Variance

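  • Variance of a single proportion: $p(1-p)$.
  • Two-sample case (proportions): pooled variance $= P_1(1-P_1) + P_2(1-P_2)$.
  • For a metric measured as a mean: sample variance $s^2 = \frac{1}{K-1}\sum_{i=1}^{K} (X_i - \bar{X})^2$, where K is the number of observations and $\bar{X}$ their mean.
  • Approximation: if the treatment variance is assumed equal to the baseline variance, pooled variance ≈ 2 × baseline variance.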
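
The variance formulas above, sketched in Python. `statistics.variance` already divides by (K - 1); the helper names and the baseline observations are hypothetical, purely for illustration.

```python
from statistics import variance


def proportion_variance(p: float) -> float:
    """Variance of a single proportion (Bernoulli metric): p(1 - p)."""
    return p * (1 - p)


def pooled_proportion_variance(p1: float, p2: float) -> float:
    """'Pooled' variance as used in the sample size formula: P1(1 - P1) + P2(1 - P2)."""
    return proportion_variance(p1) + proportion_variance(p2)


# Metric measured as a mean: sample variance with the (K - 1) denominator
baseline_observations = [8.0, 12.0, 9.0, 11.0, 10.0]  # hypothetical data
s2 = variance(baseline_observations)

# Approximation: assume the treatment variance equals the baseline variance
approx_pooled = 2 * s2

print(pooled_proportion_variance(0.50, 0.55), s2, approx_pooled)
```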

Approximation Formula

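  • Common approximation: $n \approx 16 \cdot \text{Variance} / \Delta^2$ per group, where Variance is the baseline (single-group) variance, not the pooled variance.
  • Assumptions behind the multiplier of 16:
    • Significance level (α) = 0.05 (two-tailed) and power = 0.80, so $(Z_{1-\alpha/2} + Z_{1-\beta})^2 = (1.96 + 0.84)^2 = 7.84$.
    • Pooled variance ≈ 2 × baseline variance, giving $2 \times 7.84 = 15.68 \approx 16$.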
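
A quick check of where 16 comes from, using exact quantiles rather than the rounded 1.96 and 0.84. The helper `approx_sample_size_per_group` is a hypothetical name for the rule of thumb; its `baseline_variance` is the single-group variance, since the factor 2 for pooling is already folded into 16.

```python
from statistics import NormalDist

z_alpha = NormalDist().inv_cdf(1 - 0.05 / 2)  # about 1.96 (two-tailed, alpha = 0.05)
z_power = NormalDist().inv_cdf(0.80)          # about 0.84 (power = 0.80)

# Factor 2: pooled variance ~ 2 x baseline variance
multiplier = 2 * (z_alpha + z_power) ** 2
print(f"exact multiplier = {multiplier:.2f}")


def approx_sample_size_per_group(baseline_variance: float, delta: float) -> float:
    """Rule of thumb: n ~ 16 * variance / delta^2 per group (alpha = 0.05, power = 0.80)."""
    return 16 * baseline_variance / delta ** 2
```

Because 16 is slightly larger than the exact multiplier (about 15.7), the shortcut errs on the side of a slightly larger sample.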

Example Calculation

  • Question Setup:
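    • MDE = 10% (relative lift), α = 0.05, Power = 0.80, Baseline mean = 10, Baseline variance = 20.
  • Steps to Solve:
    • Z critical values: 1.96 (α = 0.05) and 0.84 (power = 0.80); Delta = 10% × 10 = 1; pooled variance ≈ 2 × 20 = 40.
    • Apply the formula: $n = (1.96 + 0.84)^2 \times 40 / 1^2 = 313.6$, rounded up to 314 per group; the total size is twice that.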
  • Result:
    • Total sample size required = 628 (Control and Treatment combined).
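
The worked example can be reproduced end to end. A minimal sketch, assuming the MDE is a relative lift on the baseline mean and that treatment has the same variance as the baseline:

```python
from math import ceil
from statistics import NormalDist

# Inputs from the question setup
mde = 0.10               # relative lift
alpha = 0.05
power = 0.80
baseline_mean = 10.0
baseline_variance = 20.0

z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
z_power = NormalDist().inv_cdf(power)          # critical value for power
delta = mde * baseline_mean                    # absolute difference: 1.0
pooled_variance = 2 * baseline_variance        # assumes equal variance in treatment: 40.0

n_per_group = ceil((z_alpha + z_power) ** 2 * pooled_variance / delta ** 2)
total = 2 * n_per_group                        # control + treatment
print(n_per_group, total)
```

Exact quantiles give about 313.96 per group and the rounded table values give 313.6; both round up to 314, so the total is 628 either way.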

Conclusion

  • Covered the components of the sample size formula (significance level, power, Delta, variance) and how to apply them in A/B testing.
  • For more details, see the A/B testing course on dataentity.com.