Understanding Sample Size Calculations
A sample size calculation tells you how many observations you need to detect an effect of a given size with a specified confidence level and statistical power. Getting this right is crucial for reliable A/B test results.
Key Parameters
Baseline Conversion Rate
Your current conversion rate before making any changes. This is the performance of your control variant. A lower baseline rate generally requires a larger sample size to detect the same relative change.
Minimum Detectable Effect (MDE)
The smallest relative improvement you want to be able to detect. Expressed as a percentage improvement over baseline:
- Small effect (5-10%): Requires very large samples
- Medium effect (10-20%): Requires moderate samples
- Large effect (20%+): Requires smaller samples
Example: If baseline is 5% and MDE is 20%, you're trying to detect an improvement to 6% (5% × 1.20).
Confidence Level
How unlikely it is that a difference this large would appear by chance alone if no real difference existed (equal to 1 − α). Standard levels:
- 90%: Acceptable for low-risk decisions
- 95%: Standard for most business decisions
- 99%: High-stakes or irreversible changes
Higher confidence requires larger samples but reduces false positives (Type I errors).
Statistical Power
The probability of detecting an effect when it actually exists. Standard is 80%:
- 70%: Minimum acceptable (30% chance of missing real effect)
- 80%: Standard recommendation (20% chance of missing real effect)
- 90%: Conservative approach (10% chance of missing real effect)
Higher power requires larger samples but reduces false negatives (Type II errors).
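The four parameters above combine in the standard normal-approximation formula for comparing two proportions. A minimal sketch in Python (the function name and defaults are illustrative, not part of any particular tool):

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline, mde, confidence=0.95, power=0.80):
    """Approximate sample size per variant for a two-sided,
    two-proportion z-test.

    baseline: control conversion rate (e.g. 0.05 for 5%)
    mde: minimum detectable effect as a relative lift (e.g. 0.20 for 20%)
    """
    p1 = baseline
    p2 = baseline * (1 + mde)            # treatment rate at the MDE
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return ceil(n)

# 5% baseline, 20% MDE, 95% confidence, 80% power
print(sample_size_per_variant(0.05, 0.20))
```

Note that exact results depend on the approximation used (one-sided vs two-sided, pooled vs unpooled variance), so different calculators report somewhat different numbers for the same inputs.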
Factors Affecting Sample Size
Baseline Conversion Rate
Lower baseline rates require larger samples to detect the same relative change:
| Baseline Rate | Sample Size for 20% MDE |
|---------------|-------------------------|
| 1%            | ~45,000 per variant     |
| 5%            | ~6,000 per variant      |
| 10%           | ~2,500 per variant      |
| 25%           | ~600 per variant        |
Effect Size
Smaller effects require much larger samples:
| Minimum Detectable Effect | Sample Size (5% baseline) |
|---------------------------|---------------------------|
| 5% relative lift          | ~90,000 per variant       |
| 10% relative lift         | ~22,000 per variant       |
| 20% relative lift         | ~6,000 per variant        |
| 50% relative lift         | ~1,000 per variant        |
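As a sketch of where trends like these come from, the standard two-proportion approximation can be looped over baselines and lifts. The exact figures depend on the approximation used and will not match the rounded table values precisely, but the direction of the relationship holds:

```python
from math import ceil
from statistics import NormalDist

def n_per_variant(p1, lift, confidence=0.95, power=0.80):
    """Two-sided two-proportion approximation (illustrative helper)."""
    p2 = p1 * (1 + lift)
    z_a = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_b = NormalDist().inv_cdf(power)
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_a + z_b) ** 2 * var / (p2 - p1) ** 2)

# Lower baselines need far more data for the same relative lift
for p1 in (0.01, 0.05, 0.10, 0.25):
    print(f"baseline {p1:.0%}: {n_per_variant(p1, 0.20):,} per variant")

# Smaller lifts need far more data at the same baseline
for lift in (0.05, 0.10, 0.20, 0.50):
    print(f"lift {lift:.0%}: {n_per_variant(0.05, lift):,} per variant")
```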
Best Practices
Calculate Sample Size Before Testing
- Never start a test without knowing the required sample size
- Ensures adequate statistical power
- Helps estimate test duration
- Prevents premature conclusions
Be Realistic About Effect Size
- Most successful A/B tests show 5-20% lifts
- Radical improvements (50%+) are rare
- Small but meaningful changes (5-10%) require large samples
- Consider practical significance vs statistical significance
Account for Multiple Variants
Testing more than two variants requires larger samples:
- Split traffic among more variants
- Increases required total sample size
- Consider sequential testing for many variants
- Use Bonferroni correction for multiple comparisons
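The Bonferroni correction divides the significance level α by the number of comparisons, which in turn inflates the required sample size. A hedged sketch, assuming each treatment variant is compared only against the control:

```python
from math import ceil
from statistics import NormalDist

def n_per_variant(p1, lift, alpha=0.05, power=0.80):
    """Two-sided two-proportion approximation, parameterized by alpha."""
    p2 = p1 * (1 + lift)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_a + z_b) ** 2 * var / (p2 - p1) ** 2)

# Three treatments vs one control -> three comparisons
comparisons = 3
alpha = 0.05

uncorrected = n_per_variant(0.05, 0.20, alpha)
corrected = n_per_variant(0.05, 0.20, alpha / comparisons)  # Bonferroni
print(f"uncorrected: {uncorrected:,}, Bonferroni-corrected: {corrected:,}")
```

The corrected sample size is larger because a stricter per-comparison α demands a bigger critical z-value, on top of the traffic already being split four ways.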
Consider Traffic Limitations
- Low traffic sites may need months to reach sample size
- Test larger changes that require smaller samples
- Consider testing on high-traffic pages first
- May need to accept lower power or confidence
Common Scenarios
Scenario 1
- Baseline: 3% conversion rate
- MDE: 15% relative lift (3% → 3.45%)
- Confidence: 95%, Power: 80%
- Required: ~17,000 per variant

Scenario 2
- Baseline: 8% signup rate
- MDE: 20% relative lift (8% → 9.6%)
- Confidence: 95%, Power: 80%
- Required: ~3,200 per variant

Scenario 3
- Baseline: 20% open rate
- MDE: 10% relative lift (20% → 22%)
- Confidence: 90%, Power: 80%
- Required: ~2,500 per variant