
Sample Size Calculator

Calculate required sample sizes for statistical significance


Understanding Sample Size Calculations

Sample size determines how many observations you need to detect an effect of a given size with a specified level of confidence and statistical power. Proper sample size calculation is crucial for reliable A/B test results.

Key Parameters

Baseline Conversion Rate

Your current conversion rate before making any changes. This is the performance of your control variant. A lower baseline rate generally requires a larger sample size to detect the same relative change.

Minimum Detectable Effect (MDE)

The smallest relative improvement you want to be able to detect. Expressed as a percentage improvement over baseline:

  • Small effect (5-10%): Requires very large samples
  • Medium effect (10-20%): Requires moderate samples
  • Large effect (20%+): Requires smaller samples

Example: If baseline is 5% and MDE is 20%, you're trying to detect an improvement to 6% (5% × 1.20).
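The arithmetic behind that example is simply baseline × (1 + MDE); as a quick sketch (values chosen to mirror the example above):

```python
baseline = 0.05   # 5% baseline conversion rate
mde = 0.20        # 20% relative minimum detectable effect

# The target rate the test must be able to distinguish from the baseline
target = baseline * (1 + mde)
print(f"{target:.1%}")  # prints "6.0%"
```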

Confidence Level

How unlikely it is that an observed difference is due to random chance alone (equal to 1 minus the significance level α). Standard levels:

  • 90%: Acceptable for low-risk decisions
  • 95%: Standard for most business decisions
  • 99%: High-stakes or irreversible changes

Higher confidence requires larger samples but reduces false positives (Type I errors).

Statistical Power

The probability of detecting an effect when it actually exists. Standard is 80%:

  • 70%: Minimum acceptable (30% chance of missing real effect)
  • 80%: Standard recommendation (20% chance of missing real effect)
  • 90%: Conservative approach (10% chance of missing real effect)

Higher power requires larger samples but reduces false negatives (Type II errors).
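The four parameters above combine into a standard sample-size formula. Here is a minimal sketch using the normal approximation for a two-sided, two-proportion test; the function name is illustrative, and because calculators differ in their approximations (one- vs. two-sided tests, pooled vs. unpooled variance), the results will not exactly match the rough figures quoted elsewhere on this page:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline, mde, confidence=0.95, power=0.80):
    """Approximate per-variant sample size for a two-sided
    two-proportion z-test (normal approximation).

    baseline: control conversion rate (e.g. 0.05 for 5%)
    mde:      minimum detectable effect, relative (e.g. 0.20 for a 20% lift)
    """
    p1 = baseline
    p2 = baseline * (1 + mde)          # target rate implied by the MDE
    delta = p2 - p1                    # absolute difference to detect
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # ~1.96 at 95%
    z_beta = NormalDist().inv_cdf(power)                      # ~0.84 at 80%
    p_bar = (p1 + p2) / 2
    n = ((z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
          + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2) / delta ** 2
    return ceil(n)

# 5% baseline, 20% relative MDE, 95% confidence, 80% power
print(sample_size_per_variant(0.05, 0.20))
```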

Factors Affecting Sample Size

Baseline Conversion Rate

Lower baseline rates require larger samples to detect the same relative change:

Sample size needed to detect a 20% relative MDE, by baseline rate:

  • 1% baseline: ~45,000 per variant
  • 5% baseline: ~6,000 per variant
  • 10% baseline: ~2,500 per variant
  • 25% baseline: ~600 per variant

Effect Size

Smaller effects require much larger samples:

Sample size at a 5% baseline, by minimum detectable effect:

  • 5% relative lift: ~90,000 per variant
  • 10% relative lift: ~22,000 per variant
  • 20% relative lift: ~6,000 per variant
  • 50% relative lift: ~1,000 per variant

Best Practices

Calculate Sample Size Before Testing

  • Never start a test without knowing the required sample size
  • Ensures adequate statistical power
  • Helps estimate test duration
  • Prevents premature conclusions

Be Realistic About Effect Size

  • Most successful A/B tests show 5-20% lifts
  • Radical improvements (50%+) are rare
  • Small but meaningful changes (5-10%) require large samples
  • Consider practical significance vs statistical significance

Account for Multiple Variants

Testing more than two variants requires larger samples:

  • Split traffic among more variants
  • Increases required total sample size
  • Consider sequential testing for many variants
  • Use Bonferroni correction for multiple comparisons
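The Bonferroni correction divides the overall significance level by the number of comparisons, so the chance of any false positive across the whole test stays at the original α. A minimal sketch, assuming each treatment variant is compared only against the control (the variant counts are hypothetical):

```python
# Bonferroni correction: split the overall significance level across
# all pairwise comparisons to control the family-wise error rate.
alpha = 0.05          # overall significance level (95% confidence)
num_variants = 4      # 1 control + 3 treatments (hypothetical setup)
comparisons = num_variants - 1   # each treatment vs. the control

alpha_per_test = alpha / comparisons
confidence_per_test = 1 - alpha_per_test
print(f"per-comparison alpha: {alpha_per_test:.4f}")  # prints "per-comparison alpha: 0.0167"
```

Note that the stricter per-comparison α feeds back into the sample-size calculation: each variant now needs a larger sample than a simple two-variant test would.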

Consider Traffic Limitations

  • Low traffic sites may need months to reach sample size
  • Test larger changes that require smaller samples
  • Consider testing on high-traffic pages first
  • May need to accept lower power or confidence
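Duration can be estimated by dividing the total required sample (all variants combined) by daily traffic, then rounding up to full weeks to cover complete weekly cycles. The per-variant sample size below is a placeholder; substitute your own calculated value:

```python
from math import ceil

required_per_variant = 8_000   # from a sample-size calculation (placeholder)
num_variants = 2               # control + one treatment
daily_visitors = 1_000         # visitors entering the test each day

total_needed = required_per_variant * num_variants
days = ceil(total_needed / daily_visitors)
weeks = ceil(days / 7)         # round up to full weeks to smooth weekday effects
print(f"{days} days -> run for {weeks} full weeks ({weeks * 7} days)")
```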

Common Scenarios

E-commerce Product Page
  • Baseline: 3% conversion rate
  • MDE: 15% relative lift (3% → 3.45%)
  • Confidence: 95%, Power: 80%
  • Required: ~17,000 per variant

SaaS Landing Page
  • Baseline: 8% signup rate
  • MDE: 20% relative lift (8% → 9.6%)
  • Confidence: 95%, Power: 80%
  • Required: ~3,200 per variant

Email Campaign
  • Baseline: 20% open rate
  • MDE: 10% relative lift (20% → 22%)
  • Confidence: 90%, Power: 80%
  • Required: ~2,500 per variant

Quick Tips

  • Use 95% confidence and 80% power as defaults
  • Be conservative with MDE estimates
  • Account for traffic fluctuations
  • Run tests for full weeks
  • Don't stop tests early

Common Mistakes

  • Not calculating sample size beforehand
  • Stopping tests as soon as they first reach significance (peeking)
  • Overestimating the effect size
  • Not accounting for traffic seasonality
  • Testing too many variants