Measurement & testing
Power (statistical)
Also known as: statistical power · test power
The probability that an A/B test correctly detects an effect when a real effect exists. 80% power is the operator standard — 80% of tests with real underlying effects will produce significant results.
Statistical power is the probability a test detects an effect when one exists — the complement of the Type II error rate (false negatives). At 80% power, if you ran 100 A/B tests all with real effects of a given size, roughly 80 of them would produce statistically significant results. Power is a function of sample size, effect size, baseline rate, and significance threshold. Setting power lower (70%, 60%) saves sample size but misses more real effects; setting it higher (90%, 95%) requires larger samples but misses fewer. 80% is the operator-standard trade-off for most marketing decisions; bumping to 90% is warranted when the cost of missing a real positive is high (a launch decision, a budget-defending test). Pre-compute the required sample size for your chosen power before running — undersampled tests will rarely hit significance, and the team will misinterpret "inconclusive" as "no effect."
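As a sketch of that pre-computation, here is the standard normal-approximation formula for the per-arm sample size of a two-sided, two-proportion test. The function name, the example baseline rate (3%), and the example target lift (to 3.6%) are illustrative assumptions, not figures from this entry; any real tool (or a dedicated power calculator) may differ slightly in its approximation.

```python
from math import ceil, sqrt
from statistics import NormalDist  # stdlib; gives us z-quantiles

def sample_size_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Per-arm n for a two-sided two-proportion z-test (normal approximation).

    p1: baseline conversion rate, p2: rate you want to be able to detect.
    Hypothetical helper for illustration — check against your own calculator.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ≈ 1.96 at alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ≈ 0.84 at power = 0.80
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Assumed example: 3% baseline, hoping to detect a lift to 3.6%
n_80 = sample_size_per_arm(0.03, 0.036, power=0.80)
n_90 = sample_size_per_arm(0.03, 0.036, power=0.90)
print(n_80, n_90)  # 90% power always demands a larger sample than 80%
```

Note how the required n scales with the inverse square of the effect size (p1 − p2): halving the detectable lift roughly quadruples the sample you need, which is why undersampled tests stay inconclusive.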
Read next
Sample size: the calculation everyone gets wrong in email A/B tests
Most email A/B tests are powered to detect effects far larger than the test could actually produce. The result: false positives and false nulls, with confident conclusions in both directions. Sample size calculation fixes this before you send. Here's the 5-minute version.
A/B testing in email: sample size, novelty, and what to report
Most email A/B tests produce winners that don't reproduce. Three reasons keep showing up: under-powered samples, the novelty effect, and weak readout discipline. This guide is about designing tests that actually drive decisions instead of theatre.