Measurement & testing

Power (statistical)

Also known as: statistical power · test power

The probability that an A/B test correctly detects an effect when a real effect exists. 80% power is the operator standard — 80% of tests with real underlying effects will produce significant results.

Statistical power is the probability a test detects an effect when one exists — the complement of Type II error (false negative). At 80% power, if you ran 100 A/B tests all with real effects of a given size, 80 of them would produce statistically significant results. Power is a function of sample size, effect size, baseline rate, and significance threshold. Setting power lower (70%, 60%) saves sample size but misses more real effects. Setting power higher (90%, 95%) requires larger samples but catches more. 80% is the operator-standard trade-off for most marketing decisions; bumping to 90% is warranted when the cost of missing a real positive is high (a launch decision, a budget-defending test). Pre-compute required sample size for your chosen power before running — undersampled tests will rarely hit significance, and the team will mis-interpret "inconclusive" as "no effect."

Try the tool

Read next

See also

← Back to the glossary