Statistical significance

Also known as: p-value · significance testing

The probability that an observed difference between two A/B test variants could have occurred by chance if there were no real effect — conventionally, a result is 'significant' when the p-value is below 0.05, i.e. a difference this large would arise by chance less than 5% of the time.

Statistical significance is the framework for deciding whether an A/B test result is real or noise. A p-value quantifies how probable the observed difference between variants would be under the null hypothesis (no real difference): the smaller the p-value, the less plausible chance alone is as an explanation. Conventional thresholds: p < 0.05 (standard for most marketing decisions) and p < 0.01 (used when the downside of a false positive is high). Note that p < 0.05 does not mean "95% probability the effect is real" — it means a gap this large would appear by chance less than 5% of the time if there were no effect.

The catch: significance requires sufficient sample size. A test with 100 users per arm is far too underpowered to detect a 5% lift, even when the lift is real — pre-compute the required sample size from the baseline rate and minimum detectable effect BEFORE running. Also: declare significance on the primary metric only. Running significance tests against 10 secondary metrics at p < 0.05 gives roughly a 40% chance of at least one false positive by chance alone (the multiple-comparisons problem); if secondary metrics must be tested, tighten the threshold accordingly (e.g. Bonferroni: 0.05 / 10 = 0.005).
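The sample-size pre-computation, the p-value itself, and the multiple-comparisons arithmetic above can all be sketched with the Python standard library. This is a minimal sketch using the usual two-proportion z-test formulas; the function names and example numbers are illustrative assumptions, not part of the glossary entry.

```python
# Illustrative sketch: sample-size planning and p-value for an A/B test
# on conversion rates, via a two-sided two-proportion z-test (stdlib only).
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def required_sample_size(baseline, relative_lift, alpha=0.05, power=0.80):
    """Users needed PER ARM to detect a relative lift on a baseline
    conversion rate at the given significance level and power."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = Z.inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha = 0.05
    z_beta = Z.inv_cdf(power)            # e.g. 0.84 for 80% power
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - Z.cdf(abs(z)))

# A 5% relative lift on a 10% baseline needs tens of thousands of
# users per arm — far beyond 100:
n = required_sample_size(0.10, 0.05)

# Chance of at least one false positive across 10 independent
# significance tests at alpha = 0.05 (multiple-comparisons problem):
familywise = 1 - (1 - 0.05) ** 10
```

With 100 users per arm, even a real 5% lift (11 vs 10 conversions) yields a p-value far above 0.05, which is why the sample size must be planned up front rather than checked after the fact.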
