Measurement & testing
False positive
Also known as: Type I error
An A/B test result declared significant when the apparent effect was actually due to chance. The rate increases with each additional metric compared, with early peeking, and with multiple variants tested against a single control.
A false positive is an A/B test that appears to show a real effect when the observed difference was actually noise. At a standard 95% confidence level, 5% of A/B tests where there is NO real effect will still produce a "significant" result by chance alone. The false-positive rate inflates further when operators:

- peek at results mid-test and stop the moment significance appears (peeking can inflate the false-positive rate up to 3x);
- compare multiple metrics and pick whichever one wins (the multiple-comparisons problem: testing 5 metrics at p<0.05 yields an effective false-positive rate closer to 23%);
- run many variants against one control without a Bonferroni adjustment.

Every falsely declared winner ships a change that doesn't work, and over time the cumulative drag on program performance is substantial. The discipline: pre-declare one primary metric, run to a pre-computed sample size, and declare significance exactly once.
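The multiple-comparisons numbers above can be checked with a small Monte-Carlo sketch. Under the null hypothesis (no real effect), p-values are uniform on [0, 1], so we can simulate many null A/B tests directly; all names and parameters here are illustrative, not from any particular tool.

```python
import random

random.seed(0)

ALPHA = 0.05
N_METRICS = 5        # metrics compared per test
N_TESTS = 100_000    # simulated A/B tests with NO real effect

naive = 0       # tests where ANY metric hits p < ALPHA
bonferroni = 0  # tests where ANY metric hits p < ALPHA / N_METRICS

for _ in range(N_TESTS):
    # Under the null, each metric's p-value is uniform on [0, 1].
    pvals = [random.random() for _ in range(N_METRICS)]
    if min(pvals) < ALPHA:
        naive += 1
    if min(pvals) < ALPHA / N_METRICS:
        bonferroni += 1

# Expected: naive rate ~ 1 - 0.95**5 = 22.6%; Bonferroni ~ 4.9%
print(f"naive (5 metrics @ p<0.05): {naive / N_TESTS:.1%}")
print(f"Bonferroni-adjusted:        {bonferroni / N_TESTS:.1%}")
```

The naive rate lands near 23% because the chance that at least one of five independent null metrics clears p<0.05 is 1 - 0.95^5; dividing the threshold by the number of metrics (Bonferroni) pulls the family-wise rate back to roughly the intended 5%.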
Read next
False positives in email A/B tests: why half of winning tests don't actually win
Run enough A/B tests and some will show 'significant' lift from pure noise. Programs that ship every significant winner end up with a collection of imaginary improvements they can't tell apart from real ones. Here's how to spot the fakes and avoid the trap.
A/B testing in email: sample size, novelty, and what to report
Most email A/B tests produce winners that don't reproduce. Three reasons keep showing up: under-powered samples, the novelty effect, and weak readout discipline. This guide is about designing tests that actually drive decisions instead of theatre.