9 min read
Price-testing through email: what's testable, what isn't
Someone on the team has an idea for a new price, discount, or offer structure. The email channel is the fastest way to get it in front of users. The test ships. A winner is declared. Three months later the winning offer has been rolled into the base program and nobody can reproduce the lift. This pattern is so common it should be the default expectation. Price tests through email are uniquely prone to false positives — and uniquely able to damage the program when the wrong lesson gets locked in.
Justin Williames
Founder, Orbit · 10+ years in lifecycle marketing
What email can actually test
Email is a credible test environment for a narrow set of pricing questions. It can test whether a specific offer (X% off for Y product) converts better than a different specific offer at the same message moment. It can test whether free-shipping framing beats dollar-off framing at the same effective discount. It can test whether a deadline, a free-trial extension, or a price-lock message produces better conversion than the control version of the same campaign.
The common thread in that list: each test is a single message moment, a narrow audience, and a conversion window that closes quickly. Inside those constraints, email tests can be rigorous. You can power them correctly. You can read them cleanly. You can ship the winner without risking the rest of the program.
What email cannot test: whether the underlying price should change. Whether the product is worth more or less than its current price tag. Whether a subscription tier is correctly positioned. These are pricing questions. They require audiences, durations, and measurement apparatus that email tests cannot provide.
Why most email price tests produce false positives
Three mechanical problems push email price tests toward false positives at a rate much higher than other kinds of email tests.
Novelty effect, amplified. Price-related copy is unusual in a lifecycle program — most emails don't lead with a number. When a variant does, engagement spikes on novelty alone for the first few days. On a standard two-week test, that novelty spike can account for enough of the measurement period that the variant still reads as a winner after the effect has faded.
Audience selection bias. Price tests in email often reach only the opened-the-email cohort. That cohort is dramatically more engaged than the full audience. A discount offer that converts well for a heavily engaged cohort will over-predict performance when rolled out to the full base. Measuring conversion per sent (not per opened) and confirming the test population matches the rollout population are both essential, and both frequently skipped.
Cannibalisation. The variant's lift is often real within the test window but disappears at program level because the converting users would have converted anyway at the control offer a week later. A 20% conversion lift that entirely consists of pulled-forward conversions is a timing change, not a revenue gain. Short-window tests rarely catch this.
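A minimal diagnostic sketch for the second and third problems, assuming a hypothetical send-level log with columns variant, opened, converted_7d and converted_60d (the column names and the 60-day horizon are illustrative, not from any particular platform): it reads conversion per sent rather than per opened, and checks whether the short-window lift survives a longer window.

```python
import pandas as pd

# Hypothetical send-level log: one row per user the campaign was sent to.
# Assumed columns: variant ("control" / "offer"), opened (bool),
# converted_7d (bool), converted_60d (bool).
sends = pd.read_csv("price_test_sends.csv")

def readout(group: pd.DataFrame) -> pd.Series:
    return pd.Series({
        # Conversion per sent is what predicts full-base rollout.
        "conv_per_sent_7d": group["converted_7d"].mean(),
        # Conversion per opened flatters the offer: openers are the most
        # engaged slice of the audience.
        "conv_per_opened_7d": group.loc[group["opened"], "converted_7d"].mean(),
        # A longer window exposes pull-forward.
        "conv_per_sent_60d": group["converted_60d"].mean(),
    })

rates = sends.groupby("variant").apply(readout)
print(rates)

def lift(metric: str) -> float:
    return rates.loc["offer", metric] / rates.loc["control", metric] - 1

# A 7-day lift that largely disappears at 60 days is a timing change
# (cannibalised future conversions), not incremental revenue.
print(f"7-day lift:  {lift('conv_per_sent_7d'):+.1%}")
print(f"60-day lift: {lift('conv_per_sent_60d'):+.1%}")
```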
The Orbit Experiment Design skill builds cannibalisation checks into the readout — holdout comparisons, payback-window modelling — so that lifts that would wash out at program level are flagged instead of shipped. For the general A/B testing discipline that sits underneath all of this, the A/B testing guide covers sample size, power, and novelty effect.
The measurement most teams get wrong
The question that matters is not "did this variant convert better" but "did this variant produce more revenue than the control, net of the discount, measured over a period that captures the downstream behaviour of the users who converted". Three pieces, each usually missing.
Net of the discount. A 20% discount that lifts conversion by 15% usually loses money at a unit level. Almost every conversion-lift number looks different once you subtract the margin given up to produce it.
Over an appropriate period. A 7-day conversion window on a price test misses retention and repurchase effects. Users who converted on a steep discount often retain worse than users who converted at full price — they bought the price, not the product. Measure the relevant window (usually 30–90 days) or accept you're optimising for an intermediate metric, not revenue.
Against the right counterfactual. A proper price-test measurement uses a holdout — a random slice of the eligible audience that gets no offer at all. Without a holdout, the test measures "variant vs control offer", which is a weaker question than "variant vs no offer". Many discount campaigns beat their control variant while underperforming the holdout.
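To put numbers on all three pieces at once, here is a worked example with made-up figures: a no-offer holdout, a full-price control message, and a 20%-off variant, read as net revenue per user sent over a 90-day window. The conversion rates and order value are illustrative only.

```python
# Illustrative three-arm readout: revenue per user sent over a 90-day
# window, net of the discount given up to produce each conversion.
AOV = 100.00  # assumed average order value

arms = {
    # name                   (per-sent conversion over 90 days, discount)
    "holdout (no offer)":    (0.018, 0.00),
    "control (full price)":  (0.020, 0.00),
    "variant (20% off)":     (0.023, 0.20),  # a 15% conversion lift vs control
}

for name, (conv, discount) in arms.items():
    net_revenue_per_send = conv * AOV * (1 - discount)
    print(f"{name:23s} conv {conv:.1%}  net revenue per send ${net_revenue_per_send:.2f}")
```

With these numbers the variant wins on conversion (2.3% vs 2.0%) but earns $1.84 per send against the control's $2.00, and only $0.04 more than the holdout's $1.80: a thin margin that pulled-forward conversions or discount-trained behaviour could easily erase.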
Copy-level tests are safer than offer-level tests
There's a specific subset of price-adjacent testing that email handles well: copy-level framing around a fixed underlying offer. "Save $20" vs "20% off" at the same effective discount. "Limited time — ends Sunday" vs no deadline. "Your exclusive offer" vs generic framing. These are legitimate email tests because the underlying economics are identical — only the framing changes.
The significance calculator will tell you if the framing-level lift is statistically real. Copy tests generally don't carry the measurement traps of true price tests, so a winning framing typically transfers cleanly. This is where email experimentation effort is usually well-spent.
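The calculator itself isn't reproduced here, but the underlying check is a standard two-proportion test. A minimal sketch using statsmodels, with hypothetical counts for a "Save $20" vs "20% off" framing test:

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

# Hypothetical framing test at the same effective discount.
conversions = np.array([412, 463])   # "Save $20", "20% off"
sends = np.array([50_000, 50_000])   # users sent each variant

# Two-proportion z-test on conversion per sent.
stat, p_value = proportions_ztest(conversions, sends)
print(f"z = {stat:.2f}, p = {p_value:.3f}")

# 95% confidence interval on each arm's per-sent conversion rate.
for label, k, n in zip(["Save $20", "20% off"], conversions, sends):
    lo, hi = proportion_confint(k, n, alpha=0.05)
    print(f"{label}: {k / n:.2%} (95% CI {lo:.2%} to {hi:.2%})")
```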
Offer-level tests (changing the underlying discount, changing product mix, changing price tiers) can still be run through email, but treat the email result as the first piece of evidence — not the final answer. Pair it with holdout data, a 30+ day retention window, and explicit margin accounting before declaring a winner.
When not to run the test at all
Two common failure modes where the honest answer is "don't test this through email".
Sample size is too low for the effect you're trying to detect. If your audience per variant is 5,000 users and the effect you actually care about is a 3% lift, the test mathematically cannot answer the question. You'll get a number; it will not be signal. A decision based on it is a coin flip dressed up in significance language.
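A quick power check makes that arithmetic concrete. The sketch below uses statsmodels and assumes, purely for illustration, a 2% baseline per-sent conversion rate; the 5,000-per-variant audience and 3% relative lift are the figures from the paragraph above.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.020          # assumed per-sent conversion rate
target = baseline * 1.03  # the 3% relative lift you care about

effect = proportion_effectsize(target, baseline)
analysis = NormalIndPower()

# Users per variant needed to detect that lift at 80% power, alpha = 0.05.
needed = analysis.solve_power(effect_size=effect, alpha=0.05, power=0.8,
                              ratio=1.0, alternative="two-sided")
print(f"Required per variant: {needed:,.0f}")

# Power actually achieved with 5,000 users per variant.
achieved = analysis.solve_power(effect_size=effect, nobs1=5_000, alpha=0.05,
                                ratio=1.0, alternative="two-sided")
print(f"Power with 5,000 per variant: {achieved:.1%}")
```

With those assumptions the required sample runs into the hundreds of thousands per variant, and 5,000 users per variant delivers power not far above the 5% false-positive rate: exactly the coin flip described above.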
The test's winning condition would damage the program. An aggressive discount variant that wins in email will train your audience to wait for discounts. That training effect is a real cost that shows up months later as suppressed full-price conversion. Some tests are worth running only if you're prepared to either ship the winner everywhere or accept the training cost. If neither is acceptable, don't run the test.
The Retention Economics skill covers how to model the downstream costs of discount-trained behaviour so they enter the decision explicitly rather than as a surprise six months later.
Frequently asked questions
- Can I use email to test a new price?
- For framing questions (how to present a fixed price), yes. For the underlying price itself, email is a weak test environment — audience selection bias, short measurement windows, and novelty effects all push toward false positives. Use email as one data point, not the final answer.
- What's the single most common email price-test mistake?
- Measuring conversion without measuring revenue net of discount and without a no-offer holdout. A discount variant that lifts conversion 15% can still lose money at the unit level and can still underperform a no-offer holdout. All three numbers matter.
- Do I need a holdout in an email discount test?
- Yes, if you want the test to tell you anything more than 'discount A vs discount B'. A random slice of the audience that gets no offer at all is what tells you whether the offer produced net lift versus just shifting when users would have converted anyway.
- How long should a price test run?
- The conversion window is shorter than the full measurement window. A 7-day conversion window is fine for deciding the test; a 30–90 day window is what you need to see retention and repurchase effects. Teams usually decide on the 7-day number and miss the 90-day reversal.
- Can framing tests be measured like regular A/B tests?
- Usually yes, because the underlying economics are constant — only the copy changes. Framing tests don't carry the same novelty, selection-bias, and cannibalisation risks as offer-level tests. Use the significance calculator as normal and ship the winner.
- What's the risk of running a successful discount test?
- Training your audience to wait for discounts. Every successful aggressive-discount campaign makes the next full-price campaign slightly weaker. The cost shows up in full-price conversion rates weeks or months later, which makes it hard to attribute back. Model this explicitly before running heavy-discount tests repeatedly.
This guide is backed by an Orbit skill
Related guides
Retention economics: proving lifecycle ROI to finance
Lifecycle programs get deprioritised when they can't defend their impact in dollars. This guide covers the four models every lifecycle leader should know — LTV, payback, cohort retention, and incrementality — and how to present them to a CFO.
Personalisation that doesn't feel creepy
There's a line between personalisation that earns a user's trust and personalisation that breaks it. This guide is about where the line actually is, how lifecycle programs cross it without noticing, and the specific patterns that keep you on the right side.
A/B testing in email: sample size, novelty, and what to report
Most email A/B tests produce winners that don't reproduce. This guide covers the three reasons — under-powered samples, the novelty effect, and weak readout discipline — and how to design tests that actually drive decisions.