Holdout group design: the incrementality tool most lifecycle programs skip
A holdout group is a random sample of your audience that receives no lifecycle messaging for a measurement period. Comparing their revenue to the messaged audience tells you incremental lift: the revenue your program is actually producing vs the revenue that would have happened anyway. It's the most defensible measurement in lifecycle and the most under-used. Here's how to run one that holds up.
Justin Williames
Founder, Orbit · 10+ years in lifecycle marketing
Why attribution can't replace a holdout
Attribution tells you which touchpoint got credit. Incrementality tells you whether the touchpoint produced revenue. Different questions, different answers.
Attribution models (first-touch, last-touch, multi-touch) allocate credit across touchpoints that occurred before a conversion. They don't answer the causal question — would the conversion have happened without any of those touchpoints? For lifecycle, the answer is often yes for a meaningful share of revenue. Users who would have returned, renewed, or bought anyway appear in attribution reports as lifecycle wins.
A holdout strips that confusion. Random assignment means the holdout cohort is identical on average to the messaged cohort except for message exposure. The revenue delta is incremental by construction — no attribution model required.
Sizing the holdout
5–10% of the eligible audience is the operator default. 10% gives statistical power faster; 5% loses less revenue if the program is a winner. For a program with tight monthly revenue targets, start at 5%. For a program where you're testing whether to keep running it at all, 10% gets to an answer faster.
The holdout needs to be large enough to detect the effect size you care about. Ballpark: to detect a 10% incremental lift at 95% confidence over a quarter, you typically need ~5,000 users in the holdout (varies by baseline conversion rate). Below that, the test is underpowered and you'll get a number that isn't reliably different from zero.
The Orbit Experiment Design skill handles the power calculation given your specific baseline and expected lift. Skip it at the risk of running a test that can't answer the question.
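As a rough illustration of the arithmetic involved, here is a minimal two-proportion power calculation in Python — a sketch assuming a normal approximation, a two-sided test, and placeholder inputs for baseline conversion rate and expected lift:

```python
from math import ceil, sqrt
from statistics import NormalDist

def holdout_size(baseline_rate: float, relative_lift: float,
                 alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed per group to detect `relative_lift` over `baseline_rate`
    with a two-sided two-proportion z-test (normal approximation)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)            # ~0.84 for 80% power
    pooled = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * pooled * (1 - pooled))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# Detecting a 10% relative lift on a hypothetical 25% quarterly conversion rate
print(holdout_size(0.25, 0.10))
```

With a lower baseline the required holdout grows fast: at a 10% baseline the same 10% relative lift needs roughly three times as many users, which is why the ~5,000 figure above is only a ballpark.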
Assignment rules that keep it clean
Three rules for assignment that matter more than most teams realise:
1. Assignment must be stable. A user assigned to the holdout today must stay in the holdout for the entire measurement period. If they flicker in and out, the measurement is contaminated. Use a persistent random integer (Braze's Random Bucket Number is built for this) rather than a recalculated random value.
2. Assignment must be random. Not "users who haven't engaged in the last 30 days" — that's a non-random cut. The holdout has to be statistically equivalent to the treatment group on every dimension except message exposure. Use the RBN or equivalent.
3. Assignment must be global. If you exempt specific programs from the holdout ("we won't hold out users from onboarding because it feels cruel"), the measurement is compromised. Either hold out or don't. Global holdout means every marketing send respects the same exclusion; transactional sends are exempt.
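If your platform doesn't offer a persistent random bucket, the same properties can be approximated by hashing a stable user ID. A sketch — the salt, bucket count, and cutoff are illustrative choices, not a specific platform's API:

```python
import hashlib

BUCKETS = 10_000          # an RBN-style 0-9999 range
HOLDOUT_CUTOFF = 500      # buckets 0-499 -> 5% global holdout
SALT = "holdout-2025"     # hypothetical; keep fixed for the whole measurement period

def bucket(user_id: str) -> int:
    """Deterministic: the same user_id always lands in the same bucket,
    so assignment stays stable no matter how often it is recalculated."""
    digest = hashlib.sha256(f"{SALT}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % BUCKETS

def in_holdout(user_id: str) -> bool:
    return bucket(user_id) < HOLDOUT_CUTOFF
```

Changing the salt reshuffles every assignment mid-flight, which is exactly the contamination rule 1 warns about — treat it as immutable for the life of the test.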
The three mistakes that invalidate the result
Mistake 1: holdout leakage. Users in the holdout occasionally receive mail because a broadcast ignored the holdout flag. Even 2% leakage invalidates a measurement — you're not measuring messaged vs unmessaged, you're measuring heavily-messaged vs lightly-messaged. Audit broadcasts monthly for holdout compliance.
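The monthly compliance audit can be as simple as intersecting the period's send log with the holdout list — a sketch, with the data shapes (sets of user IDs) as assumptions:

```python
def leakage_report(holdout_ids: set[str], sent_ids: set[str],
                   threshold: float = 0.0) -> dict:
    """Flag any holdout member who received a marketing send this period."""
    leaked = holdout_ids & sent_ids
    rate = len(leaked) / len(holdout_ids) if holdout_ids else 0.0
    return {
        "leaked_users": len(leaked),
        "leakage_rate": rate,
        "compliant": rate <= threshold,   # any leakage at all should fail the audit
    }
```

Running this against every broadcast audience before send, not just after, catches the leak before it contaminates the quarter.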
Mistake 2: seasonal confounds. A holdout running only in November (Black Friday period) will show enormous incrementality because of the volume spike. The measurement doesn't generalise to the rest of the year. Run holdouts for full quarters or full years to average across seasonal effects.
Mistake 3: reading before statistical power. A two-week holdout result is almost never significant. Leadership asks for an update; teams produce a number anyway. The number then gets cited as the incrementality forever. Fix: don't publish interim reads. Publish once at the measurement period end, with the full analysis.
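If leadership insists on an interim number anyway, at least attach a significance check so an underpowered read is labelled as such. A minimal two-proportion z-test sketch (the conversion counts are hypothetical):

```python
from math import sqrt
from statistics import NormalDist

def lift_pvalue(conv_treated: int, n_treated: int,
                conv_holdout: int, n_holdout: int) -> float:
    """Two-sided p-value for the conversion-rate difference
    between the messaged cohort and the holdout."""
    p_t = conv_treated / n_treated
    p_h = conv_holdout / n_holdout
    p_pool = (conv_treated + conv_holdout) / (n_treated + n_holdout)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_treated + 1 / n_holdout))
    z = (p_t - p_h) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# A two-week read: an apparent 10% relative lift, nowhere near significant
print(lift_pvalue(110, 1000, 100, 1000))
```

The same apparent lift at ten times the volume clears p < 0.05 comfortably — the early number and the final number describe the same program, but only one of them is evidence.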
What to do with the result
A holdout produces a single most-important number: incremental revenue per user. Multiply by audience size to get total program contribution. Divide by program cost to get ROI. This is the number that goes in front of finance and replaces the attribution-model-based numbers that typically get questioned.
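The arithmetic in the paragraph above is simple enough to live in a spreadsheet, but a short sketch makes the definitions unambiguous (all input figures are hypothetical):

```python
def program_readout(rev_per_user_treated: float, rev_per_user_holdout: float,
                    audience_size: int, program_cost: float) -> dict:
    """Turn a holdout read into the numbers finance cares about."""
    incremental_per_user = rev_per_user_treated - rev_per_user_holdout
    incremental_total = incremental_per_user * audience_size
    return {
        "incremental_per_user": incremental_per_user,
        "incremental_total": incremental_total,
        "roi": incremental_total / program_cost,
    }

# e.g. $12.40 vs $11.90 revenue per user, 1M-user audience, $100k program cost
readout = program_readout(12.40, 11.90, 1_000_000, 100_000)
```

Note that the per-user delta, not the treated cohort's raw revenue, is the program's contribution — everything below the holdout's baseline would have happened anyway.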
The retention economics guide covers how to frame the incrementality number in a CFO conversation. A defensible quarterly holdout study is usually more persuasive than six quarters of attribution spreadsheets — it answers the causal question, not just the correlational one.
Run one annually at minimum. Programs that run holdouts annually have budget conversations that go differently from programs that don't — the incrementality number is a piece of evidence the attribution-based programs can't produce.
Frequently asked questions
- What size should my holdout group be?
- 5–10% of the eligible audience is the operator default. 10% reaches statistical power faster; 5% is cheaper in lost revenue if the program is producing real lift. Below ~5,000 users in the holdout the test is usually underpowered.
- How long should I run a holdout?
- Full quarters or full years. Shorter windows get confounded by seasonal effects — a holdout during Black Friday will show enormous incrementality that doesn't generalise. Don't publish interim reads; they become cited incrementality forever.
- Should I use a Global Holdout or program-specific holdouts?
- Global. Program-specific holdouts produce measurements that each program can tell a story about, but they don't answer the portfolio question: is lifecycle marketing producing revenue for us? Global holdout answers that and replaces most of the attribution-model debate.
- How do I assign users to a holdout in Braze?
- Random Bucket Number (RBN) filters. A fixed slice (e.g., RBN < 500 = 5% holdout) is stable, random, and respected across every program. The IP warm-up guide covers RBN mechanics; the same attribute is perfect for holdouts.
- Can I exempt onboarding from the holdout?
- Technically yes; mathematically it compromises the measurement. A global holdout that exempts some programs is a program-specific holdout in disguise. If you want onboarding excluded for ethical reasons (new users need it to succeed), accept that the holdout measures 'everything except onboarding' and caveat accordingly.
- What do I do if the holdout shows no incremental lift?
- Don't panic, and don't bury it. A holdout showing zero incremental lift is an honest result. Investigate: is the program targeting users who would have converted anyway? Is the offer too weak? Is the timing wrong? Zero lift is information. For a well-designed program a true zero is rare; when it is real, the program needs rethinking, not continuation.
This guide is backed by an Orbit skill
Related guides
Retention economics: proving lifecycle ROI to finance
Lifecycle programs get deprioritised when they can't defend their impact in dollars. This guide covers the four models every lifecycle leader should know — LTV, payback, cohort retention, and incrementality — and how to present them to a CFO.
A/B testing in email: sample size, novelty, and what to report
Most email A/B tests produce winners that don't reproduce. This guide covers the three reasons — under-powered samples, the novelty effect, and weak readout discipline — and how to design tests that actually drive decisions.
Price-testing through email: what's testable, what isn't
Email is often the first place teams try to price-test, and it's often where the wrong lesson gets learned. This guide covers what can genuinely be tested in email, what can't, and the measurement traps that make most email price tests unreliable.