Send-time optimisation: what it really moves, and what it doesn't
STO promises per-user delivery at the hour each person is most likely to engage. Every major ESP has shipped a version; every vendor deck shows a flattering number. Real-world lift is meaningfully smaller than what the decks claim, and the effect often doesn't land where programs want it to. Here's what STO actually does, when it helps, and when it's theatre.
By Justin Williames
Founder, Orbit · 10+ years in lifecycle marketing
What STO actually does
STO looks at a user's historical open and click behaviour, predicts the hour they're most likely to engage, and schedules the send for that hour (or the nearest window the sending infrastructure can hit). Different ESPs run different models — some very simple (most recent open hour), some time-of-day by day-of-week cohorts, a few ML-based with extra features. The marketing is always more sophisticated than the model.
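As a concrete illustration of the simpler end of that spectrum, here's a minimal sketch of a per-user hour-of-day predictor built from past open timestamps. The function name and fallback behaviour are assumptions for illustration, not any particular ESP's implementation.

```python
from collections import Counter
from datetime import datetime

def predict_send_hour(open_timestamps: list[datetime], default_hour: int = 10) -> int:
    """Pick the local hour (0-23) at which this user has historically opened most often.

    Deliberately simple: no day-of-week cohorts, no ML features, just a histogram
    of past open hours, falling back to a fixed default when the user has no history.
    """
    if not open_timestamps:
        return default_hour  # new user: nothing to optimise on
    hour_counts = Counter(ts.hour for ts in open_timestamps)
    return hour_counts.most_common(1)[0][0]

# Usage: five opens around 8am and one at 9pm -> schedule for 8am local
opens = [datetime(2024, 5, d, 8, 12) for d in range(1, 6)] + [datetime(2024, 5, 7, 21, 3)]
print(predict_send_hour(opens))  # -> 8
```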
STO changes the time the email lands. It doesn't change the email, the audience, or the offer. The ceiling is bounded entirely by how much the time of delivery affects engagement — and for most users, the honest answer is: not much.
Emails sit in the inbox. Users check at intervals. A 9am vs 2pm send is read at about the same time by someone who checks at lunch. STO earns its value on users with predictable, concentrated engagement windows. For everyone else, it's picking between two equivalent rooms in an empty hotel.
The measured effects
Vendor case studies show 20–40% open rate lift. Independent benchmarks (Mailchimp, Litmus, academic studies) land on 3–8% open rate lift, 1–4% click rate lift, and typically no significant revenue lift against a proper holdout. That's a five-to-ten-times gap between the pitch and the reality.
Three reasons for the gap:
Apple MPP inflation. STO treats machine opens as real opens. Apple devices open all mail immediately after delivery, so the algorithm "learns" that the user engages at the pre-fetch time — which is unrelated to real engagement. The vendor-reported lift is often this inflation compounding with itself (see the filtering sketch below).
Confounded comparisons. Many case studies compare STO sends to a control group at a different send time, not to a random-time holdout from the same population. That's selection bias in a PDF.
Small effect, noisy metric. Email opens are noisy. A 5% real lift sits well inside the natural variance of a single send.
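To make the MPP point concrete, here's a minimal sketch of the kind of filtering a send-time model needs before it learns anything from opens. The event shape and the two-minute pre-fetch window are assumptions; in practice, machine-open detection usually leans on whatever proxy flags or user-agent data your ESP exposes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class OpenEvent:
    opened_at: datetime
    delivered_at: datetime

def real_open_hours(events: list[OpenEvent],
                    prefetch_window: timedelta = timedelta(minutes=2)) -> list[int]:
    """Open hours with likely machine opens stripped out.

    Heuristic only: an open that fires almost immediately after delivery is
    treated as a pre-fetch (MPP-style) open and excluded, so the model doesn't
    "learn" the delivery hour as the engagement hour.
    """
    return [
        e.opened_at.hour
        for e in events
        if (e.opened_at - e.delivered_at) > prefetch_window
    ]
```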
When STO is worth turning on
Global audiences across time zones. Users in Sydney opening at 9am local and users in New York opening at 9am local require different send times. STO (or simple time-zone-aware sending) stops one group from receiving email at 3am. Always worth doing.
Broadcast campaigns with no time-sensitivity. Newsletters, content emails, non-promotional broadcasts that can spread over several hours without cost. STO delivers modest lift at no real downside.
Large, diverse audiences. Above 500K users, individual send-time differences aggregate into measurable effects and the algorithm has enough data per user to be non-random. STO scales with data availability.
When STO is worthless or harmful
Time-sensitive sends. A flash sale ending in four hours cannot wait for each user's preferred time. Send now. STO is the wrong lever.
Triggered sends. Welcome emails, order confirmations, password resets — the trigger is the user action. STO would delay these, which is the opposite of what the user wants. Running STO on a password reset is one of those decisions you can tell was made without anyone reading the spec.
New users with no history. STO has nothing to optimise on, so it defaults to the category average, which is roughly equivalent to picking a reasonable time manually. Many programs unthinkingly apply STO to welcome emails, producing worse performance than a fixed send.
Small audiences under 50K. Most users have too few data points for per-user optimisation to mean anything. STO falls back to category average. You've paid for a premium feature to get the default answer.
The alternative: simple time-zone-aware sending
For most programs, time-zone-aware sending captures roughly 80% of STO's value with none of the complexity. Hit every recipient at their local 10am (or whatever default the program uses). No per-user optimisation. Just respect for the time zone.
Easy to implement, supported natively in every modern ESP, no dependency on machine-open-inflated data, and the lift versus "everyone at 10am UTC" is usually in the same neighbourhood as what STO achieves against the same baseline. Most of the value of STO is actually time-zone handling wearing a fancy jacket.
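A minimal sketch of that approach, assuming each profile stores an IANA time-zone string (the field and function names here are made up; zoneinfo needs Python 3.9+):

```python
from datetime import date, datetime, time
from zoneinfo import ZoneInfo

def local_send_time_utc(send_date: date, recipient_tz: str, local_hour: int = 10) -> datetime:
    """Schedule the send at the recipient's local `local_hour` and return the UTC send time.

    No per-user optimisation, just respect for the time zone. `recipient_tz` is
    assumed to be an IANA name like 'Australia/Sydney' stored on the profile.
    """
    local_dt = datetime.combine(send_date, time(hour=local_hour), tzinfo=ZoneInfo(recipient_tz))
    return local_dt.astimezone(ZoneInfo("UTC"))

# The same campaign lands at 10am local in both cities
for tz in ("Australia/Sydney", "America/New_York"):
    print(tz, local_send_time_utc(date(2024, 6, 3), tz))
```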
The A/B testing playbook covers how to validate STO vs time-zone sending vs fixed-time sending for your specific audience.
What to do if your ESP pushes STO
Most ESPs charge for STO as a premium tier. The sales pitch uses vendor case studies with the methodology problems above. Before agreeing, do four things:
1. Ask for independent validation, not vendor cases.
2. Ask how they measure lift — against what control group?
3. Run the 30-day holdout test before deciding.
4. Compare measured lift to simple time-zone-aware sending (free in most ESPs).
The usual conclusion: time-zone-aware captures most of the lift; STO's marginal addition doesn't justify the premium. For some programs the premium is worth it. The discipline is measuring before deciding, not buying the feature because it sounds clever.
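A minimal sketch of what that 30-day test can look like, assuming you can tag users at send time and pull opens per group afterwards. The deterministic hash split and the two-proportion z-test are standard techniques; the numbers at the bottom are placeholders, not results.

```python
import hashlib
import math

def assign_group(user_id: str, sto_share: float = 0.5) -> str:
    """Deterministic random split so a user stays in the same arm for the full 30 days."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 10_000
    return "sto" if bucket < sto_share * 10_000 else "fixed_time_holdout"

def open_rate_lift(opens_a: int, sends_a: int, opens_b: int, sends_b: int) -> tuple[float, float]:
    """Relative lift of arm A over arm B and the z statistic for the difference in open rates."""
    p_a, p_b = opens_a / sends_a, opens_b / sends_b
    p_pool = (opens_a + opens_b) / (sends_a + sends_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / sends_a + 1 / sends_b))
    return (p_a - p_b) / p_b, (p_a - p_b) / se

# Placeholder readout after 30 days; compare |z| to 1.96 for the 5% significance level
lift, z = open_rate_lift(opens_a=10_450, sends_a=50_000, opens_b=10_100, sends_b=50_000)
print(f"relative open-rate lift {lift:.1%}, z = {z:.2f}")
```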
On testing send times in general: don't bother testing 9am vs 10am. Too similar. Test 10am vs 6pm, or weekday vs weekend. Once you've found the right window, fine-tuning within it rarely moves anything real.
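To put a rough number on why an hour's difference is rarely worth testing, here's a back-of-envelope minimum-detectable-effect calculation. The 20% baseline open rate and the 50K list split in half are illustrative assumptions; the 80% power / 5% significance constants are the usual defaults.

```python
import math

def min_detectable_diff(n_per_arm: int, baseline_rate: float,
                        z_alpha: float = 1.96, z_power: float = 0.84) -> float:
    """Approximate smallest absolute open-rate difference a two-arm test can
    reliably detect at 80% power and 5% two-sided significance."""
    se = math.sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_arm)
    return (z_alpha + z_power) * se

mde = min_detectable_diff(n_per_arm=25_000, baseline_rate=0.20)
print(f"{mde:.2%} absolute, {mde / 0.20:.0%} relative")
# ~1.0pp absolute (~5% relative): 9am vs 10am rarely produces that much;
# 10am vs 6pm or weekday vs weekend might.
```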
The holdout group design guide covers the methodology for validating vendor-marketed features. STO is one of several where the claimed lift diverges meaningfully from the measured effect. There will be more — this is the industry we picked.