Updated · 8 min read
Inbox placement testing: seed lists, their limits, and what to do instead
Seed-list inbox placement reports show a clean number — '82% inbox, 18% spam' — and programs treat that as a deliverability rate. It isn't. Seed lists measure what happens when you send to roughly 100 test addresses at major ISPs, which is a sample of convenience, not a representative sample of your real audience. The numbers are directionally useful but literally wrong. This is how to use them without getting misled.
By Justin Williames
Founder, Orbit · 10+ years in lifecycle marketing
What seed-list tools actually measure
Validity Everest, Litmus Email Guardian, GlockApps, MailGenius: all of them operate seed lists, sets of test email addresses at major ISPs (Gmail, Outlook, Yahoo, Comcast, and so on). You add those addresses to your send list. After the send, the tool checks where each test message landed — inbox, spam, Promotions tab, or missing entirely — and reports the aggregate.
The output looks authoritative: "Gmail: 78% inbox, 12% Promotions, 10% spam". Sum across ISPs and you get an overall inbox placement rate that is then often quoted to executives as "our deliverability".
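To make the mechanics concrete, here is a minimal sketch of the roll-up step: per-address results aggregated into the per-ISP breakdown and the overall headline number. The data and field names are invented for illustration; every vendor's export looks different.

```python
from collections import Counter, defaultdict

# Hypothetical per-address seed results: (provider, folder) for each test address.
# Real vendor exports differ; this only illustrates the roll-up.
seed_results = [
    ("gmail.com", "inbox"), ("gmail.com", "promotions"), ("gmail.com", "spam"),
    ("outlook.com", "inbox"), ("outlook.com", "inbox"),
    ("yahoo.com", "spam"), ("yahoo.com", "inbox"),
]

by_provider = defaultdict(Counter)
for provider, folder in seed_results:
    by_provider[provider][folder] += 1

# Per-ISP breakdown, the way the report presents it
for provider, folders in by_provider.items():
    total = sum(folders.values())
    breakdown = ", ".join(f"{f}: {c / total:.0%}" for f, c in folders.items())
    print(f"{provider}: {breakdown} (n={total})")

# The "overall inbox placement rate" is just this single division
overall_inbox = sum(f["inbox"] for f in by_provider.values()) / len(seed_results)
print(f"overall inbox placement: {overall_inbox:.0%} (n={len(seed_results)})")
```

The headline number is nothing more than that last division over a small, fixed set of test accounts, which is exactly why sample size and account history matter so much below.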
The seed list is 100–200 test addresses. Your real audience is millions of real people with years of personal engagement history. A seed list can tell you if there's a gross delivery problem. It cannot tell you what individual users see.
Why the numbers are imperfect
Seed-list accounts have no engagement history. Personal Gmail filtering is heavily influenced by how the user has interacted with mail from that sender in the past. Seed accounts don't have that history, which means they reflect what "a brand-new user with no relationship" sees — not what your engaged subscribers actually experience.
ISP-level variation within seed lists. "Gmail" in seed results is a small handful of addresses at gmail.com. Real Gmail filtering varies by user, by recent behaviour, by location. Three addresses won't capture any of that.
Sample size matters. A seed list of 50 addresses has a margin of error of roughly ±11% on a reported 80% placement rate (the sketch below walks through the arithmetic). That means "78% inbox this week, 82% next week" is within noise. Not a real change. Definitely not a metric worth changing the program over.
Tests can be gamed, intentionally or otherwise. Sending only to seed addresses from a clean IP with clean content produces better numbers than your production send. If the test isn't inside a normal campaign send, you're measuring best-case placement, not real placement — which is useful only if you're looking for a reassuring number rather than a true one.
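The arithmetic behind that margin of error is a plain binomial interval. A minimal sketch, assuming the usual normal approximation at 95% confidence:

```python
import math

def placement_margin_of_error(rate: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for a placement rate measured on n seed addresses."""
    return z * math.sqrt(rate * (1 - rate) / n)

# 80% placement measured on a 50-address seed list
moe = placement_margin_of_error(0.80, 50)
print(f"±{moe:.1%}")  # ≈ ±11.1%: anything between roughly 69% and 91% is the same result
```

At 200 addresses the band only narrows to roughly ±6%, still wide enough to swallow most week-to-week movement.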
So is the reported inbox placement rate accurate? Directionally yes, literally no. The trend matters more than the absolute figure. Real users with engagement history generally get better placement than seed accounts with none; the seed result is closer to a lower bound than a precise measurement.
What seed-list reports are actually useful for
Spotting major regressions. Normal placement sitting at 80%+ and suddenly dropping to 40%. Something broke — authentication, reputation, content. The seed list catches this kind of gross problem faster than real engagement data does, because engagement data lags.
Provider-specific signals. Seed shows 85% inbox at Gmail and 30% inbox at Yahoo. You have a Yahoo-specific problem. Most seed tools break placement out by ISP, which is useful even when the absolute numbers are imprecise — the gap between providers is more informative than either provider's number in isolation.
Relative comparison between test conditions. Two variants of a campaign, both sent through the seed list. Variant A shows 80% placement, variant B shows 60%. Variant A is better, with the usual noise caveat; the sketch after this list shows how to check whether a gap like that clears the noise.
Pre-send sanity check. Before a big campaign, a seed test catches obvious problems before you hit the full list. Worth it even when the absolute number doesn't mean what people think it means.
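For the variant comparison, a quick two-proportion check puts a number on the noise caveat. A rough sketch, with invented seed-list sizes:

```python
import math

def placement_gap_z(p_a: float, p_b: float, n_a: int, n_b: int) -> float:
    """Two-proportion z statistic for the gap between two seed placement rates."""
    pooled = (p_a * n_a + p_b * n_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# 80% vs 60% placement, each measured on a 100-address seed test
print(round(placement_gap_z(0.80, 0.60, 100, 100), 2))  # ≈ 3.09: the gap clears the noise
# The same gap measured on 25-address tests gives z ≈ 1.54, i.e. within noise.
```

Anything much below z ≈ 2 is the seed list talking, not the variants.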
The better signal stack
Seed-list placement is one of several deliverability signals. Here's the ranking that actually reflects how much each one tells you:
1. Actual engagement data. Open rate, click rate, and revenue per send across your real audience. Real users reading real mail. Noisy on opens because of Apple MPP, but click and revenue are solid. When these are healthy, deliverability is working regardless of what the seed list says.
2. Google Postmaster domain reputation. Gmail's own view of your domain. Four-tier rating, directly from the source. Much more trustworthy than any third-party measurement for Gmail specifically. See the Postmaster walkthrough.
3. Spam complaint rate. From your ESP's feedback-loop data. Leading indicator for future placement problems; if this is climbing, the seed number will follow in a few weeks.
4. Seed-list inbox placement. Useful as a secondary confirmation signal and for provider-specific diagnosis. Not the number you lead with.
What do you do when the seed list says 60% inbox but engagement metrics are fine? Trust the engagement metrics. They're measuring what your real audience actually experiences. A seed-list 60% on a program with healthy engagement usually means the seed accounts sit in a harsher filtering environment than your real audience's average. Monitor. Don't panic.
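If it helps to keep that priority order honest, here is a simplified sketch of the signal stack as a single check. The thresholds and field names are illustrative, not a standard; tune them to your own program's baselines.

```python
from dataclasses import dataclass

@dataclass
class DeliverabilitySignals:
    click_rate_vs_baseline: float   # 1.0 = normal, 0.5 = half of your usual click rate
    postmaster_reputation: str      # Gmail's four tiers: "high", "medium", "low", "bad"
    complaint_rate: float           # from ESP feedback loops, e.g. 0.001 = 0.1%
    seed_inbox_rate: float          # seed-list placement, e.g. 0.60

def assess(s: DeliverabilitySignals) -> str:
    # Order matters: engagement first, Gmail's own view second,
    # complaints as the leading indicator, seed placement last.
    if s.click_rate_vs_baseline < 0.7:
        return "engagement is down: treat as a real placement problem"
    if s.postmaster_reputation in ("low", "bad"):
        return "Gmail reputation is degraded: act even if seed numbers look fine"
    if s.complaint_rate > 0.003:
        return "complaint rate is high: placement will follow it down"
    if s.seed_inbox_rate < 0.6:
        return "seed placement is low but everything else is healthy: monitor, don't panic"
    return "healthy"

print(assess(DeliverabilitySignals(1.0, "high", 0.0005, 0.60)))  # monitor, don't panic
```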
When to actually invest in a seed-list tool
Seed-list tools are useful but not essential. The right call depends on program size and deliverability stakes:
Under 500K monthly sends: probably skip. Free tools (Postmaster Tools, Mail Tester) plus real engagement metrics do the job. The seed-list cost of $200–$1000/month isn't justified at this scale.
500K–10M monthly: valuable for provider-specific diagnosis and pre-send checks. A mid-tier Validity or GlockApps subscription is reasonable here.
10M+ monthly: probably essential. At this scale, small placement improvements translate into meaningful revenue, and real-time monitoring earns the premium tier.
Which tool is best? Validity Everest and GlockApps are the market leaders. Litmus has a smaller but solid offering. For most programs, they're roughly equivalent in capability. Choose on pricing, UI, and how well each one plays with the rest of your stack. Running two in parallel doesn't meaningfully increase signal — it just doubles the cost and gives you two numbers to argue about.
Before a big campaign, the three-step version: send to your seed list plus a small internal list, check rendering and placement; send to a 10% sample, wait four hours, check engagement for anomalies; release the remaining 90% once the sample looks clean. Catches most production issues before full-scale send, and the seed number is one input of several rather than the whole story.
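Here's a minimal sketch of that three-step gate. The seed_test and esp clients, and every method on them, are hypothetical stand-ins for whatever your seed tool and ESP actually expose.

```python
import time

def pre_send_gate(campaign, seed_test, esp, sample_fraction=0.10, wait_hours=4):
    """Three-step pre-send check: seed test, small sample, then full release."""
    # Step 1: seed list plus internal list, check rendering and placement
    result = seed_test.run(campaign)                        # hypothetical seed-tool client
    if result.inbox_rate < 0.5 or result.has_rendering_errors:
        raise RuntimeError("fix placement or rendering before sending to anyone real")

    # Step 2: send to a sample, wait, check engagement for anomalies
    esp.send(campaign, audience_fraction=sample_fraction)   # hypothetical ESP client
    time.sleep(wait_hours * 3600)
    sample = esp.engagement(campaign)
    if sample.click_rate < 0.5 * campaign.baseline_click_rate:
        raise RuntimeError("sample engagement looks wrong: hold the remaining 90%")

    # Step 3: release the rest of the audience
    esp.send(campaign, audience_fraction=1.0 - sample_fraction)
```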
The Deliverability Management skill uses seed-list as one of several inputs, never in isolation. The most informative view combines real engagement data, Google Postmaster Tools, and seed-list results — each catches failure modes the others miss, and none of them is reliable enough to stand alone.
Related guides
List hygiene: the six-rule policy
List hygiene isn't cleanup; it's a continuous policy that runs automatically. Here's the six-rule policy every lifecycle program should have written down, each tied to a specific deliverability outcome.
Google Postmaster Tools: a walkthrough for people who actually send email
Postmaster Tools is the single most valuable free deliverability tool and most programs either ignore it or misread the charts. Here's what each tab actually says, what to act on, and what to stop looking at.
The deliverability mental model: one picture for authentication, reputation, content, and monitoring
Every deliverability guide covers one piece — SPF, DKIM, DMARC, BIMI, reputation, warmup. What's missing is the systems-level picture that ties them together. This is the one diagram a senior operator needs: how mailbox providers decide whether your email reaches the inbox, and where each piece of the stack plugs in.
Email deliverability — the practitioner's guide
Deliverability isn't a setting. It's the running total of every send decision you've made since you bought the domain. Four pillars hold it up. Break one and the whole program starts leaking.
IP warm-up in Braze — the playbook that actually holds
A fresh dedicated IP has zero reputation on day one. Most warm-up guides fixate on ramp speed and ignore the harder question — which users get the send each day. Here's the schedule, the Random Bucket Number trick, and the day-10 mistake that ruins most of them.
Apple Mail Privacy Protection, four years in
Apple broke the open rate in 2021. Half the lifecycle industry is still pretending it didn't happen. Four years on, the programs that actually adapted are beating the ones that kept optimising a metric that doesn't exist anymore.