AI personalisation at scale: the architecture that actually works
Braze ships BrazeAI (the rebrand of what was Sage AI). Iterable ships AI Optimization. Klaviyo ships AI everything. Salesforce ships Einstein. The pitch is identical: turn on the AI, watch the numbers go up. The reality is that AI personalisation only produces lift when the architecture underneath it is healthy — clean events, real-time profiles, modular content, and a measurement layer that catches when the model is wrong. This guide is the architecture brief, not the vendor brochure.
By Justin Williames
Founder, Orbit · 10+ years in lifecycle marketing
What AI personalisation actually is (and isn't)
AI personalisation is a stack, not a feature. The model is the visible 10%. The data, content, and activation layers underneath are the 90% that decides whether the model has anything useful to say.
The category collapses three different capabilities under one marketing label:
Predictive personalisation — models that score users (churn risk, propensity to buy, predicted LTV) and route the message accordingly. Braze Predictive Suite, Iterable Brain, Klaviyo Predictive Analytics, Salesforce Einstein. Output: a number per user that drives a decision.
Generative personalisation — models that produce content (subject lines, body copy, recommendations) per user or per cohort. BrazeAI (formerly Sage AI), Iterable Copy Assist, Klaviyo Subject Line generator. Output: the copy the user actually reads.
Optimisation personalisation — models that pick from a candidate set: best send time, best variant, best frequency. Send-time optimisation (STO), multi-armed bandits, next-best-action. Output: a decision among options the marketer pre-defined.
Different stacks, different failure modes, different ROI profiles. A team that says "we're using AI personalisation" without specifying which of the three is hiding an architectural question behind marketing language. The first job in any AI personalisation effort is naming which capability is being deployed and why.
The four-layer stack
Every AI personalisation deployment that produces measurable lift sits on top of four layers. Skip one and the model on top is decoration.
Layer 1 — Event capture. Real-time, well-named, complete events flowing from product to ESP. Without this the model trains on a partial picture of user behaviour and fires recommendations on stale signal. The most common cause of "our AI doesn't work" is that events lag by hours, fire inconsistently across platforms, or use names like click_button_2 that mean nothing to a model.
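For concreteness, here's a sketch of a model-ready event as it might land on Braze's /users/track endpoint. The event name and properties are hypothetical; the shape is the point: a descriptive name, complete properties, sent the moment the action happens.

```json
{
  "events": [
    {
      "external_id": "user-4821",
      "name": "checkout_completed",
      "time": "2025-01-15T18:02:11Z",
      "properties": {
        "cart_value": 84.5,
        "item_count": 3,
        "payment_method": "apple_pay",
        "first_purchase": false
      }
    }
  ]
}
```

A model can learn from checkout_completed carrying a cart_value; it can do nothing with click_button_2 arriving four hours late.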
Layer 2 — Unified user profile. A single, real-time view of the user combining behavioural events, custom attributes, subscription state, and product catalog interactions. Braze ships this natively; Iterable, Klaviyo, and HubSpot ship variations. If your profile is fragmented across systems and the AI only sees one slice, expect performance bounded by what that slice contains.
Layer 3 — Modular content. Content Blocks, dynamic blocks, catalog items — anything the model can swap into a message without a human re-authoring the email. A program with one master HTML per send and no modular structure has nowhere for AI personalisation to write its output. The model produces a recommendation; there's no slot to drop it into. This is where most programs stall.
Layer 4 — Activation logic. Liquid, Connected Content, Catalogs, segment filters — the runtime that picks which version of which block goes to which user. Braze does this through Connected Content + Catalogs + Liquid; Iterable through Catalog Lookup; Klaviyo through dynamic content blocks. The model's output is meaningless without an activation layer that can route it.
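As a sketch of layers 3 and 4 working together in Braze Liquid (the block names and the churn_risk_score attribute are hypothetical stand-ins for whatever your instance actually uses):

```liquid
{% comment %} Layer 4 routes; layer 3 supplies the swappable blocks.
Assumes churn_risk_score is numeric, whether it comes from Predictive
Suite or a nightly warehouse sync. {% endcomment %}
{% assign churn = {{custom_attribute.${churn_risk_score}}} | default: 0 %}

{% if churn > 0.7 %}
  {{content_blocks.${save_offer}}}
{% elsif churn > 0.4 %}
  {{content_blocks.${engagement_nudge}}}
{% else %}
  {{content_blocks.${standard_update}}}
{% endif %}
```

If the blocks already exist, wiring in a model score is a Liquid change. If they don't, building them is the real project, and it belongs before the AI, not after.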
Braze-specific: how BrazeAI, Predictive Suite, Connected Content, and Catalogs fit
Braze packages the four layers into named features. Useful to map them, since most operators inherit a Braze instance with some of these turned on without anyone documenting why.
BrazeAI. The generative layer — subject lines, copy variations, image suggestions inside the Braze composer. Rebranded from Sage AI; same feature set, new umbrella name. Operates on the message in front of you, not the broader program. Useful for accelerating production, less useful for personalisation in the strict sense (it generates copy a marketer reviews, not per-user output that varies in real time).
Predictive Suite. Predictive layer — churn risk, conversion propensity, LTV prediction. Outputs a score per user that becomes filterable. Sits in segment logic. Quality varies dramatically by data volume — a program with 10K monthly actives and limited event history gets a worse model than one with 10M and three years of clean events. The model is the same; the input is the bottleneck.
Connected Content. Activation layer — pulls real-time data from external APIs into the message at send time. The bridge between AI output produced elsewhere (your warehouse, an internal recommendation service, a Vertex AI endpoint) and the rendered message. The most underused Braze feature in most programs and the most powerful when you outgrow Predictive Suite's built-ins.
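A hedged sketch of that bridge, where recs.example.com stands in for wherever your model lives and the response shape (items, title) is assumed:

```liquid
{% connected_content https://recs.example.com/v1/users/{{${user_id}}}/recommendations :save recs :cache_max_age 900 %}

{% if recs.items.size > 0 %}
  Picked for you: {{ recs.items[0].title }}
{% else %}
  {% comment %} Always ship a fallback; the endpoint will eventually time out. {% endcomment %}
  {{content_blocks.${fallback_recs}}}
{% endif %}
```

The fallback branch is not optional: a send-time dependency on an external service fails eventually, and the failure mode should be a generic block, not a blank slot.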
Catalogs. Content layer — the product, content, or asset library that AI personalisation references. Without a properly structured Catalog, recommendations are limited to what fits in custom attributes. Catalog quality (taxonomy, completeness, freshness) directly bounds recommendation quality.
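The catalog side of the same pattern, assuming a hypothetical products catalog and an item ID the model wrote into a custom attribute (verify the tag details against Braze's catalog Liquid docs):

```liquid
{% catalog_items products {{custom_attribute.${recommended_product_id}}} %}

{% if items[0] %}
  {{ items[0].name }}: {{ items[0].price }}
{% else %}
  {{content_blocks.${bestsellers_fallback}}}
{% endif %}
```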
Liquid. Activation layer (the runtime). Every dynamic decision in a Braze message resolves through Liquid. The Liquid reference covers the syntax. The activation layer is where bad architecture becomes visible — a broken Liquid expression ships the literal text {{ ${first_name} }} to 50,000 people regardless of how clever the AI was upstream.
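The one-line defence against exactly that failure is a default filter, so a missing attribute degrades gracefully instead of shipping raw tag text:

```liquid
Hi {{${first_name} | default: 'there'}},
```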
The same architecture maps onto Iterable (Brain + Catalogs + Catalog Lookup), Klaviyo (Predictive Analytics + dynamic blocks + product feeds), HubSpot (Smart Content + Personalization Tokens), and Salesforce Marketing Cloud (Einstein + Personalization Builder). Different feature names, identical layers.
The three architectural choices that decide whether AI personalisation lifts revenue
Build vs buy the model. Most ESP-native AI is good enough for activation, churn risk, and basic recommendations at small-to-mid scale. It stops being good enough when the use case is specific to your domain — a fashion marketplace needs style-similarity recommendations the ESP doesn't ship, a B2B SaaS needs intent scoring tied to product usage patterns, etc. The build vs buy decision should be made on use case specificity, not on which option sounds more sophisticated. The AI Personalisation skill covers the decision framework.
Real-time vs batch activation. Real-time scoring (Connected Content calling a recommendation API at send) opens the door to fresh signal but adds latency and a runtime dependency. Batch activation (recommendations precomputed and synced as user attributes nightly) is more reliable but up to 24 hours stale. Most programs over-rotate to real-time when batch would do, then under-invest in monitoring the latency they've taken on. Pick based on how fresh the signal needs to be: a product recommendation can be 24 hours stale; a churn-save trigger probably can't.
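The two shapes side by side, with hypothetical attribute and endpoint names:

```liquid
{% comment %} Batch: precomputed nightly, synced as a user attribute.
No send-time dependency; up to 24 hours stale. {% endcomment %}
{% assign pid = {{custom_attribute.${recommended_product_id}}} %}

{% comment %} Real-time: fetched at send. Fresh signal, plus latency
and a dependency on a service that can fail mid-campaign. {% endcomment %}
{% connected_content https://recs.example.com/v1/users/{{${user_id}}}/next-best :save rec %}
{% assign pid = rec.product_id %}
```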
How much human review the model output gets. Generative AI output that ships unreviewed will eventually produce something off-brand, factually wrong, or quietly offensive. Generative AI output that gets human-reviewed before every send is too slow to scale. The middle path: the model produces variants, a human approves the first batch per template, then the system rotates approved variants automatically. Klaviyo and Iterable have built variants of this pattern; BrazeAI is more interactive (per-message generation in the composer) and works better in a pre-send review flow.
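One way the middle path can look once variants are approved: the model generated the candidates, a human signed them off once, and the rotation from there is mechanical (the subject lines are illustrative):

```liquid
{% comment %} Rotate three human-approved subject lines by day of year.
Nothing unreviewed ever ships. {% endcomment %}
{% assign bucket = 'now' | date: '%j' | modulo: 3 %}
{% case bucket %}
  {% when 0 %}Your cart is holding three things you liked
  {% when 1 %}Still deciding? Your picks are waiting
  {% else %}Three items, one click from done
{% endcase %}
```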
What an honest rollout looks like
The temptation with AI personalisation is to flip every switch on day one and see what happens. The pattern that works is sequential — one capability, one program, one measurement window — and only expands once the previous expansion has produced a measurable, holdout-validated lift.
A defensible rollout sequence:
1. Pick one program where the data is clean and the activation layer already exists. Usually onboarding, abandoned cart, or post-purchase. Programs with messy data or no modular content are fixable but should be fixed before AI gets layered on.
2. Pick one capability — predictive scoring on a single segment, or AI subject lines on a single template, or a single product recommendation slot. Not five at once. The goal is to learn whether THIS lever moves THIS metric for THIS audience.
3. Run for at least 30 days against a holdout (10–20% of audience receives the non-AI version). The holdout group guide covers the design. Anything shorter is noise; anything without a holdout is vendor marketing.
4. Read out against the metric the program exists for — typically downstream conversion or revenue, not opens or clicks. Apple MPP makes opens unreliable; AI personalisation often inflates opens without moving revenue. The measurement guide covers the honest readout.
5. If the holdout test produces real lift, expand to a second program or capability. If it doesn't, don't expand — diagnose why, usually back at the data or activation layer.
The teams that get the most out of AI personalisation treat it as a series of small bets validated individually, not a platform decision validated by the vendor. The teams that get the least flip everything on, watch dashboards trend up on MPP and seasonality, and write a case study whose data won't survive an audit.
Frequently asked questions
- Do I need a CDP before deploying AI personalisation?
- Not necessarily. A unified user profile is the requirement; a CDP is one way to build it but not the only way. Braze, Iterable, and Klaviyo can serve as the unified profile layer for many lifecycle use cases without a separate CDP. The decision turns on whether the personalisation needs to span multiple downstream tools (web, app, ads, support) — if yes, a CDP starts paying for itself. If the AI personalisation lives entirely inside the ESP, a CDP is optional infrastructure.
- How much data do I need for predictive models to work?
- ESP-built predictive models typically need at least 10,000 users with 90+ days of behaviour to produce stable scores. Below that, the model falls back to category averages or refuses to score. For custom-built models on specific use cases (churn, propensity, recommendations), 50,000+ users with rich event histories is a more realistic floor. Programs below those thresholds can still benefit from optimisation features (STO, multi-armed bandit testing), which require less data per user.
- Should I use BrazeAI for subject line generation?
- BrazeAI (formerly Sage AI) works best as an acceleration tool for marketers who already know their voice — it produces variants you select and refine, not finished copy you ship blind. The lift comes from generating five candidates in 30 seconds instead of 10 minutes, which lets you A/B test more often. The pattern that fails: shipping BrazeAI output unreviewed. Brand voice drift accumulates and the program ends up sounding like every other program using the same model.
- How do I know if AI personalisation is actually working?
- A holdout group that receives the non-AI version of the same program. Compare downstream conversion or revenue (not opens, which are corrupted by MPP) over 30+ days. If the AI version does not produce a statistically meaningful lift against the holdout, the AI is not earning its place in the program. Vendor case studies are not validation; they're marketing.
- What's the biggest mistake teams make with AI personalisation?
- Treating it as a feature flip rather than an architectural commitment. Turning on Predictive Suite or BrazeAI without first auditing event quality, profile unification, content modularity, and activation logic produces a model that runs on broken inputs. The output looks like personalisation; the lift never materialises; the team blames the model. The model wasn't the problem.
Related guides
Predictive models in lifecycle: churn, propensity, and recommendations without the magic
Predictive models in lifecycle are mostly three things: churn risk, conversion propensity, and product recommendations. Each one earns or loses its place based on how its score actually changes a decision. Here's the operator view of what's worth deploying, what to expect from ESP-native suites, and when to build your own.
Segmentation strategy: beyond RFM
RFM is the floor of audience segmentation, not the ceiling. Every program that stops there ends up describing what users already did without ever predicting what they'll do next. Here's the segmentation stack that actually drives lifecycle decisions — and how to build it in Braze without ending up with 400 segments nobody understands.
Lifecycle marketing for flat products
The standard lifecycle playbook assumes weekly engagement and neat stage progression. Most real products aren't shaped like that. This is how to design lifecycle for products used once a year, once a quarter, or whenever the user happens to need you — where the textbook quietly makes things worse.
Generative AI for lifecycle content: where it earns its place and where it embarrasses you
Generative AI inside lifecycle ESPs has moved from novelty to default in 18 months. BrazeAI (formerly Sage AI), Iterable Copy Assist, Klaviyo's subject line generator — they all promise per-message copy at scale. Some uses are genuinely useful. Others are a fast path to brand drift, factual errors, and reputational damage. Here's the line.
What is lifecycle marketing? A field guide for operators starting from zero
If you're new to CRM and lifecycle, the field reads like a pile of acronyms and vendor demos. It's actually one simple idea executed across five canonical programs. Here's the frame that makes the rest of the library make sense.
Retention economics: proving lifecycle ROI to finance
Lifecycle programs get deprioritised when they can't defend their impact in dollars. The four models that keep the budget — LTV, payback, cohort retention, incrementality — and the four-slide pattern that wins a CFO room.