Updated · 9 min read
Predictive models in lifecycle: churn, propensity, and recommendations without the magic
Every predictive feature ships with a chart that goes up and to the right. The chart is true, in the sense that the model produces scores. The harder question is whether those scores change a decision the program would have got right anyway. This guide covers the three predictive use cases that actually move lifecycle outcomes — churn, propensity, recommendations — and the operator-level questions that decide whether each one earns its activation cost.
By Justin Williames
Founder, Orbit · 10+ years in lifecycle marketing
The three use cases that actually pay for themselves
A predictive score earns its place when it changes a decision. A score that ranks users you would have segmented the same way without it is decoration — interesting to look at, expensive to maintain, and quietly trained on data the team should be using directly.
Churn / risk of inactivity. Output: probability the user will lapse in the next 30 / 60 / 90 days. Used to trigger save flows before lapse rather than win-back after. Real ROI depends on whether your save flow actually saves users — a churn model feeding a save flow that doesn't work is just an expensive way to identify the problem you can't solve. Test the save flow first; deploy the model only when the flow has proven incremental lift.
Conversion propensity. Output: probability the user will convert (purchase, upgrade, sign up) in a defined window. Used to suppress low-propensity users from heavy promotional pushes (saving deliverability and user goodwill) or to time high-intent triggered programs. Especially valuable for SaaS trial-to-paid programs and ecommerce cart abandonment where the audience is heterogeneous and the cost of a wasted send is real.
Product / content recommendations. Output: a ranked list of items per user. Used in onboarding, post-purchase, browse abandonment, and newsletter content selection. This is the use case where AI personalisation most clearly pays for itself, provided the catalog is large enough that hand-curation breaks down. Below ~200 SKUs / content items, hand-curation by lifecycle stage usually outperforms.
Other predictive features exist (predicted LTV, churn cause attribution, optimal discount level) but the three above carry most of the measurable revenue lift in lifecycle programs. Start there. Expand once each is proven in your context.
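To make "changes a decision" concrete, here's a minimal sketch of the first two scores doing work — routing high churn risk into a save flow and suppressing low-propensity users from a promotional push. The column names, thresholds, and scores are illustrative assumptions; in practice the scores come from your ESP or warehouse model.

```python
# Illustrative only: column names, thresholds, and the scores themselves are
# assumptions — in practice the scores come from your ESP or warehouse model.
import pandas as pd

users = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4"],
    "churn_risk_90d": [0.82, 0.12, 0.55, 0.91],            # P(lapse in next 90 days)
    "purchase_propensity_30d": [0.04, 0.61, 0.22, 0.02],   # P(purchase in next 30 days)
})

# Decision 1: route high churn risk into the save flow *before* they lapse.
save_flow_audience = users[users["churn_risk_90d"] >= 0.7]

# Decision 2: suppress low-propensity users from the heavy promotional push.
promo_audience = users[users["purchase_propensity_30d"] >= 0.10]

print(save_flow_audience["user_id"].tolist())   # ['u1', 'u4']
print(promo_audience["user_id"].tolist())       # ['u2', 'u3']
```

If everyone would have received the same treatment regardless of these filters, the scores aren't changing a decision — which is the test from the section above.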
ESP-native predictive: what Braze, Iterable, and Klaviyo actually ship
Braze Predictive Suite. Out-of-the-box churn risk and conversion propensity scored against custom-defined events. Configurable target event (whatever your team defines as churn or conversion) and lookback window. Output is a per-user score available in segment filters. Strengths: zero build effort, integrates natively with Canvas. Weaknesses: limited transparency into the model, dependent on Braze having complete behavioural data on the user, scoring quality plateaus with audience size.
Iterable Brain. Send-time optimisation, frequency optimisation, channel selection. Less focused on score-based segmentation, more on per-message decisions. Best for programs running multi-channel and needing per-user channel selection.
Klaviyo Predictive Analytics. Predicted next-purchase date, predicted CLV, churn risk. Strong for ecommerce because the model is purpose-built for transactional data. Available as filterable user properties similar to Braze. Strength: tight integration with Klaviyo's ecommerce-native data model. Weakness: harder to extend to non-ecommerce use cases.
Salesforce Einstein. Most extensive feature set (engagement scoring, send-time, content selection, journey decisions) but requires the most setup, has the steepest learning curve, and the model quality is meaningfully tied to how complete and clean your SFMC data extensions are.
When ESP-native is enough
The realistic answer is most of the time, for most programs, for most use cases. ESP-native predictive features are good enough when:
The use case is generic — "score users by likelihood to churn in 30 days" rather than "score users by likelihood to churn for reason X given product feature usage pattern Y."
The audience is large enough — typically 50K+ active users with 6+ months of clean event data. Below that, ESP models fall back to weak defaults regardless of vendor.
The activation surface is the ESP itself — segment filters, Canvas branching, Liquid conditions. Pulling the score out of the ESP for use elsewhere is where ESP-native starts to feel constraining.
There's no internal ML team to maintain a custom model. The hidden cost of custom is not the build — it's the ongoing retraining, drift monitoring, and on-call when the recommendation API breaks at 9pm on a Friday.
When you need to build your own
Custom models earn their cost in three patterns:
Domain-specific signal. Style-similarity recommendations for a fashion marketplace. Intent scoring tied to specific product feature usage for a B2B SaaS. Personalised content rankings for a publishing brand where the "catalog" is articles. The signals that produce useful predictions are too domain-specific for a generic ESP model to learn from a generic event stream.
Multi-system activation. The score is needed in the ESP, in the website, in the app, in paid retargeting audiences, and in support tools. ESP-native scores can be exported but the architecture quickly tips toward "build the model in the warehouse, sync the score everywhere" when the activation surface is broad.
Audit and explainability. Regulated industries (finance, health, insurance) often need to explain why a specific user got a specific score. ESP-native black-box scores fail this test. Custom models with documented features and feature importance pass it.
The defensible build pattern: model trained in the warehouse (BigQuery, Snowflake, Databricks), output written nightly to user attributes via reverse-ETL (Census, Hightouch, native ESP integrations), and consumed via standard segment filters. The ESP doesn't need to know it's a predictive model — it just sees a user attribute it can filter on.
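A minimal sketch of that pattern, assuming the features have already landed in the warehouse and using scikit-learn for the model. The file names, feature columns, and target label are placeholders, not a prescribed schema.

```python
# Sketch of the warehouse pattern: train on historical labels, score everyone,
# write one row per user that the reverse-ETL tool syncs to the ESP nightly.
# Table names, feature columns, and the label are all assumptions.
from datetime import datetime, timezone

import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# 1. Features and labels exported from the warehouse (placeholder file).
df = pd.read_parquet("churn_training_snapshot.parquet")  # hypothetical export
features = ["sessions_30d", "orders_90d", "days_since_last_open", "tenure_days"]

X_train, X_val, y_train, y_val = train_test_split(
    df[features], df["churned_in_next_90d"], test_size=0.2, random_state=42
)

# 2. Train and sanity-check before any activation.
model = GradientBoostingClassifier(random_state=42)
model.fit(X_train, y_train)
print("validation AUC:", roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]))

# 3. Score the full user base and write the output table for reverse-ETL.
scores = pd.DataFrame({
    "user_id": df["user_id"],
    "churn_risk_90d": model.predict_proba(df[features])[:, 1],
    "scored_at": datetime.now(timezone.utc),
})
scores.to_parquet("user_churn_scores.parquet")  # Census/Hightouch syncs from here
```

The ESP only ever sees `churn_risk_90d` as a user attribute it can filter on — exactly the point of the pattern.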
Real-time variations of the same pattern use Connected Content (Braze) or Catalog Lookup (Iterable) to call a model endpoint at send time. Adds latency and a runtime dependency, but keeps recommendations fresh. The architecture guide covers the trade-off.
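For the real-time variant, the ESP just needs an HTTP endpoint that returns items for a user at send time. Here's a minimal sketch using FastAPI; the route, payload shape, and in-memory lookup are assumptions — a production service would read a precomputed table or feature store, and this is not any vendor's documented API.

```python
# Minimal sketch of a recommendations endpoint the ESP calls at send time.
# Route, payload, and the lookup itself are assumptions.
from fastapi import FastAPI

app = FastAPI()

# Precomputed top-N recommendations per user, refreshed by a batch job.
RECS = {
    "u1": ["sku_204", "sku_881", "sku_112"],
    "u2": ["sku_330", "sku_204", "sku_019"],
}
FALLBACK = ["sku_101", "sku_102", "sku_103"]  # safe default when the user is unknown

@app.get("/recommendations/{user_id}")
def recommendations(user_id: str, limit: int = 3):
    # Always return something render-safe: the template should never block on a miss.
    items = RECS.get(user_id, FALLBACK)[:limit]
    return {"user_id": user_id, "items": items}
```

At send time the message template calls this endpoint and renders the returned items; keeping latency low and always returning a renderable fallback matters more here than model sophistication.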
The five operator questions before deploying any predictive feature
Before turning on a predictive model — built or bought — answer these. Most of the time at least one of the answers reveals that the model isn't the bottleneck.
1. What decision does the score change? If users in the top quintile get a different message, journey, or send than users in the bottom quintile, the score is doing work. If everyone gets the same treatment regardless of score, you're paying for a dashboard.
2. What does the score replace? The honest comparison isn't "model vs nothing" — it's "model vs the heuristic the team would use otherwise." A churn model often goes head-to-head with "active in last 30 days = yes/no." If the model can't beat that heuristic in a holdout test, it's expensive complexity.
3. How will it be measured? A predictive model needs a holdout. Some users get the predicted-action treatment; some get the previous heuristic-based treatment. Compare downstream metrics over 60+ days. Anything shorter is noise; anything without a holdout is faith. A minimal readout of that comparison is sketched after this list.
4. What happens when it's wrong? Models drift. Data pipelines break. The score that was 0.92 last week is 0.31 this week not because the user changed but because an upstream event stopped firing. Build the monitoring before the deployment, not after the first incident.
5. Who owns it? Predictive models without a clear owner go stale within months. The owner is responsible for retraining cadence, drift monitoring, and the question "is this still earning its place?" Without an owner, the score becomes load-bearing infrastructure that nobody tests, updates, or trusts.
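Here's a minimal readout for question 3, comparing conversion rates between the model arm and the heuristic arm with a two-proportion z-test. The counts are invented for illustration; substitute your own 60-plus-day numbers.

```python
# Illustrative holdout readout: model-driven arm vs heuristic arm.
# Counts are invented; plug in your own 60+ day conversion numbers.
from math import sqrt

from scipy.stats import norm

model_conv, model_n = 1_240, 25_000          # conversions / users in the model arm
heuristic_conv, heuristic_n = 1_150, 25_000  # conversions / users in the heuristic arm

p1, p2 = model_conv / model_n, heuristic_conv / heuristic_n
pooled = (model_conv + heuristic_conv) / (model_n + heuristic_n)
se = sqrt(pooled * (1 - pooled) * (1 / model_n + 1 / heuristic_n))

z = (p1 - p2) / se
p_value = 2 * norm.sf(abs(z))  # two-sided test

print(f"lift: {(p1 - p2) / p2:+.1%}, z = {z:.2f}, p = {p_value:.3f}")
```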
The AI Personalisation skill covers the rollout framework end-to-end, including the build vs buy decision matrix and the holdout design for each of the three primary use cases.
The patterns that disappoint
Predictive scores nobody filters on. A common pattern: turn on Predictive Suite, watch the scores populate, never actually use them in segment logic because the existing program is already segmenting on recency and tier and that's what the team trusts. The model trains, the dashboards trend, and no decision changes. Either commit to using the score in segmentation or don't turn it on.
Predictive features sold as content selection. "The AI picks the best subject line per user." Most ESP implementations are picking from a small candidate set the marketer pre-defined, weighted by historical engagement on similar users. Useful, but closer to a multi-armed bandit than per-user content generation. Set expectations accordingly.
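If "multi-armed bandit" is unfamiliar, this sketch shows the shape of what most ESPs are actually doing: Thompson sampling over a marketer-defined candidate set, with traffic drifting toward the line that wins. The subject lines and simulated open rates are invented.

```python
# Thompson sampling over a small, marketer-defined candidate set.
# Subject lines and "true" open rates are invented and used only to simulate sends.
import numpy as np

rng = np.random.default_rng(7)
subject_lines = ["Your cart misses you", "Still thinking it over?", "10% off, today only"]
true_open_rates = [0.18, 0.22, 0.25]   # unknown in reality; simulation only
opens = np.zeros(3)
sends = np.zeros(3)

for _ in range(5_000):
    # Sample a plausible open rate for each candidate from its Beta posterior...
    sampled = rng.beta(opens + 1, sends - opens + 1)
    arm = int(np.argmax(sampled))       # ...and send the current best guess.
    sends[arm] += 1
    opens[arm] += rng.random() < true_open_rates[arm]

print("send share:", (sends / sends.sum()).round(2))  # traffic drifts to the strongest line
```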
Recommendations that override hand-curation. For small catalogs (200 SKUs or fewer), hand-curated recommendations by lifecycle stage usually outperform model-generated. Programs that switch to AI recommendations and lose curation often see a flat or negative impact. Recommendations are a tool for catalog scale, not a replacement for category strategy.
The pattern that doesn't disappoint: a predictive feature deployed to one program, validated against a holdout for at least 60 days against a non-vanity metric, and expanded only once the lift is real. Slow and careful, and it accumulates more revenue than the "flip every switch" rollout.
Frequently asked questions
- Should I use Braze Predictive Suite or build my own model?
- Use Predictive Suite for generic churn and propensity at standard scale. Build your own when the use case is domain-specific (style similarity, intent scoring tied to product features), the activation surface spans multiple systems, or the audit / explainability requirements demand transparency. Most programs end up with a hybrid: ESP-native for the simple cases, custom models in the warehouse for the cases where domain specificity matters.
- How long do predictive models take to deploy?
- ESP-native: hours to a couple of weeks, depending on event hygiene. Custom warehouse-trained models: 2–6 months including data engineering, model build, validation, and activation. The unsexy truth: most of the timeline is data work, not model work. Programs that say 'we built it in two weeks' usually built it on top of a data layer someone else built before them.
- Can predictive models work for small audiences?
- Below 10K active users with at least 90 days of clean events, ESP-native predictive features fall back to weak defaults and don't earn their place. Below 50K, custom models tend to overfit. For small audiences, simple heuristic segmentation (recency, frequency, tier) usually outperforms the predictive layer. The right time to deploy is when audience scale and event richness justify the complexity.
- What's the difference between propensity and intent scoring?
- Propensity scores predict the probability of an action over a defined future window — 'probability of purchase in next 30 days'. Intent scoring is typically near-real-time, surfacing users showing high signal right now (currently browsing, currently engaging). Propensity is for medium-horizon segmentation; intent is for fast-trigger programs. They use overlapping data but answer different questions and feed different programs.
- How often should predictive models be retrained?
- ESP-native models retrain on a schedule the vendor controls — typically weekly. Custom models depend on the use case: ecommerce recommendations daily or weekly, churn models monthly, LTV models quarterly. The signal is drift — when the model's predictions stop matching observed outcomes, retrain. Build drift monitoring as part of the deployment, not as a future project.
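A minimal sketch of that drift check: compare this week's score distribution against the distribution at deployment with a Population Stability Index. The scores below are simulated and the thresholds are common conventions, not vendor guarantees.

```python
# Minimal drift check: Population Stability Index between the score distribution
# at deployment and this week's scores. Scores are simulated; thresholds are
# common rules of thumb, not a rule any vendor enforces.
import numpy as np

def psi(baseline, current, bins=10):
    """PSI across equal-width score buckets; higher means more drift."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    base_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    curr_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Avoid log(0) on empty buckets.
    base_pct = np.clip(base_pct, 1e-6, None)
    curr_pct = np.clip(curr_pct, 1e-6, None)
    return float(np.sum((curr_pct - base_pct) * np.log(curr_pct / base_pct)))

rng = np.random.default_rng(0)
baseline_scores = rng.beta(2, 5, 50_000)   # score distribution at deployment
current_scores = rng.beta(2, 3, 50_000)    # this week's scores, shifted upward

print(f"PSI = {psi(baseline_scores, current_scores):.2f}")
# Common rule of thumb: below ~0.1 is stable, 0.1–0.25 worth investigating,
# above ~0.25 is a significant shift — retrain or fix the upstream pipeline.
```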
Related guides
AI personalisation at scale: the architecture that actually works
Every ESP now sells an AI personalisation layer. Most lifecycle programs deploy them and quietly find the lift smaller than the deck claimed. The reason isn't the model — it's the architecture underneath. Here's the data → model → activation stack that decides whether AI personalisation moves revenue or moves nothing.
Lifecycle marketing for flat products
The standard lifecycle playbook assumes weekly engagement and neat stage progression. Most real products aren't shaped like that. This is how to design lifecycle for products used once a year, once a quarter, or whenever the user happens to need you — where the textbook quietly makes things worse.
Generative AI for lifecycle content: where it earns its place and where it embarrasses you
Generative AI inside lifecycle ESPs has moved from novelty to default in 18 months. BrazeAI (formerly Sage AI), Iterable Copy Assist, Klaviyo's subject line generator — they all promise per-message copy at scale. Some uses are genuinely useful. Others are a fast path to brand drift, factual errors, and reputational damage. Here's the line.
What is lifecycle marketing? A field guide for operators starting from zero
If you're new to CRM and lifecycle, the field reads like a pile of acronyms and vendor demos. It's actually one simple idea executed across five canonical programs. Here's the frame that makes the rest of the library make sense.
Segmentation strategy: beyond RFM
RFM is the floor of audience segmentation, not the ceiling. Every program that stops there ends up describing what users already did without ever predicting what they'll do next. Here's the segmentation stack that actually drives lifecycle decisions — and how to build it in Braze without ending up with 400 segments nobody understands.
Retention economics: proving lifecycle ROI to finance
Lifecycle programs get deprioritised when they can't defend their impact in dollars. The four models that keep the budget — LTV, payback, cohort retention, incrementality — and the four-slide pattern that wins a CFO room.