Multi-armed bandit

Also known as: bandit test · Thompson sampling

An experimentation approach that dynamically shifts traffic toward the winning variant as results emerge — higher total conversion than traditional A/B tests when lift is real, but weaker statistical certainty about the final result.

Multi-armed bandits (MAB) are an alternative to fixed-allocation A/B testing in which traffic is progressively reweighted toward the currently winning variant. Thompson sampling is the most common flavour: each variant's conversion rate is modelled as a probability distribution, and each new user is randomly assigned in proportion to each variant's probability of being the best. Bandits suit continuous optimisation where you care about total revenue during the test (ramping up the winner sooner means more revenue), and are weaker for decision-making where you need statistical certainty about which variant is best. Use bandits for high-volume ongoing optimisation (homepage hero rotation, send-time tuning); use traditional A/B tests for decision-quality launches (a new product page, a pricing change) where you need confidence in the final call.
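A minimal sketch of Thompson sampling for conversion rates, assuming a Bernoulli reward per user and a uniform Beta prior per variant (the variant names and tallies below are hypothetical):

```python
import random

random.seed(0)  # for a reproducible simulation

def thompson_sample(stats):
    """Pick a variant by Thompson sampling: draw one value from each
    variant's Beta posterior and serve the variant with the largest draw."""
    best, best_draw = None, -1.0
    for name, (conversions, non_conversions) in stats.items():
        # Beta(1 + conversions, 1 + non_conversions): posterior under a uniform prior
        draw = random.betavariate(1 + conversions, 1 + non_conversions)
        if draw > best_draw:
            best, best_draw = name, draw
    return best

# Hypothetical running tallies: (conversions, non-conversions) per variant.
# "B" converts at ~4.5% vs ~3.0% for "A".
stats = {"A": (30, 970), "B": (45, 955)}

# Assign 10,000 simulated users: traffic concentrates on the stronger variant,
# while the weaker one still gets occasional exploratory traffic.
counts = {"A": 0, "B": 0}
for _ in range(10_000):
    counts[thompson_sample(stats)] += 1
```

Because assignment is proportional to each variant's probability of being best, the losing arm is never fully starved: if its posterior later overtakes, traffic shifts back automatically.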
