Online Learning and Bandits

Multi-armed bandits, contextual bandits, and best-arm identification.


frontier tier

Foundations and canonical references

The standard treatments of online learning and bandits approach the subject from complementary angles. Cesa-Bianchi and Lugosi, Prediction, Learning, and Games (2006) is the anchor reference: it lays out the core definitions, theorems, and worked examples that practitioners return to. Hazan, Introduction to Online Convex Optimization (2016) gives a parallel, more proof-oriented exposition of overlapping material from the viewpoint of online convex optimization, and is widely used as a graduate text.
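To make the kind of definition and guarantee these texts develop concrete, here is a minimal sketch of the exponentially weighted average forecaster (Hedge) for prediction with expert advice, together with its regret against the best fixed expert. The function name `hedge`, the synthetic Bernoulli losses, and the constants `T` and `N` are illustrative assumptions, not taken from either book. The learning rate and the bound sqrt(T ln N / 2) are the standard ones for losses in [0, 1].

```python
import math
import random


def hedge(losses, eta):
    """Run exponential weights on a T x N loss matrix with entries in [0, 1].

    Returns the learner's cumulative expected loss and the list of
    cumulative losses of the individual experts.
    """
    n_experts = len(losses[0])
    cum_losses = [0.0] * n_experts  # cumulative loss of each expert so far
    learner_loss = 0.0
    for round_losses in losses:
        # Weights are proportional to exp(-eta * cumulative loss); subtracting
        # the minimum keeps the exponentials numerically well behaved.
        base = min(cum_losses)
        weights = [math.exp(-eta * (c - base)) for c in cum_losses]
        total = sum(weights)
        probs = [w / total for w in weights]
        # The learner suffers the expected loss under its current distribution.
        learner_loss += sum(p * l for p, l in zip(probs, round_losses))
        cum_losses = [c + l for c, l in zip(cum_losses, round_losses)]
    return learner_loss, cum_losses


# Hypothetical data: expert i incurs a 0/1 loss with probability 0.2 + 0.1 * i.
rng = random.Random(0)
T, N = 1000, 5
means = [0.2 + 0.1 * i for i in range(N)]
losses = [[1.0 if rng.random() < m else 0.0 for m in means] for _ in range(T)]

eta = math.sqrt(8 * math.log(N) / T)      # standard tuning for a known horizon T
learner_loss, cum_losses = hedge(losses, eta)
regret = learner_loss - min(cum_losses)   # regret against the best fixed expert
bound = math.sqrt(T * math.log(N) / 2)    # worst-case guarantee for this eta
print(f"regret = {regret:.2f}, bound = {bound:.2f}")
```

The guarantee holds for any loss sequence in [0, 1], not only for the random one generated here, so `regret` should never exceed `bound`. In the bandit setting the learner observes only the loss of the action it chose, which is what separates it from this full-information sketch.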

Open methodological questions for online learning and bandits include sharpening the bridges between foundational theory and computational practice, extending classical results to broader or more structured settings, and integrating the techniques of the field with adjacent mathematical disciplines. The references listed on this page are the entry points that current work builds on.

Prerequisites

Sources

  • textbook · primary · 2006
    Prediction, Learning, and Games
    cesa-bianchi-2006, lugosi-2006
  • textbook · primary · 2016
    Introduction to Online Convex Optimization
    hazan-2016


Review this topic

This page was drafted by an agent and is waiting on expert review.