Markov Decision Processes
Dynamic programming, value/policy iteration, and average-reward MDPs.
Foundations and canonical references
The standard treatments of Markov decision processes approach the subject from complementary angles. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming (1994), is the anchor reference for the subject: it lays out the core definitions, theorems, and worked examples that practitioners return to. Bertsekas, Dynamic Programming and Optimal Control (2017), gives a parallel, more proof-oriented exposition of the same material and is widely used as a graduate text.
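The value and policy iteration methods in this page's scope can be sketched on a finite discounted MDP. What follows is a minimal sketch in Python, not code from Puterman or Bertsekas. It assumes a hypothetical toy encoding in which mdp[s][a] lists (probability, next_state, reward) triples, a discount factor gamma in (0, 1), and stopping tolerances chosen only for illustration. Value iteration repeatedly applies the Bellman optimality update V(s) ← max_a Σ p · (r + γ · V(s')). Policy iteration alternates evaluating the current policy with greedy improvement. The function names and the two-state example are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy encoding (an assumption for illustration):
# mdp[s][a] is a list of (probability, next_state, reward) triples.
Transition = Tuple[float, int, float]
MDP = List[Dict[str, List[Transition]]]


def q_value(mdp: MDP, V: List[float], s: int, a: str, gamma: float) -> float:
    """Expected one-step return of action a in state s, then valuing the successor with V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in mdp[s][a])


def value_iteration(mdp: MDP, gamma: float = 0.9, tol: float = 1e-10):
    """Apply the Bellman optimality update until successive iterates differ by < tol."""
    V = [0.0] * len(mdp)
    while True:
        V_new = [max(q_value(mdp, V, s, a, gamma) for a in mdp[s]) for s in range(len(mdp))]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    # Extract a greedy policy with respect to the (approximately) optimal values.
    policy = [max(mdp[s], key=lambda a: q_value(mdp, V, s, a, gamma)) for s in range(len(mdp))]
    return V, policy


def evaluate_policy(mdp: MDP, policy: List[str], gamma: float = 0.9, tol: float = 1e-12):
    """Iterative policy evaluation: repeat V(s) <- Q(s, policy[s]) until it stops changing."""
    V = [0.0] * len(mdp)
    while True:
        V_new = [q_value(mdp, V, s, policy[s], gamma) for s in range(len(mdp))]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new


def policy_iteration(mdp: MDP, gamma: float = 0.9):
    """Alternate evaluation and greedy improvement until no state's action changes."""
    policy = [next(iter(mdp[s])) for s in range(len(mdp))]  # start with each state's first action
    while True:
        V = evaluate_policy(mdp, policy, gamma)
        stable = True
        new_policy = []
        for s in range(len(mdp)):
            best = max(mdp[s], key=lambda a: q_value(mdp, V, s, a, gamma))
            # Switch only on a strict improvement, so ties cannot make the loop cycle.
            if q_value(mdp, V, s, best, gamma) > q_value(mdp, V, s, policy[s], gamma) + 1e-9:
                new_policy.append(best)
                stable = False
            else:
                new_policy.append(policy[s])
        policy = new_policy
        if stable:
            return V, policy


# Hypothetical two-state example: "stay" in state 1 pays 2 per step.
toy: MDP = [
    {"stay": [(1.0, 0, 1.0)], "go": [(1.0, 1, 0.0)]},
    {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
]
V_vi, pi_vi = value_iteration(toy)
V_pi, pi_pi = policy_iteration(toy)
print(pi_vi, pi_pi)  # ['go', 'stay'] ['go', 'stay']
```

On this toy problem both methods should agree. Staying in state 1 forever is worth 2 / (1 - 0.9) = 20. From state 0, moving there is worth 0 + 0.9 × 20 = 18, which beats staying in state 0 (worth 1 + 0.9 × 18 = 17.2 once the optimal values are known). Policy evaluation is done iteratively here for brevity. An exact linear solve of (I - γ P_π) V = r_π is a common alternative.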
Open methodological questions for Markov decision processes include strengthening the links between foundational theory and computational practice, extending classical results to broader or more structured settings, and connecting dynamic programming techniques such as value and policy iteration with adjacent mathematical disciplines. The references listed on this page are the entry points that current work builds on.
Sources
- Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming (1994). Textbook · primary · puterman-1994
- Bertsekas, Dynamic Programming and Optimal Control (2017). Textbook · primary · bertsekas-2017
This page was drafted by an agent and is waiting on expert review.