Reinforcement Learning
Learning by trial and error from reward signals.
Reinforcement learning studies how an agent learns to act by trial and error, using reward signals as feedback. It sits within machine learning and inherits that area’s core questions about correctness, scale, and tractability. This page surveys the conceptual axes of the topic and points to the references that frame ongoing research and teaching. It is meant to serve both as an entry point for newcomers and as an index for practitioners checking their mental model against the field’s primary sources.
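To make "trial and error from reward signals" concrete, here is a minimal sketch of tabular Q-learning. Everything in it is an illustrative assumption rather than something this page specifies: the five-state chain environment, the `step`, `q_learning`, and `greedy_policy` helpers, and the hyperparameter values.

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is terminal
LEFT, RIGHT = 0, 1    # the two available actions


def step(state, action):
    """Toy environment (an assumption for illustration only).

    RIGHT moves one state up, LEFT one state down (clamped at 0).
    Entering the last state yields reward 1 and ends the episode.
    """
    next_state = min(state + 1, N_STATES - 1) if action == RIGHT else max(state - 1, 0)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):  # cap episode length so every episode terminates
            # Trial: explore with probability epsilon, or when the estimates are tied.
            if rng.random() < epsilon or q[state][LEFT] == q[state][RIGHT]:
                action = rng.randrange(2)
            else:
                action = LEFT if q[state][LEFT] > q[state][RIGHT] else RIGHT
            next_state, reward, done = step(state, action)
            # Error correction: move the estimate toward the one-step bootstrapped target.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Pick the higher-valued action in each state (RIGHT on ties)."""
    return [LEFT if left > right else RIGHT for left, right in q]
```

Calling `greedy_policy(q_learning())` should, under these assumptions, choose RIGHT in every non-terminal state: the agent is never told where the reward is and recovers that behaviour only from the rewards it observes.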
Work on reinforcement learning can be organised around a few interlocking concerns: the formal objects under study, the algorithms or systems that compute over them, the resource trade-offs (time, memory, communication, statistical efficiency), and the empirical or theoretical guarantees that practitioners rely on. The sources cited below approach the topic from a mix of these angles.
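As one common way to pin down the "formal objects under study", textbook treatments usually model the problem as a Markov decision process. The notation below is an assumption of this sketch (a finite state set S, action set A, transition kernel P, reward function r, and discount factor γ), not notation taken from this page:

```latex
\[
  \mathcal{M} = (\mathcal{S}, \mathcal{A}, P, r, \gamma), \qquad
  J(\pi) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \right]
\]
\[
  Q^{*}(s, a) = r(s, a) + \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s, a)\, \max_{a' \in \mathcal{A}} Q^{*}(s', a')
\]
```

The first line states the objective, the expected discounted return of a policy π. The second is the Bellman optimality equation that value-based algorithms try to solve, exactly or approximately.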
Foundational references
Sutton and Barto, Reinforcement Learning: An Introduction (2nd ed., 2018) is a standard reference for this material and is used both as a curriculum anchor and as a long-form survey of techniques.
Supporting and complementary work
Szepesvári, Algorithms for Reinforcement Learning (2010) provides supporting material that complements the primary reference. Readers comparing approaches will find alternative framings, notations, and extensions there.
Open methodological questions in reinforcement learning cluster around how to compose the field’s techniques under realistic constraints: scale, adversarial inputs, partial observability, and shifting workloads. The cited references give the precise statements, proofs, and empirical evaluations that this overview only sketches; downstream topic pages drill into specific subfields.
Sources
- textbook · supporting · 2010 · Algorithms for Reinforcement Learning · szepesvari-2010
Explore
- 01 · Human-in-the-Loop Reinforcement Learning: Reinforcement learning algorithms that treat human feedback (preferences, demonstrations, interventions, language) as a first-class learning signal.
- 02 · Markov Decision Processes: MDPs, Bellman equations, and policies.
- 03 · Sim-to-Real Reinforcement Learning: Training control policies in simulation that transfer reliably to physical hardware, using domain randomization, asymmetric actor-critic, and implicit system identification.
- 04 · Value-Based Methods: Q-learning, DQN, and value-function approximation.
- 05 · Multi-Agent Reinforcement Learning: Reinforcement learning in systems with many simultaneously-learning agents, covering joint-policy optimization, non-stationarity, and learned communication.
- 06 · Policy Gradient Methods: REINFORCE, PPO, TRPO, and actor-critic methods.
- 07 · Model-Based RL: Learning dynamics models and planning with them.
- 08 · Offline Reinforcement Learning: Learning from logged data without environment interaction.
- 09 · Exploration in RL: Curiosity, intrinsic motivation, and exploration bonuses.
- 10 · Hierarchical RL: Options, sub-policies, and temporal abstraction.
- 11 · Inverse Reinforcement Learning: Recovering reward functions from demonstrations.
- 12 · Imitation Learning: Behavioral cloning and learning from demonstrations.
- 13 · RL from Human Feedback: RLHF, preference modeling, and reward-model-based fine-tuning.
- 14 · Distributional RL: Learning the full return distribution rather than its mean.
- 15 · Safe Reinforcement Learning: Constrained MDPs and safety guarantees in RL.
Review this topic
This page was drafted by an agent and is waiting on expert review.