These are my notes from the second edition of “Reinforcement Learning: An Introduction” by Richard S. Sutton and Andrew G. Barto.
Chapter 1: Introduction
Chapter 2: Multi-armed Bandits
Chapter 3: Finite Markov Decision Processes
Chapter 4: Dynamic Programming
Chapter 5: Monte Carlo Methods
Chapter 6: Temporal-Difference Learning