Reinforcement Learning and Stochastic Optimization

Reinforcement Learning and Stochastic Optimization: A unified framework for sequential decisions is a new book, building on my 2011 book on approximate dynamic programming, that brings together all the communities working on decisions under uncertainty in a single modeling framework (see jungle.princeton.edu).

Below, I summarize my progress as I do final edits on the chapters. If you would like to be on the push list, please click here to add your name.

Click here for a copy of the book – Updated Jan 20, 2020

Please enter comments here.

As the book nears completion, I am making it available for comments before submitting it to the publisher (tentative date: fall 2020).

There is an online companion book, Sequential Decision Analytics and Modeling, which uses a teach-by-example style (I used it for an undergraduate course). A series of Python modules accompanies the book, and the associated problem sets will be made available soon.

Right now, I am not working on exercises or references.

Chapter updates:

Chapter 1 – Introduction – Updated Jan 13, 2020.

Chapter 2 – Canonical models and applications – Updated Jan 14, 2020.

Chapter 3 – Online learning – Updated Jan 15, 2020.

Chapter 4 – Introduction to stochastic search – Updated Jan 15, 2020.

Chapter 5 – Derivative-based stochastic optimization – Updated Jan 16, 2020.

Chapter 6 – Stepsize policies – Revised from ADP book – Updated Jan 18, 2020.

Chapter 7 – Derivative-free stochastic optimization – Updated Jan 20, 2020.

Chapter 8 – State-dependent problems – Editing now.

Chapter 9 – Modeling sequential decision problems – Draft – Revised from ADP book.

Chapter 10 – Uncertainty modeling – Needs work – Please comment.

Chapter 11 – Designing policies – Draft

Chapter 12 – Policy function approximations and policy search – Draft

Chapter 13 – Cost function approximations – Draft

Chapter 14 – Discrete Markov decision processes – Draft – From ADP book

Chapter 15 – Dynamic programs with special structure – not started (this may be dropped)

Chapter 16 – Backward approximate dynamic programming – Draft – New!

Chapter 17 – Forward ADP I: The value of a policy – Draft – From ADP book

Chapter 18 – Forward ADP II: Policy optimization – Draft – From ADP book

Chapter 19 – Forward ADP III: Convex functions – Draft – From ADP book

Chapter 20 – Direct lookahead policies – Draft – New!

Chapter 21 – Risk – not yet started

Chapter 22 – POMDPs, two-agent systems, and multiagent RL – not yet started.