Warren B. Powell

Professor Emeritus, Princeton University

*Reinforcement Learning and Stochastic Optimization* is the first book to unify the diverse communities that study sequential decision problems, which consist of the sequence: *decision, information, decision, information*. The “information” arrives after a decision is made, which means it is uncertain at the time the decision is made. Each decision is evaluated by a contribution (or reward, or cost, or …) to be maximized (or minimized). The challenge is to come up with a method for making decisions; this method is called a *policy*.
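The *decision, information* loop can be made concrete with a small simulation. What follows is only a minimal sketch under stated assumptions: the inventory setting, the names `simulate` and `order_up_to`, the uniform demand, and the price and cost values are all hypothetical illustrations, not taken from the book. It shows a policy as a function that maps what we know now to a decision, with the uncertain information arriving only after the decision is made.

```python
import random
from typing import Callable

# A policy is any rule mapping the current state (here, inventory) to a decision.
Policy = Callable[[int], int]


def order_up_to(target: int) -> Policy:
    """Hypothetical policy: order enough to bring inventory up to `target`."""
    def policy(inventory: int) -> int:
        return max(0, target - inventory)
    return policy


def simulate(policy: Policy, horizon: int, seed: int = 0,
             price: float = 5.0, cost: float = 3.0) -> float:
    """Run the loop decision, information, decision, information, ...

    Returns the total contribution (revenue minus ordering cost).
    The demand model and the prices are illustrative assumptions.
    """
    rng = random.Random(seed)
    inventory = 0
    total = 0.0
    for _ in range(horizon):
        order = policy(inventory)        # decision, made before demand is known
        demand = rng.randint(0, 10)      # information, revealed after the decision
        available = inventory + order
        sales = min(available, demand)
        total += price * sales - cost * order   # contribution of this decision
        inventory = available - sales    # state carried to the next decision
    return total
```

Comparing policies then amounts to calling, for example, `simulate(order_up_to(4), horizon=100)` and `simulate(order_up_to(8), horizon=100)` and seeing which yields the larger total contribution, ideally averaged over many seeds.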

There are established communities that work in fields such as deterministic optimization. The books to the right all use a common notation and present similar material. The result is a mature community of students coming from hundreds of academic programs who can take these tools to industry.

There is a much larger community for statistics/machine learning, represented by the books to the left. As with optimization, this family of books presents similar material using an established set of methods that students who take a course in statistics (or machine learning) can be expected to master.

This is not true in the area of sequential decision problems, which are clearly ubiquitous, arising in virtually every field (business, engineering, the sciences, economics, health, energy, finance, …). While deterministic optimization is a specialized field focusing on complex problems, sequential decisions arise in every dimension of human activity.

Unlike optimization and machine learning, there is no single book that deals with the broad field of sequential decision making. Each of the books to the right deals with a particular class of sequential decision problems. Each represents a different community, and together they feature roughly eight different notational systems, a variety of modeling frameworks, and an endless collection of algorithms for specialized problems. I call this the “jungle of stochastic optimization.” The books are advanced and heavily biased toward the most complex (and rarely used) methods for making decisions.

*Reinforcement Learning and Stochastic Optimization (RLSO)* is the first book to put the vast range of sequential decision problems into a single modeling framework. We have collected the diverse (and growing) set of solution approaches into an elegant framework consisting of four classes of policies that cover *every* method that has been proposed in the literature or used in practice. While most books on decisions under uncertainty are written at fairly advanced mathematical levels, *RLSO* is written for the same audience that is served by the fields of optimization and machine learning, with an emphasis on models and algorithms (just as in the other two fields). Instead of focusing on the most complex tools, we provide a balanced presentation of all four classes of policies, which means we cover the policies that are actually used in practice (typically in an ad hoc way).

I have taught this material in an undergraduate course at Princeton and in a graduate course that attracted students from eight different departments. It should serve as the foundation of a new field I am calling *sequential decision analytics* (a description is linked from this page). A video introduction is also available.