**Clearing the Jungle of Stochastic Optimization**

We have a very good handle on modeling deterministic optimization problems, but the universe of problems that involve sequencing decisions and information have resisted being expressed in the kind of common canonical framework that has become universal in mathematical programming.

Stochastic optimization problems are so diverse (and so hard) that the field has become fragmented into a Balkanized set of communities with competing algorithmic strategies and modeling styles. It is easy to spend a significant part of a career mastering the subtleties of each perspective on stochastic optimization, without recognizing the common themes.

Our approach can be summarized as "model first, then solve." We argue for a common canonical form that cuts across all these communities. We have found that the different perspectives can often be viewed as different policies, suited to different classes of applications. Click below for:

Key points - A thumbnail sketch of the major ideas

Canonical model - Here we contrast the modeling of deterministic and stochastic problems (over multiple time periods), illustrating the style of optimizing over policies, followed by a summary of the four classes of policies, and an illustration that each may work best depending on the data.

Tutorial articles for different communities:

All communities - A unified framework for optimization under uncertainty, showing how to bring together all the communities in the jungle of stochastic optimization.

Operations Research - Optimal Newsletter article and the original "Clearing the Jungle" tutorial presented at 2014 Informs meeting.

IEEE/controls community - Two part tutorial in the IEEE style. Part I describes how to model stochastic optimization problems, including comparisons to different canonical modeling frameworks. Part II includes a nice battery storage example.

AI/reinforcement learning community - Easy to read article that appeared in AI Magazine, Fall, 2014.

Newslettter articles -These are brief summaries of the key ideas:

SIAM News - Appeared November, 2015

Optima Newsletter, January, 2015 - Includes a commentary by Andrzej Ruszczynski

- State variables - We claim that all properly modeled dynamic systems are Markovian, and provide a teachable, implementable definition that students can use to guide their efforts to model these systems. (If you have a counterexample, email it to me at powell@princeton.edu, but check the definition in the "Clearing the Jungle" article first - see section 3.)
- A dynamic program is a sequential decision
*problem*(it is*not*a method). Bellman's equations are a) conditions for an optimal policy and b) a path to designing good policies. - We introduce the distinction between the
*base model*, which is our model of the problem we are trying to solve, and a*lookahead model*, widely used as one class of policy (especially in stochastic programming). The base model is often referred to as a "simulator" without recognizing that this is the problem we are trying to solve. The base model is comparable to the objective function in a deterministic optimization model. - When solving a deterministic optimization problem, we want to find the best
*decision*(often a real-valued vector). When solving a stochastic optimization problem, we typically want to find the best*function*(known as a policy). Notable exception is asymptotic stochastic search problems (min_x E F(x,W)), but practical stochastic search problems are solved in finite time (which requires searching over policies). - There are two core strategies for finding policies: policy search, and policies based on approximations of a lookahead model. Each of these can be further divided into two strategies, producing four fundamental classes of policies that have been used to solve a wide range of dynamic programs (only one of which uses value functions). Fields such as dynamic programming and stochastic programming are often associated with a particular class of policy.
- A multistage stochastic program (using scenario trees) is a lookahead model for designing a policy (called a lookahead policy) for solving dynamic programs. An optimal solution of a lookahead model is generally
*not*an optimal policy (even if it is hard to solve). - "Robust optimization" (for sequential problems) is a lookahead policy (using a min-max objective) to solve a dynamic program where the objective may be formulated using an expectation (which is what is implied if you simulate the policy and average the results), or the worst case (which is consistent with the robust concept) or any other risk measure.

If you do not have time to read tutorial articles or the tutorial presentation, the two slides below provide a thumbnail sketch of the central idea behind our modeling and algorithmic strategy. It is most important to understand what problem we are solving (our model), and what we are looking for.

The first slide below contrasts the standard canonical form for a multiperiod, deterministic linear program (on the left), and the approach we are advocating to be the canonical form for a sequential, stochastic optimization problem:

The key idea is that when working on sequential stochastic problems, you are searching for the best *function* (known as a policy), rather than the more familiar problem of finding the best real-valued scalar or vector.

This raises the obvious question - how do you search over a space of functions???

We have taken a practical path for solving this question. We have identified four fundamental (meta) classes of policies, called PFAs, CFAs, VFAs and lookaheads. These are summarized on the following slide:

We let "pi" be the variable that captures both the type of function, and any parameters that determine the behavior of the function. A tunable parameter could be a regression parameter, or variables that determine the planning horizon, number of stages and number of scenarios.

In the equations above, we use tildes on variables that describe the lookahead model to avoid confusing them with the base model. This is only an issue when we are using lookahead policies.

Note that these four classes of policies span all the standard modeling and algorithmic paradigms, including dynamic programming (including approximate/adaptive dynamic programming and reinforcement learning), stochastic programming, and optimal control (including model predictive control).

It is important to recognize that even for the same problem, each of the four classes of policies may work best, depending on the data. We demonstrate this in the context of a simple energy storage problem (click here for a paper describing this work). We created five problems, which we solved using each of the four classes of policies, plus a hybrid. In the graph below, we report the performance of each policy as a fraction of the posterior optimal (the best you could do with perfect information). Each policy works best on one of the five problems.

We prefer the approach universally used in deterministic optimization where we formulate the problem first, and *then* we design methods to find a solution (in the form of a policy). But this requires accepting that in sequential problems, we are not looking for *decisions* (as we do in deterministic models), but rather *functions* (or policies). Classical strategies in stochastic optimization (which are described using familiar labels such as dynamic programming, stochastic programming, robust optimization and optimal control) actually represent particular classes of policies. So, to solve a problem using one of these approaches means you are actually choosing the class of policy before you have even started modeling the problem.

The articles below were written as tutorials to summarize the modeling strategy. In an effort to build bridges, each is written with an eye to a different community.

A Unified Framework for Optimization under Uncertainty, November, 2016, (c) Informs. The presentation that accompanied this tutorial at the Informs annual meeting (Nov 13 2016) is now available. (Click here for PowerPoint version) (Click here for pdf - animations are missing).

- Bridges stochastic search, dynamic programming, stochastic programming, stochastic control, optimal stopping, as well as ranking and selection and the expanding multiarmed bandit community.
- Describes how online (cumulative reward) and offline (terminal reward) problems can be modeled in the same way.
- Establishes links between classical problems in statistics and stochastic optimization, helping to bridge these fields.
- Describes two fundamental strategies for designing good (possibly optimal) policies, each of which further divides into two substrategies, producing the four classes of policies that covers all the different approaches we have seen for solving sequential decision problems.
A good companion article is the "Clearing the Jungle" tutorial below.

**For the operations research community:**

Clearing the Jungle of Stochastic Optimization, Informs Tutorials in Operations Research: Bridging Data and Decisions, pp. 109-137, November, 2014, http://dx.doi.org/10.1287/educ.2014.0128 (c) Informs - See the discussions on state variables, the four classes of policies, and the discussion in section 5 on lookahead policies (esp. for people from stochastic programming).A powerpoint presentation on this topic can be downloaded here.

Bridging the Fields of Stochastic Optimization, Optima Newsletter, with a commentary by Andrjez Ruszczynski. - Optimal newsletter article (January, 2015)

**For the IEEE/controls community:**

The following is a two-part series written for an IEEE audience (the IEEE transactions series enforces a strict 8-page limit on papers):

Tutorial on Stochastic Optimization in Energy I: Models and Policies, IEEE Trans. on Power Systems, Vol. 31, No. 2, pp. 1459-1467, 2016. Summarizes the modeling framework and four classes of policies, contrasting the notational systems and canonical frameworks of different communities.

Tutorial on Stochastic Optimization in Energy II: An energy storage illustration, IEEE Trans. on Power Systems, Vol. 31, No. 2, pp. 1468-1475, 2016 Illustrates the process of modeling a stochastic, dynamic system using an energy storage application, and shows that each of the four classes of policies works best on a particular variant of the problem. A fifth problem shows that in some cases a hybrid policy is needed.

**For the AI/computer science community:**

Energy and Uncertainty: Models and Algorithms for Complex Energy Systems, AI Magazine, Fall issue, appeared Sept, 2014. A shorter, easier-to-read article covering many of the same points and written for a computer science audience, using an energy theme.

The goal of these papers is to initiate a discussion within our communities so that we can develop a common set of principles for modeling these problems and designing policies. Please feel free to email me with your thoughts to powell@princeton.edu.

Warren Powell

www.castlelab.princeton.edu

If you came directly here, click here to return to the CASTLE Labs website.