Modeling is the mathematical representation of a real problem. Good modeling requires a consistent, parsimonious notational style that is easy to read. Ultimately, the real test of a model involves calibration against real-world data, which means capturing the important dimensions of the application.

A few resources that may prove useful:

Recent tutorial articles:

A Unified Framework for Optimization under Uncertainty - Tutorial to be given at Informs annual meeting, Monday, Nov 14, 2016 in Nashville. This framework unifies the fields in the jungle of stochastic optimization. It also helps to read the earlier tutorial article:

Clearing the Jungle of Stochastic Optimization (tutorial given in 2014) - Discusses the core elements of a stochastic model, discusses state variables, reviews the four classes of policies, and discusses lookahead policies in depth.

A lecture on modeling - This is a PowerPoint presentation derived from a series of lectures I have been giving on modeling. It includes a series of slides that help identify a better definition of a state variable. See also Chapter 5 - Modeling.

A lecture on policies - Similar in style to the modeling lecture, this describes and illustrates the four fundamental classes of policies (click here for a more detailed tutorial on policies in our tutorial on the "Jungle of Stochastic Optimization.") The "Unified Framework" tutorial (above) contains a further refinement of these ideas.

ORF 411 - Operations and Information engineering - An undergraduate course we teach in our department at Princeton, with a major theme of modeling.

Major problem classes - An overview of major problem classes along the five fundamental dimensions of sequential decision problems.

Discussions - What is a policy? What is a state variable? How do we bridge communities? ... and other topics that seem relevant.

Additional readings ... tutorial style articles on modeling and stochastic optimization.

Additional advice on modeling - If you are faculty, staff, post-doc or grad student at Princeton University and you need help thinking through a sequential decision problem, from any field, please contact Warren Powell to set up a modeling session.


Problem classes

Below I am going to try to list dimensions along which the problem classes of computational stochastic optimization can be divided. If you can think of one that I have missed, let me know.


Static problems - Make a decision (alternatively, choose a deterministic parameter or fix a design), then observe a random outcome.

Two-stage - Make a decision, observe a random outcome, and then make one more decision (also known as the recourse). This is very close to static problems, but I feel that it deserves its own problem class. It has received considerable attention from the stochastic programming community, where decisions are vectors.

Sequential decision problems - Also known as multistage stochastic optimization, this is where you make a decision, see a random outcome, make another decision, see a random outcome, and so on (over a finite or infinite horizon).
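To make the static class concrete, here is a minimal sketch in Python of a hypothetical newsvendor-style problem (my illustration, not an example from the list above): choose an order quantity x, then observe a random demand W. The cost function, the price and cost figures, and the uniform demand distribution are all assumptions.

```python
import random

def cost(x, w, unit_cost=1.0, price=3.0):
    """Cost (negative profit) of ordering x and then observing demand w.

    The numbers are hypothetical and only serve the illustration.
    """
    return unit_cost * x - price * min(x, w)

def sample_average_cost(x, samples):
    """Estimate the expected cost of x by averaging over sampled outcomes."""
    return sum(cost(x, w) for w in samples) / len(samples)

rng = random.Random(0)
demand_samples = [rng.uniform(0, 100) for _ in range(2000)]

# Static problem: fix a single x before the random outcome W is revealed.
candidates = range(0, 101, 5)
best_x = min(candidates, key=lambda x: sample_average_cost(x, demand_samples))
```

A two-stage or sequential version would let later decisions depend on what has been observed, which is exactly what a static decision cannot do.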



Our lack of an accepted definition of a state variable is one of my pet peeves in the field of sequential decision problems. See my discussion of states for my views on a definition of the state of the system. For now, I will offer some variations of state variables:

A "dynamic program" is often equated with a physical state that determines the available actions. But there are learning problems where the state of the system consists purely of a belief state (see, for example, the multiarmed bandit problem - for more, see my work on optimal learning).

Side note: The computer science community makes a distinction between the state of a system, and state variables which might be the different components of a state. I don't think the operations research community makes this same distinction (and I have no idea about the controls community).



Communities such as computer science (reinforcement learning), control theory and operations research/math programming can be divided roughly in terms of the nature of the decisions each field addresses:

Discrete action spaces - Universal notation is "a" - Discrete action spaces arise in a vast range of applications, often involving the management of a single entity (robots, games, people, animals, ...). Problems are nonconvex and discrete.

Continuous controls - Common notation is "u" - Engineers are often flying planes, controlling machines or chemical processes or moving robots. Controls are typically low-dimensional (10-20 dimensions is a lot), where they exploit continuity but not convexity.

Vector-valued actions - Common notation is "x" - The operations research community solves a wide range of problems using high-dimensional vectors x, often with thousands (and sometimes millions) of dimensions. Convexity is a powerful and widely used property.


Exogenous (random) information

Exogenous information comes in many different flavors, including:


Transition functions

Known as system models, state models, plant models, models, transfer functions, or laws of motion, the transition function describes how the state of the system evolves from one time period to the next. These come in different flavors:


Objective functions

Objective functions can be classified along several dimensions:

Handling uncertainty - For deterministic problems, we just add up costs and contributions. When we introduce uncertainty, we have to decide how to compute our objective. Some choices include:

Additivity - The vast majority of research articles assume that costs/contributions are additive over time, but this is not always the case. In some cases additivity can be restored by designing an appropriate state variable. The use of risk measures can be one source of complication.

Continuity - Objective functions may be Lipschitz continuous, continuously differentiable, upper/lower semicontinuous, or CADLAG (google it). If you have none of the above, I hope you are using discrete action spaces where this does not matter.

Convexity - Convexity is a powerful property for vector-valued applications, but there are many problems that cannot assume convexity. The reinforcement learning community works almost exclusively with discrete action spaces, where convexity is not an issue. The engineering controls community widely assumes continuity without convexity (which is how they are always producing those great 3-D Matlab plots!).

Differentiability - We can compute derivatives for some problems (often in the form of stochastic gradients), but there are many problems where derivatives are not available and we have to use derivative-free algorithms.

Computational complexity - Some functions can be evaluated in a fraction of a second; in others, a single observation takes a year. This has a big impact on the design and testing of algorithms. A sample of different problem classes includes:




I will use this space to carry on a series of discussions about topics in stochastic optimization.

Observations - These are a series of observations about stochastic optimization, some of which run against conventional thinking (at least in some communities).

What is a policy? - It is a rule for making decisions. Here I summarize four basic classes of policies that can be used as a foundation for building hybrids.

What is a state? - One of my favorite topics. How did we get this far without agreeing on a definition?

Bridging communities - Sometimes it can be tricky seeing the parallels between communities. Here I discuss:


Some observations

I am going to use this section to make a few observations about dynamic programming.



What is a policy?

A policy is any mapping from state S to a feasible action a (or x, or u). For all the differences of opinion about dynamic programs, I have not found anyone who disagrees with this basic statement.

The dynamic programming community sometimes equates a policy with a lookup table ("when in discrete state s, take the discrete action a"). But a policy is any mapping from state to action, which means it could involve solving a linear program (what is the best assignment of resources to tasks right now).

While there is a wide range of policies, I have found that they can be decomposed into four fundamental classes:

  1. Policy function approximations (PFAs) - Analytical mappings from states to actions
  2. Robust cost function approximations (CFAs) - These are parametric modifications of cost functions (and/or constraints) designed to achieve robust behavior.
  3. Policies based on value function approximations (VFAs) - Generally known as approximate dynamic programming
  4. Lookahead policies - Policies based on optimizing an approximate lookahead model.

A myopic cost function approximation (CFA) might involve solving a linear program to assign resources (such as truck drivers) to tasks (such as loads). We might modify the function to include a bonus for covering late loads (this is the cost function approximation).
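As a minimal sketch of this idea (in Python, with made-up data), the policy below assigns drivers to loads by minimizing cost, where the cost of covering a late load is reduced by a tunable bonus `theta`. The cost matrix, the `late` flags, the function name, and the use of brute-force enumeration in place of a linear program are all assumptions for illustration.

```python
from itertools import permutations

def cfa_assignment_policy(cost, late, theta):
    """Assign each driver to a distinct load, minimizing a modified cost.

    cost[i][j] : cost of driver i covering load j (hypothetical data)
    late[j]    : True if load j is already late
    theta      : bonus subtracted from the cost of covering a late load
    Assumes at least as many loads as drivers. Brute-force enumeration
    stands in for a linear program on this tiny instance.
    Returns a list whose i-th entry is the load given to driver i.
    """
    n_drivers, n_loads = len(cost), len(late)

    def modified_cost(assignment):
        return sum(
            cost[i][j] - (theta if late[j] else 0.0)
            for i, j in enumerate(assignment)
        )

    return list(min(permutations(range(n_loads), n_drivers), key=modified_cost))

cost = [[1.0, 5.0, 4.0],
        [6.0, 2.0, 4.0]]
late = [False, False, True]
plain = cfa_assignment_policy(cost, late, theta=0.0)       # purely myopic
with_bonus = cfa_assignment_policy(cost, late, theta=5.0)  # favors the late load
```

With `theta=0.0` the late load is left uncovered because it is the most expensive one; with a large enough bonus it is covered. The parameter `theta` is what would be tuned to get good behavior over time.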

Lookahead policies are widely used under different names such as tree search, rolling/receding horizon procedures, model predictive control (popular in engineering), and stochastic programming (which is a rolling horizon procedure which explicitly accounts for uncertainty). Lookahead policies solve approximations of the real problem. This approximation typically involves optimizing over a shorter horizon (instead of optimizing the energy storage device over an entire year, we may feel it is enough to optimize over 24 hours). In addition, it is common to approximate the uncertainty in some way. The simplest approximation is to assume the future is deterministic, optimize this much easier problem, implement the solution, and then observe a random transition (in the real model). The stochastic programming community uses scenario trees, which are sampled versions of the true stochastic process. The same idea is used in the reinforcement learning community under the name Monte Carlo tree search.
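The simplest case, a deterministic lookahead over a short horizon, can be sketched in Python as follows. This is only an illustration under stated assumptions: a storage device holding whole units of energy, three possible actions per period, a point forecast of prices, and brute-force enumeration of plans instead of a proper optimization solver. The function name and data are hypothetical.

```python
from itertools import product

def deterministic_lookahead(storage, forecast_prices, capacity=3):
    """Return the first action of the best plan against a point forecast.

    storage         : units of energy currently stored
    forecast_prices : forecast prices over the lookahead horizon
    Actions are +1 (buy one unit), 0 (hold), -1 (sell one unit).
    """
    if not forecast_prices:
        return 0  # nothing to plan over, so hold
    best_profit, best_first = float("-inf"), 0
    for plan in product((-1, 0, 1), repeat=len(forecast_prices)):
        level, profit, feasible = storage, 0.0, True
        for action, price in zip(plan, forecast_prices):
            level += action
            if not 0 <= level <= capacity:
                feasible = False
                break
            profit -= action * price  # buying costs money, selling earns it
        if feasible and profit > best_profit:
            best_profit, best_first = profit, plan[0]
    return best_first
```

In a rolling horizon procedure you would call this every period with an updated forecast, implement only the returned action, and then observe the actual price before planning again.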

Policy function approximations are analytic functions that return an action given a state, without solving an optimization problem.
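A small example, in Python, is an order-up-to rule for an inventory problem. The inventory setting and the two thresholds are hypothetical; the point is only that the action comes from a formula, with tunable parameters, rather than from an embedded optimization.

```python
def order_up_to_policy(inventory, theta_low, theta_high):
    """If inventory is below theta_low, order enough to reach theta_high."""
    if inventory < theta_low:
        return theta_high - inventory
    return 0
```

For instance, with thresholds 5 and 20, an inventory of 3 triggers an order of 17, while an inventory of 8 triggers no order.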

Policies based on value function approximations look like


Here, we have replaced the value function (actually, the expected value function) with a linear regression. This is the policy most commonly associated with "dynamic programming" and Bellman's equation.
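In one common notation (assumed here, and other conventions exist), with C(S_t, x) the one-period cost, \phi_f a set of basis functions (features) and \theta_f the regression coefficients, a policy of this type can be written as

```latex
X^{VFA}(S_t \mid \theta) = \arg\min_{x} \Big( C(S_t, x) + \sum_{f \in \mathcal{F}} \theta_f \, \phi_f(S_t, x) \Big)
```

The sum is the linear regression that stands in for the expected value of the downstream state. For a problem that maximizes contributions, replace the arg min with an arg max.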

Policies 1, 3 and 4 all use some form of functional approximation. We can approximate functions in three ways: lookup table, parametric and nonparametric. We can also use a mixture known as semi-parametrics.

In addition, it is possible to create a wide range of policies by using mixtures of the four fundamental classes. A popular strategy, for example, is to use tree search (or a rolling horizon procedure) with value function approximations as a terminal reward. Another powerful strategy is to use policy function approximations, possibly as low dimensional patterns, combined with myopic or lookahead policies (possibly as linear programs).

The concept of a policy is particularly powerful whenever we are representing decisions in the future, with information that is not known now. If we are at time t, a decision at time t' > t is a random variable. You can always represent this by replacing the decision with the policy that depends on the state at t' (which is also a random variable).

Additional readings:

Clearing the Jungle of Stochastic Optimization - This will be published in the Informs TutORials series in 2014.

A presentation of the four fundamental classes of policies is given in Chapter 6 of my ADP book.

For an illustration of the four classes of policies in the context of problems in transportation and logistics, see

W. B. Powell, H. Simao, B. Bouzaiene-Ayari, “Approximate Dynamic Programming in Transportation and Logistics: A Unified Framework,” European J. on Transportation and Logistics, Vol. 1, No. 3, pp. 237-284 (2012). DOI 10.1007/s13676-012-0015-8.


What is a state?

It seems surprising that the dynamic programming community has not, in the past 60 years, adopted a standard definition of a state variable. This topic was discussed at an NSF workshop called a "Conversation between AI and OR on Stochastic Optimization." At this workshop, we posed the question to the attendees, which produced 30 responses that can be viewed here. Needless to say, there was quite a bit of variety in the responses.

Two ideas seemed to stand out from the responses. The first was that a state variable should be a sufficient statistic, which means it contains all the necessary information. The second was that it should be "efficient," "parsimonious," or "minimal."

Two objections were raised about the use of the term "sufficient statistic." The first is that it can be seen as replacing one piece of jargon ("state") with another, which then needs its own definition. The second is that the term "sufficient statistic" is widely used in statistics, where it has its own meaning.

There also seems to be a feeling that a state should only contain necessary information. Most models do this implicitly, but not all. The stochastic programming community routinely uses the entire history, which is certainly a sufficient statistic, but not necessary.

With apologies, my favorite definition is from chapter 5 of my ADP book (available for download). It reads:

A state is the minimally dimensioned function of history that is necessary and sufficient to compute the decision function, the transition function, and the cost/reward function.

This definition is consistent with the concept of a sufficient statistic, but requires that the information be necessary as well (and therefore minimally dimensioned, which is redundant). It also provides clear guidelines for what is necessary - it is the data needed to compute the decision function (for example, the set of feasible actions might depend on the state), the cost/reward function, and the transition function. We maintain that if information is needed only to compute the transition function, then it must eventually be required to compute the contribution function or decision function at a later point in time.

Examples of information that need to be in the state variable:


In this case, the state variable has to include p_t, p_{t-1} and p_{t-2}, because they are needed to compute the transition function.
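As a minimal sketch of a model with this property, suppose (hypothetically) that the next price is a weighted combination of the last three prices plus noise. The weights, the noise distribution and the function name below are illustrative assumptions, not taken from the text.

```python
import random

def price_transition(state, noise, weights=(0.7, 0.2, 0.1)):
    """Advance the state (p_t, p_{t-1}, p_{t-2}) by one period.

    The next price depends on all three lagged prices, so all three
    have to be carried in the state variable.
    """
    p_t, p_tm1, p_tm2 = state
    p_next = weights[0] * p_t + weights[1] * p_tm1 + weights[2] * p_tm2 + noise
    return (p_next, p_t, p_tm1)

rng = random.Random(1)
state = (30.0, 29.0, 31.0)
for _ in range(3):
    state = price_transition(state, rng.gauss(0.0, 1.0))
```

If the weights on the two older prices were zero, the transition function would only need p_t, and the state could be reduced accordingly.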



Bridging communities

A real opportunity for a community for computational stochastic optimization is recognizing when the contributions of one community can be used to help solve the problems of another. Below I provide bridges between stochastic search and dynamic programming, followed by a step-by-step roadmap from classical dynamic programming to stochastic programming. This is based on a recent newsletter article that appeared in the Informs Computing Society newsletter.


From stochastic search to dynamic programming

Stochastic search is itself an umbrella term that encompasses derivative-based search (stochastic gradient methods, stochastic approximation methods), and derivative-free search (which includes a lot of the work in the simulation-optimization community, and the black-box optimization community). Stochastic search is typically written in the generic form


where x is a deterministic parameter and W is a random variable.
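A standard way to write this generic form is shown below. The symbols F (the function being evaluated) and \mathcal{X} (the feasible set) are notation assumed here, and some communities maximize rather than minimize.

```latex
\min_{x \in \mathcal{X}} \; \mathbb{E}\, F(x, W)
```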

Stochastic search is often viewed as being distinct from dynamic programming because we are choosing a single decision that works well over different outcomes, while sequential problems allow you to adapt decisions over time as new information becomes available. However, a different way to think about sequential decision problems is that you are choosing a fixed policy that determines decisions over time. This would be written



where the state variable evolves according to


where W is a random variable and x is determined by the policy (the "decision function"). When we search over the policies, we mean that we are searching over classes of policies (this might be a lookahead policy, or one that depends on value function approximations), as well as any tunable parameters for the class of policy being considered. For example, we might represent our policy using an approximate value function, using


Here, we have written the dependence of the policy on the regression parameters theta. We might find the best set of parameters theta by solving


We easily see that this is identical to our stochastic search problem. To solve this, we might draw on the tools of infinitesimal perturbation analysis to estimate derivatives. If we cannot find a derivative, we could turn to various derivative-free techniques based on the field of ranking and selection, simulation-optimization and optimal learning.
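As a rough sketch of this equivalence (in Python), the code below treats a simulated policy as the function being optimized in a stochastic search: each simulation returns the total cost of following a policy with parameter theta, and a simple grid search over theta plays the role of a derivative-free method. The inventory setting, the order-up-to form of the policy, the cost coefficients and the demand distribution are all hypothetical stand-ins. A policy based on an approximate value function would replace the one-line rule used here.

```python
import random

def simulate_policy(theta, seed, horizon=50):
    """Total cost of running an order-up-to policy with parameter theta.

    The state is the inventory level; the exogenous information is a
    random demand. All numbers are hypothetical.
    """
    rng = random.Random(seed)
    inventory, total_cost = 0.0, 0.0
    for _ in range(horizon):
        order = max(0.0, theta - inventory)   # decision from the policy
        demand = rng.uniform(0.0, 10.0)       # exogenous information
        available = inventory + order
        sales = min(available, demand)
        # ordering cost + holding cost on leftovers + penalty on lost sales
        total_cost += 1.0 * order + 0.5 * (available - sales) + 4.0 * (demand - sales)
        inventory = available - sales         # transition to the next state
    return total_cost

def estimate_objective(theta, n_samples=200):
    """Sample-average estimate of the expected total cost under theta.

    The same seeds are reused for every theta (common random numbers).
    """
    return sum(simulate_policy(theta, seed) for seed in range(n_samples)) / n_samples

# Derivative-free search over the tunable parameter: a simple grid.
grid = [float(k) for k in range(0, 13)]
best_theta = min(grid, key=estimate_objective)
```

The grid search is the crudest possible choice; the ranking and selection or optimal learning methods mentioned above would decide more carefully which values of theta to simulate.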


From dynamic programming to stochastic programming

This is harder, and I cannot do the equations on a website. What is important to recognize, even without any math, is that stochastic programming solves an approximate model of the future to determine an action that is implemented now. Once this is done, you step forward in time using a real model (this could be a mathematical model, or an observation from a real physical system), after which you have to do it all over again. This is how you recognize a lookahead policy. The approximate model is called the lookahead model.

The stochastic programming community uses two fundamental approaches for solving the lookahead model:

A detailed, step-by-step derivation, which starts with Bellman's optimality equation and ends with the standard methods used in stochastic programming, is given in a 6+ page article in the Informs Computing Society newsletter (Fall, 2012). These equations cannot be replicated in a website.

A much longer article, which gives the same presentation and then uses applications in transportation and logistics to illustrate the different classes of policies, is

W. B. Powell, H. Simao, B. Bouzaiene-Ayari, “Approximate Dynamic Programming in Transportation and Logistics: A Unified Framework,” European J. on Transportation and Logistics, Vol. 1, No. 3, pp. 237-284 (2012). DOI 10.1007/s13676-012-0015-8.



Additional readings:

Alan J. King & Stein W. Wallace (2012). Modeling with stochastic programming. New York: Springer Verlag - A great discussion of modeling in the context of stochastic programming.

Powell, W.B., J. Shapiro and H.P. Simao, “A Representational Paradigm for Dynamic Resource Transformation Problems,” Annals of Operations Research on Modeling (C. Coullard, R. Fourer, and J. H. Owen, eds), Vol. 104, pp. 231-279, 2001. This paper, although a bit dated, laid the foundation for our modeling framework for complex resource allocation problems.

A Unified Framework for Stochastic and Dynamic Programming, Informs Computing Society newsletter, November, 2012 - This six page article provides a brief overview of how to model stochastic, dynamic systems. There is a section which gives a step by step bridge from classical dynamic programming to stochastic programming.

W. B. Powell, H. Simao, B. Bouzaiene-Ayari, “Approximate Dynamic Programming in Transportation and Logistics: A Unified Framework,” European J. on Transportation and Logistics, Vol. 1, No. 3, pp. 237-284 (2012). DOI 10.1007/s13676-012-0015-8. - This is a much longer version of the ICS newsletter article above, illustrated with numerous examples drawn from transportation and logistics.

Yu-Li Chou, Stephen Pollock, H. Edwin Romeijn, Robert L. Smith, "A Formalism for Dynamic Programming," Working paper, Department of Industrial and Information Engineering, University of Michigan, October, 2001 - A thoughtful perspective on dynamic programming by some of the top researchers in the field. Note the implicit definition of a state variable on page 2.

Jack Kleijnen - Simulation-Optimization via Kriging and Bootstrapping: A Survey, November 2012 - A nice survey article on a stochastic search method known as kriging.

For some recent papers that illustrate the modeling of complex sequential problems in different settings, see

An Approximate Dynamic Programming Algorithm for Large-Scale Fleet Management: A Case Application

From Single Commodity to Multiattribute Models for Locomotive Optimization: A Comparison of Integer Programming and Approximate Dynamic Programming

SMART: A Stochastic Multiscale Model for the Analysis of Energy Resources, Technology and Policy