Stochastic Optimization

The following series of three papers provides an introduction to how to model stochastic optimization problems. The first is a general article aimed at the operations research community. This is followed by a two-part tutorial series aimed at the IEEE/controls community.

Warren B. Powell. “Clearing the Jungle of Stochastic Optimization,” Informs Tutorials in Operations Research: Bridging Data and Decisions, pp. 109-137, November, 2014, (c) Informs

This paper describes a general modeling framework for stochastic optimization, discusses how to model state variables, and provides an overview of four fundamental (meta) classes of policies. Section 5 provides an in-depth discussion of lookahead policies, which provides a bridge to (multistage) stochastic programming.

(Click here to download paper)

W. B. Powell and S. Meisel, “Tutorial on Stochastic Optimization in Energy Part I: Models and Policies,” IEEE Trans. on Power Systems, Vol. 31, No. 2, pp. 1459-1467, 2016.

Reviews the different canonical models used by different communities, and introduces our own canonical modeling framework and introduces the four classes of policies.

(Click here to download paper)

W. B. Powell, Stephan Meisel, “Tutorial on Stochastic Optimization in Energy II: An energy storage illustration”, IEEE Trans. on Power Systems, Vol. 31, No. 2, pp. 1468-1475, 2016

Illustrates the process of modeling a stochastic, dynamic system using an energy storage application, and shows that each of the four classes of policies works best on a particular variant of the problem. A fifth problem shows that in some cases a hybrid policy is needed.

(Click here to download paper)


Lauren Hannah, W.B. Powell, D. Dunson, “Semi-Convex Regression for Metamodeling-Based Optimization,” SIAM J. on Optimization, Vol. 24, No. 2, pp. 573-597 (2014).

Stochastic search involves finding a set of controllable parameters that minimizes an unknown objective function using a set of noisy observations. We consider the case when the unknown function is convex and a metamodel is used as a surrogate objective function. Often the data are non-i.i.d. and include a observable state variable, such as applicant information in a loan rate decision problem. State information is difficult to incorporate into convex models. We propose a new semi-convex regression method that is used to produce a convex metamodel in the presence of a state variable. We show consistency for this method. We demonstrate its effectiveness for metamodeling on a set of synthetic inventory management problems and a large, real-life auto loan dataset.

(Click here to download paper)

J. Nascimento, W. B. Powell, “An Optimal Approximate Dynamic Programming Algorithm for Concave, Scalar Storage Problems with Vector-Valued Controls,” IEEE Transactions on Automatic Control, Vol. 58, No. 12, pp. 2995-3010. (2013).

This paper proves convergence for an ADP algorithm using approximate value iteration (TD(0)), for problems that feature vector-valued decisions (e.g. allocating energy over a grid), linked by a scalar storage system, such as a water reservoir. The problem arises in settings where resources are distributed from a central storage facility. The algorithm is well suited to continuous problems which requires that the function that captures the value of future inventory be finely discretized, since the algorithm adaptively generates break points for a piecewise linear approximation. The strategy does not require exploration, which is common in reinforcement learning.

(Click here to download paper)

D.H. Lee and W. B. Powell, “An Intelligent Battery Controller Using Bias-Corrected Q-learning”AAAI, Toronto, July, 2012.

This paper uses Q-learning to optimize a battery storage problem in the presence of stochastic prices. We found that for this application (with significant randomness in the cost function), that statistical errors in the Q-factors produces a large bias in the max operator. This paper suggests a correction factor based on an approximation derived from a model with an infinite number of actions. Without the bias correction, the algorithm still shows significant deviation from optimal after millions of iterations. The bias correction term significantly improves the rate of convergence.

(Click here to download paper)

Ilya Ryzhov, Boris Defourny, Warren Powell, “Ranking and Selection Meets Robust Optimization,” Winter Simulation Conference, December, 2012.

The objective of ranking and selection is to efficiently allocate an information budget among a set of design alternatives with unknown values in order to maximize the decision-maker’s chances of discovering the best alternative. The field of robust optimization, however, considers risk-averse decision makers who may accept a suboptimal alternative in order to minimize the risk of a worst-case outcome. We bring these two fields together by defining a Bayesian ranking and selection problem with a robust implementation decision. We propose a new simulation allocation procedure that is risk-neutral with respect to simulation outcomes, but risk-averse with respect to the implementation decision. We discuss the properties of the procedure and present numerical examples illustrating the difference between the risk-averse problem and the more typical risk-neutral problem from the literature.

(Click here to download paper)

Ryzhov, I. O., Awais Tariq, W. B. Powell, “May the Best Man Win: Simulation Optimization for Match-Making in E-Sports,” Proceedings of the Winter Simulation Conference, Phoenix, Arizona, December 11-14.

(Click here to download paper)

L. Hannah, W. B. Powell, D. Blei, “Nonparametric Density Estimation for Stochastic Optimization with an Observable State Variable,” Neuro-Information Processing Society, December, 2010.

Consider the newsvendor problem, where we have to choose a quantity and then observe a demand. Now assume that we are first given information about the state of the world – perhaps it is rainy or sunny. This observable state changes our belief about the demand, which changes our optimal solution. We have to design an online, stochastic optimization algorithm that solves the problem without actually knowning the distribution of demand, even when we know the state. If there were only two states of the world, the problem would be easy. Now assume the state of the world is a multidimensional vector. We use Dirichlet process mixtures to prove convergence to optimality of an online learning algorithm which can be used for complex state-of-the-world variables. We demonstrate the method in the context of making commitments for wind energy.

(click here to download paper) (supplement)

L. Hannah and W. B. Powell, “Evolutionary Policy Iteration Under a Sampling Regime for Stochastic Combinatorial Optimization,” IEEE Transactions on Automatic Control, Vol. 55, No. 5, pp. 1254-1257, May, 2010.

This paper provides a convergence proof for a type of evolutionary policy iteration algorithm which depends only on Monte Carlo sampling of the value of a policy. The algorithm was used to prove convergence for an algorithm in the literature which required computing expectations exactly.

(click here to download paper)

P. Frazier, W. B. Powell, H. P. Simao, “Simulation Model Calibration with Correlated Knowledge-Gradients,” Winter Simulation Conference, December, 2009.

A common challenge in the calibration of simulation model is that we have to tune several continuous parameters. This is our first application of the knowledge gradient algorithm with correlated beliefs to the problem of parameter tuning for simulation models. A single run of the model (which uses adaptive learning from approximate dynamic programming) requires more than a day, so the paper also introduces methods to product results without a full run. A Bayesian model is set up to capture the uncertainty in our beliefs about the convergence of the model. The method is illustrated in the tuning of two continuous parameters, which required approximately six runs of the model.

(click here to download paper)

S. Dayanik, W. Powell, and K. Yamazaki (2007). “Index policies for discounted bandit problems with availability constraints,” Journal of Applied Probability (to appear).

One of the most famous problems in information collection is the multiarmed bandit problem, where make a choice (out of a discrete set of choices), observe a reward, and use this observation to update estimates of the future value of rewards. This problem can be solved by choosing the option with the highest index (known as the Gittins index). In this paper, we present an index problem for the case where not all the choices are available each time. As a result, it is sometimes important to make an observation just because the observation is available to be made.

(click here to download paper)

Papadaki, K. and W. B. Powell, “Monotonicity in Multidimensional Markov Decision Processes for Batch Service Problems,” Operations Research Letters, Vol. 35, pp. 267-272 (2007).

Structural properties of stochastic dynamic programs are essential to understanding the nature of the solutions and in
deriving appropriate approximation techniques.We concentrate on a class of multidimensional Markov decision processes and
derive sufficient conditions for the monotonicity of the value functions. We illustrate our result in the case of the multiproduct
batch dispatch (MBD) problem.

(Click here to download paper)

Topaloglu, H. and W.B. Powell, “Dynamic Programming Approximations for Stochastic, Time-Staged Integer Multicommodity Flow Problems,” Informs Journal on Computing, Vol. 18, No. 1, pp. 31-42 (2006). (c) Informs.

This paper applies the technique of separable, piecewise linear approximations to multicommodity flow problems. This technique worked very well for single commodity problems, but it was not at all obvious that it would work well for multicommodity problems, since there are more substitution opportunities. The experimental comparisons against multistage nested Benders (which is very slow) and more classical rolling horizon procedures suggests that it works very well indeed. This paper also provides a more rigorous treatment of what is known as the “multiperiod travel time” problem, and provides a formal development of a procedure for accelerating convergence.

(Click here to download paper)


Powell, W.B., A. Ruszczynski and H. Topaloglu, “Learning Algorithms for Separable Approximations of Stochastic Optimization Problems,” Mathematics of Operations Research, Vol 29, No. 4, pp. 814-836 (2004).(c) Informs

We have been doing a lot of work on the adaptive estimation of concave functions. This paper represents a major plateau. It describes a new algorithm dubbed the Separable Projective Approximation Routine (SPAR) and includes 1) a proof that the algorithm converges when we sample all intervals infinitely often, 2) a proof that the algorithm produces an optimal solution when we only sample the optimal solution of our approximation at each iteration, when applied to separable problems, 3) a bound when the algorithm is applied to nonseparable problems such as two-stage stochastic programs with network resource, and 4) computational comparisons against deterministic approximations and variations of Benders decomposition (which is provably optimal). The experiments show that the SPAR algorithm, even when applied to nonseparable approximations, converges much more quickly than Benders decomposition.

(Click here to download paper)

Spivey, M. and W.B. Powell, “Some Fixed Point Results for the Dynamic Assignment Problem,” Annals of Operations Research, Vol. 124, (G. Mitra, ed.), Kluwer Academic Publishers, pp. 15-33 (2003).

In the course of conducting research on the use of linear value function approximations for the dynamic assignment problem, we ran across a class of problems where the right linear value function would produce the optimal solution. This paper summarizes this result and provides a proof.

(Click here to download paper)

Topaloglu, H. and W.B. Powell, “An Algorithm for Approximating Piecewise Linear Concave Functions from Sample Gradients,” Operations Research Letters, Vol. 31, No. 1, pp. 66-76 (2003).

We often find ourselves needing to approximate the marginal value of an additional resource, where we would like to estimate a piecewise linear concave function. But we have to do this with sample gradient information, which is random. And we still want the function to be concave. This paper proves convergence of an algorithm which maintains concavity by “leveling out” parts of the function that violate concavity.

(Click here to download paper)

Chen, Z.-L. and W.B. Powell, “A Convergent Cutting Plane and Partial Sampling Algorithm for Multistage Linear Programs with Recourse,” Journal of Optimization Theory and Applications, Vol. 103, No. 3, pp. 497-524 (1999). (c) Plenum Publishing.

This paper provides a proof of convergence for a class of nested Bender’s cutting-plane algorithm. The basic model allows only right-hand side randomness, and requires independence between stages. The algorithm is computationally tractable for problems that are much larger than any prior algorithm for which a proof of convergence has been produced. It requires a finite sample space, but it may have thousands of realizations per stage, and the technique can handle dozens (even hundreds) of stages. Numerical performance will depend on the specific problem class.

(Click here to download paper)

R. K.-L. Cheung and W.B. Powell, “SHAPE – A Stochastic, Hybrid Approximation Procedure for Two-Stage Stochastic Programs,” Operations Research, Vol. 48, No. 1, pp. 73-79 (2000) (c) Informs.

This paper offers a provably convergent algorithm for two-stage stochastic programs. The method uses a differentiable (strictly convex) auxiliary function, and then “tilts” it using stochastic gradient information. The algorithm is very easy to implement.

(Click here to download paper)

Godfrey, G. and W.B. Powell, “An Adaptive, Distribution-Free Algorithm for the Newsvendor Problem with Censored Demands, with Application to Inventory and Distribution Problems,” Management Science, Vol. 47, No. 8, pp. 1101-1112, (2001). (c) Informs.

This paper proposes a novel approximation procedure for stochastic resource allocation problems. The technique works with a series of stochastic gradients which are then used to maintain a concave estimate of the value function. When applied to problems with known optimal solutions, the method provides results within a fraction of a percent of optimal. One byproduct is a fast algorithm for the newsvendor problem that does not require knowledge of the underlying demand distribution. The technique is also applied to separable distribution problems.

(Click here to download paper)


Godfrey, G. and W.B. Powell, “An Adaptive Dynamic Programming Algorithm for Dynamic Fleet Management, I: Single Period Travel Times,” Transportation Science, Vol. 36, No. 1, pp. 21-39 (2002). (c) Informs

Godfrey, G. and W.B. Powell, “An Adaptive Dynamic Programming Algorithm for Dynamic Fleet Management, II: Multiperiod Travel Times,” Transportation Science, Vol. 36, No. 1, pp. 40-54 (2002). (c) Informs

This paper adapts the CAVE algorithm to stochastic multistage problems. The paper demonstrates both rapid convergence of the algorithm as well as very high quality solutions. We found that the use of nonlinear approximations was complicated by the presence of multiperiod travel times (a problem that does not arise when we use linear approximations).

(Click here to download paper I)

(Click here to download paper II)


Cheung, R.K.-M. and W.B. Powell, “An Algorithm for Multistage Dynamic Networks with Random Arc Capacities, with an Application to Dynamic Fleet Management,” Operations Research, Vol. 44, No. 6, pp. 951-963. (1996). (c) Informs.

This paper applies the work in the paper immediately below to multistage problems. The motivating problem is dynamic fleet management with stochastic demands. This technique produces an approximate nonlinear function for the value of resources in a particular state. These nonlinear functions are then used in the context of a rolling horizon procedure. Numerical experiments indicate that the method outperforms rolling horizon procedures with deterministic demand forecasts. Note, however, that the nonlinear approximations are not updated iteratively.

Cheung, R.K.-M. and W.B. Powell, “Network Recourse Decomposition Method for Dynamic Networks with Random Arc Capacities,” Networks, Vol. 24, No. 7, pp. 369-384 (1994).

We consider a two-stage stochastic transshipment network. Our goal is to develop nonlinear approximations for the value of flow into a node at the beginning of the second stage. We decompose the second stage (which is assumed to be a dynamic transshipment network) into a series of trees, which allows us to use the tree recourse algorithm (described in the paper below) to find the exact value function. Because the second stage network is a transshipment network, we use a Lagrangian approach to decompose the network into trees. An iterative algorithm then updates the multipliers.

(click here to download paper)

Cheung, R.K.-M. and W.B. Powell, “Stochastic Programs over Trees with Random Arc Capacities,” Networks, Vol. 24, pp. 161-175 (1994).

This paper considers the problem of a tree with random arc capacities. If S units of flow enter the root node, where they are to be rounded through n stages before they depart through the leavese of the tree, then we are interested in finding V(S), which is the expected value of S units of flow. We propose an efficient, exact algorithm for finding V(S), which works for both single stage (all arc capacities are known at the same time) or n stages (the arc capacities are learned only after a stage is progressed).


Powell, W.B. and L. Frantzeskakis, “Restricted Recourse Strategies for Dynamic Networks with Random Arc Capacities,” Transportation Science , Vol. 64, pp. 261-287, 1996. (c) Informs.

General stochastic networks are quite complex. Not surprisingly, it is easier to analyze problems with special structure. In this paper, we explore specialized recourse strategies that arise in stochastic networks. We cover simple recourse (where the action is implemented regardless of the outcome, meaning that the state of the system in the future is deterministic), nodel recourse (where we assume that a decision is made at a node regarding which outbound arc should be chosen), and tree recourse (where we make the decision of how to route ourselves over a tree). Variations are also considered.

Frantzeskakis, L. and W.B. Powell, “Restricted Recourse Strategies for Bounding the Expected Network Recourse Function” Annals of Operations Research on Stochastic Programming, (J. Higle and S. Sen, eds), Vol. 64, pp. 261-287, 1996.

We investigate bounds for stochastic networks drawing on the principles of restricted recourse strategies, and concepts drawn from the work of Stein Wallace. Not for the faint of heart. As with a lot of the work on bounds for stochastic programs, this is a difficult intellectual exercise with a lot of input for very little output.

(click here to download paper)

See also:

Frantzeskakis, L. and W.B. Powell, “Bounding Procedures for Dynamic Networks with Random Link Capacities, with Application to Stochastic Programming,” Networks , Vol. 23, pp. 575-595 (1993).

Here we actually produce a computationally tractable bound for multistage dynamic networks with random arc capacities. We use our ability to successively solve to optimality network recourse functions with linear value functions. If we choose the right constant, we can produce a bound. But, don’t get your hopes up; the bounds are not very tight.

(click here to download paper)