ORF 544

Graduate course on

Stochastic Optimization

Fall, 2016
Meeting: Fridays, 1:30-4:30
Sherrerd 101

Professor Warren Powell

Department of Operations Research and Financial Engineering
Princeton University

If you are interested in taking/sitting in on this course, please click here to add your name to the list. The course is open to graduate students at Princeton (all departments) and Rutgers (you can apply for transfer credit).

The theme of the course is "Computational Stochastic Optimization," which is to say that we will be teaching stochastic optimization with a focus on proper modeling, and the design and analysis of practical algorithms. The course will present a unified framework for stochastic optimization that cuts across the many communities that contribute to the general problem of the design and control of systems under uncertainty. This modeling framework has been tested in problem domains spanning transportation and logistics, energy, health, finance, and even the laboratory sciences.

In contrast with classical courses that provide significant depth in the analysis of methods that only apply to a narrow problem class, we will develop a general modeling framework that spans the wide range of stochastic optimization problems typically covered by over a dozen different books, each identified with a different field. These communities have been known in the past under names such as dynamic programming, stochastic programming, stochastic control, stochastic search, simulation optimization, ranking and selection, and multiarmed bandit problems. This is why I like to call this the "jungle of stochastic optimization" (click here for more information on the "jungle").

Audience: The course is appropriate for students in operations research, economics, computer science, applied math, and any field of engineering (e.g. for students interested in engineering controls). Students must have some background in probability and statistics, and the ability to code in some environment (Matlab is fine). The emphasis will be on models and algorithms, and I will try to draw applications from the students to supplement my own library of problem settings.

Readings: The course will be taught from a new book, Optimization under Uncertainty: A Unified Framework, that is being written. The book will introduce students to different notational systems, and will cover problems, modeling frameworks and algorithmic strategies from the many books identified with these fields.


Take home final exam:

This is due Friday, Jan 20 at 4pm. Turn in to Tara Zigler.

Click here for take home final (updated Jan 18) - The final is designed to take you on a tour through the book

Click here for the current version of the entire book (***updated Jan 18 - Chapter 20 on lookahead models is new).

Last class Monday, Jan 16, 1:30pm, Sherrerd 125.

Click here for pdf of all the powerpoint slides I used in the course (60 meg, almost 1200 slides).


Readings: Individual chapters of the textbook, Optimization under Uncertainty, will be made available as they are ready:

OUU: Table of contents (revised Sept 29)

Part I: Introduction

OUU: Chapter 1 - Overview of the unified framework

OUU: Chapter 2 - Canonical models (revised Sept 29)

OUU: Chapter 3 - Approximation strategies (revised Sept 24)

Part II: Learning problems

OUU: Chapter 4 - Introduction to stochastic optimization (revised Oct 7)

OUU: Chapter 5 - Derivative-based stochastic search (revised Oct 7)

OUU: Chapter 6 - Adaptive estimation and stepsize policies (revised Oct 7)

OUU: Chapter 7 - Derivative-free stochastic search

Part III: General state-dependent functions

OUU: Chapter 8 - State-dependent problems (revised Oct 18)

OUU: Chapter 9 - Modeling

OUU: Chapter 10 - Modeling Uncertainty

OUU: Chapter 11 - Policies

OUU: Chapter 12 - Policy function approximations and policy search

Raymond Perkins lecture on Cost Function Approximations

OUU: Chapter 14 - Discrete Markov decision processes

OUU: Chapter 17 - Forward approximate dynamic programming I: The value of a policy

OUU: Chapter 18 - Forward approximate dynamic programming II: Policy optimization


Course themes:

We will illustrate ideas through a combination of standard models and real applications. Real applications will be drawn based on the interests of the class.



Course outline:

Week 1 - Overview and canonical problems

Week 2 - Approximation strategies

Week 3 - The quintessential stochastic optimization problem and derivative-based stochastic optimization

Week 4 - Derivative-free stochastic optimization

Week 5 - Physical state applications and modeling

Week 6 - Modeling uncertainty and designing policies

Week 7 - Policy search

Week 8 - Discrete Markov decision processes

Week 9 - Structured dynamic programming and backward ADP

Week 10 - Forward approximate dynamic programming

Week 11 - Dec 8-Dec 13 - Direct lookahead policies



Week 1 - Sept 23

Overview of computational stochastic optimization



Readings (* indicates required readings)

*Optimization under Uncertainty: A Unified Framework, Chapter 1

*Optimization under Uncertainty: A Unified Framework, Chapter 2


Warren B. Powell. “Clearing the Jungle of Stochastic Optimization,” Informs Tutorials in Operations Research: Bridging Data and Decisions, pp. 109-137, November, 2014, http://dx.doi.org/10.1287/educ.2014.0128 (c) Informs - [Read up through section 5.]

W. B. Powell, Stephan Meisel, "Tutorial on Stochastic Optimization in Energy I: Modeling and Policies", IEEE Trans. on Power Systems (to appear) Summarizes the modeling framework and four classes of policies, contrasting the notational systems and canonical frameworks of different communities. This paper and the next are written from the perspective of the IEEE community.

W. B. Powell, Stephan Meisel, "Tutorial on Stochastic Optimization in Energy II: An energy storage illustration," IEEE Trans. on Power Systems (to appear) Illustrates the process of modeling a stochastic, dynamic system using an energy storage application, and shows that each of the four classes of policies works best on a particular variant of the problem. A fifth problem shows that in some cases a hybrid policy is needed. [Focus on the energy modeling example - illustrates how to model a problem, and shows that any of the four classes of policies may work best, depending on the data].


Canonical problems

In this lecture I will cover a number of classical, canonical problems that are familiar to different communities working in stochastic optimization. We will close by presenting a simple canonical framework for all of these problems.


Readings (* indicates required readings)

*Course textbook: Warren B. Powell, Optimization under Uncertainty: A Unified Framework, Chapter 2


Week 2 - Sept 27-29 - Approximation strategies

Stochastic optimization often (although not always) depends on using statistical methods to approximate functions. The presentation will cover some of the most popular methods used in statistics and machine learning, with emphasis on adaptive/recursive learning methods.
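
To make "recursive learning" concrete, here is a minimal sketch (my own illustration, not code from the book) of recursive least squares for a linear model, which updates the coefficient estimate one observation at a time instead of refitting from scratch:

    import numpy as np

    def rls_update(theta, B, phi, y, lam=1.0):
        # One recursive least squares step for the model y ~ theta' phi.
        # theta: current coefficients; B: inverse Gram matrix;
        # phi: new feature vector; y: new response; lam: forgetting factor.
        gamma = lam + phi @ B @ phi            # scalar normalizer
        error = y - theta @ phi                # prediction error (innovation)
        theta = theta + (B @ phi) * error / gamma
        B = (B - np.outer(B @ phi, phi @ B) / gamma) / lam
        return theta, B

    # Fit y = 2 + 3x from noisy data, one point at a time.
    rng = np.random.default_rng(0)
    theta, B = np.zeros(2), 100.0 * np.eye(2)  # diffuse starting estimate
    for _ in range(500):
        x = rng.uniform(0, 1)
        phi = np.array([1.0, x])
        y = 2 + 3 * x + rng.normal(0, 0.1)
        theta, B = rls_update(theta, B, phi, y)
    print(theta)                               # close to [2, 3]

Setting lam below 1 discounts old observations, which matters in stochastic optimization because the function we are learning often changes as the algorithm progresses.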


Readings (* indicates required readings)

*Optimization under Uncertainty: A Unified Framework, Chapter 3


Week 3

Oct 4 - The quintessential stochastic optimization problem

The "quintessential" stochastic optimization problem is the starting point in stochastic optimization. Here we introduce different flavors of the problem and discuss variations of objective functions.


Readings (* indicates required readings)

*Course textbook: Warren B. Powell, Optimization under Uncertainty: A Unified Framework, Chapter 4


Oct 6 - Derivative-based stochastic optimization

This is arguably the most fundamental of all stochastic optimization problems, which is typically written

\min_x \mathbb{E}\, F(x,W)

where W is a random variable. This week, we consider the case where we can compute derivatives of the function F(x,W) given W. Derivative-based stochastic optimization is both an important problem in its own right (static stochastic optimization) and the foundational problem for performing policy search for sequential problems. Furthermore, finite-horizon versions can themselves be formulated as dynamic programs.
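
Here is a minimal sketch of the derivative-based approach (my own toy example, reusing the newsvendor profit F(x,W) = p*min(x,W) - c*x from above): a stochastic gradient (Robbins-Monro) update with a harmonic stepsize.

    import numpy as np

    p, c = 10.0, 4.0
    rng = np.random.default_rng(2)

    x = 10.0                                  # initial order quantity
    for n in range(1, 5001):
        W = rng.exponential(scale=50.0)       # observe one demand sample
        # Stochastic gradient of p*min(x, W) - c*x at this sample:
        grad = (p if x < W else 0.0) - c
        alpha = 5.0 / (5.0 + n)               # harmonic stepsize rule
        x = max(0.0, x + alpha * grad)        # ascent step (maximizing)
    print(x)                                  # approaches ~45.8

The stepsize rule is where the theory bites: the readings below discuss the conditions stepsizes must satisfy, and how to do better than simple declining rules.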


Readings (* indicates required readings)

*Course textbook: Warren B. Powell, Optimization under Uncertainty: A Unified Framework, Chapters 5 and 6.

W. B. Powell, Approximate Dynamic Programming - Chapter 8 - Policy search - Sections 8.2 and 8.3 and the proofs in section 8.9.

Shapiro, Dentcheva, Ruszczynski, Lecture notes on stochastic programming, 2nd edition (2014). - Section 5.9.

A. George, W. B. Powell, Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming, Machine Learning, 2006. - An overview of popular deterministic and stochastic stepsize formulas, and a new "optimal" stepsize formula based on signal processing concepts.

Mark Broadie, Deniz Cicek, Assaf Zeevi - General Bounds and Finite-Time Improvement for the Kiefer-Wolfowitz Stochastic Approximation Algorithm - This paper relaxes a key condition for stepsizes, namely the requirement that the sum of the squares of stepsizes has to be bounded.


Week 4 - Oct 11-13 - Derivative-free stochastic optimization

Although we pointed out that derivative-based stochastic optimization can be formulated as a dynamic program, it is in the setting of derivative-free stochastic optimization that we first have to depend on the concepts of sequential learning systems, with a rich set of belief models that we explore using a full palette of policies. We are going to use this setting of pure learning problems to introduce these concepts, which we return to later in the setting of sequential problems with physical states.



Derivative-free stochastic optimization is the companion of derivative-based stochastic optimization, and represents an important class of problems, both in their own right and as a strategy for performing policy search for sequential problems (dynamic programs).

Offline versions of this problem are known in the literature as "ranking and selection" problems. Online versions (where costs/rewards are accumulated as the search progresses), are widely known as "multiarmed bandit problems." Both can be formulated using the classical language of dynamic programming with appropriate objective functions.

Try to pay attention to whether a method is frequentist or Bayesian, as well as online or offline. Papers within a particular community do not always make these assumptions explicit. For example, "bandit" papers are almost always (99.9 percent) for online problems.
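
As a concrete example of an online policy, here is a minimal sketch of UCB1 from the Auer et al. paper below (my own code; I use Gaussian rewards for simplicity, while the paper analyzes 0/1 outcomes):

    import numpy as np

    rng = np.random.default_rng(3)
    true_means = np.array([1.0, 1.5, 2.0, 1.2])   # unknown to the policy
    K, N = len(true_means), 2000

    counts = np.zeros(K)      # times each arm has been played
    means = np.zeros(K)       # sample mean reward of each arm

    for n in range(N):
        if n < K:
            arm = n                                    # play each arm once
        else:
            ucb = means + np.sqrt(2 * np.log(n) / counts)
            arm = int(np.argmax(ucb))                  # optimism under uncertainty
        r = rng.normal(true_means[arm], 1.0)           # noisy online reward
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]   # recursive mean update

    print(counts)             # most plays concentrate on the best arm (index 2)

An offline (ranking and selection) policy would instead spend its budget purely to identify the best arm at the end, without being charged for rewards along the way.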


Readings (* indicates required readings)

*Course textbook: Warren B. Powell, Optimization under Uncertainty: A Unified Framework, Chapter 7


Readings for offline learning:

For a general introduction (written at an undergraduate level), see:

Powell and Ryzhov, Optimal Learning - Chapter 4 - The ranking and selection problem [Entire chapter]

Powell and Ryzhov, Optimal Learning - Chapter 5 - The knowledge gradient policy [Exclude section 5.5]

For a summary of a number of heuristics and their experimental evaluation, see:

Wang, Powell - Finite-time analysis for the knowledge-gradient policy and a new testing environment for optimal learning - Focus on section 6 and onward, describing the MOLTE testing environment.

For a nice theoretical analysis of the knowledge gradient policy, see:

P. Frazier, W. Powell, S. Dayanik - A Knowledge Gradient Policy for Sequential Information Collection - This paper provides a rigorous model and the asymptotic analysis of the knowledge gradient policy. This is a good paper for learning how to properly model a Bayesian-based optimal learning problem. Read this paper thoroughly.

Frazier, P. I., and W. B. Powell, “Paradoxes in Learning and the Marginal Value of Information,” Decision Analysis, Vol. 7, No. 4, pp. 378-403, 2010. - Establishes the nonconcavity of the value of information. This happens a lot, so pay attention to this. Think about the limitations of the proposed KG(*) policy.

Recent paper on the rate of convergence of expected improvement (EI) policies:

Ryzhov - On the convergence rates of expected improvement methods, March 2016 (to appear in Operations Research)

Readings for online learning:

Auer, Cesa-Bianchi, Fischer - Finite-time analysis of the Multiarmed Bandit Problem - Very nice paper building on the seminal paper by Lai and Robbins, who had shown that there are policies that asymptotically achieve the minimal regret. This paper shows that there are policies that can uniformly achieve the minimal regret. The paper only considers problems where the outcomes are 0 or 1.

*Bubeck and Cesa-Bianchi - Regret Analysis of Stochastic and Non-Stochastic Multi-Armed Bandit Problems - Read Sections 1, 2 and skim 7.

*Powell and Ryzhov, Optimal Learning - Chapter 6 - Introduction to bandit problems and an overview of the major solution approaches (Gittins indices, knowledge gradient for online learning, UCB policies)


For a sample of the literature for simulation-optimization (frequentist):

Chun-Hung Chen, Donghai He, Michael Fu, Loo Hay Lee, Efficient Simulation Budget Allocation for Selecting an Optimal Subset - This is a sample of the literature for optimizing across a discrete set of designs using Monte Carlo simulation. This is a frequentist methodology, designed for efficiently using finite computing resources to manage expensive simulations of a relatively small set of designs.

Hong, J., & Nelson, B. L. (2006). Discrete Optimization via Simulation Using COMPASS. Operations Research, 54(1), 115–129. doi:10.1287/opre.1050.0237 - Illustrates a frequentist-based method for general stochastic search, popular in the simulation-optimization community.

Hong, L., & Nelson, B. L. (2007). A framework for locally convergent random-search algorithms for discrete optimization via simulation. ACM Transactions on Modeling and Computer Simulation, 17(4), 1–22. doi:10.1145/1276927.1276932 - [Skim the convergence proof.]

More advanced belief models:

D. Negoescu, P. Frazier and W. B. Powell, “The Knowledge Gradient Algorithm for Sequencing Experiments in Drug Discovery,” Informs Journal on Computing, Vol. 23, No. 3, pp. 346-363, 2011. (Online supplement) - The knowledge gradient with a linear belief model. This is the first KG paper using a parametric belief model.

M. Mes, W.B. Powell, P. Frazier, Hierarchical Knowledge Gradient for Sequential Sampling - This illustrates a nonparametric belief model constructed by using hierarchical estimates of the unknown function. Note that the knowledge gradient has to anticipate both the change in the weights, as well as the change in the estimates.

Si Chen, K-R G. Reyes, M. Gupta, M. C. McAlpine, W. B. Powell, “Optimal Learning in Experimental Design Using the Knowledge Gradient Policy with Application to Characterizing Nanoemulsion Stability,” SIAM J. Uncertainty Quantification, Vol. 3, pp. 320-345, 2015. - Nonlinear belief model with discrete prior. [Sections 1-2, but primarily section 2.4, which is where the idea of discrete priors is presented.]

Y. Wang, W.B. Powell, Knowledge Gradient for Sequential Decisions with Stochastic Binary Feedbacks - This paper illustrates using the Laplace approximation to handle a nonlinear belief model. This represents an alternative approximation strategy to the use of discrete priors. Focus on the Laplace approximation in sections 3-4.

Leonora Bianchi, Marco Dorigo, Luca Maria Gambardella, Walter J. Gutjahr, A survey on metaheuristics for stochastic combinatorial optimization, Natural Computing, Vol 8, No. 2, pp. 239-287, 2009.



Week 5 - Oct 18-20 - Physical state applications and modeling

"Dynamic programming" is most often associated with problems which involve a physical state (more precisely, where the cost/contribution function is itself a function of the state, unlikely our earlier learning problems). Given the complexity of these problems, this is where we are really going to start to learn how to model real-world problems.


Readings (* indicates required readings)

*Course textbook: Warren B. Powell, Optimization under Uncertainty: A Unified Framework, Chapters 8-9


Week 6 - Oct 25-27 - Modeling uncertainty and designing policies

Any sequential decision problem follows the same cycle: start in a state, make a decision, observe new information, and then transition to a new state. We start by discussing how to model uncertainty, and then turn to the challenge of designing policies.
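
This cycle can be written as a generic simulation loop. Here is a sketch (my own skeleton; policy, transition, sample_W and contribution are hypothetical placeholders you supply for a given application):

    # Simulate a sequential decision problem:
    #   x_t = policy(S_t),  S_{t+1} = transition(S_t, x_t, W_{t+1})
    def simulate(S0, policy, transition, sample_W, contribution, T):
        S, total = S0, 0.0
        for t in range(T):
            x = policy(S)               # decision from the current state
            W = sample_W(t)             # exogenous information arrives
            total += contribution(S, x, W)
            S = transition(S, x, W)     # move to the next state
        return total

    # Trivial stand-ins, just to show the calling pattern:
    print(simulate(0.0,
                   policy=lambda S: 1.0,
                   transition=lambda S, x, W: S + x - W,
                   sample_W=lambda t: 0.5,
                   contribution=lambda S, x, W: -abs(S),
                   T=10))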

Oct 25 - Modeling uncertainty

Stochastic optimization has traditionally focused on making the best decisions under uncertainty, where the focus is on the decisions, not the uncertainty. In practice, you will spend more time on modeling uncertainty.
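
A model of uncertainty is, in practice, a piece of code that generates sample paths. A minimal sketch (my own, with illustrative parameters) for a mean-reverting price process:

    import numpy as np

    # Sample paths of a mean-reverting (AR(1)) price process:
    #   p_{t+1} = mu + beta*(p_t - mu) + eps,  eps ~ N(0, sigma^2)
    def sample_price_path(T, p0=30.0, mu=30.0, beta=0.8, sigma=5.0, rng=None):
        rng = rng or np.random.default_rng()
        path = [p0]
        for _ in range(T - 1):
            path.append(mu + beta * (path[-1] - mu) + rng.normal(0.0, sigma))
        return np.array(path)

    paths = np.stack([sample_price_path(24) for _ in range(100)])
    print(paths.mean(axis=0)[:5])   # hovers near the long-run mean of 30

Getting the distribution right (heavy tails, correlation across time, regime changes) is usually harder, and more important, than this sketch suggests.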



Readings (* indicates required readings)

*Course textbook: Warren B. Powell, Optimization under Uncertainty: A Unified Framework, Chapter 10


Oct 27 - Policies

In deterministic optimization, we look for real-valued decisions (typically vectors). In stochastic optimization, we look for functions, called policies, that map states to (feasible) actions. It is here that we truly pull all the different fields of stochastic optimization together.


Readings (* indicates required readings)

*Course textbook: Warren B. Powell, Optimization under Uncertainty: A Unified Framework, Chapter 11


Nov 1-3 - Fall break


Week 7 - Nov 8-10 - Policy search

Policy search is used when we can specify the form of a parameterized policy, where we then have to search for the best value of the parameters. These come in two forms: analytical functions that directly map a state to an action, and parameterized cost functions where we minimize/maximize a parameterized cost function (where we may modify the objective function or the constraints).
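
Here is a minimal sketch of the first form (my own toy example): an order-up-to inventory policy with a scalar parameter theta, tuned by simulating the objective and searching over a grid.

    import numpy as np

    rng = np.random.default_rng(4)
    p, c, T = 10.0, 4.0, 50            # price, cost, horizon

    def simulate_policy(theta, n_paths=200):
        # Average profit of the policy x_t = max(0, theta - R_t),
        # where R_t is the inventory on hand.
        total = 0.0
        for _ in range(n_paths):
            R = 0.0
            for t in range(T):
                x = max(0.0, theta - R)           # order up to theta
                W = rng.exponential(scale=50.0)   # random demand
                sales = min(R + x, W)
                total += p * sales - c * x
                R = R + x - sales                 # leftover inventory
        return total / n_paths

    thetas = np.linspace(0, 150, 31)
    print(max(thetas, key=simulate_policy))       # best order-up-to level

For vector-valued theta, grid search gives way to the derivative-based and derivative-free search methods from earlier weeks.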


Readings (* indicates required readings)

*Course textbook: Warren B. Powell, Optimization under Uncertainty: A Unified Framework, Chapters 12-13


Deisenroth, M. P., Neumann, G., & Peters, J. (2011). A Survey on Policy Search for Robotics. Foundations and Trends in Machine Learning, 2(1-2), 1–142. - A nice introduction to policy search, but primarily in the context of deterministic control problems.

Michael J. Hadjiyiannis, Paul J. Goulart, and Daniel Kuhn, An Efficient Method to Estimate the Suboptimality of Affine Controllers, IEEE Trans. on Automatic Control, Vol. 56, No. 12, 2011 - A peek into the world of "affine controllers" (that is, policies that are linear in the state), in the setting of discrete time optimal control.

Week 8 - Nov 15-17 - Discrete Markov decision processes

No class Nov 15

This is the classic material due to Bellman, which sets up the strategy for designing policies based on calculating the value of being in a state. This material rests on some strong assumptions: relatively small sets of discrete states and actions, and a well-defined, computable expectation over the exogenous information, making it possible to calculate the one-step transition matrix, the cornerstone of discrete Markov decision processes. This material also lays the foundation for approximation strategies that can be used when these assumptions are not satisfied.
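
Under those assumptions, Bellman's equation can be solved exactly. A minimal sketch (my own, with a made-up 3-state, 2-action MDP) of value iteration:

    import numpy as np

    # P[a] is the one-step transition matrix for action a;
    # C[s, a] is the one-period contribution; gamma is the discount factor.
    P = np.array([[[0.7, 0.3, 0.0], [0.1, 0.8, 0.1], [0.0, 0.4, 0.6]],
                  [[0.9, 0.1, 0.0], [0.3, 0.6, 0.1], [0.1, 0.1, 0.8]]])
    C = np.array([[1.0, 0.5], [2.0, 1.0], [0.0, 3.0]])
    gamma = 0.9

    V = np.zeros(3)
    while True:
        Q = C + gamma * np.einsum('asj,j->sa', P, V)   # Q(s, a)
        V_new = Q.max(axis=1)                          # Bellman update
        if np.max(np.abs(V_new - V)) < 1e-8:
            break
        V = V_new
    print(V, Q.argmax(axis=1))   # values and the optimal policy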





Readings (* indicates required readings)

*Course textbook: Warren B. Powell, Optimization under Uncertainty: A Unified Framework, Chapter 14

Week 9

Nov 22 - Structured dynamic programs

Here we describe different classes of dynamic programs which exhibit special structure that we can exploit to find optimal policies.


Readings (* indicates required readings)

*Course textbook: Warren B. Powell, Optimization under Uncertainty: A Unified Framework, Chapter 15

Alberto Bemporad, Manfred Morari, Vivek Dua, Efstratios N. Pistikopoulos, The explicit linear quadratic regulator for constrained systems - A nice tutorial article on optimal control, focusing on constrained systems.

D. P. Bertsekas, "Value and Policy Iteration in Optimal Control and Adaptive Dynamic Programming," to appear in IEEE Trans. on Neural Networks (2015). - An important overview of value and policy iteration for deterministic optimal control. Helps provide an understanding of the role of approximation error.


Books I have available in the lab include:

Bertsekas, D. P. (2012). Dynamic Programming and Optimal Control, Vol. II: Approximate Dynamic Programming (4th ed.). Belmont, MA: Athena Scientific. - The fourth edition of Bertsekas' popular book on optimal control. This is the most readable treatment I have seen that deals with stochastic control, although the style will be familiar to anyone with a background in classical Markov decision processes.

Lewis, F. L., Vrabie, D. L., & Syrmos, V. L. (2012). Optimal Control (3rd ed.). Hoboken, NJ: John Wiley & Sons. - Another very popular book, Frank Lewis comes from a classical engineering controls background, which means that he focuses on deterministic models of engineering problems. He only touches on "stochastic control" late in the book, and only in a fairly casual way.

Jiongmin Yong and Xun Yu Zhou, Stochastic Controls: Hamiltonian Systems and HJB Equations, Springer, 1999 - This is a classical, heavy-math treatment of stochastic control. The treatment here focuses on theory rather than modeling and computation.

Some older classics:

Robert F. Stengel, Optimal Control and Estimation - An older, but classic, treatment of optimal control from an engineering perspective (Prof. Stengel is a professor in MAE at Princeton).

Donald E Kirk, Optimal Control Theory: An Introduction.


Nov 29 - Backward approximate dynamic programming

Classical backward dynamic programming is limited to a special class of fairly small problems. But before we give up, it is possible to introduce approximations within a backward DP procedure.
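
A sketch of the idea (mine, not the book's algorithm): step backward in time as in classical DP, but at each t fit a value function approximation from a small sample of states rather than looping over all of them.

    import numpy as np

    # Backward ADP for an inventory problem with R in [0, 100]: at each t,
    # sample states, compute one-step Bellman estimates using the downstream
    # fit, and regress a quadratic approximation Vbar_t(R).
    rng = np.random.default_rng(5)
    T, p, c = 20, 10.0, 4.0
    demands = rng.exponential(50.0, size=100)        # sample of demands
    coefs = [np.zeros(3) for _ in range(T + 1)]      # Vbar_T = 0

    def Vbar(t, R):
        R = np.asarray(R, dtype=float)
        return coefs[t][0] + coefs[t][1] * R + coefs[t][2] * R ** 2

    for t in reversed(range(T)):
        R_samples = rng.uniform(0, 100, size=25)     # sampled states only
        v_hat = []
        for R in R_samples:
            best = -np.inf
            for x in np.linspace(0, 100, 21):        # enumerate orders
                sales = np.minimum(R + x, demands)
                val = np.mean(p * sales - c * x + Vbar(t + 1, R + x - sales))
                best = max(best, val)
            v_hat.append(best)
        X = np.column_stack([np.ones(25), R_samples, R_samples ** 2])
        coefs[t] = np.linalg.lstsq(X, np.array(v_hat), rcond=None)[0]

    print(coefs[0])   # fitted quadratic value of inventory at time 0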



Readings (* indicates required readings)

*Course textbook: Warren B. Powell, Optimization under Uncertainty: A Unified Framework, Chapter 16

Because of Thanksgiving, the weeks will transition to Thursday-Tuesday pairs.

Week 10 - Dec 1-Dec 6 - Forward approximate dynamic programming



Readings (* indicates required readings)

*Course textbook: Warren B. Powell, Optimization under Uncertainty: A Unified Framework, Chapters 17-19

Week 11 - Dec 8-Dec 13 - Direct lookahead policies

We finally move to designing policies using direct lookahead models, also known as model predictive control.
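
The simplest direct lookahead is a deterministic one: at each t, optimize a plan over a horizon H using point forecasts, implement only the first decision, then roll forward. A sketch (my own toy storage problem; here I cheat and use the realized prices as the forecast to keep the code short):

    import numpy as np
    from itertools import product

    # Battery level R in {0,...,Rmax}; decision x in {-1, 0, 1} (sell/hold/buy).
    def lookahead_policy(R, t, prices, H=3, Rmax=5):
        best_x, best_val = 0, -np.inf
        for plan in product([-1, 0, 1], repeat=H):    # enumerate all plans
            r, val, feasible = R, 0.0, True
            for h, x in enumerate(plan):
                if not 0 <= r + x <= Rmax:
                    feasible = False
                    break
                val -= prices[min(t + h, len(prices) - 1)] * x   # buying costs money
                r += x
            if feasible and val > best_val:
                best_val, best_x = val, plan[0]       # keep only the first decision
        return best_x

    rng = np.random.default_rng(6)
    prices = 30 + 5 * rng.standard_normal(24)
    R, profit = 2, 0.0
    for t in range(24):                               # rolling horizon
        x = lookahead_policy(R, t, prices)
        profit -= prices[t] * x
        R += x
    print(profit, R)

The central question of this week is what happens when the lookahead model has to represent uncertainty itself (a stochastic lookahead), rather than a point forecast.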



Readings (* indicates required readings)

*Course textbook: Warren B. Powell, Optimization under Uncertainty: A Unified Framework, Chapter 20



Readings from earlier editions:


Agrawal, Goyal - Analysis of Thompson Sampling for the Multiarmed bandit problem - Thompson sampling is an example of a sampled lookahead model. Very easy to use, with nice analytical properties. Be able to explain how Thompson sampling works, and the bound (but not the proof of the bound).

Kleywegt, Shapiro, Homem-de-Melo The Sample Average Approximation Method for Stochastic Discrete Optimization

Shapiro, Dentcheva, Ruszczynski, Lecture notes on stochastic programming, 2nd edition (2014). - [Read 5.3, 5.4, 5.5, 5.10]

Nemirovski, Juditsky, Lan and Shapiro - Robust Stochastic Approximation Approach to Stochastic Programming - A comparison of stochastic approximation methods to the sample average approximation method.

Bayraksan, G., & Love, D. K. (2015). "Data-Driven Stochastic Programming Using Phi-Divergences," INFORMS TutORials in Operations Research, pp. 1–19. - This is a nice tutorial on the idea of phi-divergences, which is a broad class of metrics for measuring the difference between two probability distributions.

Add Romisch paper...





Week 8 - Nov 9-11 - Optimal control

"Stochastic control" is a field that evolved largely from the optimal control of engineering applications. It closely parallels Markov decision processes (in fact, stochastic control preceded Markov decision processes). While the mathematical formulation of stochastic control problems is general, most applications consider stochastics in the form of additive noise in the transition function.

The optimal control literature often uses continuous time formulations, which is reflected in the notational style. It also tends to assume that the underlying problem is continuous and differentiable, but nonconvex.
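
The canonical result in this setting is the linear quadratic regulator, where additive noise leaves the optimal policy unchanged (certainty equivalence). A minimal sketch (my own) of the backward Riccati recursion and a noisy simulation:

    import numpy as np

    # Finite-horizon LQR: x_{t+1} = A x_t + B u_t + w_t, cost sum x'Qx + u'Ru.
    A = np.array([[1.0, 1.0], [0.0, 1.0]])
    B = np.array([[0.0], [1.0]])
    Q, R, T = np.eye(2), np.array([[0.1]]), 20

    P, gains = Q.copy(), []
    for _ in range(T):                   # backward Riccati recursion
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()                      # gains[t] = feedback matrix K_t

    rng = np.random.default_rng(7)
    x = np.array([5.0, 0.0])
    for t in range(T):
        u = -gains[t] @ x                # optimal policy u_t = -K_t x_t
        x = A @ x + B @ u + rng.normal(0, 0.1, size=2)   # additive noise
    print(x)                             # regulated toward the origin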



Readings (* indicates required readings)

*Course textbook: Warren B. Powell, Optimization under Uncertainty: A Unified Framework, Chapter xx

Week 9 - Nov 16-18 - Approximate Dynamic Programming

Approximate dynamic programming evolved initially from the computational problems that arose when trying to use classical methods for solving Markov decision processes, although there has also been significant activity in the optimal control community (motivated by engineering applications). In this very short presentation, we will focus on methods for approximating the classical algorithms of Markov decision processes, namely value iteration and policy iteration.
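
A minimal sketch of the forward, simulation-based version (my own code, reusing the tiny made-up MDP from the Markov decision process week): Q-learning, which is approximate value iteration driven by simulated transitions.

    import numpy as np

    rng = np.random.default_rng(8)
    P = np.array([[[0.7, 0.3, 0.0], [0.1, 0.8, 0.1], [0.0, 0.4, 0.6]],
                  [[0.9, 0.1, 0.0], [0.3, 0.6, 0.1], [0.1, 0.1, 0.8]]])
    C = np.array([[1.0, 0.5], [2.0, 1.0], [0.0, 3.0]])
    gamma = 0.9

    Q, s = np.zeros((3, 2)), 0
    for n in range(1, 100001):
        # Epsilon-greedy exploration.
        a = rng.integers(2) if rng.random() < 0.1 else int(Q[s].argmax())
        s_next = rng.choice(3, p=P[a, s])            # simulate a transition
        target = C[s, a] + gamma * Q[s_next].max()   # sampled Bellman target
        alpha = 10.0 / (10.0 + n)                    # declining stepsize
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next
    print(Q.argmax(axis=1))   # compare to the exact value iteration policy

The stepsize and exploration rules are exactly where the theory in the Tsitsiklis papers below comes in.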



Readings (* indicates required readings)

*Course textbook: Warren B. Powell, Optimization under Uncertainty: A Unified Framework, Chapter xx

*Geramifard et al, A Tutorial on Linear Function Approximators for Dynamic Programming and Reinforcement Learning - A tutorial on the use of different value function approximations, written in the language of reinforcement learning. [Read 1-3, skim 4]

*J.N. Tsitsiklis, Asynchronous Stochastic Optimization and Q-Learning - This was the first paper to bridge reinforcement learning/approximate dynamic programming with the mathematics of stochastic approximation methods. The "classic theory" this paper refers to is the material in section 8.9 of Powell - Approximate Dynamic Programming.

*J.N. Tsitsiklis, B. Van Roy - An Analysis of Temporal-Difference Learning with Function Approximation - Temporal-difference learning refers to an algorithmic strategy where we learn the value of a fixed policy (no optimization taking place).

*I. O. Ryzhov, P. I. Frazier, W. B. Powell, Optimal Stepsizes for Approximate Dynamic Programming - This paper builds on the George and Powell paper, but uses the setting of a dynamic program with a single state and action to derive an optimal stepsize for approximate value iteration algorithms (including Q-learning). [Up through section III.C]

Van Roy, B., & de Farias, D. P. (2000). On the existence of fixed points for approximate value iteration and temporal-difference learning. Journal of Optimization Theory and Applications, 105(3), 589–608.

*D.R. Jiang et al - A Comparison of Approximate Dynamic Programming Techniques on Benchmark Energy Storage Problems: Does Anything Work? - An extensive comparison of a variety of strategies based on approximating value functions. In short, classical machine learning methods estimated using approximate value or policy iteration do not work well when compared against optimal benchmarks. The only methods that work well are those that exploit convexity or monotonicity, but these are basically lookup table representations exploiting structure (vanilla lookup table does not work well either). [Just skim.]

Books and monographs (reference material)

*Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Belmont, MA: Athena Scientific. - This book is the first thorough treatment of approximate dynamic programming using the mathematics of stochastic approximation procedures. Click here for Chapter 5 which illustrates some useful proofs for lookup table representations.

R. Sutton and A. Barto, Reinforcement Learning, MIT Press, 2012 (second edition). - This is the bible of reinforcement learning - the first book to introduce the field. It illustrates the style and vocabulary of the field of reinforcement learning.

Szepesvari - Algorithms for reinforcement learning - A more modern but advanced monograph on reinforcement learning written by one of the top theoreticians in reinforcement learning.

D.P. Bertsekas, Dynamic programming and optimal control, Chapter 6, Approximate Dynamic Programming - This is a pre-publication version of a very thorough treatment of a class of ADP algorithms that later appeared in his update of his optimal control book.

Powell, Approximate dynamic programming, Chapter 8 - Approximating value functions (basically an overview of popular statistical methods used in ADP)

*Powell, Approximate dynamic programming, Chapter 9 - Approximating the value function for a fixed policy - This has some very important material on the use of approximating architectures. Be sure to see the material on projected Bellman minimization.

*Powell, Approximate dynamic programming, Chapter 10 - Approximating value functions while searching for an optimal policy -

J. Si, A. Barto, W.B. Powell and D. Wunsch (eds.), Learning and Approximate Dynamic Programming: Scaling up to the Real World,  John Wiley and Sons, New York, 2004.  - An edited volume from the 2002 NSF workshop, featuring contributions from computer science (reinforcement learning) and engineering controls.



Week 10 - Nov 23-25 - Stochastic programming and Benders decomposition for convex optimization

This week, we cover what could be called approximate dynamic programming for convex problems, and in particular for sequential stochastic linear programs, which enjoy a special type of convexity (one that can be approximated using a series of cuts).
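
The core idea in one dimension (a sketch of my own, not the SDDP algorithm itself): approximate a convex expected recourse function Q(x) from below by cuts built from sampled subgradients, and re-optimize the cut approximation at each iteration.

    import numpy as np

    rng = np.random.default_rng(9)
    c, q = 4.0, 10.0
    demands = rng.exponential(50.0, size=5000)    # fixed sample of demands

    def Q_and_subgrad(x):
        # Sample-average recourse cost Q(x) = E[q*max(W - x, 0)] and a
        # subgradient of this convex function at x.
        return (q * np.maximum(demands - x, 0.0).mean(),
                -q * np.mean(demands > x))

    cuts = []                                     # (intercept, slope) pairs
    grid = np.linspace(0.0, 200.0, 2001)
    x = 0.0
    for k in range(30):
        Qx, g = Q_and_subgrad(x)
        cuts.append((Qx - g * x, g))              # cut: a + g*x underestimates Q
        # "Master problem": minimize c*x + max over cuts, here by grid search.
        approx = c * grid + np.max([a + g * grid for a, g in cuts], axis=0)
        x = grid[np.argmin(approx)]
    print(x)    # converges near the newsvendor-style solution (~45.8)

In a real stochastic program the master is a linear program and x is a high-dimensional vector, which is what Benders decomposition and SDDP handle.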



Readings (* indicates required readings)

*Course textbook: Warren B. Powell, Optimization under Uncertainty: A Unified Framework, Chapter xx

*Shapiro, Dentcheva, Ruszczynski, Lecture notes on stochastic programming, 2nd edition (2014). - Sections 3, 5.10

*Shapiro - Analysis of the stochastic dual dynamic programming method - Nice illustration of SDDP for the Brazilian hydro system using interstage independence. Don't spend too much time on the hydro application - focus on the structure of the model and the algorithm. Think about the question: is this a base model, or a lookahead model?

*Asamov, Powell - Regularized Decomposition of High-Dimensional Multistage Stochastic Programs with Markov Uncertainty. - This paper bridges the notation of stochastic linear programming with the notation of dynamic programming. It then introduces a very compact way to do regularization for multistage stochastic linear programs.

*Philpott Guan - On the convergence of stochastic dual dynamic programming and related methods - Elegant paper establishing the convergence of cutting-plane based methods.

Girardeau, Leclere, Philpott - On the Convergence of Decomposition Methods for Multistage Stochastic Convex Programs - A much more theoretical analysis of temporal decomposition methods for dynamic programs. A very good reference for a mathematically well-prepared student looking for more depth.

Higle and Sen - Stochastic decomposition: An algorithm for two-stage stochastic programs with recourse - The first use of Monte Carlo-based samples of Benders cuts (extends the earlier work on "L-shaped decomposition," which uses Benders cuts to approximate the second stage of a stochastic linear program).


Week 11 - Nov 30-Dec 2 - Lookahead models and sampled approximations - Multistage problems



Readings (* indicates required readings)

*Course textbook: Warren B. Powell, Optimization under Uncertainty: A Unified Framework, Chapter xx

*D.P. Bertsekas, Dynamic Programming and Suboptimal Control: A survey from ADP to MPC.

Monte Carlo tree search for scalar action spaces:

*Browne et al - Survey of Monte Carlo tree search methods - Classical treatment of MCTS, illustrating the tendency of the CS community to work on deterministic problems.

*Auger, David, Adrien Couëtoux, and Olivier Teytaud. "Continuous upper confidence trees with polynomial exploration–consistency." Machine Learning and Knowledge Discovery in Databases. Springer Berlin Heidelberg, 2013. 194-209. - Addresses Monte Carlo tree search in a stochastic setting.

Model predictive control:

Matthew Ellis, Helen Durand, Panagiotis D. Christofides, A tutorial review of economic model predictive control methods, J. Process Control, 2014. - An introduction to model predictive control using the language of control theory with engineering applications.

Scenario trees for multistage linear/integer programming:

*Mulvey, Vanderbei and Zenios - Robust Optimization of Large-Scale Systems - One of the first uses of stochastic optimization with scenario trees for multistage linear programs.

*Heitsch and Romisch, Scenario tree reduction for multistage stochastic programs, Computational Management Science, 2009 - This is part of a growing body of research focusing on how to generate effective samples.

Week 12 - Dec 7-9 - Risk-based and robust stochastic optimization


Readings (* indicates required readings)

*Course textbook: Warren B. Powell, Optimization under Uncertainty: A Unified Framework, Chapter xx

Ruszczyński, A. (2014). Advances in Risk-Averse Optimization. In INFORMS Tutorials in Operations Research (pp. 168–190). Baltimore

Ben-Tal, A., & Nemirovski, A. (2002). Robust optimization - methodology and applications. Mathematical Programming, 92(3), pp. 453–480.

Ruszczynski, A. (2010). Risk-averse dynamic programming for Markov decision processes. Mathematical Programming, 125(2), 235–261. doi:10.1007/s10107-010-0393-3

Shapiro, Dentcheva, Ruszczynski, Lecture notes on stochastic programming, 2009. Chapter 6. [Update to latest edition]


Click here to access prior versions of this course (taught as grad seminars).