ORF 544

(formerly ORF 569)

Graduate seminar on

Computational Stochastic Optimization

Organized by Warren Powell

 

Below are the readings and lecture notes for a series of graduate seminars on the broad topic of computational stochastic optimization.

Spring, 2016 (reading course)

Fall, 2013

Spring, 2013 - This year it is being taught as a reading course.

Spring, 2012 - This version of the course featured weekly speakers. Each lecture was scribed, and there are supporting readings.

Collection of additional readings

 

Spring, 2016 (Informal reading course)

The course will be taught as a reading course, in preparation for its introduction as a full course, ORF 544, for the fall of 2016.

The theme of the course is "Computational Stochastic Optimization," which is to say that we will be teaching stochastic optimization with a focus on proper modeling, and the design and analysis of practical algorithms. The course will cut across the many communities that contribute to the broad field of stochastic optimization (see the "jungle" webpage).

A major goal of the course is to provide students with a strong foundation in terms of the following:

We will illustrate ideas through a combination of standard models and real applications. Some of the standard models will include:

Real applications will be drawn based on the interests of the class.

Each lecture is accompanied by a series of readings. The idea is to provide an initial bibliography that will serve as an introduction to each of the topics. While these readings have been carefully chosen, there is much more here than can be absorbed in a single course.

 

Week 1 - Feb 11 - Overview of computational stochastic optimization

Topics:

 

Readings:

*Warren B. Powell. “Clearing the Jungle of Stochastic Optimization,” Informs Tutorials in Operations Research: Bridging Data and Decisions, pp. 109-137, November, 2014, http://dx.doi.org/10.1287/educ.2014.0128 (c) Informs - [Read up through section 5.]

W. B. Powell, Stephan Meisel, "Tutorial on Stochastic Optimization in Energy I: Modeling and Policies", IEEE Trans. on Power Systems (to appear) Summarizes the modeling framework and four classes of policies, contrasting the notational systems and canonical frameworks of different communities.

W. B. Powell, Stephan Meisel, "Tutorial on Stochastic Optimization in Energy II: An energy storage illustration," IEEE Trans. on Power Systems (to appear) Illustrates the process of modeling a stochastic, dynamic system using an energy storage application, and shows that each of the four classes of policies works best on a particular variant of the problem. A fifth problem shows that in some cases a hybrid policy is needed. [Focus on the energy modeling example - illustrates how to model a problem, and shows that any of the four classes of policies may work best, depending on the data].

The following three chapters are the major modeling chapters from the third edition of my ADP book:

*Chapter 5 - Modeling - This provides the full framework for modeling sequential decision problems. The major change from the second edition is the discussion of objective functions toward the end. [Very important - this is the primary resource for how to model sequential problems. Be sure to pay special attention to section 5.11.]

*Chapter 6 - Modeling uncertainty - This is entirely new material focusing purely on modeling uncertainty (a topic often referred to as "uncertainty quantification"). [Read section 6.1]

*Chapter 7 - Policies - This chapter describes two fundamental strategies for designing policies: direct policy search (optimizing over classes of functions), and policies based on lookahead models. Each of these is then divided into two strategies, creating four classes of policies. These are illustrated using the energy storage example presented in chapter 5 (section 5.8).
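For reference, the canonical model developed in these chapters can be summarized compactly (a sketch; the chapters give the full notation): a policy X^pi maps the state S_t to a decision, exogenous information W_{t+1} then arrives, and the transition function S^M produces the next state.

```latex
% Canonical sequential decision problem (sketch of the framework in Chapters 5-7)
\max_{\pi} \; \mathbb{E}\left\{ \sum_{t=0}^{T} C\bigl(S_t, X^{\pi}(S_t)\bigr) \right\},
\qquad S_{t+1} = S^{M}\bigl(S_t, X^{\pi}(S_t), W_{t+1}\bigr).
```

The four classes of policies in Chapter 7 are different ways of constructing the mapping X^pi.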

 

Week 2 - Feb 18 - Derivative-based stochastic optimization

Topics:

Derivative-based stochastic optimization is both an important problem in its own right (static stochastic optimization) and the foundational problem for performing policy search for sequential problems. Furthermore, finite-horizon versions can themselves be formulated as dynamic programs.
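As a concrete illustration of the basic derivative-based (Robbins-Monro) iteration, here is a minimal Python sketch; the function grad_sample and the newsvendor-style example are hypothetical stand-ins, not taken from the readings. The harmonic stepsize used here satisfies the classical conditions (stepsizes sum to infinity, squared stepsizes sum to a finite value) that the Broadie-Cicek-Zeevi paper below revisits.

```python
import numpy as np

def stochastic_gradient(grad_sample, x0, n_iter=1000, a=10.0, seed=0):
    """Minimize F(x) = E[F(x, W)] with a basic stochastic gradient (Robbins-Monro) iteration.

    grad_sample(x, rng) must return a noisy but unbiased sample of the gradient at x.
    The harmonic stepsize a/(a+n) satisfies sum(alpha_n) = inf and sum(alpha_n^2) < inf.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for n in range(n_iter):
        alpha = a / (a + n)           # declining (harmonic) stepsize
        g = grad_sample(x, rng)       # stochastic gradient sample
        x = x - alpha * g
    return x

# Hypothetical newsvendor-style illustration: F(x, W) = c*x - p*min(x, W),
# with stochastic subgradient c - p*1{W > x}.
if __name__ == "__main__":
    c, p = 1.0, 2.0
    grad = lambda x, rng: c - p * (rng.exponential(5.0) > x)
    print(stochastic_gradient(grad, x0=1.0))
```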

Readings:

W. B. Powell, Approximate Dynamic Programming - Chapter 8 - Policy search - Sections 8.3 and 8.9 (skim 8.1-8.2)

Shapiro, Dentcheva, Ruszczynski, Lecture notes on stochastic programming, 2nd edition (2014). - Section 5.9.

Mark Broadie, Deniz Cicek, Assaf Zeevi - General Bounds and Finite-Time Improvement for the Kiefer-Wolfowitz Stochastic Approximation Algorithm - This paper relaxes a key condition for stepsizes, namely the requirement that the sum of the squares of stepsizes has to be bounded.

A. George, W. B. Powell, Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming, Machine Learning, 2006. - An overview of popular deterministic and stochastic stepsize formulas, and a new "optimal" stepsize formula based on signal processing concepts.

I. O. Ryzhov, P. I. Frazier, W. B. Powell, Optimal Stepsizes for Approximate Dynamic Programming - This paper builds on the George and Powell paper, but uses the setting of a dynamic program with a single state and action to derive an optimal stepsize for approximate value iteration algorithms (including Q-learning).

 

Week 3 - Feb 25 - Derivative-free stochastic optimization I

Topics:

Derivative-free stochastic optimization is the companion of derivative-based stochastic optimization, and represents an important class of problems, both in its own right and as a strategy for performing policy search for sequential problems (dynamic programs).

Offline versions of this problem are known in the literature as "ranking and selection" problems. Online versions (where costs/rewards are accumulated as the search progresses), are widely known as "multiarmed bandit problems." Both can be formulated using the classical language of dynamic programming with appropriate objective functions.
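To make the offline/online distinction concrete, the sketch below (my own illustration, not from the readings) uses a lookup-table normal belief model. Both settings share the same Bayesian update after each measurement; they differ only in the objective: offline (ranking and selection) cares about the quality of the final choice, while online (bandit) accumulates the rewards collected along the way. The placeholder pure-exploration policy would be replaced by knowledge gradient, UCB, and so on.

```python
import numpy as np

class NormalBeliefs:
    """Independent normal beliefs (mean mu, precision beta) for a finite set of alternatives."""
    def __init__(self, mu0, beta0, beta_w):
        self.mu = np.array(mu0, dtype=float)      # posterior means
        self.beta = np.array(beta0, dtype=float)  # posterior precisions
        self.beta_w = beta_w                      # measurement precision

    def update(self, x, w):
        """Conjugate normal update after observing w from alternative x."""
        self.mu[x] = (self.beta[x] * self.mu[x] + self.beta_w * w) / (self.beta[x] + self.beta_w)
        self.beta[x] += self.beta_w

def run(truth, n_budget, online, seed=0):
    """truth[x] is the (unknown) mean of alternative x; observations carry unit-variance noise."""
    rng = np.random.default_rng(seed)
    bel = NormalBeliefs(np.zeros(len(truth)), np.ones(len(truth)), beta_w=1.0)
    total_reward = 0.0
    for _ in range(n_budget):
        x = rng.integers(len(truth))   # placeholder policy: pure exploration
        w = truth[x] + rng.normal()
        total_reward += w              # only counts in the online (bandit) objective
        bel.update(x, w)
    final_choice = int(np.argmax(bel.mu))
    return total_reward if online else truth[final_choice]   # online vs offline objective
```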

Readings for offline learning:

For a general introduction, see:

Powell and Ryzhov, Optimal Learning - Chapter 4 - The ranking and selection problem

Powell and Ryzhov, Optimal Learning - Chapter 5 - The knowledge gradient policy

For a sample of the OCBA literature for simulation-optimization:

Chun-Hung Chen, Donghai He, Michael Fu, Loo Hay Lee, Efficient Simulation Budget Allocation for Selecting an Optimal Subset - This is a sample of the literature for optimizing across a discrete set of designs using Monte Carlo simulation. This is a frequentist methodology, designed for efficiently using finite computing resources to manage expensive simulations of a relatively small set of designs.

For a summary of a number of heuristics and their experimental evaluation, see:

Wang, Powell - Finite-time analysis for the knowledge-gradient policy and a new testing environment for optimal learning - Focus on section 6 and onward, describing the MOLTE testing environment.

For a nice theoretical analysis of the knowledge gradient policy, see:

P. Frazier, W. Powell, S. Dayanik - A Knowledge Gradient Policy for Sequential Information Collection - This paper provides a rigorous model and the asymptotic analysis of the knowledge gradient policy.
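For reference, the knowledge-gradient quantity analyzed in these readings can be written compactly for independent normal beliefs (a sketch in generic notation; the chapters and the Frazier-Powell-Dayanik paper give the precise derivation):

```latex
% Knowledge gradient for independent normal beliefs (sketch)
\nu^{KG}_x = \tilde{\sigma}_x \, f(\zeta_x), \qquad
\zeta_x = -\,\frac{\bigl|\,\mu_x - \max_{x' \neq x} \mu_{x'}\,\bigr|}{\tilde{\sigma}_x}, \qquad
f(\zeta) = \zeta\,\Phi(\zeta) + \varphi(\zeta),
```

where \mu_x is the current estimated mean of alternative x, \tilde{\sigma}_x is the standard deviation of the change in that estimate from one more measurement of x, and \Phi, \varphi are the standard normal cdf and density. The KG policy measures the alternative with the largest \nu^{KG}_x.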

Readings for online learning:

Powell and Ryzhov, Optimal Learning - Chapter 6 - Bandit problems (Gittins indices, knowledge gradient for online learning, UCB policies)

Bubeck and Cesa-Bianchi - Regret Analysis of Stochastic and Non-Stochastic Multi-Armed Bandit Problems - Read Sections 1, 2 and skim 7.
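As a minimal illustration of the UCB policies covered in the Bubeck and Cesa-Bianchi survey, the classical UCB1 index (rewards assumed to lie in [0,1]) adds an exploration bonus to the empirical mean; the pull function and the Bernoulli example below are hypothetical stand-ins.

```python
import numpy as np

def ucb1(pull, n_arms, horizon, seed=0):
    """UCB1: play argmax of empirical mean + sqrt(2 ln t / n), after one initial pull per arm."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1                                     # initialize: play each arm once
        else:
            arm = int(np.argmax(means + np.sqrt(2.0 * np.log(t) / counts)))
        r = pull(arm, rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]        # running average
        total += r
    return total, means

# Hypothetical Bernoulli bandit with success probabilities p
p = [0.3, 0.5, 0.7]
print(ucb1(lambda a, rng: float(rng.random() < p[a]), n_arms=3, horizon=1000))
```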

 

Week 4 - March 3 - Derivative-free stochastic optimization II

Topics:

D. Negoescu, P. Frazier and W. B. Powell, “The Knowledge Gradient Algorithm for Sequencing Experiments in Drug Discovery,” Informs Journal on Computing, Vol. 23, No. 3, pp. 346-363, 2011. (Online supplement) - Linear belief model

Si Chen, K-R G. Reyes, M. Gupta, M. C. McAlpine, W. B. Powell, “Optimal Learning in Experimental Design Using the Knowledge Gradient Policy with Application to Characterizing Nanoemulsion Stability,” SIAM J. Uncertainty Quantification, Vol. 3, pp. 320-345, 2015. - Nonlinear belief model with discrete prior.

Y. Wang, W.B. Powell, Knowledge Gradient for Sequential Decisions with Stochastic Binary Feedbacks - This paper illustrates using the Laplace approximation to handle a nonlinear belief model. This represents an alternative approximation strategy to the use of discrete priors.

Kobby's work on exploration/exploitation with parametric belief models

Xinyu's work on balancing optimization and parameter estimation
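The recursive update underlying the linear belief models in this week's readings is standard Bayesian linear regression (a sketch in my own notation, not the papers'): with prior \theta \sim N(\theta^n, \Sigma^n) and a new observation y^{n+1} = x^\top \theta + \varepsilon, \varepsilon \sim N(0, \lambda), the posterior is

```latex
\theta^{n+1} = \theta^{n} + \frac{y^{n+1} - x^{\top}\theta^{n}}{\lambda + x^{\top}\Sigma^{n}x}\,\Sigma^{n}x,
\qquad
\Sigma^{n+1} = \Sigma^{n} - \frac{\Sigma^{n}x\,x^{\top}\Sigma^{n}}{\lambda + x^{\top}\Sigma^{n}x}.
```

The nonlinear-belief papers replace this conjugate update with a discrete prior (Chen et al.) or a Laplace approximation (Wang and Powell).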

 

Week 5 - March 10 - Lookahead models and sampled approximations - Static/two-stage

Topics:

Readings:

Shapiro, Dentcheva, Ruszczynski, Lecture notes on stochastic programming, 2nd edition (2014). - 5.3, 5.4, 5.5, 5.10

Kleywegt, Shapiro, Homem-de-Mello, The Sample Average Approximation Method for Stochastic Discrete Optimization

Nemirovski, Juditsky, Lan and Shapiro - Robust Stochastic Approximation Approach to Stochastic Programming - A comparison of stochastic approximation methods to the sample average approximation method.
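The unifying idea in these readings is the sample average approximation itself: the expectation in the true problem is replaced by an average over a Monte Carlo sample drawn before optimizing,

```latex
\min_{x \in X} \; \mathbb{E}\bigl[F(x,\xi)\bigr]
\;\;\approx\;\;
\min_{x \in X} \; \frac{1}{N} \sum_{i=1}^{N} F\bigl(x, \xi^{i}\bigr),
```

and the readings study how the optimal value and solutions of the sampled problem converge to those of the true problem as N grows, in contrast with the stochastic approximation methods of Week 2.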

 

Week 6 - March 17 - Dynamic programming and Markov decision processes

Topics:

 

Powell, Approximate dynamic programming, Chapter 3 - 3.1-3.8, 3.9 (skim), 3.10.
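The core object in this material is Bellman's optimality equation for a discrete-state, discrete-action Markov decision process (written here in its infinite-horizon discounted form as a sketch),

```latex
V(s) = \max_{a \in \mathcal{A}} \Bigl( C(s,a) + \gamma \sum_{s'} \mathbb{P}(s' \mid s, a)\, V(s') \Bigr),
```

together with the classical value iteration and policy iteration algorithms for solving it exactly.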

 

Week 7 - March 24 - Policy search for dynamic programming

Topics:

 

Deisenroth, M. P., Neumann, G., & Peters, J. (2011). A Survey on Policy Search for Robotics. Foundations and Trends in Robotics, 2(1-2), 1-142. - A nice introduction to policy search, but primarily in the context of deterministic control problems.
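To connect the survey with the earlier weeks on stochastic search: policy search treats the policy as a parametric mapping from state to decision and tunes its parameters by simulating the policy. The sketch below is my own illustration (simple random search over the parameters; sim_episode is a hypothetical user-supplied simulator), not the gradient-based methods emphasized in the survey.

```python
import numpy as np

def simulate_policy(theta, sim_episode, n_episodes, rng):
    """Estimate F(theta): the expected total contribution of the policy with parameters theta."""
    return float(np.mean([sim_episode(theta, rng) for _ in range(n_episodes)]))

def policy_search(sim_episode, theta0, n_iter=50, n_episodes=20, step=0.5, seed=0):
    """Direct policy search by simple random search over the policy parameters theta."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    best = simulate_policy(theta, sim_episode, n_episodes, rng)
    for _ in range(n_iter):
        cand = theta + step * rng.normal(size=theta.shape)       # perturb the parameters
        val = simulate_policy(cand, sim_episode, n_episodes, rng)
        if val > best:                                            # keep the better policy
            theta, best = cand, val
    return theta, best
```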

Week 8 - March 31 - Approximate Dynamic Programming

Topics:

Readings:

J.N. Tsitsiklis, Asynchronous Stochastic Approximation and Q-Learning - This was the first paper to bridge reinforcement learning/approximate dynamic programming with the mathematics of stochastic approximation methods.

J.N. Tsitsiklis, Ben van Roy - An Analysis of Temporal-Difference Learning with Function Approximation - Temporal-difference learning refers to an algorithmic strategy where we learn the value of a fixed policy (no optimization taking place).
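For a concrete point of reference, a lookup-table version of the Q-learning algorithm analyzed in the Tsitsiklis paper is sketched below (my own minimal illustration; env_step is a hypothetical user-supplied simulator returning a reward and the next state).

```python
import numpy as np

def q_learning(env_step, n_states, n_actions, n_iter=10000, gamma=0.95, eps=0.1, a=100.0, seed=0):
    """Tabular Q-learning: asynchronous stochastic approximation of Bellman's equation."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    counts = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(n_iter):
        # epsilon-greedy exploration
        x = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        r, s_next = env_step(s, x, rng)
        counts[s, x] += 1
        alpha = a / (a + counts[s, x])            # declining stepsize per (state, action) pair
        target = r + gamma * np.max(Q[s_next])    # sampled Bellman target
        Q[s, x] += alpha * (target - Q[s, x])     # stochastic approximation step
        s = s_next
    return Q
```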

Books and monographs:

Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Belmont, MA: Athena Scientific. - This book is the first thorough treatment of approximate dynamic programming using the mathematics of stochastic approximation procedures.

R. Sutton and A. Barto, Reinforcement Learning, MIT Press, 2012 (second edition). - This is the bible of reinforcement learning - the first book to introduce the field. It illustrates the style and vocabulary of the field of reinforcement learning.

Szepesvari - Algorithms for reinforcement learning - A more modern but advanced monograph on reinforcement learning written by one of the top theoreticians in reinforcement learning.

D.P. Bertsekas, Dynamic programming and optimal control, Chapter 6, Approximate Dynamic Programming - This is a pre-publication version of a very thorough treatment of a class of ADP algorithms that later appeared in his update of his optimal control book.

 

Week 9 - April 7 - Stochastic programming and Benders decomposition for convex optimization

Topics:

Higle Sen - Stochastic decomposition: An algorithm for two-stage stochastic programs with recourse - The first use of Monte Carlo-based samples of Benders cuts

Shapiro, Dentcheva, Ruszczynski, Lecture notes on stochastic programming, 2nd edition (2014). - Section 3, 5.10

Shapiro - Analysis of the stochastic dual dynamic programming method - Nice illustration of SDDP for the Brazilian hydro system using interstage independence

Asamov, Powell - Regularized Decomposition of High-Dimensional Multistage Stochastic Programs with Markov Uncertainty.

Philpott Guan - On the convergence of stochastic dual dynamic programming and related methods - Nice paper in OR letters overcoming the weaknesses of prior proofs.

Girardeau, Leclere, Philpott - On the Convergence of Decomposition Methods for Multistage Stochastic Convex Programs - Much more theoretical analysis of temporal decomposition methods for dynamic programs.
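The common structure across these readings is the two-stage (and, in SDDP, nested) stochastic program in which the expected recourse function is approximated from below by Benders cuts (a sketch of the standard form):

```latex
\min_{x \in X} \; c^{\top}x + \mathbb{E}\bigl[\,Q(x,\xi)\,\bigr],
\qquad
Q(x,\xi) = \min_{y \ge 0} \bigl\{\, q(\xi)^{\top} y \;:\; W y = h(\xi) - T(\xi)\,x \,\bigr\}.
```

The master problem replaces the expectation with a scalar variable \theta constrained by cuts \theta \ge \alpha_k + \beta_k^{\top} x built from second-stage dual solutions; stochastic decomposition and SDDP generate these cuts from Monte Carlo samples rather than the full scenario set.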

 

Week 10 - April 14 - Lookahead models and sampled approximations II - Multistage problems

Topics:

D.P. Bertsekas, Dynamic Programming and Suboptimal Control: A survey from ADP to MPC.

Browne et al - Survey of Monte Carlo tree search methods - Classical treatment of MCTS (for deterministic applications)

Auger, David, Adrien Couëtoux, and Olivier Teytaud. "Continuous upper confidence trees with polynomial exploration–consistency." Machine Learning and Knowledge Discovery in Databases. Springer Berlin Heidelberg, 2013. 194-209. - Addresses Monte Carlo tree search in a stochastic setting.

Mulvey, Vanderbei and Zenios - Robust Optimization of Large-Scale Systems - One of the first uses of stochastic optimization with scenario trees for multistage linear programs.

Heitsch and Romisch, Scenario tree reduction for multistage stochastic programs, Computational Management Science, 2009 - This is part of a growing body of research focusing on how to generate effective samples.

Week 11 - April 21 - Risk-based and robust stochastic optimization

 

 

Week 12 - April 28 - Buffer week

 

Fall, 2013

Lectures will be scribed and posted as the course proceeds. Additional reading materials are also posted with each lecture. I have posted the scribed lectures from 2012. These will be replaced with the lectures given this semester as they become available.

Assignments of people to topics are tentative. New topics can be added, and the topics below can be tilted to fit the interests of the speaker.

 

Week 1 - Sept 18 - Overview of computational stochastic optimization

Prof. Powell will introduce the fields of stochastic search and dynamic programming to bring out the relationship between these fields. He will also present a modeling framework for stochastic dynamic programs, discuss state variables, and time permitting, discuss some current research challenges.

Click here for Prof. Powell's opening lecture.

References:

Powell, Approximate dynamic programming, Chapter 5 - Modeling

Powell, Approximate dynamic programming, Chapter 6 - Policies

 

Week 2 - Sept 25 - Stochastic search - Genna Gliner

Click here for lecture from 2013

Readings:

Powell, W. B., Chapter 6 of Approximate Dynamic Programming, section 6.8 (2011). This section describes some classic convergence proofs for stochastic approximation theory that are central to ADP and RL.

Nati Srebro and Ambuj Tewari, Stochastic Optimization for Machine Learning, (2010), A presentation describing how the parameters of a machine learning model can be estimated using the tools of stochastic search.

Mark Broadie, Deniz Cicek, Assaf Zeevi - General Bounds and Finite-Time Improvement for the Kiefer-Wolfowitz Stochastic Approximation Algorithm - This paper relaxes a key condition for stepsizes, namely the requirement that the sum of the squares of stepsizes has to be bounded.

 

Week 3 - October 2 - Derivative-free stochastic search - Yan Li

Click here for lecture from 2013.

References:

Spall, 2003 (available from Prof. Powell).

Hu, Wang, Zhou, Fu, Marcus - Survey of model-based methods for global optimization - A good survey article for methods for optimizing non-convex functions.

Chang, Fu, Hu, and Marcus, Simulation-based optimization of Markov decision processes - See section 4.1 on model reference adaptive search.

 

No class on Wednesday, October 9 due to conflict with Informs annual meeting.

 

Week 4 - October 16 - Optimal learning in stochastic search - Yingfei Wang

Offline and online learning, active learning, expected value of information methods, lookup table and parametric belief models.

Click here for lecture from 2013.

References:

Powell and Ryzhov, Optimal Learning, Chapter 4 - Ranking and selection (offline learning)

Powell and Ryzhov, Optimal Learning, Chapter 5 - Knowledge gradient

Powell and Ryzhov, Optimal Learning, Chapter 6 - Bandit problems (online learning)

Powell and Ryzhov, Optimal Learning, Chapter 8 - Linear belief models

 

Week 5 - October 23 - Stepsizes and optimal learning rates - Meriton Ibraimi

Click here for lecture from 2013.

References:

Powell, Approximate dynamic programming, Chapter 11 - Stepsizes

A. George, W. B. Powell, Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming, Machine Learning, 2006. - An overview of popular deterministic and stochastic stepsize formulas, and a new "optimal" stepsize formula based on signal processing concepts.

I. O. Ryzhov, P. I. Frazier, W. B. Powell, Optimal Stepsizes for Approximate Dynamic Programming - This is a new paper, which generalizes the optimal stepsize formula in the George et al paper above specifically for the setting of dynamic programs.

 

 

Part II - Multistage stochastic optimization

We make the transition to multistage problems where decisions have to be made over time. We do this by covering lookahead policies (tree search, model predictive control, stochastic programming), and then move to dynamic programming (policies based on value function approximations), starting with classical exact results from the MDP literature. We have a lecture on stochastic optimization in the presence of risk measures (rather than classical minimization of expectations). We close with two lectures that cover very different perspectives of online learning, known widely as "bandit" problems.

 

Week 6 - Nov 6 - Markov Decision processes - Javad Khazaei

This lecture will cover classical material on the convergence of Markov decision processes for discrete states and actions

Click here for lecture from 2013

References:

Powell, Approximate dynamic programming, Chapter 3

 

 

Week 7 - Nov 13 - Stochastic programming and Benders decomposition - Tsvetan Asamov

Click here for lecture from 2013.

References:

Higle and Sen, Stochastic decomposition, 1991 (two stage).

Sen and Zhou - Multistage Stochastic Decomposition (working paper, 2012)

 

Other references on scenario trees (not sure there will be time to cover this material):

Shapiro, Dentcheva, Ruszczynski, Lecture notes on stochastic programming, 2009. Chapter 3.

Heitsch and Romisch, Scenario tree reduction for multistage stochastic programs, Computational Management Science, 2009

 

 

Week 8 - Nov 20 - ADP with value function approximations - Daniel Jiang

Click here for lecture from 2013.

Topics:

Strategies for approximating VFA's, how to perform updates, approximating while searching for an optimal policy.

References: This is covered in the following three chapters. Chapters 9 and 10 are the most important.

Powell, Approximate dynamic programming, Chapter 8 - Approximating value functions (basically an overview of popular statistical methods used in ADP)

Powell, Approximate dynamic programming, Chapter 9 - Approximating the value function for a fixed policy

Powell, Approximate dynamic programming, Chapter 10 - Approximating value functions while searching for an optimal policy

 

Week 9 - Dec 4 - Stochastic optimization with risk measures - Tsvetan Asamov

Meet in CASTLE Lab

Click here for lecture from 2012.

References:

Shapiro, Dentcheva, Ruszczynski, Lecture notes on stochastic programming, 2009. Chapter 6.

 

Week 10 - Dec 13 (Friday!) - Multiarmed bandit problems - Si Chen

Gittins indices (theory, algorithms, approximations), knowledge gradient for bandit problems

Click here for lecture from 2012.

References:

Powell and Ryzhov, Optimal Learning, Chapter 6

Week 11 - January 8? - Optimal learning in the sciences - Kris Reyes

This lecture will focus on belief extraction and optimal learning in the context of applications in the sciences (although the concepts can be applied to many other applications).

 

 

Spring, 2013 - Reading course

Week 1 - Stochastic search

Click here for lecture from 2012

Readings:

Powell, W. B., Chapter 6 of Approximate Dynamic Programming, section 6.8 (2011). This section describes some classic convergence proofs for stochastic approximation theory that are central to ADP and RL.

Nati Srebro and Ambuj Tewari, Stochastic Optimization for Machine Learning, (2010), A presentation describing how the parameters of a machine learning model can be estimated using the tools of stochastic search.

Week 2 - Sample average approximation

Shapiro, Dentcheva, Ruszczynski, Lecture notes on stochastic programming, 2009. Chapter 5.

Kleywegt, Homem-de-Mello, Shapiro, The Sample Average Approximation Method for Stochastic Discrete Optimization, SIAM J. Optimization, 2001. - This is the paper that introduced the sample average approximation method.

 

Week 3 - Derivative-free stochastic optimization

Click here for lecture.

References:

Spall, 2003 (available from Prof. Powell).

Hu, Wang, Zhou, Fu, Marcus - Survey of model-based methods for global optimization - A good survey article for methods for optimizing non-convex functions.

Chang, Fu, Hu, and Marcus, Simulation-based optimization of Markov decision processes - See section 4.1 on model reference adaptive search.

Week 4 - Optimal learning methods in stochastic search

Offline and online learning, active learning, expected value of information methods, lookup table and parametric belief models.

Click here for lecture.

References:

Powell and Ryzhov, Optimal Learning, Chapter 4 - Ranking and selection (offline learning)

Powell and Ryzhov, Optimal Learning, Chapter 5 - Knowledge gradient

Powell and Ryzhov, Optimal Learning, Chapter 6 - Bandit problems (online learning)

Powell and Ryzhov, Optimal Learning, Chapter 8 - Linear belief models

 

Week 5 - Stepsizes and optimal learning rates

Click here for lecture.

References:

Powell, Approximate dynamic programming, Chapter 11 - Stepsizes

A. George, W. B. Powell, Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming, Machine Learning, 2006. - An overview of popular deterministic and stochastic stepsize formulas, and a new "optimal" stepsize formula based on signal processing concepts.

I. O. Ryzhov, P. I. Frazier, W. B. Powell, Optimal Stepsizes for Approximate Dynamic Programming - This is a new paper, which generalizes the optimal stepsize formula in the George et al paper above specifically for the setting of dynamic programs.

 

 

Week 6 - Markov Decision processes

This lecture will cover classical material on the convergence of Markov decision processes for discrete states and actions

References:

Powell, Approximate dynamic programming, Chapter 3

 

 

Week 7 - ADP with value function approximations

Click here for lecture.

Topics:

Strategies for approximating VFA's, how to perform updates, approximating while searching for an optimal policy.

References: This is covered in the following three chapters. Chapters 9 and 10 are the most important.

Powell, Approximate dynamic programming, Chapter 8 - Approximating value functions (basically an overview of popular statistical methods used in ADP)

Powell, Approximate dynamic programming, Chapter 9 - Approximating the value function for a fixed policy

Powell, Approximate dynamic programming, Chapter 10 - Approximating value functions while searching for an optimal policy

 

 

----------------------------------------------------------

Spring, 2012 - This version of the course featured weekly speakers. Each lecture was scribed, and there are supporting readings.

The course is designed to be a comprehensive treatment of major topics in stochastic optimization, presented in the form of a series of roughly 2-hour presentations by graduate students and post-doctoral associates.

The course is roughly divided between stochastic search (often called two-stage stochastic optimization) and sequential decision problems (multistage). The topics cover most of the major threads in stochastic optimization that are being pursued in the research community. The discussion will span basic modeling, the design of a wide range of practical algorithms, and an introduction to key concepts of convergence with an emphasis on Monte Carlo-based algorithms.

At the end of the list of lectures is a series of additional readings that will grow as new topics are added.

 

 

Part I - Stochastic search (two-stage)

The first half of the course focuses on stochastic optimization where we have to choose a deterministic design variable (this could be scalar, categorical or vector-valued). We consider derivative-based and derivative-free search procedures, optimal learning rates and methods from the optimal learning literature. We make the tie-in to multistage problems in the form of choosing deterministic policies.

 

Week 1 - February 9 - Overview of sequential stochastic optimization problems - Warren Powell

How to model, major classes of policies, the link between two-stage and multistage stochastic optimization

References:

Powell, Approximate dynamic programming, Chapter 5 - Modeling

Powell, Approximate dynamic programming, Chapter 6 - Policies

 

Week 2 - February 16 - Classical stochastic convergence proof for stochastic gradient methods - Warren Powell (scribe: Ricardo Collado)

Click here for lecture.

References:

Powell, W. B., Chapter 6 of Approximate Dynamic Programming, section 6.8. This section describes some classic convergence proofs for stochastic approximation theory that are central to ADP and RL.

J. Spall, Introduction to Stochastic Search and Optimization

Week 3 - February 23 - Derivative-free stochastic optimization - Dan Jiang (scribe: Max Goer)

Click here for lecture.

References:

Spall, 2003 (available from Prof. Powell).

Hu, Wang, Zhou, Fu, Marcus - Survey of model-based methods for global optimization - A good survey article for methods for optimizing non-convex functions.

Chang, Fu, Hu, and Marcus, Simulation-based optimization of Markov decision processes - See section 4.1 on model reference adaptive search.

 

 

Week 4 - March 1 - Optimal learning methods in stochastic search - Max Goer (scribe: Dan Jiang)

Offline and online learning, active learning, expected value of information methods, lookup table and parametric belief models.

Click here for lecture.

References:

Powell and Ryzhov, Optimal Learning, Chapter 4 - Ranking and selection

Powell and Ryzhov, Optimal Learning, Chapter 5 - Knowledge gradient

 

Week 5 - March 8 - Stepsizes and optimal learning rates - DH Lee (scribe: Vincent Pham)

Click here for lecture.

References:

Powell, Approximate dynamic programming, Chapter 11 - Stepsizes

A. George, W. B. Powell, Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming, Machine Learning, 2006. - An overview of popular deterministic and stochastic stepsize formulas, and a new "optimal" stepsize formula based on signal processing concepts.

I. O. Ryzhov, P. I. Frazier, W. B. Powell, Optimal Stepsizes for Approximate Dynamic Programming - This is a new paper, which generalizes the optimal stepsize formula in the George et al paper above specifically for the setting of dynamic programs.

 

Week 6 - March 15 - Perturbation analysis in optimization - Boris Defourny (scribe: Ethan Fang)

Click here for lecture.

References:

Primary reference is the tutorial article:

J.F. Bonnans, A. Shapiro, Optimization problems with perturbations: A guided tour, SIAM Review, Vol. 40(2), pp. 228-264, 1998.

For a more in-depth treatment, there is a book:

J.F. Bonnans, A. Shapiro, Perturbation Analysis of Optimization Problems, Springer, 2000 (primarily chapter 4)

 

 

Part II - Multistage stochastic optimization

We make the transition to multistage problems where decisions have to be made over time. We do this by covering lookahead policies (tree search, model predictive control, stochastic programming), and then move to dynamic programming (policies based on value function approximations), starting with classical exact results from the MDP literature. We have a lecture on stochastic optimization in the presence of risk measures (rather than classical minimization of expectations). We close with two lectures that cover very different perspectives of online learning, known widely as "bandit" problems.

 

Week 7 - March 29 - Stochastic programming and Benders decomposition - Ethan Fang (scribe: Daniel Salas)

Click here for lecture.

References:

Higle and Sen, Stochastic decomposition, 1991 (two stage).

Sen and Zhou - Multistage Stochastic Decomposition (working paper, 2012)

 

Other references on scenario trees (not sure there will be time to cover this material):

Shapiro, Dentcheva, Ruszczynski, Lecture notes on stochastic programming, 2009. Chapter 3.

Heitsch and Romisch, Scenario tree reduction for multistage stochastic programs, Computational Management Science, 2009

 

Week 8 - April 5 - Markov Decision processes - Vincent Pham (scribe: Daniel Condro)

This lecture will cover classical material on the convergence of Markov decision processes for discrete states and actions

Click here for lecture

References:

Powell, Approximate dynamic programming, Chapter 3

 

Week 9 - April 12 - ADP with value function approximations - Daniel Salas (scribe: DH Lee)

Click here for lecture.

Topics:

Strategies for approximating VFA's, how to perform updates, approximating while searching for an optimal policy.

References: This is covered in the following three chapters. Chapters 9 and 10 are the most important.

Powell, Approximate dynamic programming, Chapter 8 - Approximating value functions (basically an overview of popular statistical methods used in ADP)

Powell, Approximate dynamic programming, Chapter 9 - Approximating the value function for a fixed policy

Powell, Approximate dynamic programming, Chapter 10 - Approximating value functions while searching for an optimal policy

 

Week 10 - April 19 - Stochastic optimization with risk measures - Ricardo Collado

Click here for lecture.

References:

Shapiro, Dentcheva, Ruszczynski, Lecture notes on stochastic programming, 2009. Chapter 6.

 

Week 11 - April 26 - Multiarmed bandit problems - Daniel Condro

Gittins indices (theory, algorithms, approximations), knowledge gradient for bandit problems

Click here for lecture.

References:

Powell and Ryzhov, Optimal Learning, Chapter 6

Week 12 - May 3 - Regret bounds for bandit problems - Sebastien Bubeck

Seb's tutorial page: http://www.princeton.edu/~sbubeck/tutorial.html

 

 

 

 

A collection of readings

Below is a collection of readings relevant to the field of computational stochastic optimization.

 

Books/monographs/book chapters/surveys - These represent general purpose references (limited to what is available for download)

Shapiro, Dentcheva, Ruszczynski, Lecture notes on stochastic programming, 2009. - This is classical stochastic programming from the perspective of operations research.

R. Sutton and A. Barto, Reinforcement Learning, MIT Press, 1998. - This is the bible of reinforcement learning - the first book to introduce the field. It is aging, but remains a good source of the vocabulary and many of the fundamental concepts, available on the internet.

Szepesvari - Algorithms for reinforcement learning - A more modern but advanced monograph on reinforcement learning written by one of the top theoreticians in reinforcement learning.

Chang, Fu, Hu, and Marcus, Simulation-based optimization of Markov decision processes - A research-oriented monograph covering different Monte Carlo-based algorithms for stochastic optimization (single stage and multistage), written in a vocabulary that is a hybrid of control theory and classical MDP.

D.P. Bertsekas, Dynamic programming and optimal control, Chapter 6, Approximate Dynamic Programming - This is available directly from Bertsekas' website, and he updates it from time to time. This is an in-depth treatment of ADP with an emphasis on theory over modeling and computation. The book uses control theory notation, but otherwise it is heavily influenced by the reinforcement learning community.

D.P. Bertsekas - A survey of approximate policy iteration

 

Selected chapters from: Powell, Approximate Dynamic Programming: Solving the curses of dimensionality:

Powell, Approximate dynamic programming, Chapter 3 - A quick overview of classical discrete Markov decision processes

Powell, Approximate dynamic programming, Chapter 4 - Introduction to ADP - This is basically an introduction to reinforcement learning.

Powell, Approximate dynamic programming, Chapter 5 - Modeling - The right way to model a dynamic program.

Powell, Approximate dynamic programming, Chapter 6 - Policies - A presentation of the four fundamental classes of policies.

Powell, Approximate dynamic programming, Chapter 7 - Policy search

Powell, Approximate dynamic programming, Chapter 8 - Approximating value functions (basically an overview of popular statistical methods used in ADP)

Powell, Approximate dynamic programming, Chapter 9 - Approximating the value function for a fixed policy

Powell, Approximate dynamic programming, Chapter 10 - Approximating value functions while searching for an optimal policy

 

Sample average approximation

Shapiro, Dentcheva, Ruszczynski, Lecture notes on stochastic programming, 2009. Chapter 5.

Kleywegt, Homem-de-Mello, Shapiro, The Sample Average Approximation Method for Stochastic Discrete Optimization, SIAM J. Optimization, 2001. - This is the paper that introduced the sample average approximation method.

 

Stochastic search

Powell, W. B., Chapter 6 of Approximate Dynamic Programming, section 6.8 (2011). This section describes some classic convergence proofs for stochastic approximation theory that are central to ADP and RL.

Nati Srebro and Ambuj Tewari, Stochastic Optimization for Machine Learning, (2010), A presentation describing how the parameters of a machine learning model can be estimated using the tools of stochastic search.

 

Lookahead policies

Kearns, Mansour, Ng, "A sparse sampling algorithm for near-optimal planning in large Markov decision processes," IJCAI 1999 - This paper develops bounds for a method of tree search which trims the tree using Monte Carlo sampling.

Reinforcement learning

R. Sutton and A. Barto, Reinforcement Learning, MIT Press, 1998. - This is the bible of reinforcement learning - the first book to introduce the field. It is aging, but remains a good source of the vocabulary and many of the fundamental concepts, available on the internet.

Learning the value of a fixed policy

Sutton, R. (1988). Learning to predict by the methods of temporal differences. Machine learning, 3, 9-44. Retrieved from http://www.springerlink.com/index/n386vu21527p0r7m.pdf.

Rich Sutton was one of the founders of the field of reinforcement learning which started around 1980. This paper introduced the seminal idea of temporal difference learning, which is really a form of approximate value iteration for a fixed policy.
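For reference, the TD(0) update introduced in this paper, for evaluating a fixed policy, is (a sketch in generic notation)

```latex
V(S_t) \;\leftarrow\; V(S_t) + \alpha_t \Bigl( C_{t+1} + \gamma\, V(S_{t+1}) - V(S_t) \Bigr),
```

where the term in parentheses is the temporal difference between the one-step bootstrapped estimate and the current approximation of the value of the fixed policy.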

 

Tsitsiklis, J. N., & Roy, B. V. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42, 674-690.

Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the Twelfth International Conference on Machine Learning, pp. 30-37.

Bradtke, S. J., & Barto, A. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning (Vol. 22, pp. 33-57). Springer. Retrieved from http://www.springerlink.com/index/V2Q39464028662PR.pdf.

Tadic, V. (2001). On the Convergence of Temporal-Difference Learning with Linear Function Approximation, 42, 241-267.

Extends Tsitsiklis and van Roy (1997) to general state spaces.

 

Precup, D., Sutton, R. S., & Dasgupta, S. (2001). Off-policy temporal-difference learning with function approximation. In 19th International Conference on Machine Learning (pp. 417-424).

“First algorithm for off-policy TD learning with linear function approximation.”

A. Nedic and D. P. Bertsekas, "Least Squares Policy Evaluation Algorithms with Linear Function Approximation", Discrete Event Dynamic Systems: Theory and Applications, 13:79-110, 2003.

Sutton, R. S., Szepesvari, Cs., & Maei, H. R. (2009). A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation. In Advances in Neural Information Processing Systems, 21.

Maei, H., Szepesvari, C., Bhatnagar, S., Silver, D., Precup, D., Sutton, R. S., et al. (2009). Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation. In Advances in Neural Information Processing Systems 22 (pp. 1-9). Vancouver, BC: MIT Press.

Sutton, R. S., Maei, H. R., Precup, D., Bhatnagar, S., Silver, D., Szepesvari, C., et al. (2009). Fast gradient-descent methods for temporal-difference learning with linear function approximation. Proceedings of the 26th Annual International Conference on Machine Learning - ICML '09, 1-8. New York, New York, USA: ACM Press.

 

Optimal learning rates for estimating functions

Ryzhov, P. I. Frazier and W. B. Powell, “Stepsize Selection for Approximate Value Iteration and a New Optimal Stepsize Rule”

Even-Dar, E., & Mansour, Y. (2003). Learning rates for Q-learning. Journal of Machine Learning Research, Vol. 5, pp. 1-25.


Learning while optimizing

Bradtke, S. J., Ydstie, B. E., & Barto, A. G. (1994). Adaptive linear quadratic control using policy iteration. In Proc. Amer. Control Conf., Baltimore, MD, Jun, 3475-3476.

Tsitsiklis, J. N. (1994). Asynchronous stochastic approximation and Q-learning. Machine Learning, 16, 185-202.
This is the seminal paper that merged reinforcement learning (approximate dynamic programming) with the convergence theory of stochastic approximation.

Gordon, G. (1995). Stable function approximation in dynamic programming. Citeseer. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.42.4595&rep=rep1&type=pdf.

Tsitsiklis, J. N., & Roy, B. (1996). Feature-based methods for large scale dynamic programming. Machine Learning, 22(1-3), 59-94. doi: 10.1007/BF00114724.

Landelius, T., & Knutsson, H. (1997). Greedy adaptive critics for LQR problems: Convergence proofs. Neural Computation, 1-20. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.27.3939.

Singh, S., Jaakkola, T., Szepesvari, C., & Littman, M. (2000). Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning, 38(3), 287-308. Springer.

Gordon, G. (2001). Reinforcement learning with function approximation converges to a region. In Advances in Neural Information Processing Systems.

Ormoneit, D., & Sen, S. (2002). Kernel-based reinforcement learning. Machine Learning, 49, 161-178.

Antos, A., Szepesvari, C., & Munos, R. (2007). Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 71(1), 89-129.

Antos, A., Szepesvari, C., & Munos, R. (2007). Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory. 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, 330-337.

Munos, R., & Szepesvari, C. (2008). Finite-Time Bounds for Fitted Value Iteration. Journal of Machine Learning Research, 1, 815-857.

Antos, A., Munos, R., & Szepesvari, C. (2008). Fitted Q-iteration in continuous action-space MDPs. In Advances in neural information processing systems (Vol. 20, pp. 9-16). Citeseer.

Melo, F. S., Meyn, S. P., & Ribeiro, M. I. (2008). An analysis of reinforcement learning with function approximation. Proceedings of the 25th international conference on Machine learning - ICML '08, 664-671. New York, New York, USA: ACM Press.

Wang, F.-Y., Jin, N., Liu, D., Wei, Q., Adaptive Dynamic Programming for Finite-Horizon Optimal Control of Discrete-Time Nonlinear Systems with Epsilon-Error Bound, IEEE Transactions on Neural Networks, Vol 22, No. 1, pp. 24-36, 2011.

 

Stochastic optimization with different objective functions:

Robust optimization

Bandi, C., & Bertsimas, D. J. (2012). Tractable stochastic analysis in high dimensions via robust optimization. Mathematical Programming. doi:10.1007/s10107-012-0567-2

Bertsimas, D. J., Nohadani, O., & Teo, K. M. (2010). Robust Optimization for Unconstrained Simulation-Based Problems. Operations Research, 58(1), 161–178. doi:10.1287/opre.1090.0715

 

Reliability constrained optimization

Bordley, R. F., & Pollock, S. M. (2009). A Decision-Analytic Approach to Reliability-Based Design Optimization. Operations Research, 57(5), 1262–1270. doi:10.1287/opre.1080.0661

 

Optimization of convex risk measures

Ruszczynski, A. (2010). Risk-averse dynamic programming for Markov decision processes. Mathematical Programming, 125(2), 235–261. doi:10.1007/s10107-010-0393-3

Ruszczynski, A., & Shapiro, A. (2006). Optimization of Convex Risk Functions. Mathematics of Operations Research, 31(3), 433-452. doi:10.1287/moor.1050.0186

Shapiro, Dentcheva, Ruszczynski, Lecture notes on stochastic programming, 2009. Chapter 6.
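A representative convex risk measure from these references is the conditional value-at-risk (average value-at-risk), which for a loss Z and level \alpha \in (0,1) has the variational form

```latex
\mathrm{CVaR}_{\alpha}(Z) \;=\; \min_{t \in \mathbb{R}} \Bigl\{\, t + \tfrac{1}{1-\alpha}\, \mathbb{E}\bigl[(Z - t)_{+}\bigr] \,\Bigr\},
```

and replacing the expectation in a stochastic program with \mathrm{CVaR}_{\alpha} (or a weighted combination of the two) gives the prototypical risk-averse formulation treated in these references.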

 

Quantile optimization

Kim, J. H., Powell, W. B., & Collado, R. A. (n.d.). Quantile optimization for heavy-tailed distributions using asymmetric signum functions.