**ORF 544**

**(formerly ORF 569)**

Graduate seminar on computational stochastic optimization

Organized by Warren Powell

Below are the readings and lecture notes for a series of graduate seminars on the broad topic of computational stochastic optimization.

Spring, 2013 - This year it is being taught as a reading course.

Collection of additional readings

Spring, 2016 (Informal reading course)

The course will be taught as a reading course, in preparation for its introduction as a full course, ORF 544, for the fall of 2016.

The theme of the course is "Computational Stochastic Optimization": we will teach stochastic optimization with a focus on proper modeling and on the design and analysis of practical algorithms. The course will cut across the many communities that contribute to the broad field of stochastic optimization (see the "jungle" webpage).

A major goal of the course is to provide students with a strong foundation in terms of the following:

- The major problem classes in stochastic optimization:
  - Types of state variables
  - Types of decisions
  - Types of uncertainty that arise in real applications, and how to model them
  - What we know about transition functions
  - Types of objective functions

- Formulating the optimization problem
  - From deterministic decisions to adapted policies

- Proper notation (esp. with respect to modeling information)
- A unified framework
  - Staging: from static to sequential to contextual
  - Timing: from finite horizon to infinite horizon
  - From offline to online

- The major solution strategies:
  - Stochastic (policy) search
    - Analytical policy function approximations
    - Parametric cost function approximations
  - Lookahead approximations
    - Solving an approximate lookahead model
    - Using value function approximations

- Stochastic (policy) search
  - The use of Monte Carlo simulation methods
  - Solution via the use of sampled models
  - Search via sample path methods (e.g., stochastic gradients)

- Uncertainty operators
  - Types (expectations, risk measures, robust operators)
  - Effect on modeling and computation
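To make the unified framework above concrete, here is a minimal sketch (in Python, with purely illustrative names and numbers) of the five elements of a sequential decision problem - state, decision, exogenous information, transition function, and objective - wrapped around a toy inventory problem and a simple order-up-to policy:

```python
import numpy as np

# A minimal sketch of the five elements of a sequential decision problem
# (state, decision, exogenous information, transition, objective),
# illustrated with a toy inventory problem. All names/numbers illustrative.

rng = np.random.default_rng(0)

def exogenous_info():
    """Sample W_{t+1}: the random demand revealed after the decision."""
    return rng.poisson(5)

def transition(state, decision, demand):
    """S_{t+1} = S^M(S_t, x_t, W_{t+1}): leftover inventory."""
    return max(0, state + decision - demand)

def contribution(state, decision, demand, price=10.0, cost=7.0):
    """C(S_t, x_t, W_{t+1}): revenue from sales minus ordering cost."""
    return price * min(state + decision, demand) - cost * decision

def policy(state, theta=8):
    """X^pi(S_t): an order-up-to rule parameterized by theta."""
    return max(0, theta - state)

# The objective: simulate one sample path and accumulate contributions.
state, total = 0, 0.0
for t in range(50):
    x = policy(state)
    w = exogenous_info()
    total += contribution(state, x, w)
    state = transition(state, x, w)
print(f"Sample-path contribution: {total:.1f}")
```

Tuning the parameter theta by simulating many such sample paths is precisely the stochastic (policy) search problem listed above.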

We will illustrate ideas through a combination of standard models and real applications. Some of the standard models will include:

- Newsvendor problem
- Inventory/storage problems
- Optimal stopping
- Stochastic shortest paths
- Stochastic linear programs

Real applications will be drawn based on the interests of the class.

Each lecture is accompanied by a series of readings. The idea is to provide an initial bibliography that will serve as an introduction to each of the topics. While these readings have been carefully chosen, there is much more here than can be absorbed in a single course.

**Week 1 - Feb 11 - Overview of computational stochastic optimization**

Topics:

- Major problem classes
  - Static (decision-information)
  - Static-contextual (information-decision-information)
  - Two-stage (decision-information-decision)
  - Fully sequential
    - Finite horizon
    - Infinite horizon

- Notational systems
- Offline and online optimization
- Types of uncertainty
  - Classes of uncertainty
  - Types of distributions

- Handling uncertainty - expectations, risk measures, and robust optimization
  - Objectives
    - Expected cost/reward
      - Discounted
      - Average reward
  - Risk measures (see the CVaR example below)
    - Coherent risk measures
    - Time consistency
  - Robust optimization
    - Opportunity cost
    - Regret
    - Probability of correct selection
    - ...

- The myths of stochastic optimization
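Where risk measures appear above, a standard concrete example is conditional value at risk (CVaR): in the Rockafellar-Uryasev form, for a random cost $X$ and level $\alpha \in (0,1)$,

$$\text{CVaR}_\alpha(X) = \min_{u \in \mathbb{R}} \left\{ u + \frac{1}{1-\alpha}\, \mathbb{E}\big[(X-u)^+\big] \right\},$$

a coherent risk measure that recovers the expectation as $\alpha \to 0$ and the worst case as $\alpha \to 1$. This is only an orienting example; the readings develop the general theory.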

Readings:

*Warren B. Powell. “Clearing the Jungle of Stochastic Optimization,” Informs Tutorials in Operations Research: Bridging Data and Decisions, pp. 109-137, November, 2014, http://dx.doi.org/10.1287/educ.2014.0128 (c) Informs - [Read up through section 5.]

W. B. Powell, Stephan Meisel, "Tutorial on Stochastic Optimization in Energy I: Modeling and Policies", IEEE Trans. on Power Systems (to appear) Summarizes the modeling framework and four classes of policies, contrasting the notational systems and canonical frameworks of different communities.

W. B. Powell, Stephan Meisel, "Tutorial on Stochastic Optimization in Energy II: An energy storage illustration," IEEE Trans. on Power Systems (to appear) Illustrates the process of modeling a stochastic, dynamic system using an energy storage application, and shows that each of the four classes of policies works best on a particular variant of the problem. A fifth problem shows that in some cases a hybrid policy is needed. [Focus on the energy modeling example - illustrates how to model a problem, and shows that any of the four classes of policies may work best, depending on the data].

The following three chapters are the major modeling chapters from the third edition of my ADP book:

*Chapter 5 - Modeling - This provides the full framework for modeling sequential decision problems. Major change from the second edition is the discussion of objective functions toward the end. [Very important - this is the primary resource for how to model sequential problems. Be sure to pay special attention to section 5.11.]

*Chapter 6 - Modeling uncertainty - This is entirely new material focusing purely on modeling uncertainty (a topic often referred to as "uncertainty quantification"). [Read section 6.1]

*Chapter 7 - Policies - This chapter describes two fundamental strategies for designing policies: direct policy search (optimizing over classes of functions), and policies based on lookahead models. Each of these is then divided into two strategies, creating four classes of policies. These are illustrated using the energy storage example presented in chapter 5 (section 5.8).

**Week 2 - Feb 18 - Derivative-based stochastic optimization**

Topics:

- Problem classes
  - Newsvendor
  - General nonlinear (convex) optimization
  - Linear programs

- Stochastic approximation methods
  - Asymptotic convergence analysis
  - Finite-time formulation and analysis

Derivative-based stochastic optimization is both an important problem in its own right (static stochastic optimization), as well as the foundational problem for performing policy search for sequential problems. Furthermore, finite-horizon versions can themselves be formulated as dynamic programs.
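As a point of reference for the readings, here is a minimal stochastic gradient (Robbins-Monro) sketch for the newsvendor problem; the prices, demand distribution, and stepsize rule are all illustrative assumptions:

```python
import numpy as np

# Sketch: stochastic gradient ascent for the newsvendor problem
#   max_x  E[ p*min(x, D) - c*x ]
# A sampled gradient is  p*1{D > x} - c.  All parameters illustrative.

rng = np.random.default_rng(1)
p, c = 10.0, 7.0
x = 1.0
for n in range(1, 10001):
    D = rng.exponential(10.0)          # sample demand W^{n+1}
    grad = p * (D > x) - c             # stochastic gradient at x^n
    alpha = 5.0 / (50.0 + n)           # harmonic stepsize rule
    x = max(0.0, x + alpha * grad)

# The optimal order quantity is the (1 - c/p) quantile of demand,
# which for exponential(10) demand gives a simple check on the iterates.
print(f"estimate {x:.2f}, optimal {-10.0 * np.log(c / p):.2f}")
```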

Readings:

W. B. Powell, Approximate Dynamic Programming - Chapter 8 - Policy search - Sections 8.3 and 8.9 (skim 8.1-8.2)

Shapiro, Dentcheva, Ruszczynski, *Lecture notes on stochastic programming*, 2nd edition (2014). - Section 5.9.

Mark Broadie, Deniz Cicek, Assaf Zeevi - General Bounds and Finite-Time Improvement for the Kiefer-Wolfowitz Stochastic Approximation Algorithm - This paper relaxes a key condition for stepsizes, namely the requirement that the sum of the squares of stepsizes has to be bounded.
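For reference, the classical stepsize conditions at issue here are

$$\sum_{n=1}^{\infty} \alpha_n = \infty, \qquad \sum_{n=1}^{\infty} \alpha_n^2 < \infty,$$

satisfied, for example, by $\alpha_n = 1/n$: the first condition lets the iterates travel arbitrarily far, while the second (the square-summability requirement relaxed by this paper) damps the accumulated noise.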

A. George, W. B. Powell, Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming, *Machine Learning*, 2006. - An overview of popular deterministic and stochastic stepsize formulas, and a new "optimal" stepsize formula based on signal processing concepts.

I. O. Ryzhov, P. I. Frazier, W. B. Powell, Optimal Stepsizes for Approximate Dynamic Programming - This paper builds on the George and Powell paper, but uses the setting of a dynamic program with a single state and action to derive an optimal stepsize for approximate value iteration algorithms (including Q-learning).

**Week 3 - Feb 25 - Derivative-free stochastic optimization I**

Topics:

- Optimization models
  - Offline (ranking and selection)
  - Online ("multiarmed bandit problems")

- Problem classes
  - Discrete alternatives
  - Continuous actions
  - Contextual problems
  - Stationary/nonstationary

- Belief models
  - Lookup table
  - Bayesian vs. frequentist

- Formulation as a dynamic program
  - For offline objective
  - For online objective

- Policies for offline learning
  - PFAs - Boltzmann (soft max), epsilon-greedy
  - CFAs - Interval estimation, UCB
  - VFAs - Value function approximation of the knowledge state
  - Lookaheads:
    - Thompson sampling
    - Value of information policies (EI, knowledge gradient, kriging)

- Finite-time vs. asymptotic analysis

Derivative-free stochastic optimization is the companion of derivative-based stochastic optimization, and represents an important class of problems, both in their own right, and as a strategy for performing policy search for sequential problems (dynamic programs).

Offline versions of this problem are known in the literature as "ranking and selection" problems. Online versions (where costs/rewards are accumulated as the search progresses) are widely known as "multiarmed bandit problems." Both can be formulated using the classical language of dynamic programming with appropriate objective functions.
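As a minimal illustration of an online (bandit) policy from the list above, here is a UCB1-style sketch; the arms, noise level, and horizon are illustrative assumptions rather than anything prescribed by the readings:

```python
import numpy as np

# Sketch of an upper confidence bound (UCB1-style) policy for an
# online (bandit) problem with unknown means. Values illustrative.

rng = np.random.default_rng(2)
true_means = np.array([0.3, 0.5, 0.7])
K = len(true_means)
counts = np.zeros(K)
means = np.zeros(K)

total_reward = 0.0
for n in range(1, 1001):
    if n <= K:                              # try each arm once
        x = n - 1
    else:                                   # UCB index: exploit + exploration bonus
        x = np.argmax(means + np.sqrt(2.0 * np.log(n) / counts))
    r = rng.normal(true_means[x], 0.5)      # observe a noisy reward
    total_reward += r
    counts[x] += 1
    means[x] += (r - means[x]) / counts[x]  # running average update

print(f"pulls per arm: {counts}, cumulative reward: {total_reward:.1f}")
```

Interval estimation has the same exploit-plus-uncertainty-bonus structure, with a bonus of $z \cdot \bar\sigma_x$ in place of the logarithmic term.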

Readings for offline learning:

For a general introduction, see:

Powell and Ryzhov, Optimal Learning - Chapter 4 - The ranking and selection problem

Powell and Ryzhov, Optimal Learning - Chapter 5 - The knowledge gradient policy

For a sample of the OCBA literature for simulation-optimization:

Chun-Hung Chen, Donghai He, Michael Fu, Loo Hay Lee, Efficient Simulation Budget Allocation for Selecting an Optimal Subset - This is a sample of the literature for optimizing across a discrete set of designs using Monte Carlo simulation. This is a frequentist methodology, designed for efficiently using finite computing resources to manage expensive simulations of a relatively small set of designs.

For a summary of a number of heuristics and their experimental evaluation, see:

Wang, Powell - Finite-time analysis for the knowledge-gradient policy and a new testing environment for optimal learning - Focus on section 6 and onward, describing the MOLTE testing environment.

For a nice theoretical analysis of the knowledge gradient policy, see:

P. Frazier, W. Powell, S. Dayanik - A Knowledge Gradient Policy for Sequential Information Collection - This paper provides a rigorous model and the asymptotic analysis of the knowledge gradient policy.
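To fix ideas, here is a sketch of how knowledge-gradient scores can be computed for independent normal (lookup table) beliefs, following the setup in the Optimal Learning chapters; the priors and measurement noise are illustrative, and scipy is assumed for the normal density:

```python
import numpy as np
from scipy.stats import norm

# Sketch: knowledge-gradient scores for independent normal beliefs
# (lookup table belief model). mu, sigma are posterior means/std devs
# per alternative; sigma_W is the measurement noise. Values illustrative.

mu = np.array([1.0, 1.2, 0.8])
sigma = np.array([0.6, 0.5, 0.9])
sigma_W = 1.0

# Predictive std dev of the change in the posterior mean from one
# more measurement: sigma_tilde = sigma^2 / sqrt(sigma^2 + sigma_W^2).
sigma_tilde = sigma**2 / np.sqrt(sigma**2 + sigma_W**2)

def kg_scores(mu, sigma_tilde):
    scores = np.empty(len(mu))
    for x in range(len(mu)):
        best_other = np.max(np.delete(mu, x))
        zeta = -abs(mu[x] - best_other) / sigma_tilde[x]
        # KG factor: f(z) = z*Phi(z) + phi(z)
        scores[x] = sigma_tilde[x] * (zeta * norm.cdf(zeta) + norm.pdf(zeta))
    return scores

print("KG scores:", kg_scores(mu, sigma_tilde))  # measure the argmax
```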

Readings for online learning:

Powell and Ryzhov, Optimal Learning - Chapter 6 - Bandit problems (Gittins indices, knowledge gradient for online learning, UCB policies)

Bubeck and Cesa-Bianchi - Regret Analysis of Stochastic and Non-Stochastic Multi-Armed Bandit Problems - Read Sections 1, 2 and skim 7.

**Week 4 - March 3 - Derivative-free stochastic optimization II**

Topics:

- More advanced belief models
  - Linear models
  - Nonlinear models
  - Semi-parametric and nonparametric models

- Approximating the knowledge gradient
  - Sampled belief model
  - Resampling
  - Laplace approximation

- Exploration vs. exploitation using parametric belief models
- Balancing optimization and parameter estimation
- Optimal computing budget allocation (OCBA) for simulation optimization

D. Negoescu, P. Frazier and W. B. Powell, “The Knowledge Gradient Algorithm for Sequencing Experiments in Drug Discovery,” *Informs Journal on Computing*, Vol. 23, No. 3, pp. 346-363, 2011. (Online supplement) - Linear belief model

Si Chen, K-R G. Reyes, M. Gupta, M. C. McAlpine, W. B. Powell, “Optimal Learning in Experimental Design Using the Knowledge Gradient Policy with Application to Characterizing Nanoemulsion Stability,” SIAM J. Uncertainty Quantification, Vol. 3, pp. 320-345, 2015. - Nonlinear belief model with discrete prior.

Y. Wang, W.B. Powell, Knowledge Gradient for Sequential Decisions with Stochastic Binary Feedbacks - This paper illustrates using the Laplace approximation to handle a nonlinear belief model. This represents an alternative approximation strategy to the use of discrete priors.

Kobby's work on exploration/exploitation with parametric belief models

Xinyu's work on balancing optimization and parameter estimation

**Week 5 - March 10 - Lookahead models and sampled approximations - Static/two-stage**

Topics:

- Sampled approximations
  - Asymptotic analysis

- Lookahead models
  - Thompson sampling
  - Sample average approximation (see below)
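For orientation, the sample average approximation replaces the expectation with an empirical average over $N$ sampled scenarios,

$$\min_{x \in X} \; \mathbb{E}[F(x,\xi)] \quad \longrightarrow \quad \min_{x \in X} \; \frac{1}{N}\sum_{i=1}^{N} F(x, \xi^i),$$

so the approximated problem is deterministic and can be handed to a standard solver; the asymptotic analysis in the readings studies how its optimal value and solutions behave as $N \to \infty$.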

Readings:

Shapiro, Dentcheva, Ruszczynski, *Lecture notes on stochastic programming*, 2nd edition (2014). - 5.3, 5.4, 5.5, 5.10

Nemirovski, Juditsky, Lan and Shapiro - Robust Stochastic Approximation Approach to Stochastic Programming - A comparison of stochastic approximation methods to the sample average approximation method.

**Week 6 - March 17 - Dynamic programming and Markov decision processes**

Topics:

- Overview of sequential problems (with discrete actions)
  - Inventory/storage problems
  - Stopping problems
  - Stochastic shortest paths
  - Online vs. offline dynamic programs

- Modeling sequential decision problems
  - State variables
    - Initial state (context)
    - Pre- and post-decision states
  - Decisions/actions/controls
  - Exogenous information
  - Transition function
  - Objective function

- Major problem classes
  - Horizon
    - Finite/infinite horizon
  - Objectives
    - Expected cost/reward
      - Undiscounted/discounted
      - Average reward
    - Risk measures
      - Min max
  - Online and offline formulations
  - Model-free vs. model-based dynamic programming

- Bellman's optimality equations
- Solution strategies (for discrete MDPs)
  - Backward MDP for finite horizon problems
  - Value/policy iteration for infinite horizon (see the sketch below)

- Lookup tables and the curse of dimensionality

Powell, Approximate dynamic programming, Chapter 3 - 3.1-3.8, 3.9 (skim), 3.10.
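Referring to the solution strategies above, here is a minimal value-iteration sketch for a small discrete MDP, solving Bellman's equation $V(s) = \max_a \big[ r(s,a) + \gamma \sum_{s'} P(s'|s,a)\,V(s') \big]$; the transition and reward data are randomly generated placeholders:

```python
import numpy as np

# Sketch: value iteration for a small discrete MDP. The transition
# probabilities and rewards are random placeholders, not a real model.

rng = np.random.default_rng(3)
S, A, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s']
r = rng.uniform(0, 1, size=(S, A))           # r[s, a]

V = np.zeros(S)
for _ in range(1000):
    Q = r + gamma * (P @ V)                  # Q[s, a] = r + gamma * E[V(s')]
    V_new = Q.max(axis=1)                    # Bellman update
    if np.max(np.abs(V_new - V)) < 1e-8:     # sup-norm convergence check
        break
    V = V_new
policy = Q.argmax(axis=1)
print("V* =", np.round(V, 3), "policy =", policy)
```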

**Week 7 - March 24 - Policy search for dynamic programming**

Topics:

- Learning parametric policy function approximations
  - Relationship to classical stochastic search
  - Derivative-free or derivative-based

- Learning parametric cost function approximations
  - Cost function correction terms
  - Constraint modification
  - Computing derivatives

- Learning neural networks

Deisenroth, M. P., Neumann, G., & Peters, J. (2013). A Survey on Policy Search for Robotics. *Foundations and Trends in Robotics*, 2(1-2), 1-142. - A nice introduction to policy search, but primarily in the context of deterministic control problems.

**Week 8 - March 31 - Approximate Dynamic Programming**

Topics:

- Value function approximations
  - Approximate value iteration using forward DP
  - Approximations using backward DP

- Approximation strategies
  - Lookup table/nonparametric
  - Parametric
  - Low-rank approximations

- Approximate policy iteration
  - Actor-critic methods
- Active learning strategies in ADP
- Online vs. offline learning

Readings:

J.N. Tsitsiklis, Asynchronous Stochastic Optimization and Q-Learning - This was the first paper to bridge reinforcement learning/approximate dynamic programming with the mathematics of stochastic approximation methods

J.N. Tsitsiklis, Ben Van Roy - An Analysis of Temporal-Difference Learning with Function Approximation - Temporal-difference learning refers to an algorithmic strategy where we learn the value of a fixed policy (no optimization taking place).
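As a minimal illustration of the algorithmic idea (not of the paper's analysis), here is a TD(0) sketch that evaluates a fixed policy with a linear architecture $V(s) \approx \theta^\top \phi(s)$; the random-walk chain, features, and stepsize are illustrative assumptions:

```python
import numpy as np

# Sketch: TD(0) with a linear architecture V(s) ~ theta^T phi(s),
# evaluating a fixed (random-walk) policy on a toy chain. Illustrative.

rng = np.random.default_rng(4)
S, gamma, alpha = 10, 0.95, 0.05
phi = np.eye(S)                      # lookup-table features for simplicity

theta = np.zeros(S)
s = 0
for n in range(100000):
    # The fixed policy: step left or right at random, clipped to the chain.
    s_next = min(max(s + rng.choice([-1, 1]), 0), S - 1)
    reward = 1.0 if s_next == S - 1 else 0.0
    # Temporal difference: delta = r + gamma*V(s') - V(s)
    delta = reward + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta += alpha * delta * phi[s]  # stochastic gradient-style update
    s = s_next
print("Estimated values:", np.round(theta, 2))
```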

Books and monographs:

Bertsekas, D. P., & Tsitsiklis, J. N. (1996). *Neuro-Dynamic Programming*. Belmont, MA: Athena Scientific. - This book is the first thorough treatment of approximate dynamic programming using the mathematics of stochastic approximation procedures.

R. Sutton and A. Barto, Reinforcement Learning, MIT Press, 2012 (second edition). - This is the bible of reinforcement learning - the first book to introduce the field. It illustrates the style and vocabulary of the field of reinforcement learning.

Szepesvari - Algorithms for reinforcement learning - A more modern but advanced monograph on reinforcement learning written by one of the top theoreticians in reinforcement learning.

D.P. Bertsekas, *Dynamic programming and optimal control*, Chapter 6, Approximate Dynamic Programming - This is a pre-publication version of a very thorough treatment of a class of ADP algorithms that later appeared in his update of his optimal control book.

**Week 9 - April 7 - Stochastic programming and Benders decomposition for convex optimization**

Topics:

- Two-stage stochastic programs
  - Basic model
  - Stochastic decomposition
  - Regularization methods for two-stage problems

- Benders decomposition for multistage problems
  - Modeling multistage linear programs
  - The exogenous information process
    - Interstage independence
    - Markov models

- The stochastic dual dynamic programming method (SDDP)
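For orientation, the two-stage problem these methods target can be written as

$$\min_{x}\; c^\top x + \mathbb{E}\big[Q(x,\xi)\big], \qquad Q(x,\xi) = \min_{y \ge 0}\;\{\, q^\top y \;:\; W y = h(\xi) - T(\xi)\,x \,\},$$

and LP duality gives, for any dual-feasible $\pi$, the Benders cut $Q(x,\xi) \ge \pi^\top\big(h(\xi) - T(\xi)\,x\big)$. Stochastic decomposition and SDDP construct such cuts from Monte Carlo samples rather than from the full scenario set.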

Higle and Sen - Stochastic decomposition: An algorithm for two-stage linear programs with recourse - The first use of Monte Carlo-based samples of Benders cuts

Shapiro, Dentcheva, Ruszczynski, *Lecture notes on stochastic programming*, 2nd edition (2014). - Section 3, 5.10

Shapiro - Analysis of the stochastic dual dynamic programming method - Nice illustration of SDDP for the Brazilian hydro system using interstage independence

Philpott and Guan - On the convergence of stochastic dual dynamic programming and related methods - Nice paper in OR Letters overcoming the weaknesses of prior proofs.

Girardeau, Leclere, Philpott - On the Convergence of Decomposition Methods for Multistage Stochastic Convex Programs - Much more theoretical analysis of temporal decomposition methods for dynamic programs.

**Week 10 - April 14 - Lookahead models and sampled approximations II - Multistage problems**

Topics:

- Modeling lookahead policies
- Approximation strategies for lookahead models
- Base models vs. lookahead models
- Model predictive control (deterministic lookahead)
- Stochastic DP lookahead using latent variables
- Sampled lookahead models
- Monte Carlo tree search
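As a point of reference, a deterministic lookahead (model predictive control) policy solves, at each time $t$, an approximate model over a horizon $H$ built from point forecasts, and implements only the first decision; in the notation of the modeling chapters this is roughly

$$X^{\pi}(S_t) = \arg\min_{x_t}\; \min_{\tilde{x}_{t,t+1},\dots,\tilde{x}_{t,t+H}} \sum_{t'=t}^{t+H} C(\tilde{S}_{t,t'}, \tilde{x}_{t,t'}),$$

where the tilde variables live in the lookahead model rather than the base model - exactly the distinction between base models and lookahead models listed above.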

D.P. Bertsekas, Dynamic Programming and Suboptimal Control: A survey from ADP to MPC.

Browne et al - Survey of Monte Carlo tree search methods - Classical treatment of MCTS (for deterministic applications)

Auger, David, Adrien Couëtoux, and Olivier Teytaud. "Continuous upper confidence trees with polynomial exploration–consistency." *Machine Learning and Knowledge Discovery in Databases*. Springer Berlin Heidelberg, 2013. 194-209. - Addresses Monte Carlo tree search in a stochastic setting.

Mulvey, Vanderbei and Zenios - Robust Optimization of Large-Scale Systems - One of the first uses of stochastic optimization with scenario trees for multistage linear programs.

Heitsch and Romisch, Scenario tree reduction for multistage stochastic programs, Computational Management Science, 2009 - This is part of a growing body of research focusing on how to generate effective samples.

**Week 11 - April 21 - Risk-based and robust stochastic optimization**

**Week 12 - April 28 - Buffer week**

Lectures will be scribed and posted as the course proceeds. Additional reading materials are also posted with each lecture. I have posted the scribed lectures from 2012. These will be replaced with the lectures given this semester as they become available.

Assignments of people to topics are tentative. New topics can be added, and the topics below can be tilted to fit the interests of the speaker.

**Prof. Powell will introduce the fields of stochastic search and dynamic programming to bring out the relationship between these fields. He will also present a modeling framework for stochastic dynamic programs, discuss state variables, and time permitting, discuss some current research challenges.**

Click here for Prof. Powell's opening lecture.

References:

Powell, *Approximate dynamic programming*, Chapter 5 - Modeling

Powell, *Approximate dynamic programming*, Chapter 6 - Policies

Click here for lecture from 2013

Readings:

Powell, W. B., Chapter 6 of *Approximate Dynamic Programming*, section 6.8 (2011). This section describes some classic convergence proofs for stochastic approximation theory that are central to ADP and RL.

Nati Srebro and Ambuj Tewari, Stochastic Optimization for Machine Learning, (2010), A presentation describing how the parameters of a machine learning model can be estimated using the tools of stochastic search.

Mark Broadie, Deniz Cicek, Assaf Zeevi - General Bounds and Finite-Time Improvement for the Kiefer-Wolfowitz Stochastic Approximation Algorithm - This paper relaxes a key condition for stepsizes, namely the requirement that the sum of the squares of stepsizes has to be bounded.

Click here for lecture from 2013.

References:

Spall, 2003 (available from Prof. Powell).

Hu, Wang, Zhou, Fu, Marcus - Survey of model-based methods for global optimization - A good survey article for methods for optimizing non-convex functions.

Chang, Fu, Hu, and Marcus, *Simulation-based optimization of Markov decision processes* - See section 4.1 on model reference adaptive search.

**No class on Wednesday, October 9 due to conflict with Informs annual meeting.**

Offline and online learning, active learning, expected value of information methods, lookup table and parametric belief models.

Click here for lecture from 2013.

References:

Powell and Ryzhov, *Optimal Learning*, Chapter 4 - Ranking and selection (offline learning)

Powell and Ryzhov, *Optimal Learning*, Chapter 5 - Knowledge gradient

Powell and Ryzhov, *Optimal Learning*, Chapter 6 - Bandit problems (online learning)

Powell and Ryzhov, *Optimal Learning, *Chapter 8 - Linear belief models

**Week 5 - October 23 - Stepsizes and optimal learning rates** - Meriton Ibraimi

Click here for lecture from 2013.

References:

Powell, Approximate dynamic programming, Chapter 11 - Stepsizes

A. George, W. B. Powell, Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming, *Machine Learning*, 2006. - An overview of popular deterministic and stochastic stepsize formulas, and a new "optimal" stepsize formula based on signal processing concepts.

I. O. Ryzhov, P. I. Frazier, W. B. Powell, Optimal Stepsizes for Approximate Dynamic Programming - This is a new paper, which generalizes the optimal stepsize formula in the George et al paper above specifically for the setting of dynamic programs.

We make the transition to multistage problems where decisions have to be made over time. We do this by covering lookahead policies (tree search, model predictive control, stochastic programming), and then move to dynamic programming (policies based on value function approximations), starting with classical exact results from the MDP literature. We have a lecture on stochastic optimization in the presence of risk measures (rather than classical minimization of expectations). We close with two lectures that cover very different perspectives on online learning, known widely as "bandit" problems.

**Week 6 - Nov 6 - Markov Decision processes** - Javad Khazaei

This lecture will cover classical material on the convergence of Markov decision processes for discrete states and actions

Click here for lecture from 2013

References:

Powell, Approximate dynamic programming, Chapter 3

**Week 7 - Nov 13 - Stochastic programming and Benders decomposition** - Tsvetan Asamov

Click here for lecture from 2013.

References:

Higle and Sen, Stochastic decomposition, 1991 (two stage).

Sen and Zhou - Multistage Stochastic Decomposition (working paper, 2012)

Other references on scenario trees (not sure there will be time to cover this material):

Shapiro, Dentcheva, Ruszczynski, *Lecture notes on stochastic programming*, 2009. Chapter 3.

**Week 8 - Nov 20 - ADP with value function approximations** - Daniel Jiang

Click here for lecture from 2013.

Strategies for approximating VFAs, how to perform updates, and approximating while searching for an optimal policy.

References: This is covered in the following three chapters. Chapters 9 and 10 are the most important.

Powell, Approximate dynamic programming, Chapter 8 - Approximating value functions (basically an overview of popular statistical methods used in ADP)

Powell, Approximate dynamic programming, Chapter 9 - Approximating the value function for a fixed policy

Powell, Approximate dynamic programming, Chapter 10 - Approximating value functions while searching for an optimal policy

**Week 9 - Dec 4 - Stochastic optimization with risk measures** - Tsvetan Asamov

**Meet in CASTLE Lab**

Click here for lecture from 2012.

References:

Shapiro, Dentcheva, Ruszczynski, *Lecture notes on stochastic programming*, 2009. Chapter 6.

**Week 10 - Dec 13 (Friday!) - Multiarmed bandit problems** - Si Chen

Gittins indices (theory, algorithms, approximations), knowledge gradient for bandit problems

Click here for lecture from 2012.

References:

Powell and Ryzhov, Optimal Learning, Chapter 6

**Week 11 - January 8? - Optimal learning in the sciences - Kris Reyes**

This lecture will focus on belief extraction and optimal learning in the context of applications in the sciences (although the concepts can be applied to many other applications).

Click here for lecture from 2012

Readings:

Powell, W. B., Chapter 6 of *Approximate Dynamic Programming*, section 6.8 (2011). This section describes some classic convergence proofs for stochastic approximation theory that are central to ADP and RL.

Nati Srebro and Ambuj Tewari, Stochastic Optimization for Machine Learning, (2010), A presentation describing how the parameters of a machine learning model can be estimated using the tools of stochastic search.

Shapiro, Dentcheva, Ruszczynski, *Lecture notes on stochastic programming*, 2009. Chapter 5.

Kleywegt, Homem-de-Mello, Shapiro, Sample Average Approximation SIAM J. Optimization, 2001. - This is the paper that introduced the sample average approximation method.


----------------------------------------------------------

The course is designed to be a comprehensive treatment of major topics in stochastic optimization, presented in the form of a series of roughly 2-hour presentations by graduate students and post-doctoral associates.

The course is roughly divided between stochastic search (often called two-stage stochastic optimization) and sequential decision problems (multistage). The topics cover most of the major threads in stochastic optimization that are being pursued in the research community. The discussion will span basic modeling, the design of a wide range of practical algorithms, and an introduction to key concepts of convergence with an emphasis on Monte Carlo-based algorithms.

At the end of the list of lectures is a series of additional readings that will grow as new topics are added.

Part I - Stochastic search (two-stage)

The first half of the course focuses on stochastic optimization where we have to choose a deterministic design variable (this could be scalar, categorical or vector-valued). We consider derivative-based and derivative-free search procedures, optimal learning rates and methods from the optimal learning literature. We make the tie-in to multistage problems in the form of choosing deterministic policies.

**Week 1 - February 9 - Overview of sequential stochastic optimization problems** - Warren Powell

How to model, major classes of policies, the link between two-stage and multistage stochastic optimization

References:

Powell, *Approximate dynamic programming*, Chapter 5 - Modeling

Powell, *Approximate dynamic programming*, Chapter 6 - Policies

**Week 2 - February 16 - Classical stochastic convergence proof for stochastic gradient methods** - Warren Powell (scribe: Ricardo Collado)

References:

Powell, W. B., Chapter 6 of *Approximate Dynamic Programming*, section 6.8. This section describes some classic convergence proofs for stochastic approximation theory that are central to ADP and RL.

J. Spall, Introduction to Stochastic Search and Optimization

**Week 3 - February 23 - Derivative-free stochastic optimization** - Dan Jiang (scribe: Max Goer)

References:

Spall, 2003 (available from Prof. Powell).

Hu, Wang, Zhou, Fu, Marcus - Survey of model-based methods for global optimization - A good survey article for methods for optimizing non-convex functions.

Chang, Fu, Hu, and Marcus, *Simulation-based optimization of Markov decision processes* - See section 4.1 on model reference adaptive search.

**Week 4 - March 1 - Optimal learning methods in stochastic search** - Max Goer (scribe: Dan Jiang)

Offline and online learning, active learning, expected value of information methods, lookup table and parametric belief models.

References:

Powell and Ryzhov, *Optimal Learning*, Chapter 4 - Ranking and selection

Powell and Ryzhov, *Optimal Learning*, Chapter 5 - Knowledge gradient

**Week 5 - March 8 - Stepsizes and optimal learning rates** - DH Lee (scribe: Vincent Pham)

References:

Powell, Approximate dynamic programming, Chapter 11 - Stepsizes

A. George, W. B. Powell, Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming, *Machine Learning*, 2006. - An overview of popular deterministic and stochastic stepsize formulas, and a new "optimal" stepsize formula based on signal processing concepts.

I. O. Ryzhov, P. I. Frazier, W. B. Powell, Optimal Stepsizes for Approximate Dynamic Programming - This is a new paper, which generalizes the optimal stepsize formula in the George et al paper above specifically for the setting of dynamic programs.

**Week 6 - March 15 - Perturbation analysis in optimization** - Boris Defourny (scribe: Ethan Fang)

References:

The primary reference is a tutorial article; for a more in-depth treatment, there is a book:

J.F. Bonnans, A. Shapiro, Perturbation Analysis of Optimization Problems, Springer, 2000 (primarily chapter 4)

We make the transition to multistage problems where decisions have to be made over time. We do this by covering lookahead policies (tree search, model predictive control, stochastic programming), and then move to dynamic programming (policies based on value function approximations), starting with classical exact results from the MDP literature. We have a lecture on stochastic optimization in the presence of risk measures (rather than classical minimization of expectations). We close with two lectures that cover very different perspectives on online learning, known widely as "bandit" problems.

**Week 7 - March 29 - Stochastic programming and Benders decomposition** - Ethan Fang (scribe: Daniel Salas)

References:

Higle and Sen, Stochastic decomposition, 1991 (two stage).

Sen and Zhou - Multistage Stochastic Decomposition (working paper, 2012)

Other references on scenario trees (not sure there will be time to cover this material):

Shapiro, Dentcheva, Ruszczynski, *Lecture notes on stochastic programming*, 2009. Chapter 3.

**Week 8 - April 5 - Markov Decision processes** - Vincent Pham (scribe: Daniel Condro)

This lecture will cover classical material on the convergence of Markov decision processes for discrete states and actions

References:

Powell, Approximate dynamic programming, Chapter 3

**Week 9 - April 12 - ADP with value function approximations** - Daniel Salas (scribe: DH Lee)

Strategies for approximating VFAs, how to perform updates, and approximating while searching for an optimal policy.

References: This is covered in the following three chapters. Chapters 9 and 10 are the most important.

Powell, Approximate dynamic programming, Chapter 8 - Approximating value functions (basically an overview of popular statistical methods used in ADP)

Powell, Approximate dynamic programming, Chapter 9 - Approximating the value function for a fixed policy

Powell, Approximate dynamic programming, Chapter 10 - Approximating value functions while searching for an optimal policy

**Week 10 - April 19 - Stochastic optimization with risk measures** - Ricardo Collado

References:

Shapiro, Dentcheva, Ruszczynski, *Lecture notes on stochastic programming*, 2009. Chapter 6.

**Week 11 - April 26 - Multiarmed bandit problems** - Daniel Condro

Gittins indices (theory, algorithms, approximations), knowledge gradient for bandit problems

References:

Powell and Ryzhov, Optimal Learning, Chapter 6

**Week 12 - May 3 - Regret bounds for bandit problems** - Sebastien Bubeck

Seb's tutorial page: http://www.princeton.edu/~sbubeck/tutorial.html

**Below is a collection of readings relevant to the field of computational stochastic optimization.**

**Books/monographs/book chapters/surveys - These represent general purpose references** (limited to what is available for download)

Shapiro, Dentcheva, Ruszczynski, *Lecture notes on stochastic programming*, 2009. - This is classical stochastic programming from the perspective of operations research.

R. Sutton and A. Barto, Reinforcement Learning, MIT Press, 1998. - This is the bible of reinforcement learning - the first book to introduce the field. It is aging, but remains a good source of the vocabulary and many of the fundamental concepts, available on the internet.

Szepesvari - Algorithms for reinforcement learning - A more modern but advanced monograph on reinforcement learning written by one of the top theoreticians in reinforcement learning.

Chang, Fu, Hu, and Marcus, *Simulation-based optimization of Markov decision processes* - A research-oriented monograph covering different Monte Carlo-based algorithms for stochastic optimization (single stage and multistage), written in a vocabulary that is a hybrid of control theory and classical MDP.

D.P. Bertsekas, *Dynamic programming and optimal control*, Chapter 6, Approximate Dynamic Programming - This is available directly from Bertsekas' website, and he updates it from time to time. This is an in-depth treatment of ADP with an emphasis on theory over modeling and computation. The book uses control theory notation, but otherwise it is heavily influenced by the reinforcement learning community.

D.P. Bertsekas - A survey of approximate policy iteration

**Selected chapters from: Powell, Approximate Dynamic Programming: Solving the curses of dimensionality**:

Powell, Approximate dynamic programming, Chapter 3 - A quick overview of classical discrete Markov decision processes

Powell, Approximate dynamic programming, Chapter 4 - Introduction to ADP - This is basically an introduction to reinforcement learning.

Powell, Approximate dynamic programming, Chapter 5 - Modeling - The right way to model a dynamic program.

Powell, Approximate dynamic programming, Chapter 6 - Policies - A presentation of the four fundamental classes of policies.

Powell, Approximate dynamic programming, Chapter 7 - Policy search

Shapiro, Dentcheva, Ruszczynski, *Lecture notes on stochastic programming*, 2009. Chapter 5.

Kleywegt, Homem-de-Mello, Shapiro, Sample Average Approximation SIAM J. Optimization, 2001. - This is the paper that introduced the sample average approximation method.

Powell, W. B., Chapter 6 of *Approximate Dynamic Programming*, section 6.8 (2011). This section describes some classic convergence proofs for stochastic approximation theory that are central to ADP and RL.

Nati Srebro and Ambuj Tewari, Stochastic Optimization for Machine Learning, (2010), A presentation describing how the parameters of a machine learning model can be estimated using the tools of stochastic search.

Kearns, Mansour, Ng, "A sparse sampling algorithm for near-optimal planning in large Markov decision processes," IJCAI 1999 - This paper develops bounds for a method of tree search which trims the tree using Monte Carlo sampling.

R. Sutton, "Learning to Predict by the Methods of Temporal Differences," *Machine Learning*, 3, 9-44, 1988.

Rich Sutton was one of the founders of the field of reinforcement learning, which started around 1980. This paper introduced the seminal idea of temporal difference learning, which is really a form of approximate value iteration for a fixed policy.

Tsitsiklis, J. N., & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42, 674-690.

Tadic, V. (2001). On the convergence of temporal-difference learning with linear function approximation. Machine Learning, 42, 241-267.

Extends Tsitsiklis and van Roy (1997) to general state spaces.

“First algorithm for off-policy TD learning with linear function approximation.”

Bradtke, S. J., Ydstie, B. E., & Barto, A. G. (1994). Adaptive linear quadratic control using policy iteration. In Proc. Amer. Control Conf., Baltimore, MD, Jun., 3475-3476.

Tsitsiklis, J. N. (1994). Asynchronous stochastic approximation and Q-learning. Machine Learning, 16, 185-202.

This is the seminal paper that merged reinforcement learning (approximate dynamic programming) with the convergence theory of stochastic approximation.

Tsitsiklis, J. N., & Van Roy, B. (1996). Feature-based methods for large scale dynamic programming. Machine Learning, 22(1-3), 59-94. doi: 10.1007/BF00114724.

Gordon, G. (2001). Reinforcement learning with function approximation converges to a region. In Advances in Neural Information Processing Systems.

Ormoneit, D., & Sen, S. (2002). Kernel-based reinforcement learning. Machine Learning, 49, 161-178.

Munos, R., & Szepesvari, C. (2008). Finite-time bounds for fitted value iteration. Journal of Machine Learning Research, 9, 815-857.

**Stochastic optimization with different objective functions:**

**Robust optimization**

**Reliability constrained optimization**

**Optimization of convex risk measures**

Shapiro, Dentcheva, Ruszczynski, *Lecture notes on stochastic programming*, 2009. Chapter 6.

**Quantile optimization**

Kim, J. H., Powell, W. B., & Collado, R. A. (n.d.). Quantile optimization for heavy-tailed distributions using asymmetric signum functions.