**Our research focuses on three dimensions: modeling (the translation of physical
problems into mathematics), algorithms (covering both theoretical and experimental
research), and implementation. Some papers are listed under multiple headings.
Where available, we have provided downloadable versions of papers.**

*General introductions:*

- **Books**
- **Surveys/reviews/book chapters**
- **Clearing the jungle of stochastic optimization (a series of tutorials)**

*General purpose methods:*

- **Optimal learning**
- **Approximate dynamic programming**
- **Machine learning**
- **Stochastic optimization**
- **Deterministic optimization**
- **Queueing theory**

*Models and algorithms for general resource allocation problems:*

*Applications:*

- **Energy systems analysis**
- **Dynamic fleet management (trucking, rail, air)**
- **Military airlift**
- **Health and medical applications**
- **Vehicle routing and scheduling problems**
- **Service network design**
- **Machine scheduling**
- **Physical distribution**
- **Traffic and public transportation**
- **Freight demand estimation**

*Of general interest:*

**What's new?**

Below we provide a list of working papers (under review at various journals), papers that have been accepted but have not yet appeared, and papers that have recently appeared.

**Recent working papers (in various stages of review): Updated November, 2016.**

Lina al-Kanj, W.B. Powell, B. Bouzaiene-Ayari, “The Information-Collecting Vehicle Routing Problem: Stochastic Optimization for Emergency Storm Response,”

B. Cheng, W.B. Powell, “Co-optimizing Battery Storage for Frequency Regulation and Energy Shifting using Multiscale Dynamic Programming,”

Tsvetan Asamov, W. B. Powell, “Regularized Decomposition of High-Dimensional Multistage Stochastic Programs with Markov Uncertainty,”

Xinyu He, Warren B. Powell, “Optimal Learning for Stochastic Optimization with Nonlinear Parametric Belief Models,”

Ilya Ryzhov, Martijn Mes, W.B. Powell, Gerald van den Berg, “Bayesian Exploration for Approximate Dynamic Programming,”

Ricardo Collado, Jae Ho Kim, W.B. Powell, “Quantile Optimization in Electricity Trading in the Presence of Storage with Heavy-Tailed Prices,”

Tsvetan Asamov, W.B. Powell, “SDDP vs. ADP: The Effect of Dimensionality in Multistage Stochastic Optimization for Grid Level Energy Storage,”

Y. Wang, W. B. Powell, R. Schapire, “Finite-time analysis for the knowledge-gradient policy,”

Daniel Jiang, W. B. Powell, “Optimal Policies for Risk-Averse Electric Vehicle Charging with Spot Purchases,”

Yingfei Wang, Warren B. Powell, “The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks with an Application in Personalized Health Care”

Si Chen, Kristofer-Roy G. Reyes, Xinyu He, Maneesh K. Gupta, Jesse P. Goodman, Michael C. McAlpine, Warren B. Powell, “Optimal Learning for Release Profile Matching with a Noncentral Chi-Squared Noise Distribution”

Vazquez-Anderson, Jorge; Baldridge, Kevin; Reyes, Kristofer; Powell, Warren; Contreras, Lydia N, “Understanding the role of structural accessibility in RNA targeting via biophysical modeling of in vivo anti-sense hybridization,”

Yan Li, Han Liu, W.B. Powell, “The Knowledge Gradient Policy using a Sparse Additive Belief Model,”

J. Khazaei, W. B. Powell, “SMART-Invest: A stochastic, dynamic planning model for optimizing investments in wind, solar, and storage in the presence of fossil fuels: The case of the PJM electricity market,”

Javad Khazaei, Michael Coulon, W.B. Powell, “ADAPT: A Price Stabilizing Compliance Policy for Renewable Energy Certificates: The Case of SREC Markets,”

W. R. Scott, W. B. Powell, S. Moazeni, “Least Squares Policy Iteration with Instrumental Variables vs. Direct Policy Search: Comparison Against Optimal Benchmarks Using Energy Storage,”

Daniel Jiang, W. B. Powell, “An Approximate Dynamic Programming Algorithm for Monotone Value Functions,”

R. Collado, J. Kim, W. B. Powell, “Quantile Optimization using Asymmetric Signum Functions”

Donghun Lee, W. B. Powell, “Bias-Corrected Q-Learning to Control Max Operator Bias”

**Papers which are to appear:**

S. Moazeni, W.B. Powell, B. Defourny, B. Bouzaiene-Ayari, “Parallel Non-Stationary Direct Policy Search,” Informs J. on Computing.

C. Archer, Hugo P. Simao, W. Kempton, W.B. Powell, M. Dvorak, “The challenge of integrating offshore wind power in the US electric grid. Part I: Wind Forecast Error,” Renewable Energy (to appear)

Hugo P. Simao, W.B. Powell, C. Archer, W. Kempton, “The challenge of integrating offshore wind power in the US electric grid. Part II: Simulation of electricity market operations,” Renewable Energy (to appear)

Warren B. Powell, “A Unified Framework for Optimization under Uncertainty,” Informs TutORials, November, 2016 (invited, lightly reviewed).

This tutorial article shows how a wide range of problems, spanning classical stochastic search, dynamic programming, stochastic programming, optimal stopping, stochastic control, along with learning problems such as the multiarmed bandit problem, can all be cast in a common framework.

Daniel Salas, W. B. Powell, “Benchmarking a Scalable Approximate Dynamic Programming Algorithm for Stochastic Control of Multidimensional Energy Storage Problems,” Informs J. on Computing (to appear)

The article describes a scalable algorithm based on approximate dynamic programming that optimizes battery storage across a grid. The paper includes careful benchmarking that shows that the method provides very near optimal solutions on realistic, time-dependent problems.

Arta Jamshidi and W. B. Powell, “A Recursive Local Polynomial Approximation Method using Dirichlet Clouds and Radial Basis Functions” SIAM J. on Scientific Computation (to appear)

The article describes a nonparametric method for local approximation of smooth functions that can be easily updated in recursive applications.

Lina al-Kanj, Belgacem Bouzaiene-Ayari, W. B. Powell, “A Probability Model for Grid Failures Using Incomplete Outage Information,” IEEE Transactions on the Smart Grid (to appear)

Utilities have to restore the grid after storm outages without knowing precisely where the outages have occurred. Their primary source of information is "lights out" calls from customers who have lost power. But an outage can trigger a recloser that cuts power to a large number of houses that are nowhere close to the actual outage. We develop a probability model that uses Bayes' theorem to calculate the probability of outages from the history of lights-out calls, as well as from the failure to call, which is also a form of information. We present a rigorous probability model, describe the application of Bayes' theorem, and then present a strategy for overcoming the familiar curse of dimensionality that typically arises when using Bayes' theorem.
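The core idea, that both a call and silence are evidence, can be sketched with a toy Bayes update. This is our illustration, not the paper's model; the probabilities and the independence assumption across customers are made up for the example.

```python
# Toy Bayes update (not the paper's model): posterior probability that a grid
# device has failed, given how many of its downstream customers called in.
# Assumes customers call independently; silence is also evidence.
def update_outage_prob(prior, n_customers, n_calls, p_call_given_out, p_call_given_ok):
    """Posterior P(outage) after n_calls of n_customers phoned in.

    p_call_given_out: chance a customer calls if the device really failed
    p_call_given_ok:  chance of a (spurious) call if it did not fail
    """
    n_silent = n_customers - n_calls
    # Likelihood of the observed call/silence pattern under each hypothesis
    like_out = (p_call_given_out ** n_calls) * ((1 - p_call_given_out) ** n_silent)
    like_ok = (p_call_given_ok ** n_calls) * ((1 - p_call_given_ok) ** n_silent)
    return prior * like_out / (prior * like_out + (1 - prior) * like_ok)

# Two calls from ten customers push a 5% prior sharply upward:
p = update_outage_prob(0.05, 10, 2, 0.4, 0.01)
```

With zero calls the same formula pushes the probability below the prior, which is the "failure to call is also information" point.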

Somayeh Moazeni, I. Arciniegas Rueda, M. Coulon, B. Song, W.B. Powell, “A Non-Parametric Structural Hybrid Modeling Approach for Electricity Prices,” J. of Quantitative Finance (to appear)

Utilities face the problem of signing electricity contracts for up to 10 years into the future. Forward contracts do not exist for electricity contracts this far into the future, but fuel contracts are available. We use a nonparametric structural model of the supply stack and a load forecast, combined with stochastic forecasts of fuel prices, to create a stochastic model of 10-year electricity prices.

Harvey Cheng, W. B. Powell, Arta Jamshidi, “Optimal Learning with a Local Parametric Belief Model,” J. Global Optimization

Introduces the knowledge gradient using a hierarchical, local parametric belief model. The paper includes a convergence proof and demonstrates the effectiveness on different problems, including the surprisingly difficult newsvendor problem.

Boris Defourny, Ilya O. Ryzhov, W. B. Powell, “Optimal Information Blending with Measurements in the L2 Sphere,” Mathematics of Operations Research (to appear)

This paper introduces risk in the context of optimal learning using a knowledge gradient-type policy.

Moazeni, Somayeh, W.B. Powell, A. H. Hajimiragha, “Mean-Conditional Value-at-Risk Optimal Energy Storage Operation in the Presence of Transaction Costs,” IEEE Transactions on Power Systems, Accepted July, 2014 (to appear).

This paper addresses the formulation and solution of an optimal energy storage management problem under risk considerations and transaction costs for trading energy with the power grid. The price evolves as a stochastic process that captures the seasonality effects as well as the tail fatness and spikiness of its distribution. Transaction costs capture the price impact of the storage operation on the electricity spot price. A risk analysis of an optimal risk-neutral deterministic policy, as well as of a simple myopic policy, indicates that the realized operational cost may differ substantially from the expected cost with considerable probability; this difference suggests that we need to consider risk. Using the downside risk measure of Conditional Value-at-Risk, we propose an optimal risk-averse conversion and transmission strategy among the grid, the renewable power generation source, and an energy storage device that fully satisfies the electricity demand while minimizing both the expected operational cost and the risk. Our numerical study using data from NYISO demonstrates the impacts of risk considerations and of the transaction cost parameters.
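A mean-CVaR objective of the kind the paper minimizes can be estimated directly from simulated cost samples. The sketch below is illustrative only; the weighting parameter `lam` and the sample-average CVaR estimator are our simplifications, not the paper's formulation.

```python
# Illustrative mean-CVaR objective from cost samples (our sketch, not the
# paper's model). CVaR_alpha is the average of the worst (1-alpha) of costs.
def mean_cvar(costs, alpha=0.95, lam=0.5):
    """Return (1 - lam) * mean + lam * CVaR_alpha over a list of cost samples."""
    xs = sorted(costs)
    k = max(1, int(round((1 - alpha) * len(xs))))  # size of the upper tail
    cvar = sum(xs[-k:]) / k                        # average of the k worst costs
    mean = sum(xs) / len(xs)
    return (1 - lam) * mean + lam * cvar
```

Setting `lam=0` recovers the risk-neutral expected cost; increasing `lam` penalizes the tail of the cost distribution.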

Belgacem Bouzaiene-Ayari, C. Cheng, S. Das, R. Fiorillo, W.B. Powell, “From Single Commodity to Multiattribute Models for Locomotive Optimization: A Comparison of Integer Programming and Approximate Dynamic Programming,” Transportation Science, Appeared online, August 4, 2014. http://dx.doi.org/10.1287/trsc.2014.0536

The paper compares three optimization models for the locomotive planning problem at the Norfolk Southern railroad: an aggregate model that is used in production to provide high-level guidance on locomotive flows, a more detailed multicommodity flow formulation (which could be solved only over a three-day horizon), and a very detailed model based on the modeling and algorithmic framework of approximate dynamic programming. The ADP-based formulation, called PLASMA, has been used by NS for a number of years for strategic fleet planning and for daily operational planning. NS (as of this writing) uses only the deterministic version of the code, but this paper describes experiments using stochastic travel times (a major issue at North American freight railroads) and demonstrates that ADP produces much more robust solutions.

W. B. Powell, "Perspectives of Approximate Dynamic Programming," Annals of Operations Research (2012), DOI 10.1007/s10479-012-1077-6 (c) Springer.

This is an overview of ADP, covering some of the historical evolution, how to model a sequential decision process, and a summary of the four fundamental classes of policies that are used in most of the stochastic optimization literature, spanning reinforcement learning, dynamic programming, stochastic programming, and variations of lookahead policies.

**Papers which have recently appeared:**

W. B. Powell and S. Meisel, "Tutorial on Stochastic Optimization in Energy Part I: Models and Policies," IEEE Trans. on Power Systems, Vol. 31, No. 2, pp. 1459-1467, 2016.

Reviews the different canonical models used by different communities, then introduces our own canonical modeling framework and the four classes of policies.

W. B. Powell, Stephan Meisel, "Tutorial on Stochastic Optimization in Energy II: An energy storage illustration", IEEE Trans. on Power Systems, Vol. 31, No. 2, pp. 1468-1475, 2016

Illustrates the process of modeling a stochastic, dynamic system using an energy storage application, and shows that each of the four classes of policies works best on a particular variant of the problem. A fifth problem shows that in some cases a hybrid policy is needed.

Si Chen, K-R G. Reyes, M. Gupta, M. C. McAlpine, W. B. Powell, "Optimal Learning in Experimental Design Using the Knowledge Gradient Policy with Application to Characterizing Nanoemulsion Stability," SIAM J. Uncertainty Quantification, Vol. 3, pp. 320-345, 2015. DOI 10.1137/140971129.

Calculating the expectation of the max in the knowledge gradient is computationally difficult when the underlying belief model is nonlinear in the parameters. This paper proposes and tests the idea of using a knowledge gradient policy based on a sampled belief model. The policy is demonstrated in the context of an experimental setting: designing the parameters of a water-oil-water mixture for drug delivery.
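The sampled-belief idea can be sketched with a small Monte Carlo routine: hold a discrete set of candidate parameter vectors with weights, simulate an observation at each candidate measurement, re-weight by Bayes' rule, and average the improvement in the best posterior-mean value. This is our illustration of the generic technique, not the paper's algorithm; the response model `f` and all numbers are hypothetical.

```python
import math
import random

def normal_pdf(y, mu, sigma):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def kg_sampled_belief(f, thetas, weights, xs, sigma, n_mc=100, rng=None):
    """Monte Carlo knowledge gradient for each candidate measurement x in xs.

    The belief about the unknown parameter is a discrete sample {thetas, weights};
    f(x, theta) is a (possibly nonlinear) response model; sigma is observation noise.
    """
    rng = rng or random.Random(0)
    # value of a belief = best posterior-mean response over the candidate set
    mu = lambda w: max(sum(wk * f(x, th) for wk, th in zip(w, thetas)) for x in xs)
    best_now = mu(weights)
    kg = []
    for x in xs:
        gain = 0.0
        for _ in range(n_mc):
            # simulate an observation at x under a theta drawn from the belief
            th = rng.choices(thetas, weights=weights)[0]
            y = f(x, th) + rng.gauss(0.0, sigma)
            # Bayes re-weighting of the sampled belief given y
            lw = [wk * normal_pdf(y, f(x, tk), sigma) for wk, tk in zip(weights, thetas)]
            z = sum(lw)
            post = [v / z for v in lw]
            gain += mu(post) - best_now
        kg.append(gain / n_mc)
    return kg
```

A measurement point where the candidate models predict the same response is uninformative and gets a knowledge gradient of zero; points that separate the candidates score positively.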

Yingfei Wang, K. G. Reyes, K. A. Brown, C. A. Mirkin, W. B. Powell, “Nested Batch Mode Learning and Stochastic Optimization with an Application to Sequential Multi-Stage Testing in Materials Science,” SIAM J. Scientific Computing, Vol. 37, No. 3, pp. B361-B381, DOI: 10.1137/140971117, 2015.

This paper addresses a complex optimal learning problem arising in the laboratory sciences, which involves guiding the process of making nested batch decisions. The first decision involves the design of a nanoparticle; it is then possible to test the performance (the reflectivity of the surface of a material) at different densities, which can be done in a batch experiment.

Michael Coulon, Javad Khazaei, W. B. Powell, “SMART-SREC: A stochastic model of the New Jersey solar renewable energy certificate market,” Journal of Environmental Economics and Management, Vol. 73, pp. 13-31, 2015.

New Jersey uses targets to encourage the use of solar energy by setting up a market for solar renewable energy certificates. This paper develops a dynamic programming model of these markets and calibrates the model against historical data for New Jersey.

D. Jiang, W. B. Powell, “Optimal Hour-ahead Bidding in the Real-Time Electricity Market with Battery Storage Using Approximate Dynamic Programming” Informs J. on Computing, Vol. 27, No. 3, pp. 525-543 (2015).

The hour-ahead bidding problem requires setting a low (buy) and high (sell) bid that determines how a battery is managed one hour in the future, which means that the post-decision state has to include both the bids we just determined, along with other variables such as energy in storage, prices and exogenous energy sources. The dynamic program features a value function that is monotone in all the state variables, which is exploited in the design of an optimal algorithm.
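The monotone structure can be exploited with a projection step: after smoothing a new observation into one state's value, restore monotonicity by pushing neighboring values up or down. The sketch below is our one-dimensional illustration of that generic idea, not the paper's exact operator.

```python
# Monotonicity-preserving value update (illustrative sketch): smooth a new
# observation v_hat into V[s], then project so V stays nondecreasing in s.
def monotone_update(V, s, v_hat, stepsize):
    """Return a copy of V with v_hat smoothed into state s and monotonicity restored."""
    V = list(V)
    V[s] = (1 - stepsize) * V[s] + stepsize * v_hat
    # states above s cannot be worth less than V[s] ...
    for i in range(s + 1, len(V)):
        V[i] = max(V[i], V[s])
    # ... and states below s cannot be worth more
    for i in range(s - 1, -1, -1):
        V[i] = min(V[i], V[s])
    return V
```

The projection lets a single noisy observation update many states at once, which is why structured lookup tables can beat generic statistical approximations on these problems.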

Daniel Jiang, Thuy Pham, Warren B. Powell, Daniel Salas, Warren Scott, “A Comparison of Approximate Dynamic Programming Techniques on Benchmark Energy Storage Problems: Does Anything Work?,” IEEE Symposium Series on Computational Intelligence, Workshop on Approximate Dynamic Programming and Reinforcement Learning, Orlando, FL, December, 2014.

This paper uses two variations on energy storage problems to investigate a variety of algorithmic strategies from the ADP/RL literature. All of these methods are tested on benchmark problems that are solved optimally, so that we get an accurate estimate of the quality of the policies being produced. Somewhat surprisingly, generic machine learning algorithms for approximating value functions did not work particularly well. What did work well is best described as "lookup table with structure." The structure we exploit is convexity and monotonicity. Test datasets are available at http://www.castlelab.princeton.edu/datasets.htm.

Ryzhov, I., P. I. Frazier and W. B. Powell, “Stepsize Selection for Approximate Value Iteration and a New Optimal Stepsize Rule,” IEEE Transactions on Automatic Control, Vol. 60, No. 3, pp. 743-758, 2015.

This paper studies stepsizes in the context of bootstrapping-based algorithms (value iteration, Q-learning), which introduce the problem of designing a stepsize that balances the need to accumulate costs/rewards over time (which requires larger stepsizes) against the need to smooth noise (which requires smaller stepsizes). We provide tight upper and lower bounds that prove that the 1/n stepsize converges to the optimal solution, but so slowly that it should never be used. We then use a single-state, single-action problem to derive an optimal stepsize for both steady-state and finite horizon problems.
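The tension is easy to see numerically: on a signal that drifts upward (as bootstrapped value estimates do), the 1/n stepsize reduces to a running average and lags badly, while a harmonic stepsize a/(a+n) keeps tracking. The numbers below are made up for illustration and are not from the paper.

```python
# Why 1/n is dangerous for approximate value iteration: smoothing a signal
# that shifts mid-run, 1/n (a running average) lags, a harmonic rule tracks.
def smooth(observations, stepsize):
    """Return the final smoothed estimate using stepsize(n) at iteration n."""
    x = 0.0
    for n, obs in enumerate(observations, start=1):
        a = stepsize(n)
        x = (1 - a) * x + a * obs
    return x

obs = [0.0] * 100 + [10.0] * 100                     # signal jumps to 10 mid-run
one_over_n = smooth(obs, lambda n: 1.0 / n)          # running average: stuck near 5
harmonic = smooth(obs, lambda n: 10.0 / (10.0 + n))  # a/(a+n) with a = 10: near 10
```

With 1/n the estimate lands on the overall mean (5.0), half the true final value; the harmonic rule ends within a fraction of a percent of 10.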

W.B. Powell, B. Bouzaiene-Ayari, Clark Cheng, Sourav Das, Ricardo Fiorillo, Coleman Lawrence, “Locomotive Planning at Norfolk Southern: An Optimizing Simulator Using Approximate Dynamic Programming,” Interfaces, Vol. 44, No. 6, pp. 567-578., 2014.

This paper describes the implementation of PLASMA at the Norfolk Southern Railroad.

Warren B. Powell. “Clearing the Jungle of Stochastic Optimization.” INFORMS Tutorials in Operations Research: Bridging Data and Decisions, pp. 109-137, November, 2014, http://dx.doi.org/10.1287/educ.2014.0128

This invited tutorial unifies different communities working on sequential decision problems. First, it provides a simple, five-part canonical form for modeling stochastic dynamic programs (drawing on established notation from the controls community), with a thorough discussion of state variables. It then summarizes four fundamental classes of policies: policy function approximations (PFAs), policies based on cost function approximations (CFAs), policies based on value function approximations (VFAs), and lookahead policies. There is a detailed discussion of stochastic lookahead policies (familiar to the stochastic programming community). It closes with a summary of results using approximate value functions in an energy storage problem.
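The five-part canonical form (state, decision, exogenous information, transition function, objective) can be sketched as a generic simulation loop. The function names below are ours, chosen for illustration; they are not the tutorial's notation.

```python
# The five-part canonical form as a generic simulator loop (our sketch):
# state, decision (from a policy), exogenous information, transition, objective.
def simulate_policy(S0, policy, exog, transition, contribution, T):
    """Evaluate a policy on one sample path of a sequential decision problem.

    S0: initial state; policy(t, S) -> decision x;
    exog(t) -> exogenous information W arriving after the decision;
    transition(S, x, W) -> next state; contribution(t, S, x) -> reward.
    """
    S, total = S0, 0.0
    for t in range(T):
        x = policy(t, S)                 # decision made by the policy
        total += contribution(t, S, x)   # accumulate the objective
        W = exog(t)                      # new information arrives
        S = transition(S, x, W)          # state transition
    return total
```

Any of the four policy classes (PFA, CFA, VFA, lookahead) slots in as the `policy` argument, which is the unification the tutorial is making.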

Lauren Hannah, W.B. Powell, D. Dunson, “Semi-Convex Regression for Metamodeling-Based Optimization,” SIAM J. on Optimization, Vol. 24, No. 2, pp. 573-597, (2014).

Stochastic search involves finding a set of controllable parameters that minimizes an unknown objective function using a set of noisy observations. We consider the case when the unknown function is convex and a metamodel is used as a surrogate objective function. Often the data are non-i.i.d. and include an observable state variable, such as applicant information in a loan-rate decision problem. State information is difficult to incorporate into convex models. We propose a new semi-convex regression method that is used to produce a convex metamodel in the presence of a state variable. We show consistency for this method. We demonstrate its effectiveness for metamodeling on a set of synthetic inventory management problems and a large, real-life auto loan dataset.

E. Barut and W. B. Powell, “Optimal Learning for Sequential Sampling with Non-Parametric Beliefs,” Journal of Global Optimization, Vol. 58, pp. 517-543 (2014). DOI 10.1007/s10898-013-0050-5.

This paper derives the knowledge gradient when the belief model is captured using kernel regression, a class of nonparametric statistical models.
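The belief model in question is kernel regression; a minimal Nadaraya-Watson estimator with a Gaussian kernel gives the flavor. This sketch is ours, with an arbitrary bandwidth, and is not the paper's knowledge gradient computation.

```python
import math

# Nadaraya-Watson kernel regression (sketch): the nonparametric belief model
# underlying the paper's knowledge gradient. h is the kernel bandwidth.
def kernel_estimate(x, xs, ys, h):
    """Gaussian-kernel weighted average of observed ys at query point x."""
    w = [math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
```

Because the estimate at any point is a weighted average of nearby observations, a new measurement shifts the belief only locally, which is what the knowledge gradient for this belief model has to account for.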

Shi, N., H. Song, and W. B. Powell, “The dynamic fleet management problem with uncertain demand and customer chosen service level,” International Journal of Production Economics, Vol. 148, pp. 110-121 (2014). doi:10.1016/j.ijpe.2013.09.010

**Surveys/reviews/book chapters**

**This section contains a series of tutorial articles and book chapters, which are often an easy place to start. Journal articles can be quite technical, and they represent our first attempt to write up a piece of research. These book chapters synthesize a lot of our research and are intended to be more tutorial in style.**

W. B. Powell, H. Simao, B. Bouzaiene-Ayari, “Approximate Dynamic Programming in Transportation and Logistics: A Unified Framework,” European J. on Transportation and Logistics, Vol. 1, No. 3, pp. 237-284 (2012). DOI 10.1007/s13676-012-0015-8.

Using the contextual domain of transportation and logistics, this paper describes the fundamentals of how to model sequential decision processes (dynamic programs), and outlines four classes of policies. A section describes the linkage between stochastic search and dynamic programming, and then provides a step by step linkage from classical statement of Bellman's equation to stochastic programming. The remainder of the paper uses a variety of applications from transportation and logistics to illustrate the four classes of policies.

W. B. Powell, I. O. Ryzhov, “Optimal Learning and Approximate Dynamic Programming,” Reinforcement Learning and Approximate Dynamic Programming for Feedback Control (F. Lewis and D. Liu, eds.), John Wiley/CRC Press, New York, pp. 410-431, 2013.

This chapter covers methods for solving the exploration vs. exploitation problem from two perspectives: policy search (a dynamic program with a pure belief state), and dynamic programming with a physical state. After providing an overview of some of the most popular heuristics, we describe how the knowledge gradient can be used in both settings. The knowledge gradient is illustrated for both offline and online learning, both for policy search and in dynamic programming with a physical state.

Powell, W. B., “The knowledge gradient for optimal learning,” Encyclopedia of Operations Research and Management Science, John Wiley and Sons. (to appear)

This is a short introduction to the concept of the knowledge gradient for optimal learning, including both online and offline applications.

Powell, W. B., “Approximate Dynamic Programming I: Modeling,” Encyclopedia of Operations Research and Management Science, John Wiley and Sons. (to appear)

With apologies, this chapter has nothing to do with approximate dynamic programming. I have found that before you solve a problem with ADP (or any other strategy), you should model your problem properly. This article outlines the classical framework used in control theory (less familiar to people in operations research), that is very friendly to simulation-based models. Of particular interest (at least to me) is section 4, which describes nine types of policies - one of these uses approximate dynamic programming.

Powell, W. B., “Approximate Dynamic Programming II: Algorithms,” Encyclopedia of Operations Research and Management Science, John Wiley and Sons.

This is a brief introduction to approximate dynamic programming. Think of it as just a taste.

Powell, W. B. “What you should know about approximate dynamic programming,” Naval Research Logistics, Vol. 56, No. 3, pp. 239-249, 2009.

This article is a brief overview and introduction to approximate dynamic programming, with a bias toward operations research. It highlights the major dimensions of an ADP algorithm, some strategies for approximating value functions, and brief discussions of good (and bad) modeling and algorithmic strategies.

Powell, W.B., “Approximate Dynamic Programming: Lessons from the field,” Proceedings of the Winter Simulation Conference, December, 2008.

This is the third in a series of tutorials given at the Winter Simulation Conference. This one has additional practical insights for people who need to implement ADP and get it working on practical applications.

Powell, W. B. and P. Frazier, "Optimal Learning," TutORials in Operations Research, Chapter 10, pp. 213-246, Informs (2008).

This was an invited tutorial on the topic of optimal learning, and represents a fairly easy introduction to the general field of information collection. Although the page constraints limited the scope, it covers the central dimensions of information collection, along with an overview of a number of the most popular heuristic policies. Of course, we include an introduction to the knowledge gradient concept.

Powell, W. B., “Approximate Dynamic Programming – A Melting Pot of Methods,” Informs Computing Society Newsletter, Fall, 2008 (Harvey Greenberg, ed.)

This article appeared in the Informs Computing Society Newsletter. It provides an easy, high-level overview of ADP, emphasizing the perspective that ADP is much more than an algorithm - it is really an umbrella for a wide range of solution procedures which retain, at their core, the need to approximate the value of being in a state.

Powell, W.B., “Real-time dispatching for truckload motor carriers,” in Logistics Engineering Handbook (G. Don Taylor, ed.), CRC Press, 2007, pp. 15-1 – 15-30. (c) CRC Press.

This paper describes our experience implementing a real-time optimization model at Burlington Motor Carriers. Intended for a broad audience, it describes the basics of the model and some of the behind-the-scenes events that were encountered during implementation.

Powell, W.B., A. George, B. Bouzaiene-Ayari and H. Simao, “Approximate Dynamic Programming for High Dimensional Resource Allocation Problems,” Proceedings of the IJCNN, Montreal, August 2005.

This is an easy introduction to the use of approximate dynamic programming for resource allocation problems. It shows how math programming and machine learning can be combined to solve dynamic programs with many thousands of dimensions, using techniques that are easily implemented on a laptop.

Powell, W.B., “Approximate Dynamic Programming for High-Dimensional Problems ,” Proceedings of the Winter Simulation Conference, December, 2007.

This short tutorial article, prepared for the Winter Simulation Conference, presents notation and fundamental algorithmic ideas to solve the types of high-dimensional applications which arise in a broad range of resource allocation problems.

Powell, W.B., “The Optimizing-Simulator: Merging Simulation and Optimization using Approximate Dynamic Programming,” Proceedings of the Winter Simulation Conference, December, 2005.

Approximate dynamic programming involves iteratively simulating a system. As a result, it often has the appearance of an "optimizing simulator." This short article, presented at the Winter Simulation Conference, is an easy introduction to this simple idea.

Powell, W.B. and B. van Roy, “Approximate Dynamic Programming for High Dimensional Resource Allocation Problems," (in Learning and Approximate Dynamic Programming: Scaling up to the Real World, J. Si, A. Barto, W.B. Powell and D. Wunsch, eds.), John-Wiley and Sons, New York, 2004.

We have done a lot of work that involves solving multistage linear programs for resource allocation using dynamic programming. This relatively short chapter is a quick introduction to this work, and brings together the themes of approximate dynamic programming and linear programming.

Powell, W.B., B. Bouzaiene-Ayari and H.P. Simao, “Dynamic Models for Freight Transportation,” Handbooks in Operations Research and Management Science: Transportation (G. Laporte and C. Barnhart, eds.), Elsevier, Amsterdam, 2007.

This chapter focuses on modeling the organization and flow of decisions and information in transportation. The chapter describes how to model sequential information and decision processes, and discusses four classes of information and the algorithmic strategies that are associated with each class. Approximate dynamic programming methods are introduced and illustrated for several classes of problems that arise in transportation and logistics.

Powell, W. B. and H. Topaloglu, “Stochastic Programming in Transportation and Logistics,” Handbooks in Operations Research and Management Science: Stochastic Programming (A. Shapiro and A. Ruszczynski, eds.), Elsevier, Amsterdam, pp. 555-635, 2003.

We introduce adaptive learning algorithms in the context (and notation) of stochastic programming, using fleet management (and particularly freight car distribution) as the illustrative application. The chapter is a good review of different approximation strategies, and uses the freight car distribution problem to illustrate how some problems exhibit separability and how to solve problems that are nonseparable.

Powell, W.B., “Dynamic Models of Transportation Operations,” Handbooks in Operations Research and Management Science: Supply Chain Management (S. Graves and A. G. de Kok, eds.), Elsevier, Amsterdam, pp. 677-756, 2003.

This chapter provides an overview of problems in freight transportation, and a modeling and algorithmic strategy for solving these problems. For students looking for a single reference that covers the DRTP modeling framework and our approximate dynamic programming strategies, this is a good overview.

Powell, W.B. and H. Topaloglu, “Fleet Management,” in Applications of Stochastic Programming (S. Wallace and W. Ziemba, eds.), Math Programming Society - SIAM Series in Optimization, Philadelphia, pp. 185-216, 2005.

Stochastic programming has been steadily making its way into a variety of application areas. This chapter demonstrates the use of a class of stochastic programming methods for fleet management applications, using the freight car distribution problem as a problem context.

Powell, W. B., P. Jaillet and A. Odoni, "Stochastic and Dynamic Networks and Routing," Handbook in Operations Research and Management Science, Vol. 4, Networks, (M.O. Ball, T.L. Magnanti, C.L. Monma and G.L. Nemhauser, eds.), pp. 141-295, 1995.

The hard part about writing a chapter for a handbook on dynamic models is that the field is very new and evolving quickly. However, this chapter contains a fairly extensive literature review and should serve as a useful reference for research prior to 1993-94.

"Optimization Models and Algorithms: An Emerging Technology for the Motor Carrier Industry," Special Issue of IEEE on Intelligent Vehicle/Highway Systems, pp. 68-80, 1990. (Winner, Best Paper on Land Transportation from IEEE Section on Vehicular Technologies, 1992.)

This is a survey of a variety of optimization strategies for routing and scheduling of vehicles. The best part of this paper was that I received a marriage proposal from an appreciative reader.

"Toward a Unified Framework for Real-Time Logistics Control," Military Journal of Operations Research, Vol. I, No. 4, Winter, 1996, pp. 69-79.

**Books**

W.B. Powell, I.O. Ryzhov, Optimal Learning, John Wiley and Sons, 2012.

Optimal Learning addresses the challenge of collecting information, when observations are time consuming and/or expensive. This book grew out of an undergraduate course, and the first twelve chapters are accessible to an advanced undergraduate audience. For more on optimal learning (including table of contents, downloadable software, and a number of downloadable research papers), go to http://optimallearning.princeton.edu.

W.B. Powell, Approximate Dynamic Programming, John Wiley and Sons, 2nd edition, 2011.

This is the first book to bridge the growing field of approximate dynamic programming with operations research. Dynamic programming has often been dismissed because it suffers from "the curse of dimensionality." In fact, there are three curses of dimensionality when you deal with the high-dimensional problems that typically arise in operations research (the state space, the outcome space and the action space). This book shows how we can estimate value function approximations around the post-decision state variable to produce techniques that allow us to solve dynamic programs which exhibit states with millions of dimensions (approximately).
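The post-decision state idea can be illustrated with a toy storage problem: index the value function by the resource level after the decision but before the next exogenous information, so that choosing a decision requires no expectation. The code below is our toy sketch under those assumptions, not the book's algorithms.

```python
import random

# Toy post-decision-state ADP pass (our illustration): V[t][R] approximates the
# value of holding R units *after* the decision at time t, before noise arrives.
def adp_pass(V, R0, prices, capacity, stepsize, rng):
    """One forward pass of approximate value iteration on a toy storage problem."""
    R, prev_post = R0, None
    for t, price in enumerate(prices):
        best_x, best_val = 0, float("-inf")
        for x in range(-R, capacity - R + 1):      # keep 0 <= R + x <= capacity
            val = -price * x + V[t][R + x]         # deterministic: no expectation
            if val > best_val:
                best_x, best_val = x, val
        if prev_post is not None:                  # smooth v-hat back into t-1
            V[t - 1][prev_post] = (1 - stepsize) * V[t - 1][prev_post] + stepsize * best_val
        prev_post = R + best_x                     # the post-decision state
        R = max(0, min(capacity, prev_post + rng.choice([-1, 0, 1])))  # noise
    return V

V = [[0.0] * 3 for _ in range(3)]                  # V[t][R] for R in {0, 1, 2}
V = adp_pass(V, 0, [1.0, 2.0, 3.0], 2, 0.1, random.Random(0))
```

Because the inner maximization sees only a deterministic contribution plus a stored value, it scales to decision vectors that a direct expectation-over-outcomes formulation could not handle.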

The book is aimed at an advanced undergraduate/masters level audience with a good course in probability and statistics, and linear programming (for some applications). For the advanced Ph.D., there is an introduction to fundamental proof techniques in "why does it work" sections. The book emphasizes solving real-world problems, and as a result there is considerable emphasis on proper modeling. The book includes dozens of algorithms written at a level that can be directly translated to code. The material in this book is motivated by numerous industrial applications undertaken at CASTLE Lab, as well as a number of undergraduate senior theses.

(for more information, click here for the ADP website)
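To make the post-decision-state idea concrete, here is a minimal Python sketch (our own illustration, not taken from the book) for a toy inventory problem. The problem data (capacity, price, cost, the exploration rate) are all hypothetical:

```python
import random

def adp_inventory(n_iters=500, capacity=10, price=4.0, cost=2.0, seed=0):
    """Minimal ADP sketch: learn a lookup-table value function around
    the post-decision state (inventory after ordering, before demand)
    for a toy newsvendor-style problem."""
    rng = random.Random(seed)
    V = [0.0] * (capacity + 1)   # V[R]: estimated value of post-decision inventory R
    R = 0                        # pre-decision inventory
    for n in range(1, n_iters + 1):
        alpha = 1.0 / n          # declining stepsize
        if rng.random() < 0.2:   # a little exploration so ordering gets tried
            x = rng.randrange(capacity - R + 1)
        else:                    # greedy order quantity under the current approximation
            x = max(range(capacity - R + 1), key=lambda a: -cost * a + V[R + a])
        R_post = R + x           # post-decision state
        D = rng.randint(0, capacity)   # exogenous demand is observed after the decision
        sales = min(R_post, D)
        # sample estimate of the value of holding R_post units
        # (value-to-go crudely approximated by the same table)
        v_hat = price * sales + V[R_post - sales]
        V[R_post] = (1 - alpha) * V[R_post] + alpha * v_hat
        R = R_post - sales       # next pre-decision state
    return V
```

Because the value function is indexed by the post-decision state, the decision step is a deterministic maximization; the expectation over demand never has to be computed inside the decision problem.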

J. Si, A. Barto, W.B. Powell and D. Wunsch (eds.), Learning and Approximate Dynamic Programming: Scaling up to the Real World, John Wiley and Sons, New York, 2004.

This edited volume is based on the NSF workshop held in Playacar, Mexico in 2002. It represents the diversity of communities that are using, and contributing to, the general area of approximate dynamic programming.

**Optimal learning (back)**

Optimal learning represents the problem of making observations (or measurements) in an efficient way to achieve some objective. This is our newest area of research, with a number of papers on the way.

The paper develops a knowledge gradient policy for guiding an initial design decision (e.g. the size and shape of nanoparticles), followed by batch learning of a secondary tunable parameter (e.g. the density of particles), to maximize a metric (the reflectivity of a surface).

E. Barut and W. B. Powell, “Optimal Learning for Sequential Sampling with Non-Parametric Beliefs,” Journal of Global Optimization, Vol. 58, pp. 517-543 (2014). DOI 10.1007/s10898-013-0050-5.

This paper derives the knowledge gradient when the belief model is captured using kernel regression, a class of nonparametric statistical models.

Dayanik, Savas, Warren B. Powell, and Kazutoshi Yamazaki, "Asymptotically Optimal Bayesian Sequential Change Detection and Identification Rules," Annals of Operations Research (M. Katehakis, ed.), Vol. 208, No. 1, pp. 337-370 (2013).

We study the joint problem of sequential change detection and multiple hypothesis testing. Suppose that the common distribution of a sequence of i.i.d. random variables changes suddenly at some unobservable time to one of finitely many distinct alternatives, and one needs to both detect and identify the change at the earliest possible time. We propose computationally efficient sequential decision rules that are asymptotically either Bayes-optimal or optimal in a Bayesian fixed-error formulation, as the unit detection delay cost or the misdiagnosis and false alarm probabilities go to zero, respectively. Numerical examples are provided to verify the asymptotic optimality and the speed of convergence.

Ryzhov, I., W. B. Powell, “Information Collection for Linear Programs with Uncertain Objective Coefficients,” SIAM J. Optimization, Vol. 22, No. 4, pp. 1344-1368 (2012). http://epubs.siam.org/doi/abs/10.1137/12086279X

Finding the optimal solution of a linear program assumes that you have accurate information on costs (among other things). In some settings, these costs may be approximate, and getting more information can be expensive. For example, a problem in logistics might require including costs that reflect the quality of service provided by a vendor, but it may be necessary to use the vendor for a period of time, or collect historical information from other manufacturers, to refine these costs.

This paper develops the knowledge gradient for maximizing the expected value of information when solving linear programs. The paper establishes asymptotic optimality for off-line versions of the problem and proposes a computationally tractable algorithm.

I. Ryzhov, W. B. Powell, P. I. Frazier, “The knowledge gradient algorithm for a general class of online learning problems,” Operations Research, Vol. 60, No. 1, pp. 180-195 (2012).

The knowledge-gradient policy was originally derived for off-line learning problems such as ranking and selection. Considerable attention has been given to the on-line version of this problem, known popularly as the multiarmed bandit problem, for which Gittins indices are known to be optimal for discounted, infinite-horizon versions of the problem. In this paper, we derive a knowledge gradient policy for on-line problems, and show that it very closely matches the performance of Gittins indices for discounted infinite horizon problems. It actually slightly outperforms the best available approximation of Gittins indices (by Gans and Chick) on problems for which Gittins indices should be optimal. The KG policy is also effective on finite horizon problems. The only policy which is competitive with KG seems to be interval estimation, but this requires careful tuning of a parameter. The KG policy also works on problems where the beliefs about different alternatives are correlated. The KG policy with independent beliefs is extremely easy to compute (we provide closed-form expressions for the case with normal rewards), and requires a simple numerical algorithm for the case with correlated beliefs.

Scott, Warren, P. I. Frazier, and W. B. Powell, "The Correlated Knowledge Gradient for Simulation Optimization of Continuous Parameters Using Gaussian Process Regression," SIAM Journal on Optimization, Vol. 21, No. 3, pp. 996-1026 (2011).

We have previously developed the knowledge gradient with correlated beliefs for discrete alternatives. This paper extends the idea to problems with continuous alternatives. We do this by developing a continuous approximation of the knowledge gradient. This produces a nonconcave surface that we have to maximize. Local maxima are located close to points that have been previously measured, so we use these points to guess at the locations of local maxima and then run a simple gradient search algorithm starting from each of them. A proof of convergence is provided. We use the distances between local maxima to scale the steepest descent algorithm. We compare the method against Huang's adaptation of sequential kriging to problems with noisy measurements.

I. Ryzhov, W.B. Powell, "Information collection on a graph," Operations Research, Vol. 59, No. 1, pp. 188-201, 2011. (c) Informs

We derive a knowledge gradient policy for an optimal learning problem on a graph, in which we use sequential measurements to refine Bayesian estimates of individual arc costs in order to learn about the best path. This problem differs from traditional ranking and selection, in that the implementation decision (the path we choose) is distinct from the measurement decision (the edge we measure). Our decision rule is easy to compute, and performs competitively against other learning policies, including a Monte Carlo adaptation of the knowledge gradient policy for ranking and selection.

Ryzhov, I. and W. B. Powell, “Bayesian Active Learning with Basis Functions,” IEEE Workshop on Adaptive Dynamic Programming and Reinforcement Learning, Paris, April, 2011.

A common technique for dealing with the curse of dimensionality in approximate dynamic programming is to use a parametric value function approximation, where the value of being in a state is assumed to be a linear combination of basis functions. We propose a Bayesian strategy for resolving the exploration/exploitation dilemma in this setting. Our approach is based on the knowledge gradient concept from the optimal learning literature, which has been recently adapted for approximate dynamic programming with lookup-table approximations. The new method performs well in numerical experiments conducted on an energy storage problem.

P. Frazier and W. B. Powell, “Consistency of Sequential Bayesian Sampling Policies,” SIAM J. Control and Optimization, Vol. 49, No. 2, pp. 712-731 (2011). DOI: 10.1137/090775026

We consider Bayesian information collection, in which a measurement policy collects information to support a future decision. This framework includes ranking and selection, continuous global optimization, and many other problems in sequential experimental design. We give a sufficient condition under which measurement policies sample each measurement type infinitely often, ensuring consistency, i.e., that a globally optimal future decision is found in the limit. This condition is useful for verifying consistency of adaptive sequential sampling policies that do not do forced random exploration, making consistency difficult to verify by other means. We demonstrate the use of this sufficient condition by showing consistency of two previously proposed ranking and selection policies: OCBA for linear loss, and the knowledge-gradient policy with independent normal priors. Consistency of the knowledge-gradient policy was shown previously, while the consistency result for OCBA is new.

D. Negoescu, P. Frazier and W. B. Powell, “The Knowledge Gradient Algorithm for Sequencing Experiments in Drug Discovery”, Informs Journal on Computing, Vol. 23, No. 3, pp. 346-363, 2011. (c) Informs

This paper describes a method for applying the knowledge gradient to a problem with a very large number of alternatives. Instead of creating a belief about each alternative (known as a "lookup table belief model"), we represent our belief about an alternative using linear regression (known as a "parametric belief model"). The method is motivated by the need to find the best molecular compound to solve a particular problem (e.g. killing cancer cells). There is a base compound with a series of sites (indexed by j) and a series of small sequences of atoms ("substituents") indexed by i. Let X_{ij} = 1 if we put substituent i at site j, and let theta_{ij} be the impact of this combination on the performance of the compound. The goal is to choose compounds to test that allow us to estimate the parameters theta as quickly as possible.

(click here to download main paper) (Click here for online supplement)
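As an illustration of why a parametric belief model helps, the following sketch (our own, with made-up sizes and parameter names) shows how a handful of parameters theta_{ij} induces beliefs over a much larger set of compounds:

```python
import itertools

n_sites, n_subs = 3, 4            # hypothetical sizes: 3 sites, 4 substituents

def features(compound):
    """Indicator features: X_ij = 1 if substituent i sits at site j.
    `compound` is a tuple giving the substituent chosen for each site."""
    return [(i, j) for j, i in enumerate(compound)]

def predict(theta, compound):
    """Believed performance under the linear (parametric) belief model."""
    return sum(theta[k] for k in features(compound))

theta = {(i, j): 0.0 for i in range(n_subs) for j in range(n_sites)}
compounds = list(itertools.product(range(n_subs), repeat=n_sites))
# 4**3 = 64 candidate compounds are described by only 4*3 = 12 parameters,
# so a single experiment updates our belief about many compounds at once.
```

With a lookup-table belief model we would need one belief per compound; here one experiment shifts the estimates of every compound sharing a (substituent, site) pair with the one tested.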

P. Frazier, W. B. Powell, H. P. Simao, "Simulation Model Calibration with Correlated Knowledge-Gradients," Winter Simulation Conference, December, 2009.

A common challenge in calibrating a simulation model is that we have to tune several continuous parameters. This is our first application of the knowledge gradient algorithm with correlated beliefs to the problem of parameter tuning for simulation models. A single run of the model (which uses adaptive learning from approximate dynamic programming) requires more than a day, so the paper also introduces methods to produce results without a full run. A Bayesian model is set up to capture the uncertainty in our beliefs about the convergence of the model. The method is illustrated by tuning two continuous parameters, which required approximately six runs of the model.

Frazier, P. I., and W. B. Powell, “Paradoxes in Learning: The Marginal Value of Information and the Problem of Too Many Choices,” Decision Analysis, Vol. 7, No. 4, pp. 378-403, 2010.

The value of information can be a concave function in the number of measurements, but for many problems it is not, and instead follows an S-curve. A little bit of information may teach you nothing, and you may have to make an investment in information beyond a certain threshold to actually have an impact. We investigate the economic implications of the S-curve effect, showing that it is possible to have too many choices. We then revisit the knowledge gradient algorithm, which allocates measurements based on the marginal value of information. The knowledge gradient can produce poor learning results in the presence of an S-curve. We propose the KG(*) algorithm, which maximizes the average value of information, and show that it produces good results when there is a significant S-curve effect.

Mes, M., P. I. Frazier and W. B. Powell, “Hierarchical Knowledge Gradient for Sequential Sampling,” J. Machine Learning Research, Vol. 12, pp. 2931-2974, 2011.

We develop the knowledge gradient for optimizing a function when our belief is represented by constants computed at different levels of aggregation. Our estimate of the function at any point is given by a weighted sum of estimates at different levels of aggregation.

Frazier, P. and W. B. Powell, “The knowledge gradient stopping rule for ranking and selection,” Proceedings of the Winter Simulation Conference, December 2008.

We consider the ranking and selection of normal means in a fully sequential Bayesian context. By considering the sampling and stopping problems jointly rather than separately, we derive a new composite stopping/sampling rule. The sampling component of the derived composite rule is the same as the previously introduced LL1 sampling rule, but the stopping rule is new. This new stopping rule significantly improves the performance of LL1 as compared to its performance under the best other generally known adaptive stopping rule, EOC Bonf, outperforming it in every case tested.

Ryzhov, I. and W. B. Powell, “The Knowledge Gradient Algorithm For Online Subset Selection,” IEEE Conference on Approximate Dynamic Programming and Reinforcement Learning (part of IEEE Symposium on Computational Intelligence), March, 2009.

We derive a one-period look-ahead policy for online subset selection problems, where learning about one subset also gives us information about other subsets. We show that the resulting decision rule is easily computable, and present experimental evidence that the policy is competitive against other online learning policies.

S Dayanik, W. B. Powell and K. Yamazaki "An Asymptotically Optimal Strategy in Sequential Change Detection and Identification Applied to Problems in Biosurveillance" Proceedings of the 3rd INFORMS Workshop on Data Mining and Health Informatics, (J. Li, D. Aleman, R. Sikora, eds.), 2008.

In some applications, it is useful to have a stopping rule for an information collection problem. This paper investigates a stopping rule based on the knowledge gradient concept.

Powell, W. B. and P. Frazier, "Optimal Learning," TutORials in Operations Research, Chapter 10, pp. 213-246 (2008) (c) Informs.

This was an invited tutorial on the topic of optimal learning, and represents a fairly easy introduction to the general field of information collection. Although the page constraints limited the scope, it covers the central dimensions of information collection, along with an overview of a number of the most popular heuristic policies. Of course, we include an introduction to the knowledge gradient concept.

P. Frazier, W. B. Powell, S. Dayanik, “The Knowledge-Gradient Policy for Correlated Normal Beliefs,” Informs Journal on Computing, Vol. 21, No. 4, pp. 585-598 (2009) (c) Informs

The knowledge gradient policy is a method for determining which of a discrete set of measurements we should make to determine which of a discrete set of choices we should make. Most of the applications that we have considered introduce the dimension of correlated beliefs. If we evaluate the level of contamination in one location and it measures high, we are likely to raise our belief about the level of toxin in nearby locations. If we test a machine for airport security that can sense explosives and it works poorly, we might lower our evaluation of other devices that might use similar technologies (e.g. a particular material or sensor within the device). This article shows how to compute the knowledge gradient for problems with correlated beliefs. The paper shows that just as with problems with independent beliefs, the knowledge gradient is both myopically and asymptotically optimal. Experimental work shows that it can produce a much higher rate of convergence than the knowledge gradient with independent beliefs, in addition to outperforming other more classical information collection mechanisms.

(click here to download paper) (Click here for online supplement)
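The correlated-belief machinery rests on the standard conjugate update of a multivariate normal belief: measuring one alternative shifts the means of every correlated alternative. A small self-contained sketch (our own illustration in plain Python; it makes no claim to match the paper's notation):

```python
def bayes_update(mu, Sigma, x, y, noise_var):
    """Conjugate update of a correlated multivariate normal belief
    (mean list mu, covariance matrix Sigma) after observing
    y = truth[x] + noise with variance noise_var."""
    n = len(mu)
    denom = noise_var + Sigma[x][x]
    gain = (y - mu[x]) / denom
    col = [Sigma[i][x] for i in range(n)]          # x-th column of Sigma
    # every alternative correlated with x moves, not just x itself
    mu_new = [mu[i] + gain * col[i] for i in range(n)]
    Sigma_new = [[Sigma[i][j] - col[i] * col[j] / denom for j in range(n)]
                 for i in range(n)]
    return mu_new, Sigma_new
```

For example, with two highly correlated alternatives, a high observation of the first raises the posterior mean of the second nearly as much.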

Frazier, P., W. B. Powell and S. Dayanik, “A Knowledge Gradient Policy for Sequential Information Collection,” SIAM J. on Control and Optimization, Vol. 47, No. 5, pp. 2410-2439 (2008).

The knowledge gradient policy is introduced here as a method for solving the ranking and selection problem, which is an off-line version of the multiarmed bandit problem. Imagine that you have M choices (M is not too large) where you have a normally distributed belief about the value of each choice. You have a budget of N measurements to evaluate each choice and refine your distribution of belief. After your N measurements, you have to choose what appears to be the best based on your current belief. The knowledge gradient policy guides this search by always measuring the choice that would produce the highest value if you only have one more measurement (the knowledge gradient can be viewed as a method of steepest ascent). The paper shows that this policy is myopically optimal (by construction), but is also asymptotically optimal, making it the only stationary policy that is both myopically and asymptotically optimal. The paper provides bounds for finite measurement budgets, and provides experimental work showing that it works as well as, and often better than, other standard learning policies.
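For independent normal beliefs the knowledge gradient has a simple closed form, nu_x = sigma_tilde_x * f(zeta_x) with f(z) = z*Phi(z) + phi(z). Here is a sketch of that standard computation (our own coding; the variable names are ours):

```python
import math

def kg_independent(mu, sigma2, noise_var):
    """Closed-form knowledge gradient for independent normal beliefs.
    mu, sigma2: lists of posterior means and variances;
    noise_var: measurement noise variance."""
    phi = lambda z: math.exp(-z * z / 2) / math.sqrt(2 * math.pi)  # normal pdf
    Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))         # normal cdf
    f = lambda z: z * Phi(z) + phi(z)
    kg = []
    for x in range(len(mu)):
        # std. dev. of the change in the posterior mean from one measurement
        sigma_tilde = sigma2[x] / math.sqrt(sigma2[x] + noise_var)
        best_other = max(mu[i] for i in range(len(mu)) if i != x)
        zeta = -abs(mu[x] - best_other) / sigma_tilde
        kg.append(sigma_tilde * f(zeta))
    return kg
```

The policy then measures the alternative with the largest KG value; note that, all else equal, alternatives with higher uncertainty receive higher KG scores.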

S. Dayanik, W. Powell, and K. Yamazaki, “Index policies for discounted bandit problems with availability constraints,” Advances in Applied Probability, Vol. 40, No. 2, pp. 377-400 (2008).

One of the most famous problems in information collection is the multiarmed bandit problem, where we make a choice (out of a discrete set of choices), observe a reward, and use this observation to update estimates of the future value of rewards. This problem can be solved by choosing the option with the highest index (known as the Gittins index). In this paper, we present an index policy for the case where not all of the choices are available each time. As a result, it is sometimes important to make an observation simply because the observation is available to be made.

**Approximate dynamic programming (back)**

Much of our work falls in the intersection of stochastic programming and dynamic programming. The dynamic programming literature primarily deals with problems with low-dimensional state and action spaces, which allow the use of discrete dynamic programming techniques. The stochastic programming literature, on the other hand, deals with the same sorts of higher-dimensional vectors that are found in deterministic math programming. However, the stochastic programming community generally does not exploit state variables, and does not use the concepts and vocabulary of dynamic programming.

Our contributions to the area of approximate dynamic programming can be grouped into four broad categories: general contributions; transportation and logistics, which we have broadened into general resource allocation; discrete routing and scheduling problems; and batch service problems. A series of short introductory articles are also available.

Warren B. Powell. “Clearing the Jungle of Stochastic Optimization.” INFORMS Tutorials in Operations Research: Bridging Data and Decisions, pp. 109-137, November, 2014, http://dx.doi.org/10.1287/educ.2014.0128

This invited tutorial unifies different communities working on sequential decision problems. First, it provides a simple, five-part canonical form for modeling stochastic dynamic programs (drawing on established notation from the controls community), with a thorough discussion of state variables. It then summarizes four fundamental classes of policies: policy function approximations (PFAs), policies based on cost function approximations (CFAs), policies based on value function approximations (VFAs), and lookahead policies. There is a detailed discussion of stochastic lookahead policies (familiar from stochastic programming). It closes with a summary of results using approximate value functions in an energy storage problem.

Daniel Jiang, Thuy Pham, Warren B. Powell, Daniel Salas, Warren Scott, “A Comparison of Approximate Dynamic Programming Techniques on Benchmark Energy Storage Problems: Does Anything Work?,” IEEE Symposium Series on Computational Intelligence, Workshop on Approximate Dynamic Programming and Reinforcement Learning, Orlando, FL, December, 2014.

This paper uses two variations on energy storage problems to investigate a variety of algorithmic strategies from the ADP/RL literature. All of these methods are tested on benchmark problems that are solved optimally, so that we get an accurate estimate of the quality of the policies being produced. Somewhat surprisingly, generic machine learning algorithms for approximating value functions did not work particularly well. What did work well is best described as "lookup table with structure." The structure we exploit is convexity and monotonicity. Test datasets are available at http://www.castlelab.princeton.edu/datasets.htm.

J. Nascimento, W. B. Powell, “An Optimal Approximate Dynamic Programming Algorithm for Concave, Scalar Storage Problems with Vector-Valued Controls,” IEEE Transactions on Automatic Control, Vol. 58, No. 12, pp. 2995-3010. http://dx.doi.org/10.1109/TAC.2013.2272973 (2013).

This paper proves convergence for an ADP algorithm using approximate value iteration (TD(0)) for problems that feature vector-valued decisions (e.g. allocating energy over a grid) linked by a scalar storage system, such as a water reservoir. The problem arises in settings where resources are distributed from a central storage facility. The algorithm is well suited to continuous problems, which require that the function capturing the value of future inventory be finely discretized, since the algorithm adaptively generates breakpoints for a piecewise linear approximation. Unlike much of reinforcement learning, the strategy does not require explicit exploration.
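Piecewise linear, concave approximations of this kind must keep their slopes non-increasing after every stochastic update. Here is a sketch in the spirit of that idea (a simple pool-adjacent-violators projection; our own illustration, not the exact algorithm analyzed in the paper):

```python
def project_concave(slopes):
    """Pool-adjacent-violators: least-squares projection of a slope
    vector onto the set of non-increasing (i.e. concave) sequences."""
    pools = []                        # list of (sum, count) for merged runs
    for s in slopes:
        pools.append((float(s), 1))
        # merge while an earlier pool's mean is below a later pool's mean
        while len(pools) > 1 and pools[-2][0] / pools[-2][1] < pools[-1][0] / pools[-1][1]:
            s2, c2 = pools.pop()
            s1, c1 = pools.pop()
            pools.append((s1 + s2, c1 + c2))
    out = []
    for s, c in pools:
        out.extend([s / c] * c)       # each pool takes its average slope
    return out

def update_value_function(slopes, r, v_hat, alpha):
    """Smooth a sampled marginal value v_hat into the slope at
    resource level r with stepsize alpha, then restore concavity."""
    slopes = list(slopes)
    slopes[r] = (1 - alpha) * slopes[r] + alpha * v_hat
    return project_concave(slopes)
```

The projection is what lets the approximation remain concave (and hence solvable as a linear program) no matter how noisy the sampled marginal values are.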

W. B. Powell, J. Ma, “A Review of Stochastic Algorithms with Continuous Value Function Approximation and Some New Approximate Policy Iteration Algorithms for Multi-Dimensional Continuous Applications,” Journal of Control Theory and Applications, Vol. 9, No. 3, pp. 336-352, 2011.

We review the literature on approximate dynamic programming, with the goal of better understanding the theory behind practical algorithms for solving dynamic programs with continuous and vector-valued states and actions, and complex information processes. We build on the literature that has addressed the well-known problem of multidimensional (and possibly continuous) states, and the extensive literature on model-free dynamic programming which also assumes that the expectation in Bellman's equation cannot be computed. However, we point out complications that arise when the actions/controls are vector-valued and possibly continuous. We then describe some recent research by the authors on approximate policy iteration algorithms that offer convergence guarantees (with technical assumptions) for both parametric and nonparametric architectures for the value function.

Ryzhov, I. and W. B. Powell, “Bayesian Active Learning with Basis Functions,” IEEE Workshop on Adaptive Dynamic Programming and Reinforcement Learning, Paris, April, 2011.

A common technique for dealing with the curse of dimensionality in approximate dynamic programming is to use a parametric value function approximation, where the value of being in a state is assumed to be a linear combination of basis functions. We propose a Bayesian strategy for resolving the exploration/exploitation dilemma in this setting. Our approach is based on the knowledge gradient concept from the optimal learning literature, which has been recently adapted for approximate dynamic programming with lookup-table approximations. The new method performs well in numerical experiments conducted on an energy storage problem.

W.B. Powell, Approximate Dynamic Programming, John Wiley and Sons, 2007. (First edition; see the description of the second edition under Books above.)

Powell, W. B., "Approximate Dynamic Programming I: Modeling," Encyclopedia of Operations Research and Management Science, John Wiley and Sons, (to appear).

Powell, W. B., "Approximate Dynamic Programming II: Algorithms," Encyclopedia of Operations Research and Management Science, John Wiley and Sons, (to appear).

These two short chapters provide yet another brief introduction to the modeling and algorithmic framework of ADP. The first chapter actually has nothing to do with ADP (it grew out of the second chapter). Instead, it describes the five fundamental components of any stochastic, dynamic system. There is also a section that discusses "policies", a term that specific subcommunities often use in a narrow way. I describe nine specific examples of policies. I think this helps put ADP in the broader context of stochastic optimization.

The second chapter provides a brief introduction to algorithms for approximate dynamic programming. In the tight constraints of these chapters for Wiley's Encyclopedia, it is not possible to do a topic like this justice in 20 pages, but if you need a quick peek into ADP, this is one sample.

Ryzhov, I. O., W. B. Powell, "Approximate Dynamic Programming with Correlated Bayesian Beliefs," Forty-Eighth Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, Sept. 29-Oct. 1, 2010.

The exploration-exploitation problem in dynamic programming is well-known, and yet most algorithms resort to heuristic exploration policies such as epsilon-greedy. We use a Bayesian model of the value of being in each state with correlated beliefs, which reflects the common fact that visiting one state teaches us something about visiting other states. We use the knowledge gradient algorithm with correlated beliefs to capture the value of the information gained by visiting a state. Using both a simple newsvendor problem and a more complex problem of making wind commitments in the presence of stochastic prices, we show that this method produces significantly better results than epsilon-greedy for both Bayesian and non-Bayesian beliefs. These are shown for both offline and online implementations.
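For contrast, the epsilon-greedy heuristic that this paper argues against takes only a few lines to state; a minimal sketch (ours):

```python
import random

def epsilon_greedy(values, epsilon, rng):
    """The heuristic exploration policy used as the baseline: with
    probability epsilon pick a uniformly random alternative, otherwise
    act greedily with respect to the current value estimates."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))      # explore blindly
    return max(range(len(values)), key=values.__getitem__)  # exploit
```

The weakness is visible in the code: exploration is blind, ignoring both how uncertain each estimate is and what a visit would teach us about correlated states, which is exactly what the knowledge gradient approach fixes.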

Powell, W.B., “Merging AI and OR to Solve High-Dimensional Resource Allocation Problems using Approximate Dynamic Programming,” Informs Journal on Computing, Vol. 22, No. 1, pp. 2-17 (2010). (c) Informs

This paper addresses four problem classes, defined by two attributes: the number of entities being managed (single or many), and the complexity of the attributes of an entity (simple or complex). All the problems are stochastic, dynamic optimization problems. Single, simple-entity problems can be solved using classical methods from discrete state, discrete action dynamic programs. The AI community often works on problems with a single, complex entity (e.g. a backgammon board). The OR community tends to work on problems with many simple entities. This paper briefly describes how advances in approximate dynamic programming performed within each of these communities can be brought together to solve problems with multiple, complex entities.

Nascimento, J. and W. B. Powell, “An Optimal Approximate Dynamic Programming Algorithm for the Lagged Asset Acquisition Problem,” Mathematics of Operations Research, Vol. 34, No. 1, pp. 210-237 (2009).

I have worked for a number of years using piecewise linear function approximations for a broad range of complex resource allocation problems. A few years ago we proved convergence of this algorithmic strategy for two-stage problems (click here for a copy). In this latest paper, we have our first convergence proof for a multistage problem. This paper is more than a convergence proof for this particular problem class - it lays out a proof technique, which combines our work on concave approximations with theory laid out by Bertsekas and Tsitsiklis (in their Neuro-Dynamic Programming book).

Ma, J. and W. B. Powell, “A convergent recursive least squares policy iteration algorithm for multi-dimensional Markov decision process with continuous state and action spaces,” IEEE Conference on Approximate Dynamic Programming and Reinforcement Learning (part of IEEE Symposium on Computational Intelligence), March, 2009.

This conference proceedings paper provides a sketch of a proof of convergence for an ADP algorithm designed for problems with continuous and vector-valued states and actions. In addition, it assumes that the expectation in Bellman's equation cannot be computed. The proof assumes that the value function can be expressed as a finite combination of known basis functions. The proof is for a form of approximate policy iteration.

George, A., W.B. Powell and S. Kulkarni, “Value Function Approximation Using Hierarchical Aggregation for Multiattribute Resource Management,” Journal of Machine Learning Research, Vol. 9, pp. 2079-2111 (2008).

There are a number of problems in approximate dynamic programming where we have to use coarse approximations in the early iterations, but we would like to transition to finer approximations as we collect more information. This paper studies the statistics of aggregation, and proposes a weighting scheme that weights approximations at different levels of aggregation based on the inverse of the variance of the estimate and an estimate of the bias. This weighting scheme is known to be optimal if we are weighting independent statistics, but this is not the case here. What is surprising is that the weighting scheme works so well. We demonstrate this, and provide some important theoretical evidence why it works.
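The weighting scheme itself can be sketched in a few lines; here the variances and biases are assumed given (estimating them recursively from data is the substance of the paper):

```python
def weighted_estimate(estimates, variances, biases):
    """Combine estimates of the same quantity made at different levels
    of aggregation, weighting each level by the inverse of its
    variance plus squared bias."""
    inv = [1.0 / (v + b * b) for v, b in zip(variances, biases)]
    total = sum(inv)
    weights = [w / total for w in inv]
    combined = sum(w * e for w, e in zip(weights, estimates))
    return combined, weights
```

Early on, the disaggregate estimate has high variance and gets little weight; as observations accumulate its variance shrinks and the weight shifts automatically toward the finer level.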

George, A. and W.B. Powell, “Adaptive Stepsizes for Recursive Estimation with Applications in Approximate Dynamic Programming,” Machine Learning, Vol. 65, No. 1, pp. 167-198, (2006). (c) Springer

One of the first challenges anyone will face when using approximate dynamic programming is the choice of stepsizes. Use the wrong stepsize formula, and a perfectly good algorithm will appear not to work. Deterministic stepsize formulas can be frustrating since they have parameters that have to be tuned (difficult if you are estimating thousands of values at the same time). This paper reviews a number of popular stepsize formulas, provides a classic result for optimal stepsizes with stationary data, and derives a new optimal stepsize formula for nonstationary data. This result assumes we know the noise and bias (knowing the bias is equivalent to knowing the answer). A formula is provided when these quantities are unknown. Our result is compared to other deterministic formulas as well as stochastic stepsize rules which are proven to be convergent. The numerical work suggests that the new optimal stepsize formula (OSA) is very robust. It often is the best, and never works poorly.
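The basic recursion being tuned is theta_n = (1 - alpha_n) * theta_{n-1} + alpha_n * X_n; a sketch showing how a stepsize rule plugs in (the rules shown are the classical ones discussed in the paper's review, not the new OSA formula):

```python
def smooth(observations, stepsize):
    """Recursive estimation theta_n = (1 - a_n) theta_{n-1} + a_n X_n,
    where stepsize(n) returns a_n for iteration n = 1, 2, ..."""
    theta = 0.0
    for n, x in enumerate(observations, start=1):
        a = stepsize(n)
        theta = (1 - a) * theta + a * x
    return theta

# 1/n reproduces the sample mean: optimal for stationary data,
# but it shrinks too quickly when the signal is still drifting.
one_over_n = lambda n: 1.0 / n
# the harmonic rule a/(a + n - 1) decays more slowly,
# keeping more weight on recent (less biased) observations.
harmonic = lambda a: (lambda n: a / (a + n - 1))
```

Running both on the same nonstationary stream makes the tradeoff in the paper concrete: 1/n converges to the long-run mean, while the harmonic rule tracks recent values.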

Powell, W.B., A. George, B. Bouzaiene-Ayari and H. Simao, “Approximate Dynamic Programming for High Dimensional Resource Allocation Problems,” Proceedings of the IJCNN, Montreal, August 2005.

This is an easy introduction to the use of approximate dynamic programming for resource allocation problems. It shows how math programming and machine learning can be combined to solve dynamic programs with many thousands of dimensions, using techniques that are easily implemented on a laptop.

Powell, W.B., “The Optimizing-Simulator: Merging Simulation and Optimization using Approximate Dynamic Programming,” Proceedings of the Winter Simulation Conference, December, 2005.

Approximate dynamic programming involves iteratively simulating a system. As a result, it often has the appearance of an "optimizing simulator." This short article, presented at the Winter Simulation Conference, is an easy introduction to this simple idea.

W. B. Powell, H. Simao, B. Bouzaiene-Ayari, “Approximate Dynamic Programming in Transportation and Logistics: A Unified Framework,” EURO Journal on Transportation and Logistics, Vol. 1, No. 3, pp. 237-284 (2012). DOI 10.1007/s13676-012-0015-8.

Using the contextual domain of transportation and logistics, this paper describes the fundamentals of how to model sequential decision processes (dynamic programs), and outlines four classes of policies. A section describes the linkage between stochastic search and dynamic programming, and then provides a step-by-step linkage from the classical statement of Bellman's equation to stochastic programming. The remainder of the paper uses a variety of applications from transportation and logistics to illustrate the four classes of policies.

Powell, W. B., "Approximate Dynamic Programming II: Algorithms," Encyclopedia of Operations Research and Management Science, John Wiley and Sons, (to appear).

These two short chapters provide yet another brief introduction to the modeling and algorithmic framework of ADP. The first chapter actually has nothing to do with ADP (it grew out of the second chapter). Instead, it describes the five fundamental components of any stochastic, dynamic system. There is also a section that discusses "policies," a term that is often used by specific subcommunities in a narrow way. I describe nine specific examples of policies. I think this helps put ADP in the broader context of stochastic optimization.

The second chapter provides a brief introduction to algorithms for approximate dynamic programming. In the tight constraints of these chapters for Wiley's Encyclopedia, it is not possible to do a topic like this justice in 20 pages, but if you need a quick peek into ADP, this is one sample.

Powell, W. B. “What you should know about approximate dynamic programming,” Naval Research Logistics, Vol. 56, No. 3, pp. 239-249, 2009.

This article is a brief overview and introduction to approximate dynamic programming, with a bias toward operations research. It highlights the major dimensions of an ADP algorithm, some strategies for approximating value functions, and brief discussions of good (and bad) modeling and algorithmic strategies.

Powell, W.B., “Merging AI and OR to Solve High-Dimensional Resource Allocation Problems using Approximate Dynamic Programming,” Informs Journal on Computing, Vol. 22, No. 1, pp. 2-17 (2010). (c) Informs

This paper addresses four problem classes, defined by two attributes: the number of entities being managed (single or many), and the complexity of the attributes of an entity (simple or complex). All the problems are stochastic, dynamic optimization problems. Single, simple-entity problems can be solved using classical methods from discrete state, discrete action dynamic programs. The AI community often works on problems with a single, complex entity (e.g. a backgammon board). The OR community tends to work on problems with many simple entities. This paper briefly describes how advances in approximate dynamic programming performed within each of these communities can be brought together to solve problems with multiple, complex entities.

Powell, W. B., “Approximate Dynamic Programming: Lessons from the field,” Invited tutorial, Proceedings of the 40th Conference on Winter Simulation, pp. 205-214, 2008.

This is the third in a series of tutorials given at the Winter Simulation Conference. This one has additional practical insights for people who need to implement ADP and get it working on practical applications.

Powell, W. B., “Approximate Dynamic Programming – A Melting Pot of Methods,” Informs Computing Society Newsletter, Fall, 2008 (Harvey Greenberg, ed.)

This article appeared in the Informs Computing Society Newsletter. It provides an easy, high-level overview of ADP, emphasizing the perspective that ADP is much more than an algorithm - it is really an umbrella for a wide range of solution procedures which retain, at their core, the need to approximate the value of being in a state.

Approximate dynamic programming in transportation and logistics:

W. B. Powell, H. Simao, B. Bouzaiene-Ayari, “Approximate Dynamic Programming in Transportation and Logistics: A Unified Framework,” EURO Journal on Transportation and Logistics, Vol. 1, No. 3, pp. 237-284 (2012). DOI 10.1007/s13676-012-0015-8.

Using the contextual domain of transportation and logistics, this paper describes the fundamentals of how to model sequential decision processes (dynamic programs), and outlines four classes of policies. A section describes the linkage between stochastic search and dynamic programming, and then provides a step-by-step linkage from the classical statement of Bellman's equation to stochastic programming. The remainder of the paper uses a variety of applications from transportation and logistics to illustrate the four classes of policies.

Simao, H. P., J. Day, A. George, T. Gifford, J. Nienow, W. B. Powell, “An Approximate Dynamic Programming Algorithm for Large-Scale Fleet Management: A Case Application,” Transportation Science, Vol. 43, No. 2, pp. 178-197 (2009).(c) Informs.

This is a major application paper, which summarizes several years of development to produce a model based on approximate dynamic programming which closely matches historical performance. The model represents drivers with 15 attributes, capturing domicile, equipment type, days from home, and all the rules (including the 70-hour-in-eight-days rule) governing drivers. The model gets drivers home on weekends on a regular basis (again, closely matching historical performance). The value functions produced by the ADP algorithm are shown to accurately estimate the marginal value of drivers by domicile.

See also the companion paper below:

Simao, H. P., A. George, W. B. Powell, T. Gifford, J. Nienow, J. Day, "Approximate Dynamic Programming Captures Fleet Operations for Schneider National," Interfaces, Vol. 40, No. 5, pp. 342-352, 2010. (c) Informs

This paper is a lite version of the paper above, submitted for the Wagner competition. This paper does with pictures what the paper above does with equations.

Powell, W. B., Belgacem Bouzaiene-Ayari, Jean Berger, Abdeslem Boukhtouta, Abraham P. George, “The Effect of Robust Decisions on the Cost of Uncertainty in Military Airlift Operations,” ACM Transactions on Modeling and Computer Simulation, Vol. 22, No. 1, pp. 39-57 (2011), DOI: 10.1145/2043635.2043636.

This paper shows that approximate dynamic programming can produce robust strategies in military airlift operations. Simulations are run using randomness in demands and aircraft availability. When demands are uncertain, we vary the degree to which the demands become known in advance. The results show that if we allocate aircraft using approximate dynamic programming, the effect of uncertainty is significantly reduced. These results call into question simulations that examine the effect of advance information which do not use robust decision-making, a property that we feel reflects natural human behavior.

Simao, H. P. and W. B. Powell, "Approximate Dynamic Programming for Management of High Value Spare Parts," Journal of Manufacturing Technology Management, Vol. 20, No. 9 (2009).

This is a short conference proceedings paper that briefly summarizes the use of approximate dynamic programming for a real application to the management of spare parts for a major aircraft manufacturer.

Topaloglu, H. and W.B. Powell, “Dynamic Programming Approximations for Stochastic, Time-Staged Integer Multicommodity Flow Problems,” Informs Journal on Computing, Vol. 18, No. 1, pp. 31-42 (2006). (c) Informs.

This paper applies the technique of separable, piecewise linear approximations to multicommodity flow problems. This technique worked very well for single commodity problems, but it was not at all obvious that it would work well for multicommodity problems, since there are more substitution opportunities. The experimental comparisons against multistage nested Benders (which is very slow) and more classical rolling horizon procedures suggest that it works very well indeed. This paper also provides a more rigorous treatment of what is known as the "multiperiod travel time" problem, and provides a formal development of a procedure for accelerating convergence.

Powell, W.B. and T. Carvalho, "Dynamic Control of Logistics Queueing Networks for Large Scale Fleet Management," Transportation Science, Vol. 32, No. 2, pp. 90-109, 1998. (c) Informs.

This paper introduces the use of linear approximations of value functions that are learned adaptively.

Powell, W.B., J. Shapiro and H. P. Simao, "An Adaptive, Dynamic Programming Algorithm for the Heterogeneous Resource Allocation Problem," Transportation Science, Vol. 36, No. 2, pp. 231-249 (2002). (c) Informs.

This paper also used linear approximations, but in the context of the heterogeneous resource allocation problem. In this setting, we assume that the size of the attribute state space of a resource is too large to enumerate. As a result, estimating the value of a resource with a particular set of attributes becomes computationally difficult. We resort to hierarchical aggregation schemes.

Godfrey, G. and W.B. Powell, “An Adaptive Dynamic Programming Algorithm for Dynamic Fleet Management, I: Single Period Travel Times,” Transportation Science, Vol. 36, No. 1, pp. 21-39 (2002). (c) Informs

Godfrey, G. and W.B. Powell, “An Adaptive Dynamic Programming Algorithm for Dynamic Fleet Management, II: Multiperiod Travel Times,” Transportation Science, Vol. 36, No. 1, pp. 40-54 (2002). (c) Informs

This paper adapts the CAVE algorithm to stochastic multistage problems. The paper demonstrates both rapid convergence of the algorithm as well as very high quality solutions. We found that the use of nonlinear approximations was complicated by the presence of multiperiod travel times (a problem that does not arise when we use linear approximations).

Powell, W.B., A. Ruszczynski and H. Topaloglu, “Learning Algorithms for Separable Approximations of Stochastic Optimization Problems,” Mathematics of Operations Research, Vol. 29, No. 4, pp. 814-836 (2004). (c) Informs.

We have been doing a lot of work on the adaptive estimation of concave functions. This paper represents a major plateau. It describes a new algorithm dubbed the Separable Projective Approximation Routine (SPAR) and includes 1) a proof that the algorithm converges when we sample all intervals infinitely often, 2) a proof that the algorithm produces an optimal solution when we only sample the optimal solution of our approximation at each iteration, when applied to separable problems, 3) a bound when the algorithm is applied to nonseparable problems such as two-stage stochastic programs with network recourse, and 4) computational comparisons against deterministic approximations and variations of Benders decomposition (which is provably optimal). The experiments show that the SPAR algorithm, even when applied to nonseparable approximations, converges much more quickly than Benders decomposition.

Approximate dynamic programming in discrete routing and scheduling:

Spivey, M. and W.B. Powell, “The Dynamic Assignment Problem,” Transportation Science, Vol. 38, No. 4, pp. 399-419 (2004). (c) Informs.

This paper proposes a general model for the dynamic assignment problem, which involves the assignment of resources to tasks over time, in the presence of potentially several streams of information processes. It proposes an adaptive learning model that produces non-myopic behavior, and suggests a way of using hierarchical aggregation to reduce statistical errors in the adaptive estimation of the value of resources in the future. Finally, it reports on a study on the value of advance information. Past studies of this topic have used myopic models where advance information provides a major benefit over no information at all. Our model uses adaptive learning to bring forecast information into decisions made now, providing a more realistic estimate of the value of future information.

Approximate dynamic programming for batch service problems

Papadaki, K. and W.B. Powell, “An Adaptive Dynamic Programming Algorithm for a Stochastic Multiproduct Batch Dispatch Problem,” Naval Research Logistics, Vol. 50, No. 7, pp. 742-769, 2003.

One of the oldest problems in dynamic programming arises in the context of planning inventories. You can use textbook backward dynamic programming if there is only one product type, but real problems have multiple products. In this paper, we consider a multiproduct problem in the context of a batch service problem where different types of customers wait to be served. Arrivals are stochastic and nonstationary. We show that an approximate dynamic programming strategy using linear value functions works quite well and is computationally no harder than a simple myopic heuristic (once the iterative learning is completed).

Papadaki, K. and W.B. Powell, “Exploiting structure in adaptive dynamic programming algorithms for a stochastic batch service problem,” European Journal of Operational Research, Vol. 142, No. 1, pp. 108-127 (2002). (c) Elsevier.

This paper compares an optimal policy for dispatching a truck over a single link (with one product type) against an approximate policy that uses approximations of the future. Why would we approximate a problem that is easy to solve to optimality? Because the optimal policy only works on single link problems with one type of product, while the other is scalable to much harder problems.


**Machine learning (back)**

*Our pursuit of approximate dynamic programming has brought us into the field of machine learning. Although our original intent was to develop methods to approximate value functions, this work has taken us into new areas.*

**L. Hannah, D. Blei and W. B. Powell, “Dirichlet Process Mixtures of Generalized Linear Models,” J. Machine Learning Research, Vol. 12, pp. 1923-1953, 2011.**

We propose Dirichlet Process-Generalized Linear Models (DP-GLM), a new method of nonparametric regression that accommodates continuous and categorical inputs, and any response that can be modeled by a generalized linear model. We prove conditions for the asymptotic unbiasedness of the DP-GLM regression mean function estimate and give a practical example for when those conditions hold. Additionally, we provide Bayesian bounds on the distance of the estimate from the true mean function based on the number of observations and posterior samples. We evaluate DP-GLM on several data sets, comparing it to modern methods of nonparametric regression like CART and Gaussian processes. We show that the DP-GLM is competitive with the other methods, while accommodating various inputs and outputs and being robust when confronted with heteroscedasticity.

**L. Hannah, W. B. Powell, D. Blei, "Nonparametric Density Estimation for Stochastic Optimization with an Observable State Variable," Neural Information Processing Systems (NIPS), December, 2010.**

Consider the newsvendor problem, where we have to choose a quantity and then observe a demand. Now assume that we are first given information about the state of the world - perhaps it is rainy or sunny. This observable state changes our belief about the demand, which changes our optimal solution. We have to design an online, stochastic optimization algorithm that solves the problem without actually knowing the distribution of demand, even when we know the state. If there were only two states of the world, the problem would be easy. Now assume the state of the world is a multidimensional vector. We use Dirichlet process mixtures to prove convergence to optimality of an online learning algorithm which can be used for complex state-of-the-world variables. We demonstrate the method in the context of making commitments for wind energy.
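A toy version of the underlying idea, with a discrete state in place of the continuous, multidimensional state that the Dirichlet process machinery handles (the prices, demand distributions, and parameters below are invented for illustration):

```python
import random

def newsvendor_sgd(price, cost, demand_sampler, states, steps=20000, eta=0.5):
    """Online stochastic gradient for a state-dependent newsvendor.
    The paper handles continuous, multidimensional states; this toy
    version simply keeps one order quantity per discrete state."""
    q = {s: 1.0 for s in states}
    for n in range(1, steps + 1):
        s = random.choice(states)          # observe the state first
        d = demand_sampler(s)              # then observe demand
        grad = price * (1.0 if d > q[s] else 0.0) - cost  # d(profit)/dq
        q[s] += (eta / n ** 0.5) * grad
    return q

random.seed(0)
# Demand ~ Uniform(0, 10) when sunny, Uniform(0, 20) when rainy.
q = newsvendor_sgd(price=4.0, cost=1.0,
                   demand_sampler=lambda s: random.uniform(0, 10 if s == "sunny" else 20),
                   states=["sunny", "rainy"])
# Critical fractile (price - cost)/price = 0.75, so the optimal
# quantities are roughly 7.5 (sunny) and 15 (rainy).
```

The point of the toy: once the state is observed, each state has its own newsvendor problem, and a stochastic gradient scheme finds the critical fractile without ever knowing the demand distribution.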

**L. Hannah, W. B. Powell, D. Blei, "Stochastic Search with an Observable State Variable," (under review)**

Think of a newsvendor problem where we have to choose a quantity, and then observe a random demand. Now assume that before we choose the quantity, we can observe the weather, which gives us some information about the demand. Now assume we are making a commitment of energy that we are going to deliver from our wind farm before we know how much the wind will blow. But before we make the commitment, we observe a vector of continuous variables including current and previous wind, prices and demand, along with time of day and information about the weather. We use DP-GLM to create an approximation that is a weighted sum of piecewise linear functions which allow us to capture the concavity of the function with respect to the wind commitment. This paper proves convergence of this strategy, and demonstrates its effectiveness in the context of an energy commitment problem.

**George, A., W.B. Powell and S. Kulkarni, “Value Function Approximation Using Hierarchical Aggregation for Multiattribute Resource Management,” Journal of Machine Learning Research, Vol. 9, pp. 2079-2111 (2008).**

There are a number of problems in approximate dynamic programming where we have to use coarse approximations in the early iterations, but we would like to transition to finer approximations as we collect more information. This paper studies the statistics of aggregation, and proposes a weighting scheme that weights approximations at different levels of aggregation based on the inverse of the variance of the estimate and an estimate of the bias. This weighting scheme is known to be optimal if we are weighting independent statistics, but this is not the case here. What is surprising is that the weighting scheme works so well. We demonstrate this, and provide some important theoretical evidence why it works.

**Stochastic optimization (back)**

The following series of three papers provides an introduction to how to model stochastic optimization problems. The first is a general article aimed at the operations research community. This is followed by a two-part tutorial series aimed at the IEEE/controls community.

Warren B. Powell. “Clearing the Jungle of Stochastic Optimization,” Informs Tutorials in Operations Research: Bridging Data and Decisions, pp. 109-137, November, 2014, http://dx.doi.org/10.1287/educ.2014.0128 (c) Informs

This paper describes a general modeling framework for stochastic optimization, discusses how to model state variables, and provides an overview of four fundamental (meta) classes of policies. Section 5 provides an in-depth discussion of lookahead policies, which provides a bridge to (multistage) stochastic programming.

W. B. Powell and S. Meisel, "Tutorial on Stochastic Optimization in Energy Part I: Models and Policies," IEEE Trans. on Power Systems, Vol. 31, No. 2, pp. 1459-1467, 2016.

Reviews the different canonical models used by different communities, and introduces our own canonical modeling framework along with the four classes of policies.

W. B. Powell, Stephan Meisel, "Tutorial on Stochastic Optimization in Energy II: An energy storage illustration", IEEE Trans. on Power Systems, Vol. 31, No. 2, pp. 1468-1475, 2016

Illustrates the process of modeling a stochastic, dynamic system using an energy storage application, and shows that each of the four classes of policies works best on a particular variant of the problem. A fifth problem shows that in some cases a hybrid policy is needed.

**Lauren Hannah, W.B. Powell, D. Dunson, “Semi-Convex Regression for Metamodeling-Based Optimization,” SIAM J. on Optimization, Vol. 24, No. 2, pp. 573-597 (2014). **

Stochastic search involves finding a set of controllable parameters that minimizes an unknown objective function using a set of noisy observations. We consider the case when the unknown function is convex and a metamodel is used as a surrogate objective function. Often the data are non-i.i.d. and include an observable state variable, such as applicant information in a loan rate decision problem. State information is difficult to incorporate into convex models. We propose a new semi-convex regression method that is used to produce a convex metamodel in the presence of a state variable. We show consistency for this method. We demonstrate its effectiveness for metamodeling on a set of synthetic inventory management problems and a large, real-life auto loan dataset.

**J. Nascimento, W. B. Powell, “An Optimal Approximate Dynamic Programming Algorithm for Concave, Scalar Storage Problems with Vector-Valued Controls,” IEEE Transactions on Automatic Control, Vol. 58, No. 12, pp. 2995-3010. http://dx.doi.org/10.1109/TAC.2013.2272973 (2013).**

This paper proves convergence for an ADP algorithm using approximate value iteration (TD(0)), for problems that feature vector-valued decisions (e.g. allocating energy over a grid) linked by a scalar storage system, such as a water reservoir. The problem arises in settings where resources are distributed from a central storage facility. The algorithm is well suited to continuous problems, which would otherwise require that the function capturing the value of future inventory be finely discretized; instead, the algorithm adaptively generates breakpoints for a piecewise linear approximation. The strategy does not require exploration, which is common in reinforcement learning.

**D.H. Lee and W. B. Powell, "An Intelligent Battery Controller Using Bias-Corrected Q-learning"** **AAAI, Toronto, July, 2012.**

This paper uses Q-learning to optimize a battery storage problem in the presence of stochastic prices. We found that for this application (with significant randomness in the cost function), statistical errors in the Q-factors produce a large bias in the max operator. This paper suggests a correction factor based on an approximation derived from a model with an infinite number of actions. Without the bias correction, the algorithm still shows significant deviation from optimal after millions of iterations. The bias correction term significantly improves the rate of convergence.
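The source of the bias is easy to reproduce: give several actions identical true values, add noise to each estimate, and average the max. A small illustrative experiment (not the paper's setup, and the correction factor itself is not shown here):

```python
import random

def max_bias(n_actions, noise, true_value=0.0, trials=10000):
    """Average of max over noisy value estimates of actions whose
    true values are all equal.  Since E[max] > max E[.], the max
    operator systematically inflates the estimate."""
    total = 0.0
    for _ in range(trials):
        total += max(true_value + random.gauss(0, noise)
                     for _ in range(n_actions))
    return total / trials

random.seed(1)
bias_small = max_bias(n_actions=2, noise=1.0)
bias_large = max_bias(n_actions=20, noise=1.0)
# The bias grows with the number of actions: E[max] of 20 standard
# normal draws is about 1.87, versus about 0.56 for 2 draws.
```

Every Q-learning backup applies this max to noisy Q-factors, so the upward bias compounds across iterations, which is the effect the paper's correction term is designed to remove.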

**Ilya Ryzhov, Boris Defourny, Warren Powell, “Ranking and Selection Meets Robust Optimization,” Winter Simulation Conference, December, 2012.**

The objective of ranking and selection is to efficiently allocate an information budget among a set of design alternatives with unknown values in order to maximize the decision-maker's chances of discovering the best alternative. The field of robust optimization, however, considers risk-averse decision makers who may accept a suboptimal alternative in order to minimize the risk of a worst-case outcome. We bring these two fields together by defining a Bayesian ranking and selection problem with a robust implementation decision. We propose a new simulation allocation procedure that is risk-neutral with respect to simulation outcomes, but risk-averse with respect to the implementation decision. We discuss the properties of the procedure and present numerical examples illustrating the difference between the risk-averse problem and the more typical risk-neutral problem from the literature.

**Ryzhov, I. O., Awais Tariq, W. B. Powell, “May the Best Man Win: Simulation Optimization for Match-Making in E-Sports,” Proceedings of the Winter Simulation Conference, Phoenix, Arizona, December 11-14.**

**L. Hannah, W. B. Powell, D. Blei, "Nonparametric Density Estimation for Stochastic Optimization with an Observable State Variable," Neural Information Processing Systems (NIPS), December, 2010.**

Consider the newsvendor problem, where we have to choose a quantity and then observe a demand. Now assume that we are first given information about the state of the world - perhaps it is rainy or sunny. This observable state changes our belief about the demand, which changes our optimal solution. We have to design an online, stochastic optimization algorithm that solves the problem without actually knowing the distribution of demand, even when we know the state. If there were only two states of the world, the problem would be easy. Now assume the state of the world is a multidimensional vector. We use Dirichlet process mixtures to prove convergence to optimality of an online learning algorithm which can be used for complex state-of-the-world variables. We demonstrate the method in the context of making commitments for wind energy.

**L. Hannah and W. B. Powell, “Evolutionary Policy Iteration Under a Sampling Regime for Stochastic Combinatorial Optimization,” IEEE Transactions on Automatic Control, Vol. 55, No. 5, pp. 1254-1257, May, 2010.**

This paper provides a convergence proof for a type of evolutionary policy iteration algorithm that depends only on Monte Carlo sampling of the value of a policy. The result was also used to prove convergence for an algorithm in the literature that had required computing expectations exactly.

**P. Frazier, W. B. Powell, H. P. Simao, "Simulation Model Calibration with Correlated Knowledge-Gradients," Winter Simulation Conference, December, 2009.**

A common challenge in the calibration of simulation models is that we have to tune several continuous parameters. This is our first application of the knowledge gradient algorithm with correlated beliefs to the problem of parameter tuning for simulation models. A single run of the model (which uses adaptive learning from approximate dynamic programming) requires more than a day, so the paper also introduces methods to produce results without a full run. A Bayesian model is set up to capture the uncertainty in our beliefs about the convergence of the model. The method is illustrated in the tuning of two continuous parameters, which required approximately six runs of the model.

**S. Dayanik, W. Powell, and K. Yamazaki (2007). “Index policies for discounted bandit problems with availability constraints,” Journal of Applied Probability (to appear).**

One of the most famous problems in information collection is the multiarmed bandit problem, where we make a choice (out of a discrete set of choices), observe a reward, and use this observation to update estimates of the future value of rewards. This problem can be solved by choosing the option with the highest index (known as the Gittins index). In this paper, we present an index policy for the case where not all the choices are available each time. As a result, it is sometimes important to make an observation just because the observation is available to be made.

**Papadaki, K. and W. B. Powell, “Monotonicity in Multidimensional Markov Decision Processes for Batch Service Problems,” Operations Research Letters, Vol. 35, pp. 267-272 (2007).**

Structural properties of stochastic dynamic programs are essential to understanding the nature of the solutions and to deriving appropriate approximation techniques. We concentrate on a class of multidimensional Markov decision processes and derive sufficient conditions for the monotonicity of the value functions. We illustrate our result in the case of the multiproduct batch dispatch (MBD) problem.

**Topaloglu, H. and W.B. Powell, “Dynamic Programming Approximations for Stochastic, Time-Staged Integer Multicommodity Flow Problems,” Informs Journal on Computing, Vol. 18, No. 1, pp. 31-42 (2006). (c) Informs.**

This paper applies the technique of separable, piecewise linear approximations to multicommodity flow problems. This technique worked very well for single commodity problems, but it was not at all obvious that it would work well for multicommodity problems, since there are more substitution opportunities. The experimental comparisons against multistage nested Benders (which is very slow) and more classical rolling horizon procedures suggest that it works very well indeed. This paper also provides a more rigorous treatment of what is known as the "multiperiod travel time" problem, and provides a formal development of a procedure for accelerating convergence.

**Powell, W.B., A. Ruszczynski and H. Topaloglu, “Learning Algorithms for Separable Approximations of Stochastic Optimization Problems,” Mathematics of Operations Research, Vol. 29, No. 4, pp. 814-836 (2004). (c) Informs**

We have been doing a lot of work on the adaptive estimation of concave functions. This paper represents a major plateau. It describes a new algorithm dubbed the Separable Projective Approximation Routine (SPAR) and includes 1) a proof that the algorithm converges when we sample all intervals infinitely often, 2) a proof that the algorithm produces an optimal solution when we only sample the optimal solution of our approximation at each iteration, when applied to separable problems, 3) a bound when the algorithm is applied to nonseparable problems such as two-stage stochastic programs with network recourse, and 4) computational comparisons against deterministic approximations and variations of Benders decomposition (which is provably optimal). The experiments show that the SPAR algorithm, even when applied to nonseparable approximations, converges much more quickly than Benders decomposition.

**Spivey, M. and W.B. Powell, “Some Fixed Point Results for the Dynamic Assignment Problem,” Annals of Operations Research, Vol. 124, (G. Mitra, ed.), Kluwer Academic Publishers, pp. 15-33 (2003).**

In the course of conducting research on the use of linear value function approximations for the dynamic assignment problem, we ran across a class of problems where the right linear value function would produce the optimal solution. This paper summarizes this result and provides a proof.

**Topaloglu, H. and W.B. Powell, “An Algorithm for Approximating Piecewise Linear Concave Functions from Sample Gradients,” Operations Research Letters, Vol. 31, No. 1, pp. 66-76 (2003).**

We often find ourselves needing to approximate the marginal value of an additional resource, where we would like to estimate a piecewise linear concave function. But we have to do this with sample gradient information, which is random. And we still want the function to be concave. This paper proves convergence of an algorithm which maintains concavity by "leveling out" parts of the function that violate concavity.
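A simplified sketch of a leveling step of this kind (our own reconstruction for illustration, not the paper's algorithm): smooth a sampled gradient into one segment's slope of a piecewise linear function, then average any neighboring slopes that now violate concavity.

```python
def update_concave_slopes(slopes, i, sample_grad, alpha=0.5):
    """Smooth a sampled gradient into slope i of a piecewise linear
    concave function (slopes must stay nonincreasing), then restore
    concavity by averaging ('leveling') any violating neighbors."""
    slopes = list(slopes)
    slopes[i] = (1 - alpha) * slopes[i] + alpha * sample_grad
    # Level upward: if slope i now exceeds an earlier slope, pool them.
    j = i
    while j > 0 and slopes[j - 1] < slopes[j]:
        avg = sum(slopes[j - 1:i + 1]) / (i - j + 2)
        for k in range(j - 1, i + 1):
            slopes[k] = avg
        j -= 1
    # Level downward symmetrically.
    j = i
    while j < len(slopes) - 1 and slopes[j + 1] > slopes[j]:
        avg = sum(slopes[i:j + 2]) / (j - i + 2)
        for k in range(i, j + 2):
            slopes[k] = avg
        j += 1
    return slopes

# A noisy gradient of 6.0 pushes slope 2 above its neighbor; leveling
# pools the violating pair so the slopes stay nonincreasing.
s = update_concave_slopes([5.0, 4.0, 3.0, 2.0], i=2, sample_grad=6.0)
```

The averaging is what makes the projection well behaved: the updated function stays concave after every observation, no matter how noisy the sampled gradient is.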

**Chen, Z.-L. and W.B. Powell, "A Convergent Cutting Plane and Partial Sampling Algorithm for Multistage Linear Programs with Recourse," Journal of Optimization Theory and Applications, Vol. 103, No. 3, pp. 497-524 (1999). (c) Plenum Publishing.**

This paper provides a proof of convergence for a class of nested Benders cutting-plane algorithms. The basic model allows only right-hand side randomness, and requires independence between stages. The algorithm is computationally tractable for problems that are much larger than any prior algorithm for which a proof of convergence has been produced. It requires a finite sample space, but it may have thousands of realizations per stage, and the technique can handle dozens (even hundreds) of stages. Numerical performance will depend on the specific problem class.

**R. K.-M. Cheung and W.B. Powell, "SHAPE – A Stochastic, Hybrid Approximation Procedure for Two-Stage Stochastic Programs,” Operations Research, Vol. 48, No. 1, pp. 73-79 (2000). (c) Informs.**

This paper offers a provably convergent algorithm for two-stage stochastic programs. The method uses a differentiable (strictly convex) auxiliary function, and then "tilts" it using stochastic gradient information. The algorithm is very easy to implement.
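The tilting idea can be sketched in one dimension. The details below are my own choices rather than the paper's (the auxiliary function h(x) = (x - m)^2, the 1/n stepsize, and the function name):

```python
def shape_1d(sample_grad, x0=0.0, m=0.0, iters=200):
    """SHAPE-style iteration in one dimension: keep a fixed, strictly convex
    auxiliary function h(x) = (x - m)**2 and 'tilt' it with a linear term b*x
    that is adjusted from sampled gradients of the true objective."""
    b, x = 0.0, x0
    for n in range(1, iters + 1):
        stepsize = 1.0 / n
        g = sample_grad(x)                # sampled gradient of the true function
        tilt_grad = 2.0 * (x - m) + b     # gradient of h(x) + b*x at x
        b += stepsize * (g - tilt_grad)   # tilt toward the observed slope
        x = m - b / 2.0                   # exact argmin of h(x) + b*x
    return x
```

With the noise-free gradient of f(x) = (x - 3)^2 the iterates settle at the true minimizer x = 3; in the stochastic case the same tilt update is applied to noisy gradients.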

**Godfrey, G. and W.B. Powell, "An Adaptive,
Distribution-Free Algorithm for the Newsvendor Problem with Censored Demands,
with Application to Inventory and Distribution Problems," Management Science,
Vol. 47, No. 8, pp. 1101-1112, (2001). (c) Informs.**

This paper proposes a novel approximation procedure for stochastic resource allocation problems. The technique works with a series of stochastic gradients which are then used to maintain a concave estimate of the value function. When applied to problems with known optimal solutions, the method provides results within a fraction of a percent of optimal. One byproduct is a fast algorithm for the newsvendor problem that does not require knowledge of the underlying demand distribution. The technique is also applied to separable distribution problems.
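The newsvendor byproduct can be sketched as a plain stochastic gradient method. The parameters and stepsize rule below are illustrative; the key point is that a censored observation min(D, x) still reveals whether demand reached the order quantity.

```python
import random

def adaptive_newsvendor(demand_sampler, price, cost, x0=0.0, iters=5000, seed=0):
    """Distribution-free newsvendor sketch: nudge the order quantity x using
    stochastic gradients.  We never need the demand distribution, only
    whether each period ended in a stockout."""
    rng = random.Random(seed)
    x = x0
    for n in range(1, iters + 1):
        stepsize = 20.0 / (20.0 + n)
        d = demand_sampler(rng)
        if d >= x:                       # stockout: one more unit would have sold
            x += stepsize * (price - cost)
        else:                            # leftover: one more unit is wasted cost
            x -= stepsize * cost
        x = max(x, 0.0)
    return x
```

For demand uniform on [0, 100] with price 10 and cost 2, the critical fractile is 0.8, so the iterates hover near the optimal order quantity of 80.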

**Godfrey, G. and W.B. Powell, “An Adaptive Dynamic Programming Algorithm for Dynamic Fleet Management, I: Single Period Travel Times,” Transportation Science, Vol. 36, No. 1, pp. 21-39 (2002). (c) Informs**

**Godfrey, G. and W.B. Powell, “An Adaptive Dynamic Programming Algorithm for Dynamic Fleet Management, II: Multiperiod Travel Times,” Transportation Science, Vol. 36, No. 1, pp. 40-54 (2002). (c) Informs
**

This paper adapts the CAVE algorithm to stochastic multistage problems. The paper demonstrates both rapid convergence of the algorithm as well as very high quality solutions. We found that the use of nonlinear approximations was complicated by the presence of multiperiod travel times (a problem that does not arise when we use linear approximations).

**Cheung, R.K.-M. and W.B. Powell, "An Algorithm
for Multistage Dynamic Networks with Random Arc Capacities, with an Application
to Dynamic Fleet Management," Operations Research, Vol. 44, No. 6, pp.
951-963. (1996). (c) Informs.**

This paper applies the work in the paper immediately below to multistage problems. The motivating problem is dynamic fleet management with stochastic demands. This technique produces an approximate nonlinear function for the value of resources in a particular state. These nonlinear functions are then used in the context of a rolling horizon procedure. Numerical experiments indicate that the method outperforms rolling horizon procedures with deterministic demand forecasts. Note, however, that the nonlinear approximations are not updated iteratively.

**Cheung, R.K.-M. and W.B. Powell, "Network Recourse Decomposition
Method for Dynamic Networks with Random Arc Capacities," Networks, Vol.
24, No. 7, pp. 369-384 (1994).**

We consider a two-stage stochastic transshipment network. Our goal is to develop nonlinear approximations for the value of flow into a node at the beginning of the second stage. We decompose the second stage (which is assumed to be a dynamic transshipment network) into a series of trees, which allows us to use the tree recourse algorithm (described in the paper below) to find the exact value function. Because the second stage network is a transshipment network, we use a Lagrangian approach to decompose the network into trees. An iterative algorithm then updates the multipliers.

**Cheung, R.K.-M. and W.B. Powell, "Stochastic
Programs over Trees with Random Arc Capacities," Networks, Vol. 24, pp.
161-175 (1994).**

This paper considers the problem of a tree with random arc capacities. If S units of flow enter the root node, where they are routed through n stages before departing through the leaves of the tree, then we are interested in finding V(S), the expected value of S units of flow. We propose an efficient, exact algorithm for finding V(S), which works for both a single stage (all arc capacities are known at the same time) and n stages (the arc capacities are learned only as each stage is traversed).

**Powell, W.B. and L. Frantzeskakis, "Restricted Recourse Strategies
for Dynamic Networks with Random Arc Capacities," Transportation Science
, Vol. 64, pp. 261-287, 1996. (c) Informs.**

General stochastic networks are quite complex. Not surprisingly, it is easier to analyze problems with special structure. In this paper, we explore specialized recourse strategies that arise in stochastic networks. We cover simple recourse (where the action is implemented regardless of the outcome, meaning that the state of the system in the future is deterministic), nodal recourse (where we assume that a decision is made at a node regarding which outbound arc should be chosen), and tree recourse (where we make the decision of how to route flow over a tree). Variations are also considered.

**Frantzeskakis, L. and W.B. Powell, "Restricted Recourse Strategies
for Bounding the Expected Network Recourse Function" Annals of Operations
Research on Stochastic Programming, (J. Higle and S. Sen, eds), Vol. 64, pp.
261-287, 1996. **

We investigate bounds for stochastic networks, drawing on the principles of restricted recourse strategies and on concepts from the work of Stein Wallace. Not for the faint of heart. As with a lot of the work on bounds for stochastic programs, this is a difficult intellectual exercise with a lot of input for very little output.

See also:

**Frantzeskakis, L. and W.B. Powell, "Bounding Procedures for Dynamic
Networks with Random Link Capacities, with Application to Stochastic Programming,"
Networks , Vol. 23, pp. 575-595 (1993).**

Here we actually produce a computationally tractable bound for multistage dynamic networks with random arc capacities. We use our ability to successively solve to optimality network recourse functions with linear value functions. If we choose the right constant, we can produce a bound. But, don't get your hopes up; the bounds are not very tight.

**Deterministic
optimization (back)**

**Chen, Z.-L. and W.B. Powell, "A Note on Bertsekas' Small-Label-First
Strategy," Networks, Vol. 29, No. 2, pp. 111-116 (1997).**

A neat little technical result showing that Bertsekas' small-label-first strategy is not polynomial in the worst case, and how it can be easily fixed.

**Chen, Z.L. and W.B. Powell, "A Generalized Threshold Algorithm
for the Shortest Path Problem with Time Windows," in Network Design: Connectivity
and Facilities (P. Pardalos, D. Du, eds.), pp. 303-318 (1998).**

We look at the resource-constrained shortest path problem (by Desrochers, Desrosiers and Soumis) and show that Klingman's threshold algorithm is faster in side-by-side comparisons of code written by the same programmer. This is yet another experimental demonstration that the algorithm with the better complexity bound (label setting) is not quite as good as label correcting algorithms (the threshold algorithm is a member of this class).

**Jones, K.L., I.J. Lustig, W.B. Powell and J.M. Farvolden, "Multicommodity
network flows: The impact of formulation on decomposition," Mathematical
Programming, Vol. 62, pp. 95-117 (1993).**

When this research was done, it was "common knowledge" that Dantzig-Wolfe decomposition did not work. It was common practice (primarily in the 1980's) to use decompositions that produced small master problems. Here, we show that it is possible to get dramatic increases in speed by breaking extreme point solutions (which might be a solution to a network flow problem) into more elementary representations (trees or paths). This produces much larger master problems (an issue in the 80's, but less important now) but much faster convergence.

**Farvolden, J.M., W.B. Powell and I.J. Lustig, "A Primal Partitioning
Solution for the Arc-Chain Formulation of a Multicommodity Network Flow Problem,"
Operations Research , Vol. 41, No. 4, pp. 669-693 (1993). (c) Informs.**

A fast, neat algorithm for multicommodity network flow problems where we show that the basis can be partitioned and exploited to produce a very fast algorithm. Our experiments are primarily on dynamic graphs, and commodities that represent O/D flows.

**Powell, W.B., E. Berkkam and I.J. Lustig, "On Algorithms for Nonlinear
Dynamic Networks", in Network Optimization Problems: Algorithms, Complexity
and Applications (Dingzhu Du and Panos Pardalos, eds.) World Scientific Press,
New Jersey, pp. 203-231 (1993).**

**Powell, W.B. "A Review of Sensitivity Results for Linear Networks
and a New Approximation to Reduce the Effects of Degeneracy," Transportation
Science, Vol. 23, No. 4, pp. 231-243 (1989). (c) Informs**

This paper is perhaps the most frequently used result in CASTLE Lab. The paper shows how to quickly obtain left and right derivatives with respect to resource constraints using flow augmenting paths. The basic result appears to be "well known" in the research community, but we could not find any documentation of the algorithm or formal writeup of the theory.
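The idea can be illustrated on a toy network. Everything below is a hypothetical example, not the paper's code: given an optimal flow, the right derivative with respect to one more unit of supply is the cost of the cheapest flow augmenting path in the residual network, and the left derivative comes from the cheapest reverse path.

```python
def residual_shortest_path(n, residual_arcs, src, dst):
    """Bellman-Ford over residual arcs (tail, head, unit cost).  Reversed
    arcs carry negative costs, which is safe because an optimal flow
    leaves no negative cycles in the residual network."""
    dist = [float("inf")] * n
    dist[src] = 0.0
    for _ in range(n - 1):
        for u, v, c in residual_arcs:
            if dist[u] + c < dist[v]:
                dist[v] = dist[u] + c
    return dist[dst]

# Hypothetical residual network after optimally sending 2 units 0 -> 1 -> 2
# (arc 0->1 cost 1 and arc 1->2 cost 1 are saturated, so only their reverse
# arcs remain; arc 0->2 cost 3 still has spare capacity).
residual = [(1, 0, -1.0), (2, 1, -1.0), (0, 2, 3.0)]
right_derivative = residual_shortest_path(3, residual, 0, 2)   # cost of one more unit
left_derivative = -residual_shortest_path(3, residual, 2, 0)   # saving from one less unit
```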

**Queueing
theory (back)**

This research is quite old. If you would like a pdf of one of these papers, I can request a copy. Just email me at powell@princeton.edu. If you are interested in queueing theory, I encourage you to work on problems where there is some sort of optimal control dimension, in which case I would not use this older research. Consider using the framework of approximate dynamic programming or simulation optimization for these problems.

Simao, H. and W.B. Powell, "Numerical Methods for Simulating Transient, Stochastic Queueing Networks - I: Methodology," Transportation Science , Vol. 26, No. 4, pp. 296-311 (1992). (c) Informs

Simao, H. and W.B. Powell, "Numerical Methods for Simulating Transient, Stochastic Queueing Networks - II: Experimental Results," Transportation Science , Vol. 26, No. 4, pp. 312-329 (1992). (c) Informs

Simao, H. and W.B. Powell, "Waiting Time Distributions for Bulk Arrival, Bulk Service Queues with Vehicle Holding and Cancellation Strategies," Naval Research Logistics Quarterly, Vol. 34, No. 2, pp. 207-228 (1987).

Simao, H. and W.B. Powell, "Waiting Time Distributions for Transient Bulk Queues with General Vehicle Dispatching Strategies," Naval Research Logistics Quarterly, Vol. 35, pp. 285-306 (1988).

Powell, W.B. and P. Humblet, "The Bulk Service Queue with a General Control Strategy: Theoretical Analysis and a New Computational Procedure," Operations Research, Vol. 34, No. 2, pp. 267-275 (1986). (c) Informs

Powell, W.B. "Iterative Algorithms for Bulk Arrival, Bulk Service Queues with Poisson and non-Poisson Arrivals," Transportation Science, Vol. 20, No. 2, pp. 65-79 (1986). (c) Informs

Powell, W.B. "Approximate, Closed Form Moment Formulas for Bulk Arrival, Bulk Service Queues," Transportation Science, Vol. 20, No. 1, pp. 13-23 (1986). (c) Informs

Powell, W.B. "Analysis of Vehicle Holding and Cancellation Strategies in Bulk Arrival, Bulk Service Queues," Transportation Science, Vol. 19, No. 4, pp. 352-377 (1985). (c) Informs

Powell, W.B. and H. Simao, "Stochastic Properties of Flows in Freight Consolidation Networks," Elsevier, pp. 437-457, 1987. Tenth International Symposium on Transportation and Traffic Theory, (N. Gartner and N. Wilson, eds.)

Powell, W.B.,"Bulk Service Queues with Deviations from Departure Schedules: The Problem of Correlated Headways," Transportation Research, Vol. 17B, No. 3, pp. 221-232, (1982).

**Modeling and
problem representation (back)**

**Marar, A. and W.B. Powell, “Combining Cost-Based and Rule-Based
Knowledge in Complex Resource Allocation Problems,” IIE transactions Vol.
38 (2), pp. 159-172 2006.**

Assume that we wish to solve a (static) resource allocation problem. We have a cost model, but it is not perfect. We also have observations of how resources (each described by a vector of attributes) behave in the field. However, the data from history is typically at a more aggregate level. For example: "team drivers like long loads", "high powered locomotives are best pulling merchandise trains" are examples of aggregate behaviors that we would like to replicate in a model, while also trying to minimize costs.

**Marar, A. and W. B. Powell, “Capturing Incomplete Information
in Resource Allocation Problems through Numerical Patterns,” European
Journal of Operational Research, 2008 (in press). doi:10.1016/j.ejor.2008.06.002**

We have long found that it is useful to use historical (or exogenous) patterns of behavior to guide optimization-based models. This paper specifically addresses the use of exogenous patterns of the form "we do a particular action x percent of the time".

**Powell, W.B., T. Wu, H. P. Simao and A. Whisman, “Using Low Dimensional
Patterns in Optimizing Simulators: An Illustration for the Military Airlift
Problem,” Mathematical and Computer Modeling 29, pp. 657-675 (2004).**

Sometimes a knowledgeable user will look at the results of a model and claim "You cannot send C-141's into Saudi Arabia!". Complaints such as these are instances of low-dimensional patterns, and represent an expression of a desirable behavior that is fairly easy to capture. The problem is that these simplistic rules are not enough to build an entire simulation. In this paper, we draw on the results of Marar to guide the behavior of a simulation model for the military airlift problem.

**Shapiro, J, W.B. Powell and D.E. Bernstein,
“A Flexible Java Representation for Uncertainty in Online Operations Research
Models,” Journal of Computing, Vol. 13, No. 1, pp. 29-55, 2001. (c) Informs.**

Sometimes a fresh perspective on an old problem produces some really nice insights. In this paper, Joel Shapiro observed that the classical way of representing random variables in software was neither accurate mathematically nor elegant from a software engineering perspective. The problem is that it is common in simulation packages to sample a random variable before it is "known." In this paper, he introduces an Information object that captures whether a quantity is known or not, and introduces the use of the event listener paradigm to notify a decision function when the information is updated.
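A minimal Python rendering of the idea (the paper's implementation is in Java; the class and method names here are illustrative):

```python
class Information:
    """A quantity that is explicitly 'unknown' until realized.  Listeners
    registered before realization are notified when the value arrives,
    following the event-listener paradigm."""

    def __init__(self, name):
        self.name = name
        self._value = None
        self._known = False
        self._listeners = []

    def add_listener(self, callback):
        """Register a callback (e.g., a decision function) to be notified."""
        self._listeners.append(callback)

    @property
    def known(self):
        return self._known

    @property
    def value(self):
        if not self._known:
            raise ValueError(f"{self.name} has not been realized yet")
        return self._value

    def realize(self, value):
        """Reveal the value and notify every registered listener."""
        self._value, self._known = value, True
        for callback in self._listeners:
            callback(self)
```

A decision function never samples the quantity itself; it simply reacts when realize() fires, which keeps the distinction between known and unknown information explicit in the code.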

**Powell, W.B., J. Shapiro and H.P. Simao,
“A Representational Paradigm for Dynamic Resource Transformation Problems,”
Annals of Operations Research on Modeling (C. Coullard, R. Fourer, and J. H.
Owen, eds), Vol. 104, pp. 231-279, 2001.**

This paper is the fundamental modeling paradigm for our work. It evolved over a four-year period starting in 1997. As we have worked on harder and more complex problems, it became clear that we needed to focus attention on notation. This paper formally introduces a problem class that we call Dynamic Resource Transformation Problems. (This can be abbreviated DRTP, but we use DRiP since it is more pronounceable.)

**Powell, W.B. "On Languages for Dynamic Resource Scheduling Problems,"
in Fleet Management and Logistics, (T. G. Crainic and G. Laporte, eds.), Kluwer
Academic Publishers, Boston, 1998.**

This paper was written for a speciality conference on routing and scheduling in honor of the 25th anniversary of the Centre de recherche sur les transports. We were asked to think about the future, and it caught me at a time when I was struggling with the problem of modeling complex operational problems. The paper reflected my frustration with the standard modeling paradigms that our community was using. This was the foundational work for the creation of the DRiP problem class.

**"Toward a Unified Framework for Real-Time Logistics Control,"
Military Journal of Operations Research, Vol. I, No. 4, Winter, 1996, pp. 69-79.**

An early application of the DRiP framework for airlift operations.

**General dynamic
resource management (back)
**

Almost all the work in CASTLE Lab can be described as dynamic resource management. However, most of this work has been presented in the contextual domains of fleet management or driver routing and scheduling. In this section, we list papers written using the general framework of the DRTP paradigm, which can be viewed as general-purpose dynamic resource management problems. The major paper summarizing this modeling framework is the first paper below; virtually all the others represent applications of this modeling framework:

Powell, W.B., J. Shapiro and H.P. Simao, “A Representational Paradigm for Dynamic Resource Transformation Problems,” Annals of Operations Research on Modeling (C. Coullard, R. Fourer, and J. H. Owen, eds), Vol. 104, pp. 231-279, 2001.

This paper is the fundamental modeling paradigm for our work. It evolved over a two-year period starting in 1997. As we have worked on harder and more complex problems, it became clear that we needed to focus attention on notation. This paper formally introduces a problem class that we call Dynamic Resource Transformation Problems. (This can be abbreviated DRTP, but we use DRiP since it is more pronounceable.)

Wu, Tongqiang, W.B. Powell and A. Whisman, “The Optimizing-Simulator: An Illustration using the Military Airlift Problem,” ACM Transactions on Modeling and Simulation, Vol. 19, No. 3, Issue 14, pp. 1-31 (2009).

There is a growing sense that the modeling of complex problems in transportation and logistics requires drawing on the skills of both mathematical programming and simulation. This paper accomplishes this, combining the modeling and algorithmic framework of approximate dynamic programming with pattern matching and proximal point algorithms. The paper is organized along the theme of modeling not just the physical process, but also the organization and flow of information and decisions. The paper proposes that a decision function can be a modeling choice, and shows how you can use four classes of information to build a family of decision functions.

Cheung, R. K.-M., N. Shi, W. B. Powell, and H. P. Simao, “An Attribute-Decision Model for Cross-Border Drayage Problem,” Transportation Research E: Logistics and Transportation Review, Volume 44, No. 2, pp. 217-234 (2008). (c) Elsevier

This paper illustrates a modeling and algorithmic strategy for multi-layered resource scheduling, using the context of the cross-border drayage problem.

Shapiro, J. and W.B. Powell, “A Metastrategy for Dynamic Resource Management Problems based on Informational Decomposition,” Informs Journal on Computing, Vol. 18, No. 1, pp. 43-60 (2006). (c) Informs.

We do a lot of work on very large scale problems in transportation. These are typically characterized by multiagent decision making, which also means multiagent information structures. This paper provides a fairly general model of the organization of information and decisions, and a method for decomposing and linking these decisions so that the individual agents, operating separately, produce near optimal overall solutions.

Topaloglu, H. and W.B. Powell, “Dynamic Programming Approximations for Stochastic, Time-Staged Integer Multicommodity Flow Problems,” Informs Journal on Computing, Vol. 18, No. 1, pp. 31-42 (2006). (c) Informs.

This paper applies the technique of separable, piecewise linear approximations to multicommodity flow problems. This technique worked very well for single commodity problems, but it was not at all obvious that it would work well for multicommodity problems, since there are more substitution opportunities. The experimental comparisons against multistage nested Benders (which is very slow) and more classical rolling horizon procedures suggests that it works very well indeed. This paper also provides a more rigorous treatment of what is known as the "multiperiod travel time" problem, and provides a formal development of a procedure for accelerating convergence.

Powell, W.B., “Dynamic Models of Transportation Operations,” Handbook in Operations Research and Management Science: Supply Chain Management, (S. Graves and T. A. G. de Kok, eds.), (to appear).

This chapter provides an overview of problems in freight transportation, and a modeling and algorithmic strategy for solving these problems. For students looking for a single reference that covers the DRTP modeling framework and our approximate dynamic programming strategies, this is a good overview.

Powell, W.B., J. Shapiro and H. P. Simao, "An Adaptive, Dynamic Programming Algorithm for the Heterogeneous Resource Allocation Problem," Transportation Science, Vol. 36, No. 2, pp. 231-249 (2002). (c) Informs.

This paper provides a general model for a two-layer DRiP with heterogeneous resources and tasks. The technique was applied to a driver scheduling problem involving 5,000 drivers and 30,000 loads.

Godfrey, G. and W.B. Powell, "An Adaptive, Distribution-Free Algorithm for the Newsvendor Problem with Censored Demands, with Application to Inventory and Distribution Problems," Management Science, Vol. 47, No. 8, pp. 1101-1112, (2001). (c) Informs.

This paper proposes a novel approximation procedure for stochastic resource allocation problems. The technique works with a series of stochastic gradients which are then used to maintain a concave estimate of the value function. When applied to problems with known optimal solutions, the method provides results within a fraction of a percent of optimal. One byproduct is a fast algorithm for the newsvendor problem that does not require knowledge of the underlying demand distribution. The technique is also applied to separable distribution problems.

Godfrey, G. and W.B. Powell, “An Adaptive Dynamic Programming Algorithm for Dynamic Fleet Management, I: Single Period Travel Times,”Transportation Science, Vol. 36, No. 1, pp. 21-39 (2002). (c) Informs

Godfrey, G. and W.B. Powell, “An Adaptive Dynamic Programming Algorithm for Dynamic Fleet Management, II: Multiperiod Travel Times,”Transportation Science, Vol. 36, No. 1, pp. 40-54 (2002). (c) Informs

This paper adapts the CAVE algorithm to stochastic multistage problems. The paper demonstrates both rapid convergence of the algorithm as well as very high quality solutions. We found that the use of nonlinear approximations was complicated by the presence of multiperiod travel times (a problem that does not arise when we use linear approximations).

**Multiagent control (back)**

Powell, W. B., A. George, J. Berger, A. Boukhtouta, “An adaptive-learning Framework for Semi-cooperative Multi-agent coordination,” SSCI 2011 ADPRL - 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Paris, April, 2011.

Complex problems involving multiple agents exhibit varying degrees of cooperation. The levels of cooperation might reflect both differences in information as well as differences in goals. In this research, we develop a general mathematical model for distributed, semi-cooperative planning and suggest a solution strategy which involves decomposing the system into subproblems, each of which is specified at a certain period in time and controlled by an agent. The agents communicate marginal values of resources to each other, possibly with distortion. We design experiments to demonstrate the benefits of communication between the agents and show that, with communication, the solution quality approaches that of the ideal situation where the entire problem is controlled by a single agent.

Topaloglu, H. and W.B. Powell, “A Distributed Decision-Making Structure for Dynamic Resource Allocation with Nonlinear Functional Approximations,” Operations Research, Vol. 53, No. 2, pp. 281-297 (2005) (c) Informs

Many complex problems in transportation are characterized by multiple decision-makers who control their own decisions with their own information. Coordinating these decision makers is a well-recognized challenge. This paper proposes to use a nonlinear approximation architecture, where the impact of decisions to move resources from one part of the network to another is captured using a nonlinear function. This introduces unique complications that do not arise with more traditional linear architectures, but produces much better results.

Shapiro, J. and W.B. Powell, “A Metastrategy for Dynamic Resource Management Problems based on Informational Decomposition,” Informs Journal on Computing, Vol. 18, No. 1, pp. 43-60 (2006). (c) Informs.

We do a lot of work on very large scale problems in transportation. These are typically characterized by multiagent decision making, which also means multiagent information structures. This paper provides a fairly general model of the organization of information and decisions, and a method for decomposing and linking these decisions so that the individual agents, operating separately, produce near optimal overall solutions.

**Energy systems analysis
(back)**

**Michael Coulon, Javad Khazaei, W. B. Powell, “SMART-SREC: A stochastic model of the New Jersey solar renewable energy certificate market,” Journal of Environmental Economics and Management, Vol. 73, pp. 13-31, 2015.**

New Jersey uses targets to encourage the use of solar energy by setting up a market for solar renewable energy certificates. This paper develops a dynamic programming model of these markets and calibrates the model against historical data for New Jersey.

**D. Jiang, W. B. Powell, “Optimal Hour-ahead Bidding in the Real-Time Electricity Market with Battery Storage Using Approximate Dynamic Programming” Informs J. on Computing, Vol. 27, No. 3, pp. 525-543 (2015).**

The hour-ahead bidding problem requires setting a low (buy) and high (sell) bid that determines how a battery is managed one hour in the future, which means that the post-decision state has to include the bids we just determined, along with other variables such as energy in storage, prices and exogenous energy sources. The dynamic program features a value function that is monotone in all the state variables, a property that is exploited in the design of an optimal algorithm.
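The monotonicity-preserving update can be sketched in one dimension (a simplification of the paper's multidimensional state; the projection rule and names are mine): after smoothing a new observation into one state, neighboring values are clipped so the estimate stays monotone.

```python
def monotone_update(values, i, observation, stepsize=0.5):
    """Smooth an observed value into V[i], then project so that V remains
    nondecreasing: states below i are capped at V[i], states above i are
    floored at V[i]."""
    v = list(values)
    v[i] = (1 - stepsize) * v[i] + stepsize * observation
    for j in range(i - 1, -1, -1):
        v[j] = min(v[j], v[i])
    for j in range(i + 1, len(v)):
        v[j] = max(v[j], v[i])
    return v
```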

**M. Coulon, W. B. Powell, R. Sircar, “A Model for Hedging Load and Price Risk in the Texas Electricity Market,” Energy Economics, Vol. 40, pp. 976-988, 2013.**

We develop a joint model of prices and loads for the Texas electricity market. The model captures the relationship between prices and loads, as well as a variety of seasonal factors.

**Hugo Simao, H. B. Jeong, B. Defourny, W. B. Powell, A. Boulanger, A. Gagneja, L. Wu, R. Anderson, “A Robust Solution for the Load Curtailment Problem,” IEEE Transactions on the Smart Grid, Vol. 4, No. 4, pp. 2209-2219 (2013).**

We develop a load curtailment system for New York City using approximate dynamic programming so that we can issue advance notifications to building operators to reduce their demands on the grid before failures have actually happened. The paper compares a cost function approximation (CFA) to a policy based on a value function approximation (VFA).

**J. Enders, W. B. Powell and D. Egan,"Two-Stage Stochastic Program for the Allocation of High-Voltage Transformer Spares in the Electric Grid", Handbook of Wind Power Systems I, (Sorokin, A.; Rebennack, S.; Pardalos, P.M.; Iliadis, N.A.; Pereira, M.V.F. (Eds.)), Springer, New York, pp. 435-466, 2012.**

PJM faced the problem of allocating spare high-voltage transformers in locations that make it possible to respond quickly to failures. This is an integer, two-stage stochastic optimization problem. We solve the problem by approximating the second stage using piecewise linear value functions.

**Jae Ho Kim, W. B. Powell, “Optimal Energy Commitments with Storage and Intermittent Supply,” Operations Research, Vol. 59, No. 6, pp. 1347-1360 (2011).**

We develop a simple, analytic policy that determines how much energy we can commit to delivering from a wind farm in a day-ahead (or hour-ahead) market, in the presence of a finite capacity storage device with energy conversion losses. Our policy is derived assuming that wind is uniformly distributed between known bounds, but experiments with actual wind show that the approximation is quite robust. The policy is sensitive to the variability of the wind forecast, the capacity of the storage device and the conversion losses. We then use the policy to estimate the value of storage.

**Kim, J. H. and W. B. Powell, “An hour-ahead prediction model for heavy-tailed spot prices,” Energy Economics (2011), doi:10.1016/j.eneco.2011.06.007**

We propose an hour-ahead prediction model for electricity prices that captures the heavy-tailed behavior that we observe in the hourly spot market in the ERCOT (Texas) and PJM West hub grids. We present a model in which we separate the price process into a thin-tailed trailing-median process and a heavy-tailed residual process whose probability distribution can be approximated by a Cauchy distribution. We show empirical evidence that supports our model.
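The two-part structure can be simulated in a few lines. All numbers below (window length, scale, base price) are made up for illustration; the Cauchy residual is drawn with the standard inverse-CDF (tangent) transform.

```python
import math
import random
import statistics

def simulate_prices(n=200, window=24, scale=5.0, base=30.0, seed=1):
    """Toy version of the two-part model: each price is a thin-tailed
    trailing-median component plus a heavy-tailed Cauchy residual."""
    rng = random.Random(seed)
    prices = [base] * window               # warm-up history
    for _ in range(n):
        median = statistics.median(prices[-window:])
        residual = scale * math.tan(math.pi * (rng.random() - 0.5))  # Cauchy draw
        prices.append(median + residual)
    return prices[window:]
```

Because the residual is Cauchy, occasional draws are enormous relative to the median level, mimicking the price spikes seen in hourly spot markets.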

**Powell, W. B., George, A., A. Lamont and J. Stewart, “SMART: A Stochastic Multiscale Model for the Analysis of Energy Resources, Technology and Policy,” Informs Journal on Computing, Vol. 24, No. 4, pp. 665-682 (2012).**

We address the problem of modeling energy resource allocation, including dispatch, storage and the long-term investments in new technologies, capturing different sources of uncertainty such as energy from wind, demands, prices and rainfall. We also wish to model long-term investment decisions in the presence of uncertainty. Accurately modeling the value of all investments such as wind and solar requires handling fine-grained temporal variability and uncertainty in wind and solar in the presence of storage. We propose a modeling and algorithmic strategy based on the framework of approximate dynamic programming (ADP) that can model these problems at hourly time increments over an entire year, or over several decades. We demonstrate the methodology using both spatially aggregate and disaggregate representations of energy supply and demand. This paper describes initial proof of concept experiments for an ADP-based model, called SMART, by describing the modeling and algorithmic strategy, and providing comparisons against a deterministic benchmark as well as initial experiments on stochastic datasets.

**Anderson, B. R. N., Boulanger, A., Powell, W. B., & Scott, W. Adaptive Stochastic Control for the Smart Grid. Proceedings of the IEEE, 99(6), 1098-1115, 2011.**

This paper explores a number of stochastic control challenges that arise in smart grid applications, and introduces algorithmic strategies under the general umbrella of approximate dynamic programming.

**J. Nascimento, W. B. Powell, “An Optimal Approximate Dynamic
Programming Algorithm for the Energy Dispatch Problem with Grid-Level Storage,”
**

This paper proves convergence of the approximate dynamic programming algorithm used within the SMART model for optimizing the use of hydroelectric storage for a fine-grained (hourly) energy dispatch model. The ADP algorithm uses a piecewise linear functional approximation, and does not require any explicit exploration. This is useful in an energy setting because the value function is discretized into thousands of segments.

**L. Hannah, W. B. Powell, and J. Stewart, “One-Stage R&D Portfolio
Optimization with an Application to Solid Oxide Fuel Cells,” Energy Systems
Journal, Vol. 1, No. 1, 2010.**

We propose and test an algorithm for large R&D portfolio optimization problems, using solid oxide fuel cells as the test application. The logic handles large portfolios, such as finding the best 30 proposals out of 100 (or the best 100 out of 300). The algorithm is a heuristic, but is compared against two competing algorithms (one of which is provably convergent). We show that our new algorithm, based on the principle of stochastic gradient optimization, reliably produces lower cost designs than the competing algorithms.

**J. Enders, W. B. Powell, D. Egan, “A Dynamic Model for the Failure
Replacement of Aging High-Voltage Transformers,” Energy Systems Journal,
Vol. 1, No. 1, pp. 31-59 (2010) (c) Springer**

This paper introduces a model that supports transformer replacement planning by optimizing the acquisition and deployment of high-voltage transformers dynamically over time. We formulate the problem as a Markov decision process, which cannot be solved for realistic problem instances. Instead we solve the problem using approximate dynamic programming with three different value function approximations, which are compared against an optimal solution for a simplified version of the problem. The methods include a separable, piecewise linear value function, a piecewise linear two-dimensional approximation, and a piecewise linear function based on an aggregated inventory, which is shown to produce solutions within a few percent of optimal with very fast convergence. The application of the best performing algorithm to a realistic problem instance gives insights into transformer management issues of practical interest.

**Enders, J., W. B. Powell and D. Egan “Robust Policies for the
Transformer Acquisition and Allocation Problem,” **

This paper compares a replacement policy derived using approximate dynamic programming to a replacement policy proposed by a major regional transmission organization based on classical failure analysis. The analysis shows that approximate dynamic programming produces a policy that offers lower cost as well as lower risk.

**Ryzhov, I., W. B. Powell, “A Monte-Carlo Knowledge Gradient Method
for Learning Abatement Potential of Emissions Reduction Technologies,”
Winter Simulation Conference, 2009. M. D. Rossetti, R. R. Hill, B. Johansson,
A. Dunkin, and R. G. Ingalls, eds, 2009, pp. 1492-1502.**

**Dynamic
fleet management (trucking, rail, air)(back)**

W.B. Powell, B. Bouzaiene-Ayari, Clark Cheng, Sourav Das, Ricardo Fiorillo, Coleman Lawrence, "Locomotive Planning at Norfolk Southern using Approximate Dynamic Programming," Interfaces (to appear). (c) Informs

This paper describes the implementation of PLASMA at the Norfolk Southern Railroad. Click here for more details.

Belgacem Bouzaiene-Ayari, C. Cheng, S. Das, R. Fiorillo, W.B. Powell, “From Single Commodity to Multiattribute Models for Locomotive Optimization: A Comparison of Integer Programming and Approximate Dynamic Programming,” Transportation Science, Appeared online, August 4, 2014. http://dx.doi.org/10.1287/trsc.2014.0536 (c) Informs

The paper compares three optimization models for the locomotive planning problem for Norfolk Southern railroad: an aggregate model that is used in production to provide high level guidance on locomotive flows, a more detailed multicommodity flow formulation (which could be solved only over a three day horizon), and a very detailed model based on the modeling and algorithmic framework of approximate dynamic programming. The ADP-based formulation, called PLASMA, has been used by NS for a number of years for strategic fleet planning, and for daily operational planning. NS (as of this writing) only uses the deterministic version of the code, but this paper describes experiments using stochastic travel times (a major issue at North American freight railroads) and demonstrates that ADP produces much more robust solutions. Click here for more details.

Powell, W.B., Belgacem Bouzaiene-Ayari, Clark Cheng, Ricardo Fiorillo, Sourav Das, "Strategic, Tactical and Real-Time Planning of Locomotives at Norfolk Southern using Approximate Dynamic Programming,” ASME Conference, Philadelphia, April, 2012.

This paper summarizes a 10-year project to develop a production locomotive planning system for Norfolk Southern Railroad. The effort uses the modeling and algorithmic technology of approximate dynamic programming, which combines the power of mathematical programming and simulation. This makes it possible to model locomotives at a high level of detail, while simultaneously using optimization principles to give the model intelligent behavior. The system handles foreign power and shop routing, including congestion at yards. While Norfolk Southern runs the model in its deterministic form, we demonstrate the power of ADP to handle randomness in transit times and yard delays, producing results that are much more robust.

(Click here to download paper)- short version published in the proceedings of the ASME

(Click here to download paper)- long version (pre-publication)

Wu, Tongqiang, W.B. Powell and A. Whisman, “The Optimizing-Simulator: An Illustration using the Military Airlift Problem,” ACM Transactions on Modeling and Computer Simulation, Vol. 19, No. 3, Issue 14, pp. 1-31 (2009).

There is a growing sense that the modeling of complex problems in transportation and logistics requires drawing on the skills of both mathematical programming and simulation. This paper accomplishes exactly that, combining the modeling and algorithmic framework of approximate dynamic programming with pattern matching and proximal point algorithms. The paper is organized along the theme of modeling not just the physical process, but also the organization and flow of information and decisions. The paper proposes that a decision function can be a modeling choice, and shows how four classes of information can be used to build a family of decision functions.

Simao, H. P., J. Day, A. George, T. Gifford, W. B. Powell, “An Approximate Dynamic Programming Algorithm for Large-Scale Fleet Management: A Case Application,” Transportation Science, published online doi 10.1287/trsc.1080.0238 (c) Informs

This is a major application paper, which summarizes several years of development to produce a model, based on approximate dynamic programming, that closely matches historical performance. The model represents drivers with 15 attributes, capturing domicile, equipment type, days from home, and all the rules (including the 70-hour-in-eight-days rule) governing drivers. The model gets drivers home on weekends on a regular basis (again, closely matching historical performance). The value functions produced by the ADP algorithm are shown to accurately estimate the marginal value of drivers by domicile.

(click here to download paper) See also the companion paper below:

Simao, H. P. A. George, Warren B. Powell, T. Gifford, J. Nienow, J. Day, "Approximate Dynamic Programming Captures Fleet Operations for Schneider National," Interfaces, Vol. 40, No. 5, pp. 342-352, 2010. (c) Informs

This paper is a lite version of the paper above, submitted for the Wagner competition. This paper does with pictures what the paper above does with equations.

Topaloglu, H. and W. B. Powell, “Sensitivity Analysis of a Dynamic Fleet Management Model Using Approximate Dynamic Programming,” Operations Research, Vol. 55, No. 2, pp. 319-331 (2007). (c) Informs

We present tractable algorithms to assess the sensitivity of a stochastic dynamic fleet management model to fleet size and load availability. In particular, we show how to compute the change in the objective function value in response to an additional vehicle or an additional load introduced into the system. The novel aspect of our approach is that it does not require multiple simulations with different values of the model parameters, and in this respect it differs from trial-and-error-based “what-if” analyses. Numerical experiments show that the proposed methods are accurate and computationally attractive.

Topaloglu, H. and W.B. Powell, “Incorporating Pricing Decisions into the Stochastic Dynamic Fleet Management Problem,” Transportation Science, Vol. 41, No. 3, pp. 281-301 (2007). (c) Informs

There have been many papers addressing the problem of efficiently managing fleets of vehicles. This paper addresses the problem from the perspective of pricing. Using approximate dynamic programming to optimize the flows of vehicles, we derive gradient information that allows us to adjust the prices at the same time. The model requires that we have a model of how demand responds to price. The paper shows that we obtain near-optimal prices while we simultaneously optimize flows.

Powell, W.B. and H. Topaloglu, “Fleet Management,” in Applications of Stochastic Programming, (S. Wallace and W. Ziemba, eds.), Math Programming Society - SIAM Series in Optimization, Philadelphia, pp. 185-216, 2005.

Stochastic programming has been steadily making its way into a variety of application areas. This chapter demonstrates the use of a class of stochastic programming methods for fleet management applications, using the freight car distribution problem as a problem context.

Transportation Science, Vol. 36, No. 1, pp. 21-39 (2002). (c) Informs

Transportation Science, Vol. 36, No. 1, pp. 40-54 (2002). (c) Informs

Carvalho, T. and W.B. Powell, "A Multiplier Adjustment Method for Dynamic Resource Allocation Problems," Transportation Science, Vol. 34, No. 2, pp. 150-164 (2000). (c) Informs.

This paper gives our best results using linear value functions. The algorithm ("Lama" = linear approximation, multiplier adjustment) uses a multiplier adjustment procedure to achieve the highest level of precision. The method is not suitable for stochastic problems.

Powell, W.B. and T. Carvalho, "Dynamic Control of Logistics Queueing Networks for Large Scale Fleet Management," Transportation Science, Vol. 32, No. 2, pp. 90-109, 1998. (c) Informs.

This paper launched an entirely new line of investigation into dynamic fleet management, using adaptively estimated linear value functions. Linear value functions by themselves are too unstable, so we used upper bounds to keep the flow from bouncing around too much. Since this paper, we have investigated the use of auxiliary functions to stabilize the flows, and nonlinear value function approximations. Both of these approaches appear to be quite promising.

Powell, W.B. and T. Carvalho, "Real-Time Optimization of Containers and Flatcars for Intermodal Operations," Transportation Science, Vol. 32, No. 2, pp. 110-126, 1998. (c) Informs.

This paper demonstrates the flexibility of the "LQN" modeling framework for solving a very complex problem arising in railroads. The challenge here was the assignment of trailers and containers to flatcars. The problem was that the rules for determining what trailer/container combinations could be assigned to a particular flatcar were quite complex. We used our ability to break the entire problem into a sequence of very small subproblems, representing each terminal at each point in time. Here, we were able to use a very precise, but computationally demanding, algorithm that captured the rules for capacity allocation with a high level of detail. The resulting algorithm was quite tractable, and we showed how the downstream gradients provided significantly better overall performance.

Powell, W.B. "A Stochastic Formulation of the Dynamic Assignment Problem, with an Application to Truckload Motor Carriers," Transportation Science. Vol. 30, No. 3, pp. 195-219 (1996). (c) Informs.

A summary of my work, primarily from the 1980's, in truckload trucking. This was the first model to integrate the load matching problem with forecasting, using a neat nonlinear approximation of the future. The paper summarizes the research of several masters theses, which investigated the impact of advance information on system performance.

Powell, W.B., T. Carvalho, G. Godfrey and H. Simao, "Dynamic Fleet Management as a Logistics Queueing Network," Annals of Operations Research, Vol. 6, pp. 165-188 (1995). (c) Informs.

Our very earliest effort in this direction. If this topic seems interesting, look at (click here)

Carvalho, T. and W.B. Powell, "Dynamic Control of Multicommodity Logistics Queueing Networks," European Journal of Operational Research, Vol. 98, No. 3, 1997.

This was an attempt to apply the LQN-methodology (basically, linear approximations of the value of the future) to multicommodity problems. It is not trivial. Our current thinking is to use the nonlinear approximations of Topaloglu.

Cheung, R. K.-M. and W.B. Powell, "An Algorithm for Multistage Dynamic Networks with Random Arc Capacities, with an Application to Dynamic Fleet Management," Operations Research, Vol. 44, No. 6, pp. 951-963. (1996). (c) Informs.

Here we use the static, nonlinear approximations developed for stochastic trees in the context of multistage problems. The best available at the time, but even better results yet to come (see the work of Topaloglu).

Frantzeskakis, L. and W. B. Powell, "A Successive Linear Approximation Procedure for Stochastic, Dynamic Vehicle Allocation Problems," Transportation Science, Vol. 24, No. 1, pp. 40 - 57 (1990). (c) Informs.

Powell, W.B. and L. Frantzeskakis, "Restricted Recourse Strategies for Dynamic Networks with Random Arc Capacities," Transportation Science, Vol. 64, pp. 261-287, 1996. (c) Informs

Powell, W.B. "A Comparative Review of Alternative Algorithms for the Dynamic Vehicle Allocation Problem," Vehicle Routing: Methods and Studies (B. Golden and A. Assad, eds.), North Holland, pp. 249-291 (1988).

This was the first paper to formulate the dynamic vehicle allocation problem as a multistage stochastic network with random arc capacities. This insight was the foundation of the work on approximations and bounding by Frantzeskakis, and the work on tree recourse by Cheung.

Powell, W.B., "An Operational Planning Model for the Dynamic Vehicle Allocation Problem with Uncertain Demands," Transportation Research, Vol. 21B, No. 3, pp. 217-232 (1987).

This is the first paper I wrote that used a separable approximation to produce piecewise linear recourse functions to capture the value of trucks in a region. The beauty of this paper is that it was able to produce a computationally tractable formulation that combined fleet management, which is a network problem, and the forecasts of future activities. It took me 15 years to really close the loop on this work with the paper:

Powell, W.B., A. Ruszczynski and H. Topaloglu, “Learning Algorithms for Separable Approximations of Stochastic Optimization Problems,” Mathematics of Operations Research, Vol 29, No. 4, pp. 814-836 (2004). (c) Informs.

which proved some key theorems and relaxed the core assumption of the separability of the objective function.

Hughes, R. and W.B. Powell, "Mitigating End Effects in the Dynamic Vehicle Allocation Model," Management Science, Vol. 34, No. 7, pp. 859-879 (1988). (c) Informs.

An extension of work by Grinold and Hopkins to the fleet assignment problem.

Powell, W.B., "A Stochastic Model of the Dynamic Vehicle Allocation Problem," Transportation Science Vol. 20, pp. 117-129 (1986).

My first stochastic fleet management paper - this paper extends (and fills some holes in) the seminal paper by Jordan and Turnquist for empty freight car distribution. (click here to download paper)

**Military airlift (back)**

Powell, W. B., Belgacem Bouzaiene-Ayari, Jean Berger, Abdeslem Boukhtouta, Abraham P. George, “The Effect of Robust Decisions on the Cost of Uncertainty in Military Airlift Operations,” ACM Transactions on Modeling and Computer Simulation, Vol. 22, No. 1, pp. 39-57 (2011), DOI: 10.1145/2043635.2043636.

This paper shows that approximate dynamic programming can produce robust strategies in military airlift operations. Simulations are run using randomness in demands and aircraft availability. When demands are uncertain, we vary the degree to which the demands become known in advance. The results show that if we allocate aircraft using approximate dynamic programming, the effect of uncertainty is significantly reduced. These results call into question simulations that examine the effect of advance information which do not use robust decision-making, a property that we feel reflects natural human behavior.

Wu, Tongqiang, W.B. Powell and A. Whisman, “The Optimizing-Simulator: An Illustration using the Military Airlift Problem,” ACM Transactions on Modeling and Computer Simulation, Vol. 19, No. 3, Issue 14, pp. 1-31 (2009).

There is a growing sense that the modeling of complex problems in transportation and logistics requires drawing on the skills of both mathematical programming and simulation. This paper accomplishes exactly that, combining the modeling and algorithmic framework of approximate dynamic programming with pattern matching and proximal point algorithms. The paper is organized along the theme of modeling not just the physical process, but also the organization and flow of information and decisions. The paper proposes that a decision function can be a modeling choice, and shows how four classes of information can be used to build a family of decision functions.

**Health and medical
applications**

**Miao He, L. Zhao, W. B. Powell, “Approximate Dynamic Programming Algorithms for Optimal Dosage Decisions in Controlled Ovarian Hyperstimulation,” European J. of Operational Research, Vol. 222, No. 2, October 2012, pp. 328–340**

In controlled ovarian hyperstimulation (COH) treatment, clinicians monitor the patients' physiological responses to gonadotropin administration to trade off pregnancy probability against the risk of ovarian hyperstimulation syndrome (OHSS). We formulate the dosage control problem in the COH treatment as a stochastic dynamic program and design approximate dynamic programming (ADP) algorithms to overcome the well-known curses of dimensionality in Markov decision processes (MDPs). Our numerical experiments indicate that the piecewise linear (PWL) approximation ADP algorithms can obtain policies that are very close to the one obtained by the MDP benchmark with significantly less solution time.

**Ryzhov, I., W. B. Powell, “A Monte-Carlo Knowledge Gradient Method
for Learning Abatement Potential of Emissions Reduction Technologies,”
Winter Simulation Conference, 2009. M. D. Rossetti, R. R. Hill, B. Johansson,
A. Dunkin, and R. G. Ingalls, eds, 2009, pp. 1492-1502.**

Suppose that we have a set of emissions reduction technologies whose greenhouse gas abatement potential is unknown, and we wish to find an optimal portfolio (subset) of these technologies. Due to the interaction between technologies, the effectiveness of a portfolio can only be observed through expensive field implementations. We view this problem as an online optimal learning problem with correlated prior beliefs, where the performance of a portfolio of technologies in one project is used to guide choices for future projects. Given the large number of potential portfolios, we propose a learning policy which uses Monte Carlo sampling to narrow down the choice set to a relatively small number of promising portfolios, and then applies a one-period look-ahead approach using knowledge gradients to choose among this reduced set.

(click here to download paper)
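The one-period look-ahead using knowledge gradients can be illustrated for the simpler case of independent normal beliefs; the paper itself works with correlated beliefs and first narrows the choice set by Monte Carlo sampling. All of the numbers below (means, variances, noise level) are hypothetical inputs, not data from the paper.

```python
import math

def kg_factor(z):
    """f(z) = z*Phi(z) + phi(z): the standard normal knowledge-gradient function."""
    phi = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return z * Phi + phi

def knowledge_gradient(mu, sigma2, lam):
    """One-period look-ahead values for independent normal beliefs.
    mu: posterior means, sigma2: posterior variances, lam: measurement noise variance."""
    values = []
    for x in range(len(mu)):
        # predictive std. dev. of the updated mean after one more measurement of x
        sigma_tilde = sigma2[x] / math.sqrt(sigma2[x] + lam)
        best_other = max(mu[y] for y in range(len(mu)) if y != x)
        z = -abs(mu[x] - best_other) / sigma_tilde
        values.append(sigma_tilde * kg_factor(z))
    return values

# Three hypothetical portfolios: the middle one looks best but is already
# measured precisely, so the KG policy prefers to probe an uncertain alternative.
kg = knowledge_gradient(mu=[1.0, 1.2, 0.8], sigma2=[4.0, 0.1, 4.0], lam=1.0)
```

The point of the example is that the alternative with the highest estimated mean is not necessarily the one worth measuring next: learning value depends on both uncertainty and how close the alternative is to the current best.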

**Vehicle routing
and scheduling (back)**

**Cheung, R. K.-M., N. Shi, W. B. Powell, and H. P. Simao, “An
Attribute-Decision Model for Cross-Border Drayage Problem,” Transportation
Research E: Logistics and Transportation Review, Volume 44, No. 2, pp. 217-234
(2008). (c) Elsevier**

This paper illustrates a modeling and algorithmic strategy for multi-layered resource scheduling, using the context of the cross-border drayage problem.

**Spivey, M. and W.B. Powell, “The Dynamic Assignment Problem,”
Transportation Science, Vol. 38, No. 4, pp. 399-419 (2004). (c) Informs.**

This paper proposes a general model for the dynamic assignment problem, which involves the assignment of resources to tasks over time, in the presence of potentially several streams of information processes. It proposes an adaptive learning model that produces non-myopic behavior, and suggests a way of using hierarchical aggregation to reduce statistical errors in the adaptive estimation of the value of resources in the future. Finally, it reports on a study on the value of advance information. Past studies of this topic have used myopic models where advance information provides a major benefit over no information at all. Our model uses adaptive learning to bring forecast information into decisions made now, providing a more realistic estimate of the value of future information.

**Powell, W.B., M.T. Towns and A. Marar, "On
the Value of Globally Optimal Solutions for Dynamic Routing and Scheduling Problems,"
Transportation Science, Vol. 34, No. 1, pp. 50-66 (2000). (c) Informs.**

This paper addresses the issue of user noncompliance - what happens when the user does not do what the model recommends. Not surprisingly, the value of global optimization methods is diminished. Perhaps more surprisingly, suboptimal algorithms which are more greedy in nature work even better than an optimal solution.

**Powell, W.B., W. Snow and R. K.-M. Cheung, "Adaptive
Labeling Algorithms for the Dynamic Assignment Problem," Transportation
Science, Vol. 34, No. 1, pp. 67-85 (2000). (c) Informs.**

**Chen, Z.L. and W.B. Powell, "A Generalized Threshold Algorithm
for the Shortest Path Problem with Time Windows," in Network Design: Connectivity
and Facilities (P. Pardalos, D. Du, eds.), pp. 303-318 (1998).**

We look at the resource constrained shortest path problem (by Desrochers, Desrosiers and Soumis) and show that Klingman's threshold algorithm is faster in side-by-side comparisons of code programmed by the same programmer. This is yet another experimental demonstration that the algorithm with the better complexity bound (label setting) is not quite as good as label correcting algorithms (the threshold algorithm is a member of this class).
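For readers unfamiliar with the distinction, a generic label-correcting pass, of which the threshold algorithm is a refinement that manages the candidate list in two partitions, looks like the sketch below (plain FIFO candidate list, illustrative graph; this is not the paper's implementation).

```python
from collections import deque

def label_correcting(n, arcs, source):
    """Generic label-correcting shortest paths with a FIFO candidate list.
    arcs: list of (i, j, cost) with nonnegative costs; returns distance labels."""
    adj = [[] for _ in range(n)]
    for i, j, cost in arcs:
        adj[i].append((j, cost))
    dist = [float("inf")] * n
    dist[source] = 0.0
    queue = deque([source])
    in_queue = [False] * n
    in_queue[source] = True
    while queue:
        i = queue.popleft()
        in_queue[i] = False
        for j, cost in adj[i]:
            if dist[i] + cost < dist[j]:   # correct j's label and requeue it
                dist[j] = dist[i] + cost
                if not in_queue[j]:
                    queue.append(j)
                    in_queue[j] = True
    return dist

dist = label_correcting(4, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 5.0), (2, 3, 2.0)], 0)
```

Unlike label-setting (Dijkstra), a node's label may be corrected several times, but no priority queue is maintained, which is exactly the trade-off the experiments in the paper measure.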

**Powell, W.B. "A Stochastic Formulation of the Dynamic Assignment
Problem, with an Application to Truckload Motor Carriers," Transportation
Science. Vol. 30, No. 3, pp. 195-219 (1996). (c) Informs.**

A summary of my work, primarily from the 1980's, in truckload trucking. This was the first model to integrate the load matching problem with forecasting, using a neat nonlinear approximation of the future. The paper summarizes the research of several masters theses, which investigated the impact of advance information on system performance.

**Koskosidis, I A., W.B. Powell and M. Solomon, "An Optimization-Based
Heuristic for Vehicle Routing and Scheduling with Soft Time Window Constraints,"
Transportation Science, Vol. 26, No. 2, pp. 69-85, 1992. (c) Informs**

**Powell, W.B. and D. Gittoes, "An Approximate Labeling Algorithm
for the Dynamic Assignment Problem," Advanced Methods in Transportation
Analysis, (L. Bianco, P. Toth, M. Bielli, eds.), Springer, New York, pp. 547-584,
1996.**

**Koskosidis, I.A. and W.B. Powell, "Clustering Algorithms for Consolidation
of Customer Orders into Vehicle Shipments," Transportation Research, Vol.
26B, No. 5, pp. 365-379, 1992.**

(Click here to download paper)

**Service
network design (back)**

*Service network design covers the problem of when to move a truck that can
carry multiple shipments. In a static setting, this produces large integer programming
problems. In a dynamic setting, it creates batch service processes.*

**Dall’Orto, L. C., T. G. Crainic, J. E. Leal and W. B. Powell,
“The Single-Node Dynamic Service Scheduling and Dispatching Problem,”
European Journal of Operational Research, Vol. 170, No. 1, pp. 1-23 (2005).**

Network design problems represent one of the hardest classes of integer programming problems. Time-staged versions of these problems are even harder. In this paper we extend the approximate dynamic programming strategy introduced by Katerina Papadaki for the single link problem to a problem where trucks are being dispatched out of a single node to multiple destinations. We have to figure out where to send trucks, and what freight to put on each truck. A truck going to destination j may carry freight to a final destination k. We treat the cost function from j to k as linear, and focus on the integer programming problem out of the origin node i. We solve one time period at a time, using a linear approximation to capture the impact of holding customers from time t to t+1. A metaheuristic is used to solve the single node (multiple link) problem, which, while not very large, has to be solved very quickly (since it is solved iteratively).

**Papadaki, K. and W.B. Powell, “An Adaptive Dynamic Programming
Algorithm for a Stochastic Multiproduct Batch Dispatch Problem,” Naval
Research Logistics, Vol. 50, No. 7, pp. 742-769, 2003.**

One of the oldest problems in dynamic programming arises in the context of planning inventories. You can use textbook backward dynamic programming if there is only one product type, but real problems have multiple products. In this paper, we consider a multiproduct problem in the context of a batch service problem where different types of customers wait to be served. Arrivals are stochastic and nonstationary. We show that an approximate dynamic programming strategy using linear value functions works quite well and is computationally no harder than a simple myopic heuristic (once the iterative learning is completed).
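The reason linear value functions are computationally so cheap is that the dispatch decision collapses to a cost comparison. The sketch below illustrates this with a single product; all of the numbers (dispatch cost, holding cost, capacity, and the learned slope) are hypothetical, and in the paper the slope would be estimated iteratively rather than fixed.

```python
# Hypothetical parameters: K = dispatch cost, h = per-period holding cost,
# C = vehicle capacity, v = slope of a linear value function V(q) ~ v*q giving
# the approximate cost-to-go of holding q customers.
K, h, C, v = 50.0, 1.0, 10, 6.0

def dispatch(q):
    """Dispatch a vehicle iff the immediate cost plus the approximate cost-to-go
    of the customers left behind beats holding everyone one more period."""
    leftover = max(q - C, 0)
    cost_dispatch = K + h * leftover + v * leftover
    cost_hold = h * q + v * q
    return cost_dispatch < cost_hold
```

With these numbers the policy is a simple threshold in the queue length: small queues wait, large queues trigger a dispatch, which is why the learned policy is no harder to execute than a myopic rule.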

**Papadaki,
K. and W.B. Powell, “Exploiting structure in adaptive dynamic programming
algorithms for a stochastic batch service problem,” European Journal of
Operational Research, Vol. 142, No. 1, pp. 108-127 (2002).**

This paper compares an optimal policy for dispatching a truck over a single link (with one product type) against an approximate policy that uses approximations of the future. Why would we approximate a problem that is easy to solve to optimality? Because the optimal policy only works on single link problems with one type of product, while the other is scalable to much harder problems.

**J.B. Braklow, W. Graham, K. Peck, S. Hassler, and W.B. Powell, "Interactive
Optimization Improves Service and Performance for Yellow Freight System,"
Interfaces, Vol. 22, No. 1, 1992, pp. 147-172.**

This was a finalist in the 1991 Franz Edelman competition, and for many years one of the most requested videotapes from the Edelman library. The paper documents a strategic planning system implemented at Yellow in the late 1980's, which was still in production at Yellow as of this writing (2005). The system uses interactive optimization techniques to build a service network that delivers packages both efficiently and with high service.

**Farvolden, J.M. and W.B. Powell, "Subgradient Optimization for
the Service Network Design Problem," Transportation Science, Vol. 28, No.
3, pp. 256-272. (c) Informs **

**Powell, W.B. "A Local Improvement Heuristic for the Design of
Less-Than-Truckload Motor Carrier Networks," Transportation Science, Vol
20, No. 4, pp. 246-257 (1986). (c) Informs**

**Koskosidis, I.A. and W.B. Powell, "Shipment Routing Algorithms
with Tree Constraints," Transportation Science Vol 26, No. 3, pp. 230-245,
1992. (c) Informs **

Less-than-truckload networks require that shipments be routed over links where the cost on a link is a piecewise linear function of the flow. Such problems could be solved as classical traffic assignment problems, but LTL networks (at the time) required that the routing of the shipments be communicated in a way that follows a tree, so that they can be described with the instruction "a shipment at i, headed to destination k, should be sent on a truck going to j." This paper formulates the problem and describes a solution procedure.

**Powell, W.B. and Y. Sheffi, "Design and Implementation of an Interactive
Optimization System for Network Design in the Motor Carrier Industry,"
Operations Research, Vol. 37, No. 1, pp. 12-29 (1989). (c) Informs.**

This paper describes a production system for static load planning for less-than-truckload motor carriers. The problem is determining where a carrier should offer "direct service". If we run trucks from i to j, they will carry freight that not only originates at i and terminates at j, but also freight from other origin-destination pairs for which this link helps shipments move quickly and at low cost. The problem the carrier faces is that the trucks have to move at least once a day, and possibly more. So, if there is not enough freight to fill the trucks, it may be better not to run trucks between these points, and instead force freight through links with higher density. While this can be set up as a large-scale integer network design problem, the practical reality is that there are issues not captured by the cost model. This system solves the problem interactively, which also means that the calculations had to be *really* fast (on 1980's vintage computers). Developed at Princeton University, this system was one of the founding products of the Princeton Transportation Consulting Group, and was used by numerous LTL carriers in the 1990's.

**Powell, W.B. and Y. Sheffi, "The Load Planning Problem of Motor
Carriers: Problem Description and a Proposed Solution Approach," Transportation
Research, Vol. 17A, No. 6, pp. 471-480, (1983).**

**Machine
Scheduling (back)**

*An important subclass of resource allocation problems fall
under the general category of machine scheduling. These problems
arise when there are resources that must be assigned to tasks
over time. The machine scheduling literature focuses on problems
where the "machines" (resources) are few and relatively
(or completely) homogeneous.*

**D. R. Ronconi and W. B. Powell, “Minimizing Total Tardiness
in a Single Machine Scheduling Problem using Approximate Dynamic Programming,”
Journal of Scheduling. Vol. 13, No. 6, pp. 597-607 (2010). DOI: 10.1007/s10951-009-0160-6.**

The paper uses approximate dynamic programming to help decide whether a job should be scheduled for today or some day in the future.

**Chen, Z.-L. and W.B. Powell, "Solving
Parallel Machine Scheduling Problems by Column Generation," Informs Journal
of Computing, Vol. 11, No. 1, Winter 1999. (c) Informs **

This is the first paper to use column generation for parallel machine scheduling problems. Optimal solutions are presented demonstrating the computational efficiency of the technique.

**Chen, Z.-L. and W.B. Powell, "A Column-Generation
Based Decomposition Algorithm for a Parallel Machine Just-In-Time Scheduling
Problem," European Journal of Operational Research, Vol. 116, pp. 220-232
(1999).**

**Chen, Z.L. and W.B. Powell, “Exact Algorithms for Scheduling
Multiple Families of Jobs on Parallel Machines,” Naval Research Logistics,
Vol. 50, No. 7, pp. 823-840, 2003.**

**Physical
distribution (back)**

**Godfrey, G. and W.B. Powell, "An Adaptive,
Distribution-Free Algorithm for the Newsvendor Problem with Censored Demands,
with Application to Inventory and Distribution Problems," Management Science,
Vol. 47, No. 8, pp. 1101-1112, (2001). (c) Informs.**

This paper proposes a novel approximation procedure for stochastic resource allocation problems. The technique works with a series of stochastic gradients which are then used to maintain a concave estimate of the value function. When applied to problems with known optimal solutions, the method provides results within a fraction of a percent of optimal. One byproduct is a fast algorithm for the newsvendor problem that does not require knowledge of the underlying demand distribution. The technique is also applied to separable distribution problems.
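The flavor of the method can be conveyed with a stripped-down newsvendor sketch: slopes of a concave profit estimate are updated from stochastic gradients computed from censored sales, and then nudged back toward concavity. The price, cost, demand distribution, stepsizes, exploration rule, and averaging pass below are all illustrative assumptions, not the paper's algorithm.

```python
import random

random.seed(1)

p, c = 10.0, 4.0        # hypothetical sale price and unit cost
Q = 30                  # largest order quantity considered
v = [p - c] * Q         # v[u]: estimated marginal profit of unit u+1 (optimistic start)

def order_quantity():
    """Order every unit whose estimated marginal profit is still positive."""
    q = 0
    while q < Q and v[q] > 0:
        q += 1
    return q

for k in range(1, 2001):
    q = min(order_quantity() + 1, Q)   # one exploratory extra unit keeps learning alive
    demand = random.randint(0, 20)     # demand itself is never observed ...
    sales = min(q, demand)             # ... only censored sales are seen
    a = 1.0 / k ** 0.7                 # declining stepsize
    for u in range(q):
        g = (p - c) if u < sales else -c   # sampled marginal profit of unit u+1
        v[u] = (1 - a) * v[u] + a * g
    for u in range(1, q):              # simple averaging pass toward concavity
        if v[u] > v[u - 1]:
            v[u] = v[u - 1] = (v[u] + v[u - 1]) / 2.0
```

With these numbers the critical-fractile optimum is roughly 12-13 units, and the slope estimates steer the order quantity into that neighborhood without ever seeing the demand distribution.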

**Cheung, R. K.-M. and W.B. Powell, "Models and Algorithms for Distribution Problems with Uncertain
Demands," Transportation Science, Vol. 30, No. 1, pp. 43-59 (1996). (c) Informs **

**Traffic and public
transportation (back)**

*These are papers from an earlier life. The most important papers from this
era (actually, before I started my dissertation) were:*

**Powell, W.B. and Y. Sheffi, "The
Convergence of Equilibrium Algorithms with Predetermined Step Sizes," Transportation
Science, Vol. 16, No. 1, pp. 45-55, (1982). (c) Informs **

**Sheffi, Y. and W.B. Powell, "An Algorithm for the Equilibrium
Assignment Problem with Random Link Times," Networks, Vol. 12, No. 1, pp.
191-207, (1982).**

*These two papers were the first to apply stochastic approximation
procedures to complex traffic assignment problems. They drew on theory
developed by Kiefer and Wolfowitz, and by Blum, in the early 1950s for
solving stochastic optimization problems, where the central challenge was
choosing the stepsize. The papers adapted this theory, in a somewhat heuristic
way, to the constrained optimization problem posed by static traffic
assignment models.*
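The predetermined-stepsize idea behind these papers can be sketched with a method-of-successive-averages recursion. This is a toy illustration under assumed names; the noisy mapping below stands in for a stochastic network loading, not the actual assignment models.

```python
import random

def msa(sample, n_iters=10000, x0=0.0):
    """Method of successive averages: predetermined stepsize a_n = 1/n.

    x smooths noisy responses y toward the fixed point of the
    underlying deterministic mapping -- the stepsize idea adapted
    for stochastic equilibrium assignment.
    """
    x = x0
    for n in range(1, n_iters + 1):
        y = sample(x)                # noisy response at the current point
        x = x + (1.0 / n) * (y - x)  # move a predetermined fraction toward it
    return x

random.seed(0)
# Toy noisy contraction with fixed point x = 5 (stand-in for a loading step).
x = msa(lambda x: 0.2 * x + 4.0 + random.gauss(0, 1))
```

The stepsizes are fixed in advance (they do not depend on the observations), which is exactly the property the convergence theory of the early 1950s requires.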

**Y. Sheffi, H. Mahmassani, W. B. Powell, "A Transportation Network
Evacuation Model," Transportation Research, Vol. 16A, No. 3, pp. 209-218,
(1982).**

This was one of the earliest papers (and models) addressing the problem of evacuating people from around nuclear power plants. The model, called "Netvac1," was written in the spring of 1980 for a local consulting firm, HMM Associates, in response to a new regulation requiring nuclear power plants to have an evacuation plan. It used a deterministic, discrete-time simulation that kept careful track of intersection and lane capacities. As segments of road became full, cars were prevented from moving forward. The model assumed that everyone followed a shortest path out of a 10-mile radius, and this logic adapted as road congestion changed. The model has taken on renewed interest following disasters such as Hurricane Katrina and the nuclear accident in Japan caused by the tsunami.
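The blocking mechanism described above (a full segment halting upstream traffic in a discrete-time simulation) can be sketched as follows. The segment layout, capacities, and function name are illustrative assumptions, not the actual NETVAC logic.

```python
def simulate_evacuation(segment_capacity, n_cars, n_steps):
    """Minimal discrete-time road simulation in the spirit of the model:
    cars advance one segment per step only when the next segment has
    spare capacity, so a full segment blocks upstream traffic.
    """
    # segments[i] = number of cars currently in segment i; the last
    # segment is the exit from the evacuation zone.
    segments = [0] * len(segment_capacity)
    waiting, evacuated = n_cars, 0
    for _ in range(n_steps):
        evacuated += segments[-1]  # cars in the exit segment leave
        segments[-1] = 0
        # Move downstream-first so freed capacity propagates backward.
        for i in range(len(segments) - 2, -1, -1):
            move = min(segments[i], segment_capacity[i + 1] - segments[i + 1])
            segments[i] -= move
            segments[i + 1] += move
        # Load waiting cars into the first segment, up to its capacity.
        load = min(waiting, segment_capacity[0] - segments[0])
        waiting -= load
        segments[0] += load
    return evacuated, waiting, sum(segments)

done, waiting, on_road = simulate_evacuation([3, 2, 2], n_cars=10, n_steps=20)
# With enough steps, all 10 cars clear the three-segment corridor.
```

Narrowing a downstream capacity (say, the last segment) immediately slows the whole corridor, which is the congestion effect the original model was built to capture.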

**Sheffi, Y., H. Mahmassani, and W.B. Powell,
"Evacuation Studies for Nuclear Power Plant Sites: A New Challenge for
Transportation Engineers," Journal of the Institute of Traffic Engineers,
Vol. 51, No. 6, pp. 25-28, (1981).**

**Powell, W.B.,"A Stochastic Passenger
Loading Model of Airline Schedule Performance," Transportation Research,
Vol. 17B, No. 5, pp. 399-410, (1983).**

**Powell, W. B., "Analysis of Airline Operating Strategies Under
Stochastic Demand," Transportation Research, Vol. 16B, No. 1, pp. 31-44,
(1982).**

**Powell, W.B. and C. Winston, "A Numerical
Investigation of the Impact of Uncertain Demand and Varying Risk Preferences
on the Pricing and Capacity Decisions of Transportation Firms: the Case of Airlines,"
Transportation Research, Vol. 17B, No. 6, pp. 471-490 (1983).**

**Powell, W.B. and Y. Sheffi, "A Probabilistic
Model of Bus Route Performance," Transportation Science, Vol. 17, No. 4,
pp. 376-404, (1983). (c) Informs**

**Sheffi, Y. and W.B. Powell, "A Comparison
of Stochastic and Deterministic Traffic Assignment Over Congested Networks,"
Transportation Research, Vol. 15B, No. 1, pp. 53-64, (1980).**

**Freight
demand estimation (back)**

*OK, so we don't have a long list of papers on statistical
estimation. At the same time, we have had to put an inordinate
amount of work into forecasting (with very little time to write
it up). It turns out that if you want to develop stochastic models,
you have to create your own estimates of the future. We never
anticipated needing to add to the forecasting literature, but
forecasting for freight applications has a number of unique characteristics.*

**G. Godfrey and W.B. Powell, "Adaptive Estimation of Daily Demands
with Complex Calendar Effects," Transportation Research, Vol. 34, No. 6,
pp. 451-469 (2000). **

**Stories from
the field (back)**

**Simao, H. P., J. Day, A. George, T. Gifford, W. B. Powell, “An
Approximate Dynamic Programming Algorithm for Large-Scale Fleet Management:
A Case Application,” Transportation Science, published online doi 10.1287/trsc.1080.0238
(c) Informs**

This is a major application paper, which summarizes several years of development to produce a model based on approximate dynamic programming that closely matches historical performance. The model represents drivers with 15 attributes, capturing domicile, equipment type, days from home, and all the rules (including the 70-hours-in-eight-days rule) governing drivers. The model gets drivers home on weekends on a regular basis (again, closely matching historical performance). The value functions produced by the ADP algorithm are shown to accurately estimate the marginal value of drivers by domicile.

See also the companion paper below:

**Simao, H. P., A. George, W. B. Powell, T. Gifford, J. Nienow, J.
Day, "Approximate Dynamic Programming Captures Fleet Operations for Schneider
National," Interfaces, Vol. 40, No. 5, pp. 342-352, 2010. (c) Informs**

This paper is a lite version of the paper above, submitted for the Wagner competition. This paper does with pictures what the paper above does with equations.

**Powell, W.B., “Real-time dispatching for truckload
motor carriers,” in Logistics Engineering Handbook (G. Don Taylor, ed.),
CRC Press, 2007, pp. 15-1 – 15-30. (c) CRC Press.**

This paper describes our experience implementing a real-time optimization model at Burlington Motor Carriers. Intended for a broad audience, it describes the basics of the model and some of the behind-the-scenes events that were encountered during implementation.

**Powell, W.B., A. Marar, J. Gelfand, and S. Bowers, “Implementing
Operational Planning Models: A Case Application from the Motor Carrier Industry,”
Operations Research, Vol. 50, No. 4, (2002). (c) Informs**

This paper summarizes our experience implementing a real-time load matching model at a large motor carrier. As part of the research, we compiled a roughly six-month history of actual dispatch decisions, and then compared what actually happened to what our model would have done. The company was liquidated about five years after the project was terminated. Makes you wonder what might have happened if they had used the systems more aggressively.

**J.B. Braklow, W. Graham, K. Peck, S. Hassler, and W.B. Powell, "Interactive
Optimization Improves Service and Performance for Yellow Freight System,"
Interfaces, Vol. 22, No. 1, 1992, pp. 147-172. This paper was our entry for
the 1991 Franz Edelman Competition, in which we were a finalist. (c) Informs
**

This was a finalist in the 1991 Franz Edelman competition, and for many years one of the most requested videotapes from the Edelman library. The paper documents a strategic planning system implemented at Yellow in the late 1980's which is still in production at Yellow as of this writing (2005). The system uses interactive optimization techniques to build a service network that delivers packages both efficiently and with a high level of service.

**Powell, W.B., Y. Sheffi, K. Nickerson, K. Butterbaugh and S. Atherton,
"Maximizing Profits for North American Van Lines' Truckload Division: A
New Framework for Pricing and Operations," Interfaces, Vol. 18, No. 1,
pp. 21-41 (1988). This paper was our entry for the 1987 Franz Edelman competition,
in which we placed second. (c) Informs**

"Finding the Yellow Brick Road" is a seven-part series written in short-story format, intended to give the reader an idea of some of the issues that arise from a management perspective. The series features an academic consultant (modeled after myself) and the president of a trucking firm (modeled after Don Mayoras, currently the president of two trucking firms). The sequence appeared as the following papers:

Powell, W.B. and D. Mayoras, "Finding the Yellow Brick Road: Part I, "Toto, I Have a Feeling We're Not in Kansas Anymore!", Interfaces, Vol. 26, No. 5, pp. 26-33 (1996). (c) Informs

Powell, W.B. and D. Mayoras, "Finding the Yellow Brick Road: Part II, "Lions and Tigers and Bears!", Interfaces, Vol. 26, No. 6, pp. 57-67 (1996). (c) Informs

Powell, W.B. and D. Mayoras, "Finding the Yellow Brick Road: Part III, "The Wizard of Oz", Interfaces, Vol. 27, No. 1, pp. 143-154 (1997). (c) Informs

Powell, W.B. and D. Mayoras, "Finding the Yellow Brick Road: Part IV, "I Wish I had a Brain", Interfaces, Vol. 27, No. 2, pp. 37-47 (1997). (c) Informs

Powell, W.B. and D. Mayoras, "Finding the Yellow Brick Road: Part V: Through the Crystal Ball" Interfaces, Vol. 27, No. 3, pp. 14-21 (1997). (c) Informs

Powell, W.B. and D. Mayoras, "Finding the Yellow Brick Road: Part VI: Courage!" Interfaces, Vol. 27, No. 4, pp. 12-22 (1997). (c) Informs

Powell, W.B. and D. Mayoras, "Finding the Yellow Brick Road: Part VII: I Finally Have a Brain!," Interfaces, Vol. 27, No. 5, pp. 9-14. (1997). (c) Informs
