People
Warren
B. Powell is the founder and director of CASTLE Laboratory and PENSA, the Princeton Laboratory for ENergy Systems Analysis. A faculty member
at Princeton since 1981, CASTLE Lab was created in 1990 to reflect an expanding
research program into dynamic resource management.
Click here for a biographical summary and my c.v.
Senior research staff
Click here for research of recent graduate students.
Click here for a list of prior graduate students.
Graduate student and post-doc research:
Boris works at the interface of stochastic optimization and machine learning. While finishing this work, he has developed a novel adaptation of the knowledge gradient to a general class of robust learning problems, spanning ranking and selection to linear programs.
Boris also brings significant expertise in power flow modeling, and has developed the AC and DC power flow models used in our new SMART-PJM model of the PJM power grid.
W. R. Scott and W. B. Powell, “Portfolio Selection and Covariance Matrix Estimation using an Errors-in-Variables Factor Model with an Application to the PJM Electricity Market,” in preparation.
Companies can form portfolios of financial trading rights that give them the right to purchase electricity between pairs of nodes. In this paper, we use a Markowitz portfolio model, but introduce a novel way of estimating the covariance matrix by using very short time series. This allows us to capture short bursts of volatility. The method exploits the method of instrumental variables and the structure of a factor model for prices.
Warren Scott, W. B. Powell, “Approximate Dynamic Programming for Energy Storage with New Results on Instrumental Variables and Projected Bellman Errors,” under review.
The problem of optimizing energy storage in the presence of wind and stochastic spot prices for electricity to serve a stochastic and time-depend load is modeled and solved as a steady state Markov decision process. We first solve the problem using several variations of Bellman error minimization, starting with least-squares policy iteration. Following Bradtke & Barto and Lagoudakis and Parr, we introduce instrumental variables to handle the need to partially simulate the independent variables, and compare this to projected Bellman error minimization. We show for the first time that these are mathematically equivalent. We create a series of benchmark problems which are solved optimally to produce a set of benchmarks. We show that instrumental variables significantly improve the results, but still produce solutions that are approximately 70 percent of optimal. We then fit the regression parameters using direct policy search, using the knowledge gradient algorithm to perform the search. This strategy produces results within 5 percent of optimal.
W. Scott, P. Frazier, W. B. Powell – “The Correlated Knowledge Gradient for Maximizing Expensive Continuous Functions with Noisy Observations using Gaussian Process Regression,” SIAM J. on Optimization (to appear)
Our
original work in information collection has focused on problems where measurements
are discrete (which technology should we try, what is the best drug compound).
TThis logic can handle problems with hundreds or thousands of measurements,
but there are many problems where measurements are continuous, and in particular
where they are continuous vectors. We might, for example, need to determine
where to measure radiation or disease (two spatial dimensions) or finding the
best set of parameters to optimize a device (might be 5 or 10 dimensions), or
finding the best set of parameters to optimize an expensive simulation (dozens
to hundreds of dimensions). He is using the knowledge gradient principle (introduced
by Peter Frazier) to compute continuous representations of the knowledge gradient
which can then be optimized using classical nonlinear programming algorithms.
The graph to the right is a contour plot of the knowledge gradient, which gives the marginal value if we measure a particular set of parameters (illustrated here for a two-dimensional application). We can use gradients of the knowledge gradient surface to guide us to the next point we should measure (toward the bottom, just to the right of center).
(Click here to download paper)
Scott, W., W. B. Powell, H. P. Simao, "Calibrating Simulation Models using the Knowledge Gradient with Continuous Parameters," Proceedings of the 2010 Winter Simulation Conference B. Johansson, S. Jain, J. Montoya-Torres, J. Hugan, and E. Y¨ucesan, eds., Baltimore, Md, December, 2010.
This is the proceedings paper that is a synopsis of the full paper above. The work is to be presented at the Winter Simulation Simulation Conference in 2010.
(Click here to download paper)
Jae is a Ph.D. student in Electrical Engineering, but he has joined the lab to contribute his skills in signal processing, statistics and stochastic processes to model optimization problems arising in the contest of making the most of wind energy.
J. Kim, W. B. Powell, “Quantile Optimization using Asymmetric Signum Functions with Heavy-Tailed Distributions” (under review)
We present a provably convergent algorithm for computing the quantile of a random variable that does not require storing all of the sample realizations. We then present an algorithm for optimizing the quantile of a random function which may be characterized by a heavy-tailed distribution where the expectation is not de ned. The algorithm is illustrated in the context of electricity trading in the presence of storage, where electricity prices are known to be heavy-tailed with in nite variance.
Kim, J. H. and W. B. Powell, "Optimal Commitment Strategies for Wind Energy in the Presence of Storage," (to appear)
This paper formulates the problem of making advance commitments for wind energy as an optimization problem, taking into account the presence of storage with storage losses. The model provides for Markovian spot and regulating prices, and a stochastic wind process. If there is no storage loss, we derive an optimal policy for wind energy commitments for an arbitrary wind distribution. When there are storage losses, we derive an optimal policy if wind follows a (shifted) uniform distribution. For this case, we also derive a simple formula for the economic value of storage.
J. Kim, W. B. Powell, “An Hourly Electricity Price Model for Heavy-Tailed Spot Prices,” Energy Economics (to appear).
We propose an hour-ahead prediction model for electricity prices that captures the heavy-tailed behavior that we observe in the hourly spot market in the Ercot (Texas) and the PJM West hub grids. We present a model according to which we separate the price process into a thin-tailed trailing-median process and a heavy-tailed residual process whose probability distribution can be approximated by a Cauchy distribution. We show empirical evidence that supports our model.
Ilya is working on a very general class of information collection problems, extending the knowledge gradient principle developed by Peter Frazier. He is considering a class of problems that involve making a measurement of some form which improves our estimates of parameters (such as costs) of an underlying optimization problem. The challenge is to spend our measurement budget as carefully as possible. Applications of this research span health, biosurveillance and energy, among others.
Book in preparation:
W. B. Powell, I. O. Ryzhov, Optimal Learning, John Wiley and Sons, 2012.
This book introduces the principles of how to collect information efficiently for problems where observations are time consuming and/or expensive. Using a presentation that is appropriate for an undergraduate class with a prior course in probability, the book introduces Bayesian and frequentist learning, the ranking and selection problem, the multiarmed bandit problem, and a number of other specialized learning problems (stopping problems, learning a unimodular function, learning a continuous surface, learning on a graph). Frequentist and Bayesian learning policies are presented, along with a thorough discussion of how to properly compare learning policies. Special emphasis is given to the knowledge gradient policy, which is developed for both online and offline problems, problems with independent and correlated beliefs, normal-normal models along with a number of other probability models.
Papers:
Journal publications
Ryzhov, I., W. B. Powell, “Information Collection in a Linear Program,”submitted to Mathematics of Operations Research, November, 2010.
Consider a linear program where the cost coecients are uncertain, for which we have a Bayesian prior. We can collect information to improve our understanding of these coecients, but this may be expensive, giving us a separate problem of optimizing the collection of information to improve the quality of the solution relative to the true cost coecients. We formulate this information collection problem for linear programs for the rst time, and derive a knowledge gradient policy which maximizes the marginal value of each measurement. We prove that this policy is asymptotically optimal, and demonstrate the performance of our algorithm on a network flow problem.
I. Ryzhov, W.B. Powell, "Information collection on a graph", Operations Research (to appear).
We derive a knowledge gradient policy for an optimal learning problem on a graph, in which we use sequential measurements to rene Bayesian estimates of individual arc costs in order to learn about the best path. This problem differs from traditional ranking and selection, in that the implementation decision (the path we choose) is distinct from the measurement decision (the edge we measure). Our decision rule is easy to compute, and performs competitively against other learning policies, including a Monte Carlo adaptation of the knowledge gradient policy for ranking and selection.
I. Ryzhov, W.B. Powell, and P. Frazier, "The Knowledge-Gradient Algorithm for a General Class of Online Learning Problems" (to appear in Operations Research)
The knowledge-gradient policy was originally derived for off-line learning problems such as ranking and selection. Considerable attention has been given to the on-line version of this problem, known popularly as the multiarmed bandit problem, for which Gittins indices are known to be optimal for discounted, infinite-horizon versions of the problem. In this paper, we derive a knowledge gradient policy for on-line problems, and show that it very closely matches the performance of Gittins indices for discounted infinite horizon problems. It actually slightly outperforms the best available approximation of Gittins indices (by Gans and Chick) on problems for which Gittins indices should be optimal. The KG policy is also effective on finite horizon problems. The only policy which is competitive with KG seems to be interval estimation, but this requires careful tuning of a parameter. The KG policy also works on problems where the beliefs about different alternatives are correlated. The KG policy with independent beliefs is extremely easy to compute (we provide closed-form expressions for the case with normal rewards), and requires a simple numerical algorithm for the case with correlated beliefs.
Ryzhov, I., P. I. Frazier and W. B. Powell, “Stepsize Selection for Approximate Value Iteration and a New Optimal Stepsize Rule,” (under review).
Approximate value iteration is used in dynamic programming when we use random observations to estimate the value of being in a state. These observations are smoothed to approximate the expected value function, leading to the problem of choosing a stepsize (the weight given to the most recent observation). A stepsize of 1/n is a common (and provably convergent) choice. However, we show that it leads to very slow convergence for a basic instance of approximate value iteration, to the point of being unusable for most applications. We then derive an optimal stepsize rule for that instance. This is the first stepsize rule that explicitly takes into account the covariance between the current observation and the previous prediction in approximate dynamic programming. The rule can be easily extended to general ADP settings; experimental results show that it leads to faster convergence than other formulas.
Conference proceedings
Ryzhov, I. and W. B. Powell, “Bayesian Active Learning with Basis Functions,” IEEE Workshop on Adaptive Dynamic Programming and Reinforcement Learning, Paris, April, 2011.
A common technique for dealing with the curse of dimensionality in approximate dynamic programming is to
use a parametric value function approximation, where the value of being in a state is assumed to be a linear combination of
basis functions. We propose a Bayesian strategy for resolving the exploration/exploitation dilemma in this setting. Our approach is
based on the knowledge gradient concept from the optimal learning literature, which has been recently adapted for approximate
dynamic programming with lookup-table approximations. The new method performs well in numerical experiments conducted
on an energy storage problem.
Ryzhov, I. O., W. B. Powell, "Approximate Dynamic Programming with Correlated Bayesian Beliefs," Forty-Eighth Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, Sept. 29-Oct. 1, 2010.
The exploration-exploitation problem in dynamic programming is well-known, and yet most algorithms resort to heuristic exploration policies such as epsilon-greedy. We use a Bayesian model of the value of being in each state with correlated beliefs, which reflects the common fact that visiting one state teaches us something about visiting other states. We use the knowledge gradient algorithm with correlated beliefs to capture the value of the information gained by visiting a state. Using both a simple newsvendor problem and a more complex problem of making wind commitments in the presence of stochastic prices, we show that this method produces significantly better results than epsilon-greedy for both Bayesian and non-Bayesian beliefs. These are shown for both offline and online implementations.
Ryzhov, I. O., P. I. Frazier, W. B. Powell, “On the Robustness of a One-Period Look-ahead Strategy for Multi-armed Bandit Problems,” International Conference on Computer Science, Amsterdam, May, 2010.
We analyze the robustness of a knowledge gradient (KG) policy for the multi-armed bandit problem. The KG policy is based on a one-period look-ahead, which is known to underperform in other learning problems when the marginal value of information is non-concave. We present an adjustment that corrects for non-concavity and approximates a multi-step look-ahead, and compare its performance to the unadjusted KG policy and other heuristics. We provide guidance for determining when adjustment will improve performance, and when it is unnecessary. We present evidence suggesting that KG is generally robust in the multi-armed bandit setting, which argues in favour of KG as an alternative to index policies.
Ryzhov, I. and W. B. Powell, “The Knowledge Gradient Algorithm For Online Subset Selection,” IEEE Conference on Approximate Dynamic Programming and Reinforcement Learning (part of IEEE Symposium on Computational Intelligence), March, 2009.
Ryzhov, I., W. B. Powell, “A Monte-Carlo Knowledge Gradient Method for Learning Abatement Potential of Emissions Reduction Technologies,”, Winter Simulation Conference, 2009. To appear.
Jun's general area of research is approximate dynamic programming with continuous state and action variables. These problems have many applications, but one of our motivating applications is control problems that arise in energy settings. However, rigorous algorithms for this general problem class do not exist, although there is a growing body of theory.
J. Ma and W. B. Powell, "Convergence Analysis of Kernel-based On-policy Approximate Policy Iteration Algorithms for Markov Decision Processes with Continuous, Multidimensional States and Actions" (under review)
Using kernel smoothing techniques, we propose three dierent online, on-policy approximate policy iteration algorithms which can be applied to innite horizon problems with continuous and vector-valued states and actions. Using Monte Carlo sampling to estimate the value function around the post-decision state, we reduce the problem to a sequence of deterministic, nonlinear programming problems that allow us to handle continuous, vector-valued states and actions. We provide a formal convergence analysis of the algorithms under a variety of technical assumptions.
J. Ma and W. B. Powell, "Convergence Analysis of On-Policy LSPI for Multi-Dimensional Continuous State and Action-Space MDPs and Extension with Orthogonal Polynomial Approximation" (under review).
This paper considers problems with continuous (and high-dimensional) state and action spaces, where the imbedded expectation within Bellman's equation cannot be computed exactly (the three curses of dimensionality). The paper provides a review of the literature on convergence proofs using continuous value function approximations, and then provides a series of convergence proofs, including problems where the expectation can and cannot be computed exactly, but where we assume that the value function is spanned by a known set of basis functions. A key feature of the algorithm is that it uses online policy approximation, which means that it avoids a separate exploration process, something that is problematic for high-dimensional action spaces. The paper closes with an algorithm that relaxes the assumption of known basis functions using orthogonal polynomials.
Ma, J. and W. B. Powell, “A convergent recursive least squares policy iteration algorithm for multi-dimensional Markov decision process with continuous state and action spaces,” IEEE Conference on Approximate Dynamic Programming and Reinforcement Learning (part of IEEE Symposium on Computational Intelligence), March, 2009.
Lauren has been addressing a general class of stochastic optimization problems that arise in energy applications. She began with stochastic optimization models arising in energy, and has been moving in the general area of using advanced machine learning methods to solve stochastic optimization problems.
An updated list of her papers can be found on her website at Columbia University at www.columbia.edu/~lah2178
L. Hannah, W. B. Powell, D. Blei, "Stochastic Search with an Observable State Variable," (under review)
Think of a newsvendor problem where we have to choose a quantity, and then observe a random demand. Now assume that before we choose the quantity, we can observe the weather, which gives us some information about the demand. Now assume we are making a commitment of energy that we are going to deliver from our wind farm before we know how much the wind will blow. But before we make the commitment, we observe a vector of continuous variables including current and previous wind, prices and demand, along with time of day and information about the weather. We use DP-GLM to create an approximation that is a weighted sum of piecewise linear functions which allow us to capture the concavity of the function with respect to the wind commitment. This paper proves convergence of this strategy, and demonstrates its effectiveness in the context of an energy commitment problem.
L. Hannah, D. Blei and W. B. Powell, “Dirichlet Process Mixtures of Generalized Linear Models,” (under review).
We propose Dirichlet Process-Generalized Linear Models (DP-GLM), a new method of nonparametric regression that accommodates continuous and categorical inputs, and any response that can be modeled by a generalized linear model. We prove conditions for the asymptotic unbiasedness of the DP-GLM regression mean function estimate and give a practical example for when those conditions hold. Additionally, we provide Bayesian bounds on the distance of the estimate from the true mean function based on the number of observations and posterior samples. We evaluate DP-GLM on several data sets, comparing it to modern methods of nonparametric regression like CART and Gaussian processes. We show that the DP-GLM is competitive with the other methods, while accommodating various inputs and outputs and being robust when confronted with heteroscedasticity.
L. Hannah, W. B. Powell, and J. Stewart, “One-Stage R&D Portfolio Optimization with an Application to Solid Oxide Fuel Cells,” Energy Systems Journal, Vol. 1, No. 1, 2010.
This paper addresses the problem of determining which of a subset of R&D projects (constrained by a budget) we should invest in to improve the performance of a complex device (in this case, a solid oxide fuel cell). Projects may be competitive (we may end up choosing one type of material for a particular function) or complementary (an improvement in a particular material in one part of the fuel cell may make a different technology in a different part of the fuel cell more competitive). We might be choosing dozens of projects from a choice of several hundred, creating a combinatorially large set of potential portfolios (we only consider a single investment). Furthermore, estimating the value of a particular R&D portfolio involves evaluating an expectation that cannot be done analytically, and hence requires imprecise Monte Carlo evaluations.
L. Hannah and W. B. Powell, “Evolutionary Policy Iteration Under a Sampling Regime for Stochastic Combinatorial Optimization,” IEEE Transactions on Automatic Control, Vol. 55, No. 5, pp. 1254-1257, May, 2010.
The algorithmic research into the R&D problem involved making a comparison of a new algorithm against an existing algorithm in the literature, known as evolutionary policy iteration. This algorithm, however, required that we be able to compute the expectation exactly. This paper shows how to modify the EPI algorithm so that it retains its fundamental convergence properties.
Peter Frazier (Graduated June, 2009)
Peter developed the idea of the knowledge gradient policy for guiding the collection of information. His work started with simple ranking and selection problems (the off-line version of the multiarmed bandit problem) and has extended into a general class of algorithms for learning. He has shown that the knowledge gradient is not only myopically optimal (by construction, it makes the best measurement if there is only one measurement to make) but is also asymptotically optimal. Many algorithms are asymptotically optimal (e.g. randomly choosing what to measure), but KG is the only algorithm that is both myopically and asymptotically optimal. This supports the growing body of empirical work that suggests that KG works well in practice. His paper on asymptotic optimality of sequential sampling policies is a major theory paper establishing general conditions for asymptotic optimality for measurement strategies. Of particular practical importance is his paper on the knowledge gradient for problems with correlated beliefs.
P. Frazier and W. B. Powell, “Asymptotic Optimality of Sequential Sampling Policies for Bayesian Information Collection”
This paper provides a very general theory for the asymptotic optimality of a broad class of information collection strategies. The work builds on Peter's research on the knowledge gradient algorithm, but this paper provides general conditions for asymptotic optimality of learning algorithms.
P. Frazier, W. B. Powell, S. Dayanik, “The Knowledge-Gradient Policy for Correlated Normal Beliefs,” Informs Journal on Computing (to appear).
The knowledge gradient policy is a method for determining which of a discrete set of measurements we should make to determine which of a discrete set of choices we should make. Most of the applications that we have considered introduce the dimension of correlated beliefs. If we evaluate the level of contamination in one location and it measures high, we are likely to raise our belief about the level of toxin in nearby locations. If we test a machine for airport security that can sense explosives and it works poorly, we might lower our evaluation of other devices that might use similar technologies (e.g. a particular material or sensor within the device). This article shows how to compute the knowledge gradient for problems with correlated beliefs. The paper shows that just as with problems with independent beliefs, the knowledge gradient is both myopically and asymptotically optimal. Experimental work shows that it can produce a much higher rate of convergence than the knowledge gradient with independent beliefs, in addition to outperforming other more classical information collection mechanisms.
(click here to download paper) (Click here for online supplement)
Frazier, P., W. B. Powell and S. Dayanik, “A Knowledge Gradient Policy for Sequential Information Collection,” SIAM J. on Control and Optimization, Vol. 47, No. 5, pp. 2410-2439 (2008).
The knowledge gradient policy is introduced here as a method for solving the ranking and selection problem, which is an off-line version of the multiarmed bandit problem. Imagine that you have M choices (M is not too large) where you have a normally distributed belief about the value of each choice. You have a budget of N measurements to evaluate each choice to refine your distribution of belief. After your N measurements, you have to choose what appears to be the best based on your current belief. The knowledge gradient policy guides this search by always choosing to measure the choice which would produce the highest value if you only have one more measurement (the knowledge gradient can be viewed as a method of steepest ascent). The paper shows that this policy is myopically optimal (by construction), but is also asymptotically optimal, making it the only stationary policy that is both myopically and asymptotically optimal. The paper provides bounds for finite measurement budgets, and provides experimental work that shows that it works as well as, and often better, than other standard learning policies.
I. Ryzhov, W.B. Powell, and P. Frazier, The Knowledge-Gradient Algorithm for a General Class of Online Learning Problems (under review)
The knowledge-gradient policy was originally derived for off-line learning problems such as ranking and selection. Considerable attention has been given to the on-line version of this problem, known popularly as the multiarmed bandit problem, for which Gittins indices are known to be optimal for discounted, infinite-horizon versions of the problem. In this paper, we derive a knowledge gradient policy for on-line problems, and show that it very closely matches the performance of Gittins indices for discounted infinite horizon problems. The knowledge gradient policy, however, outperforms Gittins indices for finite horizon problems. Of perhaps greater practical interest, the knowledge gradient policy can be adapted to handle applications with correlated beliefs. This is demonstrated in an application involving the evaluation of treatments for diabetes, where a treatment consists of three separate drugs. Different treatments may share drugs (one treatment may involve drugs A, B and C, while another may use A,D and C) which leads to correlated beliefs. We show that the knowledge gradient, adapted to on-line learning problems with correlated beliefs, outperforms a number of competing policies.
P. Frazier, W. B. Powell, S.. Dayanik and P. Kantor, “Approximate Dynamic Programming in Knowledge Discovery for Rapid Response,” HICSS Conference, 2009.
Frazier, P. and W. B. Powell, “The knowledge gradient stopping rule for ranking and selection,” Proceedings of the Winter Simulation Conference, December 2008.
P. Frazier and A.J. Yu, "Sequential Hypothesis Testing under Stochastic Deadlines," Neural Information Processing Systems Conference, 2007.
P. Frazier, W. B. Powell, H. P. Simao, "Simulation Model Calibration with Correlated Knowledge-Gradients," Winter Simulation Conference, December, 2009.
A common challenge in the calibration of simulation model is that we have to tune several continuous parameters. This is our first application of the knowledge gradient algorithm with correlated beliefs to the problem of parameter tuning for simulation models. A single run of the model (which uses adaptive learning from approximate dynamic programming) requires more than a day, so the paper also introduces methods to product results without a full run. A Bayesian model is set up to capture the uncertainty in our beliefs about the convergence of the model. The method is illustrated in the tuning of two continuous parameters, which required approximately six runs of the model.
Frazier, P. and W. B. Powell, “The Knowledge Gradient Policy for Offline Learning with Independent Normal Rewards,” in Approximate Dynamic Programming and Reinforcement Learning, IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pp. 143-150, April, 2007.
Powell, W. B. and P. Frazier, “Optimal Learning,” TutORials in Operations Research, Chapter 10, pp. 213-246, Informs (2008).
Kazutoshi Yamazaki (Graduated June, 2009)
Kazu has been working working on a special problem in information collection which involves questions such as when we should collect information, determining when a random signal has undergone a change, and determining the cause of the change. These decisions have to be made as quickly as possible, since these problems often arise in the context of responding to disease outbreaks or potential terrorist threats.
S. Dayanik, W. Powell, and K. Yamazaki (2007). “Index policies for discounted bandit problems with availability constraints,” Journal of Applied Probability (to appear).
One of the most famous problems in information collection is the multiarmed bandit problem, where make a choice (out of a discrete set of choices), observe a reward, and use this observation to update estimates of the future value of rewards. This problem can be solved by choosing the option with the highest index (known as the Gittins index). In this paper, we present an index problem for the case where not all the choices are available each time. As a result, it is sometimes important to make an observation just because the observation is available to be made.
S., Dayanik, W. Powell, K. Yamazaki. "Asymptotic analysis of sequential change diagnosis problem," Proceedings of the 2008 International Workshop on Applied Probability (2008).
S Dayanik, W. B. Powell and K. Yamazaki "An Asymptotically Optimal Strategy in Sequential Change Detection and Identification Applied to Problems in Biosurveillance" Proceedings of the 3rd INFORMS Workshop on Data Mining and Health Informatics, (2008).
Change detection addresses the problem of determining when a signal has changed from one state to another. In addition, it is often necessary to assess the cause of a change. For example, we may be detecting gamma radiation using a roadside detector. Since there are many environmental causes of radiation, jumps in the signal may be meaningless. We need to be able to identify when a burst of radiation is due to some new source, and then we need to determine whether the source is benign (hospital waste) or aggressive (a terrorist moving a dirty bomb around the city). The technical challenge is making this determination as quickly as possible.
Juliana Nascimento (graduated February, 2008)
Juliana worked on a general class of control problems that can be generally described as "storage problems." These can be modeled as dynamic programs, but generally cannot be solved exactly. Approximate dynamic programming can be applied, but encounter problems such as exploration. For many problems, it is useful to have algorithms that use pure exploitation, but in general these can become stuck and often produce poor solutions. Storage problems exhibit special structure, however, which enabled her to develop a convergence theory that proves convergence of ADP algorithms using pure exploitation. This work evolved from an initial problem involved the lagged acquisition of assets, to a more general theory of storage problems. Her work then extended to two related applications: a jet fuel hedging problem which involves the pricing of assets acquired over time (in the context of jet fuel), and the problem of managing the cash balance of a mutual fund.
J. Nascimento, W. B. Powell, “An Optimal Approximate Dynamic Programming Algorithm for the Energy Dispatch Problem with Grid- Level Storage
We consider the problem of modeling hourly dispatch and energy allocation decisions in the presence of grid-level storage. The model makes it possible to capture hourly, daily and seasonal variations in wind, solar and demand, while also modeling the presence of hydroelectric storage to smooth energy production over both daily and seasonal cycles. The problem is formulated as a stochastic, dynamic program, where we approximate the value of stored energy. We provide a rigorous convergence proof for an approximate dynamic programming algorithm, which can capture the presence of both the amount of energy held in storage as well
as other exogenous variables. Our algorithm exploits the natural concavity of the problem to avoid any need for explicit exploration policies.
J. Nascimento and W. B. Powell, “An Optimal Solution to a General Jet Fuel Hedging Problem,” under review.
This paper solves the problem of hedging jet fuel prices while simultaneously trying to minimize the cost of purchasing jet fuel. Instead of simply minimizing risk, this paper offers a method of trading off risk versus purchase cost.
Nascimento, J. and W. B. Powell, “An Optimal Approximate Dynamic Programming Algorithm for the Lagged Asset Acquisition Problem,” Mathematics of Operations Research, Vol. 34, No. 1, pp. 210-237 (2009).
I have worked for a number of years using piecewise linear function approximations for a broad range of complex resource allocation problems. A few years ago we proved convergence of this algorithmic strategy for two-stage problems (click here for a copy). In this latest paper, we have our first convergence proof for a multistage problem. This paper is more than a convergence proof for this particular problem class - it lays out a proof technique, which combines our work on concave approximations with theory laid out by Bertsekas and Tsitsiklis (in their Neuro-Dynamic Programming book).
Nascimento, J. and W. B. Powell, “Dynamic programming models and algorithms for the mutual fund cash balance problem,” Management Science (to appear)
This is a nice illustration of classical approximate dynamic programming for a storage problem that arises in mutual funds - how much cash should be kept on hand at a mutual fund. The model is motivated by a real problem faced by a mutual fund manager, and handles issues such as retail and institutional investors (which have different redemption rules), along with market behavior and interest rates. The model identifies regimes, which depend on market performance and interest rates, where money should be moved into and out of cash.
Nascimento, J. and W. B. Powell, “An Optimal ADP Algorithm for a High-Dimensional Stochastic Control Problem,” IEEE Conference on Approximate Dynamic Programming and Reinforcement Learning, April, 2007.
A conference proceedings paper summarizing the research on storage problems.
(click here to download paper)
Abraham George (graduated 2005)
Abraham contributed two significant algorithmic results for approximate dynamic programming. The first was a new adaptive stepsize rule that balances, in an optimal way, the tradeoff between smaller stepsizes to smooth noise and larger stepsizes to adapt to a changing signal. Then, he showed that a very simple weighting formula could be used to provide estimates of value functions at different levels of aggregation. The weighting formula adjusts to the estimated level of bias and noise, increasing the weight on estimates that appear to have low noise and low bias.
George, A., W.B. Powell and S. Kulkarni, “Value Function Approximation Using Hierarchical Aggregation for Multiattribute Resource Management,” Journal of Machine Learning Research (to appear)
There are a number of problems in approximate dynamic programming where we have to use coarse approximations in the early iterations, but we would like to transition to finer approximations as we collect more information. This paper studies the statistics of aggregation, and proposes a weighting scheme that weights approximations at different levels of aggregation based on the inverse of the variance of the estimate and an estimate of the bias. This weighting scheme is known to be optimal if we are weighting independent statistics, but this is not the case here. What is surprising is that the weighting scheme works so well. We demonstrate this, and provide some important theoretical evidence why it works.
George, A. and W.B. Powell, “Adaptive Stepsizes for Recursive Estimation with Applications in Approximate Dynamic Programming,” Machine Learning, Vol. 65, No. 1, pp. 167-198, (2006). (c) Springer
One of the first challenges anyone will face when using approximate dynamic programming is the choice of stepsizes. Use the wrong stepsize formula, and a perfectly good algorithm will appear not to work. Deterministic stepsize formulas can be frustrating since they have parameters that have to be tuned (difficult if you are estimating thousands of values at the same time). This paper reviews a number of popular stepsize formulas, provides a classic result for optimal stepsizes with stationary data, and derives a new optimal stepsize formula for nonstationary data. This result assumes we know the noise and bias (knowing the bias is equivalent to knowing the answer). A formula is provided when these quantities are unknown. Our result is compared to other deterministic formulas as well as stochastic stepsize rules which are proven to be convergent. The numerical work suggests that the new optimal stepsize formula (OSA) is very robust. It often is the best, and never works poorly.
(Click here to download paper)
Ph.D. students (24):
Waiting placement:
- Warren R. Scott, 2012, “Portfolio Selection and Covariance Matrix Estimation using an Errors-in-Variables Factor Model with an Application to the PJM Electricity Market,”
Academic placement:
- Ilya O. Ryzhov, 2011, “Information Collection in Stochastic Optimization,” First position: Assistant professor, University of Maryland business school.
- Lauren A. Hannah, 2010 - "Stochastic Search, Optimization and Regression with Energy Applications," First position: Columbia University, Statistics (following post-doc at Duke in statistics).
- Peter Frazier, 2009 – “Knowledge Gradient Methods for Statistical Learning,” First position: Assistant professor, Cornell University, Department of Operations Research and Information Engineering.
- Kazutoshi Yamazaki, 2009 – “Essays on Sequential Analysis: Multi-Armed Bandit with Availability Constraints and Sequential Change Detection and Identification,” First position: Osaka University, Center for the Study of Finance and Insurance.
- Katerina Papadaki, 2002, “Adaptive Dynamic Programming for Aging and Replenishment Processes,” First position: Assistant professor, London School of Economics (tenured)
- Huseyin Topaloglu, 2001, “Dynamic Programming Approximations for Dynamic Resource Allocation Problems,” First position: Assistant professor, Operations Research and Industrial Engineering, Cornell University (tenured)
- Mike Spivey, 2001, “The Dynamic Assignment Problem,” First position: Assistant professor, Math department, Samford University (now tenured at College of Puget Sound, Washington, D.C.)
- Zhi-Long Chen, 1997, “Algorithms for Deterministic and Stochastic Scheduling,” First position: Assistant professor, Department of Systems Engineering, University of Pennsylvania (now tenured at University of Maryland)
- Raymond K.-M. Cheung, 1993, “Dynamic Networks with Random Arc Capacities: Solution Methods and Applications,” First position: Assistant professor, Industrial engineering, Iowa State University (now tenured, Hong Kong University of Science and Technology)
- Judy Farvolden, 1989, “A Primal Partitioning Solution for the Multicommodity Network Flow Problem,” First position: Assistant professor, Industrial Engineering, University of Toronto
- Yiannis Koskosidis, 1988, “Optimization-Based Models and Algorithms for Routing and Scheduling with Time Window Constraints,” First position: Assistant professor, Department of Civil Engineering, City University of New York.
- Hugo P. Simao, 1987, “Numerical, Discrete-Time Simulation of Transportation Queueing Networks,” First position: Associate Professor, Instituto Tecnologico de Aeronautica, Brazil.
Research laboratories:
- Abraham George, 2005, “Optimal Learning Strtegies for Multi-Attribute Resource Allocation Problems,” First position: Research staff, Princeton University. Second position: AT&T Research Laboratories.
- Tongqiang Wu, 2004, “The Optimizing Simulator for the Military Airlift Problem,” First position: Lawrence Livermore National Laboratory.
- Tassio Carvalho, 1996, “Dynamic Control of Spatial Resource Allocation Problems,” First position: IBM Watson Research Labs.
- Linos Frantzeskakis, 1990, “Dynamic Networks with Random Arc Capacities, with Application to the Stochastic Dynamic Vehicle Allocation Problem,” First position: AT&T Bell Laboratories.
Industry:
- Jun Ma, 2011, "Approximate Policy Iteration Algorithms for Continuous, Multidimensional Applications and Convergence Analysis,"
- Johannes Enders, 2008, “Mitigating Failure Risk in an Aging Electric Power Transmission System” Louis Dreyfus Highbridge Energy
- Juliana Nascimento, 2008, “Approximate dynamic programming for complex storage problems,” McKinsey Consulting, Sao Paolo, Brazil
- Gregory Godfrey, 2007, “Nonlinear Approximation Method for Solving Stochastic, Dynamic Resource Allocation Problems,” First position: Metron Inc.
- Arun Marar, 2002, “Information Representation in Large-Scale Resource Allocation Problems: Theory, Algorithms and Applications.” First position: Amaranth Advisers
- Joel Shapiro, 1999, “A Framework for Representing and Solving Dynamic Resource Transformation Problems,” First position: i2 Technologies.
Masters students: (9)
- Ekaterina Jager, 2008, “Sensor Management.”
- Dennis Panos, 2007, “Approximate dynamic programming and aerial refueling.”
- Jayanth Marasanapalle, 2000, “Function Approximations for Integer, Stochastic Resource Allocation Problems.”
- Tom Dong, 1998, “A Dynamic Programming Approximation for the Dynamic Assignment Problem.”
- Mike Towns, 1997, “The Impact of User Noncompliance and System Stochasticity on Dynamic Routing Problems: A Study of the Truckload Industry.”
- Sheraz Shere, 1996, “A Dynamic Programming Approximation for the Driver Assignment Problem.”
- Tony Snow, 1996, “Adaptive Labeling Algorithms for the Dynamic Assignment Problem.”
- Derek Gittoes, 1994, “A Generalized Labeling Algorithm for Solving the Dynamic Assignment Problem.”
- Mary-Ellen Noyes, 1993, “Validation and Testing of a Stochastic, Dynamic Fleet Management System.”