People

Warren B. PowellWarren B. Powell is the founder and director of CASTLE Laboratory and PENSA, the Princeton Laboratory for ENergy Systems Analysis. A faculty member at Princeton since 1981, CASTLE Lab was created in 1990 to reflect an expanding research program into dynamic resource management.

Click here for a biographical summary and my c.v.

 

 

Senior research staff

Post-doctoral associates:

Current graduate students:

 

Click here for research of recent graduate students.

Click here for a list of prior graduate students.

 

Graduate student and post-doc research:

Boris Defourny

Boris works at the interface of stochastic optimization and machine learning. While finishing this work, he has developed a novel adaptation of the knowledge gradient to a general class of robust learning problems, spanning ranking and selection to linear programs.

Boris also brings significant expertise in power flow modeling, and has developed the AC and DC power flow models used in our new SMART-PJM model of the PJM power grid.

 

Recent graduate students:

Warren Scott

W. R. Scott and W. B. Powell, “Portfolio Selection and Covariance Matrix Estimation using an Errors-in-Variables Factor Model with an Application to the PJM Electricity Market,” in preparation.

Companies can form portfolios of financial trading rights that give them the right to purchase electricity between pairs of nodes. In this paper, we use a Markowitz portfolio model, but introduce a novel way of estimating the covariance matrix by using very short time series. This allows us to capture short bursts of volatility. The method exploits the method of instrumental variables and the structure of a factor model for prices.

 

Warren Scott, W. B. Powell,  “Approximate Dynamic Programming for Energy Storage with New Results on Instrumental Variables and Projected Bellman Errors,” under review.

The problem of optimizing energy storage in the presence of wind and stochastic spot prices for electricity to serve a stochastic and time-depend load is modeled and solved as a steady state Markov decision process. We first solve the problem using several variations of Bellman error minimization, starting with least-squares policy iteration. Following Bradtke & Barto and Lagoudakis and Parr, we introduce instrumental variables to handle the need to partially simulate the independent variables, and compare this to projected Bellman error minimization. We show for the first time that these are mathematically equivalent. We create a series of benchmark problems which are solved optimally to produce a set of benchmarks. We show that instrumental variables significantly improve the results, but still produce solutions that are approximately 70 percent of optimal. We then fit the regression parameters using direct policy search, using the knowledge gradient algorithm to perform the search. This strategy produces results within 5 percent of optimal.

 

 

W. Scott, P. Frazier, W. B. Powell – “The Correlated Knowledge Gradient for Maximizing Expensive Continuous Functions with Noisy Observations using Gaussian Process Regression,” SIAM J. on Optimization (to appear)

Our original work in information collection has focused on problems where measurements are discrete (which technology should we try, what is the best drug compound). TThis logic can handle problems with hundreds or thousands of measurements, but there are many problems where measurements are continuous, and in particular where they are continuous vectors. We might, for example, need to determine where to measure radiation or disease (two spatial dimensions) or finding the best set of parameters to optimize a device (might be 5 or 10 dimensions), or finding the best set of parameters to optimize an expensive simulation (dozens to hundreds of dimensions). He is using the knowledge gradient principle (introduced by Peter Frazier) to compute continuous representations of the knowledge gradient which can then be optimized using classical nonlinear programming algorithms.

The graph to the right is a contour plot of the knowledge gradient, which gives the marginal value if we measure a particular set of parameters (illustrated here for a two-dimensional application). We can use gradients of the knowledge gradient surface to guide us to the next point we should measure (toward the bottom, just to the right of center).

(Click here to download paper)

Scott, W., W. B. Powell, H. P. Simao, "Calibrating Simulation Models using the Knowledge Gradient with Continuous Parameters," Proceedings of the 2010 Winter Simulation Conference B. Johansson, S. Jain, J. Montoya-Torres, J. Hugan, and E. Y¨ucesan, eds., Baltimore, Md, December, 2010.

This is the proceedings paper that is a synopsis of the full paper above. The work is to be presented at the Winter Simulation Simulation Conference in 2010.

(Click here to download paper)

 

 

Jae Ho Kim

Jae is a Ph.D. student in Electrical Engineering, but he has joined the lab to contribute his skills in signal processing, statistics and stochastic processes to model optimization problems arising in the contest of making the most of wind energy.

J. Kim, W. B. Powell, “Quantile Optimization using Asymmetric Signum Functions with Heavy-Tailed Distributions” (under review)

We present a provably convergent algorithm for computing the quantile of a random variable that does not require storing all of the sample realizations. We then present an algorithm for optimizing the quantile of a random function which may be characterized by a heavy-tailed distribution where the expectation is not de…ned. The algorithm is illustrated in the context of electricity trading in the presence of storage, where electricity prices are known to be heavy-tailed with in…nite variance.

(Click here to download paper)

Kim, J. H. and W. B. Powell, "Optimal Commitment Strategies for Wind Energy in the Presence of Storage," (to appear)

This paper formulates the problem of making advance commitments for wind energy as an optimization problem, taking into account the presence of storage with storage losses. The model provides for Markovian spot and regulating prices, and a stochastic wind process. If there is no storage loss, we derive an optimal policy for wind energy commitments for an arbitrary wind distribution. When there are storage losses, we derive an optimal policy if wind follows a (shifted) uniform distribution. For this case, we also derive a simple formula for the economic value of storage.

(Click here to download paper)

J. Kim, W. B. Powell, “An Hourly Electricity Price Model for Heavy-Tailed Spot Prices,” Energy Economics (to appear).

We propose an hour-ahead prediction model for electricity prices that captures the heavy-tailed behavior that we observe in the hourly spot market in the Ercot (Texas) and the PJM West hub grids. We present a model according to which we separate the price process into a thin-tailed trailing-median process and a heavy-tailed residual process whose probability distribution can be approximated by a Cauchy distribution. We show empirical evidence that supports our model.

(Click here to download paper)

 

Ilya Ryzhov

Ilya is working on a very general class of information collection problems, extending the knowledge gradient principle developed by Peter Frazier. He is considering a class of problems that involve making a measurement of some form which improves our estimates of parameters (such as costs) of an underlying optimization problem. The challenge is to spend our measurement budget as carefully as possible. Applications of this research span health, biosurveillance and energy, among others.

Book in preparation:

W. B. Powell, I. O. Ryzhov, Optimal Learning, John Wiley and Sons, 2012.

This book introduces the principles of how to collect information efficiently for problems where observations are time consuming and/or expensive. Using a presentation that is appropriate for an undergraduate class with a prior course in probability, the book introduces Bayesian and frequentist learning, the ranking and selection problem, the multiarmed bandit problem, and a number of other specialized learning problems (stopping problems, learning a unimodular function, learning a continuous surface, learning on a graph). Frequentist and Bayesian learning policies are presented, along with a thorough discussion of how to properly compare learning policies. Special emphasis is given to the knowledge gradient policy, which is developed for both online and offline problems, problems with independent and correlated beliefs, normal-normal models along with a number of other probability models.

 

Papers:

Journal publications

Ryzhov, I., W. B. Powell, “Information Collection in a Linear Program,”submitted to Mathematics of Operations Research, November, 2010.

Consider a linear program where the cost coecients are uncertain, for which we have a Bayesian prior. We can collect information to improve our understanding of these coecients, but this may be expensive, giving us a separate problem of optimizing the collection of information to improve the quality of the solution relative to the true cost coecients. We formulate this information collection problem for linear programs for the rst time, and derive a knowledge gradient policy which maximizes the marginal value of each measurement. We prove that this policy is asymptotically optimal, and demonstrate the performance of our algorithm on a network flow problem.

(click here to download paper)

I. Ryzhov, W.B. Powell, "Information collection on a graph", Operations Research (to appear).

We derive a knowledge gradient policy for an optimal learning problem on a graph, in which we use sequential measurements to re ne Bayesian estimates of individual arc costs in order to learn about the best path. This problem differs from traditional ranking and selection, in that the implementation decision (the path we choose) is distinct from the measurement decision (the edge we measure). Our decision rule is easy to compute, and performs competitively against other learning policies, including a Monte Carlo adaptation of the knowledge gradient policy for ranking and selection.

(click here to download paper)

I. Ryzhov, W.B. Powell, and P. Frazier, "The Knowledge-Gradient Algorithm for a General Class of Online Learning Problems" (to appear in Operations Research)

The knowledge-gradient policy was originally derived for off-line learning problems such as ranking and selection. Considerable attention has been given to the on-line version of this problem, known popularly as the multiarmed bandit problem, for which Gittins indices are known to be optimal for discounted, infinite-horizon versions of the problem. In this paper, we derive a knowledge gradient policy for on-line problems, and show that it very closely matches the performance of Gittins indices for discounted infinite horizon problems. It actually slightly outperforms the best available approximation of Gittins indices (by Gans and Chick) on problems for which Gittins indices should be optimal. The KG policy is also effective on finite horizon problems. The only policy which is competitive with KG seems to be interval estimation, but this requires careful tuning of a parameter. The KG policy also works on problems where the beliefs about different alternatives are correlated. The KG policy with independent beliefs is extremely easy to compute (we provide closed-form expressions for the case with normal rewards), and requires a simple numerical algorithm for the case with correlated beliefs.

(click here to download paper)

Ryzhov, I., P. I. Frazier and W. B. Powell, “Stepsize Selection for Approximate Value Iteration and a New Optimal Stepsize Rule,” (under review).

Approximate value iteration is used in dynamic programming when we use random observations to estimate the value of being in a state. These observations are smoothed to approximate the expected value function, leading to the problem of choosing a stepsize (the weight given to the most recent observation). A stepsize of 1/n is a common (and provably convergent) choice. However, we show that it leads to very slow convergence for a basic instance of approximate value iteration, to the point of being unusable for most applications. We then derive an optimal stepsize rule for that instance. This is the first stepsize rule that explicitly takes into account the covariance between the current observation and the previous prediction in approximate dynamic programming. The rule can be easily extended to general ADP settings; experimental results show that it leads to faster convergence than other formulas.

(click here to download paper)

Conference proceedings

Ryzhov, I. and W. B. Powell, “Bayesian Active Learning with Basis Functions,” IEEE Workshop on Adaptive Dynamic Programming and Reinforcement Learning, Paris, April, 2011.

A common technique for dealing with the curse of dimensionality in approximate dynamic programming is to
use a parametric value function approximation, where the value of being in a state is assumed to be a linear combination of
basis functions. We propose a Bayesian strategy for resolving the exploration/exploitation dilemma in this setting. Our approach is
based on the knowledge gradient concept from the optimal learning literature, which has been recently adapted for approximate
dynamic programming with lookup-table approximations. The new method performs well in numerical experiments conducted
on an energy storage problem.

(click here to download paper)

Ryzhov, I. O., W. B. Powell, "Approximate Dynamic Programming with Correlated Bayesian Beliefs," Forty-Eighth Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, Sept. 29-Oct. 1, 2010.

The exploration-exploitation problem in dynamic programming is well-known, and yet most algorithms resort to heuristic exploration policies such as epsilon-greedy. We use a Bayesian model of the value of being in each state with correlated beliefs, which reflects the common fact that visiting one state teaches us something about visiting other states. We use the knowledge gradient algorithm with correlated beliefs to capture the value of the information gained by visiting a state. Using both a simple newsvendor problem and a more complex problem of making wind commitments in the presence of stochastic prices, we show that this method produces significantly better results than epsilon-greedy for both Bayesian and non-Bayesian beliefs. These are shown for both offline and online implementations.

(click here to download paper)

Ryzhov, I. O., P. I. Frazier, W. B. Powell, “On the Robustness of a One-Period Look-ahead Strategy for Multi-armed Bandit Problems,” International Conference on Computer Science, Amsterdam, May, 2010.

We analyze the robustness of a knowledge gradient (KG) policy for the multi-armed bandit problem. The KG policy is based on a one-period look-ahead, which is known to underperform in other learning problems when the marginal value of information is non-concave. We present an adjustment that corrects for non-concavity and approximates a multi-step look-ahead, and compare its performance to the unadjusted KG policy and other heuristics. We provide guidance for determining when adjustment will improve performance, and when it is unnecessary. We present evidence suggesting that KG is generally robust in the multi-armed bandit setting, which argues in favour of KG as an alternative to index policies.

(click here to download paper)

 

Ryzhov, I. and W. B. Powell, “The Knowledge Gradient Algorithm For Online Subset Selection,” IEEE Conference on Approximate Dynamic Programming and Reinforcement Learning (part of IEEE Symposium on Computational Intelligence), March, 2009.

(click here to download paper)

Ryzhov, I., W. B. Powell, “A Monte-Carlo Knowledge Gradient Method for Learning Abatement Potential of Emissions Reduction Technologies,”, Winter Simulation Conference, 2009. To appear.

(click here to download paper)

 

 

Jun Ma

Jun's general area of research is approximate dynamic programming with continuous state and action variables. These problems have many applications, but one of our motivating applications is control problems that arise in energy settings. However, rigorous algorithms for this general problem class do not exist, although there is a growing body of theory.

J. Ma and W. B. Powell, "Convergence Analysis of Kernel-based On-policy Approximate Policy Iteration Algorithms for Markov Decision Processes with Continuous, Multidimensional States and Actions" (under review)

Using kernel smoothing techniques, we propose three di erent online, on-policy approximate policy iteration algorithms which can be applied to in nite horizon problems with continuous and vector-valued states and actions. Using Monte Carlo sampling to estimate the value function around the post-decision state, we reduce the problem to a sequence of deterministic, nonlinear programming problems that allow us to handle continuous, vector-valued states and actions. We provide a formal convergence analysis of the algorithms under a variety of technical assumptions.

(Click here to download paper)

J. Ma and W. B. Powell, "Convergence Analysis of On-Policy LSPI for Multi-Dimensional Continuous State and Action-Space MDPs and Extension with Orthogonal Polynomial Approximation" (under review).

This paper considers problems with continuous (and high-dimensional) state and action spaces, where the imbedded expectation within Bellman's equation cannot be computed exactly (the three curses of dimensionality). The paper provides a review of the literature on convergence proofs using continuous value function approximations, and then provides a series of convergence proofs, including problems where the expectation can and cannot be computed exactly, but where we assume that the value function is spanned by a known set of basis functions. A key feature of the algorithm is that it uses online policy approximation, which means that it avoids a separate exploration process, something that is problematic for high-dimensional action spaces. The paper closes with an algorithm that relaxes the assumption of known basis functions using orthogonal polynomials.

(Click here to download paper)

Ma, J. and W. B. Powell, “A convergent recursive least squares policy iteration algorithm for multi-dimensional Markov decision process with continuous state and action spaces,” IEEE Conference on Approximate Dynamic Programming and Reinforcement Learning (part of IEEE Symposium on Computational Intelligence), March, 2009.

(Click here to download paper)

 

 

Lauren Hannah

Lauren has been addressing a general class of stochastic optimization problems that arise in energy applications. She began with stochastic optimization models arising in energy, and has been moving in the general area of using advanced machine learning methods to solve stochastic optimization problems.

An updated list of her papers can be found on her website at Columbia University at www.columbia.edu/~lah2178

L. Hannah, W. B. Powell, D. Blei, "Stochastic Search with an Observable State Variable," (under review)

Think of a newsvendor problem where we have to choose a quantity, and then observe a random demand. Now assume that before we choose the quantity, we can observe the weather, which gives us some information about the demand. Now assume we are making a commitment of energy that we are going to deliver from our wind farm before we know how much the wind will blow. But before we make the commitment, we observe a vector of continuous variables including current and previous wind, prices and demand, along with time of day and information about the weather. We use DP-GLM to create an approximation that is a weighted sum of piecewise linear functions which allow us to capture the concavity of the function with respect to the wind commitment. This paper proves convergence of this strategy, and demonstrates its effectiveness in the context of an energy commitment problem.

(Click here to download paper)

L. Hannah, D. Blei and W. B. Powell, “Dirichlet Process Mixtures of Generalized Linear Models,” (under review).

We propose Dirichlet Process-Generalized Linear Models (DP-GLM), a new method of nonparametric regression that accommodates continuous and categorical inputs, and any response that can be modeled by a generalized linear model. We prove conditions for the asymptotic unbiasedness of the DP-GLM regression mean function estimate and give a practical example for when those conditions hold. Additionally, we provide Bayesian bounds on the distance of the estimate from the true mean function based on the number of observations and posterior samples. We evaluate DP-GLM on several data sets, comparing it to modern methods of nonparametric regression like CART and Gaussian processes. We show that the DP-GLM is competitive with the other methods, while accommodating various inputs and outputs and being robust when confronted with heteroscedasticity.

(click here to download paper)

L. Hannah, W. B. Powell, and J. Stewart, “One-Stage R&D Portfolio Optimization with an Application to Solid Oxide Fuel Cells,” Energy Systems Journal, Vol. 1, No. 1, 2010.

This paper addresses the problem of determining which of a subset of R&D projects (constrained by a budget) we should invest in to improve the performance of a complex device (in this case, a solid oxide fuel cell). Projects may be competitive (we may end up choosing one type of material for a particular function) or complementary (an improvement in a particular material in one part of the fuel cell may make a different technology in a different part of the fuel cell more competitive). We might be choosing dozens of projects from a choice of several hundred, creating a combinatorially large set of potential portfolios (we only consider a single investment). Furthermore, estimating the value of a particular R&D portfolio involves evaluating an expectation that cannot be done analytically, and hence requires imprecise Monte Carlo evaluations.

(click here to download paper)

L. Hannah and W. B. Powell, “Evolutionary Policy Iteration Under a Sampling Regime for Stochastic Combinatorial Optimization,” IEEE Transactions on Automatic Control, Vol. 55, No. 5, pp. 1254-1257, May, 2010.

The algorithmic research into the R&D problem involved making a comparison of a new algorithm against an existing algorithm in the literature, known as evolutionary policy iteration. This algorithm, however, required that we be able to compute the expectation exactly. This paper shows how to modify the EPI algorithm so that it retains its fundamental convergence properties.

(click here to download paper)

 

Peter Frazier (Graduated June, 2009)

Peter developed the idea of the knowledge gradient policy for guiding the collection of information. His work started with simple ranking and selection problems (the off-line version of the multiarmed bandit problem) and has extended into a general class of algorithms for learning. He has shown that the knowledge gradient is not only myopically optimal (by construction, it makes the best measurement if there is only one measurement to make) but is also asymptotically optimal. Many algorithms are asymptotically optimal (e.g. randomly choosing what to measure), but KG is the only algorithm that is both myopically and asymptotically optimal. This supports the growing body of empirical work that suggests that KG works well in practice. His paper on asymptotic optimality of sequential sampling policies is a major theory paper establishing general conditions for asymptotic optimality for measurement strategies. Of particular practical importance is his paper on the knowledge gradient for problems with correlated beliefs.

P. Frazier and W. B. Powell, “Asymptotic Optimality of Sequential Sampling Policies for Bayesian Information Collection”

This paper provides a very general theory for the asymptotic optimality of a broad class of information collection strategies. The work builds on Peter's research on the knowledge gradient algorithm, but this paper provides general conditions for asymptotic optimality of learning algorithms.

(Click here to download paper)

P. Frazier, W. B. Powell, S. Dayanik, “The Knowledge-Gradient Policy for Correlated Normal Beliefs,” Informs Journal on Computing (to appear).

The knowledge gradient policy is a method for determining which of a discrete set of measurements we should make to determine which of a discrete set of choices we should make. Most of the applications that we have considered introduce the dimension of correlated beliefs. If we evaluate the level of contamination in one location and it measures high, we are likely to raise our belief about the level of toxin in nearby locations. If we test a machine for airport security that can sense explosives and it works poorly, we might lower our evaluation of other devices that might use similar technologies (e.g. a particular material or sensor within the device). This article shows how to compute the knowledge gradient for problems with correlated beliefs. The paper shows that just as with problems with independent beliefs, the knowledge gradient is both myopically and asymptotically optimal. Experimental work shows that it can produce a much higher rate of convergence than the knowledge gradient with independent beliefs, in addition to outperforming other more classical information collection mechanisms.

(click here to download paper) (Click here for online supplement)

Frazier, P., W. B. Powell and S. Dayanik, “A Knowledge Gradient Policy for Sequential Information Collection,” SIAM J. on Control and Optimization, Vol. 47, No. 5, pp. 2410-2439 (2008).

The knowledge gradient policy is introduced here as a method for solving the ranking and selection problem, which is an off-line version of the multiarmed bandit problem. Imagine that you have M choices (M is not too large) where you have a normally distributed belief about the value of each choice. You have a budget of N measurements to evaluate each choice to refine your distribution of belief. After your N measurements, you have to choose what appears to be the best based on your current belief. The knowledge gradient policy guides this search by always choosing to measure the choice which would produce the highest value if you only have one more measurement (the knowledge gradient can be viewed as a method of steepest ascent). The paper shows that this policy is myopically optimal (by construction), but is also asymptotically optimal, making it the only stationary policy that is both myopically and asymptotically optimal. The paper provides bounds for finite measurement budgets, and provides experimental work that shows that it works as well as, and often better, than other standard learning policies.

(click here to download paper)

I. Ryzhov, W.B. Powell, and P. Frazier, The Knowledge-Gradient Algorithm for a General Class of Online Learning Problems (under review)

The knowledge-gradient policy was originally derived for off-line learning problems such as ranking and selection. Considerable attention has been given to the on-line version of this problem, known popularly as the multiarmed bandit problem, for which Gittins indices are known to be optimal for discounted, infinite-horizon versions of the problem. In this paper, we derive a knowledge gradient policy for on-line problems, and show that it very closely matches the performance of Gittins indices for discounted infinite horizon problems. The knowledge gradient policy, however, outperforms Gittins indices for finite horizon problems. Of perhaps greater practical interest, the knowledge gradient policy can be adapted to handle applications with correlated beliefs. This is demonstrated in an application involving the evaluation of treatments for diabetes, where a treatment consists of three separate drugs. Different treatments may share drugs (one treatment may involve drugs A, B and C, while another may use A,D and C) which leads to correlated beliefs. We show that the knowledge gradient, adapted to on-line learning problems with correlated beliefs, outperforms a number of competing policies.

(click here to download paper)

P. Frazier, W. B. Powell, S.. Dayanik and P. Kantor, “Approximate Dynamic Programming in Knowledge Discovery for Rapid Response,” HICSS Conference, 2009.

(click here to download paper)

Frazier, P. and W. B. Powell, “The knowledge gradient stopping rule for ranking and selection,” Proceedings of the Winter Simulation Conference, December 2008.

(click here to download paper)

P. Frazier and A.J. Yu, "Sequential Hypothesis Testing under Stochastic Deadlines," Neural Information Processing Systems Conference, 2007.

(click here to download paper)

P. Frazier, W. B. Powell, H. P. Simao, "Simulation Model Calibration with Correlated Knowledge-Gradients," Winter Simulation Conference, December, 2009.

A common challenge in the calibration of simulation model is that we have to tune several continuous parameters. This is our first application of the knowledge gradient algorithm with correlated beliefs to the problem of parameter tuning for simulation models. A single run of the model (which uses adaptive learning from approximate dynamic programming) requires more than a day, so the paper also introduces methods to product results without a full run. A Bayesian model is set up to capture the uncertainty in our beliefs about the convergence of the model. The method is illustrated in the tuning of two continuous parameters, which required approximately six runs of the model.

(click here to download paper)

Frazier, P. and W. B. Powell, “The Knowledge Gradient Policy for Offline Learning with Independent Normal Rewards,” in Approximate Dynamic Programming and Reinforcement Learning, IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pp. 143-150, April, 2007.

(click here to download paper)

Powell, W. B. and P. Frazier, “Optimal Learning,” TutORials in Operations Research, Chapter 10, pp. 213-246, Informs (2008).

(click here to download paper)

 

 

Kazutoshi Yamazaki (Graduated June, 2009)

Kazu has been working working on a special problem in information collection which involves questions such as when we should collect information, determining when a random signal has undergone a change, and determining the cause of the change. These decisions have to be made as quickly as possible, since these problems often arise in the context of responding to disease outbreaks or potential terrorist threats.

S. Dayanik, W. Powell, and K. Yamazaki (2007). “Index policies for discounted bandit problems with availability constraints,” Journal of Applied Probability (to appear).

One of the most famous problems in information collection is the multiarmed bandit problem, where make a choice (out of a discrete set of choices), observe a reward, and use this observation to update estimates of the future value of rewards. This problem can be solved by choosing the option with the highest index (known as the Gittins index). In this paper, we present an index problem for the case where not all the choices are available each time. As a result, it is sometimes important to make an observation just because the observation is available to be made.

(click here to download paper)

S., Dayanik, W. Powell, K. Yamazaki. "Asymptotic analysis of sequential change diagnosis problem," Proceedings of the 2008 International Workshop on Applied Probability (2008).

(click here to download paper)

S Dayanik, W. B. Powell and K. Yamazaki "An Asymptotically Optimal Strategy in Sequential Change Detection and Identification Applied to Problems in Biosurveillance" Proceedings of the 3rd INFORMS Workshop on Data Mining and Health Informatics, (2008).

Change detection addresses the problem of determining when a signal has changed from one state to another. In addition, it is often necessary to assess the cause of a change. For example, we may be detecting gamma radiation using a roadside detector. Since there are many environmental causes of radiation, jumps in the signal may be meaningless. We need to be able to identify when a burst of radiation is due to some new source, and then we need to determine whether the source is benign (hospital waste) or aggressive (a terrorist moving a dirty bomb around the city). The technical challenge is making this determination as quickly as possible.

(click here to download paper)

 

Juliana Nascimento (graduated February, 2008)

Juliana worked on a general class of control problems that can be generally described as "storage problems." These can be modeled as dynamic programs, but generally cannot be solved exactly. Approximate dynamic programming can be applied, but encounter problems such as exploration. For many problems, it is useful to have algorithms that use pure exploitation, but in general these can become stuck and often produce poor solutions. Storage problems exhibit special structure, however, which enabled her to develop a convergence theory that proves convergence of ADP algorithms using pure exploitation. This work evolved from an initial problem involved the lagged acquisition of assets, to a more general theory of storage problems. Her work then extended to two related applications: a jet fuel hedging problem which involves the pricing of assets acquired over time (in the context of jet fuel), and the problem of managing the cash balance of a mutual fund.

J. Nascimento, W. B. Powell, “An Optimal Approximate Dynamic Programming Algorithm for the Energy Dispatch Problem with Grid- Level Storage

We consider the problem of modeling hourly dispatch and energy allocation decisions in the presence of grid-level storage. The model makes it possible to capture hourly, daily and seasonal variations in wind, solar and demand, while also modeling the presence of hydroelectric storage to smooth energy production over both daily and seasonal cycles. The problem is formulated as a stochastic, dynamic program, where we approximate the value of stored energy. We provide a rigorous convergence proof for an approximate dynamic programming algorithm, which can capture the presence of both the amount of energy held in storage as well
as other exogenous variables. Our algorithm exploits the natural concavity of the problem to avoid any need for explicit exploration policies.

(Click here to download paper)

J. Nascimento and W. B. Powell, “An Optimal Solution to a General Jet Fuel Hedging Problem,” under review.

This paper solves the problem of hedging jet fuel prices while simultaneously trying to minimize the cost of purchasing jet fuel. Instead of simply minimizing risk, this paper offers a method of trading off risk versus purchase cost.

(Click here to download paper)

Nascimento, J. and W. B. Powell, “An Optimal Approximate Dynamic Programming Algorithm for the Lagged Asset Acquisition Problem,” Mathematics of Operations Research, Vol. 34, No. 1, pp. 210-237 (2009).

I have worked for a number of years using piecewise linear function approximations for a broad range of complex resource allocation problems. A few years ago we proved convergence of this algorithmic strategy for two-stage problems (click here for a copy). In this latest paper, we have our first convergence proof for a multistage problem. This paper is more than a convergence proof for this particular problem class - it lays out a proof technique, which combines our work on concave approximations with theory laid out by Bertsekas and Tsitsiklis (in their Neuro-Dynamic Programming book).

(click here to download paper)

Nascimento, J. and W. B. Powell, “Dynamic programming models and algorithms for the mutual fund cash balance problem,” Management Science (to appear)

This is a nice illustration of classical approximate dynamic programming for a storage problem that arises in mutual funds - how much cash should be kept on hand at a mutual fund. The model is motivated by a real problem faced by a mutual fund manager, and handles issues such as retail and institutional investors (which have different redemption rules), along with market behavior and interest rates. The model identifies regimes, which depend on market performance and interest rates, where money should be moved into and out of cash.

(click here to download paper)

Nascimento, J. and W. B. Powell, “An Optimal ADP Algorithm for a High-Dimensional Stochastic Control Problem,” IEEE Conference on Approximate Dynamic Programming and Reinforcement Learning, April, 2007.

A conference proceedings paper summarizing the research on storage problems.

(click here to download paper)

 

 

 

Abraham George (graduated 2005)

Abraham contributed two significant algorithmic results for approximate dynamic programming. The first was a new adaptive stepsize rule that balances, in an optimal way, the tradeoff between smaller stepsizes to smooth noise and larger stepsizes to adapt to a changing signal. Then, he showed that a very simple weighting formula could be used to provide estimates of value functions at different levels of aggregation. The weighting formula adjusts to the estimated level of bias and noise, increasing the weight on estimates that appear to have low noise and low bias.

George, A., W.B. Powell and S. Kulkarni, “Value Function Approximation Using Hierarchical Aggregation for Multiattribute Resource Management,” Journal of Machine Learning Research (to appear)

There are a number of problems in approximate dynamic programming where we have to use coarse approximations in the early iterations, but we would like to transition to finer approximations as we collect more information. This paper studies the statistics of aggregation, and proposes a weighting scheme that weights approximations at different levels of aggregation based on the inverse of the variance of the estimate and an estimate of the bias. This weighting scheme is known to be optimal if we are weighting independent statistics, but this is not the case here. What is surprising is that the weighting scheme works so well. We demonstrate this, and provide some important theoretical evidence why it works.

(click here to download paper)

George, A. and W.B. Powell, “Adaptive Stepsizes for Recursive Estimation with Applications in Approximate Dynamic Programming,” Machine Learning, Vol. 65, No. 1, pp. 167-198, (2006). (c) Springer

One of the first challenges anyone will face when using approximate dynamic programming is the choice of stepsizes. Use the wrong stepsize formula, and a perfectly good algorithm will appear not to work. Deterministic stepsize formulas can be frustrating since they have parameters that have to be tuned (difficult if you are estimating thousands of values at the same time). This paper reviews a number of popular stepsize formulas, provides a classic result for optimal stepsizes with stationary data, and derives a new optimal stepsize formula for nonstationary data. This result assumes we know the noise and bias (knowing the bias is equivalent to knowing the answer). A formula is provided when these quantities are unknown. Our result is compared to other deterministic formulas as well as stochastic stepsize rules which are proven to be convergent. The numerical work suggests that the new optimal stepsize formula (OSA) is very robust. It often is the best, and never works poorly.

(Click here to download paper)

 

Past graduates

Ph.D. students (24):

Waiting placement:

Academic placement:

Research laboratories:

Industry:

Masters students: (9)