Warren B. Powell

Professor Emeritus, Princeton University

In my 40 years of research (250+ papers, the vast majority on sequential decision problems) I have encountered my fair share of problematic reviewing, especially given my focus on stochastic dynamic models and algorithms, an area served by highly fragmented communities. I have come to live with this state of affairs (I have a lot of company), but I have decided to take my most recent experience and turn it into what I hope is a learning experience for the OR community.

Below I will:

- Provide some background on cost function approximations (see here for a brief but more thorough introduction).
- Describe a complex energy storage problem that we use to demonstrate the power of the CFA.
- Summarize our experience getting this novel idea reviewed. The goal here is to highlight the problems of proposing a method that is completely new, and which therefore challenges the standard methods in the literature.
- Close by highlighting a fundamental weakness of our review process, and describe an easy fix that I used during my days as Area Editor of Operations Research.

## Background – Policies based on cost function approximations

A few years ago I realized that there is a strategy for solving complex sequential decision problems under uncertainty (aka dynamic programs, multistage stochastic programs, stochastic control, …) that uses parameterized optimization models as policies. The idea is so powerful that it is widely used as a heuristic in industry, although applied in an ad hoc manner. However, the general strategy appears to have been completely overlooked in the research literature, and the approach is routinely dismissed as a “deterministic heuristic.” These criticisms overlook the fundamental weaknesses of policies based on the methods of the stochastic programming literature.

I came to appreciate the idea when I saw it being used by power grid operators to plan energy generation for tomorrow in the presence of uncertainty. They solve very large integer programming models to plan when power generators will be turned on and off (this is known as the “unit commitment problem”). To handle uncertainty, they insert buffers to ensure that enough generation capacity is scheduled to handle a failure of their largest generator.

The research community decided that these “deterministic models” needed to be replaced by “stochastic models” solved using stochastic programming; these models are *much* harder to solve, despite the use of several simplifying approximations. After years of developing a highly detailed stochastic simulator of a major grid operator, I came to the realization that the stochastic programming models have a fundamental flaw (unrelated to the computational challenges). By contrast, the parameterized deterministic optimization models used by industry were not just more practical; the specific parameterization captured considerable industry knowledge and represented a much more effective approach for handling uncertainty (click here for a detailed discussion of this topic). What it lacked was a formal modeling framework.

I decided to call the general strategy of using a parameterized optimization model a parametric “cost function approximation.” *It represents what looks to be the first fundamentally new approach for solving large-scale stochastic programs.* This approach is not just practical and scalable; it is likely to be better than stochastic programming for many applications. This is a field that has been absolutely dominated by the idea of using simplified stochastic lookahead models, first introduced by Dantzig in 1955, which depend on a variety of approximations to make the stochastic lookahead model solvable (see chapter 19 of my new book for a detailed discussion). This approach is used in thousands of papers, and is the topic of major books such as those by Birge and Louveaux as well as Kall and Wallace.

I have found that many in the stochastic programming community fail to realize that an optimal solution of an approximate stochastic lookahead model (the “stochastic program”) is *not* an optimal policy for the underlying sequential decision problem. *I believe that parametric cost function approximations are very likely to outperform traditional stochastic programming models in most applications for which stochastic programming has been used.* This idea, however, requires the same type of research that has been devoted to parametric models in statistics for over 100 years.

As we point out here, stochastic programming builds policies based on stochastic lookahead models, which require strong approximations to create something that is solvable. Parametric CFAs, however, use a parameterized deterministic lookahead, which is then trained in a stochastic base model (typically a simulator, but it can be the real world) that does not require any of the simplifications made in stochastic lookahead models. Furthermore, the parameterization can incorporate industry intuition about how uncertainty should be managed (as is done in the stochastic unit commitment problem).
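
The distinction can be written compactly. Using the notation of the policy X^\pi(S_t|\theta) that appears later on this page, and writing C(S_t, x) for the single-period cost (the symbols \bar{C} and \mathcal{X}_t(\theta) below are my shorthand for the parameterized objective and constraint set of the deterministic lookahead), a parametric CFA computes decisions from a parameterized optimization model and tunes \theta against the base model:

```latex
X^{\pi}(S_t|\theta) \;=\; \arg\min_{x \in \mathcal{X}_t(\theta)} \bar{C}(S_t, x \,|\, \theta),
\qquad
\min_{\theta} \; \mathbb{E}\left\{ \sum_{t=0}^{T} C\bigl(S_t, X^{\pi}(S_t|\theta)\bigr) \right\},
```

where the expectation is taken over the sample paths of the stochastic base model (the simulator or the real world). The stochastic lookahead models of stochastic programming, by contrast, approximate an expectation inside the lookahead itself, which is what forces the simplifications.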

## Our paper: the first formal introduction of CFA as a methodology, illustrated using a complex energy storage problem

I first introduced the term “cost function approximation” in a tutorial article, “Clearing the Jungle of Stochastic Optimization,” in 2014 (for the INFORMS TutORials series), but the first research paper to formally propose and investigate the idea was completed two years ago (Saeed Ghadimi is the lead author; Saeed was a post-doc in my lab when this work was done) and submitted to the INFORMS Journal on Computing. A summary of the paper, along with an outline of the general idea of parametric cost function approximations, is given here. Key elements of the problem include:

- We demonstrate the CFA on a complex energy storage problem where we have to make decisions to store or discharge energy each hour. We can acquire energy from a wind farm or the grid to serve a time-dependent load, with a large storage device to absorb fluctuations.
- Energy demand is highly time-dependent, with predictable peaks and valleys.
- We are given rolling forecasts of the energy from wind, where peaks and valleys in the supply of wind energy can occur at any time of day. There are significant errors in these forecasts.
- The transmission of energy is constrained, which means that we have to plan storage given estimates of when energy is available and when it is needed. The interaction of supply (which we have to plan with highly stochastic rolling forecasts), time-dependent demands (which are relatively predictable) and capacity constraints is what makes this problem so hard.
- Our approach is to solve a deterministic rolling horizon model with a 24 hour horizon. The stochastic forecasts are multiplied by a 24-dimensional vector of coefficients \theta_{\tau} for \tau = 1, 2, \ldots, 24 time periods into the future. This is a natural parameterization for handling uncertain forecasts that can be applied in many settings.
- The coefficients \theta_{\tau} are tuned in a stochastic simulator that captures the hourly updating of forecasts (these rolling forecasts are central to the problem). This is a complex stochastic process that cannot be captured in a two-stage stochastic program.
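
To make the parameterization concrete, here is a minimal, self-contained sketch, not the paper's actual model: a toy storage problem in which a deterministic lookahead LP scales the wind forecast by a coefficient \theta before optimizing, the policy is rolled forward against noisy rolling forecasts, and \theta is tuned by a crude grid search. All names, the cost model, and the use of a single scalar \theta (the paper uses a 24-dimensional vector \theta_\tau) are simplifications for illustration:

```python
import numpy as np
from scipy.optimize import linprog

def cfa_decision(s0, forecast, demand, price, theta, s_max):
    # Parameterized deterministic lookahead: scale the wind forecast by theta,
    # then solve an LP for grid purchases g and wind curtailment c over H hours.
    H = len(forecast)
    net = np.cumsum(theta * forecast - demand)   # cumulative scaled-wind minus load
    L = np.tril(np.ones((H, H)))                 # lower-triangular cumulative-sum operator
    # Storage bounds: 0 <= s0 + net[tau] + cum(g)[tau] - cum(c)[tau] <= s_max
    A_ub = np.block([[-L, L], [L, -L]])
    b_ub = np.concatenate([s0 + net, s_max - s0 - net])
    cost = np.concatenate([price, np.zeros(H)])  # pay for grid energy; curtailment is free
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (2 * H), method="highs")
    return res.x[0]                              # implement only the first-hour decision

def simulate(theta, rng, T=72, H=24, s_max=50.0, penalty=1000.0):
    # Roll the CFA policy forward; forecasts are re-drawn every hour with errors
    # that grow with lead time (a crude stand-in for rolling forecasts).
    wind = 20 + 10 * np.sin(np.arange(T + H) / 6.0)           # true wind path
    load = 25 + 10 * np.sin(np.arange(T + H) / 12.0 + 1.0)    # time-dependent demand
    price = 30 + 20 * np.sin(np.arange(T + H) / 12.0)         # grid price
    s, total = s_max / 2, 0.0
    for t in range(T):
        err = 1 + 0.2 * np.sqrt(np.arange(1, H + 1)) * rng.standard_normal(H)
        fc = np.maximum(wind[t:t + H] * err, 0.0)             # rolling forecast at time t
        g = cfa_decision(s, fc, load[t:t + H], price[t:t + H], theta, s_max)
        s += wind[t] + g - load[t]                            # transition uses actual wind
        total += price[t] * g + penalty * max(-s, 0.0)        # shortfall bought at a penalty
        s = min(max(s, 0.0), s_max)                           # spill overflow, cover deficit
    return total

rng = np.random.default_rng(0)
costs = {th: simulate(th, rng) for th in (0.6, 0.8, 1.0, 1.2)}
best_theta = min(costs, key=costs.get)
```

A grid search over a small interval works in this sketch only because the parameterization is naturally scaled (with perfect forecasts, \theta = 1 would be optimal); the paper tunes the full vector \theta_\tau with formal stochastic search rather than a grid.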

This is not an energy storage paper; the energy storage problem (basically a form of inventory problem) was chosen to demonstrate the important features of a CFA. This appears to be the first method that can handle the challenge of rolling forecasts, which arises in almost any operational planning problem and yet has been almost completely ignored in the research literature. The modeling and algorithmic approach, however, applies to *any* use of a parameterized optimization model for making decisions in the context of sequential decision problems. For example, it is standard industry practice to use a parameterized deterministic lookahead for the stochastic unit commitment problem; this problem has received considerable attention from the stochastic programming community for decades, without any evidence of industry acceptance. Our approach replaces this ad-hoc heuristic with a formal stochastic optimization model.

## The reviewing process

It is the experience of trying to publish this breakthrough idea that is the topic of this webpage, since the handling of the paper has been particularly egregious and highlights the challenge of proposing a new idea: there will not be any referees with expertise in the specific methodology. The paper was rejected after a first round of reviews. My major complaint was that none of the reviews recognized the core contribution of creating policies by parameterizing an optimization model. I wrote a strong rebuttal, and the Area Editor (Pascal Van Hentenryck) accepted my arguments and agreed to send it out again. The second round of reviews *made the same mistake*, so I am going to focus on these reviews here.

I will focus on the comments of the lead reviewer who was the most critical.

1. The central contribution of the paper, which is highlighted clearly in the introduction, is the parameterization of an optimization model as the basis for a policy. The first referee even includes the following quote: “The main contribution of this manuscript is to introduce and develop, for the first time, the idea of a parameterized cost function approximation (CFA) as a tool for solving important classes of multistage stochastic programming problems.” The referee then goes on to say: “The idea of using parameterized policies is certainly not new and was discussed in numerous publications. In books of Bertsekas it is called the Galerkin method. Of course the question in such approach is what parameterization to use. One standard way is to employ linear combinations of chosen basis functions. I would suggest for the authors to read the cited book of Bertsekas. More recently the approach of linear (affine) decision rules became popular.” So the referee almost immediately confuses the idea of a parameterized optimization model for the policy (which is what a CFA is) with the use of a simple parametric function (I call these “policy function approximations” or PFAs), by suggesting that we use some form of “affine policy” of the form X^\pi(S_t|\theta) = \sum_{f\in F} \theta_f \phi_f(S_t). An affine policy (aka linear decision rule) is clearly *not* a parameterized optimization problem, which we have clearly identified as the central contribution of the paper. Further, it could never be used to solve our energy storage problem (see points 4, 5 and 6 below). The referee seems to feel that all the relevant material to solve the problem is in Bertsekas’ book. In fact, there is not a single method proposed in Bertsekas’ book that would solve this problem. If the referee wants to claim otherwise, then he has to (a) name the method and (b) cite the empirical evidence in a published paper. An affine policy will never solve this problem (see point 4 below), and no evidence is cited that it would work. (Is this referee also claiming that his Galerkin method would solve a stochastic unit commitment problem? That would be both astonishing and embarrassing.)
2. The choice of policy requires an appreciation of the structure of the problem. A quick Google search (which the editors could have done) reveals that Bertsekas does not have a single reference in his entire storied career that even uses the term “energy storage,” so I wonder why anyone would feel that Bertsekas’ book would contain any insights into how to solve this particularly complex inventory problem. By contrast, I have an extensive body of research in energy storage, where I have worked on a wide range of energy storage problems using a variety of strategies (click here for a quick summary, which includes demonstrations of all four classes of policies), which would also be revealed by a Google search.
3. The referee writes “I would suggest for the authors to read the cited book of Bertsekas.” I am a big admirer of Bertsekas’ theoretical research, but this is a computational paper in a computational journal, and I do not turn to Bertsekas’ work for guidance in modeling and computation. I have no idea why he would be invited as a reviewer. As I told him in a recent email, he provides no guidance on how to properly model a problem like this (e.g. including the forecast in the state variable – I am fairly sure that Bertsekas never even defines a state variable, something that can be found in most optimal control books), and no guidance at all on the choice of policy. I do not believe he has a single paper that discusses how to choose among different classes of policies, and while he seems to be familiar with my work, he has never mentioned that there are four classes of policies (see chapter 11 of my new book for a thorough discussion of the four classes of policies and how to choose among them). I am quite positive that this time-dependent inventory problem with a rolling forecast cannot be solved using a policy function approximation (such as an affine policy or even a neural network), or a policy based on approximate dynamic programming. A stochastic lookahead policy is intractable, so there would need to be an entire line of research on how to solve it efficiently (real energy storage problems need to step forward in 5-minute increments). The parametric CFA that we propose, using a parameterized deterministic lookahead, appears to be the only solution to the problem of rolling forecasts. We also need to point out that (a) rolling forecasts arise throughout operational planning problems, and yet we are not aware of a single paper that explicitly models the stochastic evolution of a forecast in any form of stochastic lookahead policy, and (b) the idea of a policy using a parameterized optimization problem is a general methodology with broad applicability, as evidenced by its wide use in industry as an ad-hoc heuristic (what you will not find is any formal tuning of the parameters – it is precisely this tuning that makes a CFA a form of stochastic optimization problem).
4. A simple (stationary) energy storage problem with stochastic prices can be optimized with a buy-low, sell-high policy, which is a parameterized policy, but it is a step function (other inventory problems would similarly use a step function in the form of an order-up-to policy). I have no idea how to create this structure with an affine policy, since the step function is highly nonlinear in the parameters. The referee does not cite any papers that have used this approach, which is critical if you want to claim that a problem has already been solved. *Waving your hands and saying that you should try something which has not been tested is simply not appropriate; the editors should have picked up on this, and any experienced referee should know better.* Honestly, to propose an affine policy for this problem class is almost laughable. But it gets worse…
5. This problem is clearly nonstationary, since energy demand is highly time-dependent (see the figures under “A Case Application” here). Any simple parameterized policy (including a step function) would require a parameter vector \theta_t that is indexed by time. We used hourly increments, which means the dimensionality of our parameter vector is 24 times larger, but we often solve these problems in 5-minute increments, producing 288 time periods. This is a significant complication for any policy search strategy, regardless of whether you are using derivative-based or derivative-free methods. This is where I was surprised that the referee did not suggest using approximate dynamic programming (the central theme of Bertsekas’ book, although his book focuses primarily on stationary problems), since it is quite easy to create time-dependent policies using time-indexed VFAs (with a finite-horizon model). This would have been the right strategy for solving the time-dependent version of this problem, except that it would not handle the truly difficult part of this problem…
6. The real challenge with this energy storage problem is the rolling forecast. Now the state variable must include, in addition to the energy in the storage device and the levels of wind, grid price and demand, the 24-dimensional forecast. A small number of authors have tried to handle rolling forecasts using approximate dynamic programming without success (this includes me). The referee did not even pick up on the importance of handling a rolling forecast in a time-dependent problem (which also has transmission constraints). The interaction of the time-dependent demands, rolling forecasts (with significant errors), and transmission constraints requires a policy that is time-dependent and handles the dynamics of the rolling forecast. This has never been done with an affine policy, or any other type of parameterized policy (note: one of my post-docs spent several months trying to train a neural network to handle this problem, without success). Now consider what happens when you use a parameterized deterministic lookahead policy. By incorporating the forecasts into the lookahead policy, it not only turns a time-dependent policy into a stationary policy (our parameters \theta_\tau do not depend on time of day), it also immediately handles the high-dimensional rolling forecast. A stochastic lookahead policy (e.g. stochastic programming with scenario trees) can incorporate a forecast at time t, but cannot model the fact that it is a rolling forecast, evolving over time. By contrast, when we tune our parameter vector \theta_\tau in a simulator, the simulator easily models the rolling forecast. Hence, our parameter vector \theta_\tau is tuned using an accurate model, while a stochastic program would not be. There is no chance that this could be handled by an affine policy (I did try fitting a neural network) or by approximate dynamic programming (which I have also tried).
7. A significant oversight of the review is the failure to understand one of the major benefits of using a parameterized deterministic approximation: its effect on scaling. If we were to use a pure parameterized policy (say, a neural network), we would have little insight into the scaling of the coefficients, which is a major complication for any stochastic search algorithm. With our particular parameterization, we obtain the property that if the forecasts were perfect, the optimal values of the coefficients would be \theta_\tau = 1. With stochastic forecasts, all the coefficients were between 0 and 2. Knowing this dramatically simplifies the stochastic search problem.
8. The referee tells us to use the Galerkin method and refers to Bertsekas’ book. It is not appropriate for a referee to say “try this method, I think it will work”; the referee has to point to a paper where the method has been used on the same or a comparable problem. So now I am going to ask: is the referee claiming that his Galerkin method can solve a stochastic unit commitment problem? Again, this is laughable. Bertsekas discusses Galerkin methods in the notes for his section 6.3, which describes projected Bellman methods. Methods based on value function approximations simply cannot handle problems of this complexity. He describes mathematics that could never handle a problem of the complexity of a stochastic unit commitment problem, a problem that industry already solves using a parameterized CFA. What is missing is a formal process for tuning, which is what separates an ad-hoc industrial heuristic from a rigorous stochastic optimization problem. Again, I feel that the editors should have realized that the referee is simply promoting methods in Bertsekas’ book without any empirical evidence that they could be applied to this problem class.
9. Two other referee reports also made the mistake of equating our parameterized optimization model with the general field of stochastic search for tuning parameters in other settings. None of these reports recognized that a parameterized deterministic lookahead can potentially be a much more effective way of solving stochastic programming problems than the two- or multi-stage stochastic programming methods that are popular in the stochastic programming community. Again, neither the AE nor the Area Editor recognized that our claim is the use of a parameterized optimization model as a policy for solving sequential stochastic programming problems. If this has been done before, please provide a reference.
10. Any reviewer who points to a complex sequential decision problem and claims, without empirical evidence, that some method should be used for a particular application has an astonishingly naive understanding of the complexities of this rich class of problems. There are *many* methods that have been proposed and used for sequential decision problems, and there is a reason for this: for any specific problem, most methods do not work. There is a reason that so many methods have been proposed which do not use Bellman’s equation: methods based on approximating value functions that actually work on real problems are quite limited (I spent 20 years on approximate dynamic programming and wrote a 600-page book, so I make this statement based on considerable experience). I have organized this complex array into four (meta)classes of policies (these are the basis of my new book); only one of these involves value functions. What is largely missing in the research literature is the idea of parameterized optimization models. This paper fills this void.
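
To illustrate the point about step functions, here is a hypothetical two-parameter buy-low/sell-high rule (an illustration, not code from the paper). The policy's output is piecewise constant in its parameters, so no affine combination of basis functions of the state can reproduce it:

```python
def buy_low_sell_high(price, theta_buy, theta_sell):
    # Parameterized step-function policy: charge when the price is below
    # theta_buy, discharge when it is above theta_sell, otherwise hold.
    # The decision jumps discontinuously as the price crosses a threshold,
    # which is why this structure cannot be written as an affine policy
    # of the form sum_f theta_f * phi_f(S_t).
    if price <= theta_buy:
        return +1    # buy energy / charge storage
    if price >= theta_sell:
        return -1    # sell energy / discharge
    return 0         # hold
```

Tuning (theta_buy, theta_sell) is a (nonconvex) stochastic search problem precisely because of this nonlinearity in the parameters.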

## Closing notes and a recommendation

I have described this process to highlight the challenge of using our standard refereeing process to introduce a fundamentally new approach for solving a problem (in this case, a complex stochastic optimization problem that is typically solved using stochastic programming). The frustration is referees who say something that is not true (or that completely misses the point), after which editors draw conclusions before an author has had a chance to provide a rebuttal. Whether it is theoreticians with little experience in computational work (or at least with the computational challenges of certain problem classes) or people with no familiarity with a new method, authors introducing a fundamentally new idea have to struggle with referees who are not experts in the research. Our current refereeing process favors incremental contributions that can be judged by people who already have experience in an area.

It is natural for referees to relate research to what they know. In the case of this paper, we had reviewers who saw a parameterized policy and quickly equated it to their understanding of other parameterized methods, without appreciating the significant difference between a parameterized policy (such as an affine policy or neural network) and a policy given by a parameterized optimization model (which captures a lot of problem structure).

In the 1990s, I served for eight years as an Area Editor for Operations Research. I introduced the practice of sending reviews to authors before making my final decision. This is easy to do, and does not place a significant burden on reviewers or the editorial staff. This process would have gone a long way toward avoiding the problems Saeed and I have encountered with this paper.