Miao He, L. Zhao, W. B. Powell, “Approximate Dynamic Programming Algorithms for Optimal Dosage Decisions in Controlled Ovarian Hyperstimulation,” European J. of Operational Research, Vol. 222, No. 2, October 2012, pp. 328–340
In the controlled ovarian hyperstimulation (COH) treatment, clinicians monitor the patients’ physiological responses to gonadotropin administration to tradeoff between pregnancy probability and ovarian hyperstimulation syndrome (OHSS). We formulate the dosage control problem in the COH treatment as a stochastic dynamic program and design approximate dynamic programming (ADP) algorithms to overcome the well-known curses of dimensionality in Markov decision processes (MDP). Our numerical experiments indicate that the piecewise linear (PWL) approximation ADP algorithms can obtain policies that are very close to the one obtained by the MDP benchmark with significantly less solution time.
Ryzhov, I., W. B. Powell, “A Monte-Carlo Knowledge Gradient Method for Learning Abatement Potential of Emissions Reduction Technologies,” Winter Simulation Conference, 2009. M. D. Rossetti, R. R. Hill, B. Johansson, A. Dunkin, and R. G. Ingalls, eds, 2009, pp. 1492-1502.
Suppose that we have a set of emissions reduction technologies whose greenhouse gas abatement potential is unknown, and we wish to find an optimal portfolio (subset) of these technologies. Due to the interaction between technologies, the effectiveness of a portfolio can only be observed through expensive field implementations. We view this problem as an online optimal learning problem with correlated prior beliefs, where the performance of a portfolio of technologies in one project is used to guide choices for future projects. Given the large number of potential portfolios, we propose a learning policy which uses Monte Carlo sampling to narrow down the choice set to a relatively small number of promising portfolios, and then applies a one-period look-ahead approach using knowledge gradients to choose among this reduced set.