I am interested in a broad collection of problems. Most frequently I work on problems in simulation
optimization and the global optimization of expensive functions, but I have also worked on problems in
computer vision, logistics, information retrieval, neuroscience, inventory control, and drug discovery.
Although they come from different disciplines, all of these problems are tied together by a common question: what is
the most efficient way to collect information?
I study these questions within a mathematical framework that uses decision theory to pose the problem of collecting
information in an optimal way as a Markov decision process (MDP). I then study the solutions of these MDPs using
dynamic programming. This set of mathematical problems and tools is most frequently called sequential design of
experiments, but it is also related to reinforcement learning, decision theory, active learning, budgeted learning, and multi-armed bandits. I often
refer to it as optimal learning because we are asking how we can learn optimally.
My PhD thesis
developed knowledge-gradient methods, a class of sequential Bayesian information collection methods that balance the cost
and benefit of collecting information. I have also received
AFOSR Young Investigator and
NSF CAREER awards
to study Bayesian methods in simulation optimization.
Introductory Materials
I gave a tutorial on Bayesian Methods for Global and Simulation Optimization at the INFORMS Annual meeting in 2011. Here are the slides.
R. Waeber, P.I. Frazier & S.G. Henderson,, Bisection Search with Noisy Responses,
SIAM Journal on Control and Optimization, to appear.
[PDF, Abstract]
Abstract: Bisection search is the most efficient algorithm for locating a unique point X* in [0,1] when we are able to query an oracle only about whether X* lies to the left or right of a point x of our choosing. We study a noisy version of this classic problem, where the oracle’s response is correct only with probability p. The Probabilistic Bisection Algorithm (PBA) introduced in Horstein (1963) can be used to locate X* in this setting. While the method works extremely well in practice, very little is known about its theoretical properties. In this paper, we provide several key findings about the PBA, which lead to the main conclusion that the expected absolute residuals of successive search results, i.e., E[|X*-Xn|], converge to 0 at a geometric rate.
S.C. Clark, R. Egan, Frazier, & Z. Wang,, ALE: a Generic Assembly Likelihood Evaluation Framework for Assessing the Accuracy of Genome and Metagenome Assemblies,
Bioinformatics, 2013.
[PDF, Abstract]
Abstract: Motivation: Researchers need general-purpose methods for objectively evaluating the accuracy of single and metagenome assemblies, and for automatically detecting any errors they may contain. Current methods do not fully meet this need because they require a reference, only consider one of the many aspects of assembly quality, or lack statistical justification, and none are designed to evaluate metagenome assemblies. Results: In this paper we present an Assembly Likelihood Evaluation (ALE) framework that overcomes these limitations, systematically evaluating the accuracy of an assembly in a reference-independent manner using rigorous statistical methods. This framework is comprehensive, and integrates read quality, mate pair orientation and insert length (for paired end reads), sequencing coverage, read alignment, and k-mer frequency. ALE pinpoints synthetic errors in both single and metagenomic assemblies, including single-base errors, insertions/deletions, genome rearrangements and chimeric assemblies presented in metagenomes. At the genome level with real-world data ALE identifies three large misassemblies from the Spirochaeta smaragdinae finished genome, which were all independently validated by PacBio sequencing. At the single-base level with Illumina data, ALE recovers 215 of 222 (97%) single nucleotide variants in a training set from a GC-rich Rhodobacter sphaeroides genome. Using real PacBio data, ALE identifies 12 of 12 synthetic errors in a Lambda Phage genome, surpassing even Pacific Biosciences' own variant caller, EviCons. In summary, the ALE framework provides a comprehensive, reference-independent and statistically rigorous measure of single genome and metagenome assembly accuracy, which can be used to identify misassemblies or to optimize the assembly process. Availability: ALE is released as open source software under the UoI/NCSA license at http://www.alescore.org. It is implemented in C and Python.
S.E. Chick & Frazier, Sequential Sampling with Economics of Selection Procedures,
Management Science, 2012.
[PDF, Abstract]
Abstract: Sequential sampling problems arise in stochastic simulation and many other applications. Sampling is used to infer the unknown performance of several alternatives before one alternative is selected as best. This paper presents new economically motivated fully sequential sampling procedures to solve such problems, called Economics of Selection Procedures (ESP). The optimal procedure is derived for comparing a known standard with one alternative whose unknown reward is inferred with sampling. That result motivates heuristics when multiple alternatives have unknown rewards. The resulting procedures are more effective in numerical experiments than any previously proposed procedure of which we are aware, and are easily implemented. The key driver of the improvement is the use of dynamic programming to model sequential sampling as an option to learn before selecting an alternative. It accounts for the expected benefit of adaptive stopping policies for sampling, rather than of one-stage policies, as is common in the literature.
I.O. Ryzhov, W.B. Powell & Frazier, The Knowledge-Gradient Algorithm for a General Class of Online Learning Problems,
Operations Research, 2012.
[PDF, Abstract]
Abstract: We derive a one-period look-ahead policy for finite- and infinite-horizon online optimal learning problems with Gaussian rewards. Our approach is able to handle the case where our prior beliefs about the rewards are correlated, which is not handled by traditional multi-armed bandit methods. Experiments show that our KG policy performs competitively against the best known approximation to the optimal policy in the classic bandit problem, and outperforms many learning policies in the correlated case.
R. Waeber, Frazier & S.G. Henderson, A Framework for Selecting a Selection Procedure,
ACM Transactions on Modeling and Computer Simulation, 2012.
[PDF, Abstract]
Abstract: For many discrete simulation-optimization applications, it is often difficult to decide which Ranking and Selection (R&S) procedure to use. To efficiently compare R&S procedures, we present a three-layer performance evaluation process. We show that the two most popular performance formulations, namely the Bayesian formulation and the indifference-zone formulation, have a common representation analogous to convex risk measures used in mathematical finance. We then specify how a decision maker can impose a performance requirement on R&S procedures that is more adequate for her risk attitude than the indifference-zone or the Bayesian performance requirements. Such a performance requirement partitions the space of R&S procedures into acceptable and non-acceptable procedures. The minimal computational budget required for a procedure to become acceptable introduces an easy-to-interpret preference order on the set of R&S policies. We demonstrate with a numerical example how the introduced framework can be used to guide the choice of selection procedure in practice.
B. Jedynak, Frazier & R. Sznitman, Twenty Questions with Noise: Bayes Optimal Policies for Entropy Loss,
Journal of Applied Probability, 2012.
[PDF, Abstract]
Abstract: We consider the problem of 20 questions with noisy answers, in which we seek to find a target by repeatedly choosing a set, asking an oracle whether the target lies in this set, and obtaining an answer corrupted by noise. Starting with a prior distribution on the target's location, we seek to minimize the expected entropy of the posterior distribution. We formulate this problem as a dynamic program and show that any policy optimizing the one-step expected reduction in entropy is also optimal over the full horizon. Two such Bayes-optimal policies are presented: one generalizes the probabilistic bisection policy due to Horstein and the other asks a deterministic set of questions. We study the structural properties of the latter, and illustrate its use in a computer vision application.
A.J. Meltzer, A. Graham, P.H. Connolly, J.K. Karwowski, H.L. Bush, Frazier & D.B. Schneider, Risk Factors for Early Failure after Peripheral Endovascular Intervention: Application of a Reliability Engineering Approach,
Annals of Vascular Surgery, 2012.
[PDF, Abstract]
Abstract: Objective: We apply an innovative analytic approach, based on reliability engineering (RE) principles frequently utilized to characterize the behavior of manufactured products, to examine outcomes after peripheral ET. We hypothesized that this would allow for improved prediction of outcome with ET, specifically with regard to identification of risk factors for early failure.
Methods: Patients undergoing ET for chronic lower extremity ischemia from 2005-2010 were identified in a prospectively maintained database. The primary outcome of failure was defined as patency loss detected by duplex ultrasound and/or failure to achieve clinical improvement. Analysis included univariate and multivariate Cox Regression models, as well as RE-based analysis including product life cycle models and Weibull failure plots. Early failures were distinguished using the RE principle of "basic rating life" and multivariate models identified risk factors for early failure.
Results: From 2005-2010, 434 primary endovascular peripheral interventions were performed for claudication (51.8%), rest pain (16.8%), or tissue loss (31.3%). 55% of patients were 75 years of age or older; 43% were women. Failure was noted following 159 (36.6%) interventions during a mean follow-up of 18 months (0-71 months). By multivariate (Cox) regression, rest pain and tissue loss were independent predictors of patency loss with hazard ratios of 2.5 (95% Confidence Interval: 1.6-4.1; P < 0.001) and 3.2 (2.0-5.2; P<0.001), respectively. The distribution of failure times for both claudication and CLI fit distinct Weibull distributions with different characteristics: interventions for claudication demonstrated an increasing failure rate (Beta = 1.22; Theta=13.46; Mean time to failure [MTTF] = 12.603; Index of fit: 0.99037; R2: 0.98084), whereas interventions for CLI demonstrated a decreasing failure rate (DFR), suggesting the predominance of early failures. (Beta = 0.7395; Theta=6.8; MTTF = 8.2; Index of fit: 0.99391; R2: 0.98786). By 3.1 months, 10% of interventions failed. This point (90% reliability) was identified as the "basic rating life." By multivariate analysis of failure data, independent predictors of early failure (before 3.1 months) included tissue loss, long lesion length, chronic occlusions, heart failure, and end stage renal disease.
Conclusions: Application of a reliability engineering framework to the assessment of clinical outcomes after peripheral interventions is feasible, and potentially more informative than traditional statistical analysis. Conceptualization of interventions as "products" permits application of product life cycle models that allow for empiric definition of "early failure" and may lead to the development of individualized duplex surveillance schedules following intervention.
M.R.K. Mes, W.B. Powell & Frazier, Hierarchical Knowledge Gradient for Sequential Sampling,
Journal of Machine Learning Research, 2011.
[PDF, Abstract]
Abstract: We propose a sequential sampling policy for noisy discrete global optimization and ranking and selection, in which we aim to efficiently explore a finite set of alternatives before selecting an alternative as best when exploration stops. Each alternative may be characterized by a multi-dimensional vector of categorical and numerical attributes and has independent normal rewards. We use a Bayesian probability model for the unknown reward of each alternative and follow a fully sequential sampling policy called the knowledge-gradient policy. This policy myopically optimizes the expected increment in the value of sampling information in each time period. We propose a hierarchical aggregation technique that uses the common features shared by alternatives to learn about many alternatives from even a single measurement. This approach greatly reduces the measurement effort required, but it requires some prior knowledge on the smoothness of the function in the form of an aggregation function and computational issues limit the number of alternatives that can be easily considered to the thousands. We prove that our policy is consistent, finding a globally optimal alternative when given enough measurements, and show through simulations that it performs competitively with or significantly better than other policies.
W. Scott, Frazier & W.B. Powell, The Correlated Knowledge Gradient for Simulation Optimization of Continuous Parameters Using Gaussian Process Regression,
SIAM Journal on Optimization, 2011.
[PDF, Abstract]
Abstract: We extend the concept of the correlated knowledge-gradient policy for ranking and selection to the case of continuous decision variables. We propose an approximate knowledge gradient for problems with continuous decision variables in the context of a Gaussian process regression model, along with an algorithm to maximize the approximate knowledge gradient. In the problem class considered, we use the knowledge gradient for continuous parameters to sequentially choose where to sample an expensive noisy function in order to find the maximum quickly. We show that the knowledge gradient for continuous decisions is a generalization of the efficient global optimization algorithm proposed by Jones, Schonlau, and Welch.
D. Negoescu, Frazier & W.B. Powell, The Knowledge-Gradient Algorithm for Sequencing Experiments in Drug Discovery,
INFORMS Journal on Computing, 2011.
[PDF, Abstract, Slides]
Abstract: We present a new technique for adaptively choosing the sequence of molecular compounds to test in drug discovery. Beginning with a base compound, we consider the problem of searching for a chemical derivative of this molecule that best treats a given disease. The problem of choosing the molecules to test to maximize the expected quality of the best compound discovered may be formulated mathematically as a ranking and selection problem, in which each molecule is an alternative. We apply a recently developed algorithm, known as the knowledge-gradient algorithm. This algorithm uses correlations in our Bayesian prior distribution between the performance of different alternatives (molecules) to dramatically reduce the number of molecular tests required, but has heavy computational requirements that limit the number of alternatives that can be considered to a few thousand. We develop computational improvements that allow the knowledge-gradient method to consider much larger sets of alternatives, and demonstrate the method on a problem with 87120 alternatives.
D.M. Blei & Frazier, Distance Dependent Chinese Restaurant Processes,
Journal of Machine Learning Research, 2011.
[PDF, Abstract]
Abstract: We develop the distance dependent Chinese restaurant process (CRP), a flexible class of distributions over partitions that allows for non-exchangeability. This class can be used to model many kinds of dependencies between data in infinite clustering models, including dependencies across time or space. We examine the properties of the distance dependent CRP, discuss its connections to Bayesian nonparametric mixture models, and derive a Gibbs sampler for both observed and mixture settings. We study its performance with three text corpora. We show that relaxing the assumption of exchangeability with distance dependent CRPs can provide a better fit to sequential data. We also show its alternative formulation of the traditional CRP leads to a faster-mixing Gibbs sampling algorithm than the one based on the original formulation.
Frazier & W.B. Powell, Consistency of Sequential Bayesian Sampling Policies,
SIAM Journal on Control and Optimization, 2011.
[PDF, Abstract, Slides]
Abstract: We consider Bayesian information collection, in which a measurement policy collects information to support a future decision. This framework includes ranking and selection, continuous global optimization, and many other problems in sequential experimental design. We give a sufficient condition under which measurement policies sample each measurement type infinitely often, ensuring consistency, i.e., that a globally optimal future decision is found in the limit. This condition is useful for verifying consistency of adaptive sequential sampling policies that do not do forced random exploration, making consistency difficult to verify by other means. We demonstrate the use of this sufficient condition by showing consistency of two previously proposed ranking and selection policies: OCBA for linear loss, and the knowledge-gradient policy with independent normal priors. Consistency of the knowledge-gradient policy was shown previously, while the consistency result for OCBA is new. (This paper was formerly titled "Convergence to Global Optimality with Sequential Bayesian Sampling Policies".)
Frazier & W.B. Powell, Paradoxes in Learning and the Marginal Value of Information,
Decision Analysis, 2010.
[PDF, Abstract, Slides]
Abstract: We consider the Bayesian ranking and selection problem, in which one wishes to allocate an information collection budget as efficiently as possible to choose the best among several alternatives. In this problem, the marginal value of information is not concave, leading to algorithmic difficulties and apparent paradoxes. Among these paradoxes is that when there are many identical alternatives, it is often better to ignore some completely and focus on a smaller number than it is to spread the measurement budget equally across all the alternatives. We analyze the consequences of this nonconcavity in several classes of ranking and selection problems, showing that the value of information is "eventually concave," i.e., concave when the number of measurements of each alternative is large enough. We also present a new fully sequential measurement strategy that addresses the challenge that nonconcavity it presents.
Frazier, W.B. Powell & S. Dayanik, The Knowledge-Gradient Policy for Correlated Normal Beliefs,
INFORMS Journal on Computing, 2009.
[PDF, Abstract, Slides]
Abstract: We consider a Bayesian ranking and selection problem with independent normal rewards and a correlated multivariate normal belief on the mean values of these rewards. Because this formulation of the ranking and selection problem models dependence between alternatives' mean values, algorithms may use this dependence to perform efficiently even when the number of alternatives is very large. We propose a fully sequential sampling policy called the knowledge-gradient policy, which is provably optimal in some special cases and has bounded suboptimality in all others. We then demonstrate how this policy may be applied to efficiently maximize a continuous function on a continuous domain while constrained to a fixed number of noisy measurements.
Frazier, W.B. Powell & S. Dayanik, A Knowledge-Gradient Policy for Sequential Information Collection,
SIAM Journal on Control and Optimization, 2008.
[PDF, Abstract, Slides]
Abstract: In a sequential Bayesian ranking and selection problem with independent normal populations and common known variance, we study a previously introduced measurement policy which we refer to as the knowledge-gradient policy. This policy myopically maximizes the expected increment in the value of information in each time period, where the value is measured according to the terminal utility function. We show that the knowledge-gradient policy is optimal both when the horizon is a single time period and in the limit as the horizon extends to infinity. We show furthermore that, in some special cases, the knowledge-gradient policy is optimal regardless of the length of any given fixed total sampling horizon. We bound the knowledge-gradient policy's suboptimality in the remaining cases, and show through simulations that it performs competitively with or significantly better than other policies.
Working Papers
Frazier, A Fully Sequential Elimination Procedure for Indifference-Zone Ranking and Selection with Tight Bounds on Probability of Correct Selection,
in review .
[PDF, Abstract, Slides]
Abstract: (Formerly titled: "Indifference-Zone Ranking and Selection for More Than 15,000 Alternatives") We consider the indifference-zone (IZ) formulation of the ranking and selection problem with independent normal samples. In this problem, we must use stochastic simulation to select the best among several noisy simulated systems, with a statistical guarantee on solution quality. Existing IZ procedures sample excessively in problems with many alternatives, in part because loose bounds on probability of correct selection lead them to deliver solution quality much higher than requested. Consequently, existing IZ procedures are seldom considered practical for problems with more than a few hundred alternatives. To overcome this, we present a new sequential elimination IZ procedure, called BIZ (Bayes-inspired Indifference Zone), whose lower bound on worst-case probability of correct selection in the preference zone is tight in continuous time, and nearly tight in discrete time. To the author’s knowledge, this is the first sequential elimination procedure with tight bounds on worst-case preference-zone probability of correct selection for more than two alternatives. Theoretical results for the discrete-time case assume that variances are known and have an integer multiple structure, but the BIZ procedure itself can be used when these assumptions are not met. In numerical experiments, the sampling effort used by BIZ is significantly smaller than that of another leading IZ procedure, the KN procedure of Kim and Nelson (2001), especially on the largest problems tested (2^14 = 16,384 alternatives).
Abstract: Latent feature models are widely used to decompose data into a small number of components. Bayesian nonparametric variants of these models, which use the Indian buffet process (IBP) as a prior over latent features, allow the number of features to be determined from the data. We present a generalization of the IBP, the distance dependent Indian buffet process (dd-IBP), for modeling non-exchangeable data. It relies on a distance function defined between data points, biasing nearby data to share more features. The choice of distance function allows for many kinds of dependencies, including temporal or spatial. Further, the original IBP is a special case of the dd-IBP. In this paper, we develop the dd-IBP and theoretically characterize the distribution of how features are shared between data. We derive a Markov chain Monte Carlo sampler for a linear Gaussian model with a dd-IBP prior and study its performance on several data sets for which exchangeability is not a reasonable assumption.
J. Xie & Frazier, Sequential Bayes-Optimal Policies for Multiple Comparisons with a Control,
in review.
[PDF, Abstract]
Abstract: We consider the problem of efficiently allocating simulation effort to determine which of several simulated systems have mean performance exceeding a known threshold. This determination is known as multiple comparisons with a control. Within a Bayesian formulation, the optimal fully sequential policy for allocating simulation effort is the solution to a dynamic program. We show that this dynamic program can be solved efficiently, providing a tractable way to compute the Bayes-optimal policy. The solution uses techniques from optimal stopping and multi-armed bandits. We then present further theoretical results characterizing this Bayes-optimal policy, compare it numerically to several approximate policies, and apply it to an application in ambulance positioning.
I.O. Ryzhov, Frazier & W.B. Powell, A New Optimal Stepsize Rule for Approximate Dynamic Programming,
in review.
[Abstract]
Abstract: Approximate dynamic programming (ADP) has proven itself in a wide range of applications spanning large scale transportation problems, scheduling, health care and energy systems. The design of eff ective ADP algorithms has many dimensions, but one crucial factor is the stepsize rule used to update a value function approximation. Many operations research applications are computationally intensive, and it is important to obtain good results quickly. Furthermore, the most popular stepsize formulas use tunable parameters and can produce very poor results if tuned improperly. We derive a new convergent stepsize rule that optimizes the prediction error in order to improve the short-term performance of an ADP algorithm. Without any tunable parameters, the new rule adapts to the level of noise in the measurements and produces faster convergence in a wide range of problems.
D. Negoescu, Frazier & W.B. Powell, Optimal Learning Policies for the Newsvendor Problem with Censored Demand and Unobservable Lost Sales,
in preparation.
[PDF, Abstract]
Abstract: In this paper, we consider a version of the newsvendor problem in which the demand for newspapers is unknown and the lost sales are unobservable. This problem serves as a surrogate for more complex supply chain problems in which learning plays a role. We propose the knowledge gradient (KG) policy, which considers how much is learned about the demand distribution from each order. This policy combines computability with good performance. We analyze the behavior of this policy under two demand distributions: an exponential distribution with a gamma prior, under which the optimal policy is computable, and a general demand distribution with finite support and Dirichlet prior under which the optimal policy is very hard to compute. In the exponential demand case we contrast the KG policy with two naive policies and the optimal policy, and in the general demand case we contrast the KG policy with a naive policy. Our simulations show similar behavior to the optimal policy in the exponential demand case, and good performance in the general case.
Conference Publications
R. Sznitman, A. Lucchi, B. Jedynak, Frazier, & P. Fua,, An Optimal Policy for Target Localization with Application to Electron Microscopy,
International Conference on Machine Learning (ICML), 2013.
[PDF, Abstract]
Abstract: This paper considers the task of finding a target location by making a limited number of sequential observation. Each observation results from evaluating an imperfect classifier of a chosen cost and accuracy on an interval of chosen length and position. Within a Bayesian framework, we study the problem of minimizing an objective that combines the entropy of the posterior distribution with the cost of the questions asked. In this problem, we show that the one-step lookahead policy is Bayes-optimal for any arbitrary time horizon. Moreover, this one-step lookahead policy is easy to compute and implement. We then use this policy in the context of localizing mitochondria in electron microscope images, and experimentally show that significant speed ups in acquisition can be gained, while maintaining near equal image quality at target locations, when compared to current policies.
Frazier, Tutorial: Optimization via Simulation with Bayesian Statistics and Dynamic Programming,
Winter Simulation Conference, 2012.
[PDF, Abstract, Slides]
Abstract: Bayesian statistics comprises a powerful set of methods for analyzing simulated systems. Combined with dynamic programming and other methods for sequential decision making under uncertainty, Bayesian methods have been used to design algorithms for finding the best of several simulated systems. When the dynamic program can be solved exactly, these algorithms have optimal average-case performance. In other situations, this dynamic programming analysis supports the development of approximate methods with sub-optimal but nevertheless good average-case performance. These methods with good average-case performance are particularly useful when the cost of simulation prevents the use of procedures with worst-case statistical performance guarantees. We provide an overview of Bayesian methods used for selecting the best, providing an in-depth treatment of the simpler case of ranking and selection with independent priors appropriate for smaller-scale problems, and then discussing how these same ideas can be applied to correlated priors appropriate for large-scale problems.
Frazier, B. Jedynak, & L. Chen, Sequential Screening: A Bayesian Dynamic Programming Analysis of Optimal Group-Splitting,
Winter Simulation Conference, 2012.
[PDF, Abstract, Slides]
Abstract: Sequential screening is the problem of allocating simulation effort to identify those input factors that have an important effect on a simulation's output. In this problem, sophisticated algorithms can be substantially more efficient than simulating one factor at a time. We consider this problem in a Bayesian framework, in which each factor is important independently and with a known probability. We use dynamic programming to compute the Bayes-optimal method for splitting factors among groups within a sequential bifurcation procedure (Bettonvil & Kleijnen 1997). We assume importance can be tested without error. Numerical experiments suggest that existing group-splitting rules are optimal, or close to optimal, when factors have homogeneous importance probability, but that substantial gains are possible when factors have heterogeneous probability of importance.
S. Zhang, P. Hanagal, Frazier, A.J. Meltzer, & D.B. Schneider, Optimal Patient-specific Postoperative Surveillance for Vascular Surgery,
7th INFORMS Workshop on Data Mining and Health Informatics 2012.
[PDF, Abstract, Slides]
Abstract: We optimize post-operative surveillance schedules for patients who have undergone surgery to restore peripheral blood flow. Post-operative surveillance is performed using periodic duplex ultrasound tests to detect loss of blood flow before it poses a serious risk to the patient. Currently, tests are performed according to the same schedule for all patients, even though problems tend to occur early in some patient groups, and later in others. Moreover, the current schedule is not optimized to when failures occur. We provide a statistical procedure for estimating, from censored observations, the joint distribution of the time at which a problem can be detected in a follow-up visit, and the possibly later time at which the problem threatens the health of the patient. We then find optimal patient-specific follow-up schedules based on this estimation procedure, and compare its performance to the schedule currently in use.
J. Xie, Frazier, S. Sankaran, A. Marsden, & S. Elmohamed, Optimization of Computationally Expensive Simulations with Gaussian Processes and Parameter Uncertainty: Application to Cardiovascular Surgery,
50th Annual Allerton Conference on Communication, Control, and Computing, 2012.
[PDF, Abstract, Slides]
Abstract: In many applications of simulation optimization, the random output variable whose expectation is being optimized is a deterministic function of a low-dimensional random vector. This deterministic function is often expensive to compute, making simulation optimization difficult. Motivated by an application in the design of grafts for heart surgery, we use Bayesian methods to design an algorithm that exploits this random vector's low-dimensionality to improve performance.
R. Waeber, Frazier & S.G. Henderson, A Bayesian Approach to Stochastic Root Finding,
Winter Simulation Conference, 2011.
[PDF, Abstract]
Abstract: A stylized model of one-dimensional stochastic root finding involves repeatedly querying an oracle as to whether the root lies to the left or right of a given point x. The oracle answers this question, but the received answer is incorrect with probability 1-p(x). A Bayesian-style algorithm for this problem that assumes knowledge of p() repeatedly updates a density giving, in some sense, one's belief about the location of the root. We demonstrate how the algorithm works, and provide some results that shed light on its performance, both when p() is constant and when p() varies with x.
Frazier, J. Xie & S.E. Chick, Bayesian Optimization via Simulation with Correlated Sampling and Correlated Prior Beliefs,
Winter Simulation Conference, 2011.
[PDF, Abstract, Slides]
Abstract: We consider optimization via simulation over a finite set of alternatives. We employ a Bayesian value-of-information approach in which we allow both correlated prior beliefs on the sampling means and correlated sampling. Correlation in the prior belief allow us to learn about an alternative's value from samples of similar alternatives. Correlation in sampling, achieved through common random numbers, allow us to reduce the variance in comparing one alternative to another. We allow for a more general combination of both types of correlation than has been offered previously in the Bayesian ranking and selection literature. We do so by giving an exact expression for the value of information for sampling the difference between a pair of alternatives, and derive new knowledge-gradient methods based on this valuation.
Frazier & A.M. Kazachkov, Guessing Preferences: A New Approach to Multi-Attribute Ranking and Selection,
Winter Simulation Conference, 2011.
[PDF, Abstract, Slides]
Abstract: We consider an analyst tasked with using simulation to help a decision-maker choose among several decision alternatives. Each alternative has several competing attributes, e.g., cost and quality, that are unknown but can be estimated through simulation. We model this problem in a Bayesian context, where the decision-maker's preferences are described by a utility function, but this utility function is unknown to the analyst. The analyst must choose how to allocate his simulation budget among the alternatives in the face of uncertainty about both the alternatives' attributes, and the decision-maker's preferences. Only after simulation is complete are the decision-maker's preferences revealed. In this context, we calculate the value of the information in simulation samples, and propose a new multi-attribute ranking and selection procedure based on this value. This procedure is able to incorporate prior information about the decision-maker's preferences to improve sampling efficiency.
R. Waeber, Frazier & S.G. Henderson, Performance Measures for Ranking and Selection Procedures,
Winter Simulation Conference, 2010.
[PDF, Abstract, Slides]
Abstract: To efficiently compare Ranking and Selection procedures, we present a three-layer performance evaluation process. The two most popular formulations, namely the Bayes and Indifference Zone formulations, have a common representation analogous to convex risk measures used in quantitative risk management. We study decision makers' acceptance sets via an axiomatic approach and introduce a new performance measure using computational cost.
D. Blei & Frazier , Distance Dependent Chinese Restaurant Processes,
International Conference on Machine Learning, 2010.
[PDF, Abstract]
Abstract: We develop the distance dependent Chinese restaurant process (CRP), a flexible class of distributions over partitions that allows for non-exchangeability. This class can be used to model dependencies between data in infinite clustering models, including dependencies across time or space. We examine the properties of the distance dependent CRP, discuss its connections to Bayesian nonparametric mixture models, and derive a Gibbs sampler for both observed and mixture settings. We study its performance with time-dependent models and three text corpora. We show that relaxing the assumption of exchangeability with distance dependent CRPs can provide a better fit to sequential data. We also show its alternative formulation of the traditional CRP leads to a faster-mixing Gibbs sampling algorithm than the one based on the original formulation.
I.O. Ryzhov, Frazier & W.B. Powell, On the Robustness of a One-Period Look-Ahead Policy in Multi-Armed Bandit Problems,
International Conference on Computational Stochastics, 2010.
[PDF, Abstract]
Abstract: We analyze the robustness of a knowledge gradient (KG) policy for the multi-armed bandit problem. The KG policy is based on a one-period look-ahead, which is known to underperform in other learning problems when the marginal value of information is non-concave. We present an adjustment that corrects for non-concavity and approximates a multi-step look-ahead, and compare its performance to the unadjusted KG policy and other heuristics. We provide guidance for determining when adjustment will improve performance, and when it is unnecessary. We present evidence suggesting that KG is generally robust in the multi-armed bandit setting, which argues in favour of KG as an alternative to index policies.
Frazier, W.B. Powell & H.P. Simao, Simulation Model Calibration with Correlated Knowledge-Gradients,
Winter Simulation Conference, 2009.
[PDF, Abstract, Slides]
Abstract: We address the problem of calibrating an approximate dynamic programming model, where we need to find a vector of parameters to produce the best fit of the model against historical data. The problem requires adaptively choosing the sequence of parameter settings on which to run the model, where each run of the model requires approximately twelve hours of CPU time and produces noisy non-stationary output. We describe an application of the knowledge-gradient algorithm with correlated beliefs to this problem and show that this algorithm finds a good parameter vector out of a population of one thousand with only three runs of the model.
S. Chick & Frazier, The Conjunction of the Knowledge Gradient and the Economic Approach to Simulation Selection,
Winter Simulation Conference, 2009.
[PDF, Abstract, Slides]
Abstract: This paper deals with the selection of the best of a finite set of systems, where best is defined with respect to the maximum mean simulated performance. We extend the ideas of the knowledge gradient, which accounts for the expected value of one stage of simulation, by accounting for the future value of the option to simulate over multiple stages. We extend recent work on the economics of simulation, which studied discounted rewards, by balancing undiscounted simulation costs and the expected value of information from simulation runs. This contribution results in a diffusion model for comparing a single simulated system with a standard that has a known expected reward, and new stopping rules for fully sequential procedures when there are multiple systems. These stopping rules are more closely aligned with the expected opportunity cost allocations that are effective in numerical tests. We demonstrate an improvement in performance over previous methods.
Frazier, W.B. Powell, S. Dayanik & P. Kantor, Approximate Dynamic Programming in Knowledge Discovery for Rapid Response,
Hawaii International Conference on Systems Science, 2009.
[PDF, Abstract]
Abstract: One knowledge discovery problem in the rapid response setting is the cost of learning which patterns are indicative of a threat. This typically involves a detailed follow-through, such as review of documents and information by a skilled analyst, or detailed examination of a vehicle at a border crossing point, in deciding which suspicious vehicles require investigation. Assessing various strategies and decision rules means we must compare not only the short term effectiveness of interrupting a specific traveler, or forwarding a specific document to an analyst, but we must also weigh the potential improvement in our profiles that results even from sending a “false alarm”. We show that this problem can be recast as a dynamic programming problem with, unfortunately, a huge state space. Several specific heuristics are introduced to provide improved approximations to the solution. The problems of obtaining real-world data to sharpen the analysis are discussed briefly.
Frazier & W.B. Powell, The Knowledge-Gradient Stopping Rule for Ranking and Selection,
Winter Simulation Conference, 2008.
[PDF, Abstract, Slides]
Abstract: We consider the ranking and selection of normal means in a fully sequential Bayesian context. By considering the sampling and stopping problems jointly rather than separately, we derive a new composite stopping/sampling rule. The sampling component of the derived composite rule is the same as the previously introduced LL1 sampling rule, but the stopping rule is new. This new stopping rule significantly improves the performance of LL1 as compared to its performance under the best other generally known adaptive stopping rule, EOC Bonf, outperforming it in every case tested.
Frazier & A.J. Yu, Sequential Hypothesis Testing under Stochastic Deadlines,
Neural Information Processing Systems, 2007.
[PDF, Abstract, Slides]
Abstract: Most models of decision-making in neuroscience assume an {\it infinite} horizon, which yields an optimal solution that integrates evidence up to a fixed decision threshold; however, under most experimental as well as naturalistic behavioral settings, the decision has to be made before some finite deadline, which is often experienced as a stochastic quantity, either due to variable external constraints or internal timing uncertainty. In this work, we formulate this problem as sequential hypothesis testing under a stochastic horizon. We use dynamic programming tools to show that, for a large class of deadline distributions, the Bayes-optimal solution requires integrating evidence up to a threshold that declines monotonically over time. We use numerical simulations to illustrate the optimal policy in the special cases of a fixed deadline and one that is drawn from a gamma distribution.
Frazier & W.B. Powell, The Knowledge Gradient Policy for Offline Learning with Independent Normal Rewards,
IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2007.
[PDF, Abstract, Slides]
Abstract: We formulate an offline learning problem in which the values of a finite number of alternatives must be measured as efficiently as possible. These alternatives are assumed to have independent and normally distributed rewards. We define a knowledge gradient policy, show how to compute its decisions efficiently, and demonstrate through Monte Carlo simulations that it performs as well or better than several existing learning policies.
Book Chapters and PhD Thesis
Frazier, Decision-Theoretic Foundations of Simulation Optimization,
Wiley Encyclopedia of Operations Research & Management Science, 2010.
[PDF, Abstract]
Abstract: In simulation optimization, a task appearing frequently in applications, we wish to find a set of inputs to a simulator that cause its output to be maximal in some sense. Within any simulation optimization algorithm we must make a sequence of decisions about which input to test at each point in time. The problem of making this sequence of decisions so as to discover a near-optimal point as quickly as possible may be understood within a decision-theoretic framework. We describe this decision-theoretic framework in the context of two problems: ranking and selection and Bayesian global optimization. This decision-theoretic framework provides both a way of better understanding existing algorithms, and a path to developing new algorithms.
Frazier, Learning with Dynamic Programming,
Wiley Encyclopedia of Operations Research & Management Science, 2010.
[PDF, Abstract]
Abstract: We consider the role of dynamic programming in sequential learning problems. These problems require deciding which information to collect in order to best support later actions. Such problems are ubiquitous, appearing in simulation, global optimization, revenue management, and many other areas. Dynamic programming offers a coherent framework for understanding and solving Bayesian formulations of these problems. We present the dynamic programming formulation applied to a canonical problem, Bernoulli ranking and selection. We then review other sequential learning problems from the literature and the role of dynamic programming in their analysis.
Abstract: Optimal learning addresses the problem of efficiently collecting information with which to make decisions. These problems arise in both offline settings (making a series of measurements, after which a decision is made) and online settings (the process of making a decision results in observations that change the distribution of belief about future observations). Optimal learning is an issue primarily in applications where observations or measurements are expensive. These include expensive simulations (where a single observation might take a day or more), laboratory sciences (testing a drug compound in a lab), and field experiments (testing a new energy saving technology in a building). This tutorial provides an introduction to this problem area, covering important dimensions of a learning problem and introducing a range of policies for collecting information.
Abstract: We consider the class of fully sequential Bayesian information collection problems, a class that includes ranking and selection problems, multi-armed bandit problems, and many others. Although optimal policies for such problems are generally known to exist and to satisfy Bellman's recursion, the curses of dimensionality prevent us from actually computing them except in a few very special cases. Motivated by this difficulty, we develop a general class of practical and theoretically well-founded information collection policies known as {\it knowledge-gradient} (KG) policies. KG policies have several attractive qualities: they are myopically optimal in general; they are asymptotically optimal in a broad class of problems; they are flexible and may be computed easily in a broad class of problems; and they perform well numerically in several well-studied ranking and selection problems compared with other state-of-the-art policies designed specifically for these problems.
Teaching
ORIE 3800: Information Systems and Analysis, Fall 2011-2013 (2011 syllabus)
ORIE 3120: Industrial Systems and Data Analysis, Spring 2011-2013
The two movies above illustrate how the correlated knowledge gradient method may be used to find the maximum of a continuous function when the only way to learn about this function is through noisy measurements. The red lines shows the current estimate of the unknown function as well as the uncertainty in this estimate. The blue circles are measurement points, and the black line (right plot) and green squares (left plot) show the true value of the function. See "The Knowledge-Gradient Policy for Correlated Normal Beliefs" above for more details.
The two movies above illustrate the independent knowledge gradient method finding the best of several discrete alternatives, again through noisy measurement. The right plot illustrates the case for which the method was designed, in which the discrete alternatives have no relationship to each other, while the left plot illustrates what happens when you use the independent knowledge gradient to find the maximum of a continuous function at discretized location. To obtain faster convergence for such continuous problems, one can use the correlated knowledge gradient instead.
Presentations
Simulation Optimization Workshop, Vina del Mar, Chile, Mar 2013.
NSF CAREER, "Methodology for Optimization via Simulation: Bayesian Methods, Frequentist Guarantees, and Applications to Cardiovascular Medicine," 7/2013-6/2018.
Air Force Office of Scientific Research Young Investigator Program, "Decision Theoretic Methods for Simulation Optimization," 6/2011-5/2014.
NSF, "Collaborative Research: Discovery and Social Analytics for Large-Scale Scientific Literature",
with P. Kantor (PI), D. Blei, P. Ginsparg, T. Joachims. 01/2013-12/2015.
Air Force Office of Scientific Research,
"Optimal Learning for Efficient Experimentation in Nanotechnology and Biochemistry",
with W.B. Powell (PI), 7/1/2012-6/30/2015.
NSF EAGER, "Adaptive Methods for Scalable Dissemination and Retrieval of Scientific Information",
with P. Kantor (PI), D. Blei, P. Ginsparg, T. Joachims.
8/2011-7/2012.
Planning Grants for Collaborations Between Cornell University-Ithaca and Weill Cornell Medical College Faculty,
"Modeling Reliability and Failure of Endovascular Abdominal Aortic Aneurysm Repair"
with D.B. Schneider (PI) and A. Meltzer.
6/1/2012-5/30/2013.
Cornell Center for a Sustainable Future, "Replacing Antibiotic Therapy in the Food Animal Industry with Bacteriophage Therapy," with R. Bicalho (PI) and T. Joachims. 9/2010-1/2012.
Verizon Wireless,
"Driving Business Value through Supply Chain Analytics,"
with P. Jackson (PI), J. Muckstadt, D. Shmoys, H. Topaloglu, P. Rusmevichientong, O. Gao, and K. Caggiano.
12/2009-11/2010.