The NSF awards Dawn Woodard a grant to study the tractability of Bayesian computation

Mark Eisner (ORIE); August 13, 2012

"Statistics courses almost never touch on ideas from computational complexity," according to ORIE Assistant Professor Dawn Woodard. That situation may change if Woodard is successful in overcoming what she says is "arguably the greatest barrier to wider adoption of Bayesian statistical approaches," namely, uncertainty about how efficiently answers to Bayesian formulations can be computed. The National Science Foundation (NSF) has provided Woodard a grant so she can pursue the computational tractability of methods that are increasingly being used in statistics and operations research in ORIE and elsewhere.

Bayesian Statistics

Bayesian statistical approaches derive from a formula due to 18th century clergyman Thomas Bayes. His formula can be used to estimate probability distributions of quantities of interest, for example the travel times ambulances would use to get to a medical emergency. Statisticians and operations researchers can estimate the distributions by applying observed data - obtained from GPS and other sources for the ambulance example - to a "prior" estimate of the desired distribution, yielding a "posterior" estimate. In the ambulance example, knowledge of the resulting distribution facilitates the computation of the best route for any service response. For this and other examples, Bayesian methods can provide estimates that are superior to those obtained from purely local data, e.g. about travel time on road segments, but using these methods may be computationally intense with unpredictable accuracy, according to Woodard.

Are Bayesian methods tractable?

Woodard points out that the field of "Bayesian statistics currently suffers from a lack of verifiably accurate computational methods." In many complex and high-dimensional Bayesian statistical computations, such as those requiring the use of sophisticated Monte Carlo Markov Chain methods that move at random through the space of possible solutions, it is hard to know how long the computation should be allowed to run in order to achieve a specified accuracy. Woodard proposed that the NSF fund a study of Bayesian statistical approaches from the perspective of computational complexity, a perspective that is characteristic of research on optimization methods but less common in statistics.

Operations researchers and computer scientists at Cornell and elsewhere are concerned with whether a computational method scales efficiently (hopefully linearly or quadratically but at most polynomially) in the 'size' of the problem, as measured in some rigorous way. For many problems, it is now known that there may be no such method. In statistics, if no such method exists for a given problem or model, Woodard calls the model intractable. The NSF has now funded her proposal, titled "Bayesian Computation, Guaranteed Efficient (or Intractable), to distinguish between tractable and intractable Bayesian statistical problems.

Applicability of the research

Woodard's proposal is based on her experience in industry and academia with problems such as the ambulance example mentioned earlier (which she studied extensively on a prior grant from NSF) and as well understanding the demand facing a large 'cloud' datacenter and how well it responds, based on the real-time analysis of data. She credits the fact that "I am a statistician housed in an operations research department" with the opportunity to incorporate studies of such cases, employing Bayesian statistics, into the curriculum of engineering and management statistics courses.

Putting it together

"Since my undergraduate days I have been interning and consulting for research labs and software companies where my colleagues were primarily computer scientists," Woodard said. In fact she initially faced a choice between studying for a Ph.D. in a computer science department or in a statistics department. An experienced statistician with expertise in Bayesian methods, Woodard has "had plenty of exposure to ideas and techniques from computational complexity theory," as well, and considers them to be "a critical part of a well-rounded understanding of modern applied math and engineering."