Bayesian Machine Learning
ORIE 6741

Fall 2016

Course Information

Title: Bayesian Machine Learning
Course Number: ORIE 6741
Semester: Fall 2016
Times: Tu/Th 11:40 am - 12:55 pm
Rhodes Hall 571  (changed from Hollister Hall 320)

Course Syllabus [PDF]

Andrew Gordon Wilson
Assistant Professor
Rhodes Hall 235
Office Hours: Tuesday 4:00 pm - 5:00 pm, or by appointment


To answer scientific questions, and reason about data, we must build models and perform inference within those models.  But how should we approach model construction and inference to make the most successful predictions?  How do we represent uncertainty and prior knowledge?  How flexible should our models be?  Should we use a single model, or multiple different models?  Should we follow a different procedure depending on how much data are available?

In this course, we will approach these fundamental questions from a Bayesian perspective.  From this perspective, we wish to faithfully incorporate all of our beliefs into a model, and represent uncertainty over these beliefs, using probability distributions.  Typically, we believe the real world is in a sense infinitely complex: we will always be able to add flexibility to a model to gain better performance.  If we are performing character recognition, for instance, we can always account for some additional writing styles for greater predictive success.  We should therefore aim to maximize flexibility, so that we are capable of expressing any hypothesis we believe to be possible.  For inference, we will not have a priori certainty that any one hypothesis has generated our observations.  We therefore typically wish to weight an uncountably infinite space of hypotheses by their posterior probabilities.  This Bayesian model averaging procedure has no risk of overfitting, no matter how flexible our model is.  How we distribute our a priori support over these different hypotheses determines our inductive biases.  In short, a model should distribute its support across as wide a range of hypotheses as possible, and have inductive biases which are aligned to particular applications.
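The automatic Occam's razor that comes with this model averaging can be made concrete in a small self-contained numpy sketch (illustrative only, not course code; the prior scale alpha and noise variance sigma2 are arbitrary choices): the marginal likelihood of Bayesian linear regression penalizes needlessly flexible polynomial models, so averaging over hypotheses does not overfit.

```python
import numpy as np

def log_evidence(y, Phi, alpha=1.0, sigma2=0.1):
    # Marginal likelihood of Bayesian linear regression with prior
    # w ~ N(0, alpha^{-1} I):  y ~ N(0, sigma2*I + alpha^{-1} Phi Phi^T).
    n = len(y)
    C = sigma2 * np.eye(n) + (1.0 / alpha) * Phi @ Phi.T
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (y @ np.linalg.solve(C, y) + logdet + n * np.log(2 * np.pi))

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20)
y = 1.0 - 2.0 * x + np.sqrt(0.1) * rng.standard_normal(20)   # data from a degree-1 model

degrees = [0, 1, 2, 5]
logZ = np.array([log_evidence(y, np.vander(x, d + 1)) for d in degrees])
post = np.exp(logZ - logZ.max()); post /= post.sum()   # uniform prior over the models
for d, p in zip(degrees, post):
    print(f"degree {d}: posterior probability {p:.3f}")
```

The evidence trades data fit against the volume of hypotheses each model spreads its prior over, so the simplest model that explains the data tends to win.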

This course aims to provide students with a strong grasp of the fundamental principles underlying Bayesian model construction and inference.  We will go into particular depth on Gaussian process and deep learning models.

The course will consist of three units:

Model Construction and Inference: Parametric models, support, inductive biases, gradient descent, sum and product rules, graphical models, exact inference, approximate inference (Laplace approximation, variational methods, MCMC), model selection and hypothesis testing, Occam's razor, non-parametric models.
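As a taste of the MCMC material in this unit, here is a minimal random-walk Metropolis sampler (an illustrative numpy sketch, not course material; the target and step size are arbitrary choices). Note that the sampler only needs the target log density up to an additive constant.

```python
import numpy as np

def metropolis(log_p, x0, n_samples=20000, step=1.0, rng=None):
    """Random-walk Metropolis: propose a Gaussian step, accept with
    probability min(1, p(x_new)/p(x_old))."""
    rng = rng if rng is not None else np.random.default_rng(0)
    x, lp = x0, log_p(x0)
    samples = []
    for _ in range(n_samples):
        x_new = x + step * rng.standard_normal()
        lp_new = log_p(x_new)
        if np.log(rng.random()) < lp_new - lp:   # Metropolis acceptance test
            x, lp = x_new, lp_new
        samples.append(x)
    return np.array(samples)

# Target: unnormalised N(2, 0.5^2); the normalising constant is never needed.
samples = metropolis(lambda x: -0.5 * ((x - 2.0) / 0.5) ** 2, x0=0.0)
print(samples[5000:].mean(), samples[5000:].std())   # close to 2.0 and 0.5 after burn-in
```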

Gaussian Processes: From finite basis expansions to infinite bases, kernels, function space modelling, marginal likelihood, non-Gaussian likelihoods, Bayesian optimisation.
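The core Gaussian process regression equations from this unit fit in a few lines of numpy. The sketch below (illustrative only; the kernel hyperparameters and noise level are arbitrary choices) computes the posterior mean and covariance at test points, along with the log marginal likelihood used for kernel hyperparameter learning.

```python
import numpy as np

def rbf(X1, X2, ls=0.5, amp=1.0):
    # Squared-exponential (RBF) kernel: k(x, x') = amp^2 exp(-(x - x')^2 / (2 ls^2))
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return amp**2 * np.exp(-0.5 * d2 / ls**2)

def gp_posterior(X, y, Xs, noise=1e-2):
    # Zero-mean GP regression via the Cholesky factor of the training covariance.
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks, Kss = rbf(X, Xs), rbf(Xs, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks.T @ alpha                       # posterior mean at test inputs
    v = np.linalg.solve(L, Ks)
    cov = Kss - v.T @ v                       # posterior covariance at test inputs
    # Log marginal likelihood: the objective for learning kernel hyperparameters.
    lml = -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2 * np.pi)
    return mean, cov, lml

X = np.array([-1.0, 0.0, 1.0])
y = np.sin(3 * X)
mean, cov, lml = gp_posterior(X, y, Xs=np.array([0.0, 0.5]))
print(mean, np.diag(cov), lml)
```

With a small noise variance the posterior mean nearly interpolates the training data, while the posterior variance grows away from the observed inputs.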

Bayesian Deep Learning: Feed-forward, convolutional, recurrent, and LSTM networks.
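As a small illustration of the Bayesian view of neural networks (a toy numpy sketch, not course material; the standard normal prior on all weights is an arbitrary choice): rather than committing to a single weight setting, we can average a feed-forward network's predictions over draws from a prior on its weights, giving a predictive distribution rather than a point prediction.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    # One-hidden-layer feed-forward network with tanh activations.
    return np.tanh(x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
x = np.array([[0.5]])

# Monte Carlo prior predictive: sample weights from N(0, 1) and average predictions.
preds = []
for _ in range(2000):
    W1, b1 = rng.standard_normal((1, 16)), rng.standard_normal(16)
    W2, b2 = rng.standard_normal((16, 1)), rng.standard_normal(1)
    preds.append(mlp(x, W1, b1, W2, b2).item())
preds = np.array(preds)
print(preds.mean(), preds.std())   # mean near zero, with substantial predictive spread
```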

Depending on the available time, we may omit some of these topics.  Most of the material will be derived on the chalkboard, with some supplemental slides.  The course will have both theoretical and practical (e.g. coding) aspects.

After taking this course, you should:
- Be able to think about any problem from a Bayesian perspective.
- Be able to create models with a high degree of flexibility and appropriate inductive biases.
- Understand the interplay between model specification and inference, and be able to construct a successful inference algorithm for a given model.
- Have familiarity with Gaussian process and deep learning models.


Homework 0 has been released.  It is due Tuesday, August 30th, at the beginning of class!
This homework will be graded.  Its main purpose is to indicate the expected prior background of students in the course.  For calibration, at least 25% of the graded questions should be easy to answer without consulting any outside resources.  Overall, about 80% should be approachable after searching or reminding yourself of some definitions or identities, or teaching yourself the relevant background material (e.g., reading about Bayes' rule, maximum likelihood, conjugate priors, the sum and product rules of probability, properties of multivariate Gaussian distributions, derivatives of matrices and vectors, etc.).  Do not worry if you find about 10-20% of the material challenging or unfamiliar (e.g., two to four of the question parts).  If you do struggle with this assignment, I recommend coming to my office hours to discuss the source of the difficulty.  It's possible that you could catch up on the background, or it may be preferable to first take a more introductory course.
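As a sample of the expected background (Bayes' rule and conjugate priors), here is a toy Beta-Bernoulli update; the prior pseudo-counts and coin flips below are arbitrary illustrative values.

```python
# Beta(a, b) prior on a coin's heads probability theta; Bernoulli likelihood.
a, b = 2.0, 2.0                       # prior pseudo-counts (chosen for illustration)
flips = [1, 1, 0, 1, 1, 0, 1, 1]      # observed data: 6 heads, 2 tails

heads = sum(flips)
tails = len(flips) - heads

# Conjugacy: the posterior is Beta(a + heads, b + tails).
a_post, b_post = a + heads, b + tails
post_mean = a_post / (a_post + b_post)   # E[theta | data] = 8/12, shrunk toward the prior
mle = heads / len(flips)                 # maximum likelihood estimate = 6/8
print(post_mean, mle)
```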

Homework 1 has been released.  It is due Thursday, September 29.


Schedule
Tuesday, August 23
Introduction, Logistics, Overview
HW 0 Released

HW 0 tex

Thursday, August 25
Probability distributions, Linear Regression, Sum and Product Rules, Bayesian Basics  (Conjugate Priors, etc.)
Lecture Notes

Lecture 2 Supplement

Bishop (2006), PRML, Chapters 1-3
Hinton Lectures on Gradient Descent (6.1, 6.2, 6.3)

Cribsheet, Gaussian Identities, Matrix Identities
Tuesday, August 30
Stochastic Gradients,
Occam's razor, Support, Inductive Biases,
Graphical Models I
HW 0 Due

Lecture Notes

Reading Summary (Thesis) Due
MacKay (2003): Chapter 28
C. Rasmussen and Z. Ghahramani,
Occam's Razor, NIPS 2001

Wilson PhD Thesis, Chapter 1, pages 2-5, 8-19.
Learning the Dimensionality of PCA,
Minka, NIPS 2001

Thursday, September 1
Graphical Models II
Reading Summary Due: Ch. 8 (up to end of 8.2)

Lecture Notes

Written Notes
Required: Bishop (2006), Chapter 8
Tuesday, September 6 Graphical Models III

Summary Ch. 8 due (complete)

Thursday, September 8 Graphical Models

9/13/2016 MCMC

Reading Summary (Murray) Due.
Readings (required and optional):
  • MacKay Textbook, Ch. 29, 30.
  • C. Bishop, Pattern Recognition and Machine Learning (PRML), Ch. 11
  • R. Neal, Slice Sampling, Annals of Statistics, 2003
  • C. Geyer, Practical Markov chain Monte Carlo, Statistical Science 7(4): 473-492, 1992.
  • J. Geweke, Getting it right: joint distribution tests of posterior simulators, JASA 99(467): 799-804, 2004.
9/15/2016 MCMC, Variational Methods
HW 1 Released

9/20/2016 Variational Methods, Laplace Approximation,
Gaussian Processes I

GP Readings Due

GPML, Preface and Chapter 2
Wilson (PhD Thesis), Chapters 1, 2
9/22/2016 Gaussian Processes II (Kernel Functions and Marginal Likelihood Learning)
GP Readings Due (Ch. 4, 5)
GPML, Chapters 4 and 5
9/27/2016 Kernel Learning
HW 2 Released

GP Readings Due (ICML paper)
Gaussian Process Kernels for Pattern Discovery and Extrapolation, ICML 2013
9/29/2016 Gaussian Processes III (Non-Gaussian Likelihoods)
HW 1 Due

Readings Due (Ch 3)
GPML, Chapter 3
10/4/2016 Review Session

10/6/2016 Midterm I


Gaussian Processes IV (Scalability)

Quinonero-Candela & Rasmussen (2005)
Wilson et al. (2014)
Wilson and Nickisch (2015)
Bayesian Optimization
HW 2 Due
Project Proposal Due

Snoek et al. (2012)
A review of Bayesian Optimization
Discrete Bayesian Nonparametrics and the Dirichlet Process Mixture Model

10/25/2016 Feed-forward Neural Networks

10/27/2016 Convolutional Networks
HW 3 Released

11/1/2016 Recurrent Networks and LSTMs

11/3/2016 Bayesian Neural Networks

11/8/2016 Review Session
HW 3 Due

11/10/2016 Midterm II

11/15/2016 Deep Kernel Learning
Midterm report due

11/17/2016 Variational Autoencoder

11/22/2016 Generative Adversarial Networks


11/29/2016 Project Presentations

12/1/2016 Project Presentations

12/7/2016 - 12/15/2016 Exam Period
Project Due
No final exam for this course