Bayesian Machine Learning
ORIE 6741

Fall 2016

Course Information

Title: Bayesian Machine Learning
Course Number: ORIE 6741
Semester: Fall 2016
Times: Tu/Th 11:40 am - 12:55 pm
Room: Rhodes Hall 571 (changed from Hollister Hall 320)

Course Syllabus [PDF]
 
Instructor

Andrew Gordon Wilson
Assistant Professor
Rhodes Hall 235
Website: https://people.orie.cornell.edu/andrew
E-Mail: andrew@cornell.edu
Office Hours: Tuesday 4:00 pm - 5:00 pm, or by appointment

Overview

To answer scientific questions, and reason about data, we must build models and perform inference within those models.  But how should we approach model construction and inference to make the most successful predictions?  How do we represent uncertainty and prior knowledge?  How flexible should our models be?  Should we use a single model, or multiple different models?  Should we follow a different procedure depending on how much data are available?

In this course, we will approach these fundamental questions from a Bayesian perspective.  From this perspective, we wish to faithfully incorporate all of our beliefs into a model, and represent uncertainty over these beliefs, using probability distributions.  Typically, we believe the real world is in a sense infinitely complex: we will always be able to add flexibility to a model to gain better performance.  If we are performing character recognition, for instance, we can always account for some additional writing styles for greater predictive success.  We should therefore aim to maximize flexibility, so that we are capable of expressing any hypothesis we believe to be possible.  For inference, we will not have a priori certainty that any one hypothesis has generated our observations.  We therefore typically wish to weight an uncountably infinite space of hypotheses by their posterior probabilities.  This Bayesian model averaging procedure has no risk of overfitting, no matter how flexible our model.  How we distribute our a priori support over these different hypotheses determines our inductive biases.  In short, a model should distribute its support across as wide a range of hypotheses as possible, and have inductive biases which are aligned to particular applications.
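To make this averaging concrete, here is the standard Bayesian model averaging identity, written in generic notation (w indexes hypotheses or parameters, \mathcal{D} denotes the observed data, and (x_*, y_*) a test input and output; these symbols are placeholders rather than notation taken from the course materials):

p(y_* \mid x_*, \mathcal{D}) \;=\; \int p(y_* \mid x_*, w)\, p(w \mid \mathcal{D})\, \mathrm{d}w,
\qquad
p(w \mid \mathcal{D}) \;=\; \frac{p(\mathcal{D} \mid w)\, p(w)}{\int p(\mathcal{D} \mid w')\, p(w')\, \mathrm{d}w'}.

Every hypothesis w contributes to the prediction in proportion to its posterior probability, so enlarging the hypothesis space does not by itself cause overfitting; what matters is how the prior p(w) distributes its support, which is precisely the inductive bias discussed above.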

This course aims to provide students with a strong grasp of the fundamental principles underlying Bayesian model construction and inference.  We will go into particular depth on Gaussian process and deep learning models.

The course will consist of three units:

Model Construction and Inference: Parametric models, support, inductive biases, gradient descent, sum and product rules, graphical models, exact inference, approximate inference (Laplace approximation, variational methods, MCMC), model selection and hypothesis testing, Occam's razor, non-parametric models.

Gaussian Processes: From finite basis expansions to infinite bases, kernels, function space modelling, marginal likelihood, non-Gaussian likelihoods, Bayesian optimisation.

Bayesian Deep Learning: Feed-forward, convolutional, recurrent, and LSTM networks.

Depending on the available time, we may omit some of these topics.  Most of the material will be derived on the chalkboard, with some supplemental slides.  The course will have both theoretical and practical (e.g. coding) aspects.
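To give a flavour of the practical side of the course, here is a minimal sketch of Gaussian process regression in Python, using only numpy. This is illustrative code rather than course material: the function names (rbf_kernel, gp_posterior), the hyperparameter values, and the synthetic data are arbitrary assumptions.

# Minimal Gaussian process regression sketch (illustrative only).
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential (RBF) covariance between two sets of 1-D inputs."""
    sqdist = (X1[:, None] - X2[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sqdist / lengthscale ** 2)

def gp_posterior(X_train, y_train, X_test, noise_var=0.1):
    """Posterior mean and marginal variance of a zero-mean GP with an RBF kernel."""
    K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test)
    K_ss = rbf_kernel(X_test, X_test)
    L = np.linalg.cholesky(K)                        # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha                             # posterior mean at test inputs
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)     # posterior marginal variance
    return mean, var

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train = np.linspace(-3, 3, 20)
    y_train = np.sin(X_train) + 0.1 * rng.standard_normal(20)
    X_test = np.linspace(-4, 4, 9)
    mean, var = gp_posterior(X_train, y_train, X_test)
    for x, m, s in zip(X_test, mean, np.sqrt(var)):
        print(f"x = {x:+.2f}   mean = {m:+.3f}   std = {s:.3f}")

The Cholesky factorisation of the training covariance matrix is the standard numerically stable way to compute the posterior; its O(n^3) cost is what motivates the later lecture on Gaussian process scalability.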

After taking this course, you should:
- Be able to think about any problem from a Bayesian perspective.
- Be able to create models with a high degree of flexibility and appropriate inductive biases.
- Understand the interplay between model specification and inference, and be able to construct a successful inference algorithm for a given model.
- Have familiarity with Gaussian process and deep learning models.

Announcements


Homework 0 has been released.  It is due Tuesday, August 30th, at the beginning of class!
This homework will be graded.  Its main purpose is to indicate the expected prior background of students in the course.  For calibration, at least 25% of the graded questions should be easy to answer without consulting any outside resources.  Overall, about 80% should be approachable after looking up or reminding yourself of some definitions or identities, or teaching yourself the relevant background material (e.g., reading about Bayes' rule, maximum likelihood, conjugate priors, the sum and product rules of probability, properties of multivariate Gaussian distributions, derivatives of matrices and vectors, etc.).  Do not worry if you find about 10-20% of the material challenging or unfamiliar (e.g., two to four of the question parts).  If you struggled with this assignment, I recommend coming to my office hours to discuss the source of the difficulty.  It's possible that you could catch up on the background, or it may be preferable to first take a more introductory course.

Homework 1 has been released.  It is due Thursday, September 29.

Schedule



Each entry lists the date and lecture topic, followed by notes (assignments and lecture materials) and readings, where applicable.

Tuesday, August 23: Introduction, Logistics, Overview
  Notes: HW 0 Released; HW 0 tex

Thursday, August 25: Probability Distributions, Linear Regression, Sum and Product Rules, Bayesian Basics (Conjugate Priors, etc.)
  Notes: Lecture Notes; Lecture 2 Supplement
  Readings (optional): Bishop (2006), PRML, Chapters 1-3; Hinton Lectures on Gradient Descent (6.1, 6.2, 6.3); Cribsheet, Gaussian Identities, Matrix Identities

Tuesday, August 30: Stochastic Gradients, Occam's Razor, Support, Inductive Biases, Graphical Models I
  Notes: HW 0 Due; Reading Summary (Thesis) Due; Lecture Notes
  Readings: MacKay (2003), Chapter 28; C. Rasmussen and Z. Ghahramani, Occam's Razor, NIPS 2001; Wilson, PhD Thesis, Chapter 1, pages 2-5, 8-19; Learning the Dimensionality of PCA, Minka, NIPS 2001

Thursday, September 1: Graphical Models II
  Notes: Reading Summary (Ch. 8, up to end of 8.2) Due; Lecture Notes; Written Notes
  Readings (required): Bishop (2006), Chapter 8

Tuesday, September 6: Graphical Models III
  Notes: Reading Summary (Ch. 8, complete) Due

Thursday, September 8: Graphical Models

Tuesday, September 13: MCMC
  Notes: Reading Summary (Murray) Due
  Readings (required and optional):
    - MacKay (2003), Ch. 29, 30
    - C. Bishop, Pattern Recognition and Machine Learning (PRML), Ch. 11
    - R. Neal, Slice Sampling, Annals of Statistics, 2003
    - C. Geyer, Practical Markov chain Monte Carlo, Statistical Science 7(4): 473-492, 1992
    - J. Geweke, Getting it right: joint distribution tests of posterior simulators, JASA 99(467): 799-804, 2004

Thursday, September 15: MCMC, Variational Methods
  Notes: HW 1 Released

Tuesday, September 20: Variational Methods, Laplace Approximation, Gaussian Processes I
  Notes: GP Readings Due
  Readings: GPML, Preface and Chapter 2; Wilson, PhD Thesis, Chapters 1, 2

Thursday, September 22: Gaussian Processes II (Kernel Functions and Marginal Likelihood Learning)
  Notes: GP Readings (Ch. 4, 5) Due
  Readings: GPML, Chapters 4 and 5

Tuesday, September 27: Kernel Learning
  Notes: HW 2 Released; GP Readings (ICML paper) Due
  Readings: Gaussian Process Kernels for Pattern Discovery and Extrapolation, ICML 2013

Thursday, September 29: Gaussian Processes III (Non-Gaussian Likelihoods)
  Notes: HW 1 Due; GP Readings (Ch. 3) Due
  Readings: GPML, Chapter 3

Tuesday, October 4: Review Session

Thursday, October 6: Midterm I

Break

Thursday, October 13: Gaussian Processes IV (Scalability)
  Readings: Quinonero-Candela & Rasmussen (2005); Wilson et al. (2014); Wilson and Nickisch (2015)

Tuesday, October 18: Bayesian Optimization
  Notes: HW 2 Due; Project Proposal Due
  Readings: Snoek et al. (2012); A Review of Bayesian Optimization

Thursday, October 20: Discrete Bayesian Nonparametrics and the Dirichlet Process Mixture Model

Tuesday, October 25: Feed-forward Neural Networks

Thursday, October 27: Convolutional Networks
  Notes: HW 3 Released

Tuesday, November 1: Recurrent Networks and LSTMs

Thursday, November 3: Bayesian Neural Networks

Tuesday, November 8: Review Session
  Notes: HW 3 Due

Thursday, November 10: Midterm II

Tuesday, November 15: Deep Kernel Learning
  Notes: Midterm report Due

Thursday, November 17: Variational Autoencoder

Tuesday, November 22: Generative Adversarial Networks

Break

Tuesday, November 29: Project Presentations

Thursday, December 1: Project Presentations

December 7 - December 15 (exam period): Project Due
  No final exam for this course.