ORIE 6340 Mathematics of Data Science

Announcements

  • Welcome to the class!

Instructor Information

instructor: Damek Davis
office hours: M 1:30PM-2:30PM, and by appointment
office: Rhodes Hall 218
email: dsd95 at cornell.edu

teaching assistant: Mateo Diaz
office hours: W 4-5 PM
email: md825 at cornell.edu

Ed Discussions: See canvas

Meeting Times and Location

lecture time: Monday and Wednesday 11:25am - 12:40pm
lecture location: Zoom (see canvas for links)

Course Description

This course is an introduction to an emerging research area broadly described as “Math of Data Science.” This area is highly interdisciplinary, so acquiring the tools necessary to participate is usually an overwhelming and unsystematic process. ORIE 6340 is an attempt to overcome the current state of affairs.

The topics of the course will include:

  • Concentration of measure phenomena for random vectors and matrices (e.g., subGaussian vectors; McDiarmid; Lipschitz functions; empirical processes; Rademacher complexity);

  • Estimation in high dimensions:

    • Convex Relaxations and Spectral Methods (e.g., SDPs; stochastic block model; max cut; compressive sensing)

    • Direct nonconvex optimization methods (e.g., first-order methods; low-rank matrix estimation: matrix sensing and completion)

Don't let the outline fool you: this is a lot of material. Much of the course will be based on the excellent lecture notes of Bandeira-Singer-Strohmer. Throughout the semester, I will augment these notes with alternative readings (research papers/textbooks) that I find useful (see Resources below). Depending on how quickly we cover the material, we will transition to current research topics as we progress through the course.

Resources

I will assume working knowledge of linear algebra, probability, optimization, and algorithms. I will review necessary facts from optimization and probability, but the more you know about these topics, the better prepared you will be. To that end, you might make use of the following textbooks.

Requirements and Grading

Grading Components: The grade will be based on two components:

  • (40%) There will be (approximately) three homework assignments (to be uploaded).

  • (60%) There will be a final project (completed individually or in groups of two), which may either be a literature review or a research project based on topics similar to those mentioned in the course. Ideally, projects should be highly correlated with your own research interests.

    • Initial Project Proposal: Due Wednesday, March 13th

    • Final Report: Due Monday, May 6th

    • Presentation: Last Week of Class

Collaboration

Cornell’s Code of Academic Integrity can be found at cuinfo.cornell.edu/Academic/AIC.html.

You may work together on problem sets, but you must write up your own solutions AND acknowledge those with whom you discussed the problems. You must also cite any resources that helped you obtain your solutions.

Problem Sets

Lectures

  • Lecture Videos are posted on Canvas.

  • See Canvas>Files for links to lecture PDFs and references.