Assistant Professor
Richard and Sybil Smith Sesquicentennial Fellow
Operations Research and Information Engineering (ORIE)
Graduate field member: ORIE, Stats, CS, ECE, CAM, and Data Science,
Cornell University

## contact

office: Rhodes 227

email: udell@cornell.edu

October 2021. New paper accepted at NeurIPS this year, joint with collaborators Will Stephenson and Tamara Broderick, and my student Zach Frangella: Can we globally optimize cross-validation loss? We seek to understand when the loss landscape of logistic regression is easy to optimize, and when local minima might be lurking. Most importantly, the figures are psychedelic!

September 2021. Want to solve large-scale linear systems faster? Take a look at our new paper, with student Zach Frangella and collaborator Joel Tropp: Randomized Nyström Preconditioning. We show how to form a randomized preconditioner that accelerates the convergence of conjugate gradient (CG). It works for any linear system, square (directly) or rectangular (by operating on the normal equations), using spectral decay at the top of the spectrum to accelerate convergence. It's most useful for dense linear systems and ones with fast spectral decay, and provides a provable speedup for regularized systems.
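For readers who want a feel for the idea, here is a minimal NumPy sketch of a randomized Nyström preconditioner inside CG for a regularized system (A + mu I) x = b. It is an illustration under assumptions, not the paper's reference implementation: the function names (`nystrom_approx`, `pcg`), the test matrix with eigenvalues 1/k^2, and the choices of rank, `mu`, and tolerance are all made up for this example.

```python
import numpy as np

def nystrom_approx(A, rank, rng):
    """Rank-`rank` randomized Nystrom approximation A ~ U diag(lam) U^T (A symmetric psd)."""
    n = A.shape[0]
    # Orthonormal random test matrix.
    Omega, _ = np.linalg.qr(rng.standard_normal((n, rank)))
    Y = A @ Omega
    # Tiny shift for numerical stability (an illustrative choice).
    nu = np.sqrt(n) * np.finfo(float).eps * np.linalg.norm(Y, "fro")
    Y_nu = Y + nu * Omega
    M = Omega.T @ Y_nu
    L = np.linalg.cholesky((M + M.T) / 2)      # M = L L^T
    B = np.linalg.solve(L, Y_nu.T).T           # B = Y_nu L^{-T}, so B B^T is the Nystrom approximation
    U, s, _ = np.linalg.svd(B, full_matrices=False)
    lam = np.maximum(s**2 - nu, 0.0)           # remove the shift; sorted in decreasing order
    return U, lam

def pcg(matvec, b, apply_pinv, tol=1e-10, max_iter=500):
    """Standard preconditioned conjugate gradient; returns (solution, iterations used)."""
    x = np.zeros_like(b)
    r = b.copy()
    z = apply_pinv(r)
    p = z.copy()
    rz = r @ z
    for k in range(1, max_iter + 1):
        Ap = matvec(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        z = apply_pinv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

# Hypothetical test problem: psd matrix with fast spectral decay (eigenvalues 1/k^2).
rng = np.random.default_rng(0)
n, rank, mu = 300, 30, 1e-2
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = 1.0 / np.arange(1, n + 1) ** 2
A = (Q * eigs) @ Q.T
b = rng.standard_normal(n)

U, lam = nystrom_approx(A, rank, rng)

def apply_pinv(r):
    # P^{-1} r = (lam_min + mu) U (Lam + mu I)^{-1} U^T r + (I - U U^T) r
    Utr = U.T @ r
    return U @ ((lam[-1] + mu) / (lam + mu) * Utr) + (r - U @ Utr)

def matvec(v):
    return A @ v + mu * v

x_plain, it_plain = pcg(matvec, b, lambda r: r)   # unpreconditioned CG
x_nys, it_nys = pcg(matvec, b, apply_pinv)        # Nystrom-preconditioned CG
print("CG iterations:", it_plain, "| Nystrom PCG iterations:", it_nys)
```

The preconditioner only touches A through the sketch `A @ Omega`, so the same code works when A is available only as a matrix-vector product; the dense A here just keeps the demo short.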

June 2021. Thanks to Microsoft Research for hosting an awesome series on automated machine learning. Here is my talk on Structured Models for Automated Machine Learning.

June 2021. Thanks to the Fields Institute for hosting a great session on Low Rank Models and Applications! Here's my contrarian talk about when low rank models don't work, and what to use instead: imputing missing data with the Gaussian copula, based on several papers led by student Yuxuan Zhao.

March 2021. What a pleasure to return to the Women in Data Science (WiDS) conference! Here are the slides for my workshop on Automating machine learning.

February 2021. I'm honored to be selected as one of the 2021 Sloan Research Fellows.

August 2020. I had a delightful time talking with researchers in Melbourne about dimensionality reduction. What a pleasure to give a talk in Melbourne without getting on an airplane!

May 2020. Two student papers accepted at KDD 2020! Chengrun Yang led the work on a tensor method for automated machine learning, and Yuxuan Zhao developed a parameter-free semiparametric method for missing value imputation using the Gaussian copula.

April 2020. My ONR YIP proposal, DREAMI: Dimension Reduction for Efficient Automated Machine Intelligence, was funded for $529K. Thanks to the ONR for their support!

March 2020. I've posted my thoughts on the coronavirus outbreak, with links to the most reputable (informative) sources I can find. Summary: if business-as-usual continues in the US I would not be surprised to see US hospitals overwhelmed by coronavirus cases in April. I'm encouraging students not to come to class sick and providing all my students with means to complete all their course assignments remotely (including videolinks for lectures) and encourage all faculty to do the same.

February 2020. My NSF CAREER proposal, Accelerating Machine Learning with Low Dimensional Structure, was funded for $550K. Thanks to the NSF for their support!

January 2020. My paper Why are big data matrices approximately low rank? with Alex Townsend is currently the most read article in the SIMODS journal.

Summer 2019. Congratulations to postdoc Jicong Fan for his oral presentation at CVPR on high-rank matrix completion, and to PhD student Chengrun Yang for his oral presentation at KDD on OBOE: a recommender systems approach to automated machine learning.

March 2019. I gave a plenary talk at the Women in Data Science (WiDS) Global Conference, reaching a worldwide audience of 100,000 participants. It was fantastic to see the depth of female talent in data science. Here's my talk on filling in missing data with low rank models. My interview with The Cube resulted in a nice news article about my research and teaching on cleaning up big messy data.

February 2019. Why are big data matrices approximately low rank? Alex Townsend and I provide one answer in the first issue (!) of the new SIAM Journal on Mathematics of Data Science (SIMODS). Also covered in an interview in SIAM News.

February 2019. Can an algorithm racially discriminate if it doesn't know people's race? (Hint: yes.) Student author Xiaojie Mao presented our work on fair decision making when the protected class is unobserved at the FAT* conference. The work was also covered by the Cornell Chronicle. I was interviewed for a related article about using social media to price life insurance.

January 2019. I'm teaching a new PhD-level topics class on Optimization for Machine Learning (ORIE 7191). We'll read classic and cutting-edge papers at the interface of optimization and machine learning, guided by two questions: 1) Can we use classical ideas in optimization to better understand (and improve) algorithms for challenging problems in machine learning? 2) How can modern insights in machine learning guide the design of new and improved methods for optimization?

September 2018. Congratulations to NIPS student authors Xiaojie Mao (Causal Inference with Noisy and Missing Covariates via Matrix Factorization - poster) and Sam Zhou (Limited Memory Kelley's Method Converges for Composite Convex and Submodular Objectives - spotlight).

January 2018. I'm co-teaching ORIE/CS 1380, Data Science for All, in Spring 2018 with Michael Clarkson. Data science has become a fundamental skill for understanding the world and making decisions, and we're excited to teach these skills to students from any discipline, without any prerequisite skills, who may go on to do important data-driven work in their own disciplines.

October 2017. My paper on Optimal Design of Efficient Rooftop Photovoltaic Arrays with Oliver Toole at Aurora Solar won second place in INFORMS’ Doing Good with Good OR (DGWGOR) Prize! It uses OR techniques to design cheaper, safer, more energy efficient solar arrays than human experts.

June 2017. I had a great time at JuliaCon; every year I'm amazed to see new (and awesome) functionality and packages. Absurd pedant that I am, I talked about how to describe a mathematical function. Here's a video of the talk.

May 2017. We had a great workshop at ACC on Control Engineering in Julia. You can find slides and demos on the workshop's GitHub repo. Thanks to my co-organizers Cristian Rojas and Mikael Johansson!

April 2017. We're running a workshop at ICDM on Data-driven Discovery of Models (D3M), together with Christophe Giraud-Carrier and Ishanu Chattopadhyay. Please submit your papers! Deadline is August 7.

March 2017. My grant proposal for research on Composable Robust Structured Data Inference was selected for funding under DARPA's program on Data Driven Discovery of Models (D3M). Looking forward to automatically constructing models for data with the other performers!

March 2017. My paper Sketchy Decisions: Convex Low-Rank Matrix Optimization with Optimal Storage with Alp Yurtsever, Joel Tropp, and Volkan Cevher was selected for an oral presentation at AISTATS 2017.

January 2017. I'll be teaching a class on Convex Optimization at Cornell in Spring 2017. We'll be roughly following Stanford's EE364a (and some of EE364b), using the excellent textbook by Boyd and Vandenberghe, with an additional emphasis on first-order methods.

November 2016. (Most) data scientists did a terrible job predicting the results of the 2016 election. Did that matter for the outcome? I analyze the data in a lecture on the limits (and dangers) of predictive modeling.

October 2016. Thanks to Cornell's Scientific Software Club for inviting me to give an introduction to Julia, and asking great questions! Here are my slides + demos, which start with basic syntax and proceed to show off advanced capabilities like multi-language integration, shared memory parallelism, and mathematical optimization packages.

September 2016. I'm teaching a new class at Cornell on Learning with Big Messy Data. Interestingly, the course itself is generating a bunch of big messy data, from lecture slides to demos to Piazza posts to project repos. Next step: train an AI to learn how to learn with big messy data from this big messy data?

June 2016. Damek Davis, Brent Edmunds and I just posted a paper on a (provably convergent) stochastic asynchronous optimization method called SAPALM for fitting generalized low rank models. It turns out asynchrony barely affects the rate of convergence (per flop), while providing a linear speedup in the number of flops per second. In other words: it's fast!

May 2016. Congratulations to Ramchandran Muthukumar and Ayush Pandey for their fantastic proposals to work with me on Convex.jl this summer through Google Summer of Code. Ayush will be adding support for complex numbers, while Ramchandran develops a fast presolve routine.

March 2016. It was great meeting incoming PhD students at the ORIE visiting student days! Here are the slides I presented to introduce students to some of my research.

November 2015. H2O is a new framework for large scale machine learning, and has just released a great implementation of generalized low rank models (engineered by Anqi Fu). Here are the slides and the video from my talk at H2O World.