Machine Learning Made Easier

Big data sets are everywhere, in science, health, commerce, and government, yet extracting value from these data remains a challenge. Today nearly every step requires human intervention: cleaning the data, identifying useful features, and choosing a machine-learning model. Automating these tasks would free data scientists to concentrate on the important questions: Are we solving the right problems? And do we have the right data?

With this CAREER award, Madeleine Udell, Operations Research and Information Engineering/Statistics and Data Science, is developing methods to accelerate and automate the standard machine-learning workflow. This research relies on the central insight that measurements of a complex object, such as a patient in a hospital, a survey respondent, or a machine-learning data set, can be well described as simple or even linear functions of an underlying low-dimensional latent vector. New algorithms and software developed through this research will identify low-dimensional latent vectors and use them to 1) clean data by denoising observations or imputing missing entries, 2) reduce the dimensionality of feature vectors, and 3) recommend better algorithms.
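To make the low-dimensional idea concrete, here is a minimal sketch in Python with NumPy. It is not the project's software or algorithm; it swaps in a plain truncated singular value decomposition as a stand-in for a low-rank model. The function names (low_rank_approx, impute_low_rank, demo), the synthetic data, the rank of 3, and the noise level are all assumptions chosen for illustration.

```python
import numpy as np


def low_rank_approx(X, rank):
    """Best rank-`rank` approximation of X in the least-squares sense (truncated SVD)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]


def impute_low_rank(X, mask, rank, n_iters=100):
    """Fill the entries of X where mask is False by repeated low-rank fitting.

    Illustrative iterative-SVD scheme: observed entries are kept fixed and
    missing entries are overwritten with the current low-rank estimate.
    """
    observed = np.where(mask, X, 0.0)
    # Start from column means of the observed entries.
    col_means = observed.sum(axis=0) / np.maximum(mask.sum(axis=0), 1)
    filled = np.where(mask, X, col_means)
    for _ in range(n_iters):
        filled = np.where(mask, X, low_rank_approx(filled, rank))
    return filled


def demo(seed=0):
    """Synthetic example: 200 'patients', 20 measurements, 3 latent factors (all assumed)."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(200, 3))      # hidden low-dimensional vectors
    loadings = rng.normal(size=(3, 20))     # linear map from latent vector to measurements
    clean = latent @ loadings

    # 1) Denoising: project noisy measurements back onto a rank-3 model.
    noisy = clean + 0.5 * rng.normal(size=clean.shape)
    denoised = low_rank_approx(noisy, rank=3)

    # 2) Imputation: hide about 20% of the clean entries and recover them.
    mask = rng.random(clean.shape) > 0.2    # True where the entry is observed
    with_gaps = np.where(mask, clean, np.nan)
    completed = impute_low_rank(with_gaps, mask, rank=3)

    # Baseline for comparison: fill each gap with its column mean.
    col_means = np.where(mask, clean, 0.0).sum(axis=0) / mask.sum(axis=0)
    mean_filled = np.where(mask, clean, col_means)

    def rmse(a, b):
        return float(np.sqrt(np.mean((a - b) ** 2)))

    return {
        "noisy": rmse(noisy, clean),
        "denoised": rmse(denoised, clean),
        "mean_fill": rmse(mean_filled[~mask], clean[~mask]),
        "imputed": rmse(completed[~mask], clean[~mask]),
    }


if __name__ == "__main__":
    for name, err in demo().items():
        print(f"{name}: {err:.3f}")
```

In this sketch each row plays the role of one complex object and each column one measurement. Because the synthetic measurements really are linear functions of a three-dimensional latent vector, fitting a rank-3 model removes much of the added noise and fills the gaps more accurately than column means do. Real data would call for choosing the rank and the model with more care.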

This research aims to democratize machine learning and to promote data-driven decision making by developing automated methods to clean data and to select machine-learning models, along with open-source software packages that make these methods widely available and easy to use. The project will also train data scientists to use these models and to understand their potential risks.

Funding Received

$500 thousand over 5 years