Statistical Inference for Reliable Machine Learning

A core capability of intelligence is the ability to reason about hidden information. To meet this need, many artificial intelligence systems learn hidden information from observed data by constructing a statistical model and then running a statistical inference algorithm. When learning from large amounts of data, however, statistical inference algorithms can take a long time to run, or worse, they may run quickly but return the wrong answer.

With this CAREER award, Christopher De Sa, Computer Science, is building new statistical inference algorithms that will run efficiently on very large data sets and very complicated models while carrying provable reliability guarantees. The project will focus on Markov chain Monte Carlo (MCMC) methods, a class of statistical inference algorithms that work by simulating a random process whose distribution converges to the distribution specified by the desired statistical model. MCMC methods can give highly accurate statistical estimates, but current approaches scale poorly to large data sets and complicated models. This research will generate new algorithms that scale to large data sets through data subsampling and to large models through asynchronous parallelism. Throughout, the project will focus on proving theoretical guarantees that quantify how much confidence can be placed in a statistical inference produced by MCMC methods and that expose the trade-off between scalability and reliability.
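As a rough illustration of what "simulating a random process that converges to" a model means, the following is a minimal random-walk Metropolis sampler, one of the simplest MCMC methods. It is a sketch under stated assumptions, not the project's algorithm: the toy model (a one-dimensional Gaussian mean with a Gaussian prior and unit-variance observations) and the names `log_posterior` and `metropolis` are hypothetical, chosen because the exact posterior is known and the samples can be checked against it. Note that every step evaluates the likelihood over the full data set; that per-step cost, which grows with the number of data points, is the kind of expense data subsampling aims to reduce.

```python
import math
import random


def log_posterior(mu, data, prior_sd=10.0):
    # Hypothetical toy model: unnormalized log posterior of a Gaussian mean mu
    # with prior N(0, prior_sd^2) and unit-variance Gaussian observations.
    log_prior = -0.5 * (mu / prior_sd) ** 2
    # The likelihood sums over the FULL data set on every call.
    log_lik = -0.5 * sum((x - mu) ** 2 for x in data)
    return log_prior + log_lik


def metropolis(data, n_steps, step_size=0.3, mu0=0.0, seed=0):
    """Random-walk Metropolis: returns the chain of visited states."""
    rng = random.Random(seed)
    mu = mu0
    current = log_posterior(mu, data)
    samples = []
    for _ in range(n_steps):
        # Propose a small symmetric Gaussian move from the current state.
        proposal = mu + rng.gauss(0.0, step_size)
        candidate = log_posterior(proposal, data)
        # Accept with probability min(1, p(proposal) / p(mu)); the proposal is
        # symmetric, so no Hastings correction term is needed.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        samples.append(mu)
    return samples


if __name__ == "__main__":
    data_rng = random.Random(42)
    data = [data_rng.gauss(2.0, 1.0) for _ in range(50)]
    draws = metropolis(data, n_steps=22000)[2000:]  # discard burn-in
    print("MCMC estimate of posterior mean:", sum(draws) / len(draws))
    print("exact posterior mean:", sum(data) / (len(data) + 1.0 / 100.0))
```

After burn-in, the empirical distribution of the chain approximates the posterior, so averages over the draws estimate posterior quantities. How close that approximation is after a finite number of steps is exactly the kind of reliability question the abstract describes.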

This research has the goal of improving the reliability of scalable statistical inference. It will also further education in artificial intelligence through the development of open-source course resources that give students hands-on experience with how scalability and reliability interact in machine-learning systems.

Funding Received

$422 thousand over 5 years