New Statistical Tools for State-of-the-Art Research

New technologies in fields such as genomics, medicine, and neuroscience are generating more data than ever before. And not only more data, but more complex data, with a staggering number of features or variables—sometimes related, sometimes not. The complexity of massive, high-dimensional data sets—coupled with innovative methods of data collection—is creating unprecedented challenges for statistical science. Despite a surge in statistical research on big data analysis, modern data sets have outstripped the ability of existing statistical tools to provide reliable analyses.

With this CAREER award, Yang Ning, Statistics and Data Science, is developing statistical and computational tools to address emerging challenges in the statistical analysis of big data. The goal is a novel computational and statistical framework for high-dimensional M-estimation, a crucial tool for fitting regressions that remain reliable when data sets contain outliers or depart from other standard conditions.
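For readers unfamiliar with the term, an M-estimator chooses model parameters by minimizing a sum of per-observation losses. The sketch below is a minimal, low-dimensional illustration of that idea, not the high-dimensional framework the award supports: it assumes a linear model and the Huber loss, fits it with iteratively reweighted least squares, and compares the result with ordinary least squares on simulated data containing outliers. The function name `huber_m_estimate`, the threshold `delta`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def huber_m_estimate(X, y, delta=1.0, n_iter=50):
    """Illustrative Huber M-estimator for a linear model, fit by
    iteratively reweighted least squares (IRLS).
    delta is a hypothetical threshold in the units of the residuals."""
    # Start from the ordinary least-squares fit.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ beta
        abs_r = np.maximum(np.abs(r), 1e-12)  # guard against division by zero
        # Huber weights: 1 for small residuals, delta/|r| for large ones.
        w = np.where(abs_r <= delta, 1.0, delta / abs_r)
        sw = np.sqrt(w)
        # Weighted least-squares update.
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

# Simulated data (assumed for illustration): y = 1 + 2x + noise.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
y[:20] += 30.0  # contaminate 10% of the outcomes with gross outliers

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_huber = huber_m_estimate(X, y)
print("OLS estimate:  ", beta_ols)
print("Huber estimate:", beta_huber)
```

In this simulation the outliers pull the least-squares intercept far from its true value of 1, while the Huber fit stays much closer because large residuals are down-weighted.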

This research will address two common challenges. First, Ning will consider high-dimensional M-estimation with non-smooth loss functions, such as the indicator function, for which the discontinuity of the loss function requires more refined theoretical analysis than is currently available. Second, Ning will consider M-estimation when measurement constraints compel researchers to gather outcomes from a limited subset of a much larger data set.

This research will yield scalable computational algorithms and statistically valid estimation and inference procedures. The resulting tools, implemented within software packages, will benefit a broad range of researchers, including biologists, epidemiologists, medical doctors, and neuroscientists.

Funding Received

$400 thousand over 5 years