Data-Driven Exploration of Dynamic Biological Processes

Genomics, proteomics, and metabolomics are creating new opportunities to understand complex, dynamic biological processes such as organ development, immune response, and disease progression. Recent experimental advances generate huge biological data sets, making it possible to investigate these biological systems in a data-driven fashion at high temporal resolution. Most time-course omics data sets present analytical challenges, however, because of their high dimensionality and the nonlinear relationships among components of biological systems. Tackling these challenges will require sophisticated dimension-reduction techniques that are biologically meaningful, computationally efficient, and amenable to uncertainty quantification.

Sumanta Basu, Statistics and Data Science/Computational Biology, is working with Andrew G. Clark, Molecular Biology and Genetics/Computational Biology; Martin T. Wells, Social Statistics/Statistics and Data Science/Computational Biology/Biostatistics in Healthcare Policy and Research; and Myung Hee Lee, Medicine at Weill Cornell Medicine, to develop a statistical approach that incorporates prior biological information, such as known signaling pathway membership and protein-protein interactions, in order to enable analysis of high-dimensional biological systems from limited data samples.

The team is developing several techniques: an empirical Bayes framework for clustering omics time-course data, using prior biological knowledge; a quantile-based Granger causality framework for learning interactions among genes or metabolites from their lead-lag relationships; and a decision tree ensemble framework for searching cascades of interactions among genes from their temporal expression profiles.
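To make the lead-lag idea concrete, here is a minimal sketch of a classical Granger causality check between two simulated series. It is not the team's method: it uses ordinary least squares on the mean rather than quantiles, and the function name `granger_f_stat`, the series `gene_a` and `gene_b`, and all simulation parameters are hypothetical choices made only for illustration.

```python
import numpy as np

def granger_f_stat(x, y, lag=1):
    """F statistic for "x Granger-causes y" using ordinary least squares.

    Compares a restricted model (y on its own lags) with an unrestricted
    model (y on its own lags plus lags of x). Illustrative only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y) - lag
    target = y[lag:]
    # Column k holds the series shifted back by k + 1 time steps.
    y_lags = np.column_stack([y[lag - k - 1 : len(y) - k - 1] for k in range(lag)])
    x_lags = np.column_stack([x[lag - k - 1 : len(x) - k - 1] for k in range(lag)])
    ones = np.ones((n, 1))
    restricted = np.hstack([ones, y_lags])
    unrestricted = np.hstack([ones, y_lags, x_lags])

    def rss(design):
        # Residual sum of squares of the least-squares fit.
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = n - unrestricted.shape[1]
    return ((rss_r - rss_u) / lag) / (rss_u / dof)

# Simulated example (assumed data): gene_b follows gene_a with a one-step delay.
rng = np.random.default_rng(0)
T = 200
gene_a = rng.normal(size=T)
gene_b = np.zeros(T)
for t in range(1, T):
    gene_b[t] = 0.8 * gene_a[t - 1] + 0.1 * rng.normal()

f_ab = granger_f_stat(gene_a, gene_b)  # does gene_a lead gene_b?
f_ba = granger_f_stat(gene_b, gene_a)  # does gene_b lead gene_a?
print(f"F(a -> b) = {f_ab:.1f}, F(b -> a) = {f_ba:.1f}")
```

A large statistic in one direction and a small one in the other suggests that the first series leads the second. A quantile-based variant would replace the least-squares fits with quantile regressions, so that lead-lag effects in the tails of the distribution can also be detected.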

The interdisciplinary team will analyze time-course omics data from three projects: innate immune response systems in Drosophila, developmental processes in mouse models, and longitudinal metabolite profiling of tuberculosis patients. The research will generate data-driven, testable hypotheses regarding the regulatory architecture of biological processes. It could also inform clinical practice by improving how disease progression and prognosis are monitored.

NIH Award Number: 1R01GM135926-01

Funding Received

$1.4 Million spanning 4 years