Objectives

The overall program is structured around three specific aims:

Aim 1: Train scientists with proficiency in modeling and uncertainty quantification (UQ). Trainees will develop foundations through coursework and methodological research under the supervision of top researchers, and they will benefit from vertical integration of undergraduate through post-doctoral researchers. They will shape and attend a Distinguished Seminar Series and a “UQ Hackathon” that invite nationally and internationally renowned researchers, and they will disseminate their work in top journals and at conferences.

Aim 2: Create an interdisciplinary culture and solve important life science problems. Trainees will develop communication and collaboration skills by being exposed to a range of applications, being mentored by a committee with varied expertise, and traveling to conferences in mathematics, statistics, and the life sciences. They will create and consume content on communication at the NC State Libraries and Data Science Academy.

Aim 3: Strengthen and diversify the STEM workforce. We propose to recruit a diverse group of trainees and retain them by establishing a supportive and vigorous research community. We will increase participation from historically underrepresented groups via research stipends for undergraduates and the establishment of a Bridge-to-PhD program. Trainees will receive tailored professional development mentoring and networking opportunities, and they will gain experience in communication and leadership.

RTG trainees will develop expertise in one of the following methodological areas:

  • Bayesian calibration of mathematical models: A rigorous analysis of mathematical models and observational data simultaneously confronts many sources of uncertainty including unknown model parameters, discrepancy between the chosen mathematical model and reality, and error and correlation in the observation process. Bayesian statistics coherently deals with all of these uncertainties in a unified framework, and produces a model that is calibrated in the sense that the parameter and model-discrepancy distributions are driven by the data to produce agreement between the model and the observations.
  • Sensitivity analysis and subset selection: Mathematical models in the life sciences often contain a large number of uncertain and unknown parameters. Quantifying the impact of these parameters on the model output is crucial for understanding system dynamics, designing experiments, and guiding parameter inference and dimension reduction.
  • Equation learning: Differential equations based on physical principles can be used to represent complex dynamical systems in all fields of science. Because the true dynamics are unknown for all but the simplest systems, learning the governing equations can improve our understanding of the mechanisms driving them. Recently, there has been increased interest in data-driven approaches to learning these equations, including sparse regression, symbolic regression, and deep models that use either sparse or symbolic regression techniques.
  • Topological data analysis: TDA’s ability to capture and represent essential features in complex data has made it a powerful tool for data analysis with applications in various domains, including biology, engineering, and physics. Mathematical modeling, machine learning, and UQ rely on the challenging task of accurately characterizing the underlying data. Taking into account the spatial properties of data, we will develop topological methods to characterize the complexity of data in the life sciences and extract essential features from its shape for downstream modeling and UQ analysis.

All RTG trainees will work on a specific application in parallel with their methodological work. Application areas include:

  • Climate and meteorology: Climate change is one of the most pressing issues facing humanity, and understanding the risks under different climate scenarios requires climate model simulations. However, the climate system is incredibly complex, and thus model output requires statistical calibration and UQ. A particular challenge is studying extreme events such as flooding, because flood extent is driven by both local topography and large-scale climate drivers, and it is difficult to adequately model both spatial scales simultaneously.
  • Physiology: Mathematical modeling has long been used to understand physiological processes, differentiate disease phenotypes, and explore patient-specific treatments. Challenges with such models include the lack of high-quality data, sparse measurements, and high model complexity, because systems often need to be studied at both the organ and the systems level. Physiological models typically combine insights from physics with empirical descriptors, and models rendered patient-specific often do not describe the data perfectly.
  • Ecology: Species distributions and population dynamics are important areas of research for many ecosystems. Species distribution models (SDMs) are a common approach for relating species information to environmental conditions across space and time to predict current and future species distributions. Bayesian hierarchical models can be used to estimate unknown parameters and the underlying process model dynamics.
  • Infectious disease modeling: Modeling and predicting the spread of an infectious disease in human populations is essential for devising optimal mitigation strategies. However, mathematically modeling individuals’ behavior is a daunting task; simple parametric models should therefore be used cautiously, and uncertainty about the model and its parameters should be properly accounted for in downstream decision making.