Krinsman, William Edward

Statistics of High-Throughput Characterization of Microbial Interactions

2022

Advisor(s): van der Laan, Mark

Abstract

Inferring ecological models of complex microbial communities is an active area of research. Inferring such models entails understanding the interactions between microbes and how they affect each other’s growth. This dissertation takes a statistical perspective to advance current knowledge of this problem.

Part I explains how high-throughput droplet-based microfluidics technology can be used to screen for microbial interactions. An explicit, statistical framework is motivated and developed that can guide the analysis of data from such experiments. Chapter 1 investigates the specific questions that need to be answered to study microbial interactions. Chapter 1 explains why high-throughput droplet-based microfluidics technology overcomes previous limitations to answering these questions. It is shown how answering these questions can be recast as the statistical problem of estimating a network with mixed-sign edge weights. Chapter 2 investigates how to approach these questions using statistical models. Chapter 2 explains that the data from the noisy dynamical systems corresponding to each droplet can be understood as censored observations of a multivariate Markov process. The statistical understanding of a droplet’s initial state is identified as crucial to overcoming the main limitation of these experiments, the uncontrolled assignment of microbes to droplets.

Part II explains how it might be possible to predict, based on the experimental setup, how much data will be produced to infer given microbial interactions. Running the experiment once without incubating the droplets turns out to be necessary to make such predictions. Chapter 3 investigates which statistical (working) models can be used to describe a droplet’s initial state. Chapter 3 shows that specific assumptions justify a default working model. New working models are derived by relaxing each of these assumptions. Chapter 4 investigates whether the failure of any of these assumptions leads to substantially new behavior. Chapter 4 demonstrates that log likelihood ratios can be used to answer this question. Failure of the sampling without replacement assumption turns out to have negligible effects in practice, but failures of the other assumptions could be important. Chapter 5 investigates how failures of the relevant assumptions affect the targeted estimands that enable the prediction of how much data will be produced to infer given microbial interactions. Chapter 5 confirms that more severe failures of these assumptions lead to more severe discrepancies with the predictions derived from the default working model. The nature of the effect depends on the chosen grouping of droplets defining the targeted estimands. Chapter 6 investigates how to estimate failures of the relevant assumptions from the data produced by unincubated droplets. Chapter 6 presents both plugin and maximum likelihood estimators for doing so. Failures of these assumptions are shown to be understandable non-parametrically.

Part III demonstrates the feasibility of inferring microbial interactions from the data produced by these experiments. Relevant ideas from the microbiological and ecological literature are recast into an explicit, statistical framework. Chapter 7 investigates how a particular measure of relative fitness can be recast into the statistical framework of average treatment effects. Chapter 7 explains how violations of positivity assumptions are inevitable for this problem, making controlling for confounding difficult. Explicit assumptions are given under which the estimands are identifiable from the observed data produced by incubated droplets, even though initial states of the droplets are not directly observed. Chapter 8 investigates how comparisons of ecological interactions can be recast into the statistical framework of loss functions for signed networks. Chapter 8 explains how avoiding unexpected behavior requires loss functions for signed networks to satisfy what is called herein “the double penalization principle”. Starting from loss functions of unsigned networks, several examples of loss functions for signed networks are derived that satisfy this property.

Future work will explore further choices that can be made when modelling this problem, how to connect these ideas to more sophisticated statistical methodologies, and the biological interpretation of the results of applying these ideas to real experimental datasets. This work demonstrates the plausibility of characterizing microbial interactions using high-throughput droplet-based microfluidics technologies, and hopefully will guide the analysis of data produced by such experiments in the future.

UC Berkeley
