eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations

Latent Variable Models: Maximum Likelihood Estimation and Microbiome Data Analysis

Abstract

Data analysis often involves modeling complex relationships among many variables, some of which are unobserved. Such analyses are usually tackled with latent variable models, which are graphical models comprising both observed and latent variables. In this work, we examine both the computational and the applied aspects of latent variable models. On the computational side, we unify and extend stochastic-gradient-based maximum likelihood estimation methods for latent variable models under a framework called Hierarchical Model Stochastic Gradient Descent (HMSGD). Numerical studies show that certain extensions are more computationally efficient than the Monte Carlo Expectation Maximization (MCEM) algorithm. On the application side, we develop a nonparametric graphical model for microbiome data and apply this framework to analyze the statistical properties of rarefaction, a popular normalization technique in microbiome data analysis. We show that rarefaction helps guarantee the validity of permutation inference. We introduce the sample rarefaction efficiency index as a preliminary data-driven indicator of the statistical efficiency of rarefied data relative to the original data. Using the nonparametric graphical model, we propose a rarefaction-based nonparametric statistical testing procedure, the combined correlation permutation test, to assess whether library sizes are associated with microbial compositions conditional on the grouping variable of interest. Case studies show that such associations are not uncommon in practice.
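
For readers unfamiliar with rarefaction, the following is a minimal sketch of the usual form of the technique: each sample's taxon counts are subsampled without replacement down to a common depth, so that all samples end up with the same library size. It is not taken from the thesis. The function name `rarefy`, the toy count vectors, and the choice of the smallest library size as the depth are assumptions made for illustration only.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=None):
    """Subsample `depth` reads without replacement from one sample's taxon counts.

    `counts[i]` is the number of reads assigned to taxon i (hypothetical input).
    """
    total = sum(counts)
    if depth > total:
        raise ValueError("depth exceeds the sample's library size")
    rng = random.Random(seed)
    # Expand the counts into one taxon label per read, then draw without replacement.
    reads = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    drawn = Counter(rng.sample(reads, depth))
    return [drawn.get(taxon, 0) for taxon in range(len(counts))]


# Toy data (assumed): two samples with very different library sizes.
samples = [[50, 30, 20], [500, 100, 400]]
# Assumption: rarefy every sample to the smallest library size in the data set.
depth = min(sum(s) for s in samples)
rarefied = [rarefy(s, depth, seed=0) for s in samples]
```

After this step every row of `rarefied` sums to `depth`, so library size no longer differs between samples. The price is that reads from the deeper samples are discarded, which is the loss of statistical efficiency that an index comparing rarefied data with the original data is meant to gauge.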
