eScholarship
Open Access Publications from the University of California

UCLA

UCLA Electronic Theses and Dissertations

Graphon Estimation by Empirical Bayes Approach and Causal Discovery from Multiple Populations

Abstract

Graphs are a natural representation of network data. Over the decades, much research has been conducted on graph theory, graphical models, and statistical network analysis. The two main kinds of graphs, undirected and directed, have each seen their own developments and help solve different kinds of problems. Our work contributes to both regimes: node clustering and estimation in undirected graphs, and causal structure estimation using directed graphs.

In the first part of the dissertation, we focus on one type of undirected graphical model: the graphon (W-graph), which includes the stochastic block model as a special case. It has been widely used in modeling and analyzing network data. This random graph model is well characterized by its graphon function, and estimation of the graphon function has attracted considerable recent research interest. Most existing work focuses on detecting the latent space of the model, while adopting simple maximum likelihood or Bayesian estimates for the graphon or connectivity parameters given the identified latent variables. In this project, we propose a hierarchical model and develop a novel empirical Bayes estimate of the connectivity matrix of a stochastic block model to approximate the graphon function. Based on the likelihood of our hierarchical model, we further introduce a new model selection criterion for choosing the number of communities. Numerical results from extensive simulations and two well-annotated social networks demonstrate the superiority of our approach in terms of estimation accuracy and model selection.
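To make the setting concrete, the following is a minimal sketch of connectivity estimation in a stochastic block model when community labels are known. It is not the hierarchical model or the estimator proposed in the dissertation: the Beta prior, the method-of-moments hyperparameter fit, and all function names (`simulate_sbm`, `block_counts`, `eb_connectivity`) are assumptions introduced only for illustration.

```python
import numpy as np

def simulate_sbm(n, theta, pi, rng):
    """Draw labels z and a symmetric 0/1 adjacency matrix from an SBM."""
    theta = np.asarray(theta, dtype=float)
    z = rng.choice(len(pi), size=n, p=pi)
    probs = theta[z][:, z]                     # n x n edge probabilities
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adj = (upper | upper.T).astype(int)        # symmetric, zero diagonal
    return z, adj

def block_counts(adj, z, k):
    """Observed edges and possible node pairs for every block pair (a, b)."""
    edges = np.zeros((k, k))
    pairs = np.zeros((k, k))
    for a in range(k):
        ia = np.flatnonzero(z == a)
        for b in range(a, k):
            ib = np.flatnonzero(z == b)
            total = adj[np.ix_(ia, ib)].sum()
            if a == b:
                e = total / 2                  # each edge is counted twice
                m = len(ia) * (len(ia) - 1) / 2
            else:
                e = total
                m = len(ia) * len(ib)
            edges[a, b] = edges[b, a] = e
            pairs[a, b] = pairs[b, a] = m
    return edges, pairs

def eb_connectivity(edges, pairs):
    """Per-block MLE and a Beta-Binomial shrinkage estimate.

    Assumption (illustrative only): each block probability has a
    Beta(s * mu, s * (1 - mu)) prior, with mu and s fitted by moments.
    """
    iu = np.triu_indices(edges.shape[0])
    mu = edges[iu].sum() / pairs[iu].sum()     # overall edge density
    mle = np.divide(edges, pairs, out=np.full_like(edges, mu), where=pairs > 0)
    var = np.average((mle[iu] - mu) ** 2, weights=pairs[iu])
    if var <= 0:                               # all blocks identical
        return mle, mle.copy(), mu
    s = max(mu * (1 - mu) / var - 1, 1e-6)     # prior strength
    eb = (edges + s * mu) / (pairs + s)        # posterior mean
    return mle, eb, mu
```

With large blocks the shrinkage estimate stays close to the per-block maximum likelihood estimate; the pull toward the overall density `mu` matters mostly for small or sparsely observed blocks.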

In the second part of the dissertation, we focus on one of the most popular directed graphical models: Bayesian networks. The intuition for our work comes from liquid association theory, which holds that gene regulatory strength differs across cellular states. We encode this phenomenon in a statistical model and propose an algorithm to discover causal relations from observational data generated from different populations. We analyze the relationship between edges with different weights and the coefficients of node-wise regression in two populations, and we use this observed relationship to orient undirected edges in completed partially directed acyclic graphs (cpDAGs). Numerical results on simulations and a real data example show the effectiveness of our algorithm and its improvement over existing structure learning methods.
