eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations

Application of Statistical Methods to Integrative Analysis of Genomic Data

Abstract

The genomic revolution has produced both techniques for rapidly obtaining large quantities of genomic data and a striking increase in our knowledge of genomics. At the same time, it has created numerous open questions and challenges in analyzing the enormous amounts of data required to gain insight into the underlying biological mechanisms. This dissertation addresses these challenges by answering fundamental questions arising from two closely related fields, functional genomics and pharmacogenomics, utilizing the nature and biology of microarray datasets.

In the functional genomic study, we try to identify pathway genes, i.e. groups of genes that work cooperatively in the same pathway and constitute a fundamental functional grouping in a biological process. Identifying pathway genes has been one of the major tasks in understanding biological processes. However, because different types of biological gene relationships are difficult to characterize and infer, and because high-dimensional biological data raise several computational issues, deducing the genes in a pathway remains challenging. In this study, we elucidate higher-level gene-gene interactions by evaluating the conditional dependencies between genes, i.e. the relationships between genes after removing the influences of a set of previously known pathway genes. These previously known pathway genes serve as seed genes in our model and guide the detection of other genes involved in the same pathway. The statistical approach involves estimating a precision matrix, whose off-diagonal elements are known to be proportional to partial correlations (i.e. conditional dependencies) between genes under appropriate normality assumptions. Likelihood ratio tests on two forms of precision matrices are then performed to determine whether a candidate pathway gene is conditionally independent of all the previously known pathway genes. When used effectively, this is shown to be a promising technique for recovering gene relationships that would otherwise have gone undetected by conventional methods. The advantage of the proposed method is demonstrated using both simulation studies and real datasets. We also demonstrate the importance of taking experimental dependencies into account in the simulation and real data studies.
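The link between a precision matrix and partial correlations can be illustrated with a small simulation. The sketch below is not the dissertation's estimator or its likelihood ratio test. It simply inverts a sample covariance matrix, which is only feasible when there are many more samples than genes, and rescales the result. The gene names (`seed_gene`, `gene_a`, `gene_b`) and the simulated data are hypothetical.

```python
import numpy as np

def partial_correlations(samples):
    """Partial-correlation matrix from an (n_samples, n_genes) array.

    Assumes n_samples is much larger than n_genes, so that the sample
    covariance matrix is invertible (an assumption of this sketch).
    """
    cov = np.cov(samples, rowvar=False)
    precision = np.linalg.inv(cov)
    scale = np.sqrt(np.diag(precision))
    # rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)
    partial = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial, 1.0)
    return partial

# Hypothetical data: one seed gene drives two other genes.
rng = np.random.default_rng(0)
n = 5000
seed_gene = rng.normal(size=n)
gene_a = seed_gene + 0.5 * rng.normal(size=n)
gene_b = seed_gene + 0.5 * rng.normal(size=n)
expr = np.column_stack([seed_gene, gene_a, gene_b])

corr = np.corrcoef(expr, rowvar=False)   # marginal correlations
pcor = partial_correlations(expr)        # conditional dependencies

print("marginal corr(gene_a, gene_b):", round(corr[1, 2], 3))
print("partial  corr(gene_a, gene_b):", round(pcor[1, 2], 3))
```

In this toy setting `gene_a` and `gene_b` are strongly correlated marginally, yet their partial correlation is close to zero once the seed gene is conditioned on. This is the kind of distinction that conditional dependencies are meant to capture.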

In the pharmacogenomic study, genetic variants causing inter-individual variation in drug response are investigated. Specifically, signature genes that contribute to the variation in statin efficacy between high and low responders are discovered. Using the Nonnegative Matrix Factorization (NMF) method, we first identify two distinct molecular patterns that separate the high and low responder groups. Based on this separation, a modified Significance Analysis of Microarrays (SAM) method then searches for signature genes that had gone undetected by the original SAM method. In the biological validation studies, our gene signatures are shown to be significantly enriched with HMGCR-correlated genes. Furthermore, a notable difference is observed in the amount of HMGCR enzymatic activity change between the high and low responder groups: the high responder group shows a bigger activity decrease, implying that statin inhibits HMGCR enzymatic activity more efficiently in the high responder group. This helps us understand why the high responder group shows a greater decrease in low-density lipoprotein cholesterol (LDLC) level and higher statin efficacy than the low responder group. Overall, the discovered gene signatures are shown to have high biological relevance to the cholesterol biosynthesis pathway, which HMGCR mainly acts on.
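As a rough illustration of how NMF can separate samples into two molecular patterns, the sketch below factorizes a nonnegative genes-by-samples matrix at rank 2 using standard multiplicative updates. It then assigns each sample to the component with the larger coefficient. This is a minimal sketch under stated assumptions, not the procedure used in the dissertation: the function `nmf`, the simulated matrix `V`, and the two-block structure are all hypothetical, and the modified SAM step is not shown.

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Rank-`rank` NMF of a nonnegative matrix V (genes x samples).

    Uses multiplicative updates for the squared-error objective and
    returns W (genes x rank) and H (rank x samples), both nonnegative.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.uniform(0.1, 1.0, size=(n_genes, rank))
    H = rng.uniform(0.1, 1.0, size=(rank, n_samples))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Normalize each column of W to sum to 1 so rows of H are comparable.
    scale = W.sum(axis=0)
    return W / scale, H * scale[:, None]

# Hypothetical expression matrix: 20 genes x 12 samples, two sample groups
# with different sets of highly expressed genes plus a small baseline.
rng = np.random.default_rng(1)
V = rng.uniform(0.0, 0.1, size=(20, 12))
V[:10, :6] += rng.uniform(4.0, 6.0, size=(10, 6))
V[10:, 6:] += rng.uniform(4.0, 6.0, size=(10, 6))

W, H = nmf(V, rank=2)
labels = H.argmax(axis=0)  # dominant component per sample
print("sample labels:", labels)
```

Which component receives label 0 or 1 depends on the random initialization, so only the grouping of samples is meaningful, not the label values themselves.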
