eScholarship
Open Access Publications from the University of California

UCLA

UCLA Electronic Theses and Dissertations

Challenges in High-throughput Data Analysis: Proteomic Data Pre-processing and Network Methods for Integrating Multiple Data Types

Abstract

1) Proteomic Data Pre-processing: Quantification and Normalization of Luminex Assay System

High-throughput genomic and proteomic technologies allow rapid analysis of molecular targets for thousands of genes at a time, at the DNA, RNA, or protein level. In these types of experiments, variation in expression measurements can arise from a variety of sources. Our goal was to examine measurement and normalization techniques that reduce experimental variation in data derived from a bead-based multiplex Luminex assay system, which allows simultaneous measurement of multiple proteins. Normalization for the Luminex assay system requires a fundamentally different approach from that used for traditional microarrays: in the Luminex assay system, each experimental unit is a plate, and each plate holds results for multiple subjects and analytes. We quantified performance among different measurement systems (fluorescent intensity, background in fluorescent intensity, and observed concentration) in both high and standard scanning systems. Various normalization techniques (scale normalization, quantile normalization, lowess curve normalization) were adapted to the Luminex data scenario, and their performance was compared in two datasets.

We used the coefficient of variation across plates to compare the performance of the normalization methods. Median and lowess normalizations reduced plate-to-plate variation the most, whereas quantile normalization did not work well for these datasets. Our results suggest that simple normalizations such as scale and lowess curve normalization perform better than complex methods such as quantile normalization. Complex methods may add noise and bias to the normalized values when their assumptions are not met.
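
As a rough illustration of the comparison described above, the following sketch computes the coefficient of variation (CV) across plates before and after a median scale normalization. It is not the thesis's implementation: the function names, the choice of rescaling each plate to the median of the plate medians, and the toy data (three plates with purely multiplicative plate effects) are all assumptions made for this example.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def median_scale_normalize(plates):
    """Rescale each plate so its median matches the median of all plate medians.

    Assumed form of 'scale normalization'; the thesis may define it differently.
    """
    plate_medians = [median(plate) for plate in plates]
    target = median(plate_medians)
    return [[x * target / m for x in plate]
            for plate, m in zip(plates, plate_medians)]


def cv_across_plates(plates):
    """CV of each sample position (column) measured across plates."""
    return [coefficient_of_variation(column) for column in zip(*plates)]


# Hypothetical data: the same four control samples for one analyte on three
# plates; plates 2 and 3 carry multiplicative plate effects of 1.5 and 0.8.
base = [100.0, 200.0, 400.0, 800.0]
plates = [base, [1.5 * x for x in base], [0.8 * x for x in base]]

cv_before = cv_across_plates(plates)
cv_after = cv_across_plates(median_scale_normalize(plates))
```

Because the plate effects in this toy example are purely multiplicative, median scaling removes them entirely and `cv_after` is essentially zero for every sample. Real Luminex data also contain measurement noise, so a reduction in CV, not its elimination, is what one would look for.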

2) Integration of microRNA and mRNA by Weighted Gene Co-expression Network Analysis

We focus on step-by-step network construction and module detection for mRNAs by weighted gene co-expression network analysis (WGCNA), followed by identification of strong correlations between miRNAs and module eigengenes. We then use Fisher's exact test to evaluate whether the predicted mRNA targets are differentially represented between a given module and the other modules. We retained miRNAs that are significant in Fisher's exact test and strongly correlated with the eigengene of a module.
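
The target-enrichment step can be sketched with a 2x2 table of genes (inside or outside a module, predicted target or not). The example below is a minimal, standard-library version of a one-sided Fisher's exact test based on the hypergeometric distribution. The one-sided (over-representation) alternative, the function name, and the counts are assumptions for illustration; the thesis may use a two-sided test.

```python
from math import comb


def fisher_exact_enrichment(module_targets, module_nontargets,
                            other_targets, other_nontargets):
    """One-sided Fisher's exact p-value for over-representation of predicted
    targets inside a module, i.e. P(X >= module_targets) under the
    hypergeometric null with fixed table margins."""
    n_module = module_targets + module_nontargets
    n_targets = module_targets + other_targets
    n_total = n_module + other_targets + other_nontargets
    upper = min(n_module, n_targets)
    favourable = sum(
        comb(n_targets, k) * comb(n_total - n_targets, n_module - k)
        for k in range(module_targets, upper + 1)
    )
    return favourable / comb(n_total, n_module)


# Hypothetical counts: a 10-gene module containing 8 predicted targets of one
# miRNA, against 12 predicted targets among the 90 genes in all other modules.
p_value = fisher_exact_enrichment(8, 2, 12, 78)
```

A small `p_value` here would indicate that the miRNA's predicted targets are concentrated in the module more than expected by chance. In the workflow above, this would be combined with the miRNA-eigengene correlation before a miRNA is retained.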

Next, we related modules to disease status using eigengene network methodology and found that 11 of 13 modules are significantly related to disease status. Enrichment analyses with the DAVID software were performed for these 11 modules.

We also ran step-by-step network construction and module detection for miRNAs and found 6 modules. We used LASSO regression to explore the relationship between miRNAs and mRNAs: the predictors are the miRNA module eigengenes, and the outcome is the eigengene of each mRNA module.
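
To make the regression setup concrete, here is a minimal LASSO sketch fitted by cyclic coordinate descent, with miRNA module eigengenes as predictors and one mRNA module eigengene as the outcome. The abstract does not say which LASSO implementation or penalty value was used, so the solver, the penalty `lam`, and the toy eigengene values below are assumptions; the inputs are assumed to be centered.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (the LASSO proximal step)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.

    X is a list of rows (samples); columns of X and y are assumed centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual that
            # excludes column j's current contribution.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Hypothetical centered eigengenes for three miRNA modules across six samples.
me_a = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
me_b = [1.0, -1.0, 0.0, 0.0, -1.0, 1.0]
me_c = [1.0, 1.0, -2.0, -2.0, 1.0, 1.0]
X = [list(row) for row in zip(me_a, me_b, me_c)]

# Hypothetical mRNA module eigengene, anti-correlated with the first miRNA module.
y = [-0.8 * v for v in me_a]

coefficients = lasso_coordinate_descent(X, y, lam=0.1)
```

In this toy example only the first miRNA module eigengene receives a nonzero (negative) coefficient, and the L1 penalty shrinks it slightly toward zero. In an analysis like the one described, the fit would be repeated with each mRNA module eigengene as the outcome, and the penalty would normally be chosen by cross-validation.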

We found that one miRNA, "hsa_miR_25", is significantly anti-correlated with the mRNA Magenta module. "hsa_miR_25" belongs to the miRNA module "blue", which is also predictive of the Magenta mRNA module in the LASSO regression. Its putative mRNA targets were identified and integrated from the renal dataset.

In conclusion, weighted co-expression network analysis provides a novel integrative view of miRNAs and their putative target genes. It also greatly alleviates the multiple testing problems that plague standard gene-centric methods.
