eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Effective design and analysis of genetic association studies

Abstract

Genetic association studies are an effective means of discovering associations between genetic variants and diseases. An association study proceeds in four stages: design, sample collection, analysis, and follow-up. The design and analysis stages pose many statistical and computational challenges, which are closely tied to the correlation structure among genetic variants in the genome, known as linkage disequilibrium (LD). In this dissertation, I address some of these challenges and propose solutions that leverage the information in LD patterns. Multiple hypothesis testing correction is the major challenge in the analysis stage. Assessing the statistical significance of associations is difficult because a large number of correlated tests are performed simultaneously. Previous approaches are either inaccurate or prohibitively inefficient. I propose a novel multiple testing correction method that exploits local LD patterns through a sliding-window approach. My method is highly accurate and efficient, making it an effective replacement for current approaches. Estimating the statistical power of a study design is a necessary step in the design stage to avoid an under- or over-powered study. Current approaches are either inefficient or too conservative because they ignore the correlation between tests. I propose a method that takes LD patterns into account to estimate the statistical power of a study design efficiently and accurately. Tag SNP selection is a widely known challenge in the design stage. I propose a power-based tag SNP selection algorithm that greedily chooses SNPs to maximize study power. My method outperforms methods based on correlation alone because it exploits the relation between LD and power by accounting for allele frequencies.
In the analysis stage, detecting spurious associations is a challenging problem. I propose a novel method that detects spurious associations at the post-association stage using LD information. Moreover, I extend this framework into a new study scheme that "rescues" associations at markers excluded by quality control. Applied to the WTCCC dataset, my method identifies a novel association that was recently replicated.
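
The claim that ignoring the correlation between tests makes a correction too conservative can be illustrated with a small simulation. The sketch below is not the sliding-window method of the dissertation. It is a minimal Monte Carlo example under stated assumptions: the null test statistics at neighbouring markers are modelled as a first-order autoregressive Gaussian sequence with adjacent correlation `rho` (a crude stand-in for LD), and every test uses the Bonferroni threshold. The function name `familywise_error_rate` and all of its parameters are hypothetical and chosen only for illustration.

```python
import random
from statistics import NormalDist


def familywise_error_rate(n_tests, rho, alpha=0.05, n_sims=2000, seed=0):
    """Estimate P(at least one false positive) under the null hypothesis
    when each of n_tests two-sided z-tests uses the Bonferroni threshold.

    Assumption: neighbouring statistics follow an AR(1) Gaussian sequence
    with adjacent correlation rho, a toy model of LD between markers.
    """
    rng = random.Random(seed)
    # Two-sided critical value for a per-test level of alpha / n_tests.
    z_crit = NormalDist().inv_cdf(1 - alpha / (2 * n_tests))
    # Innovation scale that keeps every statistic marginally N(0, 1).
    innovation_sd = (1 - rho * rho) ** 0.5
    hits = 0
    for _ in range(n_sims):
        z = rng.gauss(0, 1)
        any_significant = abs(z) > z_crit
        for _ in range(n_tests - 1):
            # AR(1) step: the next statistic has correlation rho with this one.
            z = rho * z + innovation_sd * rng.gauss(0, 1)
            if abs(z) > z_crit:
                any_significant = True
        hits += any_significant
    return hits / n_sims


if __name__ == "__main__":
    for rho in (0.0, 0.98):
        rate = familywise_error_rate(100, rho)
        print(f"rho={rho}: estimated family-wise error rate = {rate:.4f}")
```

With independent tests the estimated family-wise error rate should sit near the nominal `alpha`, while with strongly correlated tests it falls well below it. That gap is the conservativeness that a correction aware of LD aims to recover.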
