Skip to main content
eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations bannerUC Berkeley

Leveraging Similar Regions to Improve Genome Data Processing

Abstract

Though DNA sequencing has improved dramatically over the past decade, variant calling, which is the process of reconstructing a patient’s genome from the reads that the sequencers produce, remains a difficult problem, largely due to the genome’s redundant structure. In this thesis, we describe SiRen, our algorithm for characterizing the genome’s structure in a way that makes sense from the perspective of the reads themselves. We use the term similar regions to refer to the areas of redundancy that we have identified. We then confirm that the similar regions are characterized by low variant calling accuracy. We show that the structure of the similar regions provides a platform for repairing alignment errors, thus leading to significantly improved variant calling accuracy.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View