eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Multi-scale analysis of sequence and regulatory information in Escherichia coli

Abstract

Biological information is encoded and transmitted by nucleic acids. Next-generation sequencing technologies have unleashed a flood of large-scale genomics and transcriptomics data capturing this information flow. Here, we develop three analytical frameworks for deriving biological knowledge from these data at multiple scales, using Escherichia coli as a model. First, we introduce the Bitome, a single-base-pair-resolution representation of the annotation information for a genome sequence. This binarized construct highlights the uneven patterning of genomic information. Moreover, we leverage this representation to classify genes based on adaptive mutability and to quantitatively predict mRNA transcript levels from promoter sequence. Next, we analyze sequence variation in non-coding regions across 2,350 E. coli strains. We demonstrate that annotated functional non-coding features are significantly conserved. We also show that non-coding alleles are sufficient to segment phylogroups, and we contrast adaptive mutations with wild-type variation. Finally, we construct a high-precision, single-protocol, 1,035-sample RNA-seq compendium called PRECISE-1K. Using unsupervised machine learning, we extract 201 independently modulated groups of genes (iModulons) that capture the majority of the known transcriptional regulatory network. iModulons also reveal novel regulons and uncover a binding-site basis for different functional behavior within the same regulon. In combination, this expression and regulatory information constitutes a knowledge base that may be applied to the analysis of new data. As a whole, this work introduces a multi-scale suite of analytical tools that enable the study of information flow by converting big data into biological knowledge.
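As a rough illustration of what a binarized, single-base-pair annotation representation might look like, the sketch below builds a small binary matrix with one column per base pair and one row per base identity or annotated feature. It is not the Bitome implementation: the function names, the row layout, the toy sequence, and the feature coordinates are all hypothetical and chosen only to make the idea concrete.

```python
# Minimal sketch of a binarized per-base-pair annotation matrix.
# Assumption: rows are the four base identities plus one row per feature;
# this layout and all names and coordinates here are illustrative only.

def build_binary_matrix(sequence, annotations):
    """Return (row_labels, rows), one row per base identity and per feature.

    annotations: list of (name, start, end), 0-based, end exclusive.
    """
    bases = "ACGT"
    labels = [f"base_{b}" for b in bases]
    # One-hot encoding of the sequence: a 1 where the base matches the row.
    rows = [[1 if s == b else 0 for s in sequence] for b in bases]
    for name, start, end in annotations:
        labels.append(name)
        # A 1 at every position the feature covers.
        rows.append([1 if start <= i < end else 0 for i in range(len(sequence))])
    return labels, rows


def column_density(rows):
    """Count the set bits at each base-pair position."""
    return [sum(col) for col in zip(*rows)]


# Toy 9-bp sequence with hypothetical feature coordinates.
sequence = "ATGCGTTAA"
annotations = [("gene", 0, 9), ("start_codon", 0, 3), ("stop_codon", 6, 9)]

labels, matrix = build_binary_matrix(sequence, annotations)
density = column_density(matrix)
print(labels)
print(density)
```

In this toy case the per-position bit counts are higher at the two ends of the gene than in the middle, which is one simple way to see information being distributed unevenly along a sequence.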
