eScholarship
Open Access Publications from the University of California

UC Irvine

UC Irvine Electronic Theses and Dissertations

Development of a statistical toolkit for probing protein structure: Experimental design guided by Bayesian statistics

Abstract

In the field of bioinformatics, a wide range of techniques is used to computationally predict properties of peptides and proteins. The body of work presented here centers on the development of a toolset, based on principled Bayesian statistical methods, for elucidating and predicting properties of protein structure. The main goal of this work is to provide novel, high-throughput computational methods for researchers who want insight into a protein’s structural properties before spending time and/or laboratory funds on biophysical experiments. The key characteristic of these methods is that any prior experimental data supplied to the algorithms informs and refines the prediction. The refined prediction can then be used to design future experiments and obtain meaningful experimental results more quickly. This thesis begins by presenting the Molecular Informant algorithm, which uses Bayesian model integration to systematically incorporate predictions of protein properties from multiple theoretical models. This computational tool requires only a protein’s primary sequence and predicts secondary structure or relative solvent accessibility on a per-amino-acid basis. Future applications include predicting which mutations in a protein’s sequence would increase its aggregation propensity. To support the development of this tool, biophysical experiments were conducted on the human γS-crystallin protein and two mutational variants (γS-G18V and γS-G106V). These proteins served as a model system for gathering experimental data on how small changes in a protein’s primary sequence can alter its aggregation propensity. Concentrated protein samples were subjected to a range of solution conditions, including varied pH and temperature. UV-induced aggregates were produced by irradiation with a 355 nm laser at 385 mW.
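The abstract does not specify how Molecular Informant weights its component models. As a rough illustration only, the sketch below assumes standard Bayesian model averaging. Each model gives per-residue class probabilities. Any residues with experimentally known labels are used to compute posterior model weights. The combined prediction is then the weight-averaged probability. This mirrors the idea that prior experimental data refines the prediction. The three-state alphabet (H/E/C), the dictionary layout, and the function names `posterior_weights` and `integrate` are hypothetical and are not the thesis's implementation.

```python
import math

CLASSES = ("H", "E", "C")  # assumed 3-state secondary structure: helix, strand, coil


def log_likelihood(model_probs, observed):
    """Log-probability a model assigns to experimentally observed labels.

    model_probs: list (one entry per residue) of dicts mapping class -> probability.
    observed: dict mapping residue index -> observed class; may be empty.
    """
    eps = 1e-12  # guard against log(0) for overconfident models
    return sum(math.log(max(model_probs[i][c], eps)) for i, c in observed.items())


def posterior_weights(models, observed, prior=None):
    """Posterior model probabilities: p(M_k | data) proportional to p(data | M_k) p(M_k)."""
    k = len(models)
    prior = prior or [1.0 / k] * k  # uniform prior over models by default
    logs = [math.log(p) + log_likelihood(m, observed) for m, p in zip(models, prior)]
    top = max(logs)  # subtract the max (log-sum-exp trick) for numerical stability
    unnorm = [math.exp(v - top) for v in logs]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def integrate(models, observed, prior=None):
    """Model-averaged per-residue class probabilities."""
    w = posterior_weights(models, observed, prior)
    n = len(models[0])
    return [
        {c: sum(wk * m[i][c] for wk, m in zip(w, models)) for c in CLASSES}
        for i in range(n)
    ]
```

With no observed residues, the weights reduce to the prior. Each experimentally labeled residue shifts weight toward the models that predicted it well.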
Aggregation of the γS-crystallin samples was monitored by dynamic light scattering and by measuring solution turbidity. The aggregate types were characterized using thioflavin T assays and X-ray diffraction. Finally, a Bayesian logistic regression model is presented that identifies the factors that most strongly affect the probability that a particular amino acid within a given structure solved by X-ray crystallography is replicable.
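The abstract does not list the covariates or the inference scheme used in the replicability model. The following is a generic Bayesian logistic regression sketch built on several assumptions:

- The outcome is a binary per-residue label (replicable or not).
- Each coefficient has an independent Normal(0, 2.5²) prior.
- Sampling uses a random-walk Metropolis sampler.

The data are simulated. The single covariate, a standardized per-residue value such as a B-factor, is a hypothetical stand-in and not a reported result.

```python
import math
import random


def softplus(z):
    """Numerically stable log(1 + e^z)."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def log_posterior(beta, X, y, prior_sd=2.5):
    """Unnormalized log posterior: Bernoulli-logit likelihood + Normal(0, prior_sd^2) prior."""
    lp = -sum(b * b for b in beta) / (2 * prior_sd ** 2)
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        # log sigmoid(eta) = -softplus(-eta); log(1 - sigmoid(eta)) = -softplus(eta)
        lp -= softplus(-eta) if yi else softplus(eta)
    return lp


def metropolis(X, y, n_iter=5000, step=0.2, seed=0):
    """Random-walk Metropolis sampler over the coefficient vector."""
    rng = random.Random(seed)
    beta = [0.0] * len(X[0])
    cur = log_posterior(beta, X, y)
    samples = []
    for _ in range(n_iter):
        prop = [b + rng.gauss(0.0, step) for b in beta]
        new = log_posterior(prop, X, y)
        # Accept with probability min(1, exp(new - cur)); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < new - cur:
            beta, cur = prop, new
        samples.append(beta)
    return samples


def posterior_mean(samples, burn_in=1000):
    """Average each coefficient over the post-burn-in draws."""
    kept = samples[burn_in:]
    return [sum(s[j] for s in kept) / len(kept) for j in range(len(kept[0]))]


def simulate(n=150, true_beta=(1.0, -2.0), seed=1):
    """Toy data only: column 0 is an intercept, column 1 a hypothetical covariate."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [1.0, rng.gauss(0.0, 1.0)]
        p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(true_beta, x))))
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y


X, y = simulate()
samples = metropolis(X, y)
beta_hat = posterior_mean(samples)
```

In a model like this, the posterior for each coefficient summarizes both the direction and the certainty of a factor's effect. For example, the fraction of draws below zero is one way to rank which factors matter most.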
