The protein structure prediction problem consists of determining a protein’s three-dimensional
structure from the underlying sequence of amino acids. A standard approach for predicting
such structures is to conduct a stochastic search of conformation space in an attempt to find
a conformation that optimizes a scoring function. In one subclass of prediction protocols,
called template-based modeling, the target protein is suspected to be structurally similar to
other proteins whose structures are already known. These solved, related proteins can then be
used to guide the search of protein structure space.
There are many potential applications for statistics in this area, ranging from the development
of structure scores to the improvement of search algorithms. This dissertation focuses on
strategies for improving structure predictions by incorporating information about closely
related “template” protein structures into searches of protein conformation space. This is
accomplished by generating density estimates on conformation space via various simplifications
of structure models. By concentrating a search for good structure conformations
in areas that are inhabited by similar proteins, we improve the efficiency of our search and
increase the chances of finding a low-energy structure.
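The idea of concentrating a search near template conformations can be illustrated with a small sketch. This is a toy, not the dissertation's method: the scoring function `toy_score`, the template angle pairs, the Gaussian jitter, and the `spread` parameter are all assumptions made for illustration, and a single (phi, psi) torsion angle pair stands in for a full conformation.

```python
import math
import random

def wrap(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def toy_score(phi, psi, target=(-1.0, -0.8)):
    """Hypothetical energy: 0 at `target`, rising to at most 4 elsewhere."""
    return (1 - math.cos(phi - target[0])) + (1 - math.cos(psi - target[1]))

def uniform_proposal(rng):
    """Propose a (phi, psi) pair uniformly over all of conformation space."""
    return (rng.uniform(-math.pi, math.pi), rng.uniform(-math.pi, math.pi))

def template_proposal(rng, templates, spread=0.3):
    """Pick one template angle pair and jitter it with Gaussian noise."""
    phi, psi = rng.choice(templates)
    return (wrap(phi + rng.gauss(0, spread)), wrap(psi + rng.gauss(0, spread)))

def search(propose, n_draws, rng):
    """Return the lowest score seen over n_draws independent proposals."""
    return min(toy_score(*propose(rng)) for _ in range(n_draws))

# Hypothetical torsion angles taken from related, already-solved structures.
templates = [(-1.1, -0.7), (-0.9, -0.9), (-1.2, -0.6)]

rng = random.Random(0)
best_uniform = search(uniform_proposal, 50, rng)
best_guided = search(lambda r: template_proposal(r, templates), 50, rng)
print("uniform search best score:", best_uniform)
print("template-guided best score:", best_guided)
```

Under these assumptions the guided proposals spend nearly all of their draws in the region where the templates sit, so for the same number of draws they tend to reach lower scores than uniform proposals whenever the target really does resemble its templates.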
In the course of addressing this structural biology problem, we present a number of advances in the field of Bayesian nonparametric density estimation. We first develop a
method for density estimation with bivariate angular data that has applications to characterizing
protein backbone conformation space. We then extend this model to account for
multiple angle pairs, thereby addressing the problem of modeling protein regions instead
of single sequence positions. In the course of this analysis we incorporate an informative
prior into our nonparametric density estimate and find that this significantly improves performance
for protein loop prediction. The final piece of our structure prediction strategy is
to connect side-chain locations to our torsion angle representation of the protein backbone.
We accomplish this by using a Bayesian nonparametric model for dependence that can link
together two or more multivariate marginal distributions. In addition to its application for
our angular-linear data distribution, this dependence model can serve as an alternative to
nonparametric copula methods.
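To make the notion of a density estimate for bivariate angular data concrete, the sketch below builds a simple kernel density estimate on the torus from products of von Mises kernels. This is a deliberately simpler stand-in, not the Bayesian nonparametric model developed in the dissertation: the function names, the sample angle pairs, and the concentration `kappa` are all assumptions chosen for illustration.

```python
import math

def bessel_i0(x, terms=40):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x / 2) ** 2 / ((k + 1) ** 2)
    return total

def von_mises_pdf(theta, mu, kappa):
    """Von Mises density on the circle with mean direction mu and concentration kappa."""
    return math.exp(kappa * math.cos(theta - mu)) / (2 * math.pi * bessel_i0(kappa))

def torus_kde(phi, psi, samples, kappa=8.0):
    """Average of product von Mises kernels centred on each observed (phi, psi) pair."""
    total = sum(
        von_mises_pdf(phi, p, kappa) * von_mises_pdf(psi, s, kappa)
        for p, s in samples
    )
    return total / len(samples)

# Hypothetical backbone torsion angle pairs (radians) observed at one sequence position.
samples = [(-1.1, -0.7), (-0.9, -0.9), (-1.2, -0.6)]

near = torus_kde(-1.0, -0.8, samples)  # close to the observed pairs
far = torus_kde(2.0, 2.0, samples)     # far from every observed pair
print("density near the samples:", near)
print("density far from the samples:", far)
```

Because each kernel is periodic in both angles, the estimate respects the wrap-around at plus or minus pi that an ordinary Gaussian kernel on the plane would ignore. The product kernel treats the two angles as independent within each component, which is one of the simplifications a richer bivariate angular model would relax.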
Identifier | oai:union.ndltd.org:tamu.edu/oai:repository.tamu.edu:1969.1/ETD-TAMU-2010-08-8226 |
Date | 2010 August 1900 |
Creators | Lennox, Kristin Patricia |
Contributors | Dahl, David B. |
Source Sets | Texas A and M University |
Language | en_US |
Detected Language | English |
Type | thesis, text |
Format | application/pdf |