Probabilistic protein homology modeling

Searching sequence databases and building 3D models for proteins are important tasks
for biologists. When the structure of a query protein is known, its function can often be inferred. However, experimental methods for structure determination are both expensive and
time-consuming. Fully automatic homology modeling refers to building a 3D model for
a query sequence by computer from an alignment to homologous proteins of known structure (templates). Current prediction servers can provide accurate models
within a few hours to days. Our group has developed HHpred, which is one of the
top-performing structure prediction servers in the field.
In general, homology-based structure modeling consists of four steps: (1) finding homologous templates in a database, (2) selecting templates, (3) aligning them to the query, and (4)
building a 3D model based on the alignment.
In part one of this thesis, we present improvements to steps (2) and (4). Specifically,
homology modeling has been shown to work best when multiple templates are selected
instead of only a single one. Yet, current servers use rather ad hoc approaches to
combine information from multiple templates. We provide a rigorous statistical framework for multi-template homology modeling. Given an alignment, we employ Modeller to calculate the most probable structure for a query. The 3D model is obtained
by optimally satisfying spatial restraints derived from the alignment and expressed as
probability density functions. We find that the query’s atomic distance restraints can
be accurately described by two-component Gaussian mixtures. Moreover, we derive statistical weights to quantify the redundancy among related templates. This allows us to
apply the standard rules of probability theory to combine restraints from several templates. Together with a heuristic template selection strategy, we implemented this
approach in HHpred and significantly improved model quality. Furthermore,
we took part in CASP, a community-wide competition for structure prediction, where
we were ranked first in template-based modeling and, at the same time, were more than
450 times faster than all other top servers.
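
To illustrate the idea of a distance restraint expressed as a two-component Gaussian mixture, and of combining restraints from several templates with statistical weights, here is a minimal Python sketch. The abstract does not give the functional form, so several things here are assumptions: both mixture components share one mean placed at the template's distance, the weights enter as exponents on each template's density (a weighted sum of log densities), and all numbers are made up for illustration. It is not the HHpred or Modeller implementation.

```python
import math

def gaussian_pdf(x, mean, sigma):
    """Density of a normal distribution at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(d, mean, sigma_narrow, sigma_wide, w_narrow):
    """Two-component Gaussian mixture for a query distance d.

    Assumption: both components share the same mean; the narrow component
    models well-conserved distances, the wide one models deviations.
    """
    return (w_narrow * gaussian_pdf(d, mean, sigma_narrow)
            + (1.0 - w_narrow) * gaussian_pdf(d, mean, sigma_wide))

def combined_log_density(d, restraints):
    """Weighted sum of log densities over templates.

    restraints: list of (template_weight, (mean, sigma_narrow, sigma_wide, w_narrow)).
    The weighting scheme is a hypothetical stand-in for the redundancy
    weights mentioned in the text.
    """
    return sum(w * math.log(mixture_pdf(d, *params)) for w, params in restraints)

if __name__ == "__main__":
    # Two hypothetical templates suggesting 10.0 and 12.0 Angstrom.
    restraints = [(0.8, (10.0, 0.5, 3.0, 0.7)),
                  (0.2, (12.0, 0.5, 3.0, 0.7))]
    for d in (10.0, 11.0, 12.0):
        print(d, combined_log_density(d, restraints))
```

With a weighting of this kind, a set of near-identical templates can be down-weighted so that they do not count as fully independent evidence for the same distance.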
Homology modeling heavily relies on detecting and correctly aligning templates to the
query sequence (steps (1) and (3) above). But remote homologies are difficult to
detect and hard to align at the pure sequence level. Hence, modern tools are based on
profiles instead of sequences. A profile summarizes the evolutionary history of a given
sequence and consists of position-specific amino acid probabilities for each residue. In
addition to the similarity score between profile columns, most methods use extra terms
that compare 1D structural properties such as secondary structure or solvent accessibility. These can be predicted from local profile windows.
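
As a rough illustration of what a profile and a profile column similarity score look like, the following Python sketch builds position-specific amino acid probabilities from a small alignment and compares two columns with a log-sum-of-odds score. This is a simplified sketch: the flat pseudocount, the lack of sequence weighting, the uniform background in the usage example, and the function names are all assumptions made here, and the exact scoring used in the thesis may differ.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Per-column amino acid probabilities from equal-length aligned sequences.

    Gap characters and unknown letters are ignored. A flat pseudocount keeps
    every probability above zero (an assumption, chosen for simplicity).
    """
    length = len(alignment[0])
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS})
    return profile

def column_score(q_col, t_col, background):
    """Log-sum-of-odds similarity: log sum_a q(a) * t(a) / f(a)."""
    return math.log(sum(q_col[a] * t_col[a] / background[a] for a in AMINO_ACIDS))

if __name__ == "__main__":
    toy_alignment = ["ACD", "ACE", "AC-"]  # hypothetical toy alignment
    profile = build_profile(toy_alignment)
    uniform = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
    print(column_score(profile[0], profile[0], uniform))
    print(column_score(profile[0], profile[1], uniform))
```

Columns that favor the same residues score above zero, while columns conserved for different residues score below zero.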
In the second part of this thesis, we develop a new score that is independent of any predefined structural property. For this purpose, we learn a library of 32 profile patterns that
are most conserved in alignments of remotely homologous, structurally aligned proteins.
Each so-called “context state” in the library consists of a 13-residue sequence profile.
We integrate the new context score into our HMM-HMM alignment tool HHsearch and
significantly improve sensitivity and precision, especially for difficult pairwise alignments.
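
The matching of a profile window against a library of context states can be sketched as follows. The abstract does not state how the context score is computed, so this Python example only shows one plausible ingredient: the posterior probability of each library state given a window, using a multinomial-style likelihood. The function name, the likelihood form, and the priors are assumptions; the code works for any library size and window length, of which 32 states and 13 columns would be one instance.

```python
import math

def context_posteriors(window, library, priors):
    """Posterior probability of each context state given a profile window.

    window:  list of W columns, each a list of 20 amino acid probabilities.
    library: list of K states, each a list of W columns of 20 probabilities
             (all strictly positive).
    priors:  list of K prior probabilities for the states.

    Assumption: the window's probabilities act as fractional counts in a
    multinomial likelihood, one effective observation per column.
    """
    log_scores = []
    for state, prior in zip(library, priors):
        log_lik = sum(c * math.log(p)
                      for w_col, s_col in zip(window, state)
                      for c, p in zip(w_col, s_col))
        log_scores.append(math.log(prior) + log_lik)
    # Normalize in log space to avoid underflow.
    top = max(log_scores)
    unnorm = [math.exp(s - top) for s in log_scores]
    z = sum(unnorm)
    return [u / z for u in unnorm]
```

Posteriors like these could then be compared between the query and template windows around two aligned positions; how the thesis turns them into an alignment score is not specified here.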
Taken together, we introduce probabilistic methods to improve all four main steps in
homology-based structure prediction.

Identifier: oai:union.ndltd.org:MUENCHEN/oai:edoc.ub.uni-muenchen.de:17129
Date: 27 June 2014
Creators: Meier, Armin
Publisher: Ludwig-Maximilians-Universität München
Source Sets: Digitale Hochschulschriften der LMU
Detected Language: English
Type: Dissertation, NonPeerReviewed
Format: application/pdf
Relation: http://edoc.ub.uni-muenchen.de/17129/
