
Fragment Based Protein Active Site Analysis Using Markov Random Field Combinations of Stereochemical Feature-Based Classifications

Recent improvements in structural genomics efforts have greatly increased the
number of hypothetical proteins in the Protein Data Bank. Several computational
methodologies have been developed to determine the function of these proteins but
none of these methods have been able to account successfully for the diversity in
the sequence and structural conformations observed in proteins that have the same
function. An additional complication is the flexibility in both the protein active site
and the ligand.
In this dissertation, novel approaches to deal with both the ligand flexibility
and the diversity in stereochemistry have been proposed. The active site analysis
problem is formalized as a classification problem in which, for a given test protein,
the goal is to predict the class of ligand most likely to bind the active site based
on the stereochemistry of that site, and thereby to infer the protein's function. Traditional
methods that have adopted a similar methodology have struggled to account for the
flexibility observed in large ligands. Therefore, I propose a novel fragment-based approach to
dealing with larger ligands. The advantage of the fragment-based methodology is
that considering the protein-ligand interactions in a piecewise manner does not affect
the active site patterns, while also providing a way to address the problems
associated with flexible ligands. I also propose two feature-based methodologies to account for the diversity observed
in sequences and structural conformations among proteins with the same function.
The feature-based methodologies provide detailed descriptions of the active site
stereochemistry and are capable of identifying stereochemical patterns within the
active site despite the diversity.
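As a rough illustration of the classification formulation, the sketch below describes each candidate fragment site by a stereochemical feature vector and returns class probabilities by k-nearest-neighbour voting against labelled training sites. The abstract does not say which classifier or descriptors the dissertation uses, so the k-NN rule, the three-element feature vectors (imagined here as counts of donor, acceptor, and hydrophobic atoms), the fragment class names, the training values, and the `classify_fragment` function are all hypothetical.

```python
import math
from collections import Counter

# HYPOTHETICAL training data: (feature vector, fragment class) pairs.
# Features are imagined as counts of donor, acceptor and hydrophobic atoms
# around a fragment site; the numbers are illustrative, not real data.
TRAINING = [
    ([4.0, 3.0, 1.0], "phosphate"),
    ([5.0, 2.0, 0.0], "phosphate"),
    ([1.0, 1.0, 6.0], "adenine"),
    ([0.0, 2.0, 5.0], "adenine"),
    ([2.0, 4.0, 2.0], "ribose"),
    ([3.0, 4.0, 1.0], "ribose"),
]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_fragment(features, training=TRAINING, k=3):
    """Return {class: fraction of the k nearest training sites with that class}.

    A stand-in k-NN classifier; the dissertation's actual method may differ.
    """
    neighbours = sorted(training, key=lambda item: euclidean(features, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return {label: count / k for label, count in votes.items()}


if __name__ == "__main__":
    print(classify_fragment([4.5, 2.5, 0.5]))
```

Returning per-class vote fractions rather than a single label keeps the classifier's uncertainty available for a later step that combines fragment predictions.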
Finally, I propose a Markov Random Field (MRF) approach to combine the individual
ligand fragment classifications (based on the stereochemical descriptors) into a single
multi-fragment ligand class. This probabilistic framework combines the information
provided by stereochemical features with the information regarding geometric constraints
between ligand fragments to make a final ligand class prediction.
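The combination step can be pictured as a small pairwise MRF: each detected fragment site is a node whose label is a fragment class, the unary term is the per-site classifier probability, and the pairwise term scores the observed distance between two sites under a distance model for that pair of fragment classes. The abstract does not specify the potentials or the inference procedure, so this is a minimal sketch under stated assumptions: Gaussian distance models, a fixed floor for unmodelled class pairs, and exhaustive MAP search (feasible only for a few sites). All site probabilities, distances, model parameters, and function names are hypothetical.

```python
import itertools
import math

# HYPOTHETICAL per-site fragment class probabilities from a fragment classifier.
SITE_PROBS = [
    {"adenine": 0.7, "ribose": 0.2, "phosphate": 0.1},
    {"adenine": 0.1, "ribose": 0.6, "phosphate": 0.3},
    {"adenine": 0.2, "ribose": 0.3, "phosphate": 0.5},
]

# HYPOTHETICAL observed distances (in angstroms) between sites i and j.
SITE_DISTANCES = {(0, 1): 4.6, (0, 2): 7.8, (1, 2): 4.1}

# HYPOTHETICAL distance model: (mean, sd) of the distance between two fragment
# classes when both belong to the same bound ligand.
DISTANCE_MODEL = {
    frozenset(["adenine", "ribose"]): (4.5, 0.5),
    frozenset(["ribose", "phosphate"]): (4.0, 0.5),
    frozenset(["adenine", "phosphate"]): (7.5, 1.0),
}


def log_gaussian(x, mean, sd):
    """Log density of a normal distribution N(mean, sd**2) at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2 * math.pi))


def pairwise_log_potential(label_i, label_j, distance, floor=1e-3):
    """Log pairwise potential; class pairs without a model get a small floor.

    Simplification: same-class pairs are never modelled here, although real
    ligands can contain repeated fragments.
    """
    params = DISTANCE_MODEL.get(frozenset([label_i, label_j]))
    if params is None:
        return math.log(floor)
    mean, sd = params
    return log_gaussian(distance, mean, sd)


def map_assignment(site_probs=SITE_PROBS, distances=SITE_DISTANCES):
    """Exhaustively search label assignments for the highest MRF log score."""
    labels = [list(p) for p in site_probs]
    best, best_score = None, -math.inf
    for assignment in itertools.product(*labels):
        # Unary terms: log classifier probability for each site's label.
        score = sum(math.log(site_probs[i][lab]) for i, lab in enumerate(assignment))
        # Pairwise terms: geometric consistency of each pair of sites.
        for (i, j), d in distances.items():
            score += pairwise_log_potential(assignment[i], assignment[j], d)
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score


if __name__ == "__main__":
    print(map_assignment()[0])
```

With these illustrative inputs the search selects `('adenine', 'ribose', 'phosphate')`: the distance terms favour this labelling because each observed inter-site distance lies close to the modelled mean for the corresponding class pair. A real system with many sites would need approximate inference (for example belief propagation) instead of enumeration.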
The feature-based fragment identification methodology achieved an accuracy of 84%
across a diverse set of ligand fragments, and the MRF analysis successfully
combined the ligand fragments identified by feature-based analysis into a single
final ligand prediction based on statistical models of ligand fragment distances. This novel
approach to protein active site analysis was also tested on three proteins with very
low sequence and structural similarity to other proteins in the PDB (a challenge for
traditional methods) and in each of these cases, this approach successfully identified
the cognate ligand. This approach addresses the two main issues that affect the
accuracy of current automated methodologies in protein function assignment.

Identifier: oai:union.ndltd.org:tamu.edu/oai:repository.tamu.edu:1969.1/ETD-TAMU-2009-05-550
Date: May 2009
Creators: Pai Karkala, Reetal
Contributors: Ioerger, Thomas R.
Source Sets: Texas A and M University
Language: English
Detected Language: English
Type: Book, Thesis, Electronic Dissertation, text
Format: application/pdf
