An increasing number of proteins with unknown function have their three-dimensional structure solved at high resolution. This situation, largely due to structural genomics initiatives, has been stimulating the development of automated structure-based function prediction methods. Knowledge of the residues important for function, and more particularly for binding, can help automated prediction of function in different ways. The properties of a binding site, such as its shape or amino acid composition, can provide clues about the ligand that may bind to it. In addition, information on functionally important regions in similar proteins can refine the process of annotation transfer between homologues.
Experimental results indicate that functional residues often have an unfavourable contribution to the stability of the folded state of a protein. This observation is the underlying principle of several computational methods for predicting the location of functional sites in protein structures. These methods search protein structures for destabilising residues, with the assumption that these are likely to be important for function.
We have developed a method to detect clusters of destabilising residues which are in close spatial proximity within a protein structure. Individual residue contributions to protein stability are evaluated using detailed atomic models and an energy function based on fundamental physico-chemical principles.
Our overall aim in this work was to evaluate the overlap between these clusters of destabilising residues and known binding sites in proteins.
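As a purely illustrative sketch of the clustering idea described above, the Python fragment below groups residues flagged as destabilising into spatially connected clusters using single-linkage on a distance cutoff. Everything in it is an assumption made for illustration and is not taken from the thesis: the function name `cluster_destabilising`, the sign convention (positive contribution means destabilising), the energy cutoff, the 6 Å distance cutoff, and the use of one representative coordinate per residue. The energy function used to score residues is not reproduced here.

```python
from itertools import combinations
from math import dist


def cluster_destabilising(residues, energy_cutoff=0.0, distance_cutoff=6.0, min_size=2):
    """Group destabilising residues into spatially connected clusters.

    residues: dict mapping a residue id to ((x, y, z), stability_contribution).
    Assumed convention (hypothetical): a contribution above energy_cutoff
    marks the residue as destabilising.
    """
    selected = [rid for rid, (_, energy) in residues.items() if energy > energy_cutoff]

    # Union-find structure: each residue starts as its own cluster.
    parent = {rid: rid for rid in selected}

    def find(rid):
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]  # path compression
            rid = parent[rid]
        return rid

    # Merge any two destabilising residues closer than the distance cutoff.
    for a, b in combinations(selected, 2):
        if dist(residues[a][0], residues[b][0]) <= distance_cutoff:
            parent[find(a)] = find(b)

    clusters = {}
    for rid in selected:
        clusters.setdefault(find(rid), set()).add(rid)

    # Keep only clusters with at least min_size members, largest first.
    return sorted((c for c in clusters.values() if len(c) >= min_size),
                  key=len, reverse=True)


# Toy input with made-up coordinates (in angstroms) and contributions.
toy = {
    "A1": ((0.0, 0.0, 0.0), 1.5),
    "A2": ((3.0, 0.0, 0.0), 0.8),
    "A3": ((20.0, 0.0, 0.0), 2.0),   # destabilising but isolated
    "A4": ((1.0, 1.0, 0.0), -1.0),   # stabilising, ignored
    "A5": ((6.0, 0.0, 0.0), 0.5),
}
clusters = cluster_destabilising(toy)
```

Single-linkage is chosen here only because it is the simplest way to express "in close spatial proximity"; other clustering criteria would fit the same description.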
Unfortunately, reliable benchmark datasets of known binding sites in proteins are sorely lacking. Therefore, we have undertaken a comprehensive approach to define binding sites unambiguously from structural data. We have rigorously identified seven issues which should be considered when constructing datasets of binding sites to validate prediction methods, and we present the construction of two new datasets in which these problems are handled. In this regard, our work constitutes a major improvement over previous studies in the field.
Our first dataset consists of 70 proteins with binding sites for diverse types of ligands (e.g. nucleic acids, metal ions) and was constructed using all available data, including literature curation. The second dataset contains 192 proteins with binding sites for small ligands and polysaccharides, does not require literature curation, and can therefore be automatically updated.
We have used our dataset of 70 proteins to evaluate the overlap between destabilising regions and binding sites (the second dataset of 192 proteins was not used for that evaluation as it constitutes a later improvement). The overlap is on average limited but significantly larger than random. The extent of the overlap varies with the type of bound ligand. Significant overlap is obtained for most polysaccharide- and small ligand-binding sites, whereas no overlap is observed for nucleic acid-binding sites. These differences are rationalised in terms of the geometry and energetics of the binding sites.
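One generic way to compare an observed overlap with a random expectation is a resampling test, sketched below in Python. This is not the statistical procedure of the thesis, which the abstract does not specify. The function names, the definition of overlap as the fraction of predicted residues that fall in the binding site, and the null model (drawing the same number of residues uniformly at random, ignoring spatial contiguity) are all assumptions made for illustration.

```python
import random


def overlap_fraction(predicted, binding_site):
    """Fraction of predicted residues that belong to the known binding site."""
    if not predicted:
        return 0.0
    return len(predicted & binding_site) / len(predicted)


def random_overlap_pvalue(all_residues, predicted, binding_site, n_trials=10000, seed=0):
    """Empirical p-value for the observed overlap under a simple null model.

    Null model (an assumption): the predicted region is replaced by a uniform
    random sample of the same number of residues from the whole protein.
    """
    rng = random.Random(seed)
    pool = sorted(all_residues)          # fixed order for reproducible sampling
    observed = len(predicted & binding_site)
    size = len(predicted)

    hits = 0
    for _ in range(n_trials):
        sample = set(rng.sample(pool, size))
        if len(sample & binding_site) >= observed:
            hits += 1

    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_trials + 1)


# Toy protein of 100 residues with a 10-residue binding site (made-up data).
protein = set(range(100))
site = set(range(10))
region = set(range(8)) | {50, 60}        # 8 of 10 predicted residues in the site

fraction = overlap_fraction(region, site)
p_value = random_overlap_pvalue(protein, region, site)
```

A null model that preserves the spatial compactness of the predicted region would be stricter than uniform sampling; the uniform version is used here only to keep the sketch short.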
Although destabilising regions, as detected in this work, cannot in general be used to predict all types of binding sites in protein structures, they can provide useful information, particularly on the location of binding sites for polysaccharides and small ligands.
In addition, our datasets of binding sites in proteins should help other researchers to derive and validate new function prediction methods. We also hope that the criteria which we use to define binding sites may be useful in setting future standards in other analyses.
Identifier | oai:union.ndltd.org:BICfB/oai:ulb.ac.be:ETDULB:ULBetd-09252007-151554 |
Date | 20 September 2007 |
Creators | Dessailly, Benoit H |
Contributors | Andrew Martin, Erik Goormaghtigh, Jacques van Helden, Shoshana Wodak, Marc Colet, Josiane Roscam-Szpirer, Jacques Urbain |
Publisher | Universite Libre de Bruxelles |
Source Sets | Bibliothèque interuniversitaire de la Communauté française de Belgique |
Language | English |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | http://theses.ulb.ac.be/ETD-db/collection/available/ULBetd-09252007-151554/ |
Rights | unrestricted, I agree that the text of the thesis (hereafter "the work"), subject to the parts covered by confidentiality, be published in the ULB electronic theses collection. To this end, I grant ULB a licence covering: - the right to fix and reproduce the work on electronic media: ETD/db software - the right to communicate the work to the public. This licence, free of charge and non-exclusive, is valid for the entire duration of literary and artistic property, including any extensions, and worldwide. I retain all other rights to reproduce and communicate the thesis, as well as the right to use it in future work. I certify that I have obtained, in accordance with copyright legislation and the requirements of image rights, all authorisations necessary for the reproduction in my thesis of images, texts, and/or any work protected by copyright, and that I have obtained the authorisations necessary for their communication to third parties. Where a third party holds an intellectual property right over all or part of my thesis, I certify that I have obtained their written authorisation for the exercise of the rights mentioned above. |