Global ETD Search

1	Applications d'un alphabet structural pour l'analyse, la prédiction et la reconnaissance des repliements des protéines / Applications of a structural alphabet for protein structure analysis, prediction and fold recognition
Mahajan, Swapnil. 29 October 2013.
Analysis of protein structures using structural alphabets has provided new insights into protein function and evolution. This thesis uses a structural alphabet called protein blocks (PBs), which efficiently approximates the protein backbone and abstracts 3D protein structures into 1D PB sequences, and it describes new applications of PBs for protein structure analysis, prediction and fold recognition. First, PBs were used to provide a refined view of structurally variable regions (SVRs) in structural alignments of homologous proteins, which were compiled in a database of structural alignments (DoSA). Although variable, some of these regions remain highly similar in conformation, and the PB-based characterization distinguishes them from conformationally dissimilar SVRs. We also show that the inherent conformational variations of regions such as loops within a protein are not correlated with the conformational differences observed in the equivalent regions of its homologues. Second, to further analyze sequence-structure relationships in terms of PBs and other structural features, we set up a database of pentapeptides derived from protein structures. This database served as the basis for knowledge-based prediction of local protein structure in terms of PB sequences (PB-kPRED) and of local structure plasticity (PB-SVindex). We demonstrate the successful application of PB-kPRED to fold recognition and explore the possible identification of structural and functional hotspots in proteins using PB-SVindex. Finally, we developed an algorithm for fold recognition using a structural alphabet (FoRSA), based on the conditional probability of sequence-structure compatibility, that is, the probability that a sequence adopts a given fold. This new threading method was successfully benchmarked on a test dataset of CASP10 targets. We further demonstrate that FoRSA can be used for fast structural annotation of whole genomes.
Keywords: Protein structures; Structural alphabet; Protein blocks; Pentapeptides; Structure prediction; Fold recognition; Structural variability; Sequence-structure-function relationships
2	Développement d'un alphabet structural intégrant la flexibilité des structures protéiques / Development of a structural alphabet integrating the flexibility of protein structures
Sekhi, Ikram. 29 January 2018.
The purpose of this thesis is to propose a structural alphabet (SA) that characterizes protein three-dimensional (3D) structures finely and accurately, while integrating the growing number of 3D conformations now available in the Protein Data Bank (PDB). The alphabet relies on hidden Markov models (HMMs), which capture the logic of how structural fragments follow one another. We describe a new structural alphabet, SAFlex (Structural Alphabet Flexibility), which improves on the existing HMM-SA27 alphabet in order to take into account the uncertainty of the data (missing data in PDB files) and the redundancy of protein structures. SAFlex thus offers a new, rigorous and robust encoding model. The encoding accounts for data uncertainty through three encoding options: the maximum a posteriori (MAP), the marginal posterior distribution (POST), and the effective number of letters at each position (NEFF). SAFlex also builds a consensus encoding from different replicates (multiple chains, monomers and homomers) of the same protein, which allows structural variability between them to be detected. The methodological advances and the SAFlex alphabet itself are the main contributions of this thesis.
We also present a new PDB parser (SAFlex-PDB) and show, by comparing it with two well-known bioinformatics parsers (Biopython and BioJava), that it is valuable both qualitatively (detection of various errors) and quantitatively (speed and parallelization). SAFlex is made available to the scientific community through a website. The SAFlex web server is the concrete contribution of this thesis, and the SAFlex-PDB parser is an important component of its operation. We describe the functions and interfaces of the SAFlex web server. Given a protein tertiary structure in a PDB format file, SAFlex can encode the 3D structure and identify and predict missing data. To date, it is the only alphabet able to encode and predict missing data in a 3D protein structure. This precise characterization of 3D conformations, together with the use of the redundant 3D information now available, is important for modeling the conformation and variability of 3D structures, protein loops, and interface regions with the different partners involved in protein function. These improvements are promising for exploring the growing redundancy of protein structure data and for obtaining a useful quantification of protein flexibility.
Keywords: Protein three-dimensional structure; Structural Alphabet (SA); Hidden Markov Model (HMM); PDB (Protein Data Bank); PDB file format; PDB file parser; Structural protein encoding; Maximum a Posteriori (MAP); Marginal posterior distribution (POST); Encoding entropy
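The three encoding options named in the abstract above (MAP, POST, NEFF) are standard quantities of a hidden Markov model. The following is a minimal sketch of how they can be computed for a toy model, not the SAFlex implementation: the function names, the 3-letter alphabet, the parameter values, the use of the Viterbi path for MAP, and the definition of NEFF as the exponential of the Shannon entropy of the posterior are all assumptions made for illustration. A position with missing data is modelled here by a flat emission row, which is likewise an assumption.

```python
import numpy as np

def forward_backward(pi, A, B):
    """Posterior marginals P(letter at t = k | all observations) (POST).

    pi: (K,) initial probabilities; A: (K, K) transitions, A[i, j] = P(j | i);
    B: (T, K) emission likelihoods, B[t, k] = P(observation t | letter k).
    """
    T, K = B.shape
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    # Forward pass, rescaled at each step to avoid numerical underflow.
    alpha[0] = pi * B[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        alpha[t] /= alpha[t].sum()
    # Backward pass, also rescaled.
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)

def viterbi(pi, A, B):
    """Most probable letter path (taken here as the MAP encoding), in log space."""
    T, K = B.shape
    with np.errstate(divide="ignore"):
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    score = log_pi + log_B[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_A  # cand[i, j]: best path ending in i, then i -> j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_B[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

def neff(post):
    """Effective number of letters per position: exp(Shannon entropy) (assumed)."""
    p = np.clip(post, 1e-300, 1.0)
    return np.exp(-(post * np.log(p)).sum(axis=1))

# Toy model: 3 letters, 4 positions; position 2 has no data (flat emissions).
pi = np.full(3, 1 / 3)
A = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
B = np.array([[0.90, 0.05, 0.05],
              [0.90, 0.05, 0.05],
              [1.00, 1.00, 1.00],
              [0.05, 0.90, 0.05]])

post = forward_backward(pi, A, B)
print("MAP :", viterbi(pi, A, B))
print("POST:", np.round(post, 3))
print("NEFF:", np.round(neff(post), 2))
```

In this toy setting the position without data receives a broader posterior, and therefore a larger NEFF, than the well-observed positions around it, which is the sense in which the encoding can report uncertainty instead of a single letter.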
