Global ETD Search

391	Characterization of solute carrier SLC38A6 Al-walai, Somar January 2012 (has links) Transport across the cell membrane is of crucial importance for cellular function. The solute carrier family SLC38 comprises membrane proteins that transport various substances across the membrane and thus perform many physiologically important functions, for example the transport of glutamine from astrocytes to neurons in the central nervous system. In this paper, we demonstrate that one of the transporters in this family, SLC38A6, forms several protein complexes with a variety of proteins in the cell membrane and in synaptic vesicles, suggesting that SLC38A6 is involved in the synaptic release of neurotransmitters. We performed a sensitive protein interaction analysis (proximity ligation assay, PLA) between the protein of interest and a variety of proteins expressed at different sites in the neuronal cell. We showed that SLC38A6 interacts with proteins in the cell membrane as well as in the membrane of synaptic vesicles. The current hypothesis is that SLC38A6 interacts with these proteins when the synaptic vesicles are in close proximity to the cell membrane during neurotransmitter release. Solute carriers SLC38A6 PLA proximity ligation assay Department of Neuroscience Functional Pharmacology Uppsala University Research training Master of Science in Engineering, Molecular Biotechnology membrane transport membrane proteins synaptic vesicles protein-protein interactions ImageTool Somar Al-walai
392	Systems-Level Modelling And Simulation Of Mycobacterium Tuberculosis: Insights For Drug Discovery Raman, Karthik 10 1900 (has links) Systems biology adopts an integrated approach to study and understand the function of biological systems, particularly the response of such systems to perturbations, such as the inhibition of a reaction in a pathway or the administration of a drug. The complexity and large scale of biological systems make modelling and simulation an essential part of systems-level studies. Systems-level modelling of pathogenic organisms has the potential to significantly enhance drug discovery programmes. In this thesis, we show how systems-level models can positively impact anti-tubercular drug target identification. Mycobacterium tuberculosis, the principal aetiological agent of tuberculosis in humans, is estimated to cause two million deaths every year. The existing drugs, although of immense value in controlling the disease to some extent, have several shortcomings, the most important being the emergence of drug resistance, which renders even the front-line drugs inactive. As drug discovery efforts become increasingly rational, focussing on the molecular level, the identification of appropriate targets becomes a fundamental prerequisite. We have constructed many systems-level models to identify drug targets for tuberculosis. We construct a constraint-based stoichiometric model of mycolic acid biosynthesis and simulate it using flux balance analysis to identify critical points in mycobacterial metabolism for targeting drugs. We then analyse protein-protein functional linkage networks to identify influential hubs, which can be targeted to disrupt bacterial metabolism. An important aspect of tuberculosis is the emergence of drug resistance. A network analysis of potential information pathways in the cell helps to identify important proteins as co-targets, whose targeting could counter the emergence of resistance.
We integrate analyses of metabolism, protein-protein interactions and protein structures to develop a generic pipeline for identifying the most suitable drug targets. Finally, we model the interplay between the pathogen and the human immune system using Boolean networks to elucidate critical factors influencing the outcome of infection. The strategies described can be applied to understand various pathogens and can impact many drug discovery programmes. Pathway Modelling Drug Discovery Target Identification Systems Biology Tuberculosis Biological Networks Mycolic Acid Pathway Protein-Protein Interactions Drug Resistance TargetTB Mtb Mycolic Acid Biosynthesis Pathway (MAP) Pathway–pathway Networks
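The Boolean-network idea in the abstract above can be illustrated with a minimal synchronous-update simulator that finds attractors and compares an in-silico knockout against the wild type. The four nodes (`Mtb`, `Th1`, `IFNg`, `Mact`) and their rules are a hypothetical toy chosen for illustration. They are not the host-pathogen model from the thesis, and the function names are likewise assumptions.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

State = Dict[str, bool]
Rule = Callable[[State], bool]

# Hypothetical toy rules (NOT the thesis model): each node's next value
# is a Boolean function of the current network state.
TOY_RULES: Dict[str, Rule] = {
    "Mtb": lambda s: s["Mtb"] and not s["Mact"],  # bacteria persist unless macrophages are activated
    "Th1": lambda s: s["Mtb"],                    # Th1 response driven by bacterial presence
    "IFNg": lambda s: s["Th1"],                   # Th1 cells secrete IFN-gamma
    "Mact": lambda s: s["IFNg"] and s["Mtb"],     # IFN-gamma activates infected macrophages
}


def simulate(rules: Dict[str, Rule], initial: State,
             knockouts: FrozenSet[str] = frozenset(),
             max_steps: int = 100) -> Tuple[List[State], List[State]]:
    """Synchronous updates until a state repeats; returns (trajectory, attractor)."""
    state = {n: (False if n in knockouts else initial[n]) for n in rules}
    first_seen: Dict[Tuple[Tuple[str, bool], ...], int] = {}
    trajectory: List[State] = []
    for step in range(max_steps):
        key = tuple(sorted(state.items()))
        if key in first_seen:
            # states from the first visit onward form the attractor (fixed point or cycle)
            return trajectory, trajectory[first_seen[key]:]
        first_seen[key] = step
        trajectory.append(state)
        # every node is updated from the same current state; knocked-out nodes stay False
        state = {n: (False if n in knockouts else rule(state)) for n, rule in rules.items()}
    raise RuntimeError("no attractor reached within max_steps")


if __name__ == "__main__":
    start = {"Mtb": True, "Th1": False, "IFNg": False, "Mact": False}
    _, wild_type = simulate(TOY_RULES, start)
    _, ifng_ko = simulate(TOY_RULES, start, knockouts=frozenset({"IFNg"}))
    print("wild type attractor:", wild_type)
    print("IFNg knockout attractor:", ifng_ko)
```

With these toy rules, the wild-type run settles into the all-off fixed point, meaning the infection is cleared. Clamping `IFNg` to False instead leaves a fixed point in which `Mtb` stays on. Synchronous updating is a modelling choice, and asynchronous schemes can reach different attractors.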
393	Novel computational methods to predict protein-protein interactions on the structural level Popov, Petr 28 January 2015 (has links) Molecular docking is a method that predicts the orientation of one molecule with respect to another when the two form a complex. The first computational molecular docking method was applied in 1990 to find new candidates against HIV-1 protease. Since then, the use of docking pipelines has become standard practice in drug discovery. Typically, a docking protocol comprises several phases. It requires exhaustive sampling of the binding site under the rigid-body approximation of the docking subunits. Clustering algorithms are used to group similar binding candidates. Refinement methods are applied to take into account the flexibility of the molecular complex and to eliminate possible docking artefacts. Finally, scoring algorithms are employed to select the best binding candidates. This thesis presents novel algorithms for docking protocols that facilitate structure prediction of protein complexes, which belong to one of the most important target classes in structure-based drug design. First, DockTrina, a new algorithm to predict conformations of triangular protein trimers (i.e. trimers with pair-wise contacts between all three pairs of proteins), is presented. The method takes as input pair-wise contact predictions from a rigid-body docking program. It then scans and scores all possible combinations of pairs of monomers using a very fast root mean square deviation (RMSD) test. Being fast and efficient, DockTrina outperforms state-of-the-art computational methods dedicated to predicting the structure of protein oligomers on the collected benchmark of protein trimers.
Second, RigidRMSD, a C++ library that computes in constant time the RMSDs between molecular poses corresponding to rigid-body transformations, is presented. The library is practically useful for clustering docking poses, yielding a tenfold speed-up compared to standard RMSD-based clustering algorithms. Third, KSENIA, a novel knowledge-based scoring function for protein-protein interactions, is developed. The problem of scoring function reconstruction is formulated and solved as a convex optimization problem. As a result, KSENIA is a smooth function and is thus suitable for the gradient-based refinement of molecular structures. Remarkably, it is shown that native interfaces of protein complexes provide sufficient information to reconstruct a well-discriminating scoring function. Fourth, CARBON, a new algorithm for the rigid-body refinement of docking candidates, is proposed. The rigid-body optimization problem is viewed as the calculation of quasi-static trajectories of rigid bodies influenced by the energy function. To circumvent the typical problem of incorrect step sizes for rotational and translational movements of molecular complexes, the concept of controlled advancement is introduced. CARBON works well in combination with both a classical force field and a knowledge-based scoring function. CARBON is also suitable for the refinement of molecular complexes with moderate to large steric clashes between their subunits. Finally, a novel method to evaluate the prediction capability of scoring functions is introduced. It allows a rigorous assessment of the performance of the scoring function of interest on benchmarks of molecular complexes. The method works with score distributions rather than with the scores of particular conformations, which makes it advantageous compared to the standard hit-rate criteria. The methods described in the thesis are tested and validated on various protein-protein benchmarks.
The implemented algorithms were successfully used in the CAPRI contest for structure prediction of protein-protein complexes. The developed methodology can easily be adapted to the recognition of other types of molecular interactions involving ligands, polysaccharides, RNAs, etc. The C++ versions of the presented algorithms will be made available as SAMSON Elements for the SAMSON software platform at http://www.samson-connect.net or at http://nano-d.inrialpes.fr/software. Protein-protein interactions Molecular docking Scoring function Rigid-body minimization Convex optimization Root mean square deviation 510 004
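The constant-time RMSD between rigid-body poses mentioned above follows from a short piece of algebra. The sketch below is a minimal Python illustration of that algebra and is not the RigidRMSD library or its API. It assumes both poses are rigid transformations x → R x + t of the same atom set. With centred coordinates y_i = x_i - c, the cross term vanishes, so RMSD² = tr(ΔRᵀ ΔR M) + |d|². Here ΔR = R_A - R_B, M is the 3x3 second-moment matrix of the atoms, and d is the displacement of the centroid between the two poses. M is computed once in O(N), after which each pose pair costs O(1). All function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]
Pose = Tuple[Mat, Vec]  # (3x3 rotation matrix R, translation t): x -> R x + t


def mat_vec(m: Mat, v: Vec) -> Vec:
    return [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]


def apply(pose: Pose, p: Vec) -> Vec:
    rot, t = pose
    return [a + b for a, b in zip(mat_vec(rot, p), t)]


def precompute(coords: List[Vec]) -> Tuple[Vec, Mat]:
    """O(N), done once: centroid c and second-moment matrix M = (1/N) sum (x-c)(x-c)^T."""
    n = len(coords)
    c = [sum(p[k] for p in coords) / n for k in range(3)]
    m = [[0.0] * 3 for _ in range(3)]
    for p in coords:
        y = [p[k] - c[k] for k in range(3)]
        for i in range(3):
            for j in range(3):
                m[i][j] += y[i] * y[j] / n
    return c, m


def rigid_rmsd(c: Vec, m: Mat, a: Pose, b: Pose) -> float:
    """O(1) per pose pair: RMSD^2 = tr(dR^T dR M) + |d|^2."""
    dr = [[a[0][i][j] - b[0][i][j] for j in range(3)] for i in range(3)]
    d2 = sum((x - y) ** 2 for x, y in zip(apply(a, c), apply(b, c)))  # centroid shift squared
    # tr(G M) with G = dR^T dR, i.e. sum over i, j, k of dR[k][i] * dR[k][j] * M[j][i]
    tr = sum(dr[k][i] * dr[k][j] * m[j][i]
             for i in range(3) for j in range(3) for k in range(3))
    return math.sqrt(max(tr + d2, 0.0))  # clamp tiny negative rounding error


def brute_rmsd(coords: List[Vec], a: Pose, b: Pose) -> float:
    """O(N) reference: transform every atom under both poses."""
    total = sum(sum((x - y) ** 2 for x, y in zip(apply(a, p), apply(b, p))) for p in coords)
    return math.sqrt(total / len(coords))


def rot_z(theta: float) -> Mat:
    cs, sn = math.cos(theta), math.sin(theta)
    return [[cs, -sn, 0.0], [sn, cs, 0.0], [0.0, 0.0, 1.0]]


def rot_x(theta: float) -> Mat:
    cs, sn = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, cs, -sn], [0.0, sn, cs]]


if __name__ == "__main__":
    random.seed(0)
    atoms = [[random.uniform(-10, 10) for _ in range(3)] for _ in range(1000)]
    c, m = precompute(atoms)
    pose_a = (rot_z(0.3), [1.0, 0.0, 2.0])
    pose_b = (rot_x(-0.5), [0.0, 1.5, 0.0])
    print(rigid_rmsd(c, m, pose_a, pose_b), brute_rmsd(atoms, pose_a, pose_b))
```

The two printed values should agree to within floating-point rounding. When many docking poses of the same subunit are clustered, the cost of the per-pair distance then no longer grows with the number of atoms.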
394	Use of intelligent algorithmic techniques for processing protein data Θεοφιλάτος, Κωνσταντίνος (Theofilatos, Konstantinos) 10 June 2014 (has links) The present dissertation was conducted in the Pattern Recognition Laboratory of the Department of Computer Engineering and Informatics at the University of Patras. It is part of the Laboratory's broader research activity in designing, implementing and applying Computational Intelligence technologies for the analysis of biological data. The dissertation was co-financed by the research program Hrakleitos II. Proteomics is a relatively new and rapidly evolving research field. One of its great challenges is the reconstruction of the complete protein-protein interaction network within cells. Because protein-protein interactions play very important roles in basic cellular functions, the analysis of these networks can uncover the role of these interactions in diseases as well as how diseases develop. However, this is very hard to accomplish, as protein-protein interactions and the whole proteome differ among cells and change constantly through biochemical interactions with the genome and the environment. An organism has radically different protein expression in different tissues, in different phases of its life and under varying environmental conditions. Two very important research areas therefore arise: first, the identification of the real protein-protein interactions within an organism, which compose its protein interaction network; second, the analysis of this network to extract knowledge (search for protein complexes, uncovering of protein functions, etc.).
In the present dissertation, novel Computational Intelligence techniques are presented for the prediction of protein-protein interactions, the computation of a confidence score for each predicted interaction, the prediction of protein complexes and the prediction of protein function. In particular, for the task of predicting and scoring protein-protein interactions, a wide range of novel classification techniques was designed and developed. These techniques range from hybrid combinations of meta-heuristic methods and machine learning classifiers to genetic programming methods and hybrid fuzzy-system methodologies. For the task of predicting protein complexes, two novel unsupervised methods were designed and developed which, theoretically and experimentally, overcome the limitations of existing methodologies. For most of the designed techniques, user-friendly interfaces were developed to allow their use by other researchers. Moreover, many of the implemented techniques were successfully applied to other research domains, such as the prediction of microRNA genes and their mRNA targets and the forecasting of financial time series. The experimental work initially focused on the well-studied yeast Saccharomyces cerevisiae, in order to validate the performance of the proposed algorithms and compare them with existing computational methodologies. It then turned to the analysis of human protein-protein interaction data. Specifically, the best of the algorithmic techniques proposed in this dissertation were applied to a human protein-protein interaction dataset. This resulted in the construction of a weighted, high-coverage human protein-protein interaction network, the extraction of human protein complexes and the functional characterization of human proteins and complexes.
The results of the analysis of human protein-protein interaction data are available in the web knowledge base HINT-KB (http://hintkb.ceid.upatras.gr), which was implemented during this dissertation. In this knowledge base, sequence, structural and functional information from various sources is incorporated for every protein pair. Moreover, HINT-KB provides access to the predicted and scored protein-protein interactions, to the predicted protein complexes and their functional characterization, and to tools for visualizing the interaction network. The problems addressed in this dissertation are of significant research interest, as documented by the extensive bibliography provided. The basic goal is for the provided algorithms and tools to contribute to the ultimate goal of systems biology, namely understanding cellular mechanisms, and to the development of gene therapies for complex multifactorial diseases such as cancer. The most important achievements of the present dissertation are summarized in the following points: • Providing an integrated computational framework for the analysis of protein-protein interaction data. • Designing and implementing intelligent techniques for predicting and scoring protein-protein interactions in an accurate and interpretable manner. • Designing and implementing effective unsupervised algorithmic techniques for extracting protein complexes and predicting their function. • Creating a knowledge base that provides the scientific community with all the findings of the analysis of the human protein-protein interaction data. 572.64 Protein-protein interactions Computational intelligence Biological networks clustering Protein function prediction Evolutionary algorithms Systems biology
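The following sketch shows one generic way to extract candidate protein complexes from a weighted, confidence-scored interaction network: greedy seed growth under a weighted-density threshold. This is a swapped-in heuristic for illustration only. It is not either of the two unsupervised methods developed in the dissertation. The function names, the density definition and the `min_density` / `min_size` thresholds are assumptions, and the edge keys are assumed to be two-protein `frozenset`s with no self-loops.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Edge = FrozenSet[str]


def weighted_density(cluster: Set[str], weights: Dict[Edge, float]) -> float:
    """Sum of confidence weights inside the cluster divided by the number of possible pairs."""
    if len(cluster) < 2:
        return 0.0
    inside = sum(weights.get(frozenset(pair), 0.0) for pair in combinations(sorted(cluster), 2))
    return inside / (len(cluster) * (len(cluster) - 1) / 2)


def grow_complexes(weights: Dict[Edge, float], min_density: float = 0.6,
                   min_size: int = 3) -> List[List[str]]:
    adj: Dict[str, Set[str]] = defaultdict(set)
    strength: Dict[str, float] = defaultdict(float)  # weighted degree of each protein
    for edge, w in weights.items():
        a, b = tuple(edge)
        adj[a].add(b)
        adj[b].add(a)
        strength[a] += w
        strength[b] += w

    assigned: Set[str] = set()
    complexes: List[List[str]] = []
    # seeds are visited from the most strongly connected protein downwards
    for seed in sorted(strength, key=lambda n: (-strength[n], n)):
        if seed in assigned:
            continue
        cluster = {seed}
        while True:
            frontier = set().union(*(adj[n] for n in cluster)) - cluster - assigned
            best, best_density = None, -1.0
            for cand in sorted(frontier):  # sorted for deterministic tie-breaking
                d = weighted_density(cluster | {cand}, weights)
                if d > best_density:
                    best, best_density = cand, d
            if best is None or best_density < min_density:
                break  # adding any neighbour would make the cluster too sparse
            cluster.add(best)
        if len(cluster) >= min_size:
            complexes.append(sorted(cluster))
            assigned |= cluster
    return complexes


if __name__ == "__main__":
    scored = {
        frozenset({"A", "B"}): 0.9, frozenset({"A", "C"}): 0.9, frozenset({"B", "C"}): 0.9,
        frozenset({"D", "E"}): 0.8, frozenset({"D", "F"}): 0.8, frozenset({"E", "F"}): 0.8,
        frozenset({"C", "D"}): 0.1,  # weak, low-confidence bridge between the two groups
    }
    print(grow_complexes(scored))
```

On this toy network, the low-confidence bridge between C and D is not enough to merge the two dense triangles, so they come out as two separate candidate complexes. The confidence scores therefore directly shape which complexes are reported.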
395	Study of the interaction network between hepatitis C virus proteins Racine, Marie-Eve January 2007 (has links) Master's thesis digitized by the Division de la gestion de documents et des archives de l'Université de Montréal HCV Protein-protein interactions NS3/4A NS4B NS5A BRET Fluorescence microscopy Subcellular localization Co-immunoprecipitation Mutagenesis
396	Computational Analyses Of Proteins Encoded In Genomes Of Pathogenic Organisms: Inferences On Structures, Functions And Interactions Tyagi, Nidhi 11 1900 (has links) (PDF) The availability of completely sequenced genomes for a number of organisms provides an opportunity to understand the molecular basis of the physiology, metabolism, regulation and evolution of these organisms. Significant understanding of the complexity of organisms can be obtained from the functional characterization of the repertoire of proteins encoded in their genomes. Computational approaches for recognizing the function of uncharacterized proteins encoded in genomes often rely on the ability to detect well-characterized homologues. Homology searches based on pair-wise sequence comparisons can reliably detect homologues with sequence identity above 30%. However, detecting homologues with sequence identity below 30% is difficult using these methods. Distant homology relationships can be established using profiles or position-specific scoring matrices, which encapsulate information about structurally and functionally conserved residues. Such conserved residues reflect strong constraints at particular amino acid sites owing to their involvement in structural stability, enzymatic activity, ligand binding, protein folding or protein-protein interactions. In addition, information on the three-dimensional structures of proteins also aids the detection of remote homologues, as tertiary structures are better conserved than primary structures. The overall objective of the work reported in this thesis is to employ various sensitive remote homology detection methods to recognize relevant functional information for proteins encoded mainly in the genomes of pathogenic organisms. Since proteins do not work in isolation in a cell, it has become essential to understand the in vivo context of protein function.
For this purpose, it is essential to know all the molecules that interact with a particular protein. Thus, another major area of bioinformatics has been the integration of protein-protein interaction information to better understand the context of functional events. Analysis of host-pathogen protein-protein interactions can provide useful insight into the mode of pathogenesis and its consequences in the host cell. Chapters 2-6 of the thesis discuss the sequence and structural characteristics, remote evolutionary relationships and functional implications of uncharacterized proteins encoded in the genomes of the following pathogens: Helicobacter pylori, Plasmodium falciparum and Leishmania donovani. Chapters 6-8 mainly discuss various sequence, structural and functional aspects of protein kinases encoded in the genomes of various prokaryotes and viruses. Chapter 1 presents background information and a literature survey on homology detection and the prediction of protein-protein interactions. The growth of genomic data and the need to process it to infer the context of various functional events are highlighted. Different approaches (experimental as well as computational) for recognizing protein function are discussed, and various experimental and computational approaches to detect or predict protein-protein interactions are described. Chapter 2 discusses the recognition of non-trivial remote homology relationships involving proteins of Helicobacter pylori and their implications for function recognition. H. pylori is a microaerophilic, Gram-negative bacterial pathogen. It colonizes the human gastric mucosa and is a causative agent of gastroduodenal disease. The pathogen infects about 50% of the human population and can lead to the development of mucosa-associated lymphoid tissue (MALT) lymphoma. About 10% of the infected population develop gastric or duodenal ulcers and approximately 1% develop gastric cancer. H. pylori has been classified as a class I carcinogen by the WHO. The pathogen possesses a type IV secretion system. The complete genome sequences of three widely studied H. pylori strains, 26695, J99 and HPAG1, are available. According to the genome analysis, the numbers of predicted open reading frames in strains 26695, J99 and HPAG1 are 1590, 1495 and 1536, respectively. Of the predicted proteins from these strains, 453, 357 and 400, respectively, have no functional domain assignment in the Pfam (protein family) database. In other proteins, one part is associated with at least one protein domain of known function, giving a preliminary indication of function, whereas the rest of the sequence has no associated function. There are 772, 803 and 790 such unassigned segments of at least 45 residues in proteins from strains 26695, J99 and HPAG1, respectively. Sensitive remote homology detection methods have been employed to establish relationships for 294 amino acid sequences, and the results have been grouped into four categories. The results of homology detection have been further confirmed by studying the conservation of amino acid residues that are important for the function of the proteins concerned. (i) Remote relationships have been established involving protein domain families for which no bona fide member is currently known in H. pylori. For example, a DNA-binding protein domain (Kor_B) has been assigned to an H. pylori protein at a sequence identity of 20%. Secondary structure prediction and analysis of conserved residues confirm the results of the homology detection methods. (ii) Remote relationships have been established between H. pylori hypothetical proteins and protein domain families for which paralogous members are present in H. pylori.
For example, Cytochrome_C, an electron transfer protein domain could be associated with a Helicobacter pylori protein sequence which shows a sequence identity of 14% with sequences of bonafide cytochrome C. (iii) “Missing” metabolic proteins of H. pylori have also been recognized. For example, Aspartoacylase (EC 3.5.1.15) catalyzes deacetylation of N-acetylaspartic acid to produce acetate and L-aspartate. This enzyme in aspartate metabolism pathway has not been reported so far from H. pylori. A remote evolutionary relationship between a H. pylori protein and Aspartoacylase domain has been established at sequence identity of 17% thus filling the gap in this metabolic pathway in the pathogen. (iv) New functional assignments for domains in H. pylori sequences with prior assignment of domains for the rest of the sequences have been made. For example, DNA methylase domain has been assigned to C-terminal region of H. pylori protein which already had Helicase domain assigned to the N-terminal region of the protein. All these information should open avenues for further probing by carrying out experiments which will impact the design of inhibitor against this pathogen and will result in better understanding of pathogenesis of this organism in human. Chapter 3 describes prediction of protein–protein interactions between Helicobacter pylori and the human host. A lack of information on protein-protein interactions at the host-pathogen interface is impeding the understanding of the pathogenesis process. A recently developed, homology search-based method to predict protein-protein interactions is applied to the gastric pathogen, Helicobacter pylori to predict the interactions between proteins of H. pylori and human proteins in vitro. Many of the predicted interactions could potentially occur between the pathogen and its human host during pathogenesis as we focused mainly on the H. 
pylori proteins that have a transmembrane region, are encoded in the pathogenicity island, or are known to be secreted into the human host. By applying the homology search approach to the protein-protein interaction databases DIP and iPfam, in vitro interactions of a total of 623 H. pylori proteins with 6559 human proteins could be predicted. The predicted interactions involve 549 hypothetical proteins of as yet unknown function encoded in the H. pylori genome and 13 experimentally verified secreted proteins. A total of 833 interactions involving the extracellular domains of H. pylori transmembrane proteins could be predicted. Structural analysis of some of the examples reveals that the predicted interactions are consistent with the structural compatibility of the binding partners. Various probable interactions with discernible biological relevance are discussed in this chapter. For example, an interaction between the CFTR protein (NP_000483) and a multidrug resistance protein (HP1206) has been predicted. The structure of the CFTR intracellular domain is known in homomeric form and consists of five AAA transport domains in tandem (PDB code 1XMI). Of the five identical subunits, two (the B and E chains in the PDB structure) have been selected. The structure of the pathogen's multidrug resistance protein has been modeled on the B chain of the template (sequence identity 32%). This exercise suggests that the interface residues in the model are congenial for interaction, which makes the structural complex feasible under in vitro conditions and suggests that the pathogen protein may compete with the host protein for occupancy. Chapter 4 describes the recognition of Plasmodium-specific protein domain families and their roles in the Plasmodium falciparum life cycle. Malaria in humans is caused by intracellular, eukaryotic, apicomplexan protozoan parasites of the genus Plasmodium. Of the five species of Plasmodium, namely P.
falciparum, P. ovale, P. vivax, P. malariae and P. knowlesi, which infect humans, P. falciparum causes lethal infection. P. falciparum proteins have diverged extensively during the course of evolution. The pathogen genome is rich in A+T, and its proteins are larger than their homologues from other organisms because of the presence of low complexity regions. Organism-specific families are important because they play roles in the peculiar lifestyle of an organism. If the organism is a pathogen, members of these families may play roles in pathogenesis. Inhibiting such specific proteins is unlikely to interfere with the host system, as no homologue may be present in the host. In the present work, we identify Plasmodium-specific protein families and their roles in different stages of the pathogen's life cycle. A total of 5086 amino acid sequences (full-length sequences or protein fragments) show homology only with amino acid sequences from Plasmodium organisms and are hence Plasmodium-specific. These Plasmodium-specific sequences cluster into 106 Plasmodium-specific families (≥2 members per family). Fourteen Plasmodium-specific protein domain families with known physico-chemical properties are observed. These families are involved in various important functions, such as rosetting and sequestration of infected erythrocytes, binding to the host cell surface, and the invasion process in the pathogen's life cycle. In addition, 89 new Plasmodium-specific protein domain families have been recognized. Various aspects of the members of Plasmodium-specific protein domain families, such as their potential to target the apicoplast, protein-protein interactions, expression profiles and domain organization, have been analyzed to derive relevant information about their function. New Plasmodium-specific domain families with no associated function could provide some insight into the highly diverged Plasmodium species. These proteins may play roles in the parasite-specific lifestyle.
Experimental work on these Plasmodium-specific proteins might fill gaps in the poorly understood physiology of this parasite. Chapter 5 presents a genome-wide compilation of low complexity regions (LCRs) in proteins. Given the high prevalence of LCRs in the proteome of Plasmodium falciparum, an in-depth analysis of the nature, structure and functional role of LCR-containing proteins in this organism was undertaken. Low complexity regions and repeat patterns have been recognized in proteins encoded in 986 genomes (68 archaea, 896 prokaryotes and 22 eukaryotes). Low complexity regions have been classified into the following three categories: a) Composition of LCRs: (i) LCRs can be stretches of a single amino acid residue type (homo-residue stretches); (ii) LCRs can be stretches of more than one amino acid residue type. b) Periodicity of amino acids in LCRs: certain amino acid residues can be observed at specific periodicities in proteins. c) Repeat patterns: certain motifs of amino acid residues are repeated in proteins. A total of 850 Plasmodium falciparum proteins are observed to have at least one repeat pattern whose repeating unit is at least 5 amino acid residues long. Statistical analysis of single amino acid residue repeats indicates that the occurrence of homo-residue stretches is not a random event. Studies on function recognition, protein-protein interactions and the organization of tethered domains in LCR-containing proteins suggest that these proteins take part in a variety of functional events, such as signal transduction, enzymatic processes, cell differentiation, pyrimidine biosynthesis, fatty acid biosynthesis and chromosomal replication. The representation of Plasmodium falciparum LCRs in the Protein Data Bank suggests that LCRs can adopt regular secondary structure (apart from disordered regions) in 3-D protein structures.
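The homo-residue stretches and repeat patterns described for Chapter 5 can be illustrated with a deliberately simple scan. This is a sketch, not the detection method used in the thesis: the function names are hypothetical, the minimum run length of 5 for homo-residue stretches is an arbitrary illustrative choice, and repeats are detected only for units of exactly the given length (5 residues by default, matching the repeating-unit length mentioned above).

```python
import re

def homopolymer_runs(seq, min_run=5):
    """Return (start, residue, length) for stretches of a single residue type.

    Positions are 0-based; only runs of at least min_run residues are reported.
    """
    runs = []
    for m in re.finditer(r"(.)\1*", seq):  # each maximal run of one character
        if len(m.group(0)) >= min_run:
            runs.append((m.start(), m.group(1), len(m.group(0))))
    return runs

def tandem_repeats(seq, unit_len=5, min_copies=2):
    """Return (start, unit, copies) for a unit of exactly unit_len residues
    repeated back-to-back at least min_copies times (greedy left-to-right scan)."""
    found = []
    i = 0
    while i + unit_len <= len(seq):
        unit = seq[i:i + unit_len]
        copies = 1
        # extend while the next window is an identical copy of the unit
        while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
            copies += 1
        if copies >= min_copies:
            found.append((i, unit, copies))
            i += copies * unit_len  # continue after this repeat block
        else:
            i += 1
    return found

print(homopolymer_runs("MKNNNNNNLQQQ"))       # [(2, 'N', 6)]
print(tandem_repeats("MKNANPDNANPDNANPDL"))  # [(2, 'NANPD', 3)]
```

A real LCR survey would also need the statistical test mentioned above (comparing observed run lengths with those expected from residue composition), which this sketch omits.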
Chapter 6 describes sequence analysis, structural modeling and evolutionary studies of Leishmania donovani hypusine pathway enzymes. Leishmania is a eukaryotic kinetoplastid protozoan parasite that causes leishmaniasis in humans. Hypusine, Nε-(4-amino-2-hydroxybutyl)lysine, is a non-standard, polyamine-derived amino acid named after its two structural components, hydroxyputrescine and lysine. The eukaryotic translation initiation factor 5A (eIF5A) is the only cellular protein containing hypusine. Synthesis of hypusine is critical for the function of eIF5A and is essential for eukaryotic cell proliferation and survival. Hypusine is formed by a two-step post-translational modification process involving the enzymes (i) deoxyhypusine synthase (DHS) and (ii) deoxyhypusine hydroxylase (DOHH). DHS, the first enzyme of the hypusine pathway, catalyzes the NAD-dependent transfer of the butylamino moiety of spermidine (the substrate) to the ε-amino group of a specific lysine residue of the eIF5A precursor, generating a deoxyhypusine-containing intermediate. DOHH, the second enzyme of the pathway, catalyzes the hydroxylation of this deoxyhypusine-containing intermediate, generating hypusine-containing mature eIF5A. Two putative deoxyhypusine synthase (DHS) sequences, DHS34 and DHS20, have been identified in Leishmania donovani by Professor Madhubala and coworkers (Jawaharlal Nehru University, New Delhi), in collaboration with whom the work in this chapter was done. Detailed comparison of the Leishmania DHS34 sequence with the human DHS protein indicated conservation of functionally important residues. 3D structural modeling of the protein suggested that the residues around the active site are absolutely conserved. The NAD-binding regions are located spatially close together; however, one NAD-binding region was observed within a large (225-residue) insertion. Based on these observations, DHS34 was predicted to have enzymatic activity.
Experimental studies by our collaborators confirmed the preliminary results of the computational analysis. Based on sequence and structural analysis, DHS20 and DOHH were proposed to be catalytically inactive and active, respectively. Experimental studies on these proteins supported the results of the computational analysis. Deoxyhypusine synthase (DHS) and deoxyhypusine hydroxylase (DOHH) are key proteins conserved in the hypusine synthesis pathways of eukaryotes. Because they are highly conserved, they could be coevolving. Comparison of the genetic distance matrices of DHS and DOHH reveals that their evolutionary rates are more strongly correlated with each other than with the rate of an unrelated protein such as cytochrome C. This indicates that they are coevolving, and further indicates that even non-interacting proteins that are functionally coupled experience correlated evolution. However, this correlation does not extend to their tree topologies. Chapter 7 provides a classification scheme for protein kinases encoded in the genomes of prokaryotic organisms. The overwhelming majority of the Ser/Thr protein kinases identified by gleaning archaeal and eubacterial genomes could not be classified into any of the well-known Hanks and Hunter subfamilies of protein kinases. This is because the Hanks and Hunter classification scheme was developed from eukaryotic protein kinases, which are highly divergent from their prokaryotic homologues. Using a large dataset of prokaryotic Ser/Thr protein kinases, traditional sequence alignment and phylogenetic approaches have been applied to identify and classify prokaryotic kinases, which fall into 72 subfamilies with at least 4 members each. This clustering enables the classification of prokaryotic Ser/Thr kinases and can serve as a framework for classifying newly identified prokaryotic Ser/Thr kinases.
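The DHS/DOHH distance-matrix comparison from Chapter 6 resembles the mirrortree idea: pairwise genetic distances among the same set of species are computed for each protein, and the off-diagonal entries of the two matrices are correlated. A minimal sketch using Pearson's r, assuming both matrices are square, symmetric and list the same species in the same order; the function names and the toy matrices are illustrative inventions, not thesis data or the thesis's exact procedure.

```python
import math

def upper_triangle(d):
    """Entries above the diagonal of a square distance matrix, row by row."""
    n = len(d)
    return [d[i][j] for i in range(n) for j in range(i + 1, n)]

def matrix_correlation(d1, d2):
    """Pearson correlation between the upper triangles of two distance matrices.

    Both matrices must list the same species in the same order.
    """
    if len(d1) != len(d2):
        raise ValueError("matrices must cover the same species")
    x, y = upper_triangle(d1), upper_triangle(d2)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("distances are constant; correlation is undefined")
    return sxy / math.sqrt(sxx * syy)

# Toy 3-species matrices (invented numbers, not thesis data)
dhs = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
dohh = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
print(matrix_correlation(dhs, dohh))  # 1.0
```

Coevolution would be suggested when r for the DHS/DOHH pair clearly exceeds r for DHS against an unrelated protein such as cytochrome C; because matrix entries are not independent, significance is often assessed with a Mantel-type permutation test rather than a standard t-test.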
After a series of searches in comprehensive sequence databases, 38 subfamilies of prokaryotic protein kinases were recognized as being associated with a specific taxonomic level. For example, 4, 6 and 3 subfamilies have been identified that are currently specific to the phyla Proteobacteria, Cyanobacteria and Actinobacteria, respectively. Similarly, subfamilies specific to an order, suborder, class, family or genus have also been identified. In addition, it was possible to identify organism-diverse subfamilies, whose members come from organisms of different taxonomic levels, such as archaea, bacteria, eukaryotes and viruses. Interestingly, the occurrence of several taxonomic-level-specific subfamilies of prokaryotic kinases contrasts with the classification of eukaryotic protein kinases, in which most of the popular subfamilies occur across diverse eukaryotes. Many prokaryotic Ser/Thr kinases exhibit a wide variety of modular organizations, indicating a degree of complexity in the protein-protein interactions and signaling pathways of these microbes. Chapter 8 focuses on the recognition and classification of protein kinases encoded in viral genomes and their implications in various functions and diseases. Protein kinases encoded by viral genomes play a major role in the infection, replication and survival of viruses. Protein kinases were recognized using traditional sequence homology detection tools, sequence alignment methods and phylogenetic approaches. A total of 646123 protein sequences from 35799 viral genomes (including strains) have been used in this analysis. Protein kinases are identified using a combination of profile-based search methods such as PSI-BLAST, RPS-BLAST and HMMER.
Based on sequence similarity over the length of the catalytic kinase domain, 479 protein kinase domains recognized in 244 viral genomes have been clustered into 46 subfamilies, with a minimum sequence identity of 35% within a subfamily. Viral protein kinases are encoded in the genomes of retro-transcribing viruses or of viruses that have double-stranded DNA as genetic material. Based on the functional information available for one or more members of a subfamily, a putative function has been assigned to the other members of the subfamily. Information on the interactions of viral protein kinases with viral or host proteins has also been considered to improve understanding of kinase function in each subfamily. Of the 46 subfamilies, 14 are characterized by various functions. Kinases belonging to the UL97, US69, UL13 and BGLF subfamilies are virus-specific. For 7 subfamilies, the nearest neighbors are from well-characterized eukaryotic protein kinase groups such as AGC, CAMK and CDK. Of the 25 new uncharacterized subfamilies observed in this analysis, 13 are virus-specific. Different subfamilies have been characterized by various functions that are crucial for viral infection, such as synthesis of structural units, replication of genetic material, modification of cellular components, alteration of the host immune system, and competition with cellular proteins for efficient use of the host machinery. In addition, many viral kinases share very high sequence identity (~97%) with their eukaryotic counterparts and represent a disease state. For example, a protein kinase encoded in Avian erythroblastosis virus shares 97% sequence identity with the catalytic domain of the human epidermal growth factor receptor tyrosine kinase. Leucine at position 861 in the human protein is substituted by Gln in cancer; the viral protein kinase sequence possesses Gln at the corresponding position and thus represents the disease state.
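A subfamily criterion such as "minimum sequence identity of 35% within a subfamily" can be sketched as a greedy, complete-linkage-style assignment over precomputed pairwise identities: a domain joins the first group in which it meets the threshold with every existing member. This is an assumption for illustration only. The text does not state which clustering rule was used, this greedy result depends on input order, and the identifiers and identity values below are hypothetical.

```python
def greedy_subfamilies(names, identity, threshold=35.0):
    """Greedily group sequences so every pair within a group has identity >= threshold.

    names     -- sequence identifiers, processed in the given order
    identity  -- dict mapping frozenset({a, b}) -> percent identity
    threshold -- minimum pairwise identity within a group (35% in the text)
    Missing pairs are treated as 0% identity.
    """
    groups = []
    for name in names:
        for group in groups:
            if all(identity.get(frozenset((name, member)), 0.0) >= threshold
                   for member in group):
                group.append(name)
                break
        else:  # no existing group accepted this sequence
            groups.append([name])
    return groups

# Hypothetical pairwise identities (percent) between four kinase domains
ids = {
    frozenset(("k1", "k2")): 60.0, frozenset(("k1", "k3")): 40.0,
    frozenset(("k2", "k3")): 38.0, frozenset(("k3", "k4")): 50.0,
    frozenset(("k1", "k4")): 10.0, frozenset(("k2", "k4")): 12.0,
}
print(greedy_subfamilies(["k1", "k2", "k3", "k4"], ids))  # [['k1', 'k2', 'k3'], ['k4']]
```

Requiring the threshold against every member, rather than against any single member as single linkage would, is what guarantees the stated minimum identity holds for all pairs inside a subfamily.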
Chapter 9 examines how well 3-D structural features of comparative models, and of crystal structures of inactive enzyme forms, can be used to predict whether a protein is an enzyme, using protein kinases as a case study. With the advent of structural genomics initiatives, there has been a surge in the number of proteins with 3-D structural information, often before the functional features of these proteins are understood. One useful annotation of a protein is its demarcation as an enzyme or non-enzyme solely from knowledge of its 3-D structure. This is facilitated by identifying the active sites and ligand-binding sites in the protein. In this work, which was carried out in collaboration with Dr Jim Warwicker of Manchester University, UK, an approach developed by Warwicker and coworkers has been used. In protein 3D structures, the largest clefts are generally considered to be ligand-binding sites. This feature, along with other sequence-alignment-independent properties such as residue preferences, the fraction of surface residues and secondary structure elements, has been used to differentiate enzymes from non-enzymes. The electrostatic potential at the active site is one of the key properties used in this respect. Active sites in enzymes are generally associated with ionizable groups that can take part in catalysis. In addition to enzymes having large clefts, their active-site residues are in buried environments and show larger deviations in pKa values than surface residues. The method proposed by Warwicker and coworkers distinguishes enzymes from non-enzymes by considering the electrostatic features of clefts together with the sequence profile of the protein concerned. The conformation of an enzyme's inactive state is not congenial to catalytic function. Ideally, a method should be able to predict an enzyme irrespective of whether the determined structure corresponds to the active or the inactive state.
Peak potential values have been calculated using the Warwicker program for a set of 15 protein kinases whose 3-D structures are available in both active and inactive conformations. Comparison of the peak potential values calculated for active and inactive conformations suggests that the algorithm can differentiate between them, as values for active conformations are generally higher than the corresponding values for inactive conformations. However, the peak potential values are high enough for even the inactive conformations to be predicted as enzymes. Peak potential values calculated for homology models of protein kinases (for which crystal structures are already available), generated at different sequence identities with the template sequences, predict the protein kinases as enzymes, and these values are comparable to the corresponding values for the X-ray structures. This suggests that, for proteins with no crystal or NMR structure yet available and no good template of high sequence identity, peak potential values for models generated at low sequence identity can still give insight into whether the protein is likely to be an enzyme. The enzyme/non-enzyme prediction algorithm was also found to be useful in confirming enzyme functionality using 3-D models of putative viral kinases. Initially, a putative kinase function had been assigned to these viral proteins based solely on sequence characteristics, such as the presence of residues or motifs that are important for activity. The enzyme recognition method, which is not directly sensitive to these motifs, confirmed that all the analyzed putative viral kinases are enzymes. Chapter 10 presents the conclusions of the work embodied in the thesis. Very briefly, various computational approaches have been used to analyze and understand the structural and functional properties of the protein repertoires of pathogenic organisms.
Analysis of uncharacterized protein domain families has helped to reveal the functional implications of their constituent proteins. Experimental validation of these results can further facilitate unraveling the functional aspects of proteins encoded in various pathogenic organisms. Apart from the studies embodied in the thesis, the author has been involved in two other studies, which are provided as appendices. Appendix 1 compares the substitution patterns of amino acid residues in proteins encoded in the P. falciparum genome with those of the corresponding homologous proteins from non-Plasmodium organisms, highlighting salient differences. Appendix 2 discusses a study of bacterial tyrosine kinases aimed at recognizing all putative protein tyrosine kinases in E. coli. The computational study suggests that the protein SopA could be a tyrosine kinase, and this conclusion is being tested experimentally in a collaborator's laboratory. Proteins Viral Genomes Microorganisms Microbiology Viral Protein Kinases Protein-protein Interactions Helicobacter Pylori Proteins Plasmodium-Specific Proteins Leishmania donovani Deoxyhypusine Synthase Protein Kinases Plasmodium falciparum
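The homology-transfer principle behind the Chapter 3 host-pathogen predictions can be sketched as follows: if two template proteins are recorded as interacting (for instance in DIP or iPfam), and a pathogen protein is homologous to one template while a human protein is homologous to the other, the pathogen-human pair is predicted to interact. In this sketch the function name, data structures and identifiers are hypothetical stand-ins for database records and homology-search hits; the thesis's actual scoring, thresholds and structural checks are not reproduced.

```python
from itertools import product

def predict_interactions(template_pairs, pathogen_hits, host_hits):
    """Transfer known template interactions to pathogen-host pairs by homology.

    template_pairs -- set of frozensets {A, B} of interacting template proteins
                      (a self-interacting template A is stored as frozenset({A}))
    pathogen_hits  -- dict: pathogen protein -> set of homologous template proteins
    host_hits      -- dict: host protein -> set of homologous template proteins
    Returns a set of (pathogen protein, host protein) predicted pairs.
    """
    predicted = set()
    for (p, p_templates), (h, h_templates) in product(pathogen_hits.items(),
                                                      host_hits.items()):
        # predict if any template of p is known to interact with any template of h
        if any(frozenset((a, b)) in template_pairs
               for a in p_templates for b in h_templates):
            predicted.add((p, h))
    return predicted

templates = {frozenset(("T1", "T2")), frozenset(("T3", "T4"))}
pathogen = {"HP_x": {"T1"}, "HP_y": {"T5"}}
host = {"HUMAN_a": {"T2"}, "HUMAN_b": {"T4"}}
print(predict_interactions(templates, pathogen, host))  # {('HP_x', 'HUMAN_a')}
```

The exhaustive pairing is simple but quadratic in the number of proteins; indexing templates by interaction partner would be the natural optimization for genome-scale inputs.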
397	Novel computational methods to predict protein-protein interactions on the structural level Popov, Petr 28 January 2015 (has links) Molecular docking is a method for predicting the orientation of one molecule relative to another when they form a complex. The first molecular docking algorithm appeared in 1990, to find new candidates against HIV-1 protease. Since then, the use of docking protocols has become standard practice in the field of drug design. Typically, a docking protocol comprises several phases. It requires exhaustive sampling of the interaction site, with the partners involved treated as rigid. Clustering algorithms are used to group similar binding candidates. Refinement methods are applied to take into account flexibility within the molecular complex and to eliminate possible docking artefacts. Finally, scoring algorithms are used to select the best docking candidates. This thesis presents new algorithms for docking protocols that facilitate the structure prediction of protein complexes, one of the most important classes of targets in drug design. A first contribution is the Docktrina algorithm, which predicts the conformations of triangular protein trimers. It takes as input pair-wise contact predictions made under rigid-body assumptions. All possible combinations of pairs of monomers are then evaluated with an efficient RMSD test. This fast and efficient method improves on the state of the art for protein trimers.
Second, we present RigidRMSD, a C++ library that computes, in constant time, RMSDs between molecular conformations corresponding to rigid transformations. In practice, this library is useful for clustering docking poses, improving computation times tenfold compared with standard algorithms. A third contribution is KSENIA, a knowledge-based scoring function for protein-protein interactions. The problem of scoring-function reconstruction is formulated and solved as a convex optimization problem. Fourth, CARBON, a new algorithm for the rigid-body refinement of docking candidates, is proposed. The rigid-body optimization problem is viewed as the computation of quasi-static trajectories of rigid bodies influenced by the energy function. CARBON works with both a classical force field and a knowledge-based scoring function. CARBON is also useful for refining molecular complexes with moderate to severe steric clashes. Finally, a new method is proposed to estimate the predictive capability of scoring functions. It rigorously assesses the performance of the scoring function concerned on benchmarks of molecular complexes. The method operates on the distribution of assigned scores rather than directly on the scores of particular conformations, which gives it an advantage over standard criteria based on the top score. The methods described in the thesis are tested and validated on various protein-protein benchmarks. The implemented algorithms have been used successfully in the CAPRI competition for the prediction of protein-protein complexes.
The methodology developed can easily be adapted to the recognition of other types of molecular interactions, involving for example ligands, RNA… The C++ implementations of the algorithms presented will be made available as SAMSON Elements for the SAMSON software platform at http://www.samson-connect.net or at http://nano-d.inrialpes.fr/software. / Molecular docking is a method that predicts the orientation of one molecule with respect to another when they form a complex. The first computational molecular docking method was applied to find new candidates against HIV-1 protease in 1990. Since then, the use of docking pipelines has become standard practice in drug discovery. Typically, a docking protocol comprises several phases. Exhaustive sampling of the binding site under the rigid-body approximation of the docking subunits is required. Clustering algorithms are used to group similar binding candidates. Refinement methods are applied to take into account the flexibility of the molecular complex and to eliminate possible docking artefacts. Finally, scoring algorithms are employed to select the best binding candidates. The current thesis presents novel algorithms for docking protocols that facilitate structure prediction of protein complexes, which belong to one of the most important target classes in structure-based drug design. First, DockTrina, a new algorithm to predict conformations of triangular protein trimers (i.e. trimers with pair-wise contacts between all three pairs of proteins), is presented. The method takes as input pair-wise contact predictions from a rigid-body docking program. It then scans and scores all possible combinations of pairs of monomers using a very fast root mean square deviation (RMSD) test. Being fast and efficient, DockTrina outperforms state-of-the-art computational methods dedicated to predicting the structure of protein oligomers on the collected benchmark of protein trimers.
Second, RigidRMSD, a C++ library that computes in constant time the RMSDs between molecular poses corresponding to rigid-body transformations, is presented. The library is practically useful for clustering docking poses, resulting in a tenfold speed-up compared to standard RMSD-based clustering algorithms. Third, KSENIA, a novel knowledge-based scoring function for protein-protein interactions, is developed. The problem of scoring function reconstruction is formulated and solved as a convex optimization problem. As a result, KSENIA is a smooth function and is thus suitable for gradient-based refinement of molecular structures. Remarkably, it is shown that native interfaces of protein complexes provide sufficient information to reconstruct a well-discriminating scoring function. Fourth, CARBON, a new algorithm for the rigid-body refinement of docking candidates, is proposed. The rigid-body optimization problem is viewed as the calculation of quasi-static trajectories of rigid bodies influenced by the energy function. To circumvent the typical problem of incorrect step sizes for rotational and translational movements of molecular complexes, the concept of controlled advancement is introduced. CARBON works well in combination with both a classical force field and a knowledge-based scoring function. CARBON is also suitable for refining molecular complexes with moderate and large steric clashes between their subunits. Finally, a novel method to evaluate the prediction capability of scoring functions is introduced. It allows the performance of the scoring function of interest to be rigorously assessed on benchmarks of molecular complexes. The method operates on score distributions rather than on the scores of particular conformations, which makes it advantageous compared to the standard hit-rate criteria. The methods described in the thesis are tested and validated on various protein-protein benchmarks.
The implemented algorithms have been successfully used in the CAPRI contest for structure prediction of protein-protein complexes. The developed methodology can easily be adapted to the recognition of other types of molecular interactions, involving ligands, polysaccharides, RNAs, etc. The C++ versions of the presented algorithms will be made available as SAMSON Elements for the SAMSON software platform at http://www.samson-connect.net or at http://nano-d.inrialpes.fr/software. Protein-protein interactions Molecular docking Scoring function Rigid-body minimization Convex optimization Root mean square deviation 510 004
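Two speed-critical ideas from this record, a constant-time RMSD between rigid-body poses (RigidRMSD) and a scan of pairwise poses for triangular trimers (DockTrina), can be illustrated together. The sketch relies on a standard identity: if a molecule has centroid c and coordinate covariance M = (1/N) Σ (x_i - c)(x_i - c)^T, then for poses x -> R1 x + t1 and x -> R2 x + t2, with D = R1 - R2, the mean squared deviation equals |D c + t1 - t2|² + trace(D M D^T). After one O(N) precomputation, each RMSD therefore costs O(1). A trimer triple is then treated as consistent if placing monomer C via B agrees with the direct A-C candidate. The pose convention (t_ab maps B's coordinates into A's frame), the function names and the 2 Å cutoff are illustrative assumptions; this is not the RigidRMSD or DockTrina source code.

```python
import math

# A pose is (R, t) and maps a point x to R @ x + t; R is a 3x3 nested list.

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def apply(pose, p):
    R, t = pose
    return [sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3)]

def compose(outer, inner):
    """Pose equivalent to applying inner first, then outer."""
    return mat_mul(outer[0], inner[0]), apply(outer, inner[1])

def precompute(coords):
    """O(N) once per molecule: centroid c and covariance M = (1/N) sum (x-c)(x-c)^T."""
    n = len(coords)
    c = [sum(p[i] for p in coords) / n for i in range(3)]
    M = [[sum((p[i] - c[i]) * (p[j] - c[j]) for p in coords) / n for j in range(3)]
         for i in range(3)]
    return c, M

def pose_rmsd(pose1, pose2, c, M):
    """O(1) RMSD between two rigid poses of the same molecule."""
    (R1, t1), (R2, t2) = pose1, pose2
    D = [[R1[i][j] - R2[i][j] for j in range(3)] for i in range(3)]
    # displacement of the centroid: D c + (t1 - t2)
    shift = [sum(D[i][k] * c[k] for k in range(3)) + t1[i] - t2[i] for i in range(3)]
    # spread term trace(D M D^T) = sum over i, j, k of D[i][j] * M[j][k] * D[i][k]
    spread = sum(D[i][j] * M[j][k] * D[i][k]
                 for i in range(3) for j in range(3) for k in range(3))
    msd = sum(s * s for s in shift) + spread
    return math.sqrt(max(msd, 0.0))  # clamp tiny negative round-off

def consistent_triple(t_ab, t_bc, t_ac, c_C, M_C, cutoff=2.0):
    """True if placing C in A's frame via B (t_bc, then t_ab) matches the direct t_ac pose."""
    return pose_rmsd(compose(t_ab, t_bc), t_ac, c_C, M_C) <= cutoff

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
c, M = precompute([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)])
print(pose_rmsd((I3, [0.0, 0.0, 0.0]), (I3, [3.0, 4.0, 0.0]), c, M))  # 5.0
```

Because pose_rmsd never touches atomic coordinates after precompute, clustering or triple scanning costs depend only on the number of poses, which is the property that makes exhaustive scans of docking candidates affordable.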
398	New insights into small molecules inhibitors and protein-protein interactions of VirB8 : a critical conserved component of the type IV secretion system Um Nlend, Ingrid 06 1900 (has links) No description available. Bacterial infections secretion systems pathogens Gram-negative bacteria virulence factors virulence antivirulence drugs protein-protein interactions
399	Identification of protein-protein interactions involving the products of the loci hrp, vir and rpf of the phytopathogen Xanthomonas axonopodis pv. citri Marcos Castanheira Alegria 24 September 2004 (has links) Citrus canker, one of the most serious phytosanitary problems in citriculture today, is a disease caused by the phytopathogen Xanthomonas axonopodis pv. citri (Xac). A functional study of the Xac genome was initiated with the aim of identifying protein-protein interactions involved in Xac pathogenicity processes. Using the yeast two-hybrid system, based on the GAL4 DNA-binding and transcription-activation domains, we analyzed the main components of the Xac pathogenicity mechanisms, including the Type III Secretion System (TTSS), the Type IV Secretion System (TFSS) and the "quorum sensing" system composed of the Rpf proteins. Components of these systems were used as baits to screen a Xac genomic library. The TTSS is encoded by the genes called hrp ("hypersensitive response and pathogenicity"), hrc ("hrp conserved") and hpa ("hrp associated"), located in the hrp locus of the Xac chromosome. This secretion system is able to translocate effector proteins from the bacterial cytoplasm into the host cell. Our results revealed new protein-protein interactions among components of the TTSS itself, as well as specific associations with a hypothetical protein: 1) HrpG, a response regulator of a two-component system responsible for the expression of the hrp genes, and XAC0095, a hypothetical protein found only in Xanthomonas spp.; 2) HpaA, a protein secreted by the TTSS, HpaB and the C-terminal domain of HrcV; 3) HrpB1, HrpD6 and HrpW; 4) HrpB2 and HrcU; and 5) homotropic interactions involving the ATPase HrcN.
In Xac, two vir loci were found that encode proteins similar to components of TFSSs involved in bacterial conjugation/secretion processes: the plasmid TFSS, located on plasmid pXAC64, and the chromosomal TFSS, located on the Xac chromosome. The plasmid TFSS, which is most similar to conjugation systems, showed interactions involving proteins whose genes are located in the same region of plasmid pXAC64: 1) homotropic interaction of TrwA; 2) XACb0032 and XACb0033; 3) homotropic interactions of the protein XACb0035; 4) VirB1 and VirB9; 5) XACb0042 and VirB6; 6) XACb0043 and XACb0021b. The chromosomal TFSS showed interactions involving the proteins: 1) VirD4 and a group of 12 proteins that are similar to one another, including XAC2609, whose gene is located in the vir locus; 2) XAC2609 and XAC2610; 3) homotropic interactions of VirB11; 4) XAC2622 and VirB9. Analysis of the "quorum sensing" system composed of the Rpf proteins showed interactions involving components of the system itself: 1) RpfC and RpfF; 2) RpfC and RpfG; 3) homotropic interactions of RpfF; 4) RpfC and CmfA, a protein similar to Cmf of Dictyostelium discoideum, which in that organism is essential for "quorum sensing" processes. The protein-protein interactions found have allowed us to better understand the composition, organization and regulation of the factors involved in Xac pathogenicity. / Citrus canker, caused by the bacterial plant pathogen Xanthomonas axonopodis pv. citri (Xac), is one of the most serious problems facing Brazilian citriculture. We have initiated a project to identify protein-protein interactions involved in Xac pathogenicity. Using a yeast two-hybrid system based on the GAL4 DNA-binding and activation domains, we have focused on identifying interactions involving subunits, regulators and substrates of the Type Three Secretion System (TTSS), the Type Four Secretion System (TFSS) and the Quorum Sensing/Rpf System.
Components of these systems were used as baits to screen a random Xac genomic library. The TTSS is coded by the hrp (hypersensitive response and pathogenicity), hrc (hrp conserved) and hpa (hrp associated) genes in the chromosomal hrp locus. This secretion system can translocate effector proteins from the bacterial cytoplasm into host cells. We have identified several previously uncharacterized interactions involving: 1) HrpG, a two-component system response regulator responsible for the expression of Xac hrp operons, and XAC0095, a previously uncharacterized protein found only in Xanthomonas spp.; 2) HpaA, a protein secreted by the TTSS, HpaB and the C-terminal domain of HrcV; 3) HrpB1, HrpD6 and HrpW; 4) HrpB2 and HrcU; 5) homotropic interactions of the ATPase HrcN. Xac contains two virB gene clusters, one on the chromosome and one on the pXAC64 plasmid, each of which codes for a unique and previously uncharacterized TFSS. Components of the pXAC64 TFSS, which is most similar to conjugation systems, showed interactions involving proteins coded by the same locus: 1) homotropic interactions of TrwA; 2) XACb0032 and XACb0033; 3) XACb0035 homotropic interactions; 4) VirB1 and VirB9; 5) XACb0042 and VirB6; 6) XACb0043 and XACb0021b. Components of the chromosomal TFSS exhibited interactions involving: 1) VirD4 and a group of 12 uncharacterized proteins with a common C-terminal domain motif, including XAC2609, whose gene resides within the vir locus; 2) XAC2609 and XAC2610; 3) homotropic interactions of VirB11; 4) XAC2622 and VirB9. Analysis of Quorum Sensing/Rpf System components revealed interactions between the principal Rpf proteins that control Xanthomonas quorum sensing: 1) RpfC and RpfF; 2) RpfC and RpfG; 3) RpfF homotropic interactions; 4) RpfC and CmfA, a protein similar to Cmf (conditioned medium factor) of Dictyostelium discoideum, which controls quorum sensing in that organism.
The protein-protein interactions we detected provide insights into the composition, organization and regulation of these important mechanisms involved in Xanthomonas pathogenicity. Functional Genomics Genomes (Study) Pathogenicity Phytopathogen Plant molecular biology Protein-protein interactions Quorum sensing Recombinant proteins Two-hybrid Xanthomonas (Study)
400	Componentes genéticos que afetam a via de direcionamento de proteínas organelares em Arabidopsis thaliana / Genetic components affecting organellar protein targeting in Arabidopsis thaliana Larissa Spoladore 18 April 2016 (has links) In eukaryotes, the evolution of molecular transport systems was essential, because their high degree of compartmentalization requires more specific mechanisms for protein localization. With the establishment of mitochondria and plastids as organelles of the eukaryotic cell, many of the genes specific to their activity and maintenance were transferred to the nucleus. After this gene transfer, most of their proteins came to be encoded in the nucleus, synthesized in the cytosol and targeted to the organelles by a complex machinery involving receptors in the organellar membranes, targeting sequences in the proteins, and cytosolic proteins that assist transport. Import depends largely on an N-terminal sequence in the proteins that contains signals recognized by the organellar membranes. However, much about organellar protein transport is still not understood, and as-yet-unknown factors may influence subcellular targeting. The aim of this work was to characterize General Regulatory Factor 9 (GRF9), a 14-3-3 family protein of Arabidopsis thaliana potentially involved in organellar protein targeting, and to generate a genotype for obtaining a mutant population for genes that affect the targeting of the protein thiamine monophosphate synthase (TH-1). In vivo and in planta experiments showed that GRF9 interacts with the dual-targeted proteins Mercaptopyruvate Sulfurtransferase1 (MST1) and Thiazole Biosynthetic Enzyme (THI1), and with the chloroplast-targeted protein TH-1.
Deletion and in vivo interaction experiments showed that the Box1 region of GRF9 is essential for the interaction with THI1 and MST1. To continue the characterization of GRF9 and to test its role in organellar protein targeting, a homozygous line overexpressing GRF9 was generated. Plants expressing the TH-1 transgene fused to Green Fluorescent Protein (GFP) in a TH-1-deficient genotype (CS3469/TH-1-GFP) were obtained to generate a mutant population, which will enable the discovery of as-yet-unknown genetic components responsible for targeting proteins to chloroplasts. / In eukaryotes, the evolution of molecular transport in the cell was essential because of their increased compartmentalization, which requires more specific mechanisms for the correct localization of proteins. With the establishment of mitochondria and plastids as organelles, many of their genes, whether specific to their metabolic functions or to the maintenance of their own transcription/translation processes, were transferred to the nucleus of the cell. As a result, most organellar proteins came to be encoded in the nucleus, synthesized in the cytosol and targeted to the organelles by a complex machinery involving membrane receptors in the organelles, targeting sequences in the proteins, and cytosolic proteins that assist transport. Protein import depends largely on an N-terminal sequence containing signals recognized by the organellar membrane receptors. However, much about the transport of organellar proteins is still not understood, and unknown factors may still influence subcellular targeting.
The goal of this work was to characterize General Regulatory Factor 9 (GRF9), a protein of the 14-3-3 family in Arabidopsis thaliana potentially involved in the targeting of organellar proteins, and to generate a genotype for obtaining a mutant population for genes affecting the targeting of the protein Thiamine Requiring 1 (TH-1). In vivo and in planta experiments showed that GRF9 interacts with the dual-targeted proteins Mercaptopyruvate Sulfurtransferase1 (MST1) and Thiazole Biosynthetic Enzyme (THI1), and with the chloroplast-targeted protein TH-1. Deletion experiments followed by in vivo interaction assays showed that the Box 1 region of GRF9 is essential for the interaction with THI1 and MST1. To continue characterizing GRF9 and to test its function in the targeting of organellar proteins, a homozygous line overexpressing GRF9 was generated. Plants expressing the transgene TH-1 fused to Green Fluorescent Protein (GFP) in a TH-1-deficient genotype (CS3469/TH-1-GFP) were obtained to generate a mutant population, which will allow the discovery of still-unknown genetic components responsible for targeting proteins to chloroplasts. 14-3-3 Proteins Arabidopsis Chaperones Dual-targeting Protein-protein interactions Subcellular localization Targeting sequences
