391

Étude du réseau d'interactions entre les protéines du Virus de l'Hépatite C / Study of the interaction network between the proteins of the Hepatitis C virus

Racine, Marie-Eve January 2007
Master's thesis digitized by the Division de la gestion de documents et des archives of the Université de Montréal
392

Computational Analyses Of Proteins Encoded In Genomes Of Pathogenic Organisms : Inferences On Structures, Functions And Interactions

Tyagi, Nidhi 11 1900
The availability of completely sequenced genomes for a number of organisms provides an opportunity to understand the molecular basis of physiology, metabolism, regulation and evolution of these organisms. Significant understanding of the complexity of organisms can be obtained from the functional characterization of repertoire of proteins encoded in their genomes. Computational approaches for recognition of function of proteins of unknown function encoded in genomes often rely on ability to detect well characterized homologues. Homology searches based on pair-wise sequence comparisons can reliably detect homologues with sequence identity more than 30%. However, detecting homologues characterized by sequence identity below 30% is difficult using these methods. Distant homology relationship can be established using profiles or position specific scoring matrices, which encapsulate information about structurally and functionally conserved residues. These conserved residues imply high constraints at a particular amino acid residue site due to their involvement in structural stability, enzymatic activity, ligand binding, protein folding or protein–protein interactions. In addition, information on three dimensional structures of proteins also aid in detection of remote homologues, as tertiary structures of proteins are conserved better than the primary structures of proteins. The gross objective of the work reported in this thesis is to employ various sensitive remote homology detection methods to recognize relevant functional information of proteins encoded mainly in pathogenic organisms. Since proteins do not work in isolation in a cell, it has become essential to understand the in vivo context of functions of proteins. For this purpose, it is essential to have an understanding of all molecules that interact with a particular protein. 
Thus, another major area of bioinformatics has been the integration of protein-protein interaction information to enable a better understanding of the context of functional events. Protein-protein interaction analysis for host and pathogen can give useful insight into the mode of pathogenesis and its consequences in the host cell. Chapters 2-6 of the thesis discuss the sequence and structural characteristics, along with remote evolutionary relationships and functional implications, of uncharacterized proteins encoded in the genomes of the following pathogens: Helicobacter pylori, Plasmodium falciparum and Leishmania donovani. Chapters 7-8 mainly discuss sequence, structural and functional aspects of protein kinases encoded in the genomes of various prokaryotes and viruses.
Chapter 1 discusses background information and a literature survey in the areas of homology detection and prediction of protein-protein interactions. The growth of genomic data and the need to process it to infer the context of various functional events are highlighted. Different approaches to recognizing the functions of proteins, experimental as well as computational, are discussed. Various experimental and computational approaches to detecting or predicting protein-protein interactions are mentioned.
Chapter 2 discusses recognition of non-trivial remote homology relationships involving proteins of Helicobacter pylori and their implications for function recognition. H. pylori is a microaerophilic, Gram-negative bacterial pathogen. It colonizes the human gastric mucosa and is a causative agent of gastroduodenal disease. The pathogen infects about 50% of the human population. It can lead to the development of mucosa-associated lymphoid tissue lymphoma. About 10% of the infected population develop gastric or duodenal ulcers and approximately 1% develop gastric cancer. H. pylori has been classified as a class I carcinogen by the WHO. The pathogen is characterized by a type IV secretion system.
The complete genomic sequences of three widely studied strains including 26695, J99 and HPAG1 of Helicobacter pylori are available. According to the genome analysis, the number of predicted open reading frames in strain 26695, J99 and HPAG1 are 1590, 1495 and 1536 respectively. Out of predicted H. pylori proteins from 26695, J99 and HPAG1 strains, numbers of proteins with no functional domain assignments in Pfam database (Protein family database) are 453, 357 and 400 respectively. There are proteins in different strains of H. pylori genomes where one part of the protein is associated with at least one protein domain of known function and hence preliminary indication of their functions is available whereas rest of the region is not associated with any function. There are 772, 803 and 790 such segments in proteins from strains 26695, J99 and HPAG1 respectively with at least 45 residues with no functional assignment currently available. Sensitive remote homology detection methods have been employed to establish relationships for 294 amino acid sequences and results have been grouped into 4 categories. Results of homology detection have been further confirmed by studying conservation of amino acid residues which are important for functioning of the proteins concerned. (i) Remote relationship has been established involving protein domain families for which no bonafide member is currently known in H. pylori. For example: DNA binding protein domain (Kor_B) has been assigned to a H. pylori protein at sequence identity of 20%. Study involving secondary structure prediction and conservation of amino acid residues confirms the results of homology detection methods. (ii) Remote relationship has been established involving H. pylori hypothetical proteins and protein domain families, for which paralogous members are present in Helicobacter pylori. 
For example, Cytochrome_C, an electron transfer protein domain, could be associated with a Helicobacter pylori protein sequence which shows a sequence identity of 14% with sequences of bona fide cytochrome C. (iii) “Missing” metabolic proteins of H. pylori have also been recognized. For example, Aspartoacylase (EC 3.5.1.15) catalyzes deacetylation of N-acetylaspartic acid to produce acetate and L-aspartate. This enzyme of the aspartate metabolism pathway has not so far been reported from H. pylori. A remote evolutionary relationship between an H. pylori protein and the Aspartoacylase domain has been established at a sequence identity of 17%, thus filling the gap in this metabolic pathway in the pathogen. (iv) New functional assignments have been made for domains in H. pylori sequences in which the rest of the sequence already had domains assigned. For example, a DNA methylase domain has been assigned to the C-terminal region of an H. pylori protein which already had a Helicase domain assigned to its N-terminal region. All this information should open avenues for further experimental probing, which will impact the design of inhibitors against this pathogen and result in a better understanding of its pathogenesis in humans.
Chapter 3 describes prediction of protein-protein interactions between Helicobacter pylori and the human host. A lack of information on protein-protein interactions at the host-pathogen interface is impeding the understanding of the pathogenesis process. A recently developed, homology search-based method to predict protein-protein interactions is applied to the gastric pathogen Helicobacter pylori to predict the interactions between proteins of H. pylori and human proteins in vitro. Many of the predicted interactions could potentially occur between the pathogen and its human host during pathogenesis, as we focused mainly on the H. pylori proteins that have a transmembrane region or are encoded in the pathogenicity island and those which are known to be secreted into the human host. By applying the homology search approach to the protein-protein interaction databases DIP and iPfam, in vitro interactions for a total of 623 H. pylori proteins with 6559 human proteins could be predicted. The predicted interactions include 549 hypothetical proteins of as yet unknown function encoded in the H. pylori genome and 13 experimentally verified secreted proteins. A total of 833 interactions involving the extracellular domains of transmembrane proteins of H. pylori could be predicted. Structural analysis of some of the examples reveals that the predicted interactions are consistent with the structural compatibility of binding partners. Various probable interactions with discernible biological relevance are discussed in this chapter. For example, an interaction between the CFTR protein (NP_000483) and a multidrug resistance protein (HP1206) has been predicted. The structure of the CFTR intracellular domain is known in the homomeric form and consists of five AAA transport domains in tandem (PDB code 1XMI). Out of the five identical subunits, two subunits (the B chain and the E chain in the PDB structure) have been selected. The structure of the multidrug resistance protein of the pathogen has been modeled on the B chain of the template (sequence identity 32%). This exercise suggests that interface residues in the model are congenial for interaction. This makes the structural complex feasible under in vitro conditions and suggests that the pathogen protein may compete for occupancy with the host protein.
Chapter 4 describes recognition of Plasmodium-specific protein domain families and their roles in the Plasmodium falciparum life cycle. Malaria in humans is caused by intracellular, eukaryotic protozoan parasites of apicomplexan nature belonging to the genus Plasmodium. Of the five species of Plasmodium that infect humans, namely P. falciparum, P. ovale, P. vivax, P. malariae and P. knowlesi, P. falciparum causes lethal infection. P. falciparum proteins have diverged extensively during the course of evolution. The pathogen genome is rich in A+T composition, and its proteins are larger than the homologous proteins from other organisms due to the presence of low complexity regions. Organism-specific families are important as they play roles in the peculiar life style of an organism. If the organism is a pathogen, then these family members may play roles in pathogenesis. Inhibiting these specific proteins is unlikely to interfere with the host system, as no homolog may be present in the host. In the present work we identify Plasmodium-specific protein families and their roles in different stages of the life cycle of the pathogen. A total of 5086 amino acid sequences (full-length sequences or fragments of proteins) show homology only with amino acid sequences from Plasmodium organisms and hence are Plasmodium-specific. These Plasmodium-specific amino acid sequences cluster into 106 Plasmodium-specific families (≥2 members per family). 14 Plasmodium-specific protein domain families with known physico-chemical properties are observed. These Plasmodium-specific protein domain families are involved in various important functions such as rosetting and sequestering of infected erythrocytes, binding to the surface of the host cell and the invasion process in the life cycle of the pathogen. Also, 89 new Plasmodium-specific protein domain families have been recognized. Analysis of various aspects of members of Plasmodium-specific protein domain families, such as their potential to target the apicoplast, protein-protein interactions, expression profile and domain organization, has been performed to derive relevant information about function. New Plasmodium-specific domain families for which no function can be associated could provide some insight into the much diverged Plasmodium species. These proteins may play a role in the parasite-specific life style.
Experimental work on these Plasmodium-specific proteins might fill the gaps in the poorly understood physiology of this parasite.
Chapter 5 presents a genome-wide compilation of low complexity regions (LCRs) in proteins. An in-depth analysis of the nature, structure and functional role of the proteins containing low complexity regions in Plasmodium falciparum was undertaken, given the high prevalence of LCRs in the proteome of this organism. Low complexity regions and repeat patterns have been recognized in proteins encoded in 986 genomes (68 archaea, 896 prokaryotes and 22 eukaryotes). Low complexity regions have been classified into the following three categories: (a) Composition of LCRs: (i) LCRs can be stretches of a single amino acid residue type; (ii) LCRs can be stretches of more than one amino acid residue type. (b) Periodicity of amino acids in LCRs: certain amino acid residues can be observed at a specific periodicity in proteins. (c) Repeat patterns: certain motifs of amino acid residues are repeated in a protein. 850 Plasmodium falciparum proteins are observed to have at least one repeat pattern in which the repeating unit is at least 5 amino acid residues long. Statistical analysis of single amino acid residue repeats indicates that the occurrence of stretches of homo amino acid residues is not a random event. Studies on recognition of functions, protein-protein interactions and organization of tethered domain(s) in proteins containing LCRs suggest that these proteins are part of a variety of functional events such as signal transduction, enzymatic processes, cell differentiation, pyrimidine biosynthesis, fatty acid biosynthesis and chromosomal replication. Representations of low complexity regions of Plasmodium falciparum in the Protein Data Bank suggest that LCRs can take the conformation of regular secondary structure (apart from disordered regions) in the 3-D structures of proteins.
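The composition-based categories of low complexity regions described for Chapter 5 can be illustrated with a small sketch. This is not the procedure used in the thesis, which the abstract does not specify; the function names, the minimum run length of 5, the window width of 12 and the entropy cutoff of 2.2 bits are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def homopolymer_runs(seq, min_len=5):
    """Return (start, end, residue) for each run of one residue type
    that is at least min_len long; end is exclusive."""
    runs = []
    start = 0
    for i in range(1, len(seq) + 1):
        # Close the current run at the end of the sequence or on a residue change.
        if i == len(seq) or seq[i] != seq[start]:
            if i - start >= min_len:
                runs.append((start, i, seq[start]))
            start = i
    return runs


def shannon_entropy(window):
    """Shannon entropy (bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_entropy_windows(seq, width=12, max_entropy=2.2):
    """Start positions of windows whose composition entropy is at most
    max_entropy; such windows are candidates for mixed-residue LCRs."""
    return [
        i
        for i in range(len(seq) - width + 1)
        if shannon_entropy(seq[i:i + width]) <= max_entropy
    ]
```

Calling `homopolymer_runs` on a sequence reports single-residue stretches (category a.i), while `low_entropy_windows` flags regions dominated by a few residue types (category a.ii). Periodicity and repeat patterns would need a separate check that is not sketched here.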
Chapter 6 describes sequence analysis, structural modeling and evolutionary studies of Leishmania donovani hypusine pathway enzymes. Leishmania is a eukaryotic kinetoplastid protozoan parasite which causes leishmaniasis in humans. Hypusine is a non-standard, polyamine-derived amino acid, Nε-(4-amino-2-hydroxybutyl) lysine, named after its two structural components, hydroxyputrescine and lysine. The eukaryotic translation initiation factor 5A (eIF5A) is the only cellular protein containing hypusine. Synthesis of hypusine is critical for the function of eIF5A and is essential for eukaryotic cell proliferation and survival. Formation of hypusine is the result of a two-step post-translational modification process involving the enzymes (i) deoxyhypusine synthase (DHS) and (ii) deoxyhypusine hydroxylase (DOHH). DHS, the first enzyme of the hypusine pathway, catalyzes the NAD-dependent transfer of the butylamino moiety of spermidine (substrate) to the ε-amino group of a specific lysine residue of the eIF5A precursor and generates a deoxyhypusine-containing intermediate. DOHH, the second enzyme of the same pathway, catalyzes the hydroxylation of the deoxyhypusine-containing intermediate, generating hypusine-containing mature eIF5A. Two putative deoxyhypusine synthase (DHS) sequences, DHS34 and DHS20, have been identified in Leishmania donovani by Professor Madhubala and coworkers (Jawaharlal Nehru University, New Delhi), with whom the work embodied in this chapter was done in collaboration. Detailed comparison of the DHS34 sequence from Leishmania with the human DHS protein indicated conservation of functionally important residues. 3D structural modeling studies of the protein suggested that residues around the active site were absolutely conserved. The NAD binding regions are located spatially close together; however, one NAD binding region was observed in a large insertion (225 amino acid residues long). Based on these observations, DHS34 was predicted to have enzymatic activity.
Experimental studies done by our collaborators confirmed the preliminary results of the computational analysis. Based on sequence and structural analysis of the DHS20 and DOHH proteins, DHS20 and DOHH were proposed to be catalytically inactive and active, respectively. Experimental studies on these proteins supported the results of the computational analysis. Deoxyhypusine synthase (DHS) and deoxyhypusine hydroxylase (DOHH) are key proteins conserved in the hypusine synthesis pathways of eukaryotes. Because they are highly conserved, they could be coevolving. Comparison of the genetic distance matrices of the DHS and DOHH proteins reveals that their evolutionary rates are better correlated with each other than with the rate of an unrelated protein such as Cytochrome C. This indicates that they are coevolving, and further indicates that even non-interacting proteins that are functionally coupled experience correlated evolution. However, this correlation does not extend to their tree topologies.
Chapter 7 provides a classification scheme for protein kinases encoded in the genomes of prokaryotic organisms. The overwhelming majority of the Ser/Thr protein kinases identified by gleaning archaeal and eubacterial genomes could not be classified into any of the well-known Hanks and Hunter subfamilies of protein kinases. This is because the Hanks and Hunter classification scheme was developed from eukaryotic protein kinases, which are highly divergent from their prokaryotic homologues. Using a large dataset of prokaryotic Ser/Thr protein kinases, traditional sequence alignment and phylogenetic approaches have been used to identify and classify prokaryotic kinases, which represent 72 subfamilies with at least 4 members in each. Such a clustering enables classification of prokaryotic Ser/Thr kinases and can be used as a framework to classify newly identified prokaryotic Ser/Thr kinases.
After a series of searches in comprehensive sequence databases, it is recognized that 38 subfamilies of prokaryotic protein kinases are associated with a specific taxonomic level. For example, 4, 6 and 3 subfamilies have been identified that are currently specific to the phyla Proteobacteria, Cyanobacteria and Actinobacteria, respectively. Similarly, subfamilies which are specific to an order, sub-order, class, family and genus have also been identified. In addition, it was possible to identify organism-diverse subfamilies. Members of these clusters are from organisms of different taxonomic levels, such as archaea, bacteria, eukaryotes and viruses. Interestingly, the occurrence of several taxonomic level-specific subfamilies of prokaryotic kinases contrasts with the classification of eukaryotic protein kinases, in which most of the popular subfamilies occur diversely in several eukaryotes. Many prokaryotic Ser/Thr kinases exhibit a wide variety of modular organization, which indicates a degree of complexity in protein-protein interactions and the signaling pathways in these microbes.
Chapter 8 focuses on the recognition and classification of protein kinases encoded in the genomes of viruses and their implications in various functions and diseases. Protein kinases encoded by viral genomes play a major role in the infection, replication and survival of viruses. Using traditional sequence homology detection tools, sequence alignment methods and phylogenetic approaches, protein kinases were recognized. 646123 protein sequences from 35799 viral genomes (including strains) have been used in this analysis. Protein kinases are identified using a combination of profile-based search methods such as PSI-BLAST, RPS-BLAST and HMMER.
Based upon sequence similarity over the length of catalytic kinase domains, 479 protein kinase domains recognized in 244 viral genomes have been clustered into 46 subfamilies with minimum sequence identity of 35% within a subfamily. Viral protein kinases are encoded in genomes of retro-transcribing viruses or viruses which possess double stranded DNA as genetic material. Based on the available functional information present for one or more members of a subfamily, a putative function has been assigned to other members of the subfamily. Information regarding interaction of viral protein kinases with viral/host protein has also been considered for enhancing understanding of function of kinases in a subfamily. Out of 46 subfamilies, 14 subfamilies are characterized by various functions. Kinases belonging to UL97, US69, UL13 and BGLF subfamilies are virus specific. For 7 subfamilies, nearest neighbors are from well characterized eukaryotic protein kinase groups such as AGC, CAMK and CDK. Out of 25 new uncharacterized subfamilies observed in this analysis, 13 subfamilies are virus specific. Different subfamilies have been characterized by various functions which are crucial for viral infection such as synthesis of structural unit, replication of genetic material, modification of cellular components, alteration in host immune system, competing with cellular protein for efficient usage of host machinery. Also, many viral kinases share very high sequence identity (~97%) with their eukaryotic counterpart and represent disease state. For example, a protein kinase encoded in Avian erythroblastosis virus shares 97% sequence identity with catalytic domain of human epidermal growth factor receptor tyrosine kinase. Leucine at position 861 in human protein is substituted by Gln in cancer conditions; the viral protein kinase sequence possesses Gln at corresponding position and thus represents disease state. 
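The subfamily grouping described for Chapter 8, where every subfamily has a minimum pairwise sequence identity of 35%, can be sketched as follows. The abstract does not state which clustering algorithm was used, so the greedy scheme below is an assumption that merely guarantees the stated property; the function names, the identity definition (matches over aligned columns, ignoring columns that are gaps in both sequences) and the toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity of two pre-aligned sequences of equal length.
    Columns that are gaps in both sequences are ignored."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def cluster_min_identity(aligned, threshold=35.0):
    """Greedy clustering: a sequence joins the first cluster in which it
    reaches the identity threshold with every existing member, so the
    minimum pairwise identity inside each cluster is >= threshold."""
    clusters = []
    for name, seq in aligned.items():
        for cluster in clusters:
            if all(percent_identity(seq, aligned[m]) >= threshold for m in cluster):
                cluster.append(name)
                break
        else:
            # No existing cluster accepts this sequence; start a new one.
            clusters.append([name])
    return clusters


# Toy aligned kinase-domain fragments (hypothetical data).
toy_alignment = {
    "k1": "GSGAFGTVYK",
    "k2": "GSGAFGKVYR",
    "k3": "MDELLPWQRS",
}
subfamilies = cluster_min_identity(toy_alignment, threshold=35.0)
```

Because the result of a greedy pass depends on input order, a production analysis would more likely use hierarchical complete-linkage clustering on the full identity matrix; the sketch only shows what the 35% criterion means.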
Chapter 9 studies how the ability to predict enzymes from 3-D structural features depends on whether comparative models or crystal structures of inactive forms of enzymes are used, taking protein kinases as a case study. With the advent of structural genomics initiatives, there is a surge in the number of proteins with 3-D structural information available even before their functional features are understood. One useful annotation of a protein is its classification as an enzyme or non-enzyme solely from knowledge of its 3-D structure. This is facilitated by the identification of active sites and ligand binding sites in a protein. In this work, which was carried out in collaboration with Dr Jim Warwicker of Manchester University, UK, an approach developed by Warwicker and coworkers has been used. In the 3-D structure of proteins, the largest clefts are generally considered to be ligand binding sites. This feature, along with other sequence alignment-independent properties such as residue preferences, fraction of surface residues and secondary structure elements, has been considered to differentiate enzymes from non-enzymes. Electrostatic potential at the active site is one of the key properties utilized in this respect. Active sites in enzymes are generally associated with ionizable groups which can take part in catalysis. In addition to the feature of large clefts in enzymes, active site residues are in buried environments and show larger deviations in pKa values than surface residues. The method proposed by Warwicker and co-workers distinguishes enzymes from non-enzymes by considering the electrostatic features at clefts along with the sequence profile of the protein concerned. The conformation of the inactive state of an enzyme is not congenial to the catalytic function. In an ideal situation, a method should be capable of predicting an enzyme irrespective of whether the determined structure corresponds to the active or inactive state.
Peak potential values have been calculated using the Warwicker program for a set of 15 protein kinases for which 3-D structures are available in active as well as inactive conformations. Comparison of peak potential values calculated for active and inactive conformations suggests that the algorithm can differentiate between them, as values for active conformations are generally higher than the corresponding values for inactive conformations. However, the peak potential values are high enough for even the inactive conformations to be predicted as enzymes. Peak potential values calculated for homology models of protein kinases (for which crystal structures are already available), generated at different sequence identities with the template sequences, predict the protein kinases to be enzymes and are comparable to the corresponding values for X-ray structures. This suggests that, for proteins for which no crystal or NMR structure is yet available and no good template with high sequence identity is present, peak potential values for models generated at low sequence identity can still give insight into the probable function of the protein as an enzyme. The enzyme/non-enzyme prediction algorithm was also found to be useful in confirming enzyme functionality using 3-D models of putative viral kinases. Initially, a putative kinase function was assigned to these viral proteins based solely upon their sequence characteristics, such as the presence of residues or motifs which are important for the activity of the protein. The enzyme recognition method, which is not directly sensitive to these motifs, confirmed that all the analyzed putative viral kinases are enzymes.
Chapter 10 presents the conclusions of the work embodied in the entire thesis. Very briefly, various computational approaches have been used to analyze and understand the structural and functional properties of the repertoire of proteins of pathogenic organisms.
Analysis of uncharacterized protein domain families has helped to understand the functional implications of constituent proteins. Experimental validation of these results can further facilitate unraveling of functional aspects of proteins encoded in various pathogenic organisms. Apart from studies embodied in the thesis, author has been involved in two other studies, which are provided as appendices. Appendix 1 describes comparison of substitution pattern of amino acid residues of protein encoded in P. falciparum genome with substitution pattern of corresponding homologous proteins from non-Plasmodium organisms. Salient differences have been highlighted. Appendix 2 discusses study of bacterial tyrosine kinases with an objective of recognition of all putative protein tyrosine kinases in E. coli. Computational study suggests that protein SopA can be a potential tyrosine kinase and this conclusion is being tested experimentally in collaborator’s laboratory.
393

Nouvelles méthodes de calcul pour la prédiction des interactions protéine-protéine au niveau structural / Novel computational methods to predict protein-protein interactions on the structural level

Popov, Petr 28 January 2015
Molecular docking is a method that predicts the orientation of one molecule with respect to another when they form a complex. The first computational method of molecular docking was applied in 1990 to find new candidates against HIV-1 protease. Since then, the use of docking pipelines has become standard practice in drug discovery. Typically, a docking protocol comprises different phases. Exhaustive sampling of the binding site under a rigid-body approximation of the docking subunits is required. Clustering algorithms are used to group similar binding candidates. Refinement methods are applied to take into account the flexibility of the molecular complex and to eliminate possible docking artefacts. Finally, scoring algorithms are employed to select the best binding candidates. The current thesis presents novel algorithms for docking protocols that facilitate structure prediction of protein complexes, which belong to one of the most important target classes in structure-based drug design. First, DockTrina, a new algorithm to predict conformations of triangular protein trimers (i.e. trimers with pair-wise contacts between all three pairs of proteins), is presented. The method takes as input pair-wise contact predictions from a rigid-body docking program. It then scans and scores all possible combinations of pairs of monomers using a very fast root mean square deviation (RMSD) test. Being fast and efficient, DockTrina outperforms state-of-the-art computational methods dedicated to predicting the structure of protein oligomers on the collected benchmark of protein trimers.
Second, RigidRMSD - a C++ library that in constant time computes RMSDs between molecular poses corresponding to rigid-body transformations is presented. The library is practically useful for clustering docking poses, resulting in ten times speed up compared to standard RMSD-based clustering algorithms. Third, KSENIA - a novel knowledge-based scoring function for protein-protein interactions is developed. The problem of scoring function reconstruction is formulated and solved as a convex optimization problem. As a result, KSENIA is a smooth function and, thus, is suitable for the gradient-base refinement of molecular structures. Remarkably, it is shown that native interfaces of protein complexes provide sufficient information to reconstruct a well-discriminative scoring function. Fourth, CARBON - a new algorithm for the rigid-body refinement of docking candidates is proposed. The rigid-body optimization problem is viewed as the calculation of quasi-static trajectories of rigid bodies influenced by the energy function. To circumvent the typical problem of incorrect stepsizes for rotation and translation movements of molecular complexes, the concept of controlled advancement is introduced. CARBON works well both in combination with a classical force-field and a knowledge-based scoring function. CARBON is also suitable for refinement of molecular complexes with moderate and large steric clashes between its subunits. Finally, a novel method to evaluate prediction capability of scoring functions is introduced. It allows to rigorously assess the performance of the scoring function of interest on benchmarks of molecular complexes. The method manipulates with the score distributions rather than with scores of particular conformations, which makes it advantageous compared to the standard hit-rate criteria. The methods described in the thesis are tested and validated on various protein-protein benchmarks. 
The implemented algorithms are successfully used in the CAPRI contest for structure prediction of protein-protein complexes. The developed methodology can be easily adapted to the recognition of other types of molecular interactions, involving ligands, polysaccharides, RNAs, etc. The C++ versions of the presented algorithms will be made available as SAMSON Elements for the SAMSON software platform at http://www.samson-connect.net or at http://nano-d.inrialpes.fr/software.
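The constant-time RMSD idea mentioned in the abstract above can be illustrated with a short sketch. This is not the RigidRMSD library or its API; it is a minimal Python illustration, assuming each pose is given as a 3x3 rotation matrix plus a translation vector applied to the same reference coordinates. All function names (`covariance`, `rigid_rmsd`, `rot_z`) are hypothetical. The identity used is that, with the centroid c and the second-moment matrix C of the centered coordinates precomputed once, RMSD² = |(Ra − Rb)c + ta − tb|² + trace((Ra − Rb) C (Ra − Rb)ᵀ), so each pairwise comparison no longer depends on the number of atoms.

```python
import math

def covariance(points):
    """Precompute (once per molecule) the centroid and the 3x3
    second-moment matrix of the centered coordinates."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    cov = [[sum((p[i] - c[i]) * (p[j] - c[j]) for p in points) / n
            for j in range(3)] for i in range(3)]
    return c, cov

def rot_z(angle):
    """Rotation matrix about the z axis (helper for building example poses)."""
    ca, sa = math.cos(angle), math.sin(angle)
    return [[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]]

def rigid_rmsd(pose_a, pose_b, centroid, cov):
    """RMSD between two rigid poses (R, t) of the same molecule.
    Cost is independent of the number of atoms."""
    (ra, ta), (rb, tb) = pose_a, pose_b
    # D = Ra - Rb
    d = [[ra[i][j] - rb[i][j] for j in range(3)] for i in range(3)]
    # Displacement of the centroid between the two poses: D c + ta - tb
    shift = [sum(d[i][k] * centroid[k] for k in range(3)) + ta[i] - tb[i]
             for i in range(3)]
    trans_term = sum(s * s for s in shift)
    # trace(D C D^T) = sum over i, j, k of D[i][j] * C[j][k] * D[i][k]
    rot_term = sum(d[i][j] * cov[j][k] * d[i][k]
                   for i in range(3) for j in range(3) for k in range(3))
    return math.sqrt(max(trans_term + rot_term, 0.0))
```

In a clustering setting, `covariance` would be called once for the ligand, after which every pair of docking poses can be compared with `rigid_rmsd` without touching the atomic coordinates again.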
394

New insights into small molecules inhibitors and protein-protein interactions of VirB8 : a critical conserved component of the type IV secretion system

Um Nlend, Ingrid 06 1900 (has links)
No description available.
395

Identificação de interações proteína-proteína envolvendo os produtos dos Loci hrp, vir e rpf do fitopatógeno Xanthomonas axonopodis pv. citri / Identification of protein-protein interactions involving the products of the loci hrp, vir and rpf the phytopathogen Xanthomonas axonopodis pv. citri

Marcos Castanheira Alegria 24 September 2004 (has links)
O Cancro Cítrico, um dos mais graves problemas fitossanitários da citricultura atual, é uma doença causada pelo fitopatógeno Xanthomonas axonopodis pv. citri (Xac). Um estudo funcional do genoma de Xac foi iniciado com o intuito de identificar interações proteína-proteína envolvidas em processos de patogenicidade de Xac. Através da utilização do sistema duplo-híbrido de levedura, baseado nos domínios de ligação ao DNA e ativação da transcrição do GAL4, nós analisamos os principais componentes dos mecanismos de patogenicidade de Xac, incluindo o Sistema de Secreção do Tipo III (TTSS), Sistema de Secreção do Tipo IV (TFSS) e Sistema de "Quorum Sensing" composto pelas proteínas Rpf. Componentes desses sistemas foram utilizados como iscas na triagem de uma biblioteca genômica de Xac. O TTSS é codificado pelos genes denominados hrp ("hypersensitive response and pathogenicity"), hrc ("hrp conserved") e hpa ("hrp associated") localizados no locus hrp do cromossomo de Xac. Esse sistema de secreção é capaz de translocar proteínas efetoras do citoplasma bacteriano para o interior da célula hospedeira. Nossos resultados mostraram novas interações proteína-proteína entre componentes do próprio TTSS além de associações específicas com uma proteína hipotética: 1) HrpG, um regulador de resposta de um sistema de dois componentes responsável pela expressão dos genes hrp, e XAC0095, uma proteína hipotética encontrada apenas em Xanthomonas spp; 2) HpaA, uma proteína secretada pelo TTSS, HpaB e o domínio C-terminal da HrcV; 3) HrpB1, HrpD6 e HrpW; 4) HrpB2 e HrcU e 5) interações homotrópicas envolvendo a ATPase HrcN. Em Xac, foram encontrados dois loci vir que codificam proteínas que possuem similaridade com componentes do TFSS envolvido em processos de conjugação/secreção bacteriana: TFSS-plasmídeo localizado no plasmídeo pXAC64 e TFSS-cromossomo localizado no cromossomo de Xac. 
O TFSS-plasmídeo, o qual possui maior similaridade com sistemas de conjugação, mostrou interações envolvendo proteínas cujos genes estão localizados na mesma região do plasmídeo pXAC64: 1) interação homotrópica da TrwA; 2) XACb0032 e XACb0033; 3) interações homotrópicas da proteína XACb0035; 4) VirB1 e VirB9; 5) XACb0042 e VirB6; 6) XACb0043 e XACb0021b. O TFSS-cromossomo apresentou interações envolvendo as proteínas: 1) VirD4 e um grupo de 12 proteínas que contém similaridade entre si, incluindo XAC2609 cujo gene encontra-se no locus vir; 2) XAC2609 e XAC2610; 3) Interações homotrópicas da VirB11; 4) XAC2622 e VirB9. A análise do sistema de "Quorum-Sensing" composto pelas proteínas Rpf mostrou interações envolvendo componentes do próprio sistema: 1) RpfC e RpfF; 2) RpfC e RpfG; 3) interações homotrópicas da RpfF; 4) RpfC e CmfA, uma proteína similar a Cmf de Dictyostelium discoideum que, neste organismo, é fundamental para processos de "quorum-sensing". As interações proteína-proteína encontradas permitiram-nos entender melhor a composição, organização e regulação dos fatores envolvidos na patogenicidade de Xac. / Citrus Canker, caused by the bacterial plant pathogen Xanthomonas axonopodis pv. citri (Xac), is one of the most serious problems facing Brazilian citriculture. We have initiated a project to identify protein-protein interactions involved in the pathogenicity of Xac. Using a yeast two-hybrid system based on the GAL4 DNA-binding and activation domains, we have focused on identifying interactions involving subunits, regulators and substrates of the Type Three Secretion System (TTSS), the Type Four Secretion System (TFSS) and the Quorum Sensing/Rpf System. Components of these systems were used as baits to screen a random Xac genomic library. The TTSS is encoded by the hrp (hypersensitive response and pathogenicity), hrc (hrp conserved) and hpa (hrp associated) genes in the chromosomal hrp locus. 
This secretion system can translocate effector proteins from the bacterial cytoplasm into the host cells. We have identified several previously uncharacterized interactions involving: 1) HrpG, a two-component system response regulator responsible for the expression of Xac hrp operons, and XAC0095, a previously uncharacterized protein found only in Xanthomonas spp; 2) HpaA, a protein secreted by the TTSS, HpaB and the C-terminal domain of HrcV; 3) HrpB1, HrpD6 and HrpW; 4) HrpB2 and HrcU; 5) homotropic interactions of the ATPase HrcN. Xac contains two virB gene clusters, one on the chromosome and one on the pXAC64 plasmid, each of which codes for a unique and previously uncharacterized TFSS. Components of the TFSS of pXAC64, which is most similar to conjugation systems, showed interactions involving proteins coded by the same locus: 1) homotropic interactions of TrwA; 2) XACb0032 and XACb0033; 3) XACb0035 homotropic interactions; 4) VirB1 and VirB9; 5) XACb0042 and VirB6; 6) XACb0043 and XACb0021b. Components of the chromosomal TFSS exhibited interactions involving: 1) VirD4 and a group of 12 uncharacterized proteins with a common C-terminal domain motif, including XAC2609, whose gene resides within the vir locus; 2) XAC2609 and XAC2610; 3) homotropic interactions of VirB11; 4) XAC2622 and VirB9. Analysis of Quorum Sensing/Rpf System components revealed interactions between the principal Rpf proteins which control Xanthomonas quorum sensing: 1) RpfC and RpfF; 2) RpfC and RpfG; 3) RpfF homotropic interactions; 4) RpfC and CmfA, a protein similar to Cmf (conditioned medium factor) of Dictyostelium discoideum, which controls quorum sensing in this organism. The protein-protein interactions that we have detected provide insights into the composition, organization and regulation of these important mechanisms involved in Xanthomonas pathogenicity.
396

Componentes genéticos que afetam a via de direcionamento de proteínas organelares em Arabidopsis thaliana / Genetic components affecting organelar protein targeting in Arabidopsis thaliana

Larissa Spoladore 18 April 2016 (has links)
Nos eucariotos, a evolução dos sistemas de transporte molecular foi essencial pois seu alto grau de compartimentalização requer mecanismos com maior especificidade para a localização de proteínas. Com o estabelecimento das mitocôndrias e plastídeos como organelas da célula eucariota, grande parte dos genes específicos para sua atividade e manutenção foram transferidos ao núcleo. Após a transferência gênica, a maioria das proteínas passaram a ser codificadas pelo núcleo, sintetizadas no citosol e direcionadas às organelas por uma maquinaria complexa que envolve receptores nas membranas das organelas, sequências de direcionamento nas proteínas e proteínas citossólicas que auxiliam o transporte. A importação depende em grande parte de uma sequência na região N-terminal das proteínas que contém sinais reconhecidos pelas membranas organelares. No entanto, muito ainda não é compreendido sobre o transporte de proteínas organelares e fatores ainda desconhecidos podem influenciar o direcionamento sub-celular. O objetivo deste trabalho foi a caracterização da General Regulatory Factor 9 (GRF9), uma proteína da família 14-3-3 de Arabidopsis thaliana potencialmente envolvida no direcionamento de proteínas organelares, e a geração de um genótipo para ser utilizado na obtenção de uma população mutante para genes que afetam o direcionamento da proteína Tiamina Monofosfato Sintetase (TH-1). Após experimentos in vivo e in planta, foi observado que GRF9 interage com as proteínas duplo-direcionadas Mercaptopyruvate Sulfurtransferase1 (MST1) e a Thiazole Biosynthetic Enzyme (THI1), e com a proteína direcionada aos cloroplastos TH-1. Experimentos de deleção e interação in vivo mostraram que a região Box1 de GRF9 é essencial para a interação com THI1 e MST1. Com a finalidade de dar continuidade a caracterização da GRF9 e para realização de testes com relação a sua função no direcionamento de proteínas organelares foi gerada uma linhagem homozigota que superexpressa GRF9. 
Plantas expressando o transgene TH-1 fusionado a Green Fluorescent Protein (GFP) em genótipo deficiente na TH-1 (CS3469/TH-1-GFP) foram obtidas para a geração de população mutante que possibilitará a descoberta de componentes genéticos ainda desconhecidos e responsáveis pelo direcionamento de proteínas aos cloroplastos. / In Eukaryotes, the evolution of molecular transport in the cell was essential due to their increase in compartmentalization, which requires more specific mechanisms for the correct localization of proteins. With the establishment of mitochondria and plastids as organelles, a great number of their genes, either specific for their metabolic functions or maintenance of their own transcription/translation processes, were transferred to the nucleus of the cell. These transfers caused most of the organellar proteins to be coded by the nucleus, then synthesized in the cytosol and targeted to the organelles by a complex machinery which involves membrane receptors in the organelles, targeting sequences in the proteins, and cytosolic proteins which assist them with the transport. Protein import depends greatly on an N-terminal sequence in proteins which has recognizable signals for the organellar membrane receptors. However, much is still not understood about the transport of organellar proteins, and unknown factors may still influence subcellular targeting. The goal of this work was the characterization of General Regulatory Factor 9 (GRF9), a protein of the 14-3-3 family in Arabidopsis thaliana potentially involved in the targeting of organellar proteins, and generating a genotype to be used in obtaining a mutant population for genes affecting the targeting of the protein Thiamine Requiring 1 (TH-1). After in vivo and in planta experiments it was observed that GRF9 interacts with the dual-targeted proteins Mercaptopyruvate Sulfurtransferase1 (MST1) and Thiazole Biosynthetic Enzyme (THI1), and with the chloroplast targeted protein TH-1. 
Deletion experiments followed by in vivo interaction assays showed that the Box 1 region of GRF9 is essential for the interaction with THI1 and MST1. To continue the characterization of GRF9 and to test its function in the targeting of organellar proteins, a homozygous line overexpressing GRF9 was generated. Plants expressing the transgene TH-1 fused to the Green Fluorescent Protein (GFP) in a TH-1 deficient genotype (CS3469/TH-1-GFP) were obtained for the generation of a mutant population, which will allow the discovery of as-yet-unknown genetic components responsible for targeting proteins to the chloroplasts.
397

Conception de ligands protéiques artificiels par ingénierie moléculaire in silico / Design of artificial protein binders by in silico molecular engineering

Baccouche, Rym 30 November 2012 (has links)
Les travaux réalisés portent sur la conception de ligands protéiques capables de cibler le site catalytique des métalloprotéases matricielles (MMPs) grâce à une méthode d’ingénierie développée au laboratoire qui repose sur le greffage de motifs fonctionnels. Le motif fonctionnel choisi correspond aux 4 résidus N-terminaux du TIMP-2, un inhibiteur naturel des MMPs. Des plates-formes protéiques possédant des motifs d’acides aminés dans une topologie similaire à celle du motif de référence dans le complexe TIMP-2/MMP-14 ont été identifiées par criblage systématique de la PDB à l’aide du logiciel STAMPS (Search for Three-dimensional Atom Motif in Protein Structure). Dix candidats ligands satisfaisant les contraintes topologiques, stériques et de similarité électrostatique avec le ligand naturel TIMP-2 ont été sélectionnés. Ces ligands ont été produits par synthèse chimique ou par voie recombinante puis leur capacité à inhiber une série de 6 MMPs a été évaluée. Les résultats indiquent que tous les ligands protéiques conçus in silico sont capables de lier les sites catalytiques des MMPs avec des constantes d’association allant de 450 nM à 590 mM, sans optimisation supplémentaire. La caractérisation structurale par diffraction X de 2 variants d’un de ces ligands protéiques a permis de montrer que les interactions établies par le motif 1-4 dans ces ligands étaient similaires à celles observées dans le complexe TIMP-2/MMP-14, avec cependant des différences dans la géométrie de certaines d’entre elles. Des études de simulation par dynamique moléculaire ont également permis de mettre en évidence de possibles différences dans la géométrie et la stabilité de certaines des interactions reproduites dans les 10 plates-formes, pouvant contribuer aux affinités modestes observées pour ces ligands. 
Cependant, les résultats obtenus montrent que la méthode de conception in silico utilisée est capable de fournir une série de ligands protéiques de 1ère génération ciblant de manière spécifique un site catalytique d’intérêt avec un bon rendement. Cette méthode pourrait constituer la 1ère étape d’une approche hybride de conception in silico de ligands combinée à des techniques de sélection expérimentales. / Artificial mini-proteins able to target catalytic sites of matrix metalloproteinases (MMPs) were designed using a functional motif grafting approach. The motif corresponded to the 4 N-terminal residues of TIMP-2, a broad-spectrum natural protein inhibitor of MMPs. Scaffolds able to reproduce the functional topology of this motif as described in the TIMP-2/MMP-14 complex were obtained by exhaustive screening of the Protein Data Bank (PDB) using the STAMPS software (Search for Three-dimensional Atom Motif in Protein Structure). Ten artificial protein binders satisfying all topological, steric and electrostatic criteria applied for selection were produced for experimental evaluation. These binders targeted catalytic sites of MMPs with affinities ranging from 450 nM to 590 μM prior to optimization. The crystal structures of two artificial binders in complex with the catalytic domain of MMP-12 showed that the intermolecular interactions established by the functional motif in these artificial binders corresponded to those found in the TIMP-2/MMP-14 complex, albeit with some differences in their geometry. Molecular dynamics simulations of the 10 binders in complex with MMP-14 suggested that these scaffolds could partly reproduce the native intermolecular interactions, but that some differences in geometry and stability could contribute to the lower affinity of the artificial protein binders compared to the natural one. 
Nevertheless, these results show that the in silico design method used can provide sets of starting protein binders targeting a specific binding site with a good rate of success. This approach could constitute the first step of an efficient hybrid computational-experimental protein binder design approach.
398

Structural bioinformatics tools for the comparison and classification of protein interactions

Garma, L. D. (Leonardo D.) 08 August 2017 (has links)
Abstract Most proteins carry out their functions through interactions with other molecules. Thus, proteins taking part in similar interactions are likely to carry out related functions. One way to determine whether two proteins take part in similar interactions is to quantify the likeness of their structures. This work focuses on the development of methods for the comparison of protein-protein and protein-ligand interactions, as well as their application to structure-based classification schemes. A method based on the MultiMer-align (or MM-align) program was developed and used to compare all known dimeric protein complexes. The results of the comparison demonstrate that the method improves on MM-align in a significant number of cases. The data were employed to classify the complexes, resulting in 1,761 different protein-protein interaction types. Through a statistical model, the number of existing protein-protein interaction types in nature was estimated at around 4,000. The model allowed the establishment of a relationship between the number of quaternary families (sequence-based groups of protein-protein complexes) and quaternary folds (structure-based groups). The interactions between proteins and small organic ligands were studied using sequence-independent methodologies. A new method was introduced to test three similarity metrics. The best of these metrics was subsequently employed, together with five other existing methodologies, to conduct an all-to-all comparison of all the known protein-FAD (Flavin-Adenine Dinucleotide) complexes. The results demonstrate that the new methodology best captures the similarities between complexes in terms of protein-ligand contacts. Based on the all-to-all comparison, the protein-FAD complexes were subsequently separated into 237 groups. In the majority of cases, the classification divided the complexes according to their annotated function. 
Using a graph-based description of the FAD-binding sites, each group could be further characterized and uniquely described. The study demonstrates that the newly developed methods are superior to the existing ones. The results indicate that both the known protein-protein and the protein-FAD interactions can be classified into a reduced number of types and that in general terms these classifications are consistent with the proteins' functions. / Tiivistelmä Suurin osa proteiinien toiminnasta tapahtuu vuorovaikutuksessa muiden molekyylien kanssa. Proteiinit, jotka osallistuvat samanlaisiin vuorovaikutuksiin todennäköisesti toimivat samalla tavalla. Kahden proteiinin todennäköisyys esiintyä samanlaisissa vuorovaikutustilanteissa voidaan määrittää tutkimalla niiden rakenteellista samankaltaisuutta. Tämä väitöskirjatyö käsittelee proteiini-proteiini- ja proteiini-ligandi -vuorovaikutusten vertailuun käytettyjen menetelmien kehitystä, ja niiden soveltamista rakenteeseen perustuvissa luokittelujärjestelmissä. Tunnettuja dimeerisiä proteiinikomplekseja tutkittiin uudella MultiMer-align-ohjelmaan (MM-align) perustuvalla menetelmällä. Vertailun tulokset osoittavat, että uusi menetelmä suoriutui MM-alignia paremmin merkittävässä osassa tapauksista. Tuloksia käytettiin myös kompleksien luokitteluun, jonka tuloksena oli 1761 erilaista proteiinien välistä vuorovaikutustyyppiä. Luonnossa esiintyvien proteiinien välisten vuorovaikutusten määrän arvioitiin tilastollisen mallin avulla olevan noin 4000. Tilastollisen mallin avulla saatiin vertailtua sekä sekvenssin (”quaternary families”) sekä rakenteen (”quaternary folds”) mukaan ryhmiteltyjen proteiinikompleksien määriä. Proteiinien ja pienien orgaanisten ligandien välisiä vuorovaikutuksia tutkittiin sekvenssistä riippumattomilla menetelmillä. Uudella menetelmällä testattiin kolmea eri samankaltaisuutta mittaavaa metriikkaa. 
Näistä parasta käytettiin viiden muun tunnetun menetelmän kanssa vertailemaan kaikkia tunnettuja proteiini-FAD (Flavin-Adenine-Dinucleotide, flaviiniadeniinidinukleotidi) -komplekseja. Proteiini-ligandikontaktien osalta uusi menetelmä kuvasi kompleksien samankaltaisuutta muita menetelmiä paremmin. Vertailun tuloksia hyödyntäen proteiini-FAD-kompleksit luokiteltiin edelleen 237 ryhmään. Suurimmassa osassa tapauksista luokittelujärjestelmä oli onnistunut jakamaan kompleksit ryhmiin niiden toiminnallisuuden mukaisesti. Ryhmät voitiin määritellä yksikäsitteisesti kuvaamalla FAD:n sitoutumispaikka graafisesti. Väitöskirjatyö osoittaa, että siinä kehitetyt menetelmät ovat parempia kuin aikaisemmin käytetyt menetelmät. Tulokset osoittavat, että sekä proteiinien väliset että proteiini-FAD -vuorovaikutukset voidaan luokitella rajattuun määrään vuorovaikutustyyppejä ja yleisesti luokittelu on yhtenevä proteiinien toiminnan suhteen.
399

Structural and Mechanistic Features of Protein Assemblies with Special Reference to Spliceosome

Rakesh, Ramachandran January 2016 (has links) (PDF)
Macromolecular assemblies such as the ribosome, spliceosome and polymerases are imperative for cellular functions. The current understanding of these important machineries and many other assemblies at the molecular level is poor. The lack of structural data for many macromolecular assemblies further causes a bottleneck in understanding the cellular processes and the various disease manifestations. Hence, it is essential to characterize the structures and molecular architectures of these macromolecular assemblies. Though the number of 3-D structures for individual protein structures or domains in the Protein Data Bank (PDB) is growing, the number of structures deposited for macromolecular assemblies is relatively small. Hence, apart from the use of experimental techniques for characterizing macromolecular assembly structures, the use of computational techniques would help in supplementing the growth of macromolecular assembly structures. This thesis deals with the use of integrative approaches where computational methods are combined with experimental data to model and understand the mechanistic features of macromolecular assemblies with a special focus on a sub-complex of the spliceosome machinery. Chapter 1 of this thesis provides an introduction to protein-protein interactions and macromolecular assemblies. Further, the modelling of macromolecular assemblies using integrative methods is discussed, with a subsequent introduction to the spliceosome machinery. In chapter 2, modelling studies were performed on the proteins involved in the general amino acid control mechanism, which is triggered in yeast under amino acid starvation conditions. The proteins involved in the study were Gcn1, a ribosome binding protein and the RWD-domain containing proteins Gcn2, Yih1, Gir2 and Mtc5. From laboratory experiments it is known that for the activation of Gcn2, an eIF2α kinase, its RWD domain has to bind to Gcn1 and the residue Arg-2259 is important for this interaction. 
As the 3-D structure for the Gcn1 region containing Arg-2259 is not currently available, its 3-D structure was inferred using fold recognition and comparative modelling techniques. Further, in order to understand the Gcn2 RWD domain-Gcn1 molecular interaction, a complex structure was inferred by using a restrained protein-protein docking procedure. As the proteins Yih1 and Gir2 are known to bind to Gcn1 using their RWD-domains, first the structures of the RWD-domain containing proteins including Mtc5 were inferred using a Gcn2 RWD domain NMR structure. Additionally, the Gcn1-Gcn2 complex was used to build a set of complexes to explain the binding of other RWD domain containing proteins Yih1, Gir2 and Mtc5. The important molecular interactions were obtained on analysing the interacting residues in these complexes. Thus, the Gcn1-Gcn2 interaction at the molecular level has been proposed for the first time. Future experiments guided by the protein-protein complex models and the proposed set of mutations should provide an understanding about the critical molecular interactions involved in the general amino acid control mechanism. Chapter 3 describes an integrative approach that was used to decipher a pseudo-atomic model of the closed form of human SF3b complex. SF3b is a multi-protein complex containing seven components – p14, SF3b49, SF3b155, SF3b145, SF3b130, SF3b14b and SF3b10. It recognizes the branch point adenosine in the pre-mRNA as part of U2 snRNP or U11/U12 di-snRNP in the spliceosome. Although the cryo-EM map for human SF3b complex has been available for more than a decade, the structure and relative spatial arrangement of all components in the complex are not yet known. 
The integrative modelling approach used here involved utilizing structural data in the form of available X-ray and NMR structures, fold recognition and comparative modelling as well as currently available experimental datasets, along with the available cryo-EM density map to provide a model with high structural coverage. Hence, the molecular architecture of closed form human SF3b complex was derived that can now provide insights into the functioning of SF3b in splicing. This might also help the future high resolution structure determination efforts of the entire human spliceosome machinery. In chapter 4, the molecular architecture of the closed form of SF3b complex obtained from the use of integrative modelling approach (Chapter 3) is extensively discussed. The structure-function relationships for some of the SF3b components based on the pseudo-atomic model have also been provided. In addition, the extreme flexibility associated with some of the SF3b components based on dynamics analysis has also been examined. Further, using an existing U11/U12 di-snRNP cryo-EM map and the closed form SF3b complex pseudo-atomic model, an open form of the SF3b complex was modelled and the component structures were fit into it. Hence, it was found that the transition between closed and open forms is primarily caused by a flap containing the HEAT repeat protein, SF3b155. This protein is also known to harbour cancer-causing mutations and has the potential to affect the closed-to-open transition as well as SF3b complex structure and stability. Thus, this provides a framework for the future understanding of the closed-to-open transition in SF3b functioning within the spliceosome. 
Chapter 5 builds upon the integrative modelling approach (Chapter 3) that proposed the molecular architecture of the closed form of human SF3b complex and an open form of SF3b that was derived due to a flap opening of the closed form and which might help in accommodating RNA and other trans-acting factors within the U11/U12 di-snRNP (Chapter 4). In the current chapter, the SF3b open form and its interaction with the RNA elements is studied. The 5' end of U12 snRNA and its interaction with pre-mRNA in branch point duplex was modelled, guided by the open form of SF3b that provided the necessary structural constraints; the RNA model is topologically consistent with the existing biochemical data. Further, utilizing the SF3b open form-RNA model and the existing experimental knowledge, an extensive discussion has been provided on how the architecture of SF3b acts as a scaffold for U12 snRNA: pre-mRNA branch point duplex formation as well as its potential implications for branch point adenosine recognition fidelity. Moreover, the reasons for SF3b to be defined as a “fuzzy” complex (a complex with highly flexible folded regions along with intrinsically disordered regions) are also discussed. Hence, the current work adds to the excellent developments made previously and deepens the understanding of the structure-function relationship of the human SF3b complex in the context of the spliceosome machinery. In chapter 6, a methodology has been proposed for the use of evolutionary conservation of protein-protein interfacial residues in multiple protein cryo-EM density based fitting of the protein components in the low-resolution density maps of multi-protein assemblies. First, the methodology was tested on a dataset of simulated density maps generated at four different resolutions: 10, 15, 20 and 25 Å. 
On utilizing the evolutionary conservation scores obtained from multiple sequence alignments to score the fitted complexes, it was found that there was a decrease in the conservation scores when compared to that of the crystal structures, which were used to generate the simulated density maps. Further, the assessment of the multiple protein density fitting technique to align the actual protein-protein interface residues correctly using a performance metric called F-measure showed there was a decrease in performance as the resolutions became poorer. Hence, based on evolutionary conservation scores as well as F-measure, the decrease in conservation scores or performance was found to be mainly due to the errors associated with the fitting process. Subsequently, a refinement methodology was designed involving the use of conservation scores, which improved the accuracy of the fitted models, and the same was observed in an experimental cryo-EM density test case of the RyR1-FKBP12 complex. Hence, the conservation information acts as an effective filter to distinguish the incorrectly fitted structures and improves the accuracy of the fitting of the protein structures in the density maps. Thus, one can incorporate the conserved surface residues information in the current density fitting tools to reduce ambiguity and improve the accuracy of the macromolecular assembly structures determined using cryo-EM. In the concluding chapter 7, the insights into the structural and mechanistic features of protein assemblies obtained from the use of computational techniques and integration of experimental datasets are discussed. In chapter 2, the modelling of a binary macromolecular complex such as the Gcn1-Gcn2 complex was performed using computational structure prediction strategies to understand the molecular basis of its interaction. 
Due to the potential inaccuracies which can exist in computational modelling, chapters 3 to 5 dealt with the use of integrative approaches, primarily guided by the cryo-EM map, in order to decipher the molecular architecture of the human SF3b complex in the closed and open forms as well as its contribution to branch point adenosine recognition. Based on the extensive experience gained in modelling of assemblies using cryo-EM data in the previous chapters, a new method has been proposed on the use of evolutionary conservation information to improve the accuracy of cryo-EM density based fitting. Hence, these studies have provided strategies for modelling macromolecular assemblies as well as a deeper understanding of their mechanistic features.
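The abstract above assesses fitted models with "a performance metric called F-measure" over interface residues, without giving its exact definition. The following is a minimal Python sketch, assuming the standard F1 form (harmonic mean of precision and recall) and assuming interface residues are identified by hypothetical (chain, residue number) pairs; the function name `interface_f_measure` is illustrative and not taken from the thesis.

```python
def interface_f_measure(predicted, actual):
    """F-measure (assumed here to be F1) between the interface residues
    recovered in a fitted model and those in the reference structure.
    Residues are any hashable identifiers, e.g. (chain, residue_number)."""
    predicted, actual = set(predicted), set(actual)
    true_pos = len(predicted & actual)
    if true_pos == 0:
        # No overlap (or an empty set): precision and recall are both zero
        return 0.0
    precision = true_pos / len(predicted)  # fraction of predicted residues that are correct
    recall = true_pos / len(actual)        # fraction of true interface residues recovered
    return 2 * precision * recall / (precision + recall)
```

For example, a fitted model that places four residues at the interface, three of which belong to a five-residue reference interface, has precision 3/4 and recall 3/5, giving an F-measure of 2/3.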
400

Analysis Of Structural And Functional Types Of Protein-Protein Interactions

Nambudiry Rekha, * 02 1900 (has links) (PDF)
No description available.
