21 |
Applications d'un alphabet structural pour l'analyse, la prédiction et la reconnaissance des repliements des protéines / Applications of a structural alphabet for protein structure analysis, prediction and fold recognition
Mahajan, Swapnil, 29 October 2013
Protein blocks (PBs) are a structural alphabet that gives a good approximation of the protein backbone and compresses 3D information into 1D. Their use has offered a new perspective on protein structure. This thesis explores new applications of PBs for protein structure analysis, structure prediction and fold recognition. First, we use PBs for a fine-grained characterization of the variable regions in structural alignments of homologous proteins. These regions can nevertheless show substantial conformational similarity, and characterizing them made it possible to distinguish them from regions whose conformations differ. We also show that the intrinsic variations of certain regions within a protein, such as loops, are not correlated with the conformational differences observed in the equivalent regions of homologous proteins. Second, we analyze the sequence-structure relationship in terms of PBs using a database of pentapeptides derived from protein structures. This database served as the basis for tools that predict the protein backbone (PB-kPRED) and its plasticity (PB-SVindex). We show how these predictions allow protein fold recognition with some success and the identification of probable structural and functional hotspots. Finally, we present a new algorithm (FoRSA) for protein fold recognition using PBs. This algorithm relies on computing the conditional probability that a sequence adopts a given fold, and it was successfully tested on proteins taken from CASP10. We show that FoRSA can be used for the rapid structural annotation of entire genomes.
/ Analysis of protein structures using structural alphabets has provided new insights into protein function and evolution. We have used a structural alphabet called protein blocks (PBs), which efficiently approximates the protein backbone and allows 3D protein structures to be abstracted into 1D PB sequences. This thesis describes applications of PBs for protein structure analysis, prediction and fold recognition. First, PBs were used to provide a refined view of structurally variable regions (SVRs) in homologous proteins, distinguishing conformationally similar from conformationally dissimilar SVRs; these were compiled into a database of structural alignments (DoSA). We also show that the inherent conformational variations in loop regions are not correlated with the corresponding conformational differences in their homologues. Second, to further analyze sequence-structure relationships in terms of PBs and other structural features, we set up a database of pentapeptides derived from protein structures. This served as the basis for knowledge-based prediction of local protein structure in terms of PB sequences (PB-kPRED) and of local structure plasticity (PB-SVindex). We demonstrate successful applications of PB-kPRED for fold recognition and explore the possible identification of structural and functional hotspots in proteins using PB-SVindex. Finally, an algorithm for fold recognition using a structural alphabet (FoRSA), based on calculating the conditional probability of sequence-structure compatibility, was developed. This new threading method was successfully benchmarked on a test dataset of CASP10 targets. We further demonstrate the application of FoRSA to fast structural annotation of genomes.
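The abstraction of a 3D backbone into a 1D PB string can be sketched as a nearest-prototype assignment. Protein blocks are commonly defined from the eight backbone dihedrals spanning residues i-2 to i+2, and each residue receives the block whose reference angles give the smallest root mean square deviation on angular values (RMSDA). This is a minimal sketch under stated assumptions: the two reference entries below are placeholders, not the published 16-block alphabet, and labelling the two residues at each terminus 'Z' is an illustrative choice.

```python
import math

# HYPOTHETICAL placeholder references (degrees), ordered
# psi(i-2), phi(i-1), psi(i-1), phi(i), psi(i), phi(i+1), psi(i+1), phi(i+2).
# The real alphabet has 16 blocks (a-p) with published reference angles.
REFERENCE_PBS = {
    "m": [-47.0, -57.0] * 4,   # placeholder, roughly helical
    "d": [130.0, -120.0] * 4,  # placeholder, roughly extended
}


def angular_diff(a, b):
    """Smallest signed difference between two angles, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def rmsda(window, reference):
    """Root mean square deviation on angular values between two 8-angle vectors."""
    squares = [angular_diff(x, y) ** 2 for x, y in zip(window, reference)]
    return math.sqrt(sum(squares) / len(squares))


def assign_pbs(phi, psi, references=REFERENCE_PBS):
    """Return one PB letter per residue; terminal residues lack a full window."""
    n = len(phi)
    letters = []
    for i in range(n):
        if i < 2 or i > n - 3:
            letters.append("Z")  # not enough neighbours for an 8-angle window
            continue
        window = [psi[i - 2], phi[i - 1], psi[i - 1], phi[i],
                  psi[i], phi[i + 1], psi[i + 1], phi[i + 2]]
        best = min(references, key=lambda pb: rmsda(window, references[pb]))
        letters.append(best)
    return "".join(letters)
```

For example, `assign_pbs([-57.0] * 7, [-47.0] * 7)` gives `"ZZmmmZZ"` with these placeholder references. Wrapping angle differences into [-180, 180) matters because -170° and 170° are only 20° apart.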
|
22 |
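ESTUDOS ESTRUTURAIS DE PROTEÍNAS: INIBIDOR DE ALFA-AMILASE DE Secale cereale, GTPase YchF E ENOLASE DE Trypanosoma cruzi / Structural studies of proteins: alpha-amylase inhibitor from Secale cereale, GTPase YchF and enolase from Trypanosoma cruzi
Villalba, Cibeli May Arévalos, 12 March 2012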
Previous issue date: 2012-03-13 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Proteins are the most abundant biomolecules in living organisms and are present in all parts of the cell. They perform many different functions, and studying their structures is important because it gives greater insight into these functions and helps us understand how proteins interact with each other and with other molecules. Protein structures can be studied experimentally, especially by X-ray diffraction, and computationally by homology modelling. In this work, structural studies were carried out on the alpha-amylase inhibitor from rye (Secale cereale), which inhibits amylase activity and could therefore be used in the treatment of diabetes mellitus and obesity and in pest control, among other applications. Two different inhibitors, A2 and B2, were separated by chromatography and crystallized, but the crystals did not reach the minimum quality required for X-ray diffraction. A structural study was therefore carried out with diffraction data previously collected from a twinned crystal, since current refinement programs can now handle such data. The structure was refined and compared with the 0.19 alpha-amylase inhibitor from wheat. Structural studies were also carried out on the YchF GTPase and the enolase from Trypanosoma cruzi, both of which are being studied as potential targets for the treatment of Chagas' disease. Attempts were first made to express both proteins heterologously and purify them for crystallization trials; since these were unsuccessful, computational work was pursued instead, in which alignments and homology models were produced for both proteins. The computational work was continued for T. cruzi enolase, which was compared with Homo sapiens enolase in order to search for and design inhibitors of the parasite enzyme through literature and database searches. These compounds were then docked, and the most favorable binding energies were obtained for the substrates phosphoenolpyruvate (PEP) and 2-phosphoglycerate (PG2), for the inhibitor phosphonoacetohydroxamate (PAH) and for the compound ZINC25695689 from the ZINC (ZINC Is Not Commercial) database. Starting from the experimental position of the PAH inhibitor (deposited in the PDB, code 2PTZ), the interaction energies of the retrieved and designed molecules were then estimated with the AMBER molecular dynamics program; the results suggest that a chlorine atom attached at a suitable position on the inhibitor could improve the interaction energy.
/ Proteins are the most abundant biomolecules in living beings and are present in every part of a cell. They have different functions in the organism; studying their structure is important because it brings greater knowledge of their functions and makes it possible to understand how they interact with one another and with other molecules. Protein structure can be studied experimentally, mainly by X-ray diffraction, and computationally, by homology modelling. Accordingly, this work carried out structural studies on the alpha-amylase inhibitor from rye (Secale cereale), which inhibits amylase activity and can be used in the treatment of diabetes mellitus, obesity and pest control, among other uses. Two different inhibitors, named A2 and B2, were separated by chromatography and crystallized, but they did not show minimal X-ray diffraction quality. A structural study was therefore carried out with diffraction data obtained earlier for a twinned crystal, given that current refinement programs can now handle this problem. The structure was refined and compared with the 0.19 alpha-amylase inhibitor from wheat. Next, structural studies were also carried out on the YchF protein of the GTPase family and on the enolase of Trypanosoma cruzi; both are being studied as possible targets for the treatment of Chagas disease. Initially, attempts were made to express them heterologously and purify them for crystallization assays; as these failed, computational work was undertaken, in which alignments and homology models were produced for the two proteins. The computational work was continued for Trypanosoma cruzi enolase, comparing it with Homo sapiens enolase in order to search for and design inhibitors of the former through literature and database searches; these compounds were then docked, yielding more favorable binding energies for the substrates phosphoenolpyruvate (PEP) and 2-phosphoglycerate (PG2), for the inhibitor phosphonoacetohydroxamate (PAH) and for the compound coded ZINC25695689 from the ZINC (ZINC Is Not Commercial) database.
In addition, starting from the experimental position of the PAH inhibitor (deposited in the PDB, code 2PTZ), the interaction energies of the searched and designed molecules were estimated with the AMBER molecular dynamics program; apparently, a chlorine atom suitably bonded to the inhibitor could improve the interaction energy.
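The idea of comparing ligand-receptor interaction energies can be illustrated with a minimal sketch. It assumes a generic nonbonded model (a 12-6 Lennard-Jones term plus Coulomb electrostatics, with Lorentz-Berthelot combining rules). This resembles the form of force-field nonbonded terms but is not AMBER's own parameterization or workflow, and real AMBER-based estimates usually also account for solvation. The `Atom` class and all parameter values are hypothetical.

```python
import math
from dataclasses import dataclass

COULOMB_CONST = 332.0636  # converts e^2/Angstrom to kcal/mol


@dataclass
class Atom:
    xyz: tuple    # (x, y, z) in Angstrom
    q: float      # partial charge in units of e
    sigma: float  # Lennard-Jones sigma in Angstrom (hypothetical values)
    eps: float    # Lennard-Jones well depth in kcal/mol (hypothetical values)


def pair_energy(a, b, r, dielectric=1.0):
    """12-6 Lennard-Jones plus Coulomb energy (kcal/mol) for one atom pair at distance r."""
    sigma = 0.5 * (a.sigma + b.sigma)   # Lorentz rule
    eps = math.sqrt(a.eps * b.eps)      # Berthelot rule
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * eps * (sr6 * sr6 - sr6)
    coulomb = COULOMB_CONST * a.q * b.q / (dielectric * r)
    return lennard_jones + coulomb


def interaction_energy(ligand, receptor, cutoff=12.0, dielectric=1.0):
    """Sum of pair energies between ligand and receptor atoms within the cutoff."""
    total = 0.0
    for la in ligand:
        for ra in receptor:
            r = math.dist(la.xyz, ra.xyz)
            if r <= cutoff:
                total += pair_energy(la, ra, r, dielectric)
    return total
```

In use, one would compute `interaction_energy` for a parent ligand and for an analogue carrying an extra atom (for instance, a chlorine) in the same receptor pose and compare the two values. A more negative value indicates a more favorable interaction within this simplified model only.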
|
23 |
Computational Intelligence Based Classifier Fusion Models for Biomedical Classification Applications
Chen, Xiujuan, 27 November 2007
The generalization ability of machine learning algorithms often depends on their initialization, parameter settings, training sets or feature selection. For instance, SVM classifier performance largely depends on whether the selected kernel function suits the application data. To enhance the performance of individual classifiers, this dissertation proposes classifier fusion models that use computational intelligence techniques to combine different classifiers. The first fusion model, called T1FFSVM, combines multiple SVM classifiers by constructing a fuzzy logic system. T1FFSVM can be improved by tuning the membership functions (MFs) of its linguistic variables with genetic algorithms; the improved model is called GFFSVM. To better handle the uncertainties in the fuzzy MFs and in the classification data, T1FFSVM can also be extended with type-2 fuzzy logic into a type-2 fuzzy classifier fusion model (T2FFSVM). T1FFSVM, GFFSVM and T2FFSVM use accuracy as the classifier performance measure. AUC (the area under the ROC curve) has been shown to be a better classifier performance metric, so AUC-based classifier fusion models are also proposed in the dissertation as a comparison study. Experiments on biomedical datasets show promising performance for the proposed classifier fusion models compared with the individual classifiers they combine, as well as better performance than many existing classifier fusion methods. The dissertation also uses machine learning and classifier fusion methods to study an interesting phenomenon in the biology domain: how protein structures and sequences are related to each other.
The experiments show that protein segments with similar structures also share similar sequences. This adds new insight to the existing knowledge on the relation between protein sequences and structures, namely that similar sequences share high structural similarity but similar structures may not share high sequence similarity.
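The two performance notions contrasted above can be made concrete with a short sketch. AUC is computed as the probability that a randomly chosen positive example is scored above a randomly chosen negative one (the Mann-Whitney formulation, with ties counted as one half). Fusion is done with a simple weighted average of classifier scores. The averaging rule and the idea of using per-classifier AUC values as weights are illustrative assumptions. This is not the fuzzy-logic fusion (T1FFSVM, GFFSVM, T2FFSVM) proposed in the dissertation.

```python
def auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def fuse_mean(score_lists, weights=None):
    """Weighted mean of per-classifier scores for each sample.

    score_lists: one list of scores per classifier, all of equal length.
    weights: optional per-classifier weights (e.g. validation AUCs, an assumption).
    """
    k = len(score_lists)
    if weights is None:
        weights = [1.0] * k
    total_w = sum(weights)
    n_samples = len(score_lists[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, score_lists)) / total_w
        for i in range(n_samples)
    ]
```

Usage: compute `auc` for each classifier on held-out data, pass those values as `weights` to `fuse_mean`, then evaluate the fused scores with `auc` again. Unlike accuracy, AUC does not depend on choosing a decision threshold, which is one reason it is often preferred for comparing classifiers.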
|
24 |
Analysis Of Protein Evolution And Its Implications In Remote Homology Detection And Function Recognition
Gowri, V S, 10 1900
One of the major outcomes of a genome sequencing project is the availability of the amino acid sequences of all the proteins encoded in the genome of the organism concerned. However, for a substantial proportion of these proteins, no information on function is commonly available, either from experimental studies or by inference from homology with a protein of known function. Even when the general function of a protein is known, the region responsible for that function might be a single domain, and the protein may contain additional regions of considerable length with no known function. In such cases the functional information is incomplete.
Lack of understanding of the repertoire of functions of the proteins encoded in a genome limits the utility of genomic data. While many experimental approaches are available for deciphering protein function at the genomic scale, bioinformatics approaches are a good first step in obtaining clues about protein function at that scale (Koonin et al., 1998). One common bioinformatics approach is recognition of function by homology (Bork et al., 1994). If an evolutionary relationship can be established between two proteins, one of known function and the other of unknown function, it raises the possibility that they share a common function and 3-D structure (Bork and Gibson, 1996). While this approach is effective, its utility is limited by the ability of bioinformatics methods to identify related proteins whose evolutionary divergence is high, since such divergence leads to amino acid sequence similarity as low as that typical of unrelated proteins (Bork and Koonin, 1998). Use of 3-D structural information, obtained by predictive methods such as fold recognition, has offered ways to increase the sensitivity of remote homology detection (e.g., Kelley et al., 2000; Shi et al., 2001; Gough et al., 2001).
The general objectives of the work in this thesis are to analyse the evolution of structural features and functions in protein families, and to design and apply new bioinformatics approaches for recognizing distantly related proteins. After an introductory chapter, several chapters report analyses of the functional and structural features of homologous protein domains. Further chapters report the development and assessment of new remote homology detection approaches and their application to the proteins encoded in two protozoan organisms. A further chapter analyses the proteins involved in methylglyoxal detoxification pathways in kinetoplastid organisms.
Chapter 1 of the thesis presents a brief introduction, based on the literature, to protein structures, their classification, methods for structure comparison, popular methods for remote homology detection and homology-based methods for function annotation.
Chapter 2 describes the update of, and improvements made to, the PALI database. In addition to the update, the domain structural families are integrated with homologous sequences from sequence databases. Thus, every family in PALI is enriched with a substantial volume of sequence information from proteins with no known structure.
Chapter 3 reports investigations on the inter-relationships between sequence, structure and functions of closely-related homologous enzyme domain families.
Chapter 4 investigates unusual differences in the lengths of closely related homologous protein domains, how the additional lengths are accommodated in protein 3-D structures, and their functional implications.
Chapter 5 reports the development and assessment of a new approach for remote homology detection using dynamic multiple profiles of homologous protein domain families.
Chapter 6 describes another remote homology detection approach, which uses multiple static profiles generated from the bona fide members of each family. A rigorous assessment of the approach, and strategies for improving the detection of distant homologues with the multiple-profile approach, are discussed in this chapter.
Chapter 7 describes results of searches made in the database of multiple family profiles (MulPSSM database) in order to recognize the functions of hypothetical proteins encoded in two parasitic protozoa.
Chapter 8 describes sequence and structural analyses of two glyoxalase pathway proteins from the kinetoplastid organism Leishmania donovani, which causes leishmaniasis. An alternative enzyme that probably substitutes for the glyoxalase pathway enzymes in certain kinetoplastid organisms lacking them is also discussed.
Chapter 9 summarises the important findings from the various analyses discussed in this thesis.
The Appendix describes an analysis of the correlation between a measure of hydrophobicity of the amino acid residues aligned in a multiple sequence alignment and residue depth in 3-D protein structures.
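The multiple-profile idea behind the remote homology searches described above can be sketched in a few steps. Build one log-odds position-specific scoring matrix (PSSM) per family alignment, score a query against each profile, and keep the best score for the family. This is a minimal ungapped sketch that assumes a uniform background distribution and a fixed pseudocount. Profiles from tools such as PSI-BLAST use gapped alignment, sequence weighting and more elaborate pseudocounts, and none of the function names below come from the MulPSSM work.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Log2-odds PSSM from an ungapped alignment (list of equal-length strings).

    Assumes a uniform background frequency of 1/20 for every amino acid.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_window_score(pssm, query):
    """Highest ungapped profile score over all windows of the query."""
    length = len(pssm)
    best = -math.inf
    for start in range(len(query) - length + 1):
        score = sum(pssm[i].get(query[start + i], 0.0) for i in range(length))
        best = max(best, score)
    return best


def search_family(profiles, query):
    """Family score = best score over the family's multiple profiles."""
    return max(best_window_score(p, query) for p in profiles)
```

Keeping several profiles per family, each derived from a different member, lets a distant homologue match whichever member it happens to resemble most. This is the intuition behind searching multiple profiles rather than a single consensus.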
|
25 |
Rapid Determination of Protein Structures in Solution Using NMR Dipolar Couplings / Schneller Proteinstrukturbestimmung in Lösung mittels NMR detektierter dipolarer Kopplungen
Jung, Young-Sang, 26 January 2005
No description available.
|
26 |
Three-dimensional protein structure determination by high-resolution solid-state NMR spectroscopy / Dreidimensionale Proteinstrukturbestimmung mit Hilfe von hochaufgelöster Festkörper-NMR-Spektroskopie
Lange, Adam, 18 April 2006
No description available.
|
27 |
Rapid Determination of High-Resolution Protein Structures by Solution and Solid-state NMR Spectroscopy / Beschleunigung der Bestimmung von hochaufgelösten Lösungs- und Festkörper-NMR Strukturen
Korukottu, Jegannath, 22 January 2008
No description available.
|
28 |
Analysis of Molecular Dynamics Trajectories of Proteins Performed using Different Forcefields and Identification of Mobile Segments
Katagi, Gurunath M, January 2013
The selection of the forcefield is a crucial issue in any MD-related work, and there is no clear indication of which of the many available forcefields is best for protein analysis. Many recent literature surveys indicate that MD work may be hindered by two limitations: conformational sampling and the forcefield used (inaccuracies in the potential energy function may bias a simulation toward incorrect conformations). However, advances in computing infrastructure and in the theoretical and computational aspects of MD have made sampling on sufficiently long time scales feasible, which shifts the emphasis to forcefield accuracy. Because MD results are known to differ between forcefields, we asked how common mobile segments of a protein could be identified by analysing trajectories obtained with three forcefields in a similar environment. This matters because fluctuations appear to differ more in flexible regions than in rigid ones, and flexible regions are particularly relevant to the functional activity of the protein molecule. We therefore assessed the similarity of the dynamics obtained with three well-known forcefields, ENCAD, CHARMM27 and AMBER ff99SB, for 61 monomeric proteins, and identified the properties of dynamic residues that may be important for function. Comparing popular forcefields with different parameterization philosophies may give hints for resolving some existing issues in forcefields and for characterizing mobile regions based on the dynamics of proteins with diverse folds. It may also reveal dynamic signatures related to protein function, which can be used in protein engineering studies.
Nanosecond-scale MD simulations (30 ns) of 61 monomeric proteins were carried out with the CHARMM and AMBER forcefields, and the corresponding ENCAD trajectories were obtained from the Dynameomics database. The trajectories were first analysed, using a few parameters in each case, to check whether the structural and dynamic properties from the three forcefields were similar. The gross dynamic properties calculated (root mean square deviation (RMSD), TM-score-derived RMSD, radius of gyration and accessible surface area) were similar for many proteins. A flexibility index analysis of 17 proteins that showed a notable difference in flexibility indicated that tertiary interactions (the fraction of nonnative stable hydrogen bonds and salt bridges) might be responsible for the difference. The normalized subspace overlap and shape overlap scores, computed from the covariance matrices derived from the trajectories, fell between 0.3 and 0.5 for most proteins, indicating that the first principal components obtained with different forcefield combinations may not match well. Thus, although dynamic properties are generally similar for many proteins, the flexibility index and the normalized subspace overlap score indicate that the subspaces spanned by the first principal components do not match completely in many proteins. More proteins show a good correlation for the CHARMM-AMBER combination than for the other two combinations.
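One widely used way to compare the essential subspaces of two trajectories is the root mean square inner product (RMSIP) between their top principal components. The sketch below illustrates the general idea of a subspace overlap score. The thesis's normalized subspace overlap may be defined differently. The sketch assumes the frames are already superposed and stored as an array of shape (n_frames, n_atoms, 3).

```python
import numpy as np


def principal_components(traj, k):
    """Top-k eigenvectors of the positional covariance matrix.

    traj: array of shape (n_frames, n_atoms, 3), assumed already superposed.
    Returns an array of shape (3 * n_atoms, k) whose columns are eigenvectors.
    """
    x = traj.reshape(traj.shape[0], -1)
    x = x - x.mean(axis=0)
    cov = x.T @ x / (x.shape[0] - 1)
    vals, vecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]      # indices of the k largest
    return vecs[:, order]


def rmsip(traj_a, traj_b, k=10):
    """Root mean square inner product of two top-k subspaces (0 = orthogonal, 1 = identical)."""
    va = principal_components(traj_a, k)
    vb = principal_components(traj_b, k)
    overlap = va.T @ vb                     # k x k matrix of dot products
    return float(np.sqrt(np.sum(overlap ** 2) / k))
```

Because eigenvector signs are arbitrary, the overlap is built from squared dot products, so a sign flip in one trajectory's components does not change the score.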
The structural features of the trajectories were computed in terms of the fraction of secondary structure, hydrogen bonds, salt bridges and native contacts. Although secondary structures and native contacts are well preserved during the simulations, tertiary interactions (hydrogen bonds) are lost in many proteins, which may be responsible for the differences in some properties among forcefields. Comparison of the simulation results with experimental structures in terms of root mean square fluctuation, accessible surface area and radius of gyration indicates that the simulation results are on par with those derived from the experimental structures.
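The fraction of native contacts mentioned above can be computed as the share of residue pairs that are in contact in the reference structure and remain in contact in a given frame. This sketch assumes one representative atom per residue (for example, Cα), an 8 Å cutoff and a minimum sequence separation of 3. These are illustrative choices, not parameters taken from the thesis.

```python
import math
from itertools import combinations


def contact_set(coords, cutoff=8.0, min_seq_sep=3):
    """Residue pairs (i, j), i < j, whose representative atoms lie within cutoff (Angstrom)."""
    return {
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if j - i >= min_seq_sep and math.dist(coords[i], coords[j]) <= cutoff
    }


def fraction_native_contacts(native_coords, frame_coords, cutoff=8.0, min_seq_sep=3):
    """Fraction of native contacts that are still within the cutoff in one frame."""
    native = contact_set(native_coords, cutoff, min_seq_sep)
    if not native:
        raise ValueError("reference structure has no contacts under these settings")
    kept = sum(
        1 for i, j in native
        if math.dist(frame_coords[i], frame_coords[j]) <= cutoff
    )
    return kept / len(native)
```

Applying the function to every frame gives a time series. A value that stays near 1 indicates a preserved fold, while a steady decline flags the kind of loss of tertiary interactions described above.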
We assessed flexibility in the proteins using the normalized root mean square fluctuation (nRMSF), which for a residue is the ratio of its RMSF from simulation to that from the crystal structure. A threshold on nRMSF, chosen on the basis of secondary structure analysis, was used to indicate mobile regions in a protein. Based on this nRMSF threshold and on conformational properties (deviations in dihedral angles), we classified the residues and evaluated the properties of rigid hinge residues and the corresponding mobile residues in terms of residue propensity, secondary structure preference and accessible surface area ranges. Since the rigid dynamic residues represent the inherent mobility, they might be important for function. We therefore assessed functional relevance by comparing the mobile residues identified for each protein in each forcefield simulation with residues known to be important for function (taken from the literature and databases). Some residues found to be mobile in the simulations match experimentally identified ones, although in many cases the simulations identify more mobile residues than the experiments do.
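The nRMSF criterion can be sketched as follows. The source does not state how the crystal-structure RMSF was obtained. Here it is assumed to come from isotropic B-factors via the standard relation B = (8π²/3)·RMSF². The default threshold of 2.0 is a hypothetical placeholder for the secondary-structure-based threshold chosen in the work.

```python
import math


def rmsf(frames):
    """Per-atom RMSF (Angstrom) from a list of frames, each a list of (x, y, z).

    Frames are assumed to be superposed onto a common reference already.
    """
    n_frames = len(frames)
    result = []
    for atom in range(len(frames[0])):
        mean = [sum(f[atom][d] for f in frames) / n_frames for d in range(3)]
        msd = sum(
            sum((f[atom][d] - mean[d]) ** 2 for d in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


def rmsf_from_bfactor(b):
    """RMSF estimate from an isotropic B-factor, using B = (8 * pi^2 / 3) * RMSF^2."""
    return math.sqrt(3.0 * b / (8.0 * math.pi ** 2))


def mobile_residues(sim_rmsf, bfactors, threshold=2.0):
    """Return (indices with nRMSF above threshold, list of nRMSF values)."""
    nrmsf = [s / rmsf_from_bfactor(b) for s, b in zip(sim_rmsf, bfactors)]
    mobile = [i for i, value in enumerate(nrmsf) if value > threshold]
    return mobile, nrmsf
```

Normalizing by the crystallographic fluctuation helps separate residues that are genuinely more mobile in simulation from those that are simply flexible in the crystal as well.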
In summary, an analysis of protein simulation trajectories obtained with three forcefields for a set of monomeric proteins has shown that the gross structural properties and secondary structures of many proteins remain similar, although the flexibility index reveals differences. The correlation between parameters from the CHARMM and AMBER forcefields is better than for the other two combinations. The differences seen in some structural properties may arise mainly from the loss of a few tertiary interactions, as indicated by the fraction of native hydrogen bonds and salt bridges. Mobile segments were identified from the simulations on the basis of nRMSF, and some of them match experimentally identified functionally important residues.
Our work indicates that the properties obtained from simulations with different forcefields still differ in some respects, so care must be exercised when choosing a forcefield, especially when assessing functionally relevant residues from simulations.
|
29 |
Oxylipinstoffwechsel in Physcomitrella patens / Oxylipin metabolism in Physcomitrella patens
Sauer, Kristin, 06 July 2010
In this dissertation, enzymes of oxylipin metabolism in P. patens were characterized functionally and structurally. For this purpose, the bifunctional PpLOX1 and two AOCs (PpAOC1 and PpAOC2) were selected. These enzymes were characterized with respect to function, activity and structure using various biochemical, bioinformatic and biophysical methods. Furthermore, after successful crystallization of PpAOC1 and PpAOC2, high-resolution X-ray crystal structures of both enzymes were solved in the ground state and in complex with substrate analogues. For PpAOC2, two different binding modes of the ligand were observed. The influence of the residues Arg-345, Arg-638 and Tyr-851 on the reaction mechanism of PpLOX1 was investigated by site-directed mutagenesis and subsequent analysis of the products formed by the resulting variants. No significant differences were found in the conversion of various fatty acids by the wild-type enzyme and the variants R345L and R638L. In contrast, the double variant R345L/R638L formed a greatly reduced amount of products. Thus, the presence of at least one of these two positively charged residues appears to be important for substrate conversion. The negatively charged carboxylate group of the fatty acid may be bound through electrostatic interactions with Arg-345 or Arg-638. The Y851I variant formed smaller amounts of 12-ODTE and keto fatty acids, and less product overall, than the wild-type enzyme, so this residue also appears to be involved in catalysis. However, since the Y851F variant even produced an increased proportion of 12-ODTE, it appears to be the bulky, hydrophobic aromatic ring, rather than the hydroxyl group of the tyrosine, that is important. The purified enzymes PpAOC1 and PpAOC2 were used in activity assays with various C20 fatty acid hydroperoxides.
Both enzymes were active towards the 15-hydroperoxides of EPA and ETA, but not towards that of AA. In addition, PpAOC2, but not PpAOC1, is active towards the 12-hydroperoxides derived from AA, EPA and DGLA. Besides 11-OPTA, previously undescribed cyclic compounds were formed, whose chemical structures were elucidated by ESI-MS/MS fragmentation. In the present studies of PpAOC1 and PpAOC2, the glutamate at position 18 was replaced in each enzyme by glutamine or aspartate. The conserved glutamate residue and its negatively charged carboxylate group were shown to be essential for catalysis in both enzymes. In contrast, the R22L variant affected activity only in PpAOC2. In the active site of PpAOC1, two water molecules are coordinated by four amino acid residues, whereas in PpAOC2 one water molecule is bound by two amino acid residues. The extent to which these water molecules participate in catalysis has not yet been clearly established.
|
30 |
Structural and Mechanistic Features of Protein Assemblies with Special Reference to Spliceosome
Rakesh, Ramachandran, January 2016
Macromolecular assemblies such as the ribosome, the spliceosome and polymerases are essential for cellular function. The current molecular-level understanding of these important machineries and of many other assemblies is poor. The lack of structural data for many macromolecular assemblies is a further bottleneck in understanding cellular processes and various disease manifestations. Hence, it is essential to characterize the structures and molecular architectures of these assemblies.
Although the number of 3-D structures of individual proteins or domains in the Protein Data Bank (PDB) is growing, the number of deposited structures of macromolecular assemblies remains relatively small. Hence, in addition to experimental techniques for characterizing macromolecular assembly structures, computational techniques can help supplement the growth of structural data on these assemblies. This thesis deals with integrative approaches, in which computational methods are combined with experimental data to model and understand the mechanistic features of macromolecular assemblies, with a special focus on a sub-complex of the spliceosome machinery.
Chapter 1 of this thesis provides an introduction to protein-protein interactions and macromolecular assemblies. The modelling of macromolecular assemblies using integrative methods is then discussed, followed by an introduction to the spliceosome machinery.
In chapter 2, modelling studies were performed on the proteins involved in the general amino acid control mechanism, which is triggered in yeast under amino acid starvation. The proteins studied were Gcn1, a ribosome-binding protein, and the RWD-domain-containing proteins Gcn2, Yih1, Gir2 and Mtc5. Laboratory experiments have shown that activation of Gcn2, an eIF2α kinase, requires its RWD domain to bind to Gcn1, and that residue Arg-2259 is important for this interaction. As no 3-D structure is currently available for the Gcn1 region containing Arg-2259, its structure was inferred using fold recognition and comparative modelling. Further, to understand the molecular interaction between the Gcn2 RWD domain and Gcn1, a complex structure was inferred using a restrained protein-protein docking procedure. As Yih1 and Gir2 are known to bind to Gcn1 through their RWD domains, the structures of the RWD-domain-containing proteins, including Mtc5, were first inferred using an NMR structure of the Gcn2 RWD domain. Additionally, the Gcn1-Gcn2 complex was used to build a set of complexes to explain the binding of the other RWD-domain-containing proteins Yih1, Gir2 and Mtc5. The important molecular interactions were identified by analysing the interacting residues in these complexes. Thus, a molecular-level model of the Gcn1-Gcn2 interaction has been proposed for the first time. Future experiments guided by the protein-protein complex models and the proposed set of mutations should clarify the critical molecular interactions involved in the general amino acid control mechanism.
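A restrained docking step like the one described above is often accompanied by a filter that keeps only poses satisfying an experimental distance restraint, such as the requirement that Arg-2259 of Gcn1 lies at the interface. The sketch below is hypothetical. The `Pose` structure, the 5 Å cutoff and the convention that lower docking scores are better are assumptions, not details of the docking procedure used in the thesis.

```python
import math
from typing import NamedTuple


class Pose(NamedTuple):
    name: str
    score: float             # docking score; lower is assumed to be better
    restrained_atoms: list   # (x, y, z) of the restrained residue, e.g. Arg-2259
    partner_atoms: list      # (x, y, z) of the partner domain, e.g. an RWD domain


def min_distance(atoms_a, atoms_b):
    """Smallest atom-atom distance between two coordinate lists (Angstrom)."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def filter_and_rank(poses, max_dist=5.0):
    """Keep poses that satisfy the distance restraint, best score first."""
    kept = [
        p for p in poses
        if min_distance(p.restrained_atoms, p.partner_atoms) <= max_dist
    ]
    return sorted(kept, key=lambda p: p.score)
```

Applying the restraint as a post-filter is the simplest option. Many docking programs can instead include restraints directly in the search or scoring, which avoids discarding most of the sampled poses.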
Chapter 3 describes an integrative approach used to derive a pseudo-atomic model of the closed form of the human SF3b complex. SF3b is a multi-protein complex with seven components: p14, SF3b49, SF3b155, SF3b145, SF3b130, SF3b14b and SF3b10. It recognizes the branch point adenosine in the pre-mRNA as part of the U2 snRNP or the U11/U12 di-snRNP in the spliceosome. Although the cryo-EM map of the human SF3b complex has been available for more than a decade, the structures and relative spatial arrangement of all its components are not yet known. The integrative modelling approach used here combined available X-ray and NMR structures, fold recognition and comparative modelling, and currently available experimental datasets with the cryo-EM density map to produce a model with high structural coverage. The resulting molecular architecture of the closed form of human SF3b can provide insights into how SF3b functions in splicing, and it might also help future efforts to determine a high-resolution structure of the entire human spliceosome machinery.
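Fitting component structures into a cryo-EM density map is commonly scored by a cross-correlation coefficient between the experimental map and a density simulated from the atomic model. This sketch simulates density by placing a Gaussian on each atom and computes a mean-centred cross-correlation. The Gaussian width (0.225 times the resolution), the unit atom weights and the grid conventions are assumptions, and conventions differ between fitting programs.

```python
import numpy as np


def simulate_density(coords, grid_shape, voxel_size, resolution, sigma_factor=0.225):
    """Sum of unit-weight Gaussians, one per atom, sampled on a regular grid.

    coords: (n_atoms, 3) in Angstrom, with the grid origin at (0, 0, 0).
    sigma = sigma_factor * resolution is an assumed width convention.
    """
    sigma = sigma_factor * resolution
    grid = np.indices(grid_shape).reshape(3, -1).T * voxel_size  # voxel centres, (n_voxels, 3)
    density = np.zeros(grid.shape[0])
    for c in np.asarray(coords, dtype=float):
        d2 = np.sum((grid - c) ** 2, axis=1)
        density += np.exp(-d2 / (2.0 * sigma ** 2))
    return density.reshape(grid_shape)


def cross_correlation(map_a, map_b, mask=None):
    """Mean-centred (Pearson-style) cross-correlation of two maps of equal shape."""
    a = np.asarray(map_a, dtype=float)
    b = np.asarray(map_b, dtype=float)
    if mask is not None:
        a, b = a[mask], b[mask]
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0
```

At lower resolutions the simulated Gaussians become broader, so small misplacements of a component change the correlation less. This is one reason fits in low-resolution maps are harder to discriminate.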
In chapter 4, the molecular architecture of the closed form of the SF3b complex obtained with the integrative modelling approach (Chapter 3) is discussed extensively. Structure-function relationships for some of the SF3b components are also described on the basis of the pseudo-atomic model. In addition, the extreme flexibility of some SF3b components is examined through dynamics analysis. Further, using an existing U11/U12 di-snRNP cryo-EM map and the pseudo-atomic model of the closed form, an open form of the SF3b complex was modelled and the component structures were fitted into it. The transition between the closed and open forms was found to be caused primarily by a flap containing the HEAT repeat protein SF3b155. This protein is also known to harbour cancer-causing mutations, which have the potential to affect the closed-to-open transition as well as the structure and stability of the SF3b complex. This provides a framework for future understanding of the closed-to-open transition in SF3b function within the spliceosome.
Chapter 5 builds on the integrative modelling work that proposed the molecular architecture of the closed form of the human SF3b complex (Chapter 3) and an open form of SF3b, derived from a flap opening of the closed form, that might help accommodate RNA and other trans-acting factors within the U11/U12 di-snRNP (Chapter 4). This chapter studies the open form of SF3b and its interaction with RNA elements. The 5' end of U12 snRNA and its interaction with the pre-mRNA in the branch point duplex were modelled using the open form of SF3b to provide the necessary structural constraints, and the resulting RNA model is topologically consistent with existing biochemical data. Further, using the SF3b open form-RNA model and existing experimental knowledge, the chapter discusses in detail how the architecture of SF3b acts as a scaffold for formation of the U12 snRNA:pre-mRNA branch point duplex, and the potential implications for the fidelity of branch point adenosine recognition. The reasons for describing SF3b as a "fuzzy" complex, that is, a complex with highly flexible folded regions along with intrinsically disordered regions, are also discussed. The current work thus adds to the excellent developments made previously and deepens the understanding of the structure-function relationship of the human SF3b complex in the context of the spliceosome machinery.
In Chapter 6, a methodology is proposed that uses the evolutionary conservation of protein-protein interface residues in cryo-EM density-based fitting of multiple protein components into low-resolution density maps of multi-protein assemblies. The methodology was first tested on a dataset of simulated density maps generated at four resolutions: 10, 15, 20 and 25 Å. When evolutionary conservation scores obtained from multiple sequence alignments were used to score the fitted complexes, these scores were found to be lower than those of the crystal structures used to generate the simulated density maps. Further, assessing how correctly the multiple-protein density fitting aligned the actual protein-protein interface residues, using a performance metric called the F-measure, showed that performance decreased as the resolution became poorer. Based on both the evolutionary conservation scores and the F-measure, the decrease in conservation scores and performance was found to be mainly due to errors associated with the fitting process.
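The F-measure mentioned above is, in its standard form, the harmonic mean of precision P and recall R, F = 2PR / (P + R). A minimal sketch of how it could be computed for interface residues follows, assuming (as an illustration only) that residues are identified by hypothetical (chain, residue number) pairs and that the interface of the fitted model is compared against the interface of the reference crystal structure. The abstract does not state how interface residues were defined, so this is not the thesis's exact procedure.

```python
def interface_f_measure(predicted, actual):
    """F-measure (F1) between the interface residues of a fitted model
    (predicted) and those of the reference structure (actual).

    Residues are assumed to be hashable IDs such as (chain, resnum) tuples.
    """
    predicted, actual = set(predicted), set(actual)
    true_pos = len(predicted & actual)  # interface residues recovered correctly
    if true_pos == 0:
        return 0.0  # also covers an empty prediction
    precision = true_pos / len(predicted)
    recall = true_pos / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 2 of 3 predicted residues are correct (P = 2/3),
# and 2 of 4 reference residues are recovered (R = 1/2), so F = 4/7.
actual = {("A", 10), ("A", 11), ("A", 12), ("B", 5)}
predicted = {("A", 10), ("A", 11), ("B", 7)}
f = interface_f_measure(predicted, actual)
```

Using the harmonic mean penalizes a fit that achieves high recall only by over-predicting interface residues, which is why it suits this kind of comparison.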
Subsequently, a refinement methodology using conservation scores was designed, which improved the accuracy of the fitted models; the same improvement was observed for an experimental cryo-EM density test case, the RyR1-FKBP12 complex. Conservation information therefore acts as an effective filter for distinguishing incorrectly fitted structures and improves the accuracy of fitting protein structures into density maps. Information on conserved surface residues could thus be incorporated into current density fitting tools to reduce ambiguity and improve the accuracy of macromolecular assembly structures determined by cryo-EM.
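The idea of conservation acting as a filter on candidate fits can be illustrated with a minimal sketch. It assumes, hypothetically, that each candidate fit is summarized by its set of interface residues, that a per-residue conservation score is available in which higher means more conserved, and that fits are ranked by the mean score over their interface and filtered with an arbitrary cutoff. The names `interface_conservation`, `rank_fits` and `filter_fits` are illustrative; the abstract does not describe the actual refinement scoring.

```python
from statistics import mean


def interface_conservation(interface_residues, conservation):
    """Mean conservation score over one fit's interface residues.

    Assumes higher scores mean more conserved; residues without a score
    are skipped, and a fit with no scored residues gets 0.0.
    """
    scores = [conservation[r] for r in interface_residues if r in conservation]
    return mean(scores) if scores else 0.0


def rank_fits(fits, conservation):
    """Return fit IDs sorted from most to least conserved interface."""
    return sorted(
        fits,
        key=lambda name: interface_conservation(fits[name], conservation),
        reverse=True,
    )


def filter_fits(fits, conservation, min_score):
    """Keep (in ranked order) only fits whose interface score reaches min_score."""
    return [
        name
        for name in rank_fits(fits, conservation)
        if interface_conservation(fits[name], conservation) >= min_score
    ]


# Hypothetical data: fit_1 places conserved residues at the interface,
# fit_2 mostly places poorly conserved ones there.
conservation = {("A", 1): 0.9, ("A", 2): 0.8, ("B", 3): 0.7, ("A", 4): 0.1, ("B", 5): 0.2}
fits = {
    "fit_1": [("A", 1), ("A", 2), ("B", 3)],
    "fit_2": [("A", 4), ("B", 5), ("A", 2)],
}
ranking = rank_fits(fits, conservation)
kept = filter_fits(fits, conservation, min_score=0.5)
```

Averaging rather than summing keeps fits with differently sized interfaces comparable; a real pipeline would likely combine such a score with the density cross-correlation rather than use it alone.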
The concluding Chapter 7 discusses the lessons learned about the structural and mechanistic features of protein assemblies through computational techniques and the integration of experimental datasets. In Chapter 2, a binary macromolecular complex, the Gcn1-Gcn2 complex, was modelled using computational structure prediction strategies to understand the molecular basis of its interaction. Because computational modelling can be inaccurate, Chapters 3 to 5 used integrative approaches, primarily guided by the cryo-EM map, to decipher the molecular architecture of the human SF3b complex in its closed and open forms as well as its contribution to branch point adenosine recognition. Building on the experience gained in modelling assemblies with cryo-EM data in those chapters, a new method was proposed that uses evolutionary conservation information to improve the accuracy of cryo-EM density-based fitting. Together, these studies provide strategies for modelling macromolecular assemblies as well as a deeper understanding of their mechanistic features.