Global ETD Search

1	Analysis Of Protein Evolution And Its Implications In Remote Homology Detection And Function Recognition Gowri, V S 10 1900 One of the major outcomes of a genome sequencing project is the availability of amino acid sequences of all the proteins encoded in the genome of the organism concerned. However, most commonly, for a substantial proportion of the proteins encoded in the genome no information on function is available either from experimental studies or by inference on the basis of homology with a protein of known function. Even if the general function of a protein is known, the region of the protein corresponding to the function might be a domain and there may be additional regions of considerable length in the protein with no known function. In such cases the information on function is incomplete. Lack of understanding of the repertoire of functions of proteins encoded in the genome limits the utility of the genomic data. While there are many experimental approaches available for deciphering functions of proteins at the genomic scale, bioinformatics approaches form a good early step in obtaining clues about functions of proteins at the genomic scale (Koonin et al, 1998). One of the common bioinformatics approaches is recognition of function by homology (Bork et al, 1994). If the evolutionary relationship between two proteins, one with known function and the other with unknown function, could be established, it raises the possibility of common function and 3-D structure for these proteins (Bork and Gibson, 1996). While this approach is effective, its utility is limited by the ability of the bioinformatics approach to identify related proteins when their evolutionary divergence is high, leading to low amino acid sequence similarity of the kind typical of two unrelated proteins (Bork and Koonin, 1998).
Use of 3-D structural information, obtained by predictive methods such as fold recognition, has offered approaches towards increasing the sensitivity of remote homology detection (e.g., Kelley et al, 2000; Shi et al, 2001; Gough et al, 2001). The work embodied in this thesis has the general objective of analysis of evolution of structural features and functions of families of proteins and design of new bioinformatics approaches for recognizing distantly related proteins and their applications. After an introductory chapter, a few chapters report analysis of functional and structural features of homologous protein domains. Further chapters report development and assessment of new remote homology detection approaches and applications to the proteins encoded in two protozoan organisms. A further chapter is presented on the analysis of proteins involved in methylglyoxal detoxification pathways in kinetoplastid organisms. Chapter 1 of the thesis presents a brief introduction, based on the information available in the literature, to protein structures, classification, methods for structure comparison, popular methods for remote homology detection and homology-based methods for function annotation. Chapter 2 describes the steps involved in the update of, and improvements made to, the PALI database. In addition to the update, the domain structural families are integrated with the homologous sequences from the sequence databases. Thus, every family in PALI is enriched with a substantial volume of sequence information from proteins with no known structural information. Chapter 3 reports investigations on the inter-relationships between sequence, structure and functions of closely-related homologous enzyme domain families. Chapter 4 describes the investigations on the unusual differences in the lengths of closely-related homologous protein domains, accommodation of additional lengths in protein 3-D structures and their functional implications.
Chapter 5 reports the development and assessment of a new approach for remote homology detection using dynamic multiple profiles of homologous protein domain families. Chapter 6 describes the development of another remote homology detection approach, which uses multiple static profiles generated using the bona fide members of the family. A rigorous assessment of the approach and strategies for improving the detection of distant homologues using the multiple profile approach are discussed in this chapter. Chapter 7 describes results of searches made in the database of multiple family profiles (MulPSSM database) in order to recognize the functions of hypothetical proteins encoded in two parasitic protozoa. Chapter 8 describes the sequence and structural analyses of two glyoxalase pathway proteins from the kinetoplastid organism Leishmania donovani, which causes leishmaniasis. An alternate enzyme, which would probably substitute for the glyoxalase pathway enzymes in certain kinetoplastid organisms that lack the glyoxalase enzymes, is also discussed. Chapter 9 summarises the important findings from the various analyses discussed in this thesis. The appendix describes an analysis of the correlation between a measure of hydrophobicity of amino acid residues aligned in a multiple sequence alignment and residue depth in 3-D structures of proteins. Proteins - Evolution Computational Biology Plasmodium Falciparum Homology Proteins - Structure Homologous Protein Structures Kinetoplastids - Detoxification Glyoxalase Enzymes - Modelling Methylglyoxal Detoxification Homologous Protein Domains Remote Homology Detection Remote Homologies Homologous Proteins MulPSSM Database PSSM Generation Biochemistry
2	Structural and Mechanistic Features of Protein Assemblies with Special Reference to Spliceosome Rakesh, Ramachandran January 2016 Macromolecular assemblies such as the ribosome, the spliceosome and polymerases are imperative for cellular functions. The current understanding of these important machineries and many other assemblies at the molecular level is poor. The lack of structural data for many macromolecular assemblies further causes a bottleneck in understanding the cellular processes and the various disease manifestations. Hence, it is essential to characterize the structures and molecular architectures of these macromolecular assemblies. Though the number of 3-D structures for individual protein structures or domains in the Protein Data Bank (PDB) is growing, the number of structures deposited for macromolecular assemblies is relatively small. Hence, apart from the use of experimental techniques for characterizing macromolecular assembly structures, the use of computational techniques would help in supplementing the growth of macromolecular assembly structures. This thesis deals with the use of integrative approaches where computational methods are combined with experimental data to model and understand the mechanistic features of macromolecular assemblies with a special focus on a sub-complex of the spliceosome machinery. Chapter 1 of this thesis provides an introduction to protein-protein interactions and macromolecular assemblies. Further, the modelling of macromolecular assemblies using integrative methods is discussed, with a subsequent introduction to the spliceosome machinery. In chapter 2, modelling studies were performed on the proteins involved in the general amino acid control mechanism, which is triggered in yeast under amino acid starvation conditions. The proteins involved in the study were Gcn1, a ribosome binding protein, and the RWD-domain containing proteins Gcn2, Yih1, Gir2 and Mtc5.
From laboratory experiments it is known that for activation of Gcn2, an eIF2α kinase, its RWD-domain has to bind to Gcn1, and the residue Arg-2259 is important for this interaction. As the 3-D structure for the Gcn1 region containing Arg-2259 is not currently available, its 3-D structure was inferred using fold recognition and comparative modelling techniques. Further, in order to understand the Gcn2 RWD domain-Gcn1 molecular interaction, a complex structure was inferred by using a restrained protein-protein docking procedure. As the proteins Yih1 and Gir2 are known to bind to Gcn1 using their RWD-domains, first the structures of the RWD-domain containing proteins including Mtc5 were inferred using a Gcn2 RWD domain NMR structure. Additionally, the Gcn1-Gcn2 complex was used to build a set of complexes to explain the binding of other RWD domain containing proteins Yih1, Gir2 and Mtc5. The important molecular interactions were obtained on analysing the interacting residues in these complexes. Thus, the Gcn1-Gcn2 interaction at the molecular level has been proposed for the first time. Future experiments guided by the protein-protein complex models and the proposed set of mutations should provide an understanding about the critical molecular interactions involved in the general amino acid control mechanism. Chapter 3 describes an integrative approach that was used to decipher a pseudo-atomic model of the closed form of human SF3b complex. SF3b is a multi-protein complex containing seven components: p14, SF3b49, SF3b155, SF3b145, SF3b130, SF3b14b and SF3b10. It recognizes the branch point adenosine in the pre-mRNA as part of U2 snRNP or U11/U12 di-snRNP in the spliceosome. Although the cryo-EM map for human SF3b complex has been available for more than a decade, the structure and relative spatial arrangement of all components in the complex are not yet known.
The integrative modelling approach used here involved utilizing structural data in the form of available X-ray and NMR structures, fold recognition and comparative modelling as well as currently available experimental datasets, along with the available cryo-EM density map to provide a model with high structural coverage. Hence, the molecular architecture of closed form human SF3b complex was derived that can now provide insights into the functioning of SF3b in splicing. This might also help the future high resolution structure determination efforts of the entire human spliceosome machinery. In chapter 4, the molecular architecture of the closed form of SF3b complex obtained from the use of integrative modelling approach (Chapter 3) is extensively discussed. The structure-function relationships for some of the SF3b components based on the pseudo-atomic model have also been provided. In addition, the extreme flexibility associated with some of the SF3b components based on dynamics analysis has also been examined. Further, using an existing U11/U12 di-snRNP cryo-EM map and the closed form SF3b complex pseudo-atomic model, an open form of the SF3b complex was modelled and the component structures were fit into it. Hence, it was found that the transition between closed and open forms is primarily caused by a flap containing the HEAT repeat protein, SF3b155. This protein is also known to harbour cancer causing mutations and has the potential to affect the closed to open transition as well as SF3b complex structure and stability. Thus, this provides a framework for the future understanding of the closed to open transition in SF3b functioning within the spliceosome.
Chapter 5 builds upon the integrative modelling approach (Chapter 3) that proposed the molecular architecture of the closed form of human SF3b complex and an open form of SF3b that was derived due to a flap opening of the closed form and which might help in accommodating RNA and other trans-acting factors within the U11/U12 di-snRNP (Chapter 4). In the current chapter, the SF3b open form and its interaction with the RNA elements is studied. The 5' end of U12 snRNA and its interaction with pre-mRNA in the branch point duplex was modelled, guided by the open form of SF3b that provided the necessary structural constraints, and the RNA model is topologically consistent with the existing biochemical data. Further, utilizing the SF3b open form-RNA model and the existing experimental knowledge, an extensive discussion has been provided on how the architecture of SF3b acts as a scaffold for U12 snRNA: pre-mRNA branch point duplex formation as well as its potential implications for branch point adenosine recognition fidelity. Moreover, the reasons for SF3b to be defined as a “fuzzy” complex, that is, a complex with highly flexible folded regions along with intrinsically disordered regions, are also discussed. Hence, the current work adds to the excellent developments made previously and deepens the understanding of the structure-function relationship of the human SF3b complex in the context of the spliceosome machinery. In chapter 6, a methodology has been proposed for the use of evolutionary conservation of protein-protein interfacial residues in multiple protein cryo-EM density based fitting of the protein components in the low-resolution density maps of multi-protein assemblies. First, the methodology was tested on a dataset of simulated density maps generated at four different resolutions: 10, 15, 20 and 25 Å.
On utilizing the evolutionary conservation scores obtained from multiple sequence alignments to score the fitted complexes, it was found that there was a decrease in the conservation scores when compared to that of the crystal structures, which were used to generate the simulated density maps. Further, the assessment of the multiple protein density fitting technique to align the actual protein-protein interface residues correctly using a performance metric called F-measure showed there was a decrease in performance as the resolutions became poorer. Hence, based on evolutionary conservation scores as well as F-measure, the decrease in conservation scores or performance was found to be mainly due to the errors associated with the fitting process. Subsequently, a refinement methodology was designed involving the use of conservation scores, which improved the accuracy of the fitted models, and the same was observed in an experimental cryo-EM density test case of RyR1-FKBP12 complex. Hence, the conservation information acts as an effective filter to distinguish the incorrectly fitted structures and improves the accuracy of the fitting of the protein structures in the density maps. Thus, one can incorporate the conserved surface residues information in the current density fitting tools to reduce ambiguity and improve the accuracy of the macromolecular assembly structures determined using cryo-EM. In the concluding chapter 7, the learnings on the structural and mechanistic features of protein assemblies obtained from the use of computational techniques and integration of experimental datasets are discussed. In chapter 2, the modelling of a binary macromolecular complex such as the Gcn1-Gcn2 complex was performed using computational structure prediction strategies to understand the molecular basis of its interaction.
Due to the potential inaccuracies which can exist in computational modelling, chapters 3 to 5 dealt with the use of integrative approaches, primarily guided by the cryo-EM map, in order to decipher the molecular architecture of the human SF3b complex in the closed and open forms as well as its contribution to branch point adenosine recognition. Based on the extensive experience gained in modelling of assemblies using cryo-EM data in the previous chapters, a new method has been proposed on the use of evolutionary conservation information to improve the accuracy of cryo-EM density based fitting. Hence, these studies have provided strategies for modelling macromolecular assemblies as well as a deeper understanding of their mechanistic features. Macromolecular Assemblies Spliceosome Machinery Spliceosomes Protein Assemblies Cryo-Electron Microscopy Cryo-EM Density Modeling Human Splicing Factor SF3B Complex Protein-Protein Docking Cryo-EM Map Protein-Protein Interactions Homologous Protein Structures Molecular Biophysics
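The F-measure mentioned in chapter 6 of the abstract above can be illustrated with a minimal sketch. It assumes the standard definition (harmonic mean of precision and recall) applied to sets of interface residues; the abstract does not spell out the exact formulation used in the thesis, and the function name and the (chain, residue number) representation are hypothetical.

```python
def interface_f_measure(predicted, actual):
    """F-measure of a fitted interface against a reference interface.

    Both arguments are iterables of residue identifiers, here assumed to be
    (chain_id, residue_number) tuples. This is the standard F1 definition,
    not necessarily the exact metric used in the thesis.
    """
    predicted, actual = set(predicted), set(actual)
    true_pos = len(predicted & actual)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)  # fraction of fitted interface that is correct
    recall = true_pos / len(actual)        # fraction of true interface that was recovered
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: interface residues from a fitted model vs. a crystal structure
fitted = [("A", 10), ("A", 11), ("A", 12), ("B", 5)]
crystal = [("A", 10), ("A", 11), ("B", 5), ("B", 6), ("B", 7)]
score = interface_f_measure(fitted, crystal)
```

Here three of the four fitted residues are correct and three of the five reference residues are recovered, so the score falls between the precision (0.75) and the recall (0.6).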
3	Structure, Stability and Evolution of Multi-Domain Proteins Bhaskara, Ramachandra M January 2013 Analyses of protein sequences from diverse genomes have revealed the ubiquitous nature of multi-domain proteins. They form up to 70% of proteomes of most eukaryotic organisms. Yet, our understanding of protein structure, folding and evolution has been dominated by extensive studies on single-domain proteins. We provide quantitative treatment and proof for prevailing intuitive ideas on the strategies employed by nature to stabilize otherwise unstable domains. We find that domains incapable of independent stability are stabilized by favourable interactions with tethered domains in the multi-domain context. Natural variations (nsSNPs) at these sites alter communication between domains and affect stability leading to disease manifestation. We emphasize this by using explicit all-atom molecular dynamics simulations to study the interface nsSNPs of human Glutathione S-transferase omega 1. We show that domain-domain interface interactions constrain inter-domain geometry (IDG) which is evolutionarily well conserved. The inter-domain linkers modulate the interactions by varying their lengths, conformations and local structure, thereby affecting the overall IDG. These findings led to the development of a method to predict interfacial residues in multi-domain proteins based on differences in evolutionary information extracted from at least two diverse domain architectures (single and multi-domain). Our predictions are highly accurate (∼85%) and specific (∼95%). Using predicted residues to constrain domain–domain interaction, rigid-body docking was able to provide us with accurate full-length protein structures with correct orientation of domains. Further, we developed and employed an alignment-free approach based on local amino-acid fragment matching to compare sequences of multi-domain proteins.
This is especially effective in the absence of proper alignments, which is usually the case for multi-domain proteins. Using this, we were able to recreate the existing Hanks and Hunter classification scheme for protein kinases. We also showed functional relationships among Immunoglobulin sequences. The clusters obtained were functionally distinct and also showed unique domain-architectures. Our analysis provides guidelines toward rational protein and interaction design which have attractive applications in obtaining stable fragments and domain constructs essential for structural studies by crystallography and NMR. These studies enable a deeper understanding of the rapport between protein domains in the multi-domain context. Protein Sequence Protein Structure Multi-Domain Proteins GSTO-1 Full-length Proteins Multi-Domain Proteins - Evolution Multi-Domain Proteins - Folding Human Glutathione S-Transferase Omega-1 Multi-Domain Proteins - 3D Modelling Protein Folding Proteins - Stability Amino Acid Sequence Single Nucleotide Polymorphisms (SNPs) Homologous Protein Structures Biochemistry
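The idea of comparing sequences without an alignment by matching local fragments, as described in the abstract above, can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses overlapping fixed-length fragments (k-mers) and a Jaccard index as the similarity score, which is a common alignment-free scheme and not necessarily the scoring used in the thesis. The function names and the fragment length are hypothetical.

```python
def fragments(seq, k=5):
    """Return the set of overlapping length-k fragments of an amino-acid sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fragment_similarity(seq_a, seq_b, k=5):
    """Jaccard similarity between the fragment sets of two sequences.

    An assumed stand-in for the thesis's local fragment matching score:
    1.0 means identical fragment content, 0.0 means no shared fragment.
    """
    frags_a, frags_b = fragments(seq_a, k), fragments(seq_b, k)
    if not frags_a or not frags_b:
        return 0.0  # a sequence shorter than k has no fragments to match
    return len(frags_a & frags_b) / len(frags_a | frags_b)


# Toy example with short strings and k=2
sim = fragment_similarity("ABCD", "BCDE", k=2)
```

A matrix of such pairwise scores could then be passed to any standard clustering procedure. Because the score depends only on shared fragments, it is unaffected by the order in which domains appear, which is one reason such schemes are attractive for multi-domain proteins.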
