Global ETD Search

331. Genome-Wide And Structural Analyses Of Protein Kinase Superfamily
Anamika, 01 1900
Signal transduction refers to a chain of highly regulated biochemical steps that transfers a signal, generated in response to a stimulus in the extracellular environment, to intracellular compartments such as the nucleus. A variety of biomolecules, such as proteins and lipids, participate in such processes. One protein superfamily that actively participates in signaling is the protein kinase superfamily, whose members transfer the γ-phosphate of adenosine triphosphate (ATP) to specific hydroxyl groups in protein substrates. Phosphorylation and dephosphorylation events are critical in many signal transduction pathways and affect the biological system as a whole. Protein phosphorylation by protein kinases has emerged as a pre-eminent mechanism for regulating a variety of cellular processes such as cell growth, development, differentiation, homeostasis, apoptosis, metabolism, transcription and translation. This thesis encompasses investigations carried out by the author, using various bioinformatics tools and methods, to understand the structural and functional roles of a diverse set of protein kinase subfamilies in various eukaryotic and prokaryotic organisms. The thesis is divided into ten chapters. Chapter 1 provides an introduction to the protein kinase superfamily and reviews the relevant literature. The Kinases in Genomes (KinG) database, set up in the author's group a few years ago (Krupa et al, 2004a), is a collection of Serine/Threonine/Tyrosine protein kinases recognized by bioinformatics approaches in the genomic data of various eukaryotes, prokaryotes and viruses. KinG also classifies protein kinases into groups and subfamilies (Hanks et al, 1988).
Information on the non-kinase domains tethered to the catalytic kinase domain is also available for every kinase in KinG. KinG is updated periodically (annually) to keep pace with the growing number of sequenced genomes, the increasing number of known protein domain families, and the refinement or reannotation of genomic datasets (Anamika et al, 2008c). The annual update of KinG is described in Chapter 2 of the thesis. The availability of an improved version of the human genome data provided an opportunity to re-investigate the protein kinase complement of the human genome and enabled an analysis of its splice variants; this analysis is also described in Chapter 2. Chapters 3, 4 and 5 report the recognition and analysis of the repertoires of protein kinases in chimpanzee, two Plasmodium species (Plasmodium falciparum and Plasmodium yoelii yoelii) and Entamoeba histolytica, respectively. A detailed analysis of the non-kinase domains tethered to catalytic protein kinase domains in eukaryotic organisms is presented in Chapter 6. Chapter 7 discusses a systematic framework developed by the author to classify Serine/Threonine protein kinases in prokaryotic organisms. An investigation of the 3-D structural aspects of protein kinase-substrate interactions is described in Chapter 8. While identifying protein kinases in genomic data, protein kinase-like non-kinases (PKLNKs) were also observed; these lack the aspartate at a specific position in the amino acid sequence and hence are unlikely to function as kinases. Chapter 9 presents an analysis of PKLNKs aimed at obtaining clues to their functions. Chapter 10 summarizes the main conclusions of the thesis and provides an outlook on the current study.
Chapter 1: This chapter introduces cell signaling and the involvement of protein kinases in various signaling pathways, compiled from the author's survey of the literature. It describes the molecular events of cell signaling in prokaryotic and eukaryotic organisms and discusses the diversity, specificity and cellular roles of protein kinases in detail.
Chapter 2: This chapter describes the KinG (Kinases in Genomes) database, first established by Krupa et al (2004a). KinG is an online compilation of the putative Serine/Threonine/Tyrosine protein kinases encoded in the completely sequenced genomes of archaea, eubacteria, viruses and eukaryotes. The surge in genome datasets, improvements in the quality of genomic data for various organisms and the growing number of protein domain families necessitate periodic updates of KinG. The updated version of KinG holds information on protein kinases from 483 organisms (Anamika et al, 2008c). The availability of a draft human genome in 2001 enabled recognition of the repertoire of human protein kinases (Krupa and Srinivasan, 2002a; Manning et al, 2002; Kostich et al, 2002). Over the last 7 years the human genomic data have been refined, and the quality of the data now available is much superior to that available in 2001. By gleaning the latest version of the human genome data, 46 new human protein kinase splice variants that were not recognized in earlier studies of the human kinome have been identified. Improper regulation or mutant forms of many of these newly identified splice variants are directly involved in diseases such as different kinds of cancer, Severe Combined Immunodeficiency Disease (SCID) and Huntington disease. In addition, abnormal forms of the mouse orthologues of some of the newly identified human kinase splice variants are known to cause various diseases in mice.
This raises the possibility that the human orthologues play similar roles in disease processes. Such observations, together with a detailed analysis of these splice variants, could have a profound influence on drug design and development against various diseases.
Chapter 3: This chapter discusses the identification and analysis of the protein kinases encoded in the genome of the chimpanzee (chimp) and compares the chimp kinome with that of its close evolutionary relative, human (Anamika et al, 2008b). The core biology shared by chimp and human is characterized by many orthologous protein kinases involved in conserved pathways. Domain architectures specific to either chimp or human kinases have been observed. Chimp kinases with unique domain architectures are characterized by the deletion of one or more non-kinase domains present in the corresponding human kinases. Interestingly, some chimp counterparts of multi-domain human kinases have identical domain architectures but carry a protein kinase-like non-kinase (PKLNK) domain. Remarkably, for 160 of the 587 chimpanzee kinases no human orthologue with sequence identity greater than 95% could be identified. Variations in chimpanzee kinases relative to human kinases also arise from differences in the functions of the domains tethered to the catalytic kinase domain. For example, the heterodimer-forming PB1 domain, which is related to the fold of ubiquitin and the Ras-binding domain, is uniquely tethered to a PKC-like chimpanzee kinase. Although chimpanzee and human are closely related, some chimpanzee kinases have no close counterpart in human, suggesting differences in their functions. This chapter provides a direction for experimental analysis of human and chimpanzee protein kinases aimed at enhancing our understanding of their specific biological roles.
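Two comparisons in this chapter lend themselves to a short illustration: the 95% sequence-identity cut-off used when searching for human orthologues, and the recognition of chimp architectures that arise by deletion of non-kinase domains from a human architecture. The Python sketch below is not the author's pipeline; the identity definition (identical residues over columns where neither aligned sequence has a gap) and the domain labels in the example are assumptions.

```python
from typing import Iterable, Sequence


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap ('-').

    Assumption: identity definitions differ between tools; this one ignores gapped columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def has_close_human_counterpart(identities: Iterable[float], cutoff: float = 95.0) -> bool:
    """True if any human hit exceeds the identity cut-off (95% as in the text)."""
    return max(identities, default=0.0) > cutoff


def is_ordered_subset(short: Sequence[str], long: Sequence[str]) -> bool:
    """True if `short` can be obtained from `long` by deleting domains, keeping order."""
    remaining = iter(long)
    # `domain in remaining` advances the iterator past the first match
    return all(domain in remaining for domain in short)


def compare_architectures(chimp: Sequence[str], human: Sequence[str]) -> str:
    """Label a chimp domain architecture relative to its human counterpart."""
    if tuple(chimp) == tuple(human):
        return "identical"
    if len(chimp) < len(human) and is_ordered_subset(chimp, human):
        return "domain deletion in chimp"
    return "different"


# Toy PKC-like architectures with illustrative domain labels (hypothetical, not thesis data)
print(compare_architectures(["C1", "Pkinase", "Pkinase_C"],
                            ["C1", "C1", "Pkinase", "Pkinase_C"]))
print(percent_identity("AC-GT", "ACTGA"))
```

Treating architectures as ordered domain lists keeps the deletion test to a subsequence check; a real comparison would also need to handle repeated domains and alternative domain boundaries.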
Chapter 4: This chapter describes a genome-wide comparative analysis of the protein kinases encoded in the genomes of two apicomplexans, Plasmodium falciparum (P. falciparum, 3D7 strain) and Plasmodium yoelii yoelii (P. yoelii yoelii, 17XNL strain), the causative agents of malaria in humans and rodents, respectively (Anamika and Srinivasan, 2007). Sensitive bioinformatics techniques enabled the identification of 82 and 60 putative protein kinases in P. falciparum and P. yoelii yoelii, respectively. These protein kinases have been further classified into subfamilies based on the extent of sequence similarity of their catalytic domains (Hanks et al, 1988). The most populated kinase subfamilies in both Plasmodium species belong to the CAMK and CMGC groups. Analysis of domain architectures revealed uncommon domain organisations in the kinases of both organisms, such as kinase domains tethered to EF-hands as well as to a pleckstrin homology domain. Components of the MAPK signaling pathway are not well conserved in Plasmodium species. Such observations suggest that Plasmodium protein kinases are highly divergent from those of other eukaryotes. A transmembrane kinase with six membrane-spanning segments in P. falciparum appears to have no orthologue in P. yoelii yoelii. Nineteen P. falciparum kinases (Anamika et al, 2005; Anamika and Srinivasan, 2007) cluster separately from the P. yoelii yoelii kinases and hence are unique to the P. falciparum genome. Only 28 orthologous pairs of kinases could be identified between these two Plasmodium species. Comparative kinome analysis of the two Plasmodium species has thus provided clues to the functions of many protein kinases based on their classification and domain organisation, and also reveals marked differences even between two Plasmodium species.
Chapter 5: The identification and analysis of the repertoire of protein kinases in the protozoan parasite Entamoeba histolytica (E. histolytica), using sensitive sequence and profile search methods, form the basis of Chapter 5. A systematic analysis of 307 protein kinases in the E. histolytica genome has been carried out by classifying them into the subfamilies originally defined by Hanks and Hunter (Hanks et al, 1988) and by examining the functional domains tethered to the catalytic kinase domains (Anamika et al, 2008a). Compared with other eukaryotic organisms, protein kinases from E. histolytica vary in their domain organisation and display features that may have a bearing on the unusual biology of this organism. Some of the parasite kinases show high sequence similarity in the catalytic domain to the calcium/calmodulin-dependent protein kinase subfamily. However, they are unlikely to act as calcium/calmodulin-dependent kinases, as they lack the non-catalytic domains characteristic of such kinases in other organisms. Such kinases form the largest subfamily of protein kinases in E. histolytica. Interestingly, a member of a Protein Kinase A/Protein Kinase G-like hybrid kinase subfamily is tethered to a pleckstrin homology domain. Although potential cyclins and cyclin-dependent kinases could be identified in the genome, the likely absence of other cell cycle proteins suggests an unusual cell cycle in E. histolytica. Unusual features recognized in the analysis include the absence of Mitogen-activated protein kinase kinase (MEK) from the Mitogen-Activated Protein Kinase (MAPK) signaling pathway, and the identification of transmembrane kinases whose catalytic regions show good sequence similarity to Src kinases, which are usually cytosolic. Sequences that could not be classified into known subfamilies of protein kinases have unusual domain architectures; many of these unclassified kinases are tethered to cysteine-rich domains and to domains known to be involved in protein-protein interactions. The kinome analysis of E. histolytica in this chapter suggests that the organism possesses a complex protein phosphorylation network involving many unusual protein kinases.
Chapter 6: Protein kinases phosphorylate Serine/Threonine/Tyrosine residues in many cellular proteins and exert tight control over their biological functions. They constitute one of the largest protein families in most eukaryotic species. Classification based on the sequence similarity of their catalytic domains clusters protein kinases that share gross functional properties into subfamilies. Many protein kinases are associated with, or covalently tethered to, domains that serve as adapter or regulatory modules, aid substrate recruitment and specificity, and also act as scaffolds. Hence the modular organisation of protein kinases provides guidelines to their molecular interactions. Recent studies on the repertoires of protein kinases in eukaryotes have revealed a wide spectrum of domain organisations across various subfamilies in model organisms. The occurrence of organism-specific novel domain combinations suggests functional diversity achieved by protein kinases to regulate a variety of biological processes. In addition, the domain architectures of protein kinases revealed the existence of hybrid protein kinase subfamilies and their emerging roles in signaling in eukaryotic organisms. This chapter discusses the repertoire of non-kinase domains tethered to multi-domain kinases in higher eukaryotes. Similarities and differences in the domain architectures of protein kinases in these organisms indicate conserved and unique features that are critical to functional specialization.
Chapter 7: This chapter describes a systematic classification of the Serine/Threonine protein kinases encoded in archaeal and eubacterial genomes.
Most of the Serine/Threonine protein kinases identified in archaeal and eubacterial genomes could not be classified into any of the well-known subfamilies of protein kinases (Hanks et al, 1988), suggesting that they are divergent from eukaryotic kinases. The extensive prokaryotic Serine/Threonine protein kinase dataset obtained from KinG (Krupa et al, 2004a; Anamika et al, 2008c) provided an opportunity to classify these kinases, by sequence identity-based clustering, into three main categories:
1) Species/genus-specific clusters: these contain members from a particular species or genus of eubacteria or archaea, suggesting that such kinases are required for certain lineage-specific functions.
2) Organism-specific clusters: these contain members from certain specific types of organisms, suggesting a role for these kinases in a specific function carried out by a limited set of prokaryotes.
3) Organism-diverse clusters: these suggest a common function performed by such kinases in a wide variety of organisms.
Interestingly, the occurrence of several species/genus-specific or organism-specific subfamilies of prokaryotic kinases contrasts with the classification of eukaryotic protein kinases, in which most of the popular subfamilies occur widely across eukaryotes. A function-based classification has also been proposed, which shows that the members of each cluster have a specific function. In this analysis, almost 50% of the "clusters" obtained have only one member, suggesting sequence, and probably functional, divergence. Many prokaryotic Serine/Threonine protein kinases exhibit a wide variety of modular organisations, indicating a degree of complexity and of protein-protein interaction in the signaling pathways of these microbes.
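The three-way categorization can be sketched as a rule applied to each sequence-identity cluster. This is a minimal Python illustration under assumptions the text does not state: each kinase is annotated with its genus and a hypothetical coarse organism group (such as a phylum), "organism-specific" is taken to mean that all members share one group, and single-member clusters are reported separately.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Member:
    kinase_id: str
    genus: str
    group: str  # hypothetical coarse grouping of organisms (e.g. a phylum)


def categorize_cluster(members: List[Member]) -> str:
    """Assign one of the three cluster categories (plus 'singleton')."""
    if len(members) == 1:
        return "singleton"
    if len({m.genus for m in members}) == 1:
        return "species/genus-specific"  # one genus also covers one species
    if len({m.group for m in members}) == 1:
        return "organism-specific"
    return "organism-diverse"


def summarize(clusters: List[List[Member]]) -> Dict[str, int]:
    """Count clusters per category."""
    counts: Dict[str, int] = {}
    for cluster in clusters:
        label = categorize_cluster(cluster)
        counts[label] = counts.get(label, 0) + 1
    return counts


# Toy clusters (hypothetical memberships, not results from the thesis)
clusters = [
    [Member("k1", "Streptomyces", "Actinobacteria"), Member("k2", "Streptomyces", "Actinobacteria")],
    [Member("k3", "Mycobacterium", "Actinobacteria"), Member("k4", "Corynebacterium", "Actinobacteria")],
    [Member("k5", "Bacillus", "Firmicutes"), Member("k6", "Synechocystis", "Cyanobacteria")],
    [Member("k7", "Nostoc", "Cyanobacteria")],
]
print(summarize(clusters))
```

Reporting singletons separately mirrors the observation that almost half of the clusters have one member, without deciding whether such clusters also count as species/genus-specific.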
Chapter 8: A wide spectrum of protein kinases belonging to different Hanks and Hunter groups and subfamilies has been identified in various eukaryotes. However, the specific biological targets (substrates) of many protein kinase subfamilies are still unknown, and this is an active area of research. In the analysis reported in Chapter 8, an attempt has been made to understand protein kinase-substrate interactions and to predict substrate consensus by analysing known 3-D structures of complexes of kinases with peptide substrates or pseudosubstrates. In ternary complex structures of protein kinases in their active states, the kinase residues that interact with constrained (consensus) substrate residues occupy topologically equivalent positions, even in kinases belonging to different Hanks and Hunter subfamilies. It has also been observed that the residues of a given kinase subfamily that interact with consensus substrate residues are usually conserved across homologues. Interestingly, in Protein Kinase B and Phosphorylase Kinase subfamily homologues, residues interacting with unconstrained substrate residues are not well conserved even within the subfamily, suggesting different evolutionary rates for substrate-interacting residues. These results are expected to further our understanding of protein kinase-substrate relationships, which is likely to be helpful in drug design.
Chapter 9: Protein kinase-like non-kinases (PKLNKs) are closely related to protein kinases, but they lack the crucial catalytic aspartate in the catalytic loop and hence cannot function as protein kinases. PKLNKs have been analysed in Chapter 9 with the objective of obtaining clues to their functions.
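The defining property of a PKLNK, absence of the catalytic aspartate (often part of an HRD motif in the catalytic loop), can be illustrated by checking the aligned position of that residue. This Python sketch assumes a precomputed multiple alignment and a reference kinase whose catalytic Asp position is known; the sequences are toy examples, and a single-position check is only a first filter, not a substitute for the sensitive profile-based searches used in the thesis.

```python
from typing import Dict


def column_of_residue(aligned_ref: str, residue_number: int) -> int:
    """Map a 1-based residue number in the ungapped reference to an alignment column."""
    count = 0
    for column, residue in enumerate(aligned_ref):
        if residue != "-":
            count += 1
            if count == residue_number:
                return column
    raise ValueError("residue number exceeds reference length")


def lacks_catalytic_asp(alignment: Dict[str, str], reference: str,
                        asp_residue_number: int) -> Dict[str, bool]:
    """Flag sequences whose residue at the reference catalytic-Asp column is not 'D'."""
    column = column_of_residue(alignment[reference], asp_residue_number)
    return {name: seq[column] != "D" for name, seq in alignment.items()}


# Toy catalytic-loop alignment (hypothetical sequences); the Asp is residue 3 of the reference
alignment = {
    "ref_kinase": "HRDLKPEN",
    "query_1":    "HRDLKPQN",
    "query_2":    "HRNLKPEN",
    "query_3":    "HR-LKPEN",
}
print(lacks_catalytic_asp(alignment, "ref_kinase", 3))
```

A gap at the catalytic column is treated the same as a substitution, since either way the sequence has no aspartate at that position.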
Using various sensitive sequence analysis methods, 82 PKLNKs have been recognized in four higher eukaryotes, namely Homo sapiens, Mus musculus, Rattus norvegicus and Drosophila melanogaster. On the basis of their domain combinations and the functions of the tethered domains, PKLNKs have been classified into four main categories: 1) ligand-binding PKLNKs; 2) PKLNKs with an extracellular protein-protein interaction domain; 3) PKLNKs involved in dimerization; 4) PKLNKs with a cytoplasmic protein-protein interaction module. While members of the first two classes have a transmembrane domain tethered to the PKLNK domain, members of the other two classes are entirely cytoplasmic. This classification scheme is intended to provide a convenient framework for classifying PKLNKs from other eukaryotes and should help in deciphering their roles in cellular processes.
Chapter 10: This chapter presents the conclusions of the thesis, summarizing the major outcomes of the work and discussing their implications for the field of signal transduction.
In addition to the work described above, the protein kinase repertoires of two plants have been studied and their kinomes comparatively analysed (Krupa et al, 2006) (Appendix 1). Comparison of plant protein kinases with those of other eukaryotes revealed remarkable differences. Trans-genomic comparison of the protein kinase repertoires of Arabidopsis thaliana and Oryza sativa enabled the identification of members uniquely conserved within the two species. The domain organisation of plant protein kinases has also been analysed. Appendix 2 presents work on the Entamoeba histolytica (E. histolytica) ornithine decarboxylase (ODC)-like protein, which regulates polyamine biosynthesis. DFMO (difluoromethylornithine) inhibits ODC homologues in other organisms but is unable to inhibit the E. histolytica ODC-like protein. Modelling studies suggested that substitutions of three amino acids in the E. histolytica ODC-like protein are responsible for its insensitivity to DFMO (Jhingran et al, 2008). All the computational modelling work reported in Appendix 2 was performed by the author, while all the laboratory experiments were performed in the laboratory of the collaborator, Prof. Madhubala of JNU, New Delhi. The supplementary data for this thesis are provided on an accompanying CD, organized into folders corresponding to the various chapters.
Keywords: Genome Structure; Protein Kinases; Protein Kinases - Phosphorylation; Human Protein Kinases; Eukaryotic Protein Kinases; Prokaryotic Protein Kinases; Protein Kinase-Substrate Interaction; Chimpanzee Protein Kinome; Plasmodium Kinome; Entamoeba Histolytica Kinome; KinG (Kinases in Genomes); Protein Kinome; Protein Kinase; Biochemical Genetics
332. Computational Analyses Of Proteins Encoded In Genomes Of Pathogenic Organisms: Inferences On Structures, Functions And Interactions
Tyagi, Nidhi, 11 1900
The availability of completely sequenced genomes for a number of organisms provides an opportunity to understand the molecular basis of their physiology, metabolism, regulation and evolution. Significant insight into the complexity of organisms can be obtained from the functional characterization of the repertoire of proteins encoded in their genomes. Computational approaches for recognizing the functions of uncharacterized proteins encoded in genomes often rely on the ability to detect well-characterized homologues. Homology searches based on pairwise sequence comparison can reliably detect homologues with more than 30% sequence identity; however, detecting homologues with sequence identity below 30% is difficult with these methods. Distant homology relationships can be established using profiles, or position-specific scoring matrices, which encapsulate information about structurally and functionally conserved residues. Such conserved residues reflect strong constraints at particular sites owing to their involvement in structural stability, enzymatic activity, ligand binding, protein folding or protein-protein interactions. In addition, information on the three-dimensional structures of proteins aids the detection of remote homologues, since tertiary structures are conserved better than primary structures. The overall objective of the work reported in this thesis is to employ various sensitive remote homology detection methods to recognize functional information for proteins encoded mainly in pathogenic organisms. Since proteins do not work in isolation in a cell, it has also become essential to understand the in vivo context of their functions.
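The idea of a profile, or position-specific scoring matrix, can be sketched in a few lines: each alignment column is converted into log-odds scores, so that residues conserved at a site score highly when a query is scanned. This is a simplified Python illustration, assuming an ungapped alignment block, a uniform background frequency and a flat pseudocount; it is not the specific method used in the thesis.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(block: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Log-odds PSSM (bits) from an ungapped alignment block.

    Assumptions: uniform background frequency and a flat pseudocount; tools such as
    PSI-BLAST use sequence weighting and substitution-matrix-based pseudocounts instead.
    """
    width = len(block[0])
    if any(len(seq) != width for seq in block):
        raise ValueError("all sequences in the block must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for column in range(width):
        counts = Counter(seq[column] for seq in block)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def best_window(pssm: List[Dict[str, float]], query: str) -> Tuple[float, int]:
    """Highest-scoring ungapped window of the query, returned as (score, start)."""
    width = len(pssm)
    if len(query) < width:
        raise ValueError("query is shorter than the profile")
    scored = []
    for start in range(len(query) - width + 1):
        window = query[start:start + width]
        # residues outside the 20 standard amino acids score 0 (an assumption)
        scored.append((sum(col.get(aa, 0.0) for col, aa in zip(pssm, window)), start))
    return max(scored)


block = ["HRDLKPEN", "HRDIKPQN", "HRDLKSEN"]  # toy aligned segments
pssm = build_pssm(block)
score, start = best_window(pssm, "AAAHRDLKPENAAA")
print(start, round(score, 2))
```

Because every column is scored independently, a query can match the conserved positions of a profile even when its overall identity to any single family member is low, which is the property that makes profiles useful for remote homology.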
Understanding this context requires knowledge of all the molecules that interact with a particular protein. Thus, another major area of bioinformatics has been the integration of protein-protein interaction information to better understand the context of functional events. Protein-protein interaction analysis of a host and its pathogen can provide useful insights into the mode of pathogenesis and its consequences for the host cell. Chapters 2-6 of the thesis discuss the sequence and structural characteristics, remote evolutionary relationships and functional implications of uncharacterized proteins encoded in the genomes of the following pathogens: Helicobacter pylori, Plasmodium falciparum and Leishmania donovani. Chapters 7-8 discuss mainly the sequence, structural and functional aspects of protein kinases encoded in the genomes of various prokaryotes and viruses.
Chapter 1 provides background information and a literature survey in the areas of homology detection and prediction of protein-protein interactions. The growth of genomic data and the need to process it to infer the context of various functional events are highlighted. Experimental and computational approaches for recognizing protein functions are discussed, and approaches for detecting or predicting protein-protein interactions are outlined.
Chapter 2 discusses the recognition of non-trivial remote homology relationships involving proteins of Helicobacter pylori and their implications for function recognition. H. pylori is a microaerophilic, Gram-negative bacterial pathogen. It colonizes the human gastric mucosa and is a causative agent of gastroduodenal disease. The pathogen infects about 50% of the human population and can lead to the development of mucosa-associated lymphoid tissue (MALT) lymphoma. About 10% of the infected population develop gastric or duodenal ulcers and approximately 1% develop gastric cancer. H. pylori has been classified as a class I carcinogen by the WHO. The pathogen possesses a type IV secretion system. The complete genome sequences of three widely studied strains of Helicobacter pylori, 26695, J99 and HPAG1, are available. According to genome analysis, the numbers of predicted open reading frames in strains 26695, J99 and HPAG1 are 1590, 1495 and 1536, respectively. Of the predicted H. pylori proteins from strains 26695, J99 and HPAG1, 453, 357 and 400, respectively, have no functional domain assignment in the Pfam (Protein families) database. Some proteins in the different H. pylori strains have one region associated with at least one protein domain of known function, which gives a preliminary indication of function, while the rest of the sequence is not associated with any function. Proteins from strains 26695, J99 and HPAG1 contain 772, 803 and 790 such segments, respectively, each at least 45 residues long and currently without functional assignment. Sensitive remote homology detection methods have been employed to establish relationships for 294 amino acid sequences, and the results have been grouped into 4 categories. The results of homology detection have been further confirmed by examining the conservation of amino acid residues important for the functioning of the proteins concerned. (i) Remote relationships have been established involving protein domain families for which no bona fide member is currently known in H. pylori. For example, a DNA-binding protein domain (Kor_B) has been assigned to an H. pylori protein at a sequence identity of 20%; secondary structure prediction and the conservation of amino acid residues confirm the results of the homology detection methods. (ii) Remote relationships have been established between H. pylori hypothetical proteins and protein domain families for which paralogous members are present in Helicobacter pylori.
For example, Cytochrome_C, an electron transfer protein domain, could be associated with a Helicobacter pylori protein sequence that shows 14% sequence identity with bona fide cytochrome c sequences. (iii) "Missing" metabolic proteins of H. pylori have also been recognized. For example, aspartoacylase (EC 3.5.1.15) catalyzes the deacetylation of N-acetylaspartic acid to produce acetate and L-aspartate. This enzyme of the aspartate metabolism pathway has not been reported so far in H. pylori. A remote evolutionary relationship between an H. pylori protein and the aspartoacylase domain has been established at a sequence identity of 17%, thus filling a gap in this metabolic pathway of the pathogen. (iv) New functional assignments have been made for domains in H. pylori sequences in which other regions already had domain assignments. For example, a DNA methylase domain has been assigned to the C-terminal region of an H. pylori protein that already had a helicase domain assigned to its N-terminal region. This information should open avenues for further experimental probing, which may aid the design of inhibitors against this pathogen and lead to a better understanding of its pathogenesis in humans.
Chapter 3 describes the prediction of protein-protein interactions between Helicobacter pylori and its human host. A lack of information on protein-protein interactions at the host-pathogen interface is impeding the understanding of the pathogenesis process. A recently developed homology search-based method for predicting protein-protein interactions has been applied to the gastric pathogen Helicobacter pylori to predict in vitro interactions between H. pylori proteins and human proteins. Many of the predicted interactions could potentially occur between the pathogen and its human host during pathogenesis, since the analysis focused mainly on H. pylori proteins that have a transmembrane region, are encoded in the pathogenicity island, or are known to be secreted into the human host. By applying the homology search approach to the protein-protein interaction databases DIP and iPfam, in vitro interactions of a total of 623 H. pylori proteins with 6559 human proteins could be predicted. The predicted interactions include 549 hypothetical proteins of as yet unknown function encoded in the H. pylori genome and 13 experimentally verified secreted proteins. A total of 833 interactions involving the extracellular domains of transmembrane proteins of H. pylori could be predicted. Structural analysis of some examples reveals that the predicted interactions are consistent with the structural compatibility of the binding partners. Various probable interactions with discernible biological relevance are discussed in this chapter. For example, an interaction between the CFTR protein (NP_000483) and a multidrug resistance protein (HP1206) has been predicted. The structure of the CFTR intracellular domain is known in a homomeric form comprising five identical ABC transporter domains (PDB code 1XMI). Of these five identical subunits, two (the B and E chains in the PDB structure) have been selected. The structure of the pathogen's multidrug resistance protein has been modelled on the B chain of the template (sequence identity 32%). This exercise suggests that the interface residues in the model are congenial for interaction, which makes the complex structurally feasible in vitro and suggests that the pathogen protein may compete with the host protein for occupancy.
Chapter 4 describes the recognition of Plasmodium-specific protein domain families and their roles in the Plasmodium falciparum life cycle. Malaria in humans is caused by intracellular, eukaryotic, apicomplexan protozoan parasites of the genus Plasmodium. Of the five Plasmodium species that infect humans (P. falciparum, P. ovale, P. vivax, P. malariae and P. knowlesi), P. falciparum causes the most lethal infections. P. falciparum proteins have diverged extensively during evolution. The pathogen genome is A+T rich, and its proteins are often larger than their homologues from other organisms owing to the presence of low complexity regions. Organism-specific families are important because they play roles in the peculiar life style of an organism; if the organism is a pathogen, members of these families may play roles in pathogenesis. Inhibiting such specific proteins is unlikely to interfere with the host system, as the host may have no homologue. This work identifies Plasmodium-specific protein families and their roles in different stages of the pathogen's life cycle. A total of 5086 amino acid sequences (full-length sequences or fragments of proteins) show homology only with sequences from Plasmodium organisms and hence are Plasmodium-specific. These sequences cluster into 106 Plasmodium-specific families (with ≥2 members per family). Fourteen Plasmodium-specific protein domain families with known physico-chemical properties are observed. These families are involved in various important functions, such as rosetting and sequestration of infected erythrocytes, binding to the host cell surface, and the invasion process in the life cycle of the pathogen. In addition, 89 new Plasmodium-specific protein domain families have been recognized. Various aspects of the members of Plasmodium-specific protein domain families, such as their potential to target the apicoplast, protein-protein interactions, expression profiles and domain organization, have been analysed to derive information about their functions. New Plasmodium-specific domain families with no associated function could provide some insight into the highly diverged Plasmodium species. These proteins may play roles in the parasite-specific life style.
Experimental work on these Plasmodium-specific proteins might fill gaps in the poorly understood physiology of this parasite.
Chapter 5 presents a genome-wide compilation of low complexity regions (LCRs) in proteins. Given the high prevalence of LCRs in the proteome of Plasmodium falciparum, an in-depth analysis of the nature, structure and functional roles of LCR-containing proteins in this organism was undertaken. Low complexity regions and repeat patterns have been recognized in proteins encoded in 986 genomes (68 archaea, 896 eubacteria and 22 eukaryotes). Low complexity regions have been classified into the following three categories: a) Composition of LCRs: (i) stretches of a single amino acid residue type (homo-residue stretches); (ii) stretches of more than one amino acid residue type. b) Periodicity of amino acids in LCRs: certain amino acid residues occur at specific periodicities in proteins. c) Repeat patterns: certain motifs of amino acid residues are repeated within a protein. In all, 850 Plasmodium falciparum proteins have at least one repeat pattern in which the repeating unit is at least 5 amino acid residues long. Statistical analysis of single amino acid repeats indicates that the occurrence of homo-residue stretches is not a random event. Studies on the functions, protein-protein interactions and organization of tethered domains of LCR-containing proteins suggest that these proteins take part in a variety of functional events such as signal transduction, enzymatic processes, cell differentiation, pyrimidine biosynthesis, fatty acid biosynthesis and chromosomal replication. Instances of Plasmodium falciparum LCRs in Protein Data Bank structures show that LCRs can adopt regular secondary structure (apart from disordered regions) in the 3-D structures of proteins.
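Two of the LCR categories above, homo-residue stretches and repeat patterns with units of at least 5 residues, can be detected with regular expressions, and a window's compositional complexity can be measured by its Shannon entropy. This is a minimal Python sketch: the minimum run length of 5 and the toy sequence are assumptions, and dedicated tools such as SEG use more elaborate windowed statistics.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def homo_residue_runs(seq: str, min_len: int = 5) -> List[Tuple[int, str, int]]:
    """Stretches of a single residue type as (start, residue, length); min_len is assumed."""
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0))) for m in pattern.finditer(seq)]


def tandem_repeats(seq: str, min_unit: int = 5) -> List[Tuple[int, str, int]]:
    """Directly repeated motifs with unit length >= min_unit as (start, unit, copies)."""
    # lazy unit of at least min_unit residues followed by one or more exact copies
    pattern = re.compile(r"(.{%d,}?)\1+" % min_unit)
    return [(m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
            for m in pattern.finditer(seq)]


def shannon_entropy(window: str) -> float:
    """Compositional complexity of a window in bits; low values suggest an LCR."""
    counts = Counter(window)
    n = len(window)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


seq = "MSQQQQQQLTDKPNGDKPNGDKPNGA"  # toy sequence, not a real P. falciparum protein
print(homo_residue_runs(seq))
print(tandem_repeats(seq))
print(shannon_entropy("QQQQQQ"), shannon_entropy("ACDE"))
```

The regular expressions find only exact repeats; degenerate repeats, which are common in real proteins, would need an alignment-based repeat finder.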
Chapter 6 describes sequence analysis, structural modeling and evolutionary studies of the hypusine pathway enzymes of Leishmania donovani. Leishmania is a eukaryotic kinetoplastid protozoan parasite that causes leishmaniasis in humans. Hypusine, Nε-(4-amino-2-hydroxybutyl)lysine, is a non-standard, polyamine-derived amino acid named after its two structural components, hydroxyputrescine and lysine. The eukaryotic translation initiation factor 5A (eIF5A) is the only cellular protein containing hypusine. Synthesis of hypusine is critical for the function of eIF5A and is essential for eukaryotic cell proliferation and survival. Hypusine is formed by a two-step post-translational modification involving the enzymes (i) deoxyhypusine synthase (DHS) and (ii) deoxyhypusine hydroxylase (DOHH). DHS, the first enzyme of the pathway, catalyzes the NAD-dependent transfer of the butylamino moiety of spermidine (the substrate) to the ε-amino group of a specific lysine residue of the eIF5A precursor, generating a deoxyhypusine-containing intermediate. DOHH, the second enzyme of the pathway, catalyzes the hydroxylation of this intermediate, generating mature, hypusine-containing eIF5A. Two putative DHS sequences, DHS34 and DHS20, have been identified in Leishmania donovani by Professor Madhubala and coworkers (Jawaharlal Nehru University, New Delhi), in collaboration with whom the work in this chapter was done. A detailed comparison of the Leishmania DHS34 sequence with human DHS indicated conservation of functionally important residues. 3-D structural modeling suggested that residues around the active site are absolutely conserved. The NAD-binding regions are located spatially close to one another; however, one NAD-binding region lies in a large (225-residue) insertion. Based on these observations, DHS34 was predicted to be enzymatically active.
Experimental studies by our collaborators confirmed the preliminary results of the computational analysis. Based on sequence and structural analysis, DHS20 and DOHH were proposed to be catalytically inactive and active, respectively, and experimental studies on these proteins supported these predictions. DHS and DOHH are key proteins conserved in the hypusine synthesis pathways of eukaryotes; because they are highly conserved and functionally coupled, they could be coevolving. Comparison of the genetic distance matrices of DHS and DOHH reveals that their evolutionary rates are more strongly correlated with each other than with the rate of an unrelated protein such as cytochrome c. This indicates that they are coevolving, and further suggests that even non-interacting proteins that are functionally coupled experience correlated evolution. However, this correlation does not extend to their tree topologies. Chapter 7 provides a classification scheme for protein kinases encoded in the genomes of prokaryotic organisms. The overwhelming majority of Ser/Thr protein kinases identified by mining archaeal and eubacterial genomes could not be classified into any of the well-known Hanks and Hunter subfamilies of protein kinases, because the Hanks and Hunter scheme was developed from eukaryotic protein kinases, which are highly divergent from their prokaryotic homologues. Traditional sequence alignment and phylogenetic approaches have been used to identify and classify a large dataset of prokaryotic Ser/Thr protein kinases, which represent 72 subfamilies with at least 4 members each. This clustering provides a framework for classifying newly identified prokaryotic Ser/Thr kinases.
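A common way to compare two distance matrices, as in the DHS/DOHH analysis above, is the mirror-tree approach: read the pairwise distances for the same species from both matrices and compute their Pearson correlation. The thesis does not specify the exact statistic it used, so this sketch is an assumption; the species labels and distances in the test are invented for illustration.

```python
from itertools import combinations
from math import sqrt


def symmetric(pairs):
    """Build a symmetric dict-of-dicts distance matrix from {(a, b): d}."""
    matrix = {}
    for (a, b), d in pairs.items():
        matrix.setdefault(a, {})[b] = d
        matrix.setdefault(b, {})[a] = d
    return matrix


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mirrortree_r(dist1, dist2, species):
    """Pearson r between the upper triangles of two distance matrices,
    with both matrices read in the same species order."""
    pairs = list(combinations(species, 2))
    return pearson([dist1[a][b] for a, b in pairs],
                   [dist2[a][b] for a, b in pairs])
```

A high r for DHS versus DOHH, compared with a lower r for either protein versus cytochrome c, is the kind of contrast described above; significance is usually assessed separately, for example by permutation.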
After a series of searches in comprehensive sequence databases, 38 subfamilies of prokaryotic protein kinases were recognized as being associated with a specific taxonomic level. For example, 4, 6 and 3 subfamilies have been identified that are currently specific to the phyla Proteobacteria, Cyanobacteria and Actinobacteria, respectively. Similarly, subfamilies specific to an order, sub-order, class, family or genus have also been identified. In addition, organism-diverse subfamilies were identified, whose members come from organisms of different taxonomic groups such as archaea, bacteria, eukaryotes and viruses. Interestingly, the occurrence of several taxonomic-level-specific subfamilies of prokaryotic kinases contrasts with the classification of eukaryotic protein kinases, in which most of the popular subfamilies occur across diverse eukaryotes. Many prokaryotic Ser/Thr kinases exhibit a wide variety of modular organization, which indicates a degree of complexity in the protein-protein interactions and signaling pathways of these microbes. Chapter 8 focuses on the recognition and classification of protein kinases encoded in viral genomes and their implications in various functions and diseases. Protein kinases encoded by viral genomes play a major role in the infection, replication and survival of viruses. A total of 646,123 protein sequences from 35,799 viral genomes (including strains) were analyzed, and protein kinases were recognized using traditional sequence homology detection tools, sequence alignment methods and phylogenetic approaches, with identification based on a combination of profile-based search methods such as PSI-BLAST, RPS-BLAST and HMMER.
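Deciding whether a prokaryotic kinase subfamily is specific to a phylum, class, genus and so on amounts to finding the most specific taxonomic rank at which all of its members share one taxon. A minimal sketch, assuming each member's lineage is available as a rank-to-name dictionary (a hypothetical input format) and using a simplified rank list that omits intermediate ranks such as sub-order:

```python
# Simplified rank order (hypothetical); real lineages have more ranks, e.g. sub-order.
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus"]


def deepest_shared_rank(lineages):
    """Return (rank, name) for the most specific rank at which all members
    share one taxon, or None if they differ already at the top rank.

    `lineages` is a list of dicts mapping rank -> taxon name.
    """
    shared = None
    for rank in RANKS:
        names = {lineage.get(rank) for lineage in lineages}
        if len(names) != 1 or None in names:
            break  # members diverge (or a rank is missing) at this level
        shared = (rank, names.pop())
    return shared
```

A result of `None` corresponds to an organism-diverse subfamily in the sense used above; a result such as `("phylum", "Proteobacteria")` marks a phylum-specific one.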
Based on sequence similarity over the length of the catalytic kinase domains, 479 protein kinase domains recognized in 244 viral genomes have been clustered into 46 subfamilies, with a minimum sequence identity of 35% within a subfamily. Viral protein kinases are encoded in the genomes of retro-transcribing viruses or of viruses with double-stranded DNA as genetic material. Based on the functional information available for one or more members of a subfamily, a putative function has been assigned to the other members of that subfamily. Information on the interactions of viral protein kinases with viral and host proteins has also been considered to improve understanding of kinase function in each subfamily. Of the 46 subfamilies, 14 are characterized by various functions. Kinases belonging to the UL97, US69, UL13 and BGLF subfamilies are virus-specific. For 7 subfamilies, the nearest neighbors are from well-characterized eukaryotic protein kinase groups such as AGC, CAMK and CDK. Of the 25 new uncharacterized subfamilies observed in this analysis, 13 are virus-specific. Different subfamilies have been associated with functions crucial for viral infection, such as synthesis of structural units, replication of genetic material, modification of cellular components, alteration of the host immune system and competition with cellular proteins for efficient use of the host machinery. Many viral kinases also share very high sequence identity (~97%) with their eukaryotic counterparts and represent a disease state. For example, a protein kinase encoded in Avian erythroblastosis virus shares 97% sequence identity with the catalytic domain of the human epidermal growth factor receptor tyrosine kinase. Leucine at position 861 of the human protein is substituted by Gln in cancer; the viral protein kinase possesses Gln at the corresponding position and thus represents the disease state.
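One way to guarantee a minimum within-subfamily identity such as the 35% used above is complete-linkage clustering: two clusters merge only if every cross pair meets the threshold. The thesis does not say which clustering procedure was used, so this is an illustrative sketch; the pairwise identities would come from alignments of the catalytic domains, and the kinase IDs in the test are hypothetical.

```python
from itertools import combinations


def complete_linkage_clusters(ids, identity, threshold=35.0):
    """Agglomerative complete-linkage clustering on percent identity.

    `identity` maps frozenset({a, b}) -> percent identity; missing pairs
    count as 0. Clusters merge only if every cross pair is >= threshold,
    so every pair inside a final cluster meets the threshold.
    """
    clusters = [[i] for i in ids]

    def linkage(c1, c2):
        # Complete linkage on a similarity: the weakest cross pair decides.
        return min(identity.get(frozenset((a, b)), 0.0) for a in c1 for b in c2)

    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            score = linkage(clusters[i], clusters[j])
            if score >= threshold and (best is None or score > best[0]):
                best = (score, i, j)
        if best is None:
            break  # no remaining pair of clusters satisfies the threshold
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters
```

Single linkage would be cheaper, but it can chain together members that share well under 35% identity with each other, which would violate the stated within-subfamily minimum.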
Chapter 9 examines, using protein kinases as a case study, how well 3-D structural features of comparative models and of crystal structures of inactive enzyme forms can be used to predict whether a protein is an enzyme. With the advent of structural genomics initiatives, there has been a surge in the number of proteins with 3-D structures, many of which are determined before their functional features are understood. One useful annotation is the classification of a protein as an enzyme or non-enzyme solely from its 3-D structure. This is facilitated by identifying active sites and ligand-binding sites in the protein. In this work, carried out in collaboration with Dr Jim Warwicker of Manchester University, UK, an approach developed by Warwicker and coworkers has been used. In protein 3-D structures, the largest clefts are generally considered to be ligand-binding sites. This feature, along with other alignment-independent properties such as residue preferences, fraction of surface residues and secondary structure elements, has been used to differentiate enzymes from non-enzymes. The electrostatic potential at the active site is one of the key properties used in this respect. Active sites of enzymes are generally associated with ionizable groups that can take part in catalysis. Besides lying in large clefts, active site residues are in buried environments and show larger deviations in pKa values than surface residues. The method proposed by Warwicker and coworkers distinguishes enzymes from non-enzymes using the electrostatic features at clefts together with the sequence profile of the protein concerned. The conformation of the inactive state of an enzyme is not conducive to catalytic function. Ideally, a method should be able to predict an enzyme irrespective of whether the determined structure corresponds to the active or the inactive state.
Peak potential values were calculated using the Warwicker program for a set of 15 protein kinases whose 3-D structures are available in both active and inactive conformations. Comparison of the values for active and inactive conformations suggests that the algorithm can differentiate between them, as values for active conformations are generally higher than the corresponding values for inactive conformations. However, the peak potential values are high enough for even the inactive conformations to be predicted as enzymes. Peak potential values calculated for homology models of protein kinases (whose crystal structures are already available), generated at different sequence identities to the template, also predict the protein kinases to be enzymes, and these values are comparable to the corresponding values for the X-ray structures. This suggests that for proteins with no crystal or NMR structure and no good template of high sequence identity, peak potential values for models generated at low sequence identity can still give insight into whether the protein is likely to be an enzyme. The enzyme/non-enzyme prediction algorithm was also found to be useful for confirming enzyme function using 3-D models of putative viral kinases. A putative kinase function had initially been assigned to these viral proteins based solely on sequence characteristics, such as the presence of residues and motifs important for activity. The enzyme recognition method, which is not directly sensitive to these motifs, confirmed that all the analyzed putative viral kinases are enzymes. Chapter 10 presents the conclusions of the work embodied in the thesis. In brief, various computational approaches have been used to analyze and understand the structural and functional properties of the protein repertoires of pathogenic organisms.
Analysis of uncharacterized protein domain families has helped to clarify the functional implications of their constituent proteins. Experimental validation of these results can further help unravel the functional aspects of proteins encoded in various pathogenic organisms. Apart from the studies embodied in the thesis, the author has been involved in two other studies, which are provided as appendices. Appendix 1 compares the amino acid substitution patterns of proteins encoded in the P. falciparum genome with those of the corresponding homologous proteins from non-Plasmodium organisms, highlighting salient differences. Appendix 2 describes a study of bacterial tyrosine kinases aimed at recognizing all putative protein tyrosine kinases in E. coli. The computational study suggests that the protein SopA may be a tyrosine kinase, and this conclusion is being tested experimentally in a collaborator's laboratory. Proteins Viral Genomes Microorganisms Microbiology Viral Protein Kinases Protein-protein Interactions Helicobacter Pylori Proteins Plasmodium-Specific Proteins Leishmania donovani Deoxyhypusine Synthase Protein Kinases Plasmodium falciparum Microbiology
333	L'exploration des génomes par l'outil ICEFinder révèle la forte prévalence et l'extrême diversité des ICE et des IME de streptocoques / Genomic exploration using the ICEFinder tool reveals the strong predominance and extreme diversity of streptococcal ICEs and IMEs Coluzzi, Charles 20 December 2017 (has links) Mobile genetic elements contribute greatly to the diversity and evolution of bacterial genomes through horizontal transfer. Among them, integrative and conjugative elements (ICEs) encode their own excision, conjugative transfer and integration. By contrast, integrative and mobilizable elements (IMEs) are autonomous only for their excision and integration and encode only some of the proteins/functions (oriT) needed for their conjugative transfer. IMEs therefore need a "helper" conjugative element to transfer. Despite their impact on gene flow and genome evolution, the prevalence of ICEs remains little studied, and only very few IMEs had been identified at the start of this study. Moreover, although several methods for detecting genomic islands exist, none is dedicated to ICEs or IMEs, which hinders exhaustive analysis of these elements. The genus Streptococcus belongs to the phylum Firmicutes. Almost all streptococci are commensal or pathogenic bacteria of humans and other animals. In addition, 2 streptococcal species are used as lactic starter cultures in the production of fermented milks and various cheeses. Overall, the genus Streptococcus is a group of interest to humans, and studying gene flow within these organisms and its impact on their lifestyle is of prime importance.
In this thesis, we searched for ICEs and IMEs in 124 streptococcal strains belonging to 27 species, using a reference database of so-called "signature" proteins of IMEs and ICEs (from their conjugation/mobilization and integration/excision modules). This exhaustive analysis allowed the identification and delimitation of 131 ICEs or slightly degenerate ICEs and 144 IMEs. All these elements were delimited, which allowed us to determine their integration specificity in the genomes. In total, 17 integration specificities were identified for ICEs, 8 of which had never been described (ftsK, guaA, lysS, mutT, rpmG, rpsI, traG and ybaB/EbfC), and 18 specificities for IMEs, of which only 5 were known in the Firmicutes. The integration modules of ICEs encode either a tyrosine integrase of low specificity (1 integrase family) or of high specificity (13 different specificities), serine integrases, single or in a triplet (4 different specificities), or a DDE transposase. IMEs encode either tyrosine integrases (10 different specificities) or single serine integrases (8 different specificities). ICEs were grouped into 7 distinct families according to the proteins encoded by their conjugation module. IMEs showed very high diversity in their mobilization module, which prevented grouping them into families according to the genes carried by this module. Phylogenetic analyses of the signature proteins encoded by all ICEs and IMEs showed exchanges of integration modules between ICEs and IMEs and numerous exchanges between the mobilization modules of IMEs. Altogether, these results reveal the high prevalence and extreme diversity of ICEs and IMEs in streptococcal genomes.
Better knowledge and understanding of these elements prompted us to build a semi-automated computational tool for detecting streptococcal ICEs and IMEs and their insertion sites / Mobile genetic elements contribute substantially to the evolution and diversity of bacterial genomes through horizontal gene transfer. Among them, integrative and conjugative elements (ICEs) encode their own excision, conjugative transfer and integration. By contrast, integrative and mobilizable elements (IMEs) are autonomous for excision and integration but encode only some of the proteins needed for their conjugative transfer. IMEs therefore need a "helper" conjugative element to transfer. Despite their impact on gene flow and genome dynamics, the prevalence of ICEs remains largely understudied, and very few IMEs had been identified at the beginning of this study. Furthermore, although several in silico methods exist to detect genomic islands, none is dedicated to ICEs or IMEs, which complicates exhaustive examination of these mobile elements. The genus Streptococcus belongs to the phylum Firmicutes. Almost all streptococci are commensal bacteria or pathogens of humans and animals. Two Streptococcus species are also used in the dairy industry as lactic starter cultures to produce fermented milk and various types of cheese. Studying gene flow in the genus Streptococcus and its impact on the lifestyle of these organisms is essential, given their importance for human health and activities. In this work, we searched for ICEs and IMEs in 124 strains of streptococci belonging to 27 species using a reference database of ICE and IME signature proteins (from their conjugation, mobilization and integration/excision modules). This exhaustive analysis led to the identification and delimitation of 131 ICEs or slightly decayed ICEs and 144 IMEs.
All these elements were delimited, which allowed us to identify their integration specificities in the genomes. In total, 17 ICE integration specificities were identified, 8 of which had never been described before (ftsK, guaA, lysS, mutT, rpmG, rpsI, traG and ybaB/EbfC). For IMEs, 18 specificities were identified, of which only 5 were known in the Firmicutes. ICEs encode low-specificity or high-specificity tyrosine integrases (13 different specificities), single serine integrases (1 specificity), triplets of serine integrases (3 different specificities) or DDE transposases, whereas IMEs encode either tyrosine integrases (10 different specificities) or single serine integrases (8 different specificities). ICEs were grouped into 7 distinct families according to the proteins encoded by their conjugation module, whereas the mobilization modules of IMEs were too diverse to allow grouping into families. Phylogenetic analysis of the signature proteins encoded by all ICEs and IMEs showed integration module exchanges between ICEs and IMEs and several mobilization module exchanges between IMEs. Overall, the results reveal a strong prevalence and extreme diversity of these elements in streptococcal genomes. To improve knowledge and understanding of ICEs and IMEs, we built a semi-automated command-line tool to identify streptococcal ICEs and IMEs and to determine their insertion sites. Evolution of bacterial genomes Horizontal gene transfer Integrative mobilizable elements (IMEs) Streptococci ICEFinder 572.869
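The operational difference between the two element types in the abstract above (ICEs carry their own conjugation module, IMEs only a mobilization module and rely on a helper) can be illustrated with a toy rule that looks for signature genes near each integrase gene. This is not how ICEFinder works internally, which the abstract does not describe; the signature labels, the input format and the window of 30 genes are hypothetical.

```python
def candidate_elements(genes, window=30):
    """Classify each integrase gene by the signature genes found nearby.

    `genes` is an ordered list of (gene_id, signature) tuples, where
    signature is "integrase", "conjugation", "mobilization" or None
    (hypothetical labels, e.g. from searches against signature proteins).
    """
    results = []
    for i, (gene_id, signature) in enumerate(genes):
        if signature != "integrase":
            continue
        lo, hi = max(0, i - window), min(len(genes), i + window + 1)
        nearby = {s for _, s in genes[lo:hi] if s}
        if "conjugation" in nearby:
            kind = "ICE"  # integration + conjugation module
        elif "mobilization" in nearby:
            kind = "IME"  # integration + mobilization only; needs a helper
        else:
            kind = "integrase only"
        results.append((gene_id, kind))
    return results
```

Real delimitation, as described in the thesis, also requires locating the element boundaries and the integration site, which a fixed gene window does not attempt.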
334	Computational Studies on Structures and Functions of Single and Multi-domain Proteins Mehrotra, Prachi January 2017 (has links) (PDF) Proteins are essential for the growth, survival and maintenance of the cell. Understanding the functional roles of proteins helps to decipher the workings of macromolecular assemblies and the cellular machinery of living organisms. A thorough investigation of the link between the sequence, structure and function of proteins helps in building a comprehensive understanding of complex biological systems. Proteins are composed of single or multiple domains, and analysis of proteins encoded in diverse genomes shows that multi-domain proteins are ubiquitous. Although the majority of eukaryotic proteins are multi-domain, 3-D structures are known for only a small proportion of multi-domain proteins, owing to difficulties in crystallizing such proteins. While the functions of individual domains are generally well studied, the complex interplay of domain functions is not well understood for most multi-domain proteins. The paucity of structural and functional data limits our understanding of the evolution of structure and function of multi-domain proteins. The broad objective of this thesis is to achieve an enhanced understanding of the structure and function of protein domains by computational analysis of sequence and structural data. The first few chapters of this thesis pay special attention to multi-domain proteins. Classification of multi-domain proteins using an alignment-free sequence comparison method is achieved in Chapters 2 and 3. Studies on the organization and interdependence of domain-domain interactions in multi-domain proteins, with respect to the sequential separation between domains and N- to C-terminal domain order, are described in Chapters 4 and 5.
The functional and structural repertoire of organisms can be comprehensively studied and compared using functional and structural domain annotations. Chapters 6, 7 and 8 present proteome-wide structure and function comparisons of various pathogenic and non-pathogenic microorganisms. These comparisons help to identify proteins implicated in the virulence of the pathogen and thus to predict putative targets for disease treatment and prevention. Chapter 1 introduces the main subject area of this thesis. It begins by describing protein structure and function, detailing the four levels of hierarchical organization of protein structure along with the databases that document protein sequences and structures. The classification of protein domains, regarded as units of function, structure and evolution, is then described. The usefulness of classifying proteins at the domain level is highlighted in terms of an enhanced understanding of protein structure and function and of evolutionary relatedness. The structure, function and evolution of multi-domain proteins are also outlined in Chapter 1. Chapter 2 aims to achieve a biologically meaningful classification scheme for multi-domain protein sequences. The overall function of a multi-domain protein is determined by the functional and structural interplay of its constituent domains. Traditional sequence-based methods use only domain-level information to classify proteins, which does not take into account the contributions of accessory domains and linker regions to the overall function of a multi-domain protein. An alignment-free protein sequence comparison tool, CLAP (CLAssification of Proteins), previously developed in this laboratory, was assessed and improved after the author joined the group.
CLAP was developed specifically to handle multi-domain protein sequences without requiring domain boundaries or the sequential order of domains (domain architecture) to be defined. CLAP works by an all-against-all comparison of 5-residue sequence windows between two protein sequences. The sequences compared can be full-length, comprising all the domains of the two proteins. The results of this comparison are represented as Local Matching Scores (LMS) between protein sequences (nslab.iisc.ernet.in/clap/). CLAP has previously been shown to run ~7 times faster than other protein sequence comparison methods that employ sequence alignment. In Chapter 2, CLAP-based classification has been carried out on two test datasets of proteins containing (i) the tyrosine phosphatase domain family and (ii) the SH3-domain family. The former dataset comprises both single- and multi-domain proteins, some of which contain repeats of the tyrosine phosphatase domain. The latter consists only of multi-domain proteins with one copy of the SH3 domain. At the domain level, CLAP-based classification produced clustering similar to that obtained from an alignment-based method, ClustalW. CLAP-based clusters obtained for the full-length datasets were shown to comprise proteins with similar functions and domain architectures. Hence, a protein classification scheme that is independent of domain definitions and requires only full-length amino acid sequences as input is shown to work efficiently. Chapter 3 explores the limitations of CLAP in large-scale protein sequence comparisons. The potential advantages of full-length protein sequence classification, combined with the availability of the alignment-free tool CLAP, motivated the idea of classifying the entire protein repertoire by full-length sequence.
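The core idea of comparing all 5-residue windows of two full-length sequences, without defining domain boundaries, can be illustrated by counting identical windows shared between them. This is a simplified sketch only: CLAP's actual Local Matching Score is computed differently, and normalising by the window count of the shorter sequence is an assumption chosen for illustration.

```python
from collections import Counter


def windows(seq, k=5):
    """Multiset of all length-k windows (sequence patterns) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_window_score(seq1, seq2, k=5):
    """Fraction of identical k-residue windows shared by two sequences,
    relative to the sequence with fewer windows (illustrative score)."""
    c1, c2 = windows(seq1, k), windows(seq2, k)
    n = min(sum(c1.values()), sum(c2.values()))
    if n == 0:
        return 0.0  # at least one sequence is shorter than k
    # Counter & Counter keeps the minimum count of each shared window.
    return sum((c1 & c2).values()) / n
```

Because windows are collected from the whole sequence, the score does not depend on where domains start or end, which is the property that makes this style of comparison usable on full-length multi-domain proteins.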
Before undertaking this mammoth task, CLAP was tested on a large dataset of 239,461 protein sequences. Chapter 3 discusses the technical details of computing, storing and retrieving CLAP scores for a large dataset within a feasible timeframe. CLAP scores were examined for protein pairs with the same domain architecture, and ~22% of these showed a CLAP similarity score of 0. This led to an investigation of the sensitivity of CLAP to sequence divergence. Several test datasets of proteins belonging to the same SCOP fold were constructed, and CLAP-based classification of these proteins was examined at the inter- and intra-SCOP family level. CLAP efficiently clustered evolutionarily related proteins (defined as proteins within the same SCOP superfamily) if their sequence identity exceeded 35%. At lower sequence identities, CLAP failed to recognize any evolutionary relatedness. Another test dataset, consisting of two-domain proteins with swapped domain order, was constructed. Domain order swap refers to domain architectures of type AB and BA, consisting of domains A and B. Homologous domains were required to share more than 35% sequence identity. In this case, CLAP effectively clustered together proteins of the same domain architecture. Thus, CLAP clusters accurately when domain-level sequence identity exceeds 35%. The analysis also showed that, for highly divergent sequences, requiring an identical 5-residue pattern match is likely too stringent. Relaxing this criterion, by allowing similar residues and gaps within matched patterns, may therefore be required for CLAP-based clustering of remotely related protein sequences. Overall, this study highlights the limitations of CLAP for large-scale analysis and its sensitivity to sequence divergence.
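One way to relax the identical-match criterion along the lines suggested above is to compare windows after mapping residues onto a reduced alphabet of similar-residue classes. The grouping below is a hypothetical example, not one proposed in the thesis, and gaps within matched patterns are not handled.

```python
from collections import Counter

# Hypothetical residue classes for illustration only.
GROUPS = ["AGST", "DENQ", "KRH", "ILMV", "FWY", "C", "P"]
CLASS_OF = {aa: str(i) for i, group in enumerate(GROUPS) for aa in group}


def reduce_seq(seq):
    """Replace each residue by its class label; unknown residues are kept as-is."""
    return "".join(CLASS_OF.get(aa, aa) for aa in seq)


def shared_windows(seq1, seq2, k=5, reduced=False):
    """Count k-residue windows shared by two sequences, optionally after
    reducing both to residue classes so that similar residues match."""
    if reduced:
        seq1, seq2 = reduce_seq(seq1), reduce_seq(seq2)
    c1 = Counter(seq1[i:i + k] for i in range(len(seq1) - k + 1))
    c2 = Counter(seq2[i:i + k] for i in range(len(seq2) - k + 1))
    return sum((c1 & c2).values())
```

For example, `ILKDE` and `VMRNQ` share no identical 5-residue window but map to the same reduced window, so a reduced-alphabet comparison would detect similarity that exact matching misses, at the cost of more chance matches.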
Chapters 4 and 5 discuss the computational analysis of inter-domain interactions with respect to sequential distance and domain order. Knowledge of the domain composition and 3-D structures of the individual domains of a multi-domain protein may not be sufficient to predict its tertiary structure. Substantial information about the nature of domain-domain interfaces helps in predicting both the tertiary and the quaternary structure of a protein. Chapter 4 therefore explores the relationship between the sequential distance separating two domains in a multi-domain protein and the extent of their interaction. With increasing sequential separation between two domains, the extent of inter-domain interaction decreased gradually. The trend was more apparent when sequential separation was measured as the number of intervening domains. Irrespective of linker length, extensive interactions were seen more often between contiguous domains than between non-contiguous domains. Contiguous domains show a broader range of interface areas and a lower proportion of non-interacting domain pairs (interface area 0 to 4400 Å², 2.3% non-interacting) than non-contiguous domains (interface area 0 to 2000 Å², 34.7% non-interacting). Additionally, as inter-protein interactions are mediated through constituent domains, rules of protein-protein interactions were applied to domain-domain interactions. Tight binding between domains is denoted a putative permanent domain-domain interaction, and domains that may dissociate and associate through relatively weak interactions to regulate functional activity are denoted putative transient domain-domain interactions. An interface area threshold of 600 Å² was used as a binary classifier to distinguish putative permanent from putative transient domain-domain interactions.
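The two descriptors used in Chapter 4, sequential separation counted as intervening domains and the 600 Å² interface-area cut-off, can be expressed as a small classifier. This is a sketch: the input format is hypothetical, and treating an area of exactly 600 Å² as permanent is an assumption, since the text does not say which side of the threshold is inclusive.

```python
from dataclasses import dataclass

PERMANENT_THRESHOLD = 600.0  # interface area in Å² separating the two classes


@dataclass
class DomainPair:
    index_a: int           # position of first domain in N-to-C order (0-based)
    index_b: int           # position of second domain
    interface_area: float  # buried inter-domain interface area in Å²

    @property
    def intervening_domains(self) -> int:
        return abs(self.index_a - self.index_b) - 1

    @property
    def contiguous(self) -> bool:
        return self.intervening_domains == 0

    @property
    def state(self) -> str:
        # Binary rule: large interfaces -> putative permanent, else putative transient.
        if self.interface_area >= PERMANENT_THRESHOLD:
            return "putative permanent"
        return "putative transient"
```

Tabulating `state` against `contiguous` over a set of domain pairs reproduces the kind of comparison reported above; non-interacting pairs (0 Å²) fall under "putative transient" in this sketch and could be counted separately.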
The state of interaction of a domain pair is therefore defined as either putative permanent or putative transient. Contiguous domains showed a predominance of putative permanent inter-domain interfaces, whereas non-contiguous domains showed a prevalence of putative transient interfaces. The state of interaction of various SCOP superfamily pairs was studied across different proteins in the dataset. SCOP superfamily pairs mostly showed a conserved state of interaction, i.e. either putative permanent or putative transient in all their occurrences across different proteins. Thus, contiguous domains interact extensively more often than non-contiguous domains, and specific superfamily pairs tend to interact in a conserved manner. In conclusion, combining interface area with other inter-domain properties, along with experimental validation, will help strengthen the binary classification of putative permanent and transient domain-domain interactions. Chapter 5 provides a structural analysis of domain pairs occurring in different sequential orders in multi-domain proteins. The function and regulation of a multi-domain protein are predominantly determined by domain-domain interactions, which in turn are influenced by the sequential order of domains in the protein. With domains defined by evolutionary and structural relatedness (SCOP superfamily), the conservation of their structure and function across domain order reversal was studied. A domain order reversal refers to different sequential orders of the domains concerned, which may be identified in proteins of the same or different domain compositions. A reversal of domains A and B can be identified in a protein pair with the domain architectures xAxBx and xBxAx, where x denotes 0 or more domains. A total of 161 pairs of domain order reversals were identified in 77 pairs of PDB entries.
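Detecting a reversal between architectures of the form xAxBx and xBxAx reduces to comparing the ordered domain pairs of two proteins. A minimal sketch, assuming each architecture is given as an N-to-C list of SCOP superfamily labels (the labels in the test are placeholders):

```python
from itertools import combinations


def ordered_pairs(architecture):
    """All (A, B) with A occurring N-terminal to B, with any number of
    domains in between (the x positions in xAxBx)."""
    return {(a, b) for a, b in combinations(architecture, 2) if a != b}


def order_reversals(arch1, arch2):
    """Unordered domain pairs that occur as A..B in one architecture
    and as B..A in the other."""
    p1, p2 = ordered_pairs(arch1), ordered_pairs(arch2)
    return {tuple(sorted((a, b))) for a, b in p1 if (b, a) in p2}
```

`itertools.combinations` preserves positional order, so each `(a, b)` it yields has `a` N-terminal to `b`; a domain repeated on both sides of another, as in ABA, yields both orders within one protein and would need separate handling.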
For most comparisons between proteins with different domain compositions and architectures, large differences in the relative spatial orientation of domains were observed. Although the state of interaction was preserved in ~75% of the comparisons, none of the inter-domain interfaces of domains in different orders displayed high interface similarity. These domain order reversals involve only 15 SCOP superfamilies. Most of these superfamilies function either as transporters or as regulatory domains, and very few are enzymes. A higher proportion of domain order reversals was observed for domains separated by 0 or 1 domains than for those separated by more than 1 domain. A thorough analysis of various structural features of domains undergoing order reversal indicates that one domain order is strongly preferred over all other possible orders. This may be due either to evolutionary selection of one order and its conservation through generations, or to the fact that domain order reversals rarely conserve the interface between the domains. Further studies (Chapters 6 to 8) use available computational techniques for structural and functional annotation of proteins encoded in a few bacterial genomes. Based on these annotations, proteome-wide structure and function comparisons were performed between two sets of pathogenic and non-pathogenic bacteria. The first study compares the pathogenic Mycobacterium tuberculosis with the closely related, non-pathogenic Mycobacterium smegmatis. The second study primarily identified biologically feasible host-pathogen interactions between the human host and the pathogen Leptospira interrogans, and also compared the interactions of the pathogenic Leptospira interrogans and the saprophytic Leptospira biflexa with the human host.
Chapter 6 describes the functional and structural annotation of proteins encoded in the genome of M. smegmatis MC2-155. M. smegmatis is a widely used model organism for understanding the pathophysiology of M. tuberculosis, the primary causative agent of tuberculosis in humans. M. smegmatis and M. tuberculosis, both of the genus Mycobacterium, share several features, such as a similar cell-wall architecture and the ability to oxidise carbon monoxide aerobically, as well as a large number of homologues. These features make M. smegmatis particularly useful for identifying critical cellular pathways of M. tuberculosis that could be targeted to inhibit its growth in the human host. In spite of the similarities between M. smegmatis and M. tuberculosis, there are stark differences between the two owing to their different niches and lifestyles. While there are innumerable studies reporting the structure, function and interaction properties of M. tuberculosis proteins, high-quality annotation of M. smegmatis proteins is lacking. This makes understanding the biology of M. smegmatis extremely important for assessing its suitability as a model organism for M. tuberculosis. Using available sequence- and structural profile-based search procedures, functional and structural characterization was achieved for ~92% of the M. smegmatis proteome. Structural and functional domain definitions were obtained for 5695 of the 6717 proteins in M. smegmatis. Residue coverage >70% was achieved for 4567 proteins, which constitute ~68% of the proteome. Domain-unassigned regions longer than 30 residues were assessed for their potential to be associated with a domain. For the 1022 proteins with no recognizable domains, putative structural and functional information was inferred for 328 proteins using distant relationship detection and fold recognition methods. Of these 1022 proteins, 916 were found to be specific to the M. smegmatis species, and 98 of these are specific to its MC2-155 strain.
Of the 1828 M. smegmatis proteins classified as conserved hypothetical proteins, 1038 were successfully characterized. A total of 33 Domains of Unknown Function (DUFs) occurring in M. smegmatis could be associated with structural domains. The tetR and GntR families of transcription regulators are highly represented in the functional repertoire of the M. smegmatis proteome. As M. smegmatis is a soil-dwelling bacterium, transcriptional regulators are crucial for helping it adapt to and survive environmental stress. Similarly, the ABC transporter and MFS domain families are highly represented in the M. smegmatis proteome; these enable the bacterium to take up carbohydrates from diverse environmental sources. Fewer virulence proteins were identified in M. smegmatis, which is consistent with its non-pathogenicity. Thus, Chapter 6 achieved a detailed functional and structural annotation of the M. smegmatis proteome. Chapter 7 delineates the similarities and differences in the structure and function of proteins encoded in the genomes of the pathogenic M. tuberculosis and the non-pathogenic M. smegmatis. The protocol used in Chapter 6 to achieve the proteome-wide structure and function annotation of M. smegmatis was also applied to the M. tuberculosis proteome in Chapter 7. The genome of M. smegmatis strain MC2-155 encodes considerably more proteins (6717) than that of M. tuberculosis strain H37Rv (4018). A total of 2720 high-confidence orthologues sharing ≥30% sequence identity were identified between M. tuberculosis and M. smegmatis. Based on the orthologue information, specific functional clusters, essential proteins, metabolic pathways, transporters and toxin-antitoxin systems of M. tuberculosis were inspected for conservation in M. smegmatis.
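The orthologue step described above (≥30% identity between M. tuberculosis and M. smegmatis proteins), followed by a check of which members of a category lack an orthologue, can be sketched as follows. The abstract does not state how orthologues were paired, so this sketch assumes reciprocal best hits from a precomputed similarity search; the `Hit` type, the protein IDs and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    query: str        # protein in the query proteome
    subject: str      # matching protein in the other proteome
    identity: float   # percent sequence identity of the alignment


def best_hits(hits):
    """Keep the highest-identity hit for each query protein."""
    best = {}
    for hit in hits:
        if hit.query not in best or hit.identity > best[hit.query].identity:
            best[hit.query] = hit
    return best


def orthologues(tb_vs_sm, sm_vs_tb, min_identity=30.0):
    """Reciprocal best hits (assumed criterion) with identity >= min_identity."""
    forward, reverse = best_hits(tb_vs_sm), best_hits(sm_vs_tb)
    return {
        tb: hit.subject
        for tb, hit in forward.items()
        if hit.identity >= min_identity
        and hit.subject in reverse
        and reverse[hit.subject].subject == tb
    }


def without_orthologue(category, ortho):
    """M. tuberculosis members of a category with no M. smegmatis orthologue."""
    return sorted(p for p in category if p not in ortho)


fwd = [Hit("tb1", "sm1", 45.0), Hit("tb1", "sm2", 20.0),
       Hit("tb2", "sm3", 25.0), Hit("tb3", "sm4", 60.0)]
rev = [Hit("sm1", "tb1", 45.0), Hit("sm3", "tb2", 25.0), Hit("sm4", "tb9", 70.0)]
ortho = orthologues(fwd, rev)
print(ortho)                                             # {'tb1': 'sm1'}
print(without_orthologue(["tb1", "tb2", "tb3"], ortho))  # ['tb2', 'tb3']
```

Here tb2 fails the identity threshold and tb3 fails reciprocity, so both would be reported as lacking an M. smegmatis orthologue.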
Among the several categories analysed, 53 metabolic pathways, 44 membrane transporter proteins belonging to the secondary transporter and ATP-dependent transporter classes, 73 toxin-antitoxin systems, 23 M. tuberculosis-specific targets, 10 broad-spectrum targets and 34 targets implicated in the persistence of M. tuberculosis had no detectable orthologues in M. smegmatis. Several MFS superfamily transporters act as drug efflux pumps and are hence associated with drug resistance in M. tuberculosis. The relative abundances of MFS and ABC superfamily transporters are higher in M. smegmatis than in M. tuberculosis. As these transporters are involved in carbohydrate uptake, their higher representation in M. smegmatis highlights the limited capacity of M. tuberculosis to assimilate diverse carbon sources. In the case of porins, MspA-like and OmpA-like porins are selectively present in either M. smegmatis or M. tuberculosis. These differences help to identify protein clusters for which M. smegmatis may not be the best model organism for studying M. tuberculosis proteins.

At the domain level, the ATP-binding domain of ABC transporters, the tetracycline transcriptional regulator (tetR) domain family, the major facilitator superfamily (MFS) domain family, the AMP-binding domain family and the enoyl-CoA hydrolase domain family are highly represented in both the M. smegmatis and M. tuberculosis proteomes. These domains play essential roles in carbohydrate uptake systems and drug-efflux pumps, among other diverse functions in mycobacteria. Several domain families are differentially represented in M. tuberculosis and M. smegmatis. For example, the pentapeptide-repeat, PE, PPE and PIN domains, although abundant in M. tuberculosis, are very rare in M. smegmatis. Such uniquely or differentially represented functional and structural domains in M. tuberculosis may therefore be linked to its pathogenicity or adaptation in the host. Hence, major differences between M. tuberculosis and M. smegmatis were identified not only in domain populations but also in domain combinations. Thus, Chapter 7 highlights the similarities and differences between the M. smegmatis and M. tuberculosis proteomes in terms of structure and function. These differences inform the selective use of M. smegmatis as a model organism for studying M. tuberculosis.

In Chapter 8, computational tools were employed to predict biologically feasible host-pathogen interactions between the human host and the pathogen Leptospira interrogans. Sensitive profile-based search procedures were used to specifically identify practical drug targets in the genome of Leptospira interrogans, the causative agent of the globally widespread zoonotic disease leptospirosis. Traditionally, the genus Leptospira is classified into two species complexes: the pathogenic L. interrogans and the non-pathogenic saprophyte L. biflexa. The pathogen gains entry into the human host through direct or indirect contact with fluids of infected animals. Several ambiguities remain in the understanding of L. interrogans pathogenesis. An integration of multiple computational approaches, guided by experimentally derived protein-protein interactions, was used to recognize host-pathogen protein-protein interactions. The initial step involved identifying similarities of host and L. interrogans proteins to crystal structures of experimentally known transient protein-protein complexes. Next, conservation of interfacial features was used to obtain high-confidence predictions of putative host-pathogen protein-protein interactions. These predictions were further filtered based on the subcellular localization of proteins of the human host and L. interrogans, and on the tissue-specific expression profiles of the host proteins. A total of 49 protein-protein interactions mediated by 24 L.
interrogans proteins and 17 host proteins were identified, and these may be subjected to further experimental investigation to assess their in vivo relevance. The functional relevance of similarities and differences between the pathogenic and non-pathogenic leptospires in terms of interactions with the host has also been explored. For this, protein-protein interactions between the human host and the non-pathogenic saprophyte L. biflexa were also predicted. Nearly 39 leptospiral-host interactions were recognized as similar across both the pathogen and the saprophyte in the context of processes that influence the host. The overlapping leptospiral-host interactions of L. interrogans and L. biflexa proteins with human host proteins are primarily associated with entry into the human host. These include adhesion of leptospiral proteins to host cells, survival in the host environment (such as iron acquisition) and binding to components of the extracellular matrix and plasma. The disjoint sets of leptospiral-host interactions are species-specific and, more importantly, are indicative of the establishment of infection by L. interrogans in the human host and of immune clearance of L. biflexa by the human host. For L. interrogans, these specific interactions include interference with the blood coagulation cascade and dissemination to target organs through disruption of cell junction assembly. In contrast, species-specific interactions of L. biflexa proteins include those with components of the host immune system.

Despite the limited availability of experimental evidence, these predictions help identify functionally relevant interactions between host and pathogen by integrating multiple lines of evidence. Thus, inferences from computational prediction of host-pathogen interactions act as guidelines for experimental studies investigating the in vivo relevance of these predicted protein-protein interactions.
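The host-pathogen prediction workflow described above amounts to a sequence of filters followed by a set comparison between species. A minimal sketch, assuming each candidate pair has already been annotated with boolean outcomes for the four criteria named in the text (similarity to a known transient complex, interface conservation, subcellular localization, host tissue expression). Comparing the two species by shared host partners is a simplifying assumption, and all identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    pathogen: str
    host: str
    matches_template: bool     # both proteins resemble partners of a known transient complex
    interface_conserved: bool  # interfacial residues are conserved
    colocalized: bool          # subcellular localizations allow contact
    host_expressed: bool       # host protein expressed in a relevant tissue


STAGES = ("matches_template", "interface_conserved", "colocalized", "host_expressed")


def predict(candidates):
    """Apply the filters in order; return surviving (pathogen, host) pairs."""
    kept = list(candidates)
    for stage in STAGES:
        kept = [c for c in kept if getattr(c, stage)]
    return {(c.pathogen, c.host) for c in kept}


def compare_by_host(pathogen_pairs, saprophyte_pairs):
    """Split host partners into shared and species-specific sets."""
    host_p = {h for _, h in pathogen_pairs}
    host_s = {h for _, h in saprophyte_pairs}
    return host_p & host_s, host_p - host_s, host_s - host_p


interrogans = predict([
    Candidate("LIC_a", "H1", True, True, True, True),
    Candidate("LIC_b", "H2", True, False, True, True),
    Candidate("LIC_c", "H3", True, True, True, True),
])
biflexa = {("LEPBI_a", "H1"), ("LEPBI_b", "H4")}
shared, interrogans_only, biflexa_only = compare_by_host(interrogans, biflexa)
print(sorted(shared), sorted(interrogans_only), sorted(biflexa_only))
# ['H1'] ['H3'] ['H4']
```

Applying the stages one at a time makes it easy to report how many candidates each criterion removes, which is useful when tuning a workflow like this.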
This will further help in developing effective measures for treatment and disease prevention. In summary, Chapters 2 and 3 describe the implementation, advantages and limitations of the alignment-free full-length sequence comparison method, CLAP. Chapters 4 and 5 are dedicated to understanding domain-domain interactions in multi-domain protein sequences and structures. In Chapters 6, 7 and 8, computational analyses of mycobacterial and leptospiral species provided an enhanced understanding of the functional repertoire of these bacteria. These studies used biological sequence data available in public databases and powerful homology-detection techniques. The supplemental data associated with the chapters are provided on a compact disc attached to this thesis.
Proteins - Building Blocks Protein Sequences Protein Domain Hidden Markov Models (HMM) Multi-domain Proteins Mycobacterium smegmatis MC2-155 Mycobacterium tuberculosis Proteomes Leptospira Interrogans Leptospira Biflexa Proteomes Leptospira Biflexa Genomes Mycobacterium tuberculosis H37Rv Mathematics
335	Utilisation d'outils bio-informatiques pour l'étude de pathogènes émergents / Use of bioinformatics tools for the study of emerging pathogens Benamar, Samia 06 July 2017 (has links) La recherche en bactériologie et virologie est à la fois de nature cognitive et appliquée. Elle consiste à fédérer et mettre en place une capacité de recherche multidisciplinaire et pouvoir l'intégrer sur un champ très vaste de microorganismes et de maladies. Les nouvelles avancées conceptuelles et technologiques dans le domaine de la génomique, notamment les avancées dans les techniques à haut débit (séquençage, PCR...) permettent actuellement d’avoir rapidement des génomes bactériens et viraux entiers, ou seulement sur quelques gènes d’une grande population. Les progrès dans ce domaine permettent l’accès à ces informations en évitant une combinaison de plusieurs méthodologies, et à moindre coût. Dans notre travail de thèse, nous avons été portés à analyser et traiter les données de deux études génomiques et métagénomiques, mettant en évidence avantages, limites et attentes liés à ces techniques. La première étude porte sur l'analyse génomique de nouveaux virus géants et chlamydia infectant Vermamoeba vermiformis. La deuxième étude concerne le pyroséquençage 16S de microbiote intestinal de nouveau-nés atteints de l'entérocolite nécrosante. Pour le premier projet du travail de thèse, nous avons analysé les génomes de trois nouvelles espèces de Chlamydiae et onze virus géants (premiers membres de deux probables nouvelles familles) qui se multiplient naturellement dans Vermamoeba vermiformis. L'objectif était de mettre en évidence les caractéristiques génétiques spécifiques à ces micro-organismes. La deuxième partie a été consacrée à l'analyse des données de pyroséquençage 16S des selles de nouveau-nés atteints de l'entérocolite nécrosante. / Research in bacteriology and virology is both fundamental and applied.
It involves bringing together and building a multidisciplinary research capacity that can be applied across a very broad range of microorganisms and diseases. New conceptual and technological advances in genomics, including high-throughput techniques (sequencing, PCR, etc.), now make it possible to rapidly obtain whole bacterial and viral genomes, or a few genes from a large population. Progress in this area provides access to this information without combining several methodologies, and at lower cost. In our thesis work, we analyzed and processed the data of two genomic and metagenomic studies, highlighting the advantages, limitations and expectations related to these techniques. The first study focuses on the genomic analysis of new giant viruses and chlamydiae infecting Vermamoeba vermiformis. The second study concerns 16S pyrosequencing of the intestinal microbiota of neonates with necrotizing enterocolitis. The first project of the thesis analyzed the genomes of three new species of Chlamydiae and eleven giant viruses (the first members of two probable new families) that naturally multiply in Vermamoeba vermiformis. The objective was to highlight the genetic characteristics specific to these microorganisms. The second part was devoted to the analysis of 16S pyrosequencing data from the stools of neonates with necrotizing enterocolitis. The goal was to identify an agent responsible for this disease. Vermamoeba vermiformis Chlamydiae Spécificité de l'hôte Vacuoles d'inclusion Virus géants Pan-Genome Phylogenie Comparaison de génomes Entérocolite nécrosante Métagénomique Vermamoeba vermiformis Chlamydiae Host specificity Inclusion vacuoles Giant viruses Pan-Genome Phylogeny Comparison of genomes Necrotizing enterocolitis Metagenomics
336	Identificação de interações proteína-proteína envolvendo os produtos dos Loci hrp, vir e rpf do fitopatógeno Xanthomonas axonopodis pv. citri / Identification of protein-protein interactions involving the products of the loci hrp, vir and rpf of the phytopathogen Xanthomonas axonopodis pv. citri Marcos Castanheira Alegria 24 September 2004 (has links) O Cancro Cítrico, um dos mais graves problemas fitossanitários da citricultura atual, é uma doença causada pelo fitopatógeno Xanthomonas axonopodis pv. citri (Xac). Um estudo funcional do genoma de Xac foi iniciado com o intuito de identificar interações proteína-proteína envolvidas em processos de patogenicidade de Xac. Através da utilização do sistema duplo-híbrido de levedura, baseado nos domínios de ligação ao DNA e ativação da transcrição do GAL4, nós analisamos os principais componentes dos mecanismos de patogenicidade de Xac, incluindo o Sistema de Secreção do Tipo III (TTSS), Sistema de Secreção do Tipo IV (TFSS) e Sistema de "Quorum Sensing" composto pelas proteínas Rpf. Componentes desses sistemas foram utilizados como iscas na triagem de uma biblioteca genômica de Xac. O TTSS é codificado pelos genes denominados hrp ("hypersensitive response and pathogenicity"), hrc ("hrp conserved") e hpa ("hrp associated") localizados no locus hrp do cromossomo de Xac. Esse sistema de secreção é capaz de translocar proteínas efetoras do citoplasma bacteriano para o interior da célula hospedeira. Nossos resultados mostraram novas interações proteína-proteína entre componentes do próprio TTSS além de associações específicas com uma proteína hipotética: 1) HrpG, um regulador de resposta de um sistema de dois componentes responsável pela expressão dos genes hrp, e XAC0095, uma proteína hipotética encontrada apenas em Xanthomonas spp; 2) HpaA, uma proteína secretada pelo TTSS, HpaB e o domínio C-terminal da HrcV; 3) HrpB1, HrpD6 e HrpW, 4) HrpB2 e HrcU e 5) interações homotrópicas envolvendo a ATPase HrcN.
Em Xac, foram encontrados dois loci vir que codificam proteínas que possuem similaridade com componentes do TFSS envolvido em processos de conjugação/secreção bacteriana: TFSS-plasmídeo localizado no plasmídeo pXAC64 e TFSS-cromossomo localizado no cromossomo de Xac. O TFSS-plasmídeo, o qual possui maior similaridade com sistemas de conjugação, mostrou interações envolvendo proteínas cujos genes estão localizados na mesma região do plasmídeo pXAC64: 1) interação homotrópica da TrwA; 2) XACb0032 e XACb0033; 3) interações homotrópicas da proteína XACb0035; 4) VirB1 e VirB9; 5) XACb0042 e VirB6; 6) XACb0043 e XACb0021b. O TFSS-cromossomo apresentou interações envolvendo as proteínas: 1) VirD4 e um grupo de 12 proteínas que contém similaridade entre si, incluindo XAC2609 cujo gene encontra-se no locus vir, 2) XAC2609 e XAC2610; 3) Interações homotrópicas da VirB11; 4) XAC2622 e VirB9. A análise do sistema de "Quorum-Sensing" composto pelas proteínas Rpf mostrou interações envolvendo componentes do próprio sistema: 1) RpfC e RpfF; 2) RpfC e RpfG; 3) interações homotrópicas da RpfF; 4) RpfC e CmfA, uma proteína similar a Cmf de Dictyostelium discoideum que, neste organismo, é fundamental para processos de "quorum-sensing". As interações proteína-proteína encontradas permitiram-nos entender melhor a composição, organização e regulação dos fatores envolvidos na patogenicidade de Xac. / Citrus canker, caused by the bacterial plant pathogen Xanthomonas axonopodis pv. citri (Xac), is one of the most serious problems facing Brazilian citriculture. We have initiated a project to identify protein-protein interactions involved in the pathogenicity of Xac. Using a yeast two-hybrid system based on the GAL4 DNA-binding and activation domains, we focused on identifying interactions involving subunits, regulators and substrates of the Type Three Secretion System (TTSS), the Type Four Secretion System (TFSS) and the Quorum Sensing/Rpf System.
Components of these systems were used as baits to screen a random Xac genomic library. The TTSS is encoded by the hrp (hypersensitive response and pathogenicity), hrc (hrp conserved) and hpa (hrp associated) genes in the chromosomal hrp locus. This secretion system can translocate effector proteins from the bacterial cytoplasm into host cells. We identified several previously uncharacterized interactions involving: 1) HrpG, a two-component system response regulator responsible for the expression of Xac hrp operons, and XAC0095, a previously uncharacterized protein found only in Xanthomonas spp.; 2) HpaA, a protein secreted by the TTSS, HpaB and the C-terminal domain of HrcV; 3) HrpB1, HrpD6 and HrpW; 4) HrpB2 and HrcU; 5) homotropic interactions of the ATPase HrcN. Xac contains two virB gene clusters, one on the chromosome and one on the pXAC64 plasmid, each of which codes for a unique and previously uncharacterized TFSS. Components of the TFSS of pXAC64, which is most similar to conjugation systems, showed interactions involving proteins encoded by the same locus: 1) homotropic interactions of TrwA; 2) XACb0032 and XACb0033; 3) XACb0035 homotropic interactions; 4) VirB1 and VirB9; 5) XACb0042 and VirB6; 6) XACb0043 and XACb0021b. Components of the chromosomal TFSS exhibited interactions involving: 1) VirD4 and a group of 12 uncharacterized proteins with a common C-terminal domain motif, including XAC2609, whose gene resides within the vir locus; 2) XAC2609 and XAC2610; 3) homotropic interactions of VirB11; 4) XAC2622 and VirB9. Analysis of Quorum Sensing/Rpf System components revealed interactions between the principal Rpf proteins that control Xanthomonas quorum sensing: 1) RpfC and RpfF; 2) RpfC and RpfG; 3) RpfF homotropic interactions; 4) RpfC and CmfA, a protein similar to Cmf (conditioned medium factor) of Dictyostelium discoideum, which controls quorum sensing in that organism.
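Yeast two-hybrid results such as the lists above are often easier to query as an interaction graph. A minimal sketch, assuming only the four Rpf pairs reported in this abstract and ignoring bait/prey orientation; `RPF_PAIRS`, `build_graph` and `homotropic` are hypothetical helpers, not part of the original study.

```python
from collections import defaultdict

# Rpf interactions reported in the abstract (bait/prey orientation ignored).
RPF_PAIRS = [
    ("RpfC", "RpfF"),
    ("RpfC", "RpfG"),
    ("RpfF", "RpfF"),  # homotropic
    ("RpfC", "CmfA"),
]


def build_graph(pairs):
    """Undirected adjacency map; a self-loop marks a homotropic interaction."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def homotropic(pairs):
    """Proteins reported to interact with themselves."""
    return sorted({a for a, b in pairs if a == b})


graph = build_graph(RPF_PAIRS)
print(sorted(graph["RpfC"]))   # ['CmfA', 'RpfF', 'RpfG']
print(homotropic(RPF_PAIRS))   # ['RpfF']
```

The same structure extends to the TTSS and TFSS pairs, where RpfC-like hubs (for example VirD4 with its 12 partners) stand out by node degree.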
The protein-protein interactions that we have detected reveal insights into the composition, organization and regulation of these important mechanisms involved in Xanthomonas pathogenicity. Biologia molecular vegetal Fitopatógenos Genomas (Estudo) Genomica funcional Interações proteína-proteína Patogenicidade Proteínas recombinantes Quorum sensing Two-hybrid Xanthomonas (Estudo) Functional Genomics Genomes (Study) Pathogenicity Phytopathogen Plant molecular biology Protein-protein interactions Quorum sensing Recombinant proteins Two-hybrid Xanthomonas (Study)
337	Análise comparativa entre os genomas dos fitopatógenos Xylella fastidiosa e Xanthomonas axonopodis pv. citri / Comparative analysis between the genomes of the phytopathogens Xylella fastidiosa and Xanthomonas axonopodis pv. citri Leandro Marcio Moreira 04 November 2002 (has links) Xylella fastidiosa (Xf) e Xanthomonas axonopodis pv. citri (Xac) são gama proteobactérias gram negativas, responsáveis por grandes perdas econômicas no setor citrícola brasileiro. Com seus genomas seqüenciados e anotados, fizemos uma análise comparativa entre suas composições gênicas e seus ambientes de vida. Xac apresenta um genoma de 5.2Mb contra 2.7Mb de Xf. Isto reflete no número de genes (4432 contra 2838) que acabam refletindo em uma maior complexidade metabólica de Xac, caracterizada por: uma extensa gama de genes de degradação de parede celular (44), biossíntese de proteases (92), genes de funções regulatórias (296), um completo metabolismo energético (209), quimiotático (inexistente em Xf) e secretório (presença dos tipos I, II, III e IV, sendo o II em duplicata), além de um grande número de genes envolvidos com captação de ferro (65), fazem de Xac um patógeno de alto poder invasivo e de rápida propagação e virulência. Em contrapartida, Xf por não possuir a complexidade supracitada, parece ter seus recursos adaptados ao ambiente em que vive, como por exemplo um alto número de genes envolvidos com biossíntese de pili, que associado à biossíntese de goma, favorecem sua adesão nas glândulas salivares do vetor (cigarrinha) e a formação de aglomerados celulares responsáveis pelo entupimento dos vasos que levam às patologias decorrentes do evento. / Xylella fastidiosa (Xf) and Xanthomonas axonopodis pv. citri (Xac) are Gram-negative gamma-proteobacteria responsible for major economic losses in the Brazilian citrus sector. With their genomes sequenced and annotated, we performed a comparative analysis of their gene content and habitats.
Xac has a 5.2 Mb genome, compared with 2.7 Mb for Xf. This is reflected in the number of genes (4432 versus 2838), which results in greater metabolic complexity in Xac, characterized by a wide range of cell wall degradation genes (44), protease biosynthesis genes (92), many genes with regulatory functions (296), a complete energy metabolism (209), chemotactic systems (absent in Xf) and secretory systems (types I, II, III and IV are present, with type II in duplicate), besides a large number of genes involved in iron acquisition (65). Together, these make Xac a highly invasive pathogen with rapid spread and virulence. On the other hand, Xf, lacking this complexity, seems to have its resources adapted to the habitat in which it lives: for example, a large number of genes involved in pili biosynthesis, together with gum biosynthesis, favor its adhesion to the salivary glands of the vector (sharpshooter) and the formation of cellular aggregates responsible for blocking the vessels, which leads to the resulting pathologies. Análise comparativa Fitopatógenos Genomas (Análise) Metabolismo Xanthomonas Xanthomonas (Análise genética) Xylella Comparative analysis Genomes (Analysis) Metabolism Phytopathogens Xanthomonas Xanthomonas (Genetic analysis) Xylella
338	Role of Mammalian RAD51 Paralogs in Genome Maintenance and Tumor Suppression Somyajit, Kumar January 2014 (has links) (PDF) My research focused on understanding the importance of mammalian RAD51 paralogs in genome maintenance and the suppression of tumorigenesis. The investigation carried out during this study aimed to gain insight into the involvement of RAD51 paralogs in DNA damage signalling, in the repair of various types of lesions including double-strand breaks (DSBs), daughter strand gaps (DSGs) and interstrand crosslinks (ICLs), and in the protection of stalled replication forks. My study highlights the molecular functions of RAD51 paralogs in the Fanconi anemia (FA) pathway of ICL repair, in ATM- and ATR-mediated DNA damage responses, in homologous recombination (HR), and in recovery from replication-associated lesions. My research also focused on the development of a novel photoinducible ICL agent for targeted cancer therapy. The thesis is divided into the following sections: Chapter I: General introduction describing DNA damage responses and the known functions of RAD51 paralogs across species in DNA repair and checkpoint control. The genome of every living organism is susceptible to various types of DNA damage, and mammalian cells have evolved various DNA damage surveillance mechanisms. In response to DNA damage, activated checkpoints transiently arrest cell cycle progression and allow the damaged DNA to be repaired. Upon completion of DNA repair, checkpoints are deactivated to resume normal cell cycle progression. Defective DNA damage responses may lead to chromosome instability and tumorigenesis. Indeed, genome instability is associated with several genetic disorders, premature ageing and various types of cancer in humans. The major cause of chromosome instability is the formation of DSBs and DSGs.
Both DSBs and DSGs are among the most dangerous types of DNA lesions and arise endogenously as well as from exogenous sources such as radiation and chemicals. Spontaneous DNA damage arises from the generation of reactive oxygen species (ROS) during normal cellular metabolism. Replication across ROS-induced modified bases and single-strand breaks (SSBs) leads to DSGs and DSBs, respectively. Such DNA lesions need to be accurately repaired to maintain the integrity of the genome. To understand the various cellular responses triggered by different types of DNA damage and the possible roles of RAD51 paralogs in these processes, Chapter I of the thesis is divided into multiple sections. The initial portion of the chapter provides a glimpse of the various DNA damage responses and repair pathways that deal with lesions arising from both endogenous and exogenous sources. Owing to the vast range of cellular responses and pathways, the following section provides a detailed description of the mechanisms of the various pathways that handle a wide range of DNA lesions, from SSBs to DSBs. The subsequent section of Chapter I provides a comprehensive description of the maintenance of genome stability at replication forks and telomeres. Germline mutations in the genes that regulate genome integrity cause various genetic disorders and cancer. Mutations in ATM, ATR, MRE11, NBS1, BLM, FANC (1-16), and BRCA1 and BRCA2, which are known to regulate DNA damage signaling, DNA repair and genome integrity, lead to chromosome instability disorders such as ataxia-telangiectasia, ATR-Seckel syndrome, AT-like disorder, Nijmegen breakage syndrome, Bloom syndrome, FA, and breast and ovarian cancers, respectively. Interestingly, RAD51 paralog mutations have been reported in patients with FA-like disorder and various types of cancers, including breast and ovarian cancers.
Mono-allelic germline mutations in all RAD51 paralogs have been reported to cause cancer, in addition to the reported cases of FA-like disorder with bi-allelic germline mutations in RAD51C and XRCC2. Accordingly, the last section of the chapter describes the genetics of breast and ovarian cancers and the known functions of tumor suppressors such as BRCA1, BRCA2 and RAD51 paralogs in protecting the genome. Despite the identification of five RAD51 paralogs nearly two decades ago, the molecular mechanism(s) by which RAD51 paralogs regulate HR and genome maintenance remain obscure. To gain insights into the molecular mechanisms of RAD51 paralogs in DNA damage responses and their link with genetic diseases and cancer, the following objectives were set for my PhD thesis: 1) To understand the functional role of the RAD51 paralog RAD51C in the FA pathway of ICL repair and in DNA damage signalling. 2) To dissect the ATM/ATR-mediated targeting of the RAD51 paralog XRCC3 in the repair of DSBs and the intra-S-phase checkpoint. 3) To uncover the replication restart pathway after transient replication pausing and the involvement of distinct complexes of RAD51 paralogs in the protection of replication forks. 4) To design a photoinducible ICL agent that can be activated by visible light for targeted cancer therapy. Chapter II: Distinct roles of FANCO/RAD51C protein in DNA damage signaling and repair: Implications for Fanconi anemia and breast cancer susceptibility. RAD51C, a RAD51 paralog, has been implicated in HR. However, the underlying mechanism by which RAD51C regulates HR-mediated DNA repair is elusive. In 2010, one study identified a biallelic mutation in RAD51C leading to an FA-like disorder, whereas a second study reported monoallelic mutations in RAD51C associated with increased risk of breast and ovarian cancers. However, the role of RAD51C in the FA pathway of DNA cross-link repair and as a tumor suppressor remained obscure.
To understand the role of RAD51C in the FA pathway of ICL repair and the DNA damage response, we employed genetic, biochemical and cell biological approaches to dissect the functions of RAD51C in genome maintenance. We observed that RAD51C deficiency leads to ICL sensitivity, chromatid-type errors and G2/M accumulation, which are hallmarks of the FA phenotype. We found that RAD51C is dispensable for ICL unhooking and FANCD2 monoubiquitination but is essential for HR, confirming the downstream role of RAD51C in ICL repair. Furthermore, we demonstrated that RAD51C plays a vital role in the HR-mediated repair of replication-associated DSBs. Finally, we showed that RAD51C participates in ICL- and DSB-induced DNA damage signaling and controls the intra-S-phase checkpoint through CHK2 activation. Our analyses of pathological RAD51C mutants showed that RAD51C regulates HR and DNA damage signaling distinctly. Together, these results reveal the critical role of RAD51C in the FA pathway of ICL repair and as a tumor suppressor. Chapter III: ATM- and ATR-mediated phosphorylation of XRCC3 regulates DNA double-strand break-induced checkpoint activation and repair. The RAD51 paralogs XRCC3 and RAD51C have been implicated in HR and DNA damage responses, but the molecular mechanism of their participation in these pathways remained obscure. In our study, we showed that serine 225 of XRCC3, located in an SQ motif, is phosphorylated by ATR kinase in an ATM signaling pathway. We found that RAD51C in the CX3 complex, but not in the BCDX2 complex, is essential for XRCC3 phosphorylation, and that this modification follows end resection and is specific to the S and G2 phases. XRCC3 phosphorylation was found to be required for chromatin loading and stabilization of RAD51 and for HR-mediated repair of DSBs. Notably, in response to DSBs, XRCC3 participates in the intra-S-phase checkpoint following its phosphorylation, and in the G2/M checkpoint independently of its phosphorylation.
Strikingly, we found that XRCC3 regulates the recovery of stalled and collapsed replication forks differently: its phosphorylation is required for the HR-mediated recovery of collapsed replication forks but is dispensable for the recovery of stalled forks. Together, our findings suggest that XRCC3 is a new player in ATM/ATR-induced DNA damage responses that controls checkpoint activation and HR-mediated repair. Chapter IV: RAD51 paralogs protect stalled forks and mediate replication restart in an FA-BRCA-independent manner. The mammalian RAD51 paralogs RAD51B, RAD51C, RAD51D, XRCC2 and XRCC3 are critical for genome maintenance. To understand their roles during spontaneously arising DNA damage, we studied the assembly of RAD51 paralogs during replication and examined replication fork stability and restart. We found that RAD51 paralogs are spontaneously enriched on S-phase chromatin. Interestingly, the number of 53BP1 nuclear bodies in G1 phase and the frequency of micronucleation, both of which serve as markers of under-replicated lesions, increase after genetic ablation of RAD51C, XRCC2 and XRCC3. Furthermore, we showed that RAD51 paralogs are specifically enriched at two major fragile sites, FRA3B and FRA16D, after replication fork stalling. We found that all five RAD51 paralogs bind to nascent DNA strands after replication fork stalling and protect the fork: nascent replication tracts created before fork stalling with hydroxyurea degrade in the absence of RAD51 paralogs but remain stable in wild-type cells. This function depends on ATP binding at the Walker A motif of the RAD51 paralogs. Our results also suggest that RAD51 paralogs assemble into the BCDX2 complex to prevent the generation of DSBs at stalled replication forks, thereby protecting the pre-assembled replisome from nucleases. Strikingly, we showed that RAD51C and XRCC3, in complex with FANCM, promote the restart of stalled replication forks in an ATP hydrolysis-dependent manner.
Moreover, the RAD51C R258H mutation identified in an FA-like disorder abrogates the interaction of RAD51C with FANCM and XRCC3 and prevents fork restart. Thus, assembly of RAD51 paralogs into different complexes prevents nucleolytic degradation of stalled replication forks and promotes their restart to maintain genomic integrity. Chapter V: Trans-dichlorooxovanadium(IV) complex as a potent photoinducible DNA interstrand crosslinker for targeted cancer therapy. Although DNA ICL agents such as MMC, cisplatin and psoralen are used as anticancer drugs, these agents also affect normal cells. Moreover, tumor resistance to these agents has been reported. We designed and synthesized a novel photoinducible DNA crosslinking agent (ICL-2), a derivative of an oxovanadium terpyridine complex with two chloride ligands in trans positions. We found that ICL-2 can be activated by UV-A and visible light to induce DNA ICLs. ICL-2 efficiently activated the FA pathway of ICL repair. Strikingly, photoactivation of ICL-2 induces prolonged cell cycle checkpoint activation and a high degree of cell death in FA pathway-defective cells. Moreover, we showed that ICL-2 specifically targets cells expressing pathological RAD51C mutants. Our findings suggest that ICL-2 could potentially be used for targeted cancer therapy in patients with mutations in FA and HR pathway genes. Tumour Suppression Targeted Cancer Therapy DNA Repair RAD51 Paralogs DNA Damage Signalling DNA Lesions Genome Stability Fanconi Anemia Tumorigenesis Breast Cancer Cancer Susceptibility Genes Genomes Carcinogenesis Trans-dichlorooxovanadium (IV) Complex RAD51C Oxovanadium(IV) Complexes Biochemistry
339	Organization and expression of heavy-metal resistance genes in Cupriavidus metallidurans CH34 Monchy, Sébastien 04 June 2007 (has links) Cupriavidus metallidurans CH34 is a heavy-metal-resistant betaproteobacterium isolated from the sediments of a non-ferrous metallurgy plant in Belgium. The genome of this bacterium contains a chromosome (3.6 Mb), a megaplasmid (2.6 Mb) and two plasmids, pMOL28 (171 kb) and pMOL30 (234 kb), which were already known to carry heavy-metal resistance genes. We first catalogued the genes involved in heavy-metal resistance and then sought to measure their expression using two transcriptomic approaches: RT-PCR and DNA microarrays. Genome analysis reveals at least 170 genes related to metal-ion resistance located on the 4 replicons, mainly on the two plasmids. These genes mainly encode efflux systems such as HME-RND systems (chemiosmotic transport coupled to a counter-flow of protons) and P-type ATPases, or encode the Cu(II) ion resistance system. In the C. metallidurans genome, we identified 13 operons encoding HME-RND systems; only three of them, located on the plasmids, are overexpressed in the presence of heavy metals. Eight genes encode P-type ATPases, two of which belong to a class whose substrates are not metals. Two ATPases belong to a family specialized in Cu(II) efflux and the four others to another large family involved in the efflux of Cd(II), Pb(II) and Zn(II) ions. Transcriptomic analyses show overexpression of the first two classes of P-type ATPases in the presence of heavy metals. Mutagenesis of the zntA gene (megaplasmid), which encodes one of the ATPases, reduces viability in the presence of Zn(II), Cd(II) and, to a lesser extent, Pb(II), Tl(I) and Bi(III).
On pMOL30, copper resistance involves a cluster of 19 cop genes encoding copper resistance at the periplasmic and cytoplasmic levels, and probably a form of storage for essential copper. These 19 genes are overexpressed in the presence of copper, but about fifteen neighbouring genes also appear to be required for optimal expression of copper resistance. Annotation of the plasmids revealed the relatedness of pMOL28 to plasmid pHG1 (hydrogenotrophy, CO2 fixation) of C. eutrophus H16 and to plasmid pSym (nitrogen fixation) of C. taiwanensis, and, in pMOL30, the presence of two genomic islands that concentrate most of the heavy-metal resistances. The microarrays show overexpression of 83 of 164 genes in pMOL28 and of 143 of 250 genes in pMOL30. They also show that genes on the two plasmids are more strongly overexpressed than those located on the two megareplicons. Among the most interesting overexpressed genes of pMOL30 are truncated transposases and genes involved in membrane synthesis (glycosyltransferases). Expression analysis of the plasmid-borne heavy-metal resistance genes shows overexpression in the presence of several independently added metal ions, and not only of the metal substrates of these operons, which suggests the involvement of two types of regulation whose corresponding genes are also located on the chromosome and the megaplasmid. This work highlights the specialization of the bacterium in responding to a broad spectrum of heavy-metal concentrations, up to the highest toxicity limit observed for mesophilic heterotrophic bacteria. This specialization matches the industrial biotopes on various continents in which it has been found.
Doctorate in Sciences, specialization in molecular biology / info:eu-repo/semantics/nonPublished Exact and natural sciences Biology Heavy metals Ralstonia Microbial genomics Microbial genomes Genetic transcription pMOL30 pMOL28 HME-RND ATPase cop RND CH34 metallidurans Cupriavidus
340	Structural Studies on Thiolases and Thiolase-like Proteins Janardan, Neelanjana January 2014 (has links) (PDF) The genus Mycobacterium comprises some of the most devastating pathogens that infect humans. Mycobacterium tuberculosis causes tuberculosis in humans, with high morbidity and mortality. The disease is especially prevalent in the under-developed and developing countries of the tropics. Diseases such as AIDS and cancer compromise the immune system, leaving affected individuals susceptible to secondary infections, particularly tuberculosis. Thus, tuberculosis is re-emerging even in the developed countries of the West. The emergence of multidrug-resistant strains makes this deadly disease difficult to cure. A vaccine against tuberculosis is therefore urgently needed. Mycobacterium smegmatis is a non-pathogenic member of the same family. It multiplies relatively quickly compared with M. tuberculosis and shares the unique features of the family that make the pathogenic members extremely resistant to chemicals and drugs. Proteins of M. smegmatis and M. tuberculosis share high sequence identity, making M. smegmatis the microorganism of choice for studying its deadlier relative. A striking feature of all mycobacterial genomes is the abundance of genes encoding enzymes involved in fatty acid and lipid metabolism: more than 250 in Mycobacterium tuberculosis compared with only 50 in Escherichia coli. The mycobacterial genome encodes over a hundred enzymes involved in fatty acid degradation. Apart from providing energy, lipids and fatty acids are also integral components of the mycobacterial cell wall and cell membrane. The abundance and importance of lipid-metabolizing enzymes in mycobacteria make them attractive targets for drug discovery. It is therefore of interest to characterize these enzymes biochemically and structurally.
Thiolases are a group of enzymes involved in lipid metabolism. In the last step of the β-oxidation pathway, degradative thiolases shorten fatty acid chains by cleaving 3-ketoacyl CoA into acetyl CoA and an acyl CoA shortened by two carbons. Thiolases are a subfamily of the thiolase superfamily, which also includes the ketoacyl-(acyl-carrier-protein) synthase (KAS) enzymes, polyketide synthases and chalcone synthases. Most members of this superfamily are dimers, while only a few have been found to be tetramers; the tetramers are loosely held dimers of tight dimers. Examination of the Mycobacterium smegmatis genome revealed several putative thiolase genes, annotated as thiolases on the basis of sequence analysis. However, none of them has been biochemically or structurally characterized, and the sequence identity between some of these proteins and the well-characterized thiolases is rather low. The work described in this thesis attempts to characterize two such enzymes from M. smegmatis structurally and functionally. Chapter 1 begins with a brief introduction to the genus Mycobacterium and the role of fatty acid metabolism in mycobacterial virulence. This is followed by a review of the current literature on the enzymes of the thiolase superfamily and their role in fatty acid metabolism. The chapter concludes with a brief summary of the aims and objectives of the work. Chapter 2 describes the common experimental procedures and computational methods used in these investigations, as most of them apply to all the structure determinations and analyses presented in later chapters. The experimental procedures include overexpression, purification, site-directed mutagenesis, plasmid isolation, protein crystallization and X-ray diffraction data collection.
Computational methods include structure determination protocols along with details of the various programs used for data processing, structure determination, refinement, model building, structure validation and analysis. Chapter 3 describes the cloning, expression, purification, crystallization and structure determination of a thiolase-like protein (TLP1) from M. smegmatis. All enzymes of the thiolase superfamily that have been structurally characterized so far share four features: 1) conservation of the core α/β/α/β/α-layered structure of the thiolase domain, 2) conservation of the extensive dimerization interface, 3) the location of the active site pocket and conservation of key active site residues and 4) the use of a nucleophilic cysteine residue in catalysis. The crystal structure of MsTLP1 revealed some interesting differences from classical thiolases. Of the four characteristic features of thiolases, MsTLP1 retains the conserved thiolase fold, and the location of its putative active site is similar to that in classical thiolases. However, dimerization is not conserved: MsTLP1 appears to be a monomer both in solution and in the crystal structure. The ligand-binding groove of MsTLP1, identified by structural superposition with Z. ramigera thiolase, is larger than that of the Z. ramigera enzyme. The absence of the catalytic cysteine suggested that, although the protein has the strictly conserved thiolase fold, it might perform an entirely different function. A unique extra C-terminal domain of unknown function, present only in MsTLP1, is described towards the end of the chapter. Thorough sequence and structural analyses suggested that MsTLP1 might belong to a new subfamily in the thiolase superfamily. Chapter 4 describes the attempts made towards the biochemical characterization of MsTLP1. Thiolase assays carried out for the synthetic and degradative reactions revealed that the enzyme is inactive in both directions.
However, surface plasmon resonance binding studies revealed that the protein can bind Coenzyme A, a feature it shares with other enzymes of the thiolase superfamily. Bioinformatics analyses of the structure to identify the residues involved in CoA binding are also described. The chapter ends with a discussion of the probable function of TLPs in mycobacteria. Chapter 5 describes the cloning, expression, purification and X-ray structural studies of MsT1-L thiolase. This is the first structural report of a probable T1-thiolase. The protein crystallized in three different space groups, in all of which the enzyme was found to be tetrameric. Analysis of the tetramer structures from the three crystal forms revealed that MsT1-L exhibits some rotational flexibility about the central tetramerization loop. A qualitative and quantitative analysis of this movement is described. Structural comparisons revealed that the overall structure of MsT1-L is very similar to that of the well-characterized biosynthetic thiolase from Z. ramigera. However, a detailed analysis of the ordered waters near the active site cavity revealed interesting differences between the two, and the probable functional relevance of this observation is discussed. The crystal structure of MsT1-L in complex with CoA is also described in detail. Structural comparisons with classical thiolases also revealed significant differences in the organization of the loop domain that harbors most of the residues required for catalysis. These differences make the active site cavity of MsT1-L larger than that of the biosynthetic thiolase, suggesting that MsT1-L could bind larger substrates; the cavity is large enough to accommodate a medium-chain fatty acyl CoA as substrate. Co-crystallization experiments with hexanoyl CoA revealed a novel binding site for the fatty acyl chain in MsT1-L, which is described in detail.
Contributions made towards the cloning and expression of other thiolases from S. typhimurium and P. falciparum have been described in Chapters 6 and 7. The thesis concludes with a brief discussion on the future prospects of the investigations presented here. Thiolases Thiolase-like Proteins Bacterial Thiolase Mycobacterial Thiolases Mycobacterium Smegmatis Thiolase Salmonella Typhimurium Thiolase Plasmodium Falciparum Thiolase T1 Thiolase Mycobacterial Genomes Mycobacterial Virulence SCP2-Thiolases Biosynthetic Thiolases Thiolase-like Protein Type 1 (TLP1) Biochemistry
