Global ETD Search

1	Microorganismos do solo e de manguezais: fonte de produtos antimicrobianos. / Microorganisms from the soil and from the mangrove swamps: source of antimicrobian products. Firoozmand, Lília Macedo 24 July 2008 The biodiversity of microorganisms found in ecosystems is an excellent source for the discovery of pharmacologically active molecules. In this study, 32 actinobacteria isolates collected from soil and 51 fungal isolates from mangrove swamps of the Brazilian coast were evaluated for antifungal, antimycobacterial, leishmanicidal and trypanocidal activity. Organic extracts obtained from the culture supernatants of the isolates were tested; five showed minimum inhibitory concentrations (MICs) of 400 mg/mL or less against pathogenic fungi, and two showed marked activity against the trypomastigote form of Trypanosoma cruzi. The extracts were not effective against the promastigote form of Leishmania amazonensis or against Mycobacterium tuberculosis H37Rv. The results indicate that fungi isolated from mangrove swamps are promising for the investigation of new antimicrobial agents.
Leishmania Mycobacterium tuberculosis H37Rv Trypanosoma cruzi Antimicrobial action Fungi Microorganisms
3	Recognition of Structures, Functions and Interactions of Proteins of Pathogens : Implications in Drug Discovery Ramkrishnan, Gayatri January 2016 Significant advancements in genome sequencing techniques and other high-throughput initiatives have resulted in the availability of complete genome sequences for a large number of organisms, which provide an opportunity to study the detailed biological information encoded therein. Identification of the functional roles of proteins can aid comprehension of various cellular activities in an organism; this is traditionally achieved using techniques from molecular biology, protein chemistry and macromolecular crystallography. The established experimental methods for protein structure and function determination, although accurate and resourceful, are laborious and time-consuming. Computational analyses of sequences of gene products and exploration of evolutionary relationships can give clues to protein structure and/or function with reasonable accuracy, which can be used to direct experimental studies on proteins of interest effectively. Moreover, with growing volumes of data, there has been a growing disparity between the numbers of well-characterized and uncharacterized proteins, further necessitating the use of computational methods for investigating evolutionary and structure-function relationships. The remarkable progress made in the development of computational techniques (Chapter 1) has contributed immensely to state-of-the-art biological sequence analysis and to reliable recognition of protein structure and function. These methods have largely influenced the exploration of protein sequence-structure-function space. One of the relevant applications of computational approaches is in understanding the functional make-up of human pathogens, their complex interplay with the host and the implications for pathogenesis.
In this thesis, sensitive profile-based search procedures have been utilized to address various aspects of three pathogens, Mycobacterium tuberculosis, Plasmodium falciparum and Trypanosoma brucei, which are causative agents of potentially life-threatening diseases. The existing drugs approved for these diseases, although of immense value in controlling them, have several shortcomings, the most important being the emergence of drug resistance that renders current treatment regimens futile. Thus, the identification of practicable targets and of new drugs or new combination therapies becomes an important necessity. Analyses of the structural and functional repertoire of proteins encoded in the pathogen genomes can provide means for rational identification of therapeutic intervention strategies. This thesis begins with computational analyses of proteins encoded in the M. tuberculosis genome. M. tuberculosis is the primary aetiological agent of tuberculosis in humans, and is responsible for an estimated 1.5 million deaths every year. The complete genome of the pathogen was sequenced and made available more than a decade ago, which has been valuable in determining the functional roles of its gene products. Yet the functions of many M. tuberculosis proteins remain unknown. Computational prediction of protein function is an ongoing process based on the ever-growing information made available in public databases as well as the introduction of powerful homology recognition techniques. Hence, continuous refinement is essential to make the most of the sequence data, ensuring its accuracy and relevance. With the use of multiple sequence and structural profile-based search procedures, an enhanced structural and functional characterization of M. tuberculosis proteins, totalling 95% of the genome, was achieved (Chapter 2). Following are the key findings.
o Domain definitions were obtained for a total of 3566 of 4018 proteins. Amino acid residue coverage of >70% was achieved for 2295 proteins, which constitute more than half of the proteome.
o Domain assignments were newly identified for 244 proteins with domain-unassigned regions. Structure prediction for these proteins corroborated all the remote homology relationships recognized using profile-based methods, enhancing the reliability of the predictions.
o Comparison of the domain compositions of proteins between M. tuberculosis and the human host revealed the presence of pathogen-specific domains that are not homologous to human proteins. Such proteins in M. tuberculosis are mainly virulence factors involved in host-pathogen interactions, such as immune-dominance and aiding entry and survival in human host macrophages, and hence form attractive targets for drug discovery.
o Putative structural and functional information for proteins with no recognizable domains was inferred by means of fold recognition and an iterative profile-based search against a sequence database.
o Attributing putative structures and functions to 955 conserved hypothetical proteins in M. tuberculosis, 137 of which are reportedly essential to the pathogen, provides a basis to re-investigate their involvement in pathogenesis and survival in the host. Proteins with no detectable homologues were recognized as M. tuberculosis H37Rv-specific, and can serve as promising drug targets.
An attempt was made to identify porin-like proteins in M. tuberculosis, considering the MspA porin from M. smegmatis as a template. The difficulty in recognizing putative porins in M. tuberculosis is indicative either of novel outer membrane channel proteins, not yet characterized, or of a high representation of ion channels, symporters and transporters that compensate for the functional role of porins. In addition, MspA-like proteins were not readily recognized in other slow-growing mycobacterial pathogens, apart from M. tuberculosis, that are known to infect the human host.
This indicates probable acquisition of physiological adaptations, i.e. absence of porins, to confer drug resistance in the course of their co-evolution with human hosts. Evolutionary relationships recognized between sequence (Pfam) and structural (SCOP) families aided in the association of potential structures and/or functions with 55 uncharacterized Pfam domains recognized in M. tuberculosis. Such associations deliver useful insights into the structure and function of a protein housing the uncharacterized domain. The functional inferences drawn for M. tuberculosis proteins based on the predictions can provide a valuable basis for experimental endeavours in understanding mechanisms of pathogenesis and can significantly impact anti-tubercular drug discovery programmes. An interesting outcome of exploring relationships between Pfam and SCOP families was the identification of an evolutionary relationship between a Pfam domain of unknown function, DUF2652, and class III nucleotidyl cyclases. A detailed investigation was undertaken to assess this relationship (Chapter 3). Nucleotidyl cyclases synthesize cyclic nucleotides, which are critical second messengers in signalling pathways. The DUF2652 family predominantly comprises bacterial proteins belonging to three lineages: Actinobacteria, Bacteroidetes and Proteobacteria. Thus, recognition of an evolutionary relationship between these bacterial proteins and nucleotide cyclases is of particular interest owing to the indispensability of cyclic nucleotides in the regulation of varied biological activities in bacteria. Use of a fold recognition program suggested the presence of the nucleotide cyclase-characteristic topological motif (βααββαβ) in all members of the DUF2652 family.
Detailed analyses of the structural and functional features of the uncharacterized set of bacterial proteins corresponding to 50 bacterial genomes, using profile-based alignments, revealed the presence of key features typical of nucleotidyl cyclases, including metal-binding aspartates, substrate-specifying residues and transition-state stabilizing residues. Depending on the features, 20 proteins of the Actinobacteria lineage, predominantly mycobacteria, of unknown structure and function were identified as putative nucleotide cyclases, 23 proteins of the Bacteroidetes lineage were associated with guanylyl cyclases, and 8 uncharacterized proteins of Proteobacteria were recognized as nucleotide cyclase-like proteins (7 adenylyl cyclases and one guanylyl cyclase). Sequence similarity-based clustering of the predicted nucleotide cyclase-like proteins with established nucleotide cyclases indicated the apparent evolutionary distinctness of the predicted subfamily of class III nucleotidyl cyclases. Furthermore, analysis of evolutionarily conserved gene clusters of the predicted nucleotide cyclase-like proteins indicated functional associations that support the predictions of their participation in cellular signalling events. The inferences made can be experimentally investigated further to ascertain the involvement of the uncharacterized bacterial proteins in signalling pathways, which can help in understanding the pathobiology of pathogenic species of interest. The next objective was the recognition of biologically relevant protein-protein interactions across M. tuberculosis and the human host (Chapter 4). M. tuberculosis is well known for its ability to co-evolve successfully with the human host in terms of establishing infection, survival and persistence. Current knowledge of the mechanisms of host invasion, immune evasion and persistence in the host environment derives from, and is limited to, the experimental studies pursued by numerous groups.
Chapter 4 presents an approach for computational identification of biologically feasible protein-protein interactions across M. tuberculosis and the human host. The approach utilizes crystal structures of intra-organism protein-protein complexes that are transient in nature. Identification of homologues of host and pathogen proteins in the database of known protein-protein interactions formed the initial step, followed by identification of a conserved interfacial patch and integration of information on tissue-specific expression of human proteins and subcellular localization of human and M. tuberculosis proteins. In addition, appropriate filters were used to extract biologically feasible host-pathogen protein-protein interactions. This resulted in recognition of 386 interactions potentially mediated by 59 M. tuberculosis proteins and 90 human proteins. A predominance of host-pathogen interactions (193 protein-protein interactions) brought about by M. tuberculosis proteins participating in cell wall processes was observed, which is in concurrence with experimental studies on the immuno-modulatory activities of such proteins. This set of mycobacterial proteins was predicted to interact with a diverse set of host proteins, such as those involved in ubiquitin conjugation pathways, metabolic pathways, signalling pathways, regulation of cell proliferation, transport, apoptosis and autophagy. The predictions have the potential to complement experimental observations at the molecular level. Details of a couple of interesting cases are presented in the chapter: one is the probable mechanism of immune evasion adopted by M. tuberculosis to inhibit lysozyme activity in macrophages, and the second is the mechanism of nutrient uptake from the host. The set of M.
tuberculosis proteins predicted to mediate interactions with host proteins warrant experimental follow-up on probable mechanisms of pathogenesis and may also serve as attractive targets for chemotherapeutic interventions. One of the major shortcomings of the current treatment regimen for tuberculosis is the emergence of multidrug-resistant (MDR) and extensively drug-resistant (XDR) strains that render first-line and second-line drug treatments futile. This entails a need to explore target space in M. tuberculosis as well as the potential of existing drugs for repurposing against tuberculosis. A drug repurposing strategy, i.e. exploring within-target-family selectivity of small molecules, has been implemented (Chapter 5) to contribute towards time- and cost-saving anti-tubercular drug development efforts.
With the use of profile-based search procedures, evolutionary relationships between targets (other than proteins of M. tuberculosis) of FDA-approved drugs and M. tuberculosis proteins were investigated. A key filter to exclude drugs capable of acting on human proteins substantially reduced the chances of obtaining anti-targets. Thus, a total of 130 FDA-approved drugs were recognized that can be repurposed against 78 M. tuberculosis proteins belonging to the functional categories of intermediary metabolism and respiration, information pathways, cell wall and cell processes, and lipid metabolism. The catalogue of structure and function of M. tuberculosis proteins and their involvement in host-pathogen protein-protein interactions, compiled from Chapters 2 and 4, served as a guiding tool to explore the functional importance of the targets identified. Many of the potential targets identified have earlier been shown experimentally to be essential for growth and survival of the pathogen, thus gaining importance in terms of pharmaceutical relevance. Polypharmacological drugs, i.e. drugs capable of acting on multiple targets, were also identified (92 drugs) in the study. These drugs have the potential to withstand the development of drug resistance in the pathogen. Comparative sequence and structure-based analysis of M. tuberculosis proteins homologous to known targets yielded credible inferences on putative binding sites of FDA-approved drugs in potential targets. In instances where information on binding sites could not be readily inferred from known targets, potentially druggable sites have been predicted. Comparison with earlier experimental studies that report anti-tubercular potential of several approved drugs enhanced the credibility of 74 of the 130 FDA-approved drugs, which can be readily prioritized for clinical studies.
An additional exercise was pursued to identify prospective anti-tubercular agents by means of structural comparison between ChEMBL compounds and the 130 FDA-approved drugs. Only those compounds that showed considerably high structural similarity with approved drugs were retained. Such compounds, with minor changes in terms of physicochemical properties, provide a basis for exploration of compounds that may exhibit higher affinities for M. tuberculosis targets. The set of approved drugs recognized as repurpose-able candidates against tuberculosis, in concert with the structurally similar compounds, can significantly impact anti-tubercular drug development and drug discovery. The next part of the thesis focuses on Plasmodium falciparum, an obligate intracellular protozoan parasite responsible for malaria. The parasite genome features unusual characteristics, including an abundance of low-complexity regions and pronounced sequence divergence, that render protein structure and function recognition difficult. The parasite also manifests remarkable plasticity in its metabolic organization throughout its developmental stages in two hosts, human and mosquito; thus obtaining an exhaustive list of metabolic proteins in the parasite gains importance. Considering the utility of multiple sensitive profile-based search approaches in enhanced annotation of the M. tuberculosis genome, a similar exercise was employed to recognize potential metabolic proteins in P. falciparum (Chapter 6). A total of 172 metabolic proteins were identified as participants of 78 metabolic pathways, over and above 609 proteins known to participate in P. falciparum metabolism. Pathway holes, where evidence of a metabolic step exists but the catalysing enzyme is not known, have also been addressed in the study; several of these have been suggested to play an important role in growth and development of the parasite during its intra-erythrocytic stages in the human host. A subsequent objective was the recognition of P. falciparum proteins potentially capable of remodelling erythrocytes to suit their niche (Chapter 7). The parasite employs exploitative mechanisms to remodel erythrocytes for growth and survival during the intra-erythrocytic stages of its life-cycle, the understanding of which is limited to experimental studies. To identify physicochemically viable protein-protein interactions potentially mediated by proteins of human erythrocytes and P. falciparum proteins, a structure-influenced protocol, similar to the one demonstrated in Chapter 4, was employed. Information on subcellular localization and protein expression is crucial, especially for parasites like P. falciparum, which reside in heterogeneous environmental conditions at different stages in their lifecycle. Inclusion of such data aided in extraction of 208 biologically relevant protein-protein interactions potentially mediated by 59 P. falciparum proteins and 30 erythrocyte proteins. Host-parasite protein-protein interactions were predicted pertaining to several major strategies spanning intra-erythrocytic stages in P.
falciparum pathogenesis, including gaining entry into the host erythrocytes (category: RBC invasion, protease), redirecting parasitic proteins to the erythrocyte membrane (category: protein traffic), modulating erythrocyte machinery (category: rosette formation, putative adhesin, chaperone, kinase), evading immunity (category: immune evasion) and eventually egress (category: merozoite egress) to infect uninfected erythrocytes. Elaborate means to analyse and evaluate the functional viability of a predicted interaction in terms of geometrical packing at the interfacial region, electrostatic complementarity of the interacting surfaces and interaction energies are also demonstrated. The protein-protein interactions thus predicted between human erythrocytes and P. falciparum have the potential to provide a useful basis for understanding probable mechanisms of pathogenesis, and indeed for pinning down attractive targets for antimalarial drug discovery. The emergence of drug resistance against all known antimalarial agents currently in use necessitates discovery and development of either new antimalarial agents or unexplored combinations of drugs that may not only reduce mortality and morbidity of malaria, but also reduce the risk of resistance to antimalarial drugs. In an attempt to contribute towards the same, Chapter 8 explores the established concept of within-target-family selectivity of small molecules to recognize the antimalarial potential of approved drugs. Eighty-six FDA-approved drugs, predominantly antibacterial agents, were identified as feasible candidates for repurposing against 90 P. falciparum proteins. Most of the potential parasite targets identified are known to participate in housekeeping machinery, protein biosynthesis, metabolic pathways and cell growth and differentiation, and thus are pharmaceutically relevant. During intra-erythrocytic growth of P.
falciparum, the parasite resides within the erythrocyte, inside a protective encasing known as the parasitophorous vacuole. Hence a drug intended to target a parasite protein residing in an organelle must be sufficiently hydrophilic or hydrophobic to be able to permeate cell membranes and reach its site of activity. On the basis of the lipophilicity of the drugs, a physical property determined experimentally, 57 of the 86 FDA-approved drugs were recognized as feasible candidates for use against P. falciparum during the blood stages of infection, and these can be prioritized for antimalarial drug development programmes. The final section of the thesis focuses on the protozoan parasite Trypanosoma brucei, a causative agent of African sleeping sickness (Chapter 9). This disease is endemic to sub-Saharan regions of Africa. Despite the availability of the completely sequenced genome of T. brucei, structure and function for about 50% of the proteins encoded in the genome remain unknown. The absence of prophylactic chemotherapy and a vaccine, compounded by the emergence of drug resistance, renders anti-trypanosomal drug discovery challenging. Thus, considering the utility of the frameworks established in earlier chapters for recognition of protein structure, function and drug targets, similar steps were undertaken to understand the functional repertoire of the parasite and to use drug repurposing methods to accelerate anti-trypanosomal drug discovery efforts. Structures and functions were reliably recognized for 70% of the gene products (5894) encoded in the T. brucei genome, with the use of multiple profile-based search procedures coupled with information on the presence of transmembrane domains and signal peptide cleavage sites. Consequently, a total of 282 uncharacterized T. brucei proteins could be newly annotated as potential metabolic proteins.
Integration of information on stage-specific expression profiles with the Trypanosoma-specific and T. brucei-specific proteins identified in the study aided in pinning down potential attractive targets. Additionally, through exploration of evolutionary relationships between targets of FDA-approved drugs and T. brucei proteins, 68 FDA-approved drugs were predicted as repurpose-able candidates against 42 potential T. brucei targets, which primarily include proteins involved in regulatory processes and metabolism. Several of the predicted targets are reportedly essential in assisting the parasite to switch between differentiation forms (bloodstream and procyclic) in the course of its lifecycle. These targets are of high therapeutic relevance; hence the corresponding drug-target associations provide a useful resource for experimental endeavours. In summary, this thesis presents computational analyses of three pathogenic genomes in terms of enhancing the understanding of the functional repertoire of the pathogens, addressing metabolic pathway holes, exploring probable mechanisms of pathogenesis brought about by potential host-pathogen protein-protein interactions, and identifying feasible FDA-approved drug candidates to repurpose against the pathogens. The studies are pursued primarily by taking advantage of powerful homology-detection techniques and the ever-growing biological information made available in public databases. Indeed, the inferences drawn for the three pathogenic genomes serve as an excellent resource for experimental follow-up. The set of protocols presented in the thesis is highly generic in nature, as demonstrated for three pathogens, and can be utilized for genome-wide analyses of many other pathogens of interest. The supplemental data associated with the chapters are provided on a compact disc attached to this thesis.
Gene Regulation - Prokaryotes Prokaryotic Promoters Gene Expression Protein Homology Detection Scoring Matrices Hidden Markov Model Mycobacterium tuberculosis H37Rv Plasmodium falciparum - Drug Targets Homology Detection Nucleotide Cyclase-like Proteins Antimalarial Drugs Mathematics
4	Computational Studies on Structures and Functions of Single and Multi-domain Proteins Mehrotra, Prachi January 2017 Proteins are essential for the growth, survival and maintenance of the cell. Understanding the functional roles of proteins helps to decipher the working of macromolecular assemblies and the cellular machinery of living organisms. A thorough investigation of the link between sequence, structure and function of proteins helps in building a comprehensive understanding of complex biological systems. Proteins have been observed to be composed of single or multiple domains. Analysis of proteins encoded in diverse genomes shows the ubiquitous nature of multi-domain proteins. Though the majority of eukaryotic proteins are multi-domain in nature, 3-D structures of only a small proportion of multi-domain proteins are known, owing to difficulties in crystallizing such proteins. While functions of individual domains are generally extensively studied, the complex interplay of functions of domains is not well understood for most multi-domain proteins. The paucity of structural and functional data affects our understanding of the evolution of structure and function of multi-domain proteins. The broad objective of this thesis is to achieve an enhanced understanding of the structure and function of protein domains by computational analysis of sequence and structural data. Special attention is paid in the first few chapters of this thesis to multi-domain proteins. Classification of multi-domain proteins by implementation of an alignment-free sequence comparison method has been achieved in Chapters 2 and 3. Studies on the organization, interactions and interdependence of domain-domain interactions in multi-domain proteins with respect to sequential separation between domains and N- to C-terminal domain order have been described in Chapters 4 and 5.
The functional and structural repertoire of organisms can be comprehensively studied and compared using functional and structural domain annotations. Chapters 6, 7 and 8 present proteome-wide structure and function comparisons of various pathogenic and non-pathogenic microorganisms. These comparisons help in identifying proteins implicated in the virulence of the pathogen and thus in predicting putative targets for disease treatment and prevention. Chapter 1 forms an introduction to the main subject area of this thesis. Starting with a description of protein structure and function, details of the four levels of hierarchical organization of protein structure have been provided, along with the databases that document protein sequences and structures. Classification of protein domains, considered as the realm of function, structure and evolution, has been described. The usefulness of classifying proteins at the domain level has been highlighted in terms of providing an enhanced understanding of protein structure and function and also of their evolutionary relatedness. The details of structure, function and evolution of multi-domain proteins have also been outlined in Chapter 1. Chapter 2 aims to achieve a biologically meaningful classification scheme for multi-domain protein sequences. The overall function of a multi-domain protein is determined by the functional and structural interplay of its constituent domains. Traditional sequence-based methods utilize only the domain-level information to classify proteins. This does not take into account the contributions of accessory domains and linker regions towards the overall function of a multi-domain protein. An alignment-free protein sequence comparison tool, CLAP (CLAssification of Proteins), previously developed in this laboratory, was assessed and improved when the author joined the group.
CLAP was developed especially to handle multi-domain protein sequences without a requirement to define domain boundaries or the sequential order of domains (domain architecture). CLAP works by all-against-all comparison of 5-residue sequence windows between two protein sequences. The sequences compared can be full-length, comprising all the domains in the two proteins. The compiled comparisons are represented as Local Matching Scores (LMS) between protein sequences (nslab.iisc.ernet.in/clap/). It has been previously shown that CLAP runs ~7 times faster than other protein sequence comparison methods that employ alignment of sequences. In Chapter 2, CLAP-based classification has been carried out on two test datasets of proteins containing (i) the tyrosine phosphatase domain family and (ii) the SH3-domain family. The former dataset comprises both single and multi-domain proteins that sometimes contain repeats of the tyrosine phosphatase domain. The latter dataset consists only of multi-domain proteins with one copy of the SH3 domain. At the domain level, the CLAP-based classification scheme resulted in a clustering similar to that obtained from an alignment-based method, ClustalW. CLAP-based clusters obtained for full-length datasets were shown to comprise proteins with similar functions and domain architectures. Hence, a protein classification scheme that is independent of domain definitions and requires only the full-length amino acid sequences as input is shown to work efficiently. Chapter 3 explores the limitations of CLAP in large-scale protein sequence comparisons. The potential advantages of full-length protein sequence classification, combined with the availability of the alignment-free sequence comparison tool, CLAP, motivated the conceptualization of full-length sequence classification of the entire protein repertoire.
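The window-matching idea described above can be illustrated with a minimal sketch. This is not CLAP's actual Local Matching Score, whose definition is not given in the abstract; the function names and the normalisation by the shorter sequence's window count are assumptions made purely for illustration. It only shows why counting shared 5-residue windows needs no alignment, no domain boundaries and no fixed domain order.

```python
from collections import Counter


def windows(seq: str, k: int = 5) -> Counter:
    """Count every overlapping k-residue window in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_window_score(a: str, b: str, k: int = 5) -> float:
    """Toy similarity: fraction of k-residue windows shared by a and b.

    Hypothetical stand-in for an alignment-free score; NOT the CLAP LMS.
    Normalising by the smaller window count is an assumption.
    """
    wa, wb = windows(a, k), windows(b, k)
    if not wa or not wb:
        # One of the sequences is shorter than k residues.
        return 0.0
    shared = sum((wa & wb).values())  # multiset intersection of windows
    return shared / min(sum(wa.values()), sum(wb.values()))


if __name__ == "__main__":
    dom_a = "ACDEFGHIKL"  # made-up "domain" sequences
    dom_b = "MNPQRSTVWY"
    # Swapping the domain order (AB vs BA) still leaves every
    # intra-domain window shared; only junction windows differ.
    print(shared_window_score(dom_a + dom_b, dom_b + dom_a))
```

Because windows are pooled without regard to position, two proteins with the same domains in a different order still score highly in this sketch, which mirrors the domain-order independence the abstract attributes to CLAP.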
Before undertaking this mammoth task, the working of CLAP was tested on a large dataset of 239,461 protein sequences. Chapter 3 discusses the technical details of computation, storage and retrieval of CLAP scores for a large dataset in a feasible timeframe. CLAP scores were examined for protein pairs of the same domain architecture, and ~22% of these showed CLAP similarity scores of 0. This led to an investigation of the sensitivity of CLAP with respect to sequence divergence. Several test datasets of proteins belonging to the same SCOP fold were constructed, and CLAP-based classification of these proteins was examined at the inter- and intra-SCOP family level. CLAP was successful in efficiently clustering evolutionarily related proteins (defined as proteins within the same SCOP superfamily) if their sequence identity was >35%. At lower sequence identities, CLAP failed to recognize any evolutionary relatedness. Another test dataset, consisting of two-domain proteins with domain order swapped, was constructed. Domain order swap refers to domain architectures of type AB and BA, consisting of domains A and B. A condition that the sequence identities of homologous domains be greater than 35% was imposed. CLAP could effectively cluster together proteins of the same domain architecture in this case. Thus, the sequence identity threshold of 35% at the domain level improves the accuracy of CLAP. The analysis also showed that, for highly divergent sequences, the expectation of a 5-residue pattern match was likely too stringent a criterion. A modification of the 5-residue identical pattern match criterion, to allow similar residues and gaps within matched patterns, may therefore be required to enable CLAP-based clustering of remotely related protein sequences. This study thus highlights the limitations of CLAP with respect to large-scale analysis and its sensitivity to sequence divergence. 
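The idea of comparing 5-residue windows between two full-length sequences, described above for CLAP, can be illustrated with a minimal sketch. This is not the CLAP implementation and does not compute its Local Matching Score; the function names `windows` and `shared_window_score`, the normalization by the smaller window set, and the example sequences are all assumptions made purely for illustration.

```python
def windows(seq: str, k: int = 5) -> set[str]:
    """Return the set of all overlapping k-residue windows in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_window_score(seq_a: str, seq_b: str, k: int = 5) -> float:
    """Toy alignment-free similarity (hypothetical, NOT the CLAP LMS).

    Counts identical k-residue windows found in both sequences and divides
    by the size of the smaller window set, giving a value in [0, 1].
    """
    wa, wb = windows(seq_a, k), windows(seq_b, k)
    if not wa or not wb:  # a sequence shorter than k has no windows
        return 0.0
    return len(wa & wb) / min(len(wa), len(wb))


# Two made-up "domains" joined in opposite orders (AB versus BA).
domain_a = "ACDEFGH"
domain_b = "KLMNPQR"
score_swapped = shared_window_score(domain_a + domain_b, domain_b + domain_a)
```

Because windows are matched regardless of their position, the swapped architectures still share every window that lies inside a domain; only the windows spanning the domain junction differ. The sketch also shows why exact 5-residue matching is stringent: a single substitution removes every window that overlaps the changed residue.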
Chapters 4 and 5 discuss the computational analysis of inter-domain interactions with respect to sequential distance and domain order. Knowledge of the domain composition and 3-D structures of individual domains in a multi-domain protein may not be sufficient to predict the tertiary structure of the multi-domain protein. Substantial information about the nature of domain-domain interfaces helps in the prediction of the tertiary as well as the quaternary structure of a protein. Therefore, Chapter 4 explores the possible relationship between the sequential distance separating two domains in a multi-domain protein and the extent of their interaction. With increasing sequential separation between any two domains, the extent of inter-domain interactions showed a gradual decrease. The trend was more apparent when sequential separation between domains was measured in terms of the number of intervening domains. Irrespective of the linker length, extensive interactions were seen more often between contiguous domains than between non-contiguous domains. Contiguous domains show a broader range of interface areas and a lower proportion of non-interacting domains (interface area: 0 Å² to 4400 Å², 2.3% non-interacting domains) than non-contiguous domains (interface area: 0 Å² to 2000 Å², 34.7% non-interacting domains). Additionally, as inter-protein interactions are mediated through constituent domains, rules of protein-protein interactions were applied to domain-domain interactions. Tight binding between domains is denoted as a putative permanent domain-domain interaction, and domains that may dissociate and associate through relatively weak interactions to regulate functional activity are denoted as having putative transient domain-domain interactions. An interface area threshold of 600 Å² was utilized as a binary classifier to distinguish between putative permanent and putative transient domain-domain interactions. 
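The threshold rule stated above can be written as a very small sketch. Only the 600 Å² cut-off comes from the text; the function name `interaction_state`, the choice to treat an area exactly equal to the threshold as permanent, and the separate label for a zero-area (non-interacting) pair are assumptions made here for illustration.

```python
PERMANENT_THRESHOLD_A2 = 600.0  # interface area cut-off in Å², taken from the text


def interaction_state(interface_area_a2: float,
                      threshold_a2: float = PERMANENT_THRESHOLD_A2) -> str:
    """Label a domain pair from its interface area (hypothetical helper).

    Assumptions: an area of exactly the threshold counts as permanent, and
    a pair with zero interface area is reported as non-interacting.
    """
    if interface_area_a2 < 0:
        raise ValueError("interface area cannot be negative")
    if interface_area_a2 == 0:
        return "non-interacting"
    if interface_area_a2 >= threshold_a2:
        return "putative permanent"
    return "putative transient"


# Made-up interface areas for three domain pairs.
labels = [interaction_state(area) for area in (0.0, 350.0, 1800.0)]
```

A single-feature cut-off like this is easy to apply across a large dataset, which is consistent with the abstract's own caveat that interface area should be combined with other inter-domain properties and experimental validation.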
Therefore, the state of interaction of a domain pair is defined as either putative permanent or putative transient. Contiguous domains showed a predominance of putative permanent inter-domain interfaces, whereas non-contiguous domains showed a prevalence of putative transient interfaces. The state of interaction of various SCOP superfamily pairs was studied across different proteins in the dataset. SCOP superfamily pairs mostly showed a conserved state of interaction, i.e. either putative permanent or putative transient in all their occurrences across different proteins. Thus, contiguous domains interact extensively more often than non-contiguous domains, and specific superfamily pairs tend to interact in a conserved manner. In conclusion, a combination of interface area and other inter-domain properties, along with experimental validation, will help strengthen the binary classification scheme of putative permanent and transient domain-domain interactions. Chapter 5 provides a structural analysis of domain pairs occurring in different sequential domain orders in multi-domain proteins. The function and regulation of a multi-domain protein are predominantly determined by its domain-domain interactions. These in turn are influenced by the sequential order of domains in the protein. With domains defined using evolutionary and structural relatedness (SCOP superfamily), their conservation of structure and function was studied across domain order reversal. A domain order reversal indicates different sequential orders of the concerned domains, which may be identified in proteins of the same or different domain compositions. Domain order reversals of domains A and B can be found in a protein pair with the domain architectures xAxBx and xBxAx, where x indicates 0 or more domains. A total of 161 pairs of domain order reversals were identified in 77 pairs of PDB entries. 
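The xAxBx versus xBxAx definition above lends itself to a short sketch. This is not the procedure used in the thesis; it assumes each protein is given as an ordered list of domain labels (for example SCOP superfamily identifiers), and the function names and single-letter labels are hypothetical.

```python
from itertools import combinations


def ordered_pairs(architecture: list[str]) -> set[tuple[str, str]]:
    """All (X, Y) with X != Y where some X occurs before some Y.

    Intervening domains are allowed, matching the 'x = 0 or more domains'
    part of the xAxBx pattern.
    """
    pairs = set()
    for i, j in combinations(range(len(architecture)), 2):
        if architecture[i] != architecture[j]:
            pairs.add((architecture[i], architecture[j]))
    return pairs


def order_reversals(arch_1: list[str], arch_2: list[str]) -> set[tuple[str, str]]:
    """Domain pairs found as A...B in one protein and B...A in the other.

    Each reversed pair is returned once, with its two labels sorted.
    """
    p1, p2 = ordered_pairs(arch_1), ordered_pairs(arch_2)
    return {tuple(sorted((a, b))) for (a, b) in p1 if (b, a) in p2}


# Hypothetical architectures: A-C-B in one protein, B-A in the other.
reversed_pairs = order_reversals(["A", "C", "B"], ["B", "A"])
```

Note that an architecture with repeated domains, such as A-B-A, contains both orders of the pair on its own, so under this sketch it registers a reversal against any protein containing A and B in either order.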
For most of the comparisons between proteins with different domain composition and architecture, large differences in the relative spatial orientation of domains were observed. Although preservation of the state of interaction was observed for ~75% of the comparisons, none of the inter-domain interfaces of domains in different order displayed high interface similarity. These domain order reversals in multi-domain proteins involve a limited set of 15 SCOP superfamilies. The majority of the superfamilies undergoing order reversal function either as transporters or as regulatory domains, and very few are enzymes. A higher proportion of domain order reversals was observed in domains separated by 0 or 1 domains than in those separated by more than 1 domain. A thorough analysis of various structural features of domains undergoing order reversal indicates that only one order of domains is strongly preferred over all possible orders. This may be due either to evolutionary selection of one of the orders and its conservation throughout generations, or to the fact that domain order reversals rarely conserve the interface between the domains. Further studies (Chapters 6 to 8) utilize available computational techniques for the structural and functional annotation of proteins encoded in a few bacterial genomes. Based on these annotations, proteome-wide structure and function comparisons were performed between two sets of pathogenic and non-pathogenic bacteria. The first study compares the pathogenic Mycobacterium tuberculosis with the closely related non-pathogenic organism Mycobacterium smegmatis. The second study primarily identified biologically feasible host-pathogen interactions between the human host and the pathogen Leptospira interrogans, and also compared the interactions of the pathogenic Leptospira interrogans and of the saprophytic Leptospira biflexa with the human host. 
Chapter 6 describes the function and structure annotation of proteins encoded in the genome of M. smegmatis MC2-155. M. smegmatis is a widely used model organism for understanding the pathophysiology of M. tuberculosis, the primary causative agent of tuberculosis in humans. M. smegmatis and M. tuberculosis, both species of the mycobacterial genus, share several features, such as a similar cell-wall architecture, the ability to oxidise carbon monoxide aerobically and a large number of homologues. These features render M. smegmatis particularly useful in identifying critical cellular pathways of M. tuberculosis that could be targeted to inhibit its growth in the human host. Despite these similarities, there are stark differences between the two species due to their different niches and lifestyles. While there are innumerable studies reporting the structure, function and interaction properties of M. tuberculosis proteins, there is a lack of high-quality annotation of M. smegmatis proteins. This makes an understanding of the biology of M. smegmatis extremely important for investigating its suitability as a model organism for M. tuberculosis. With the implementation of available sequence and structural profile-based search procedures, functional and structural characterization could be achieved for ~92% of the M. smegmatis proteome. Structural and functional domain definitions were obtained for a total of 5695 of 6717 proteins in M. smegmatis. Residue coverage >70% was achieved for 4567 proteins, which constitute ~68% of the proteome. Domain-unassigned regions longer than 30 residues were assessed for their potential to be associated with a domain. Of the 1022 proteins with no recognizable domains, putative structural and functional information was inferred for 328 proteins by the use of distant relationship detection and fold recognition methods. Although 916 of the 1022 proteins with no recognizable domains were found to be specific to M. 
smegmatis species, 98 of these are specific to its MC2-155 strain. Of the 1828 M. smegmatis proteins classified as conserved hypothetical proteins, 1038 proteins were successfully characterized. A total of 33 Domains of Unknown Function (DUFs) occurring in M. smegmatis could be associated with structural domains. A high representation of the tetR and GntR families of transcription regulators was noted in the functional repertoire of the M. smegmatis proteome. As M. smegmatis is a soil-dwelling bacterium, transcriptional regulators are crucial in helping it to adapt to and survive environmental stress. Similarly, the ABC transporter and MFS domain families are highly represented in the M. smegmatis proteome. These are important in enabling the bacterium to take up carbohydrates from diverse environmental sources. A lower number of virulence proteins was identified in M. smegmatis, which is consistent with its non-pathogenicity. Thus, a detailed functional and structural annotation of the M. smegmatis proteome was achieved in Chapter 6. Chapter 7 delineates the similarities and differences in the structure and function of proteins encoded in the genomes of the pathogenic M. tuberculosis and the non-pathogenic M. smegmatis. The protocol employed in Chapter 6 to achieve the proteome-wide structure and function annotation of M. smegmatis was also applied to the M. tuberculosis proteome in Chapter 7. The number of proteins encoded by the genome of M. smegmatis strain MC2-155 (6717 proteins) is higher than that encoded by M. tuberculosis strain H37Rv (4018 proteins). A total of 2720 high-confidence orthologues sharing ≥30% sequence identity were identified in M. tuberculosis with respect to M. smegmatis. Based on the orthologue information, specific functional clusters, essential proteins, metabolic pathways, transporters and toxin-antitoxin systems of M. tuberculosis were inspected for conservation in M. smegmatis. 
Among the several categories analysed, 53 metabolic pathways, 44 membrane transporter proteins belonging to the secondary transporter and ATP-dependent transporter classes, 73 toxin-antitoxin systems, 23 M. tuberculosis-specific targets, 10 broad-spectrum targets and 34 targets implicated in the persistence of M. tuberculosis had no detectable orthologues in M. smegmatis. Several of the MFS superfamily transporters act as drug efflux pumps and are hence associated with drug resistance in M. tuberculosis. The relative abundances of MFS and ABC superfamily transporters are higher in M. smegmatis than in M. tuberculosis. As these transporters are involved in carbohydrate uptake, their higher representation in M. smegmatis highlights the limited ability of M. tuberculosis to assimilate diverse carbon sources. In the case of porins, MspA-like and OmpA-like porins are selectively present in either M. smegmatis or M. tuberculosis. These differences help to identify protein clusters for which M. smegmatis may not be the best model organism to study M. tuberculosis proteins. At the domain level, the ATP-binding domain of ABC transporters, the tetracycline transcriptional regulator (tetR) domain family, the major facilitator superfamily (MFS) domain family, the AMP-binding domain family and the enoyl-CoA hydrolase domain family are highly represented in both the M. smegmatis and M. tuberculosis proteomes. These domains play an essential role in carbohydrate uptake systems and drug-efflux pumps, among other diverse functions in mycobacteria. Several domain families are differentially represented in M. tuberculosis and M. smegmatis. For example, the pentapeptide-repeat, PE, PPE and PIN domains, although abundant in M. tuberculosis, are very rare in M. smegmatis. Therefore, such uniquely or differentially represented functional and structural domains in M. tuberculosis as compared to M. smegmatis may be linked to pathogenicity or adaptation of M. 
tuberculosis in the host. Hence, major differences between M. tuberculosis and M. smegmatis were identified, not only in terms of domain populations but also in terms of domain combinations. Thus, Chapter 7 highlights the similarities and differences between the M. smegmatis and M. tuberculosis proteomes in terms of structure and function. These differences inform the selective use of M. smegmatis as a model organism to study M. tuberculosis. In Chapter 8, computational tools have been employed to predict biologically feasible host-pathogen interactions between the human host and the pathogen Leptospira interrogans. Sensitive profile-based search procedures were used to specifically identify practical drug targets in the genome of Leptospira interrogans, the causative agent of the globally widespread zoonotic disease leptospirosis. Traditionally, the genus Leptospira is classified into two species complexes: the pathogenic L. interrogans and the non-pathogenic saprophyte L. biflexa. The pathogen gains entry into the human host through direct or indirect contact with fluids of infected animals. Several ambiguities exist in the understanding of L. interrogans pathogenesis. An integration of multiple computational approaches, guided by experimentally derived protein-protein interactions, was utilized for the recognition of host-pathogen protein-protein interactions. The initial step involved the identification of similarities of host and L. interrogans proteins with crystal structures of experimentally known transient protein-protein complexes. Further, conservation of interfacial nature was used to obtain high-confidence predictions of putative host-pathogen protein-protein interactions. These predictions were subjected to further selection based on the subcellular localization of proteins of the human host and L. interrogans, and on tissue-specific expression profiles of the host proteins. A total of 49 protein-protein interactions mediated by 24 L. 
interrogans proteins and 17 host proteins were identified, and these may be subjected to further experimental investigation to assess their in vivo relevance. The functional relevance of similarities and differences between the pathogenic and non-pathogenic leptospires in terms of interactions with the host has also been explored. For this, protein-protein interactions between the human host and the non-pathogenic saprophyte L. biflexa were also predicted. Nearly 39 leptospiral-host interactions were recognized to be similar across both the pathogen and the saprophyte in the context of processes that influence the host. The overlapping interactions of L. interrogans and L. biflexa proteins with human host proteins are primarily associated with the establishment of entry into the human host. These include adhesion of leptospiral proteins to host cells, survival in the host environment (such as iron acquisition) and binding to components of the extracellular matrix and plasma. The disjoint sets of leptospiral-host interactions are species-specific and, more importantly, indicative of the establishment of infection by L. interrogans in the human host and of immune clearance of L. biflexa by the human host. With respect to L. interrogans, these specific interactions include interference with the blood coagulation cascade and dissemination to target organs by means of disruption of cell junction assembly. On the other hand, species-specific interactions of L. biflexa proteins include those with components of the host immune system. In spite of the limited availability of experimental evidence, these predictions help in identifying functionally relevant interactions between host and pathogen by integrating multiple lines of evidence. Thus, inferences from computational prediction of host-pathogen interactions act as guidelines for experimental studies investigating the in vivo relevance of these predicted protein-protein interactions. 
This will further help in developing effective measures for treatment and disease prevention. In summary, Chapters 2 and 3 describe the implementation, advantages and limitations of the alignment-free full-length sequence comparison method, CLAP. Chapters 4 and 5 are dedicated to understanding domain-domain interactions in multi-domain protein sequences and structures. In Chapters 6, 7 and 8, computational analyses of the mycobacterial and leptospiral species provided an enhanced understanding of the functional repertoire of these bacteria. These studies were undertaken by utilizing the biological sequence data available in public databases and by implementing powerful homology-detection techniques. The supplemental data associated with the chapters are provided on a compact disc attached to this thesis.
Proteins - Building Blocks Protein Sequences Protein Domain Hidden Markov Models (HMM) Multi-domain Proteins Mycobacterium smegmatis MC2-155 Mycobacterium tuberculosis Proteomes Leptospira Interrogans Leptospira Biflexa Proteomes Leptospira Biflexa Genomes Mycobacterium tuberculosis H37Rv Mathematics
