Global ETD Search

1	An Integrated Systems Biology Approach to Study Drug Resistance in Mycobacteria Padiadpu, Jyothi January 2015 Emergence of drug resistance is a major problem in the treatment of many diseases including tuberculosis. To tackle the problem, it is essential to obtain a global perspective of the molecular mechanisms by which bacteria acquire drug resistance. Systems biology approaches therefore become necessary. This work aims to understand pathways to drug resistance and strategies for inhibition of the resistant strains by using a combination of experimental genomics and computational molecular systems approaches. Laboratory evolution of Mycobacterium smegmatis MC2 155 by treatment with isoniazid (INH), a front-line anti-tubercular drug, resulted in a drug-resistant strain (4XR), capable of growth even at about 10 times the minimum inhibitory concentration of the drug. The whole genome sequence of 4XR was determined, which indicated only 31 variations in the whole genome, including 3 point mutations, 17 indels and 11 frame-shifts. Two mutations were in proteins required for the pharmacological action of the drug, albeit in regions distant from the drug binding site. The variations however were insufficient to explain the observed resistance to isoniazid. For a better understanding of the global changes associated with drug resistance, genome-wide gene expression data were obtained for the resistant strain and compared with those of the WT strain. 716 genes were found to be differentially regulated in 4XR, spanning different biochemical, signaling and regulatory pathways. From this, some explanations for the emergence of drug resistance were obtained, such as the up-regulation of the enzymes in the mycolic acid biosynthesis pathway and also of the drug efflux pumps. 
In addition, enrichment analysis indicated that up-regulated genes belong to functional categories of response to stress, carbohydrate metabolism, oxidation-reduction process, ion transport, signaling as well as lipid metabolism. The differential gene regulations seemed to be partially responsible for conferring the phenotype to the organism. Alterations in the metabolic pathways in 4XR were characterized using the phenotypic microarray technology, which experimentally scanned the respiratory ability of the resistant bacteria under 280 different nutrient conditions and 96 different inhibitors. Phenotypic gain, where the resistant strain grows significantly better than the wild type and phenotypic loss, where the growth of the resistant strain is compromised as compared to the sensitive strains were derived from the comparison of the phenotypic responses. Differences in survival ability and growth rates in different nutrient sources in the resistant phenotype as compared to the wild type were observed, suggesting rewiring in the metabolic network of the drug-resistant strain. In particular, the pathways of central carbon metabolism and amino acid biosynthesis exhibit significant differences. The strain-specific metabolic pathway differences may guide in devising strategies to tackle the drug-resistant strains selectively and in a rational manner. Scanning electron microscopy indicated the morphology of the drug-resistant strains to be significantly altered, as compared to the control drug-sensitive strain. It is well-known that isoniazid acts by inhibiting mycolic acid biosynthesis. The pathway turns out to be a target for many other anti-tubercular drugs also, since mycolic acids are major components of the cell wall. It is therefore important to understand what changes occur in the mycolic acid and the associated pathways in the drug-resistant variety so that strategies to tackle the latter can be chosen more judiciously. 
The lipidome of the cell wall was therefore quantitatively characterized by mass spectrometric analyses, which indeed confirmed that the 4XR strain has a significantly different composition profile. Among the six categories of lipids, the members of the glycerophospholipids category were abundant while the fatty acyls, polyketides and saccharolipids were lower in the 4XR strain as compared to the WT. The lipidomic data derived from the cell wall of the INH-resistant strain show that mycolic acid pathway function, which would otherwise be lost upon drug exposure in the sensitive strain, is restored. Understanding the precise changes that occur in the lipidome in the drug-resistant strains is expected to be useful in developing new ways to tackle resistance. Next, to understand the implications of altered gene expression profiles, genome-scale protein-protein interaction networks were constructed that capture various structural and functional associations mediated by proteins in the mycobacterial cell. Using transcriptome data of 4XR, a response network was computed. Using an algorithm previously developed in the laboratory, the networks were mined to identify the highest differential activity paths and possible mechanisms that are deployed by the cells leading to drug resistance. Known resistance mechanisms such as efflux, cytochromes and SOS are all seen to constitute the highest activities for achieving drug resistance in 4XR. Interestingly, such paths are seen to form a well-connected subnet, indicating that such differential activities are orchestrated. This clearly shows that multiple mechanisms are simultaneously active in 4XR and may together generate drug resistance. Mechanisms of detoxification and antioxidant responses are seen to predominate in the 4XR subnet. Overall the analysis provides a shortlist of strategies for targeting the drug-resistant strain. 
Next, the phenotypic microarray platform was used to screen for growth of Msm in the presence of various drugs. Data analysis and clustering resulted in identification of conditions that lead to phenotypic gain or loss in 4XR as well as those that lead to differential susceptibility to various drugs. Drugs such as cephalosporins, tobramycin, aminotriazole, phenylarsine oxide, vancomycin and oxycarboxin were also found to inhibit growth in the resistant strain selectively. In other words, 4XR is found to be collaterally sensitive to these drugs. The top-net formed by the highest differential activity paths, identified from the network described earlier, has already indicated the involvement of proteins that generate antioxidant responses. Insights from the two methods, first from the targeted approach and second from the phenotypic discovery approach, were combined to select only those compounds to which the 4XR strain was collaterally sensitive and that targeted proteins responsible for antioxidant responses. These compounds are vancomycin, phenylarsine oxide, ebselen and clofazimine. These were further tested against the virulent M. tuberculosis H37Rv strain in a collaborator's laboratory. Three of these compounds, vancomycin, ebselen and phenylarsine oxide, were found to be highly active in combination with isoniazid against all tested Mtb strains, showing high levels of inhibition against H37Rv and against three drug-resistant strains (single-drug-resistant, MDR and XDR). Moreover, they were observed to be highly potent when given in combinations. Clofazimine, on the other hand, showed activity in combination with isoniazid but no significant synergy in the virulent drug-resistant strains of M. tuberculosis, though it was synergistic in the sensitive strain. Thus, experiments with M. 
tuberculosis provide empirical proof that four different compounds, all capable of blocking antioxidant responses, are capable of inhibiting growth of single-, multi- and extensively drug-resistant clinical isolates of M. tuberculosis. Using transcriptome data from literature for M. tuberculosis exposed to six different drugs, similar drug-specific response networks were constructed. These networks indicate differences in the cellular response to different drugs. Interestingly, the analysis suggests that different drug targets and hence different drugs could trigger drug resistance to various extents, leading to the possibility of prioritizing drug targets based on their resistance evolvability. An earlier study from the laboratory suggested the concept of target-co-target pairs, wherein the co-target could be a key protein in mediating drug resistance for that particular drug and hence for its target protein. Top-ranked hubs in multiple drug-specific networks, such as PolA, FadD1, CydA, a monooxygenase and GltS, can possibly serve as co-targets. Simultaneous inhibition of the co-target along with the primary target could lower the chances of emergence of drug resistance. Such analyses of drug-specific networks provide insights about possible routes of communication in the cell leading to drug resistance and strategies to inhibit such communication to retard emergence of drug resistance. Since mutations in the target proteins are known to form an important mechanism by which resistant strains emerge, an understanding of the nature of mutations in different drug targets and how they achieve resistance is crucial. Sequence as well as structural bases for the resistance from known drug-resistant mutants in different drug targets are deciphered and then positions amenable to such mutations are predicted in each target. Mutational indices of individual residues in each target structure are computed based on sequence conservation. 
Saturation mutagenesis is performed in silico and structural stability analysis of the target proteins has been carried out. Critical insights were obtained in terms of which amino acid positions are prone to acquiring mutations. This in turn suggests interactions that are not desirable, and thus can be translated into guidelines for modifying the existing drugs as well as for designing new drugs. Finally, the work presented here describes application of systems biology approaches to understand the underlying mechanisms of drug resistance, which has provided insights for drug discovery on multiple fronts through target identification, target prioritization and identification of co-targets. In particular, the work has led to a rational exploration of collateral drug sensitivity and cross-resistance of the drug-resistant strain to other compounds. Combinations of such compounds with isoniazid were first identified in the M. smegmatis model system and later tested to hold good for the virulent M. tuberculosis strain, in a collaborative study. The combinations were found to be active against three different clinical drug-resistant isolates of M. tuberculosis. Therefore, this study not only reveals the global view of resistance mechanisms but also identifies synergistic combinations of promising drug candidates based on the learnt mechanisms, demonstrating a possible route to exploring drug repurposing. The combinations are seen to work at a much reduced dosage as compared to the conventional tuberculosis drug regimens, indicating that the toxicity and any associated adverse effects may be greatly reduced, suggesting that the combinations may have a high chance to succeed in the next steps of the drug discovery pipeline. Drug Resistance Tuberculosis Drug Resistance Mycobacteria M. smegmatis MC2 155 Mycobacterium smegmatis M. tuberculosis Anti-tubercular Drugs Mycobacterium tuberculosis Biochemistry
2	Structural, Functional And Transcriptional Analysis Of Nucleoside Diphosphate Kinase From Mycobacterium Smegmatis mc2 155 Arumugam, Muthu 10 1900 Maintenance of the levels of nucleoside triphosphates (NTPs) as well as their corresponding deoxy derivatives (dNTPs) is crucial to all growth and developmental processes. The enzyme nucleoside diphosphate kinase (NDK) utilises an autophosphorylated enzyme intermediate to catalyse the transfer of the 5’ terminal phosphate from NTPs (mostly ATP) to nucleoside diphosphates (NDPs) via a reversible mechanism as given below.
N1TP + NDK-His ↔ N1DP + NDK-His* (1)
N2DP + NDK-His* ↔ N2TP + NDK-His (2)
In the γ-phosphoryl group transfer, the highly conserved active site residue His 117 becomes autocatalytically phosphorylated in the enzyme intermediate (NDK-His*). This phosphoryl group is transferred to ribo- or deoxyribonucleotides (N2DP) in a substrate non-specific manner. In addition to its fundamental role in nucleotide metabolism, NDP kinase is also involved in a number of cellular regulatory functions such as growth and developmental control, tumor metastasis suppression, signal transduction and so on. Among mycobacteria, NDK of Mycobacterium tuberculosis (MtNDK) has been crystallised, its structure solved and its biochemical functions elucidated. However, there has not been any such study on the NDK of Mycobacterium smegmatis, except on the possible interaction with other proteins which modulates the NTP synthesising activity of MsNDK towards specific NTPs. As M. smegmatis is a saprophytic, fast-growing and non-pathogenic mycobacterium that is widely used as an experimental model system to study various biological processes in mycobacteria, it was thought appropriate to study NDK from this organism. The outcome of the current study is presented in five chapters. 
The first chapter gives a detailed introduction on the structural and functional aspects of NDK from diverse organisms, from bacteria to humans.
Chapter 2. Molecular Cloning, Expression and Characterisation of Biochemical Activities of Nucleoside Diphosphate Kinase from Mycobacterium smegmatis mc2 155
The research work starts with the molecular cloning, overexpression, purification, and characterisation of biochemical activities of recombinant MsNDK protein. In brief, the ndk gene from M. smegmatis (Msndk) has been cloned, efficiently overexpressed as a soluble 6xHis-tagged recombinant protein and purified through affinity chromatography, and its ATPase, GTPase and NTP synthesising activities have been demonstrated. A catalytic mutant of MsNDK, MsNDK-H117Q, was generated using a site-directed mutagenesis approach and H117 was shown to be essential for the catalytic activity. Further experiments revealed that it is the same H117 that is required for mediating autophosphorylation as well, which is an intermediate step in the transphosphorylation reaction of NDK.
Chapter 3. Characterisation of Oligomerisation Property of M. smegmatis Nucleoside Diphosphate Kinase: the Possible Role of Hydrogen Bond and Hydrophobic Interactions
The present study revealed that a homodimer of MsNDK could be observed even in the presence of heat and SDS. Chemical cross-linking experiments revealed that MsNDK forms dimer, tetramer and hexamer. Homology modeling of MsNDK on the MtNDK crystal structure supported the existence of the hexamer as three homodimers. Gln 17, Ser 24 and Glu 27 were found to be positioned at the dimer interface. Mutations of these residues did not abolish the stability of the respective mutant dimers in the presence of SDS and heat. The modeled structure of MsNDK revealed the existence of hydrophobic interactions at the dimer interface. 
The in silico approach helped in mapping the hydrophobic interactions at the dimer interface to two consecutive β-strands. Exposure of hydrophobic residues, using the organic solvent methanol, abolished the dimer completely, indicating the vital role of hydrophobic interactions in the dimer stability. In solution, the native MsNDK was found to be a hexamer.
Chapter 4. Mycobacterial Nucleoside Diphosphate Kinase Functions as GTPase Activating Protein for Mycobacterial Cytokinetic Protein FtsZ In Vitro
Mammalian, plant, and bacterial NDKs can function in vitro as GTPase activating protein (GAP) for small G proteins, namely p21 Ras, Rad, and Rho-GTPases in animals and Pra1, Pra2, and GPA1 in Arabidopsis thaliana. We examined whether NDK of M. tuberculosis (MtNDK) can function as GAP in vitro for the cytokinetic protein FtsZ of Mycobacterium tuberculosis (MtFtsZ), which is a protein with a classical G-protein fold, possessing GTP-binding and GTPase activities (like G proteins). MtNDK and MsNDK could function as GAP for MtFtsZ and for FtsZ of M. smegmatis (MsFtsZ), respectively, in vitro. Similarly, MtNDK could function as GAP for MsFtsZ and reciprocally MsNDK could function as GAP for MtFtsZ. Interaction of NDK with the respective FtsZ could be observed. Physiological implications of the GAP activity of NDK on FtsZ are discussed.
Chapter 5. Transcriptional Analyses of Nucleoside Diphosphate Kinase Gene of Mycobacterium smegmatis mc2 155
Although there are studies on the structural and functional aspects of NDK, there are not many studies available on the transcriptional analysis of nucleoside diphosphate kinase (NDK) gene expression in general and none in particular in mycobacterial systems. Therefore we carried out a transcriptional analysis of the expression of the Msndk gene, in order to map the Transcriptional Start Site (TSS), identify promoter elements, and elucidate the transcriptional activity of the promoters. 
Expression of the Msndk gene was analysed in exponential growth phase and under two different stress conditions wherein DNA replication gets arrested. Hydroxyurea (HU), which reduces dNTP pools by inhibiting ribonucleotide reductase, and Phenethyl Alcohol (PEA), which affects membrane structure resulting in DNA replication arrest, were used. Two transcripts and their promoter elements were mapped and their promoter activities were demonstrated. The profile of transcripts was found to be identical under the three different conditions examined. Nucleoside Sequence Mycobacterium Smegmatis Structural Analysis Transcriptional Analysis Nucleoside Diphosphate Kinase mc2 155 FtsZ M. smegmatis Biochemical Genetics
3	Computational Studies on Structures and Functions of Single and Multi-domain Proteins Mehrotra, Prachi January 2017 Proteins are essential for the growth, survival and maintenance of the cell. Understanding the functional roles of proteins helps to decipher the working of macromolecular assemblies and cellular machinery of living organisms. A thorough investigation of the link between sequence, structure and function of proteins helps in building a comprehensive understanding of complex biological systems. Proteins have been observed to be composed of single and multiple domains. Analysis of proteins encoded in diverse genomes shows the ubiquitous nature of multi-domain proteins. Though the majority of eukaryotic proteins are multi-domain in nature, 3-D structures of only a small proportion of multi-domain proteins are known due to difficulties in crystallizing such proteins. While functions of individual domains are generally extensively studied, the complex interplay of functions of domains is not well understood for most multi-domain proteins. Paucity of structural and functional data affects our understanding of the evolution of structure and function of multi-domain proteins. The broad objective of this thesis is to achieve an enhanced understanding of structure and function of protein domains by computational analysis of sequence and structural data. Special attention is paid in the first few chapters of this thesis to multi-domain proteins. Classification of multi-domain proteins by implementation of an alignment-free sequence comparison method has been achieved in Chapters 2 and 3. Studies on organization, interactions and interdependence of domain-domain interactions in multi-domain proteins with respect to sequential separation between domains and N to C-terminal domain order have been described in Chapters 4 and 5. 
The functional and structural repertoire of organisms can be comprehensively studied and compared using functional and structural domain annotations. Chapters 6, 7 and 8 present the proteome-wide structure and function comparisons of various pathogenic and non-pathogenic microorganisms. These comparisons help in identifying proteins implicated in virulence of the pathogen and thus predict putative targets for disease treatment and prevention. Chapter 1 forms an introduction to the main subject area of this thesis. Starting with describing protein structure and function, details of the four levels of hierarchical organization of protein structure have been provided, along with the databases that document protein sequences and structures. Classification of protein domains considered as the realm of function, structure and evolution has been described. The usefulness of classification of proteins at the domain level has been highlighted in terms of providing an enhanced understanding of protein structure and function and also their evolutionary relatedness. The details of structure, function and evolution of multi-domain proteins have also been outlined in Chapter 1.
Chapter 2 aims to achieve a biologically meaningful classification scheme for multi-domain protein sequences. The overall function of a multi-domain protein is determined by the functional and structural interplay of its constituent domains. Traditional sequence-based methods utilize only the domain-level information to classify proteins. This does not take into account the contributions of accessory domains and linker regions towards the overall function of a multi-domain protein. An alignment-free protein sequence comparison tool, CLAP (CLAssification of Proteins) previously developed in this laboratory, was assessed and improved when the author joined the group. 
CLAP was developed especially to handle multi-domain protein sequences without a requirement of defining domain boundaries and sequential order of domains (domain architecture). The working principle of CLAP involves an all-against-all comparison of windows of 5-residue sequence patterns between two protein sequences. The sequences compared could be full-length, comprising all the domains in the two proteins. This compilation of comparisons is represented as the Local Matching Scores (LMS) between protein sequences (nslab.iisc.ernet.in/clap/). It has been previously shown that CLAP runs ~7 times faster than other protein sequence comparison methods that employ alignment of sequences. In Chapter 2, CLAP-based classification has been carried out on two test datasets of proteins containing (i) the tyrosine phosphatase domain family and (ii) the SH3-domain family. The former dataset comprises both single and multi-domain proteins that sometimes contain repeats of the tyrosine phosphatase domain. The latter dataset consists only of multi-domain proteins with one copy of the SH3-domain. At the domain level, the CLAP-based classification scheme resulted in a clustering similar to that obtained from an alignment-based method, ClustalW. CLAP-based clusters obtained for full-length datasets were shown to comprise proteins with similar functions and domain architectures. Hence, a protein classification scheme that is independent of domain definitions and requires only the full-length amino acid sequences as input is shown to work efficiently.
Chapter 3 explores the limitations of CLAP in large-scale protein sequence comparisons. The potential advantages of full-length protein sequence classification, combined with the availability of the alignment-free sequence comparison tool, CLAP, motivated the conceptualization of full-length sequence classification of the entire protein repertoire. 
Before undertaking this mammoth task, the working of CLAP was tested for a large dataset of 239,461 protein sequences. Chapter 3 discusses the technical details of computation, storage and retrieval of CLAP scores for a large dataset in a feasible timeframe. CLAP scores were examined for protein pairs of the same domain architecture and ~22% of these showed CLAP similarity scores of 0. This led to investigation of the sensitivity of CLAP with respect to sequence divergence. Several test datasets of proteins belonging to the same SCOP fold were constructed and CLAP-based classification of these proteins was examined at inter- and intra-SCOP family level. CLAP was successful in efficiently clustering evolutionarily related proteins (defined as proteins within the same SCOP superfamily) if their sequence identity is >35%. At lower sequence identities, CLAP fails to recognize any evolutionary relatedness. Another test dataset consisting of two-domain proteins with domain order swapped was constructed. Domain order swap refers to domain architectures of type AB and BA, consisting of domains A and B. A condition that the sequence identities of homologous domains were greater than 35% was imposed. CLAP could effectively cluster together proteins of the same domain architectures in this case. Thus, the sequence identity threshold of 35% at the domain level improves the accuracy of CLAP. The analysis also showed that for highly divergent sequences, the expectation of a 5-residue pattern match was likely a stringent criterion. Thus, a modification of the 5-residue identical pattern match criterion, by considering even similar residues and gaps within matched patterns, may be required to effectuate CLAP-based clustering of remotely related protein sequences. Thus, this study highlights the limitations of CLAP with respect to large-scale analysis and its sensitivity to sequence divergence.
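The idea of comparing 5-residue windows all-against-all, described above, can be illustrated with a minimal sketch. This is not the CLAP implementation and does not reproduce its Local Matching Score: the function names and the normalisation (shared identical windows divided by the smaller window set) are assumptions chosen only to show why identical-window matching needs no alignment or domain boundaries, and why it returns 0 for divergent sequences.

```python
def residue_windows(sequence, k=5):
    """Return the set of all overlapping k-residue windows in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_window_score(seq_a, seq_b, k=5):
    """Fraction of identical k-residue windows shared by two sequences.

    Hypothetical score for illustration only; the real CLAP LMS is
    defined differently and is not specified in the abstract.
    """
    windows_a = residue_windows(seq_a, k)
    windows_b = residue_windows(seq_b, k)
    if not windows_a or not windows_b:
        return 0.0  # sequences shorter than k have no windows to compare
    return len(windows_a & windows_b) / min(len(windows_a), len(windows_b))


# Toy "domains" A and B joined in both orders (AB and BA architectures).
domain_a = "ACDEFGH"
domain_b = "KLMNPQR"
score_swapped = shared_window_score(domain_a + domain_b, domain_b + domain_a)
score_unrelated = shared_window_score("ACDEFGHIK", "LMNPQRSTV")
```

In this toy case the swapped architectures still share every window that lies within a domain and lose only the windows that span the junction, whereas two sequences with no identical 5-residue stretch score exactly 0, mirroring the loss of sensitivity at low sequence identity noted in the abstract.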
Chapters 4 and 5 discuss the computational analysis of inter-domain interactions with respect to sequential distance and domain order. Knowledge of domain composition and 3-D structures of individual domains in a multi-domain protein may not be sufficient to predict the tertiary structure of the multi-domain protein. Substantial information about the nature of domain-domain interfaces helps in prediction of the tertiary as well as the quaternary structure of a protein. Therefore, Chapter 4 explores the possible relationship between the sequential distance separating two domains in a multi-domain protein and the extent of their interaction. With increasing sequential separation between any two domains, the extent of inter-domain interactions showed a gradual decrease. The trend was more apparent when sequential separation between domains is measured in terms of number of intervening domains. Irrespective of the linker length, extensive interactions were seen more often between contiguous domains than between non-contiguous domains. Contiguous domains show a broader interface area and lower proportion of non-interacting domains (interface area: 0 Å² to 4400 Å², 2.3% non-interacting domains) than non-contiguous domains (interface area: 0 Å² to 2000 Å², 34.7% non-interacting domains). Additionally, as inter-protein interactions are mediated through constituent domains, rules of protein-protein interactions were applied to domain-domain interactions. Tight binding between domains is denoted as putative permanent domain-domain interactions and domains that may dissociate and associate with relatively weak interactions to regulate functional activity are denoted as putative transient domain-domain interactions. An interface area threshold of 600 Å² was utilized as a binary classifier to distinguish between putative permanent and putative transient domain-domain interactions. 
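A minimal sketch of the interface-area classifier described above follows. The 600 Å² threshold is from the abstract; the direction of the cutoff (larger interfaces labelled putative permanent), the treatment of an area exactly at the threshold, and the separate label for a zero interface area are assumptions made for illustration.

```python
PERMANENT_THRESHOLD_A2 = 600.0  # interface-area cutoff in square angstroms


def classify_domain_interface(interface_area):
    """Label a domain pair by its interface area (in square angstroms).

    Assumption: areas at or above the threshold count as putative
    permanent; the abstract does not state how the boundary is handled.
    """
    if interface_area < 0:
        raise ValueError("interface area cannot be negative")
    if interface_area == 0:
        return "non-interacting"
    if interface_area >= PERMANENT_THRESHOLD_A2:
        return "putative permanent"
    return "putative transient"


labels = [classify_domain_interface(a) for a in (0.0, 250.0, 600.0, 4400.0)]
```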
Therefore, the state of interaction of a domain pair is defined as either putative permanent or putative transient interaction. Contiguous domains showed a predominance of putative permanent nature of inter-domain interface, whereas non-contiguous domains showed a prevalence of putative transient interfaces. The state of interaction of various SCOP superfamily pairs was studied across different proteins in the dataset. SCOP superfamily pairs mostly showed a conserved state of interaction, i.e. either putative permanent or putative transient in all their occurrences across different proteins. Thus, it is noted that contiguous domains interact extensively more often than non-contiguous domains and specific superfamily pairs tend to interact in a conserved manner. In conclusion, a combination of interface area and other inter-domain properties along with experimental validation will help strengthen the binary classification scheme of putative permanent and transient domain-domain interactions.
Chapter 5 provides structural analysis of domain pairs occurring in different sequential domain orders in multi-domain proteins. The function and regulation of a multi-domain protein is predominantly determined by the domain-domain interactions. These in turn are influenced by the sequential order of domains in a protein. With domains defined using evolutionary and structural relatedness (SCOP superfamily), their conservation of structure and function was studied across domain order reversal. A domain order reversal indicates different sequential orders of the concerned domains, which may be identified in proteins of same or different domain compositions. Domain order reversals of domains A and B can be indicated in a protein pair consisting of the domain architectures xAxBx and xBxAx, where x indicates 0 or more domains. A total of 161 pairs of domain order reversals were identified in 77 pairs of PDB entries. 
For most of the comparisons between proteins with different domain composition and architecture, large differences in the relative spatial orientation of domains were observed. Although preservation of state of interaction was observed for ~75% of the comparisons, none of the inter-domain interfaces of domains in different order displayed high interface similarity. These domain order reversals in multi-domain proteins are contributed by a limited number of 15 SCOP superfamilies. Majority of the superfamilies undergoing order reversal either function as transporters or regulatory domains and very few are enzymes. A higher proportion of domain order reversals were observed in domains separated by 0 or 1 domains than those separated by more than 1 domain. A thorough analysis of various structural features of domains undergoing order reversal indicates that only one order of domains is strongly preferred over all possible orders. This may be due to either evolutionary selection of one of the orders and its conservation throughout generations, or the fact that domain order reversals rarely conserve the interface between the domains. Further studies (Chapters 6 to 8) utilize the available computational techniques for structural and functional annotation of proteins encoded in a few bacterial genomes. Based on these annotations, proteome-wide structure and function comparisons were performed between two sets of pathogenic and non-pathogenic bacteria. The first study compares the pathogenic Mycobacterium tuberculosis to the closely related organism Mycobacterium smegmatis which is non-pathogenic. The second study primarily identified biologically feasible host-pathogen interactions between the human host and the pathogen Leptospira interrogans and also compared leptospiral-host interactions of the pathogenic Leptospira interrogans and of the saprophytic Leptospira biflexa with the human host. 
Chapter 6 describes the function and structure annotation of proteins encoded in the genome of M. smegmatis MC2-155. M. smegmatis is a widely used model organism for understanding the pathophysiology of M. tuberculosis, the primary causative agent of tuberculosis in humans. M. smegmatis and M. tuberculosis share several features, such as a similar cell-wall architecture, the ability to oxidise carbon monoxide aerobically and a large number of homologues. These features render M. smegmatis particularly useful in identifying critical cellular pathways of M. tuberculosis to inhibit its growth in the human host. In spite of the similarities between M. smegmatis and M. tuberculosis, there are stark differences between the two due to their different niches and lifestyles. While there are innumerable studies reporting the structure, function and interaction properties of M. tuberculosis proteins, there is a lack of high quality annotation of M. smegmatis proteins. This makes the understanding of the biology of M. smegmatis extremely important for investigating its competence as a good model organism for M. tuberculosis. With the implementation of available sequence and structural profile-based search procedures, functional and structural characterization could be achieved for ~92% of the M. smegmatis proteome. Structural and functional domain definitions were obtained for a total of 5695 of 6717 proteins in M. smegmatis. Residue coverage >70% was achieved for 4567 proteins, which constitute ~68% of the proteome. Domain-unassigned regions of more than 30 residues were assessed for their potential to be associated with a domain. Of the 1022 proteins with no recognizable domains, putative structural and functional information was inferred for 328 proteins by the use of distant relationship detection and fold recognition methods. Although 916 sequences of 1022 proteins with no recognizable domains were found to be specific to M. 
smegmatis species, 98 of these are specific to its MC2-155 strain. Of the 1828 M. smegmatis proteins classified as conserved hypothetical proteins, 1038 proteins were successfully characterized. A total of 33 Domains of Unknown Function (DUFs) occurring in M. smegmatis could be associated with structural domains. A high representation of the tetR and GntR families of transcription regulators was noted in the functional repertoire of the M. smegmatis proteome. As M. smegmatis is a soil-dwelling bacterium, transcriptional regulators are crucial in helping it to adapt to and survive environmental stress. Similarly, the ABC transporter and MFS domain families are highly represented in the M. smegmatis proteome. These are important in enabling the bacterium to take up carbohydrates from diverse environmental sources. A lower number of virulence proteins was identified in M. smegmatis, which is consistent with its non-pathogenicity. Thus, a detailed functional and structural annotation of the M. smegmatis proteome was achieved in Chapter 6. Chapter 7 delineates the similarities and differences in the structure and function of proteins encoded in the genomes of the pathogenic M. tuberculosis and the non-pathogenic M. smegmatis. The protocol employed in Chapter 6 to achieve the proteome-wide structure and function annotation of M. smegmatis was also applied to the M. tuberculosis proteome in Chapter 7. The number of proteins encoded by the genome of M. smegmatis strain MC2-155 (6717 proteins) is higher than that encoded by M. tuberculosis strain H37Rv (4018 proteins). A total of 2720 high-confidence orthologues sharing ≥30% sequence identity were identified in M. tuberculosis with respect to M. smegmatis. Based on the orthologue information, specific functional clusters, essential proteins, metabolic pathways, transporters and toxin-antitoxin systems of M. tuberculosis were inspected for conservation in M. smegmatis. 
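As an illustration of the orthologue filtering summarised above, the following minimal Python sketch keeps reciprocal best hits that meet the ≥30% sequence identity cutoff. The abstract does not state how orthologues were detected, so the reciprocal-best-hit criterion, the tuple layout of the hit tables, the protein identifiers and the function names are all assumptions made for illustration; only the 30% identity threshold comes from the text.

```python
# Hypothetical sketch: reciprocal best hits filtered by percent identity.
# Only the 30% cutoff is taken from the abstract; everything else is assumed.

IDENTITY_CUTOFF = 30.0  # percent sequence identity


def best_hit_per_query(hits):
    """Map each query to its highest-scoring hit.

    `hits` is an iterable of (query, subject, percent_identity, score) tuples,
    an assumed layout resembling tabular similarity-search output.
    """
    best = {}
    for query, subject, identity, score in hits:
        if query not in best or score > best[query][2]:
            best[query] = (subject, identity, score)
    return best


def reciprocal_best_hits(a_vs_b, b_vs_a, min_identity=IDENTITY_CUTOFF):
    """Return sorted (a, b) pairs that are mutual best hits with identity >= cutoff."""
    best_ab = best_hit_per_query(a_vs_b)
    best_ba = best_hit_per_query(b_vs_a)
    pairs = []
    for a, (b, identity, _score) in best_ab.items():
        back = best_ba.get(b)
        if back is not None and back[0] == a and identity >= min_identity:
            pairs.append((a, b))
    return sorted(pairs)


# Toy hit tables with made-up identifiers (not real locus tags).
mtb_vs_msm = [
    ("mtb_A", "msm_X", 62.0, 300.0),
    ("mtb_A", "msm_Y", 35.0, 120.0),
    ("mtb_B", "msm_Y", 28.0, 90.0),   # reciprocal, but below the identity cutoff
    ("mtb_C", "msm_Z", 45.0, 200.0),  # not reciprocal: msm_Z prefers mtb_D
]
msm_vs_mtb = [
    ("msm_X", "mtb_A", 62.0, 300.0),
    ("msm_Y", "mtb_B", 28.0, 90.0),
    ("msm_Z", "mtb_D", 50.0, 220.0),
]

orthologue_pairs = reciprocal_best_hits(mtb_vs_msm, msm_vs_mtb)
print(orthologue_pairs)
```

In this toy input only one pair survives both conditions. A real analysis would also need to decide how to treat ties, partial alignments and paralogues, none of which the abstract specifies.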
Among the several categories analysed, 53 metabolic pathways, 44 membrane transporter proteins belonging to the secondary transporter and ATP-dependent transporter classes, 73 toxin-antitoxin systems, 23 M. tuberculosis-specific targets, 10 broad-spectrum targets and 34 targets implicated in persistence of M. tuberculosis had no detectable orthologues in M. smegmatis. Several of the MFS superfamily transporters act as drug efflux pumps and are hence associated with drug resistance in M. tuberculosis. The relative abundances of MFS and ABC superfamily transporters are higher in M. smegmatis than in M. tuberculosis. As these transporters are involved in carbohydrate uptake, their higher representation in M. smegmatis than in M. tuberculosis highlights the limited ability of M. tuberculosis to assimilate diverse carbon sources. In the case of porins, MspA-like and OmpA-like porins are selectively present in either M. smegmatis or M. tuberculosis. These differences help to elucidate protein clusters for which M. smegmatis may not be the best model organism to study M. tuberculosis proteins. At the domain level, the ATP-binding domain of ABC transporters, the tetracycline transcriptional regulator (tetR) domain family, the major facilitator superfamily (MFS) domain family, the AMP-binding domain family and the enoyl-CoA hydrolase domain family are highly represented in both the M. smegmatis and M. tuberculosis proteomes. These domains play an essential role in the carbohydrate uptake systems and drug-efflux pumps, among other diverse functions in mycobacteria. Several domain families are differentially represented in M. tuberculosis and M. smegmatis. For example, the pentapeptide-repeat, PE, PPE and PIN domains, although abundant in M. tuberculosis, are very rare in M. smegmatis. Therefore, such uniquely or differentially represented functional and structural domains in M. tuberculosis as compared to M. smegmatis may be linked to pathogenicity or adaptation of M. 
tuberculosis in the host. Hence, major differences between M. tuberculosis and M. smegmatis were identified, not only in terms of domain populations but also in terms of domain combinations. Thus, Chapter 7 highlights the similarities and differences between the M. smegmatis and M. tuberculosis proteomes in terms of structure and function. These differences provide an understanding of the selective utilization of M. smegmatis as a model organism to study M. tuberculosis. In Chapter 8, computational tools have been employed to predict biologically feasible host-pathogen interactions between the human host and the pathogen Leptospira interrogans. Sensitive profile-based search procedures were used to specifically identify practical drug targets in the genome of Leptospira interrogans, the causative agent of the globally widespread zoonotic disease leptospirosis. Traditionally, the genus Leptospira is classified into two species complexes: the pathogenic L. interrogans and the non-pathogenic saprophyte L. biflexa. The pathogen gains entry into the human host through direct or indirect contact with fluids of infected animals. Several ambiguities exist in the understanding of L. interrogans pathogenesis. An integration of multiple computational approaches, guided by experimentally derived protein-protein interactions, was utilized for recognition of host-pathogen protein-protein interactions. The initial step involved the identification of similarities of host and L. interrogans proteins with crystal structures of experimentally known transient protein-protein complexes. Further, conservation of interfacial nature was used to obtain high-confidence predictions for putative host-pathogen protein-protein interactions. These predictions were subjected to further selection based on subcellular localization of proteins of the human host and L. interrogans, and tissue-specific expression profiles of the host proteins. A total of 49 protein-protein interactions mediated by 24 L. 
interrogans proteins and 17 host proteins were identified, and these may be subjected to further experimental investigations to assess their in vivo relevance. The functional relevance of similarities and differences between the pathogenic and non-pathogenic leptospires in terms of interactions with the host has also been explored. For this, protein-protein interactions between the human host and the non-pathogenic saprophyte L. biflexa were also predicted. Nearly 39 leptospiral-host interactions were recognized to be similar across both the pathogen and the saprophyte in the context of processes that influence the host. The overlapping leptospiral-host interactions of L. interrogans and L. biflexa proteins with the human host proteins are primarily associated with the establishment of entry into the human host. These include adhesion of the leptospiral proteins to host cells, survival in the host environment through processes such as iron acquisition, and binding to components of the extracellular matrix and plasma. The disjoint sets of leptospiral-host interactions are species-specific interactions and, more importantly, are indicative of the establishment of infection by L. interrogans in the human host and of immune clearance of L. biflexa by the human host. With respect to L. interrogans, these specific interactions include interference with the blood coagulation cascade and dissemination to target organs by means of disruption of cell junction assembly. On the other hand, species-specific interactions of L. biflexa proteins include those with components of the host immune system. In spite of the limited availability of experimental evidence, these predictions help in identifying functionally relevant interactions between host and pathogen by integrating multiple lines of evidence. Thus, inferences from computational prediction of host-pathogen interactions act as guidelines for experimental studies investigating the in vivo relevance of these predicted protein-protein interactions. 
This will further help in developing effective measures for treatment and disease prevention. In summary, Chapters 2 and 3 describe the implementation, advantages and limitations of the alignment-free full-length sequence comparison method, CLAP. Chapters 4 and 5 are dedicated to understanding the domain-domain interactions in multi-domain protein sequences and structures. In Chapters 6, 7 and 8, the computational analyses of the mycobacterial and leptospiral species helped in an enhanced understanding of the functional repertoire of these bacteria. These studies were undertaken by utilizing the biological sequence data available in public databases and by implementing powerful homology-detection techniques. The supplemental data associated with the chapters are provided in a compact disc attached to this thesis. Proteins - Building Blocks Protein Sequences Protein Domain Hidden Markov Models (HMM) Multi-domain Proteins Mycobacterium smegmatis MC2-155 Mycobacterium tuberculosis Proteomes Leptospira Interrogans Leptospira Biflexa Proteomes Leptospira Biflexa Genomes Mycobacterium tuberculosis H37Rv Mathematics

