Global ETD Search

1	Microorganismos do solo e de manguezais: fonte de produtos antimicrobianos. / Microorganisms from the soil and from the mangrove swamps: source of antimicrobian products. Firoozmand, Lília Macedo 24 July 2008 The biodiversity of microorganisms found in ecosystems provides excellent prospects for the discovery of pharmacologically active molecules. In this study, 32 actinobacteria isolates collected from soil and 51 fungal isolates from mangrove swamps of the Brazilian coast were evaluated for antifungal, antimycobacterial, leishmanicidal and trypanocidal activity. Organic extracts obtained from the culture supernatants of the isolates were tested; five showed minimum inhibitory concentrations (MICs) of 400 mg/mL or less against pathogenic fungi, and two showed significant activity against the trypomastigote form of Trypanosoma cruzi. The extracts were not effective against the promastigote form of Leishmania amazonensis or against Mycobacterium tuberculosis H37Rv. The results indicate that fungi isolated from mangrove swamps are promising sources for the investigation of new antimicrobial agents.
Leishmania; Mycobacterium tuberculosis H37Rv; Trypanosoma cruzi; Antimicrobial action; Fungi; Microorganisms
3	Bioprospecção para compostos antimicobacterianos / Bioprospecting for antimycobacterial compounds Cardoso, Franciano Dias Pereira 18 September 2017 Tuberculosis is a serious public health problem, with a high incidence rate, emerging multidrug-resistant forms and difficult treatment. New compounds need to be developed, and plants are a source of them. The Cerrado (Brazilian savanna) has great biodiversity and may harbour species with biological activity. The objective of this study was to determine, through laboratory assays, the antimycobacterial activity of ten crude extracts of plants present in this biome: Plathymenia reticulata, Ouratea spectabilis, Galactia glaucescens, Apuleia molaris, Dipteryx alata, Brosimum gaudichaudii, Tabebuia caraíba, Pterodon emarginatus, Terminalia fagifolia and Stachytarpheta sp. The results showed that the hydroalcoholic extracts of O. spectabilis and A. molaris had significant inhibitory concentrations against Mycobacterium tuberculosis and low toxicity towards LLC-MK2 and Vero cells. Fractionation of these two samples revealed a hexane fraction of A. molaris with significant activity against the H37Rv strain, which is characterized as containing promising compounds for future antituberculosis assays.
CNPQ::OUTROS; Apuleia molaris; Cerrado; H37Rv; Ouratea spectabilis; Tuberculosis; Brazilian savanna
4	Inferences on Structure and Function of Proteins from Sequence Data : Development of Methods and Applications Mudgal, Richa January 2015 Structural and functional annotation of sequences of putative proteins encoded in newly sequenced genomes poses an important challenge. While much progress has been made towards high throughput experimental techniques for structure determination and functional assignment to proteins, most of the current genome-wide annotation systems rely on computational methods to derive cues on structure and function based on relationship with related proteins of known structure and/or function. Evolutionary pressure on proteins forces the retention of sequence features that are important for structure and function. Thus, if it can be established that two proteins have descended from a common ancestor, then it can be inferred that the structural fold and biological function of the two proteins would be similar. Homology based information transfer from one protein to another has played a central role in the understanding of evolution of protein structures, functions and interactions. Many algorithmic improvements have been developed over the past two decades to recognize homologues of a protein from sequence-based searches alone, but there are still a large number of proteins without any functional annotation. The sensitivity of the available methods can be further enhanced by indirect comparisons with the help of intermediately-related sequences which link related families. However, sequence-based homology searches in the current protein sequence space are often restricted to the family members, due to the paucity of natural intermediate sequences that can act as linkers in detecting remote homologues. 
Thus a major goal of this thesis is to develop computational methods to fill up the sparse regions in the protein sequence space with computationally designed protein-like sequences and thereby create a continuum of protein sequences, which could aid in detecting remote homologues. Such designed sequences are further assessed for their effectiveness in detection of distant evolutionary relationships and functional annotation of proteins with unknown structure and function. Another important aspect in structural bioinformatics is to gain a good understanding of protein sequence - structure - function paradigm. Functional annotations by comparisons of protein sequences can be further strengthened with the addition of structural information; however, instances of functional divergence and convergence may lead to functional mis-annotations. Therefore, a systematic analysis is performed on the fold–function associations using binding site information and their inter-relationships using binding site similarity networks. Chapter 1 provides a background on proteins, their evolution, classification and structural and functional features. This chapter also describes various methods for detection of remote similarities and the role of protein sequence design methods in detection of distant relatives for protein annotation. Pitfalls in prediction of protein function from sequence and structure are also discussed followed by an outline of the thesis. Chapter 2 addresses the problem of paucity of available protein sequences that can act as linkers between distantly related proteins/families and help in detection of distant evolutionary relationships. Previous efforts in protein sequence design for remote homology detection and design of sequences corresponding to specific protein families are discussed. 
This chapter describes a novel methodology to computationally design intermediately-related protein sequences between two related families and thus fill in the gaps in the sequence space between the related families. Protein families as defined in the SCOP database are represented as position specific scoring matrices (PSSMs), and these profiles of related protein families within a fold are aligned using AlignHUSH, a profile-profile alignment method. Guided by this alignment, the frequency distributions of the amino acids in the two families are combined, and for each aligned position a residue is selected based on its combined probability of occurring at that alignment position in the two families. Each computationally designed sequence is then subjected to RPS-BLAST searches against an all-profile pool representing all protein families. Artificial sequences that detect both the parent profiles with no hits corresponding to other folds qualify as ‘designed intermediate sequences’. Various scoring schemes and divergence levels for the design of protein-like sequences are investigated such that these designed sequences intersperse between two related families, thereby creating a continuum in sequence space. The method is then applied on a large scale to all folds with two or more families, resulting in the design of 3,611,010 intermediately-related sequences for 27,882 profile-profile alignments corresponding to 374 folds. Such designed sequences are generic in nature and can be added to any sequence database of natural protein sequences. Such enriched databases can then be queried using any sequence-based remote homology detection method to detect distant relatives. The next chapter (Chapter 3) explores the ability of these designed intermediate sequences to act as linkers of two related families and aid in detection of remote homologues. 
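The residue-selection step described above can be illustrated with a minimal sketch. It assumes each family profile is given as a list of per-position amino-acid frequency dictionaries and that the profile-profile alignment is given as a list of index pairs; the function names, the equal 50/50 weighting and the toy input are assumptions made for illustration, not the thesis's implementation. The AlignHUSH alignment and the RPS-BLAST filtering step are not reproduced here.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def combine_columns(col_a, col_b, weight_a=0.5):
    """Mix two per-position amino-acid frequency dicts into one distribution.

    weight_a is an assumed mixing weight; 0.5 treats both families equally.
    """
    mixed = {
        aa: weight_a * col_a.get(aa, 0.0) + (1.0 - weight_a) * col_b.get(aa, 0.0)
        for aa in AMINO_ACIDS
    }
    total = sum(mixed.values())
    if total == 0:
        raise ValueError("aligned column has no residue frequencies")
    # Renormalise so the combined column is a proper probability distribution.
    return {aa: p / total for aa, p in mixed.items()}


def design_sequence(profile_a, profile_b, aligned_pairs, rng, weight_a=0.5):
    """Sample one residue per aligned position from the combined distribution."""
    residues = []
    for i, j in aligned_pairs:
        dist = combine_columns(profile_a[i], profile_b[j], weight_a)
        aas = list(dist)
        weights = [dist[aa] for aa in aas]
        residues.append(rng.choices(aas, weights=weights, k=1)[0])
    return "".join(residues)


# Toy usage with two hypothetical two-column profiles.
family_a = [{"A": 1.0}, {"L": 0.5, "V": 0.5}]
family_b = [{"A": 1.0}, {"I": 1.0}]
print(design_sequence(family_a, family_b, [(0, 0), (1, 1)], random.Random(0)))
```

Changing `weight_a` is one simple way to place a designed sequence closer to one family or the other; the scoring schemes and divergence levels actually investigated in the thesis are not specified in the abstract.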
To assess the applicability of these designed sequences, two types of databases have been generated, namely a CONTROL database containing protein sequences from natural sequence databases and an AUGMENTED database in which designed sequences are included in the database of natural sequences. Detailed assessments of the utility of such designed sequences using traditional sequence-based searches in the AUGMENTED database showed an enhanced detection of remote homologues for almost 74% of the folds. For over 3,000 queries, it is demonstrated that designed sequences are positioned as suitable linkers, which mediate connections between distantly related proteins. Using examples from known distant evolutionary relationships, we demonstrate that homology searches in augmented databases show an increase of up to 22% in the number of correct evolutionary relationships "discovered". Such connections are reported with high sensitivities and very low false positive rates. Interestingly, they fill-in void and sparse regions in sequence space and relate distant proteins not only through multiple routes but also through SCOP-NrichD database, SUPFAM+ database, SUPERFAMILY database, protein domain library queried by pDomTHREADER and HHsearch against HMM library of SCOP families. This approach detected evolutionary relationships for almost 20% of all the families with no known structure or function. A detailed report of predictions for 614 DUFs, with their fold and species distribution, is provided in this chapter. These predictions are then enriched with GO terms and enzyme information wherever available. A detailed discussion is provided for a few of the interesting assignments: DUF1636, DUF1572 and DUF2092, which are functionally annotated as thioredoxin-like 2Fe-2S ferredoxin, putative metalloenzyme and lipoprotein localization factors respectively. 
These 614 novel structure-function relationships, of which 193 are supported by consensus between at least two of the five methods, can be accessed from http://proline.biochem.iisc.ernet.in/RHD_DUFS/. Protein functions can be appreciated better in the light of evolutionary information from their structures. Chapter 6 describes a database of evolutionary relationships identified between Pfam families. The grouping of Pfam families is important for a better understanding of evolutionary relationships and for obtaining clues to the functions of proteins in families of yet unknown function. Many structural genomics initiative projects have made considerable efforts in solving structures and bridging the growing gap between protein sequences and their structures. The results of such experiments suggest that a structure newly solved using X-ray crystallography or NMR methods often has structural similarity to a protein of already known structure. These relationships often remain undetected due to unavailability of structural information. Therefore, the SUPFAM+ database aims to detect such distant relationships between Pfam families by mapping the Pfam families to SCOP domain families. The work presented in this chapter describes the generation of the SUPFAM+ database using the sensitive AlignHUSH method to uncover hidden relationships. Firstly, Pfam families are queried against a profile database of SCOP families to derive Pfam-SCOP associations, and then Pfam families are queried against the Pfam database to derive Pfam-Pfam relationships. Pfam families that remain without a mapping to a SCOP family are mapped indirectly to a SCOP family by identifying relationships between such Pfam families and other Pfam families that are already mapped to a SCOP family. The criteria are kept stringent for these mappings to minimize the rate of false positives. 
Where a Pfam family maps to two or more SCOP superfamilies, a decision tree is implemented to assign the Pfam family to a single SCOP superfamily. Using these direct and indirect evolutionary relationships present in the SCOP database, associations between Pfam families are derived. Therefore, a relationship between two Pfam families that do not have significant sequence similarity can be identified if both are related to the same SCOP superfamily. Almost 36% of the Pfam families could be mapped to SCOP families through direct or indirect association. These Pfam-SCOP associations are grouped into 1,646 different superfamilies and cataloguing changes that occur in the binding sites between two functions, which are analysed in this study to trace possible routes between different functions in evolutionarily related enzymes. The main conclusions of the entire thesis are summarized in Chapter 8, which contribute to the area of remote homology detection from sequence information alone and to understanding the ‘sequence-structure-function’ paradigm from a binding site perspective. The chapter illustrates the importance of the work presented here in the post-genomic era. It highlights the development of the algorithm for the design of ‘intermediately-related sequences’ that could serve as effective linkers in remote homology detection, its subsequent large scale assessment, and the fact that such sequences can be added to any protein sequence database and explored by any sequence-based search method. Databases in the NrichD resource are made available in the public domain along with a portal to design artificial sequences for or between protein families. This thesis also provides useful and meaningful predictions for protein families with yet unknown structure and function using the NrichD database as well as four other state-of-the-art sequence-based remote homology detection methods. 
A different aspect addressed in this thesis provides a fundamental understanding of the relationships between protein structure and functions. Evolutionary relationships between functional families are identified using the inherent structural information for these families and fold-function relationships are studied from a perspective of similarities in their binding sites. Such studies help in the area of functional annotation, polypharmacology and protein engineering.
Protein Structure Analysis; Protein Sequences; Protein Structures; Computational Protein Design; Protein Sequence Space; NrichD Database; H37Rv Proteome; Protein Sequence Design; Mathematics
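The direct and indirect Pfam-to-SCOP mapping described in the abstract above (Chapter 6) can be sketched as follows. This is a simplified illustration, not the SUPFAM+ pipeline: the function names and family identifiers are hypothetical, the inputs are assumed to be already-filtered significant hits, and a family linked to several superfamilies simply keeps the first one encountered, whereas the thesis resolves such cases with a decision tree.

```python
def map_pfam_to_scop(direct_hits, pfam_pfam_links):
    """Extend direct Pfam -> SCOP superfamily hits by one indirect hop.

    direct_hits: dict mapping a Pfam family ID to a SCOP superfamily ID.
    pfam_pfam_links: iterable of (pfam_x, pfam_y) pairs found to be related.
    """
    mapping = dict(direct_hits)
    for x, y in pfam_pfam_links:
        for src, dst in ((x, y), (y, x)):
            # Only directly mapped families may lend their superfamily.
            if src in direct_hits and dst not in mapping:
                mapping[dst] = direct_hits[src]
    return mapping


def related_pfam_pairs(mapping):
    """Pair up Pfam families that share the same SCOP superfamily."""
    groups = {}
    for pfam, superfamily in mapping.items():
        groups.setdefault(superfamily, []).append(pfam)
    pairs = set()
    for members in groups.values():
        members = sorted(members)
        for i, a in enumerate(members):
            for b in members[i + 1:]:
                pairs.add((a, b))
    return pairs


# Hypothetical identifiers, for illustration only.
direct = {"PF_A": "sf1", "PF_B": "sf2"}
links = [("PF_C", "PF_A"), ("PF_D", "PF_C")]
mapping = map_pfam_to_scop(direct, links)
print(mapping)
print(related_pfam_pairs(mapping))
```

Restricting the indirect step to a single hop through a directly mapped family mirrors the abstract's wording and keeps weak links from chaining; whether the actual database applies exactly this restriction is not stated.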
5	Nucleic Acid-binding Adenylyl Cyclases in Mycobacteria : Studies on Evolutionary & Biochemical Aspects Zaveri, Anisha January 2016 Mycobacterium tuberculosis is one of the most successful human pathogens, estimated to have infected close to one-third of the global human population. In order to survive within its host, M. tuberculosis utilises multiple signalling strategies, one of them being synthesis and secretion of the universal second messenger cAMP. This process is enabled by the presence of sixteen predicted adenylyl cyclases in the genome of M. tuberculosis H37Rv, ten of which have been characterised in vitro. The synthesized cAMP is recognised by ten putative cAMP-binding proteins in which the cyclic AMP-binding domain is associated with a variety of enzymatic domains. The cAMP signal can be extinguished by degradation by phosphodiesterases, secretion into the extracellular milieu or sequestration of the nucleotide through upregulation of a high-affinity cAMP-binding protein. Of the sixteen adenylyl cyclases (ACs) encoded by M. tuberculosis H37Rv, a subset of multidomain adenylyl cyclases remains poorly characterised, primarily due to challenges associated with studying these in vitro. The adenylyl cyclase domain in these proteins is associated with an NB-ARC domain (nucleotide binding domain common to APAF-1, plant R proteins and CED-4), a TPR domain (tetratricopeptide repeat) and a LuxR-type HTH motif (helix-turn-helix). This architecture places these multidomain mycobacterial ACs within a larger group of STAND (Signal transduction ATPases with numerous domains) proteins, and hence they will be referred to as STAND ACs. The STAND proteins are a recently recognised class of multidomain ATPases which integrate a variety of signals prior to activation. Activation is accompanied by formation of large oligomeric signalling hubs which facilitate downstream signalling events. 
While most STAND proteins have a single effector domain followed by an NB-ARC domain and a scaffolding domain, the STAND ACs distinguish themselves by retaining two effector domains, the AC domain and the HTH domain, at the N- and C-termini respectively. The cyclase, NB-ARC, TPR and HTH domains have widely divergent taxonomic distributions, making the presence of these four domains in a single polypeptide rare. In fact, proteins with cyclase-NB-ARC-TPR-HTH (C-A-T-H) domain organisation were found to be encoded almost exclusively by slow growing mycobacterial species, a clade that harbours most mycobacterial pathogens, such as M. tuberculosis and M. leprae. Notably, one of the STAND ACs, Rv0386, is the only mycobacterial AC shown to date to be required for virulence of M. tuberculosis in mice. Using phylogenetics, the evolutionary underpinnings of this domain architecture were examined. The STAND ACs appear most likely to have evolved via a domain gain event from a cyclase-ATPase-TPR progenitor encoded by a strain ancestral to M. marinum. Subsequently, the genes duplicated and diverged, sometimes leading to frameshift mutations splitting the cyclase domain from the C-terminal domains. Consequently, M. tuberculosis encodes three ‘full-length’ STAND ACs, namely, Rv0386, Rv1358 and Rv2488c, and one split STAND AC. The split STAND AC is made up of Rv0891c, containing the AC domain, and Rv0890c, containing the NB-ARC, TPR and HTH domains. rv0891c and rv0890c were found to be expressed as an operonic transcript, though they were translationally uncoupled. Pertinently, M. canettii, an early-branching species of the M. tuberculosis complex, contains an orthologue of Rv0891c and Rv0890c where all four domains are present in a single polypeptide. Sequence analysis of the four STAND ACs in M. tuberculosis allowed predictions of significant divergence in function. 
These proteins showed high sequence conservation in their HTH domains, with substantial sequence divergence in their TPR, NB-ARC and AC domains. Biochemical analysis of the AC domains revealed that Rv0891c and Rv2488c possessed poor or no AC activity, respectively. On the other hand, the cyclase domain of Rv0386 could catalyse cAMP synthesis. Moreover, for both Rv0891c and Rv0386, presence of the C-terminal domains potentiated adenylyl cyclase activity, suggestive of allosteric regulation within the STAND AC module. Studies on Rv0891c also revealed that the protein could inhibit the adenylyl cyclase activity of Rv0386 in trans. This result thus provided a novel mechanism by which proteins harbouring poorly active/inactive adenylyl cyclase domains could contribute to cAMP levels, by acting as inhibitors of other adenylyl cyclases. The STAND ACs were found to be inactive ATPases. Additionally, incubation with nucleotides did not stimulate oligomerisation of these proteins, unlike what has been shown for several other STAND proteins. However, mutations in the NB-ARC domain perturbed the basal oligomeric state of these proteins, indicating that the NB-ARC domain can influence self-association. A subset of NB-ARC domain mutants also showed increased adenylyl cyclase activity, reiterating the inter-domain cross-talk in the STAND ACs. Since the AC activity of these proteins was meagre, the properties of the HTH domain were examined, as an alternative effector domain. Genomic SELEX was performed using the TPR-HTH domains of Rv0890c, and revealed a set of sequences that bound to this protein, though they lacked common sequence features. Further analysis revealed that Rv0890c bound to DNA in a sequence-independent manner, through the HTH domain. This binding was cooperative, with multiple protein units engaging in DNA-binding. Due to the cooperative nature of binding and the lack of sequence preference, Rv0890c appeared to coat the DNA molecule. 
This was further demonstrated by the ability of Rv0890c to protect DNA from DNaseI-mediated degradation, and the requirement for long DNA sequences to form stable DNA-protein complexes. Studies also revealed that Rv0890c interacted with RNA and ssDNA. In fact, the protein as purified from heterologously expressing E. coli cells was bound to RNA. RNA-binding by a LuxR-type HTH has not been reported previously, providing a new function for this class of HTHs. Interestingly, nucleic acid-binding by a fusion Rv0891c-Rv0890c protein, similar to the one encoded in M. canettii, was shown to stimulate adenylyl cyclase activity. This was likely due to a relief of inhibitory interactions between the TPR-HTH and the AC domains on DNA-binding. Given the high sequence similarity between the HTH domains of the STAND ACs, they were expected to bind to DNA in an identical manner. Indeed, the HTH domains of Rv0386 and Rv1358 engaged with DNA with an affinity identical to that of Rv0890c. Sequence comparisons in the HTH domain enabled identification of conserved basic residues, of which one, R850, was essential for nucleic acid-binding. Surprisingly, however, Rv0386 and Rv1358 did not exhibit RNA-binding, pointing towards functional divergence of Rv0890c from its paralogues. Since the HTH domains of the STAND ACs were highly conserved, it was possible that the ability to bind to RNA was instead dictated by the adjacent TPR modules. To examine this possibility, TPR domains were swapped between Rv0890c and Rv0386. Interestingly, both the chimeric proteins showed a reduced ability to bind to DNA, while showing a complete absence of RNA-binding. These results suggested that the TPR domains were critical in modulating nucleic acid-binding. Moreover, the effect of the TPR domain was context-dependent, since the presence of non-cognate TPR domains hampered nucleic acid-binding. 
However, the ability to bind to RNA was not solely governed by the TPR domain, since the Rv0890cTPR-Rv0386HTH chimeric protein did not show RNA-binding in spite of containing a permissive TPR domain. To further dissect the molecular requirements for RNA-binding, the conservation of basic residues between the HTH domains of Rv0890c versus Rv1358 and Rv0386 was examined. Interestingly, the HTH domain of Rv0890c contained two additional positively charged residues compared with Rv1358 and Rv0386. Mutation of these residues abolished RNA-binding by Rv0890c. Thus the evolution of two basic residues permits Rv0890c to diverge in its nucleic acid-binding properties, a possible example of defunctionalisation following gene duplication. In summary, this thesis attempts to understand the evolution and functions of the STAND ACs, a group of pathogenically relevant and uniquely mycobacterial multidomain proteins. Phylogenetic analysis revealed an expansion of this gene family in slow-growing mycobacteria. Biochemical characterisation showed that, following gene duplication, the resulting proteins diverge both in their ability to synthesize cAMP and in their association with nucleic acids. Studies on these proteins also revealed novel mechanisms of regulation of mycobacterial cAMP levels. Additionally, these proteins exhibited indiscriminate binding to DNA and other nucleic acids, indicating that they may be responsible for global functions in the cell which extend beyond cAMP synthesis.
Adenylyl Cyclases Mycobacteria Cyclic AMP in Mycobacteria STAND Proteins TPR Domain HTH Domain Nucleoid Associated Proteins Mycobacterium tuberculosis M. tuberculosis H37Rv Biochemistry
6	Recognition of Structures, Functions and Interactions of Proteins of Pathogens : Implications in Drug Discovery Ramkrishnan, Gayatri January 2016 (has links) (PDF) Significant advancements in genome sequencing techniques and other high-throughput initiatives have resulted in the availability of complete sequences of genomes of a large number of organisms, which provide an opportunity to study detailed biological information encoded therein. Identification of functional roles of proteins can aid in comprehension of various cellular activities in an organism, which is traditionally achieved using techniques pertaining to the field of molecular biology, protein chemistry and macromolecular crystallography. The established experimental methods for protein structure and function determination, although accurate and resourceful, are laborious and time consuming. Computational analyses of sequences of gene products and exploration of evolutionary relationships can give clues on protein structure and/or function with reasonable accuracy which can be used to direct experimental studies on proteins of interest, effectively. Moreover, with growing volumes of data, there has been a growing disparity in the number of well-characterized and uncharacterized proteins, further necessitating the use of computational methods for investigating evolutionary and structure-function relationships. The remarkable progress made in the development of computational techniques (Chapter 1) has immensely contributed to the state-of-the-art biological sequence analysis and recognition of protein structure and function in a reliable manner. These methods have largely influenced the exploration of protein sequence-structure-function space. One of the relevant applications of computational approaches is in the understanding of functional make-up of human pathogens, their complex interplay with the host and implications in pathogenesis. 
In this thesis, sensitive profile-based search procedures have been utilized to address various aspects in the context of three pathogens, Mycobacterium tuberculosis, Plasmodium falciparum and Trypanosoma brucei, which are causative agents of potentially life-threatening diseases. The existing drugs approved for these diseases, although of immense value in controlling them, have several shortcomings, the most important being the emergence of drug resistance that renders the current treatment regimens futile. Thus, the identification of practicable targets and new drugs or new combination therapies becomes an important necessity. Analyses of the structural and functional repertoire of proteins encoded in the pathogenic genomes can provide means for rational identification of therapeutic intervention strategies. This thesis begins with the computational analyses of proteins encoded in the M. tuberculosis genome. M. tuberculosis is a primary aetiological agent of tuberculosis in humans, and is responsible for an estimated 1.5 million deaths every year. The complete genome of the pathogen was sequenced and made available more than a decade ago, which has been valuable in determination of functional roles of its gene products. Yet, functions of many M. tuberculosis proteins remain unknown. Computational prediction of protein function is an ongoing process based on ever-growing information made available in public databases as well as the introduction of powerful homology recognition techniques. Hence, a continuous refinement is essential to make the most of the sequence data, ensuring its accuracy and relevance. With the use of multiple sequence and structural profile-based search procedures, an enhanced structural and functional characterization of M. tuberculosis proteins, totalling 95% of the genome, was achieved (Chapter 2). Following are the key findings.
o Domain definitions were obtained for a total of 3566 of 4018 proteins. Amino acid residue coverage of >70% was achieved for 2295 proteins, which constitute more than half of the proteome.
o Domain assignments were newly identified for 244 proteins with domain-unassigned regions. Structure prediction for these proteins corroborated all the remote homology relationships recognized using profile-based methods, enhancing the reliability of the predictions.
o Comparison of domain compositions of proteins between M. tuberculosis and the human host revealed the presence of pathogen-specific domains that are not homologous to proteins in human. Such proteins in M. tuberculosis are mainly virulence factors involved in host-pathogen interactions, such as immune-dominance and aiding entry and survival in human host macrophages, hence forming attractive targets for drug discovery.
o Putative structural and functional information for proteins with no recognizable domains was inferred by means of fold recognition and an iterative profile-based search against a sequence database.
o Attributing putative structures and functions to 955 conserved hypothetical proteins in M. tuberculosis, 137 of which are reportedly essential to the pathogen, provides a basis to re-investigate their involvement in pathogenesis and survival in the host. Proteins with no detectable homologues were recognized as M. tuberculosis H37Rv-specific, which can serve as promising drug targets.
An attempt was made to identify porin-like proteins in M. tuberculosis, considering MspA porin from M. smegmatis as a template. The difficulty in recognition of putative porins in M. tuberculosis is indicative of novel outer membrane channel proteins, not characterized yet, or high representation of ion-channels, symporters and transporters to compensate for the functional role of porins. In addition, MspA-like proteins were not readily recognized in other slow-growing mycobacterial pathogens that are known to infect human host, apart from M. tuberculosis. 
This indicates probable acquisition of physiological adaptations, i.e. absence of porins, to confer drug resistance in the course of their co-evolution with human hosts. Evolutionary relationships recognized between sequence (Pfam) and structural (SCOP) families aided in association of potential structures and/or functions for 55 uncharacterized Pfam domains recognized in M. tuberculosis. Such associations deliver useful insights into the structure and function of a protein housing the uncharacterized domain. The functional inferences drawn for M. tuberculosis proteins based on the predictions can provide a valuable basis for experimental endeavours in understanding mechanisms of pathogenesis and can significantly impact anti-tubercular drug discovery programmes. An interesting outcome of the exercise of exploring relationships between Pfam and SCOP families was the identification of an evolutionary relationship between a Pfam domain of unknown function, DUF2652, and class III nucleotidyl cyclases. A detailed investigation was undertaken to assess this relationship (Chapter 3). Nucleotidyl cyclases synthesize cyclic nucleotides, which are critical second messengers in signalling pathways. The DUF2652 family predominantly comprises bacterial proteins belonging to three lineages: Actinobacteria, Bacteroidetes and Proteobacteria. Thus, recognition of an evolutionary relationship between these bacterial proteins and nucleotide cyclases is of particular interest due to the indispensability of cyclic nucleotides in regulation of varied biological activities in bacteria. Use of a fold recognition program suggested the presence of the nucleotide cyclase-characteristic topological motif (βααββαβ) in all the members of the DUF2652 family. 
Detailed analyses of structural and functional features of the uncharacterized set of bacterial proteins corresponding to 50 bacterial genomes, using profile-based alignments, revealed the presence of key features typical of nucleotidyl cyclases, including metal-binding aspartates, substrate-specifying residues and transition-state stabilizing residues. Depending on the features, 20 proteins of the Actinobacteria lineage, predominantly mycobacteria, of unknown structure and function were identified as putative nucleotide cyclases, 23 proteins of the Bacteroidetes lineage were associated with guanylyl cyclases, while 8 uncharacterized proteins of Proteobacteria were recognized as nucleotide cyclase-like proteins (7 adenylyl and one guanylyl cyclase). Sequence similarity-based clustering of the predicted nucleotide cyclase-like proteins with established nucleotide cyclases indicated the apparent evolutionary distinctness of the predicted subfamily of class III nucleotidyl cyclases. Furthermore, analysis of evolutionarily conserved gene clusters of the predicted nucleotide cyclase-like proteins indicated functional associations that support the predictions on their participation in cellular signalling events. The inferences made can be experimentally investigated further to ascertain the involvement of the uncharacterized bacterial proteins in signalling pathways, which can help in understanding the pathobiology of pathogenic species of interest. The next objective was the recognition of biologically relevant protein-protein interactions across M. tuberculosis and human host (Chapter 4). M. tuberculosis is well known for its ability to successfully co-evolve with human host in terms of establishing infection, survival and persistence. The current knowledge on the mechanisms of host invasion, immune evasion and persistence in the host environment can be attributed, and is limited, to the experimental studies pursued by numerous groups. 
Chapter 4 presents an approach for computational identification of biologically feasible protein-protein interactions across M. tuberculosis and human host. The approach utilizes crystal structures of intra-organism protein-protein complexes which are transient in nature. Identification of homologues of host and pathogen proteins in the database of known protein-protein interactions formed the initial step, followed by identification of conserved interfacial patch and integration of information on tissue-specific expression of human proteins and subcellular localization of human and M. tuberculosis proteins. In addition, appropriate filters were used to extract biologically feasible host-pathogen protein-protein interactions. This resulted in recognition of 386 interactions potentially mediated by 59 M. tuberculosis proteins and 90 human proteins. A predominance of host-pathogen interactions (193 protein-protein interactions) brought about by M. tuberculosis proteins participating in cell wall processes was observed, which is in concurrence with the experimental studies on immuno-modulatory activities brought about by such proteins. This set of mycobacterial proteins was predicted to interact with a diverse set of host proteins, such as those involved in ubiquitin conjugation pathways, metabolic pathways, signalling pathways, regulation of cell proliferation, transport, apoptosis and autophagy. The predictions have the potential to complement experimental observations at the molecular level. Details on a couple of interesting cases are presented in the chapter, one of which is the probable mechanism of immune evasion adopted by M. tuberculosis to inhibit lysozyme activity in macrophages, and the second is the mechanism of nutrient uptake from the host. The set of M. 
tuberculosis proteins predicted to mediate interactions with host proteins have the potential to warrant an experimental follow-up on probable mechanisms of pathogenesis and also serve as attractive targets for chemotherapeutic interventions. One of the major shortcomings of the current treatment regimen for tuberculosis is the emergence of multidrug-resistant (MDR) and extensively drug-resistant (XDR) strains that render first-line and second-line drug treatments futile. This entails a need to explore target space in M. tuberculosis as well as explore the potential of existing drugs for repurposing against tuberculosis. A drug repurposing strategy, i.e. exploring within-target-family selectivity of small molecules, has been implemented (Chapter 5) to contribute towards time- and cost-saving anti-tubercular drug development efforts. 
With the use of profile-based search procedures, evolutionary relationships between targets (other than proteins of M. tuberculosis) of FDA-approved drugs and M. tuberculosis proteins were investigated. A key filter to exclude drugs capable of acting on human proteins substantially reduced the chances of obtaining anti-targets. Thus, a total of 130 FDA-approved drugs were recognized that can be repurposed against 78 M. tuberculosis proteins, belonging to the functional categories intermediary metabolism and respiration, information pathways, cell wall and cell processes, and lipid metabolism. The catalogue of structure and function of M. tuberculosis proteins and their involvement in host-pathogen protein-protein interactions compiled from Chapters 2 and 4 served as a guiding tool to explore the functional importance of the targets identified. Many of the potential targets identified have earlier been experimentally shown to be essential for growth and survival of the pathogen, thus gaining importance in terms of pharmaceutical relevance. Polypharmacological drugs, or drugs capable of acting on multiple targets, were also identified (92 drugs) in the study. These drugs have the potential to withstand the development of drug resistance in the pathogen. Comparative sequence and structure-based analysis of M. tuberculosis proteins homologous to known targets yielded credible inferences on putative binding sites of FDA-approved drugs in potential targets. In instances where information on binding sites could not be readily inferred from known targets, potentially druggable sites have been predicted. Comparison with earlier experimental studies that report anti-tubercular potential of several approved drugs enhanced the credibility of 74 of 130 FDA-approved drugs that can be readily prioritized for clinical studies. 
An additional exercise was pursued to identify prospective anti-tubercular agents by means of structural comparison between ChEMBL compounds and the 130 FDA-approved drugs. Only those compounds were retained that showed considerably high structural similarity with approved drugs. Such compounds with minor changes in terms of physicochemical properties provide a basis for exploration of compounds that may exhibit higher affinities to bind to M. tuberculosis targets. The set of approved drugs recognized as repurpose-able candidates against tuberculosis, in concert with the structurally similar compounds, can significantly impact anti-tubercular drug development and drug discovery. The next part of the thesis focuses on Plasmodium falciparum, an obligate intracellular protozoan parasite responsible for malaria. The parasite genome features unusual characteristics, including abundance of low complexity regions and pronounced sequence divergence, that render protein structure and function recognition difficult. The parasite also manifests remarkable plasticity in its metabolic organization throughout its developmental stages in two hosts, human and mosquito; thus obtaining an exhaustive list of metabolic proteins in the parasite gains importance. Considering the utility of multiple sensitive profile-based search approaches in enhanced annotation of the M. tuberculosis genome, a similar exercise was employed to recognize potential metabolic proteins in P. falciparum (Chapter 6). A total of 172 metabolic proteins were identified as participants of 78 metabolic pathways, over and above 609 proteins known to participate in P. falciparum metabolism. Pathway holes, where evidence on a metabolic step exists but the catalysing enzyme is not known, have also been addressed in the study, several of which have been suggested to play an important role in growth and development of the parasite during its intra-erythrocytic stages in human host. A subsequent objective was the recognition of P. falciparum proteins potentially capable of remodelling erythrocytes to suit their niche (Chapter 7). Exploitative mechanisms are brought about by the parasite to remodel erythrocytes for growth and survival during intra-erythrocytic stages of its life-cycle, the understanding of which is limited to experimental studies. To achieve physicochemically viable protein-protein interactions potentially mediated by proteins of human erythrocytes and P. falciparum proteins, a structure-influenced protocol, similar to the one demonstrated in Chapter 4, was employed. Information on subcellular localization and protein expression is crucial especially for parasites like P. falciparum, which reside in heterogeneous environmental conditions at different stages in their lifecycle. Inclusion of such data aided in extraction of 208 biologically relevant protein-protein interactions potentially mediated by 59 P. falciparum proteins and 30 erythrocyte proteins. Host-parasite protein-protein interactions were predicted pertaining to several major strategies spanning intra-erythrocytic stages in P. 
falciparum pathogenesis, including gaining entry into the host erythrocytes (category: RBC invasion, protease), redirecting parasitic proteins to the erythrocyte membrane (category: protein traffic), modulating erythrocyte machinery (category: rosette formation, putative adhesin, chaperone, kinase), evading immunity (category: immune evasion) and eventually egress (category: merozoite egress) to infect other uninfected erythrocytes. Elaborate means to analyse and evaluate the functional viability of a predicted interaction in terms of geometrical packing at the interfacial region, electrostatic complementarity of the interacting surfaces and interaction energies are also demonstrated. The protein-protein interactions thus predicted between human erythrocytes and P. falciparum have the potential to provide a useful basis in understanding probable mechanisms of pathogenesis, and indeed in pinning down attractive targets for antimalarial drug discovery. The emergence of drug resistance against all known antimalarial agents currently in use necessitates discovery and development of either new antimalarial agents or unexplored combinations of drugs that may not only reduce mortality and morbidity of malaria, but also reduce the risk of resistance to antimalarial drugs. In an attempt to contribute towards the same, Chapter 8 explores the established concept of within-target-family selectivity of small molecules to recognize antimalarial potential of the approved drugs. Eighty-six FDA-approved drugs, predominantly antibacterial agents, were identified as feasible candidates for repurposing against 90 P. falciparum proteins. Most of the potential parasite targets identified are known to participate in housekeeping machinery, protein biosynthesis, metabolic pathways and cell growth and differentiation, and thus are pharmaceutically relevant. During intra-erythrocytic growth of P. 
falciparum, the parasite resides within the erythrocyte, within a protective encasing, known as parasitophorous vacuole. Hence a drug, intended to target a parasite protein residing in an organelle, must be sufficiently hydrophilic or hydrophobic to be able to permeate cell membranes and reach its site of activity. On the basis of lipophilicity of the drugs, a physical property determined experimentally, 57 of 86 FDA-approved drugs were recognized as feasible candidates for use against P. falciparum during the course of blood-stages of infection, which can be prioritized for antimalarial drug development programmes. The final section of the thesis focuses on the protozoan parasite Trypanosoma brucei, a causative agent of African sleeping sickness (Chapter 9). This disease is endemic to sub-Saharan regions of Africa. Despite the availability of completely sequenced genome of T. brucei, structure and function for about 50% of the proteins encoded in the genome remain unknown. Absence of prophylactic chemotherapy and vaccine, compounded with emergence of drug-resistance renders anti-trypanosomal drug discovery challenging. Thus, considering the utility of frameworks established in earlier chapters for recognition of protein structure, function and drug-targets, similar steps were undertaken to understand functional repertoire of the parasite and use drug repurposing methods to accelerate anti-trypanosomal drug discovery efforts. Structures and functions were reliably recognized for 70% of the gene products (5894) encoded in T. brucei genome, with the use of multiple profile-based search procedures, coupled with information on presence of transmembrane domains and signal peptide cleavage sites. Consequently, a total of 282 uncharacterized T. brucei proteins could be newly coined as potential metabolic proteins. 
Integration of information on stage-specific expression profiles with Trypanosoma-specific and T. brucei-specific proteins identified in the study aided in pinning down potential attractive targets. Additionally, through exploration of evolutionary relationships between targets of FDA-approved drugs and T. brucei proteins, 68 FDA-approved drugs were predicted as repurpose-able candidates against 42 potential T. brucei targets, which primarily include proteins involved in regulatory processes and metabolism. Several targets predicted are reportedly essential in assisting the parasite to switch between differentiation forms (bloodstream and procyclic) in the course of its lifecycle. These targets are of high therapeutic relevance, hence the corresponding drug-target associations provide a useful resource for experimental endeavours. In summary, this thesis presents computational analyses on three pathogenic genomes in terms of enhancing the understanding of functional repertoire of the pathogens, addressing metabolic pathway holes, exploring probable mechanisms of pathogenesis brought about by potential host-pathogen protein-protein interactions, and identifying feasible FDA-approved drug candidates to repurpose against the pathogens. The studies are pursued primarily by taking advantage of powerful homology-detection techniques and the ever-growing biological information made available in public databases. Indeed, the inferences drawn for the three pathogenic genomes serve as an excellent resource for an experimental follow-up. The set of protocols presented in the thesis is highly generic in nature, as demonstrated for three pathogens, and can be utilized for genome-wide analyses on many other pathogens of interest. The supplemental data associated with the chapters is provided in a compact disc attached with this thesis. 
Gene Regulation - Prokaryotes Prokaryotic Promoters Gene Expression Protein Homology Detection Scoring Matrices Hidden Markov Model Mycobacterium tuberculosis H37Rv Plasmodium falciparum - Drug Targets Homology Detection Nucleotide Cyclase-like Proteins Antimalarial Drugs Mathematics
7	Computational Studies on Structures and Functions of Single and Multi-domain Proteins Mehrotra, Prachi January 2017 (has links) (PDF) Proteins are essential for the growth, survival and maintenance of the cell. Understanding the functional roles of proteins helps to decipher the working of macromolecular assemblies and cellular machinery of living organisms. A thorough investigation of the link between sequence, structure and function of proteins helps in building a comprehensive understanding of complex biological systems. Proteins have been observed to be composed of single and multiple domains. Analysis of proteins encoded in diverse genomes shows the ubiquitous nature of multi-domain proteins. Though the majority of eukaryotic proteins are multi-domain in nature, 3-D structures of only a small proportion of multi-domain proteins are known due to difficulties in crystallizing such proteins. While functions of individual domains are generally extensively studied, the complex interplay of functions of domains is not well understood for most multi-domain proteins. Paucity of structural and functional data affects our understanding of the evolution of structure and function of multi-domain proteins. The broad objective of this thesis is to achieve an enhanced understanding of structure and function of protein domains by computational analysis of sequence and structural data. Special attention is paid in the first few chapters of this thesis to multi-domain proteins. Classification of multi-domain proteins by implementation of an alignment-free sequence comparison method has been achieved in Chapters 2 and 3. Studies on organization, interactions and interdependence of domain-domain interactions in multi-domain proteins with respect to sequential separation between domains and N- to C-terminal domain order have been described in Chapters 4 and 5. 
The functional and structural repertoire of organisms can be comprehensively studied and compared using functional and structural domain annotations. Chapters 6, 7 and 8 present proteome-wide structure and function comparisons of various pathogenic and non-pathogenic microorganisms. These comparisons help in identifying proteins implicated in virulence of the pathogen and thus predict putative targets for disease treatment and prevention. Chapter 1 forms an introduction to the main subject area of this thesis. Starting with a description of protein structure and function, details of the four levels of hierarchical organization of protein structure have been provided, along with the databases that document protein sequences and structures. Classification of protein domains, considered as the realm of function, structure and evolution, has been described. The usefulness of classification of proteins at the domain level has been highlighted in terms of providing an enhanced understanding of protein structure and function and also their evolutionary relatedness. The details of structure, function and evolution of multi-domain proteins have also been outlined in Chapter 1. Chapter 2 aims to achieve a biologically meaningful classification scheme for multi-domain protein sequences. The overall function of a multi-domain protein is determined by the functional and structural interplay of its constituent domains. Traditional sequence-based methods utilize only the domain-level information to classify proteins. This does not take into account the contributions of accessory domains and linker regions towards the overall function of a multi-domain protein. An alignment-free protein sequence comparison tool, CLAP (CLAssification of Proteins), previously developed in this laboratory, was assessed and improved when the author joined the group. 
CLAP was developed especially to handle multi-domain protein sequences without a requirement of defining domain boundaries and sequential order of domains (domain architecture). The working principle of CLAP involves comparison of all against all windows of 5-residue sequence patterns between two protein sequences. The sequences compared could be full-length, comprising all the domains in the two proteins. This compilation of comparisons is represented as the Local Matching Scores (LMS) between protein sequences (nslab.iisc.ernet.in/clap/). It has been previously shown that CLAP is ~7 times faster than other protein sequence comparison methods that employ alignment of sequences. In Chapter 2, CLAP-based classification has been carried out on two test datasets of proteins containing (i) the tyrosine phosphatase domain family and (ii) the SH3-domain family. The former dataset comprises both single and multi-domain proteins that sometimes consist of domain repeats of the tyrosine phosphatase domain. The latter dataset consists only of multi-domain proteins with one copy of the SH3-domain. At the domain level, the CLAP-based classification scheme resulted in a clustering similar to that obtained from an alignment-based method, ClustalW. CLAP-based clusters obtained for full-length datasets were shown to comprise proteins with similar functions and domain architectures. Hence, a protein classification scheme that is independent of domain definitions and requires only the full-length amino acid sequences as input is shown to work efficiently. Chapter 3 explores the limitations of CLAP in large-scale protein sequence comparisons. The potential advantages of full-length protein sequence classification, combined with the availability of the alignment-free sequence comparison tool, CLAP, motivated the conceptualization of full-length sequence classification of the entire protein repertoire. 
Before undertaking this mammoth task, the working of CLAP was tested on a large dataset of 239,461 protein sequences. Chapter 3 discusses the technical details of computation, storage and retrieval of CLAP scores for a large dataset in a feasible timeframe. CLAP scores were examined for protein pairs of the same domain architecture, and ~22% of these showed CLAP similarity scores of 0. This led to an investigation of the sensitivity of CLAP with respect to sequence divergence. Several test datasets of proteins belonging to the same SCOP fold were constructed, and CLAP-based classification of these proteins was examined at the inter- and intra-SCOP family level. CLAP efficiently clustered evolutionarily related proteins (defined as proteins within the same SCOP superfamily) if their sequence identity was >35%. At lower sequence identities, CLAP fails to recognize any evolutionary relatedness. Another test dataset consisting of two-domain proteins with the domain order swapped was constructed. Domain order swap refers to domain architectures of type AB and BA, consisting of domains A and B. A condition that the sequence identities of homologous domains be greater than 35% was imposed. CLAP could effectively cluster together proteins of the same domain architectures in this case. Thus, the sequence identity threshold of 35% at the domain level improves the accuracy of CLAP. The analysis also showed that for highly divergent sequences, the expectation of a 5-residue pattern match was likely too stringent a criterion. A modification of the 5-residue identical pattern match criterion, by considering similar residues and gaps within matched patterns, may therefore be required to enable CLAP-based clustering of remotely related protein sequences. Overall, this study highlights the limitations of CLAP with respect to large-scale analysis and its sensitivity to sequence divergence.
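
The 5-residue window comparison described for CLAP can be illustrated with a small sketch. This is not the CLAP implementation and does not reproduce its Local Matching Score; the function names `windows` and `window_match_score`, the normalization by the shorter sequence, and the toy sequences are all assumptions made for illustration only.

```python
def windows(seq: str, k: int = 5) -> list[str]:
    """Return every overlapping k-residue window of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def window_match_score(seq_a: str, seq_b: str, k: int = 5) -> float:
    """Fraction of windows of the shorter sequence found identically in the other.

    Hypothetical stand-in for a local matching score; the real CLAP scoring
    is not described in enough detail here to reproduce it.
    """
    wins_a, wins_b = windows(seq_a, k), windows(seq_b, k)
    if not wins_a or not wins_b:
        return 0.0  # a sequence shorter than k has no windows to compare
    shorter, longer = sorted((wins_a, wins_b), key=len)
    longer_set = set(longer)  # set lookup replaces the explicit all-against-all loop
    hits = sum(1 for w in shorter if w in longer_set)
    return hits / len(shorter)


# Toy "domains" (made-up residues) joined in the orders AB and BA.
domain_a = "MKTAYIAKQR"
domain_b = "GSHMLEDPVA"
ab, ba = domain_a + domain_b, domain_b + domain_a

# Same toy sequence with every fourth residue replaced by W.
divergent = "WKTAWIAKWRGSWMLEWPVA"

print(window_match_score(ab, ba))
print(window_match_score(ab, divergent))
```

In this toy case the AB and BA arrangements still share every window that lies within a domain, so no domain boundaries or domain order need to be supplied. The substituted sequence shares no identical 5-residue window with the original, which is one way to see why an identical-match criterion can be stringent for divergent sequences.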
Chapters 4 and 5 discuss the computational analysis of inter-domain interactions with respect to sequential distance and domain order. Knowledge of the domain composition and 3-D structures of individual domains in a multi-domain protein may not be sufficient to predict the tertiary structure of the multi-domain protein. Substantial information about the nature of domain-domain interfaces helps in the prediction of the tertiary as well as the quaternary structure of a protein. Therefore, Chapter 4 explores the possible relationship between the sequential distance separating two domains in a multi-domain protein and the extent of their interaction. With increasing sequential separation between any two domains, the extent of inter-domain interactions showed a gradual decrease. The trend was more apparent when the sequential separation between domains was measured in terms of the number of intervening domains. Irrespective of the linker length, extensive interactions were seen more often between contiguous domains than between non-contiguous domains. Contiguous domains show a broader range of interface areas and a lower proportion of non-interacting domains (interface area: 0 Å² to 4400 Å²; 2.3% non-interacting domains) than non-contiguous domains (interface area: 0 Å² to 2000 Å²; 34.7% non-interacting domains). Additionally, as inter-protein interactions are mediated through constituent domains, rules of protein-protein interactions were applied to domain-domain interactions. Tight binding between domains is denoted as a putative permanent domain-domain interaction, and domains that may dissociate and associate with relatively weak interactions to regulate functional activity are denoted as having putative transient domain-domain interactions. An interface area threshold of 600 Å² was utilized as a binary classifier to distinguish between putative permanent and putative transient domain-domain interactions.
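
The interface-area threshold can be sketched as a simple classifier. The 600 Å² cutoff comes from the text; everything else is an assumption for illustration: the `DomainPair` type, the function name, the treatment of an area exactly at the cutoff as permanent, the reading that larger interfaces are the putative permanent ones, and the separate label for pairs with no interface.

```python
from dataclasses import dataclass

# Interface-area cutoff in Å², as stated in the abstract.
PERMANENT_THRESHOLD_A2 = 600.0


@dataclass(frozen=True)
class DomainPair:
    """Hypothetical record for one domain pair within a protein structure."""
    domain_a: str
    domain_b: str
    interface_area: float      # buried interface area in Å²
    intervening_domains: int   # 0 means the two domains are contiguous


def interaction_state(pair: DomainPair) -> str:
    """Label a domain pair by interface area.

    Assumption: areas at or above the cutoff count as putative permanent;
    the abstract does not say how a value of exactly 600 Å² is handled.
    """
    if pair.interface_area <= 0.0:
        return "non-interacting"
    if pair.interface_area >= PERMANENT_THRESHOLD_A2:
        return "putative permanent"
    return "putative transient"


# Made-up example values, not data from the thesis.
contiguous = DomainPair("A", "B", interface_area=1250.0, intervening_domains=0)
distant = DomainPair("A", "C", interface_area=310.0, intervening_domains=2)
print(interaction_state(contiguous), "|", interaction_state(distant))
```

A single-feature cutoff like this is easy to apply across a whole dataset, although in practice the classification would likely be combined with other inter-domain properties and experimental validation.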
Therefore, the state of interaction of a domain pair is defined as either a putative permanent or a putative transient interaction. Contiguous domains showed a predominance of putative permanent inter-domain interfaces, whereas non-contiguous domains showed a prevalence of putative transient interfaces. The state of interaction of various SCOP superfamily pairs was studied across different proteins in the dataset. SCOP superfamily pairs mostly showed a conserved state of interaction, i.e. either putative permanent or putative transient in all their occurrences across different proteins. Thus, contiguous domains interact extensively more often than non-contiguous domains, and specific superfamily pairs tend to interact in a conserved manner. In conclusion, a combination of interface area and other inter-domain properties, along with experimental validation, will help strengthen the binary classification scheme of putative permanent and transient domain-domain interactions.

Chapter 5 provides a structural analysis of domain pairs occurring in different sequential domain orders in multi-domain proteins. The function and regulation of a multi-domain protein are predominantly determined by its domain-domain interactions. These in turn are influenced by the sequential order of domains in the protein. With domains defined using evolutionary and structural relatedness (SCOP superfamily), their conservation of structure and function was studied across domain order reversal. A domain order reversal indicates different sequential orders of the concerned domains, which may be identified in proteins of the same or different domain compositions. Domain order reversals of domains A and B can be identified in a protein pair with the domain architectures xAxBx and xBxAx, where x indicates 0 or more domains. A total of 161 pairs of domain order reversals were identified in 77 pairs of PDB entries.
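
The xAxBx / xBxAx definition of a domain order reversal can be sketched as follows. This is a minimal illustration under assumptions, not the procedure used in the thesis: architectures are represented as N-to-C ordered lists of domain labels, the function names are hypothetical, and the single-letter labels stand in for SCOP superfamily identifiers.

```python
def ordered_pairs(architecture: list[str]) -> set[tuple[str, str]]:
    """All (first, second) pairs of distinct domains with first upstream of second."""
    return {
        (a, b)
        for i, a in enumerate(architecture)
        for b in architecture[i + 1:]
        if a != b
    }


def order_reversals(arch_1: list[str], arch_2: list[str]) -> list[tuple[str, str]]:
    """Domain pairs seen as xAxBx in one architecture and xBxAx in the other.

    Each reversed pair is reported once, with its two labels sorted.
    """
    pairs_1, pairs_2 = ordered_pairs(arch_1), ordered_pairs(arch_2)
    reversed_pairs = {tuple(sorted(p)) for p in pairs_1 if (p[1], p[0]) in pairs_2}
    return sorted(reversed_pairs)


# Placeholder labels; real input would be superfamily assignments per protein.
protein_1 = ["A", "B", "C"]
protein_2 = ["C", "A"]
print(order_reversals(protein_1, protein_2))
```

Because any number of domains may sit between or around A and B, the two proteins do not need the same domain composition for a reversal to be detected, which matches the definition given above.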
For most of the comparisons between proteins with different domain composition and architecture, large differences in the relative spatial orientation of domains were observed. Although preservation of the state of interaction was observed for ~75% of the comparisons, none of the inter-domain interfaces of domains in different orders displayed high interface similarity. These domain order reversals in multi-domain proteins are contributed by only 15 SCOP superfamilies. The majority of the superfamilies undergoing order reversal function either as transporters or as regulatory domains, and very few are enzymes. A higher proportion of domain order reversals was observed for domains separated by 0 or 1 domains than for those separated by more than 1 domain. A thorough analysis of various structural features of domains undergoing order reversal indicates that one order of domains is strongly preferred over all other possible orders. This may be due either to evolutionary selection of one of the orders and its conservation throughout generations, or to the fact that domain order reversals rarely conserve the interface between the domains. Further studies (Chapters 6 to 8) utilize the available computational techniques for structural and functional annotation of proteins encoded in a few bacterial genomes. Based on these annotations, proteome-wide structure and function comparisons were performed between two sets of pathogenic and non-pathogenic bacteria. The first study compares the pathogenic Mycobacterium tuberculosis with the closely related, non-pathogenic Mycobacterium smegmatis. The second study primarily identified biologically feasible host-pathogen interactions between the human host and the pathogen Leptospira interrogans, and also compared the interactions of the pathogenic Leptospira interrogans and of the saprophytic Leptospira biflexa with the human host.
Chapter 6 describes the function and structure annotation of proteins encoded in the genome of M. smegmatis MC2-155. M. smegmatis is a widely used model organism for understanding the pathophysiology of M. tuberculosis, the primary causative agent of tuberculosis in humans. M. smegmatis and M. tuberculosis, both species of the mycobacterial genus, share several features, such as a similar cell-wall architecture, the ability to oxidise carbon monoxide aerobically and a large number of homologues. These features render M. smegmatis particularly useful in identifying critical cellular pathways of M. tuberculosis to inhibit its growth in the human host. In spite of the similarities between M. smegmatis and M. tuberculosis, there are stark differences between the two owing to their different niches and lifestyles. While numerous studies report the structure, function and interaction properties of M. tuberculosis proteins, there is a lack of high-quality annotation of M. smegmatis proteins. This makes understanding the biology of M. smegmatis extremely important for investigating its suitability as a model organism for M. tuberculosis. With the implementation of available sequence and structural profile-based search procedures, functional and structural characterization could be achieved for ~92% of the M. smegmatis proteome. Structural and functional domain definitions were obtained for a total of 5695 of 6717 proteins in M. smegmatis. Residue coverage >70% was achieved for 4567 proteins, which constitute ~68% of the proteome. Domain-unassigned regions longer than 30 residues were assessed for their potential to be associated with a domain. Of the 1022 proteins with no recognizable domains, putative structural and functional information was inferred for 328 by the use of distant relationship detection and fold recognition methods. Although 916 of the 1022 proteins with no recognizable domains were found to be specific to M.
smegmatis species, 98 of these are specific to its MC2-155 strain. Of the 1828 M. smegmatis proteins classified as conserved hypothetical proteins, 1038 were successfully characterized. A total of 33 Domains of Unknown Function (DUFs) occurring in M. smegmatis could be associated with structural domains. A high representation of the tetR and GntR families of transcription regulators was noted in the functional repertoire of the M. smegmatis proteome. As M. smegmatis is a soil-dwelling bacterium, transcriptional regulators are crucial in helping it adapt to and survive environmental stress. Similarly, the ABC transporter and MFS domain families are highly represented in the M. smegmatis proteome. These are important in enabling the bacterium to take up carbohydrates from diverse environmental sources. Fewer virulence-associated proteins were identified in M. smegmatis, which is consistent with its non-pathogenicity. Thus, a detailed functional and structural annotation of the M. smegmatis proteome was achieved in Chapter 6. Chapter 7 delineates the similarities and differences in the structure and function of proteins encoded in the genomes of the pathogenic M. tuberculosis and the non-pathogenic M. smegmatis. The protocol employed in Chapter 6 to achieve the proteome-wide structure and function annotation of M. smegmatis was also applied to the M. tuberculosis proteome in Chapter 7. The number of proteins encoded by the genome of M. smegmatis strain MC2-155 (6717 proteins) is higher than that of M. tuberculosis strain H37Rv (4018 proteins). A total of 2720 high-confidence orthologues sharing ≥30% sequence identity were identified in M. tuberculosis with respect to M. smegmatis. Based on the orthologue information, specific functional clusters, essential proteins, metabolic pathways, transporters and toxin-antitoxin systems of M. tuberculosis were inspected for conservation in M. smegmatis.
Among the several categories analysed, 53 metabolic pathways, 44 membrane transporter proteins belonging to the secondary transporter and ATP-dependent transporter classes, 73 toxin-antitoxin systems, 23 M. tuberculosis-specific targets, 10 broad-spectrum targets and 34 targets implicated in the persistence of M. tuberculosis had no detectable orthologues in M. smegmatis. Several of the MFS superfamily transporters act as drug efflux pumps and are hence associated with drug resistance in M. tuberculosis. The relative abundances of MFS and ABC superfamily transporters are higher in M. smegmatis than in M. tuberculosis. As these transporters are involved in carbohydrate uptake, their higher representation in M. smegmatis than in M. tuberculosis highlights the limited proficiency of M. tuberculosis in assimilating diverse carbon sources. In the case of porins, MspA-like and OmpA-like porins are selectively present in either M. smegmatis or M. tuberculosis. These differences help to elucidate protein clusters for which M. smegmatis may not be the best model organism to study M. tuberculosis proteins.

At the domain level, the ATP-binding domain of ABC transporters, the tetracycline transcriptional regulator (tetR) domain family, the major facilitator superfamily (MFS) domain family, the AMP-binding domain family and the enoyl-CoA hydrolase domain family are highly represented in both the M. smegmatis and M. tuberculosis proteomes. These domains play an essential role in the carbohydrate uptake systems and drug-efflux pumps, among other diverse functions in mycobacteria. There are several differentially represented domain families in M. tuberculosis and M. smegmatis. For example, the pentapeptide-repeat, PE, PPE and PIN domains, although abundantly present in M. tuberculosis, are very rare in M. smegmatis. Therefore, such uniquely or differentially represented functional and structural domains in M. tuberculosis as compared to M. smegmatis may be linked to pathogenicity or adaptation of M.
tuberculosis in the host. Hence, major differences between M. tuberculosis and M. smegmatis were identified, not only in terms of domain populations but also in terms of domain combinations. Thus, Chapter 7 highlights the similarities and differences between the M. smegmatis and M. tuberculosis proteomes in terms of structure and function. These differences provide an understanding of the selective utilization of M. smegmatis as a model organism to study M. tuberculosis.

In Chapter 8, computational tools have been employed to predict biologically feasible host-pathogen interactions between the human host and the pathogen Leptospira interrogans. Sensitive profile-based search procedures were used to specifically identify practical drug targets in the genome of Leptospira interrogans, the causative agent of the globally widespread zoonotic disease leptospirosis. Traditionally, the genus Leptospira is classified into two species complexes: the pathogenic L. interrogans and the non-pathogenic saprophyte L. biflexa. The pathogen gains entry into the human host through direct or indirect contact with fluids of infected animals. Several ambiguities exist in the understanding of L. interrogans pathogenesis. An integration of multiple computational approaches, guided by experimentally derived protein-protein interactions, was utilized for the recognition of host-pathogen protein-protein interactions. The initial step involved the identification of similarities of host and L. interrogans proteins with crystal structures of experimentally known transient protein-protein complexes. Further, conservation of interfacial nature was used to obtain high-confidence predictions for putative host-pathogen protein-protein interactions. These predictions were subjected to further selection based on the subcellular localization of proteins of the human host and L. interrogans, and on tissue-specific expression profiles of the host proteins. A total of 49 protein-protein interactions mediated by 24 L.
interrogans proteins and 17 host proteins were identified, and these may be subjected to further experimental investigations to assess their in vivo relevance. The functional relevance of similarities and differences between the pathogenic and non-pathogenic leptospires in terms of interactions with the host has also been explored. For this, protein-protein interactions between the human host and the non-pathogenic saprophyte L. biflexa were also predicted. Nearly 39 leptospiral-host interactions were recognized to be similar across both the pathogen and the saprophyte in the context of processes that influence the host. The overlapping leptospiral-host interactions of L. interrogans and L. biflexa proteins with the human host proteins are primarily associated with the establishment of leptospiral entry into the human host. These include adhesion of the leptospiral proteins to host cells, survival in the host environment (such as iron acquisition) and binding to components of the extracellular matrix and plasma. The disjoint sets of leptospiral-host interactions are species-specific interactions and, more importantly, are indicative of the establishment of infection by L. interrogans in the human host and of the immune clearance of L. biflexa by the human host. With respect to L. interrogans, these specific interactions include interference with the blood coagulation cascade and dissemination to target organs by means of disruption of cell junction assembly. On the other hand, species-specific interactions of L. biflexa proteins include those with components of the host immune system.

In spite of the limited availability of experimental evidence, these predictions help in identifying functionally relevant interactions between host and pathogen by integrating multiple lines of evidence. Thus, inferences from the computational prediction of host-pathogen interactions act as guidelines for experimental studies investigating the in vivo relevance of these predicted protein-protein interactions.
This will further help in developing effective measures for treatment and disease prevention. In summary, Chapters 2 and 3 describe the implementation, advantages and limitations of the alignment-free full-length sequence comparison method, CLAP. Chapters 4 and 5 are dedicated to understanding the domain-domain interactions in multi-domain protein sequences and structures. In Chapters 6, 7 and 8, the computational analyses of the mycobacterial and leptospiral species led to an enhanced understanding of the functional repertoire of these bacteria. These studies were undertaken by utilizing the biological sequence data available in public databases and by implementing powerful homology-detection techniques. The supplemental data associated with the chapters are provided on a compact disc attached to this thesis.
Proteins - Building Blocks Protein Sequences Protein Domain Hidden Markov Models (HMM) Multi-domain Proteins Mycobacterium smegmatis MC2-155 Mycobacterium tuberculosis Proteomes Leptospira Interrogans Leptospira Biflexa Proteomes Leptospira Biflexa Genomes Mycobacterium tuberculosis H37Rv Mathematics

