About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
521

Treatment efficacy of artesunate-amodiaquine and prevalence of Plasmodium falciparum drug resistance markers in Zanzibar, 2002-2017

SOE, AUNG PAING January 2019
Introduction: The emergence of resistance to artemisinin-based combination therapy (ACT) is a major threat to the control of Plasmodium falciparum malaria. Regular therapeutic efficacy studies are essential, and genotyping of molecular markers is useful for mapping the development and spread of resistance. Aims: The study aimed to assess the efficacy of artesunate-amodiaquine (ASAQ) and the prevalence of molecular markers of drug resistance in Zanzibar in 2017. Methods: The treatment efficacy observed in a clinical trial conducted in 2017 was compared with efficacies in 2002 and 2005. A total of 142 samples were genotyped for single nucleotide polymorphisms (SNPs) in the P. falciparum chloroquine resistance transporter (pfcrt) gene, the P. falciparum multidrug resistance 1 (pfmdr1) gene, and the P. falciparum Kelch 13 (PfK13) propeller region. The prevalence of SNPs was assessed over the period 2002-2017. Results: The cure rate was 100% in 2017, compared with 94% and 96% in 2002-2003 and 2005, respectively. Day 3 fever clearance rates were high in all studies: 93% (2002-3), 99% (2005) and 98% (2017). The prevalence of pfcrt 76T, of pfmdr1 86Y, 184Y and 1246Y, and of the pfmdr1 YYY haplotype (86Y, 184Y and 1246Y) decreased significantly between 2002-3 and 2017 (p < 0.001). No SNP in the PfK13 gene related to artemisinin resistance was identified. Conclusion: The efficacy of ASAQ remains high after fourteen years as first-line treatment despite its wide-scale use, and there is no evidence of selection of resistance markers in Zanzibar. Continued monitoring of drug efficacy and resistance markers is recommended. / This master's thesis is a collaborative project between Institutionen för kvinnors och barns hälsa (Department of Women's and Children's Health), Uppsala University, and the Anders Björkman group, Department of Microbiology, Tumor and Cell Biology (MTC), C1, Karolinska Institutet. Laboratory examinations were mainly conducted at the MTC building, Karolinska Institutet.
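The abstract reports a significant fall in marker prevalence between 2002-3 and 2017 (p < 0.001) without naming the statistical test. One common way to compare a marker's prevalence between two survey periods is Fisher's exact test on a 2x2 table; the Python sketch below assumes that choice, and the function name and table layout are illustrative rather than taken from the thesis.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical): rows are survey periods (e.g. 2002-3 vs 2017),
    columns are samples with vs without a given marker (e.g. pfcrt 76T).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x marker-positive samples in row 1,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))
```

For the classic table [[3, 1], [1, 3]], `fisher_exact_two_sided(3, 1, 1, 3)` returns 34/70 ≈ 0.486. In practice a library routine such as SciPy's `fisher_exact` would normally be used; the hand-written version only makes the computation explicit.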
522

Study of epidemiological models: stability, observation and parameter estimation

Bichara, Derdei 28 February 2013
The aim of this thesis is, on the one hand, to study the stability of the equilibria of certain epidemiological models and, on the other, to construct an observer for estimating the unmeasured states and a key parameter of a within-host model. We propose extensions of SIR, SIRS and SIS models and study the global stability of their equilibria. For an SIS model with several pathogen strains, we show that the competitive exclusion principle holds: the strain that maximizes a threshold wins the competition by eliminating the other strains. The winning strain also turns out to be the one that gives the minimum susceptible host population at equilibrium, which can be interpreted as a pessimization principle. When this model is considered with a frequency-dependent contact law instead, the dynamics change: a coexistence equilibrium exists and is globally asymptotically stable under certain conditions. The asymptotic behaviour of the two boundary equilibria is also proved. The stability of the equilibria is studied mainly by constructing Lyapunov functions combined with LaSalle's invariance principle. We then consider an age-structured within-host model of the parasite Plasmodium falciparum with a general force of infection. We develop a method for estimating the total parasite load, which cannot be measured with currently known methods. To do so, we use tools from control theory, in particular unknown-input observers, to estimate the unmeasured states from the measured states (data). From this we derive a method for estimating an unknown parameter representing the rate at which healthy red blood cells are infected by parasites.
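The competitive-exclusion and pessimization results can be illustrated numerically. The sketch below assumes a standard n-strain SIS model with constant recruitment `Lambda`, natural death rate `mu`, mass-action transmission rates `beta[i]` and recovery rates `gamma[i]`; the thesis's own model extensions may differ, and all parameter values are hypothetical. Under these assumptions, strain i alone has threshold R0_i = beta_i * Lambda / (mu * (mu + gamma_i)) and susceptible equilibrium S*_i = (mu + gamma_i) / beta_i = (Lambda / mu) / R0_i, so maximizing R0_i is the same as minimizing S*_i.

```python
def simulate_multistrain_sis(beta, gamma, Lambda=1.0, mu=0.1,
                             s0=10.0, i0=0.01, t_end=2000.0, dt=0.05):
    """Forward-Euler integration of an assumed n-strain SIS model:

        dS/dt   = Lambda - mu*S - S*sum_j(beta_j*I_j) + sum_j(gamma_j*I_j)
        dI_i/dt = beta_i*S*I_i - (mu + gamma_i)*I_i

    Returns the final susceptible level S and the list of final I_i.
    """
    n = len(beta)
    S = s0
    I = [i0] * n
    for _ in range(int(round(t_end / dt))):
        infection = [beta[k] * S * I[k] for k in range(n)]
        recovery = [gamma[k] * I[k] for k in range(n)]  # SIS: recovered return to S
        dS = Lambda - mu * S - sum(infection) + sum(recovery)
        dI = [infection[k] - (mu + gamma[k]) * I[k] for k in range(n)]
        S += dt * dS
        I = [I[k] + dt * dI[k] for k in range(n)]
    return S, I


def basic_reproduction_numbers(beta, gamma, Lambda=1.0, mu=0.1):
    # Threshold of each strain on its own: R0_k = beta_k * (Lambda/mu) / (mu + gamma_k).
    return [b * Lambda / (mu * (mu + g)) for b, g in zip(beta, gamma)]
```

With `beta = [0.05, 0.09, 0.06]` and `gamma = [0.2, 0.3, 0.2]`, the thresholds are about 1.67, 2.25 and 2.0; the simulation ends with strains 0 and 2 extinct and S close to 0.4 / 0.09 ≈ 4.44, the smallest single-strain susceptible equilibrium. Forward Euler is used only to keep the example dependency-free; a library ODE solver would be the usual choice.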
523

Analysis Of Protein Evolution And Its Implications In Remote Homology Detection And Function Recognition

Gowri, V S 10 1900
One of the major outcomes of a genome sequencing project is the availability of the amino acid sequences of all the proteins encoded in the genome of the organism concerned. However, for a substantial proportion of these proteins, no information on function is commonly available, either from experimental studies or by inference from homology with a protein of known function. Even if the general function of a protein is known, the region corresponding to that function may be a single domain, and the protein may contain additional regions of considerable length with no known function. In such cases the information on function is incomplete. A lack of understanding of the repertoire of functions of the proteins encoded in a genome limits the utility of the genomic data. While many experimental approaches are available for deciphering the functions of proteins at the genomic scale, bioinformatics approaches are a good first step in obtaining clues about function (Koonin et al., 1998). One common bioinformatics approach is recognition of function by homology (Bork et al., 1994). If an evolutionary relationship can be established between two proteins, one of known function and the other of unknown function, this raises the possibility that they share a common function and 3-D structure (Bork and Gibson, 1996). Although effective, this approach is limited by the ability of bioinformatics methods to identify related proteins when their evolutionary divergence is high, leading to amino acid sequence similarity as low as that typical of unrelated proteins (Bork and Koonin, 1998). The use of 3-D structural information, obtained by predictive methods such as fold recognition, has offered ways to increase the sensitivity of remote homology detection (e.g., Kelley et al., 2000; Shi et al., 2001; Gough et al., 2001).
The general objective of the work embodied in this thesis is to analyse the evolution of structural features and functions of protein families and to design new bioinformatics approaches for recognizing distantly related proteins, together with their applications. After an introductory chapter, several chapters report analyses of functional and structural features of homologous protein domains. Further chapters report the development and assessment of new remote homology detection approaches and their application to the proteins encoded in two protozoan organisms. A further chapter presents an analysis of proteins involved in methylglyoxal detoxification pathways in kinetoplastid organisms. Chapter 1 presents a brief introduction, based on the literature, to protein structures, their classification, methods for structure comparison, popular methods for remote homology detection and homology-based methods for function annotation. Chapter 2 describes the steps involved in updating and improving the PALI database. In addition to the update, the domain structural families are integrated with homologous sequences from the sequence databases, so every family in PALI is enriched with a substantial volume of sequence information from proteins with no known structure. Chapter 3 reports investigations of the inter-relationships between sequence, structure and function in closely related homologous enzyme domain families. Chapter 4 describes investigations of unusual differences in the lengths of closely related homologous protein domains, how the additional lengths are accommodated in protein 3-D structures, and their functional implications. Chapter 5 reports the development and assessment of a new approach for remote homology detection using dynamic multiple profiles of homologous protein domain families.
Chapter 6 describes the development of another remote homology detection approach, based on multiple static profiles generated from the bona fide members of each family. A rigorous assessment of the approach and strategies for improving the detection of distant homologues with the multiple-profile approach are discussed in this chapter. Chapter 7 describes the results of searches of the database of multiple family profiles (MulPSSM database) made to recognize the functions of hypothetical proteins encoded in two parasitic protozoa. Chapter 8 describes sequence and structural analyses of two glyoxalase pathway proteins from the kinetoplastid organism Leishmania donovani, which causes leishmaniasis. An alternative enzyme, which would probably substitute for the glyoxalase pathway enzymes in certain kinetoplastid organisms that lack them, is also discussed. Chapter 9 summarises the important findings of the various analyses in this thesis. The Appendix describes an analysis of the correlation between a measure of the hydrophobicity of amino acid residues aligned in a multiple sequence alignment and residue depth in the 3-D structures of proteins.
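Chapters 5 to 7 rely on family profiles (position-specific scoring matrices). As a rough illustration only, the sketch below builds a log-odds PSSM from a gapless alignment of family members, using a uniform background and a pseudocount, and then slides it along a query without gaps. This is a generic simplification and not the MulPSSM construction, which the abstract does not detail; the function names and toy sequences are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Log-odds PSSM (bits) from equal-length, gapless aligned sequences.

    Assumes only the 20 standard residues and a uniform background frequency.
    Returns one {residue: score} dict per alignment column.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in alignment:
            counts[seq[col]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2((counts[aa] / total) / background)
                     for aa in AMINO_ACIDS})
    return pssm


def best_ungapped_hit(pssm, query):
    """Slide the profile along the query; return (best score, start offset)."""
    width = len(pssm)
    best_score, best_start = float("-inf"), -1
    for start in range(len(query) - width + 1):
        score = sum(pssm[i][query[start + i]] for i in range(width))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start
```

For a toy family `["ACDKW", "ACEKW", "GCDKW"]`, the best hit in the query `"MMACDKWMM"` starts at offset 2. Real profile searches (for example PSI-BLAST) add gapped alignment, sequence weighting and substitution-matrix-based pseudocounts.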
524

Computational And Biochemical Studies On The Enzymes Of Type II Fatty Acid Biosynthesis Pathway : Towards Antimalarial And Antibacterial Drug Discovery

Kumar, Gyanendra 02 1900
Malaria, caused by the parasite Plasmodium, continues to cause high global morbidity and mortality, second only to tuberculosis. It causes 300-500 million clinical infections annually, of which more than a million are fatal. The worst affected are children below 5 years of age in sub-Saharan Africa. Plasmodium is a protozoan parasite of the phylum Apicomplexa, which also includes parasites such as Toxoplasma, Lankestrella, Eimeria and Cryptosporidium. Of the four species of Plasmodium affecting humans, namely P. falciparum, P. vivax, P. ovale and P. malariae, P. falciparum is the deadliest, as it causes cerebral malaria. The situation has recently worsened with the emergence of drug resistance in the parasite. Deciphering new pathways in the parasite for developing lead antimalarial compounds is therefore urgently needed. The discovery of the type II fatty acid biosynthesis pathway in P. falciparum has opened new avenues for the design of antimalarials, as this pathway differs from the one in the human host. Although many biochemical pathways, such as purine, pyrimidine and carbohydrate metabolism and the phospholipid, folate and heme biosynthetic pathways, operate in the malaria parasite and are being investigated as antimalarial targets, no commercially used antimalarial based on direct intervention in these pathways has emerged so far. This is because the structure and function of these drug targets overlap with those of the human host. Chapter 1 of the thesis describes the parasite, its metabolic pathways, efforts to exploit these pathways for antimalarial drug discovery and inhibitors targeting them, and introduces fatty acid biosynthesis, the discovery of the type II pathway in P. falciparum and the prospects for developing lead compounds for antimalarial drug discovery.
In exploring the newly discovered type II fatty acid biosynthesis pathway of P. falciparum as a target for antimalarial drug discovery, one of its enzymes, β-hydroxyacyl-acyl carrier protein dehydratase (PfFabZ), had been cloned and was being characterized in the laboratory. Its atomic structure was not known at that time. Chapter 2 describes a homology model of PfFabZ and the docking of previously discovered inhibitors into this structure to provide a rationale for their inhibitory activity. Despite low sequence identity (~21%) with the closest atomic structure then available, E. coli FabA, a good model of PfFabZ could be built. The modeled structure is compared with the recently determined crystal structure of PfFabZ, and the design of new potential inhibitors is described. This study provides insights for further improving the inhibition of this enzyme. Enoyl-acyl carrier protein reductase (ENR) is the most important enzyme in the type II fatty acid biosynthesis pathway. It has been established as an important target for antibacterial as well as antimalarial drug discovery. Isoniazid, the most effective drug against tuberculosis, targets this enzyme in M. tuberculosis. The well-known antibacterial compound triclosan, a diphenyl ether, also targets this enzyme in P. falciparum. I designed a number of novel diphenyl ether compounds, some of which could be synthesized in the laboratory. Chapter 3 describes the design, docking studies and inhibitory activity of these compounds against PfENR and E. coli ENR (EcENR). Some of them inhibit PfENR at nanomolar concentrations and EcENR at low micromolar concentrations, and many also inhibit the growth of parasites in culture.
The structure-activity relationships of these compounds are discussed; they provide important insights into the activity of this class of compounds and are a step towards developing them into antimalarial and antibacterial drug candidates. Components of green tea extract and polyphenols have long been known for their medicinal properties, and they have recently been shown to inhibit components of the bacterial fatty acid biosynthesis pathway. Selected tea catechins and polyphenols were tested in the laboratory for inhibitory activity against PfENR, and I conducted docking studies to find their probable binding sites in PfENR. Kinetic analysis showed that these compounds inhibit competitively with respect to the cofactor NADH. This implies that they could potentiate the inhibition of PfENR by triclosan in a manner similar to NADH. As a model case, one of the tea catechins, EGCG ((−)-epigallocatechin gallate), was tested for this property. Indeed, in the presence of EGCG, the concentration of triclosan needed to inhibit PfENR fell from the nanomolar to the picomolar range. I conducted molecular modeling studies and propose a model for the formation of a ternary complex of EGCG, triclosan and PfENR. Docking studies of these inhibitors and the model of the ternary complex are described in Chapter 4. Docking simulations show that these compounds indeed occupy the NADH binding site. This study provides insights for improving the use of diphenyl ethers in conjunction or in combination with tea catechins as possible antimalarial therapeutics. In the search for new lead compounds against deadly diseases, in silico virtual screening and high-throughput screening strategies are being adopted worldwide. While virtual screening needs substantial computation time and hardware, high-throughput screening is quite expensive.
I adopted an intermediate approach combining both of these strategies and discovered compounds with a 2-thioxothiazolidin-4-one core, commonly known as rhodanines, as a novel class of PfENR inhibitors with antimalarial properties. Chapter 5 describes the discovery of this class of compounds. A small but diverse set of 382 compounds from a library of ~200,000 compounds was chosen for high-throughput screening. The best compound gave an IC50 of 6.0 µM, with many more in the higher micromolar range. The library was then searched for compounds structurally similar to this best compound, virtual screening was conducted, and 32 new compounds with better binding energies than the first lead and reasonable binding modes were tested. This yielded a new compound with an IC50 of 240 nM, and many more compounds gave IC50 values in the 3-15 µM range. The best inhibitor was tested in red blood cell cultures of Plasmodium and inhibited the growth of the malaria parasite with an IC50 of 0.75 µM. This study provides a new scaffold and lead compounds for further exploration towards antimalarial drug discovery. Chapter 6 summarizes the results and conclusions of the studies described in the preceding chapters and concludes the thesis. The Appendix describes the cloning, over-expression and purification of PanD from M. tuberculosis and of FabA and FabZ from E. coli.
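The abstract reports IC50 values without describing how they were derived from the dose-response data. A simple way to read an IC50 off such data is linear interpolation on log10(concentration) between the two measurements that bracket 50% inhibition; fitting a sigmoidal (Hill) model is the more usual approach. The helper below is a hypothetical sketch of the interpolation method, not the thesis's procedure.

```python
import math


def interpolate_ic50(concentrations, inhibition_percent):
    """Estimate IC50 by linear interpolation on log10(concentration).

    Concentrations must be positive and in a single unit; inhibition is in %.
    Returns None if no pair of adjacent points brackets 50% inhibition.
    """
    points = sorted(zip(concentrations, inhibition_percent))
    for (c_lo, y_lo), (c_hi, y_hi) in zip(points, points[1:]):
        if y_lo <= 50.0 <= y_hi and y_hi > y_lo:
            # Fraction of the way from the lower to the upper point at which 50% is reached.
            frac = (50.0 - y_lo) / (y_hi - y_lo)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None  # 50% inhibition is not bracketed by the data
```

For data generated from a Hill curve with IC50 = 1 and slope 1 at concentrations 0.25, 0.5, 2 and 4 (same units), the function returns 1.0, because the two bracketing points lie symmetrically around the IC50 on the log scale.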
525

Study of Bioactive Substances from the Amazonian Flora: analysis of herbal preparations based on Quassia amara L. (Simaroubaceae) and Psidium acutangulum DC. (Myrtaceae) used in French Guiana as antimalarials; identification and metabolomic analysis of essential oils with antifungal activity.

Houël, Emeline 01 July 2011
The aim of this work was to search for new active substances of plant origin with either antiplasmodial or antifungal activity. The study followed two different strategies: investigation of traditional antimalarial remedies identified through ethnopharmacological surveys, and identification of the antifungal properties of essential oils using a bioinspired strategy. The first part of the work demonstrated the role of a known quassinoid, simalikalactone D, in the antimalarial activity of a tea made from young fresh leaves of Quassia amara L. (Simaroubaceae). In the case of the decoction of Psidium acutangulum DC. (Myrtaceae) twigs, the activity of the remedy is instead due to a mixture of glycosylated flavonoids. In the search for new antifungal substances, the screening identified many essential oils with promising activities, validating the bioinspired approach chosen in this case. The essential oil of Otacanthus azureus (Linden) Ronse in particular showed remarkable activity, both alone and in combination with azole antifungals. Finally, the metabolomic study of essential oil composition produced a tool that can guide the selection of oils on the basis of GC/MS data in the search for new antifungal substances. This work thus demonstrates the validity of the chosen strategies, ethnopharmacology and bioinspiration, in the search for new bioactive substances.
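The abstract notes activity of the Otacanthus azureus oil in combination with azole antifungals but does not say how the interaction was quantified. Checkerboard assays are commonly summarized by the fractional inhibitory concentration index (FICI): the sum, over both agents, of the MIC in combination divided by the MIC alone. The sketch below assumes that method and the widely used cut-offs (FICI ≤ 0.5 synergy, FICI > 4 antagonism); all MIC values are hypothetical.

```python
def fici(mic_a_alone, mic_b_alone, mic_a_combo, mic_b_combo):
    """Fractional inhibitory concentration index for a two-agent combination.

    All MICs must be positive and share one unit (e.g. µg/mL).
    """
    return mic_a_combo / mic_a_alone + mic_b_combo / mic_b_alone


def interpret_fici(index):
    # Commonly used cut-offs; intermediate values are reported as "no interaction".
    if index <= 0.5:
        return "synergy"
    if index > 4.0:
        return "antagonism"
    return "no interaction"
```

For example, if an oil's MIC drops from 8 to 1 µg/mL and the azole's from 2 to 0.5 µg/mL in combination, `fici(8, 2, 1, 0.5)` gives 0.375, which `interpret_fici` labels as synergy.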
526

Computational Analyses Of Proteins Encoded In Genomes Of Pathogenic Organisms : Inferences On Structures, Functions And Interactions

Tyagi, Nidhi 11 1900
The availability of completely sequenced genomes for a number of organisms provides an opportunity to understand the molecular basis of their physiology, metabolism, regulation and evolution. Significant insight into the complexity of organisms can be gained from the functional characterization of the repertoire of proteins encoded in their genomes. Computational approaches for recognizing the function of uncharacterized proteins encoded in genomes often rely on the ability to detect well-characterized homologues. Homology searches based on pairwise sequence comparisons can reliably detect homologues with more than 30% sequence identity, but homologues with identity below 30% are difficult to detect with these methods. Distant homology relationships can be established using profiles, or position-specific scoring matrices, which encapsulate information about structurally and functionally conserved residues. Conservation implies strong constraints at a particular residue site owing to its involvement in structural stability, enzymatic activity, ligand binding, protein folding or protein-protein interactions. In addition, information on the three-dimensional structures of proteins also aids the detection of remote homologues, as tertiary structure is better conserved than primary structure. The overall objective of the work reported in this thesis is to employ various sensitive remote homology detection methods to recognize relevant functional information for proteins encoded mainly in pathogenic organisms. Since proteins do not work in isolation in a cell, it has become essential to understand the in vivo context of their functions, which requires an understanding of all the molecules that interact with a particular protein.
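The 30% identity level depends on how identity is counted, and definitions differ between tools. The sketch below assumes one common convention, identical residues divided by aligned columns where neither sequence has a gap, and flags pairs below the 30% level mentioned above; the function names are illustrative.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def below_pairwise_threshold(identity, threshold=30.0):
    # True when pairwise comparison alone is unlikely to detect the homology reliably.
    return identity < threshold
```

Dividing by the full alignment length or by the shorter sequence instead would give lower values for gapped alignments, which is one reason identity figures from different tools are not directly comparable.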
Thus, another major area of bioinformatics has been the integration of protein-protein interaction information to better understand the context of functional events. Analysis of host-pathogen protein-protein interactions can give useful insight into the mode of pathogenesis and its consequences in the host cell. Chapters 2-6 of the thesis discuss the sequence and structural characteristics, remote evolutionary relationships and functional implications of uncharacterized proteins encoded in the genomes of the following pathogens: Helicobacter pylori, Plasmodium falciparum and Leishmania donovani. Chapters 7 and 8 mainly discuss sequence, structural and functional aspects of protein kinases encoded in the genomes of various prokaryotes and viruses. Chapter 1 provides background information and a literature survey on homology detection and the prediction of protein-protein interactions. It highlights the growth of genomic data and the need to process these data to infer the context of functional events, and it reviews experimental and computational approaches for recognizing protein function and for detecting or predicting protein-protein interactions. Chapter 2 discusses the recognition of non-trivial remote homology relationships involving proteins of Helicobacter pylori and their implications for function recognition. H. pylori is a microaerophilic, Gram-negative bacterial pathogen that colonizes the human gastric mucosa and is a causative agent of gastroduodenal disease. It infects about 50% of the human population and can lead to the development of mucosa-associated lymphoid tissue lymphoma. About 10% of infected people develop gastric or duodenal ulcers and approximately 1% develop gastric cancer. H. pylori has been classified as a class I carcinogen by the WHO. The pathogen is characterized by a type IV secretion system.
Complete genome sequences are available for three widely studied strains of Helicobacter pylori: 26695, J99 and HPAG1. According to genome analysis, the numbers of predicted open reading frames in strains 26695, J99 and HPAG1 are 1590, 1495 and 1536, respectively. Of the predicted proteins from these strains, 453, 357 and 400, respectively, have no functional domain assignment in the Pfam (protein family) database. In other H. pylori proteins, one part is associated with at least one domain of known function, which gives a preliminary indication of function, while the rest of the sequence is not associated with any function. There are 772, 803 and 790 such segments, each at least 45 residues long and with no functional assignment currently available, in proteins from strains 26695, J99 and HPAG1, respectively. Sensitive remote homology detection methods have been employed to establish relationships for 294 amino acid sequences, and the results have been grouped into four categories. The homology detection results have been further confirmed by examining the conservation of residues that are important for the function of the proteins concerned. (i) Remote relationships have been established with protein domain families for which no bona fide member is currently known in H. pylori. For example, a DNA-binding domain (Kor_B) has been assigned to an H. pylori protein at 20% sequence identity; secondary structure prediction and residue conservation confirm the results of the homology detection methods. (ii) Remote relationships have been established between H. pylori hypothetical proteins and protein domain families that have paralogous members in H. pylori.
For example, Cytochrome_C, an electron transfer domain, could be associated with an H. pylori protein sequence that shares 14% sequence identity with bona fide cytochrome c sequences. (iii) "Missing" metabolic proteins of H. pylori have also been recognized. For example, aspartoacylase (EC 3.5.1.15) catalyzes the deacetylation of N-acetylaspartic acid to produce acetate and L-aspartate. This enzyme of the aspartate metabolism pathway had not so far been reported in H. pylori. A remote evolutionary relationship between an H. pylori protein and the aspartoacylase domain has been established at 17% sequence identity, filling this gap in the pathogen's metabolic pathway. (iv) New functional assignments have been made for regions of H. pylori sequences whose other regions already had domain assignments. For example, a DNA methylase domain has been assigned to the C-terminal region of an H. pylori protein whose N-terminal region already had a helicase domain assigned. All this information should open avenues for further experimental probing, which could inform the design of inhibitors against this pathogen and lead to a better understanding of its pathogenesis in humans. Chapter 3 describes the prediction of protein-protein interactions between Helicobacter pylori and its human host. The lack of information on protein-protein interactions at the host-pathogen interface is impeding the understanding of pathogenesis. A recently developed, homology search-based method for predicting protein-protein interactions is applied to the gastric pathogen H. pylori to predict in vitro interactions between its proteins and human proteins. Many of the predicted interactions could potentially occur between the pathogen and its human host during pathogenesis, because the analysis focused mainly on H. pylori proteins that have a transmembrane region, are encoded in the pathogenicity island, or are known to be secreted into the human host. By applying the homology search approach to the protein-protein interaction databases DIP and iPfam, in vitro interactions could be predicted for a total of 623 H. pylori proteins with 6559 human proteins. The predicted interactions include 549 hypothetical proteins of as yet unknown function encoded in the H. pylori genome and 13 experimentally verified secreted proteins. A total of 833 interactions involving the extracellular domains of H. pylori transmembrane proteins could be predicted. Structural analysis of some of the examples shows that the predicted interactions are consistent with the structural compatibility of the binding partners. Various probable interactions of discernible biological relevance are discussed in this chapter. For example, an interaction between the CFTR protein (NP_000483) and a multidrug resistance protein (HP1206) has been predicted. The structure of the CFTR intracellular domain is known in homomeric form and consists of five AAA transport domains in tandem (PDB code 1XMI). Of the five identical subunits, two (chains B and E in the PDB structure) were selected, and the structure of the pathogen's multidrug resistance protein was modeled on chain B of the template (sequence identity 32%). This exercise suggests that the interface residues in the model are congenial for interaction, which makes the complex structurally feasible in vitro and suggests that the pathogen protein may compete with the host protein for occupancy. Chapter 4 describes the recognition of Plasmodium-specific protein domain families and their roles in the Plasmodium falciparum life cycle. Malaria in humans is caused by intracellular, eukaryotic apicomplexan protozoan parasites of the genus Plasmodium. Of the five Plasmodium species that infect humans, namely P. falciparum, P. ovale, P. vivax, P. malariae and P. knowlesi, P. falciparum causes lethal infection. P. falciparum proteins have diverged extensively during evolution. The pathogen's genome is A+T rich, and its proteins are often larger than their homologues in other organisms owing to the presence of low complexity regions. Organism-specific families are important because they play roles in the particular lifestyle of an organism; if the organism is a pathogen, their members may play roles in pathogenesis. Inhibiting such proteins is unlikely to interfere with the host system, as the host may have no homologue. In the present work we identify Plasmodium-specific protein families and their roles in different stages of the pathogen's life cycle. A total of 5086 amino acid sequences (full-length sequences or protein fragments) show homology only with sequences from Plasmodium organisms and are hence Plasmodium-specific. These sequences cluster into 106 Plasmodium-specific families (≥2 members per family). Fourteen Plasmodium-specific protein domain families with known physico-chemical properties are observed; they are involved in various important functions, such as rosetting and sequestration of infected erythrocytes, binding to the host cell surface and the invasion process in the pathogen's life cycle. In addition, 89 new Plasmodium-specific protein domain families have been recognized. Members of the Plasmodium-specific domain families have been analysed for their potential to target the apicoplast, their protein-protein interactions, expression profiles and domain organization, to derive relevant information about function. New Plasmodium-specific domain families with no associated function could provide insight into the highly diverged Plasmodium species, and these proteins may play roles in the parasite-specific lifestyle.
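The abstract reports that 5086 Plasmodium-specific sequences cluster into 106 families of at least two members but does not state the clustering rule. A minimal sketch, assuming single-linkage clustering (any detected homology links two sequences) implemented with a union-find structure, is shown below; the sequence identifiers and homologous pairs are hypothetical inputs that would come from homology searches.

```python
def cluster_families(sequence_ids, homologous_pairs, min_size=2):
    """Single-linkage families: connected components of the homology graph.

    Returns a sorted list of families (each a sorted list of IDs) that have
    at least min_size members; singletons are dropped by default.
    """
    parent = {s: s for s in sequence_ids}

    def find(x):
        # Walk up to the root, halving the path as we go to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in homologous_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two families

    groups = {}
    for s in sequence_ids:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(members) for members in groups.values()
                  if len(members) >= min_size)
```

Single linkage is simple but can chain unrelated sequences through intermediates; stricter schemes (for example requiring coverage thresholds on each hit) are common in practice.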
Experimental work on these Plasmodium-specific proteins might fill gaps in the understanding of this parasite's physiology. Chapter 5 presents a genome-wide compilation of low complexity regions (LCRs) in proteins. Given the high prevalence of LCRs in the Plasmodium falciparum proteome, an in-depth analysis of the nature, structure and functional role of LCR-containing proteins in this organism was undertaken. Low complexity regions and repeat patterns have been recognized in proteins encoded in 986 genomes (68 archaea, 896 prokaryotes and 22 eukaryotes). LCRs have been classified into three categories: (a) composition, where an LCR is either a stretch of a single amino acid type (a homo-amino-acid stretch) or a stretch of more than one amino acid type; (b) periodicity, where certain residues occur at a specific periodicity in the protein; and (c) repeat patterns, where a motif of residues is repeated in the protein. A total of 850 P. falciparum proteins have at least one repeat pattern whose repeating unit is at least 5 residues long. Statistical analysis of single amino acid repeats indicates that stretches of a single residue type do not occur at random. Studies of function recognition, protein-protein interactions and the organization of tethered domains in LCR-containing proteins suggest that these proteins take part in a variety of functional events, such as signal transduction, enzymatic processes, cell differentiation, pyrimidine biosynthesis, fatty acid biosynthesis and chromosomal replication. LCRs of P. falciparum proteins represented in the Protein Data Bank show that, apart from forming disordered regions, LCRs can adopt regular secondary structure in 3-D protein structures.
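The abstract does not specify how LCRs were detected. Two simple building blocks are sketched below under that caveat: a sliding-window Shannon entropy filter, similar in spirit to SEG but not its exact algorithm (the window width and cut-off are illustrative), and a scan for runs of a single residue, the homo-amino-acid stretches of category (a).

```python
import math
from collections import Counter


def window_entropy(window):
    """Shannon entropy (bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(sequence, width=12, max_entropy=2.2):
    """Start offsets of windows whose entropy is at most max_entropy."""
    return [i for i in range(len(sequence) - width + 1)
            if window_entropy(sequence[i:i + width]) <= max_entropy]


def homopolymer_runs(sequence, min_length=5):
    """(start, residue, length) for each run of one residue of at least min_length."""
    runs = []
    start = 0
    for i in range(1, len(sequence) + 1):
        # A run ends at the end of the sequence or where the residue changes.
        if i == len(sequence) or sequence[i] != sequence[start]:
            if i - start >= min_length:
                runs.append((start, sequence[start], i - start))
            start = i
    return runs
```

For example, `homopolymer_runs("MNNNNNKQQQQQQA")` returns `[(1, 'N', 5), (7, 'Q', 6)]`. Overlapping low-entropy windows would normally be merged into regions before further analysis.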
Chapter 6 describes sequence analysis, structural modeling and evolutionary studies of Leishmania donovani hypusine pathway enzymes. Leishmania is a eukaryotic kinetoplastid protozoan parasite that causes leishmaniasis in humans. Hypusine is a non-standard polyamine-derived amino acid, Nε-(4-amino-2-hydroxybutyl)lysine, named after its two structural components, hydroxyputrescine and lysine. The eukaryotic translation initiation factor 5A (eIF5A) is the only cellular protein containing hypusine. Synthesis of hypusine is critical for the function of eIF5A and is essential for eukaryotic cell proliferation and survival. Hypusine is formed by a two-step post-translational modification involving the enzymes (i) deoxyhypusine synthase (DHS) and (ii) deoxyhypusine hydroxylase (DOHH). DHS, the first enzyme of the hypusine pathway, catalyzes the NAD-dependent transfer of the butylamino moiety of spermidine (the substrate) to the ε-amino group of a specific lysine residue of the eIF5A precursor, generating a deoxyhypusine-containing intermediate. DOHH, the second enzyme in the same pathway, catalyzes the hydroxylation of this intermediate, generating hypusine-containing mature eIF5A. Two putative DHS sequences, DHS34 and DHS20, have been identified in Leishmania donovani by Professor Madhubala and coworkers (Jawaharlal Nehru University, New Delhi), in collaboration with whom the work in this chapter was done. Detailed comparison of the Leishmania DHS34 sequence with the human DHS protein indicated conservation of functionally important residues. 3D structural modeling suggested that residues around the active site are absolutely conserved. The NAD binding regions are located spatially close to one another; however, one NAD binding region was observed in a large (225 amino acid residues long) insertion. Based on these observations, DHS34 was predicted to have enzymatic activity.
Experimental studies by our collaborators confirmed the preliminary results of the computational analysis. Based on sequence and structural analysis, DHS20 and DOHH were proposed to be catalytically inactive and active, respectively. Experimental studies on these proteins supported the results of the computational analysis. Deoxyhypusine synthase (DHS) and deoxyhypusine hydroxylase (DOHH) are key proteins conserved in the hypusine synthesis pathways of eukaryotes. Because they are highly conserved, they could be coevolving. Comparison of the genetic distance matrices of DHS and DOHH reveals that their evolutionary rates are better correlated with each other than with the rate of an unrelated protein such as cytochrome c. This indicates that they are coevolving, and further indicates that even non-interacting proteins that are functionally coupled experience correlated evolution. However, this correlation does not extend to their tree topologies. Chapter 7 provides a classification scheme for protein kinases encoded in the genomes of prokaryotic organisms. The overwhelming majority of the Ser/Thr protein kinases identified by gleaning archaeal and eubacterial genomes could not be classified into any of the well-known Hanks and Hunter subfamilies of protein kinases. This is because the Hanks and Hunter classification scheme was developed on the basis of eukaryotic protein kinases, which are highly divergent from their prokaryotic homologues. A large dataset of prokaryotic Ser/Thr protein kinases was therefore compiled. Traditional sequence alignment and phylogenetic approaches have been used to identify and classify these prokaryotic kinases, which represent 72 subfamilies with at least 4 members each. Such a clustering enables classification of prokaryotic Ser/Thr kinases and can serve as a framework to classify newly identified prokaryotic Ser/Thr kinases.
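The DHS/DOHH coevolution comparison described above correlates pairwise distances between the same set of species for two protein families. A minimal sketch of that idea (often called the mirror-tree approach) is shown below, assuming both distance matrices list species in the same order. The matrices are hypothetical toy values, not data from the thesis, and a plain Pearson correlation over the upper triangles is an assumed choice of statistic.

```python
from math import sqrt


def upper_triangle(matrix: list[list[float]]) -> list[float]:
    """Flatten the entries above the diagonal of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def distance_matrix_correlation(d1: list[list[float]], d2: list[list[float]]) -> float:
    """Correlate two distance matrices computed over the same ordered species."""
    if len(d1) != len(d2):
        raise ValueError("matrices must cover the same species")
    return pearson(upper_triangle(d1), upper_triangle(d2))


if __name__ == "__main__":
    # Hypothetical distances among four species; not values from the thesis.
    dhs = [[0.0, 0.2, 0.5, 0.6],
           [0.2, 0.0, 0.45, 0.55],
           [0.5, 0.45, 0.0, 0.3],
           [0.6, 0.55, 0.3, 0.0]]
    dohh = [[0.0, 0.3, 0.7, 0.9],
            [0.3, 0.0, 0.65, 0.8],
            [0.7, 0.65, 0.0, 0.4],
            [0.9, 0.8, 0.4, 0.0]]
    print(round(distance_matrix_correlation(dhs, dohh), 3))
```

Because the entries of a distance matrix are not independent of one another, the significance of such a correlation is usually assessed with a permutation (Mantel) test rather than the textbook Pearson p-value.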
After a series of searches in comprehensive sequence databases, 38 subfamilies of prokaryotic protein kinases were recognized as being associated with a specific taxonomic level. For example, 4, 6 and 3 subfamilies have been identified that are currently specific to the phyla Proteobacteria, Cyanobacteria and Actinobacteria, respectively. Similarly, subfamilies specific to an order, sub-order, class, family and genus have also been identified. In addition, it was possible to identify organism-diverse subfamilies, whose members come from organisms of different taxonomic levels, such as archaea, bacteria, eukaryotes and viruses. Interestingly, the occurrence of several taxonomic-level-specific subfamilies of prokaryotic kinases contrasts with the classification of eukaryotic protein kinases, in which most of the popular subfamilies occur diversely across several eukaryotes. Many prokaryotic Ser/Thr kinases exhibit a wide variety of modular organization, which indicates a degree of complexity in the protein-protein interactions and signaling pathways of these microbes. Chapter 8 focuses on the recognition and classification of protein kinases encoded in the genomes of viruses and their implications in various functions and diseases. Protein kinases encoded by viral genomes play a major role in the infection, replication and survival of viruses. Protein kinases were recognized using traditional sequence homology detection tools, sequence alignment methods and phylogenetic approaches. In total, 646,123 protein sequences from 35,799 viral genomes (including strains) were used in this analysis. Protein kinases were identified using a combination of profile-based search methods such as PSI-BLAST, RPS-BLAST and HMMER.
Based on sequence similarity over the length of the catalytic kinase domains, 479 protein kinase domains recognized in 244 viral genomes have been clustered into 46 subfamilies, with a minimum sequence identity of 35% within a subfamily. Viral protein kinases are encoded in the genomes of retro-transcribing viruses or of viruses with double-stranded DNA as genetic material. Based on the functional information available for one or more members of a subfamily, a putative function has been assigned to the other members of the subfamily. Information on interactions of viral protein kinases with viral or host proteins has also been considered to better understand the function of the kinases in a subfamily. Of the 46 subfamilies, 14 are characterized by various functions. Kinases belonging to the UL97, US69, UL13 and BGLF subfamilies are virus-specific. For 7 subfamilies, the nearest neighbors are from well-characterized eukaryotic protein kinase groups such as AGC, CAMK and CDK. Of the 25 new uncharacterized subfamilies observed in this analysis, 13 are virus-specific. Different subfamilies have been characterized by various functions that are crucial for viral infection, such as synthesis of structural units, replication of genetic material, modification of cellular components, alteration of the host immune system and competition with cellular proteins for efficient use of the host machinery. Also, many viral kinases share very high sequence identity (~97%) with their eukaryotic counterparts and represent a disease state. For example, a protein kinase encoded in Avian erythroblastosis virus shares 97% sequence identity with the catalytic domain of the human epidermal growth factor receptor tyrosine kinase. Leu861 of the human protein is substituted by Gln in cancer; the viral protein kinase possesses Gln at the corresponding position and thus represents the disease state.
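Grouping kinase domains into subfamilies at an identity cut-off can be illustrated as follows. This is a minimal sketch, assuming the catalytic domains are already aligned to equal length with '-' for gaps, that identity is computed over columns where neither sequence has a gap, and that subfamilies are formed by single-linkage clustering at the 35% threshold. The linkage rule is an assumption, not stated in the abstract; a stricter variant would require every pair within a subfamily to meet the threshold. The sequence fragments are hypothetical.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def single_linkage_subfamilies(aligned: dict[str, str], threshold: float = 35.0) -> list[set[str]]:
    """Join any two domains with identity >= threshold into the same subfamily (union-find)."""
    parent = {name: name for name in aligned}

    def find(name: str) -> str:
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving keeps trees shallow
            name = parent[name]
        return name

    for a, b in combinations(aligned, 2):
        if percent_identity(aligned[a], aligned[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in aligned:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical aligned fragments, not real viral kinase sequences.
    domains = {
        "kinA": "GTGSFGKV",
        "kinB": "GTGAFGRV",
        "kinC": "W-WWWWWW",
    }
    print(single_linkage_subfamilies(domains))
```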
Chapter 9 examines how well 3-D structural features of comparative models, and of crystal structures of inactive forms of enzymes, can be used to predict enzymes, taking protein kinases as a case study. With the advent of structural genomics initiatives, there has been a surge in the number of proteins with 3-D structural information, often before the functional features of these proteins are understood. One useful annotation of a protein is its demarcation as an enzyme or non-enzyme solely from knowledge of its 3-D structure. This is facilitated by the identification of active sites and ligand binding sites in the protein. In this work, carried out in collaboration with Dr Jim Warwicker of Manchester University, UK, an approach developed by Warwicker and coworkers has been used. In the 3D structure of a protein, the largest clefts are generally considered to be ligand binding sites. This feature, along with other sequence-alignment-independent properties such as residue preferences, fraction of surface residues and secondary structure elements, has been used to differentiate enzymes from non-enzymes. Electrostatic potential at the active site is one of the key properties used in this respect. Active sites in enzymes are generally associated with ionizable groups that can take part in catalysis. In addition to lying in large clefts, active site residues are in buried environments and show larger deviations in pKa values than surface residues. The method proposed by Warwicker and coworkers classifies proteins into enzymes and non-enzymes by considering the electrostatic features at clefts along with the sequence profile of the protein concerned. The conformation of the inactive state of an enzyme is not congenial to catalytic function. Ideally, a method should be capable of predicting an enzyme irrespective of whether the determined structure corresponds to the active or inactive state.
Peak potential values have been calculated using the Warwicker program for a set of 15 protein kinases for which 3-D structures are available in both active and inactive conformations. Comparison of the peak potential values calculated for active and inactive conformations suggests that the algorithm can differentiate between them, as values for active conformations are generally higher than the corresponding values for inactive conformations. However, the peak potential values are high enough for even the inactive conformations to be predicted as enzymes. Peak potential values calculated for homology models of protein kinases (for which crystal structures are already available), generated at different sequence identities with the template sequences, predict protein kinases as enzymes, and these values are comparable to the corresponding values for the X-ray structures. This suggests that for proteins with no crystal or NMR structure yet available and no good template of high sequence identity, peak potential values for models generated at low sequence identity can still give insight into the probable function of the protein as an enzyme. The enzyme/non-enzyme prediction algorithm was also found to be useful in confirming enzyme functionality using 3-D models of putative viral kinases. The putative kinase function had initially been assigned to these viral proteins based solely on sequence characteristics, such as the presence of residues or motifs important for activity. The enzyme recognition method, which is not directly sensitive to these motifs, confirmed that all the analyzed putative viral kinases are enzymes. Chapter 10 presents the conclusions of the work embodied in the entire thesis. Very briefly, various computational approaches have been used to analyze and understand the structural and functional properties of the protein repertoires of pathogenic organisms.
Analysis of uncharacterized protein domain families has helped to understand the functional implications of their constituent proteins. Experimental validation of these results can further facilitate unraveling the functional aspects of proteins encoded in various pathogenic organisms. Apart from the studies embodied in the thesis, the author has been involved in two other studies, which are provided as appendices. Appendix 1 describes a comparison of the substitution patterns of amino acid residues in proteins encoded in the P. falciparum genome with those of the corresponding homologous proteins from non-Plasmodium organisms. Salient differences have been highlighted. Appendix 2 discusses a study of bacterial tyrosine kinases with the objective of recognizing all putative protein tyrosine kinases in E. coli. The computational study suggests that the protein SopA could be a tyrosine kinase, and this conclusion is being tested experimentally in a collaborator’s laboratory.
527

Structural Studies on Thiolases and Thiolase-like Proteins

Janardan, Neelanjana January 2014
The genus Mycobacterium comprises some of the most devastating pathogens that infect humans. Mycobacterium tuberculosis causes tuberculosis in humans, leading to high morbidity and mortality. The disease is especially prevalent in the under-developed and developing countries of the tropics. Diseases such as AIDS and cancer compromise the immune system, leaving an individual susceptible to secondary infections, particularly tuberculosis. Thus, tuberculosis is reappearing even in the well-developed countries of the West. The emergence of multidrug-resistant strains makes this deadly disease difficult to cure. A vaccine against tuberculosis is therefore the need of the hour. Mycobacterium smegmatis is a non-pathogenic member of the same family. It multiplies relatively fast compared with M. tuberculosis and shares the unique features of the family that make the pathogenic members extremely resistant to chemicals and drugs. Proteins of M. smegmatis and M. tuberculosis share high sequence identity, making M. smegmatis the microorganism of choice for studying its more deadly relative. A striking feature of all mycobacterial genomes is the abundance of genes coding for enzymes involved in fatty acid and lipid metabolism: more than 250 in Mycobacterium tuberculosis compared with only 50 in Escherichia coli. The mycobacterial genome codes for over a hundred enzymes involved in fatty acid degradation. Apart from providing energy, lipids and fatty acids also form an integral part of the mycobacterial cell wall and cell membrane. The abundance and importance of lipid-metabolizing enzymes in mycobacteria make them attractive targets for drug discovery. It is therefore of interest to characterize these enzymes biochemically and structurally. Thiolases are a group of enzymes involved in lipid metabolism.
In the last step of the β-oxidation pathway, degradative thiolases catalyze the shortening of fatty acid chains by degrading 3-ketoacyl CoA to acetyl CoA and a shortened acyl CoA molecule. Thiolases are a subfamily of the thiolase superfamily. This superfamily also includes the ketoacyl-(acyl-carrier-protein) synthase (KAS) enzymes, polyketide synthases and chalcone synthases. Most members of this superfamily are dimers, while only a few have been found to be tetramers. The tetramers are loosely held dimers of tight dimers. Examination of the Mycobacterium smegmatis genome revealed the presence of several putative thiolase genes. These genes have been annotated as thiolases on the basis of sequence analysis; however, none of them has been biochemically or structurally characterized. The sequence identity between some of these proteins and other well-characterized thiolases is rather low. The work described in this thesis attempts to characterize two such enzymes from M. smegmatis structurally and functionally. Chapter 1 begins with a brief introduction to the genus Mycobacterium and the role of fatty acid metabolism in mycobacterial virulence. This is followed by a review of the current literature on the enzymes of the thiolase superfamily and their role in fatty acid metabolism. The chapter concludes with a brief summary of the aims and objectives of the work. Chapter 2 describes the common experimental procedures and computational methods used during these investigations, as most of them apply to all the structure determinations and analyses presented in later chapters. The experimental procedures described include overexpression, purification, site-directed mutagenesis, isolation of plasmids, crystallization of proteins and X-ray diffraction data collection.
Computational methods include structure determination protocols along with details of the various programs used for data processing, structure determination, refinement, model building, structure validation and analysis. Chapter 3 describes the cloning, expression, purification, crystallization and structure determination of a thiolase-like protein (TLP1) from M. smegmatis. All enzymes of the thiolase superfamily that have been structurally characterized so far share four features: 1) conservation of the core α/β/α/β/α-layered structure of the thiolase domain, 2) conservation of the extensive dimerization interface, 3) the location of the active site pocket and conservation of key active site residues and 4) the use of a nucleophilic cysteine residue in catalysis. The crystal structure of MsTLP1 revealed some interesting differences compared with classical thiolases. Of the four characteristic features of thiolases, MsTLP1 retains the conserved thiolase fold, and the location of its putative active site is similar to that in classical thiolases. However, dimerization is not conserved: MsTLP1 appears to be a monomer in solution as well as in the crystal structure. The ligand binding groove of MsTLP1, identified by structural superposition with Z. ramigera thiolase, is larger than that of the Z. ramigera enzyme. The absence of the catalytic cysteine suggested that, although the protein has the strictly conserved thiolase fold, it might perform an entirely different function. A unique extra C-terminal domain of unknown function, present only in MsTLP1, is described towards the end of the chapter. A thorough sequence and structural analysis suggested that MsTLP1 might belong to a new subfamily in the thiolase superfamily. Chapter 4 describes the attempts made towards the biochemical characterization of MsTLP1. Thiolase assays carried out for the synthetic and degradative reactions revealed that the enzyme is inactive in both directions.
However, surface plasmon resonance binding studies revealed that the protein can bind coenzyme A, a feature it shares with other enzymes of the thiolase superfamily. Thorough bioinformatic analyses of the structure to determine the residues involved in CoA binding are also described. The chapter ends with a discussion of the probable function of TLPs in mycobacteria. Chapter 5 describes the cloning, expression, purification and X-ray structural studies of MsT1-L thiolase. This is the first structural report of a probable T1-thiolase. The protein crystallized in three different space groups, in all of which the enzyme was found to be tetrameric. Analysis of the tetramer structures from the three crystal forms revealed that MsT1-L exhibits some rotational flexibility about the central tetramerization loop. A qualitative and quantitative analysis of this movement is described. Structural comparisons revealed that the overall structure of MsT1-L is very similar to that of the well-characterized biosynthetic thiolase from Z. ramigera. However, a detailed analysis of the ordered waters near the active site cavity revealed interesting differences between the two, and the probable functional relevance of this observation is discussed. The crystal structure of MsT1-L complexed with CoA is also described in detail. Structural comparisons with classical thiolases also revealed significant differences in the organization of the loop domain that harbors most of the residues required for catalysis. These differences make the active site cavity of MsT1-L larger than that of the biosynthetic thiolase, suggesting that MsT1-L could bind larger substrates; the cavity is large enough to accommodate a medium-chain fatty acyl CoA. Co-crystallization experiments with hexanoyl CoA revealed a novel binding site for the fatty acyl chain in MsT1-L, which is described in detail.
Contributions made towards the cloning and expression of other thiolases from S. typhimurium and P. falciparum have been described in Chapters 6 and 7. The thesis concludes with a brief discussion on the future prospects of the investigations presented here.
528

Optical Tweezers and Its use in Studying Red Blood Cells - Healthy and Infected

Paul, Apurba January 2016
The experiment discussed in the next chapter was designed to confirm the aforementioned bystander effect. In the first experiment, we separated hosting and non-hosting mRBCs by Percoll purification and then measured their corner frequencies. The mean fc of the two distributions is almost the same, which confirms the effect of the parasite on the non-hosting mRBCs. In the next experiment, we incubated nRBCs in spent media and measured the corner frequency at six-hour intervals to see how fc changed with incubation time. The results showed that within 24 hours, the fc of the incubated nRBCs increases to the level of the iRBCs. The fact that nRBCs are affected by the spent media indicates that some substances must be released into the spent media that alter the physical properties of the nRBCs. This kind of effect on non-host mRBCs has been observed in earlier work [Dondorp97, Sabolovic91a, Bambardekar08]. It has also been shown that rosetting of host mRBCs to non-host mRBCs is activated by substances released into the medium [Handunnetti89, Wahlgren89], which is somewhat similar to the bystander effect observed by us. In addition, there are reports suggesting that sickle cell disease also shows binding properties [Roseff08, Zhang12], which may be due to substances released into the medium. Thus, it had already been observed that released substances induce changes in the properties of RBCs, but our study gives a direct confirmation of this. The next study aimed to identify the released substances responsible for the changes observed above. We incubated infected and uninfected RBCs with different drugs and then measured the corner frequency of the incubated RBCs to see what changes occur. The corner frequency of normal RBCs incubated with db-cAMP shows the maximum change.
This suggests that the bystander effect may be due to db-cAMP. All the experiments above were done using samples cultured in the lab. Since the environment of blood taken directly from a patient may differ from that of lab-cultured samples, it is natural to ask whether similar changes can be observed in clinical samples. The study in Chapter 6 addressed this question. We took clinical samples at BMRI from patients with confirmed malaria infection, covering both P. falciparum and P. vivax. This also gave us the opportunity to work with P. vivax-infected samples, which are very difficult to culture in the lab. The results in this chapter clearly indicate that similar changes occur in clinical samples. It is worth noting that even though P. vivax infects only immature RBCs (reticulocytes), changes were also observed in P. vivax samples. This gives another strong confirmation of the previously observed bystander effect. It also indicates that this technique can be used as a tool to diagnose malaria. Although it cannot differentiate between P. falciparum and P. vivax, the technique combined with other well-established techniques can provide additional confirmation. In summary, in all the experiments above we have shown an easy and novel technique that can differentiate between normal and malaria-infected RBCs. We have also observed the bystander effect and tried to identify the released substances responsible for it. We have shown that this technique can use the bystander effect of malaria to detect malaria. It has also been shown that RBCs from patient samples show the same changes as the cultured samples, which raises the possibility that this technique can be used as a diagnostic tool in combination with other techniques.
This technique can also be used to study the effects of drugs and to find drugs for diseases such as malaria.
Future outlook
1. We have observed the changes only for malaria. Other diseases, such as sickle cell anemia, may also alter the corner frequency distribution of RBCs, so the specificity of the observed changes needs to be established.
2. We can directly measure the elasticity of RBCs using dual traps in optical tweezers to find out the effect of different infections and drugs on RBC rigidity, and compare it with the data above.
3. We can also study other cells using the same method to see whether we can find differences between healthy and unhealthy cells.
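The corner frequency fc used throughout these experiments is not defined in this excerpt. Under the standard assumption of an overdamped object in a harmonic optical trap, with trap stiffness κ and drag coefficient γ, the position fluctuations have a Lorentzian (two-sided) power spectrum whose knee is fc. This textbook model is offered only to clarify why a shift in fc signals a change in κ/γ; whether the thesis fits exactly this form is an assumption.

```latex
\[
  \gamma \dot{x}(t) + \kappa x(t) = F_{\mathrm{th}}(t), \qquad
  \langle F_{\mathrm{th}}(t) F_{\mathrm{th}}(t') \rangle = 2 k_B T \gamma \, \delta(t - t')
\]
\[
  S_x(f) = \frac{k_B T}{2\pi^2 \gamma \left(f_c^2 + f^2\right)}, \qquad
  f_c = \frac{\kappa}{2\pi\gamma}, \qquad
  \int_{-\infty}^{\infty} S_x(f)\, df = \frac{k_B T}{\kappa}
\]
```

For a fixed drag coefficient, a higher fc therefore corresponds to a stiffer effective confinement of the trapped cell, while a change in the cell's shape or size alters γ; a measured shift in fc alone does not separate these two contributions.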
529

Functionally Interacting Proteins : Analyses And Prediction

Mohanty, Smita 11 1900
Functional interaction of proteins is a broad term encompassing many different types of associations observed amongst proteins. It includes direct non-covalent interactions, in which the interacting proteins physically associate through an interface. There are also many protein-protein interactions in which the proteins are not in direct physical contact but affect each other's functions. The central focus of this thesis is to understand the various aspects of functionally interacting proteins. Chapter 1 provides an introduction to functional interactions between proteins and discusses the key developments in the literature. It discusses the different types of functional associations commonly observed between proteins, as well as the various approaches developed over time to elucidate such interactions. The chapter highlights how functional interactions between proteins have helped in understanding cellular processes such as the organization of metabolic pathways, and emphasizes their importance, providing the motivation for developing methods with enhanced accuracy and sensitivity for the prediction of functional interactions. In this thesis, domain families that co-exist in multidomain proteins have been used to understand and subsequently predict functional associations amongst proteins. Domains in proteins typically serve as modules associated with specific functions. Some proteins have a single domain that describes the entire function of the protein, while others contain multiple domains that together describe the complete function of the multidomain protein. Therefore, by virtue of “guilt by association”, domain families found together in multidomain proteins are functionally linked.
This forms the basic premise for understanding functional association amongst proteins and is explained in detail in the Introduction chapter. Using domain families that co-occur in multidomain proteins as the basis for functional association has many merits. First, as stated before, constituent domain families act as effective descriptors of the functions of proteins. For example, members of the SH3 domain family mediate protein-protein interactions by binding to regions with polyproline conformation irrespective of the multidomain protein in which they occur. Thus, studies of domain families co-existing in multidomain proteins act as an accurate resource of functional associations between proteins. Also, assignment of domains to a protein relies on homology detection, which has achieved a high level of reliability, resulting in reasonably accurate prediction of functions. Such approaches enable exhaustive coverage of many diverse proteins, including many multidomain proteins, leading to the detection of large numbers of functional associations between domains of multidomain proteins. Given these advantages, it is imperative to exhaustively enumerate all possible pairs of functionally linked domain families in multidomain proteins and study their properties. This aspect is covered in the second chapter of the thesis, which reports an analysis of domain families that co-occur in multidomain proteins, termed 'tethered domain families'. For this analysis, a large dataset of multidomain proteins was drawn from a diverse set of fully sequenced genomes of many eukaryotic and prokaryotic organisms. In every multidomain protein, all possible pairs of unique domain families have been considered, and they are assumed to be under the same functional/evolutionary constraint.
Thus, all possible pairs of tethered domain families are obtained from the entire dataset of multidomain proteins. For a given domain family, the number of other families uniquely tethered to it is referred to as its tethering number. The tethering number of a domain family is therefore an indicator of the diversity of functional contexts in which that family is involved. Further analysis was carried out to understand various other attributes of domain families and their relation to tethering number. The results are summarized in the following points: 1) The distribution of tethering numbers across the entire dataset is highly heterogeneous. Nearly 88% of domain families (10783 out of 12249) have a tethering number of 10 or less, and only 78 domain families show more than 100 unique associations. Further analysis reveals a bias in the functions of families with high and low tethering numbers: domain families with high tethering numbers are involved in processes such as signaling and protein-protein interactions, whereas those with low tethering numbers are often involved in metabolic processes. 2) Differences are also observed between tethering numbers and the types of organisms in which the domain families occur. Typically, domain families with high tethering numbers are found across almost all kingdoms of life. In contrast, most domain families found exclusively in one kingdom have low tethering numbers. Furthermore, for the ubiquitous domain families with high tethering numbers, the number and type of associations are not strictly conserved across kingdoms. Thus, the tethering preferences of such domain families vary across kingdoms depending on their function.
For instance, the protein kinase domain family, a key regulator of signaling processes in eukaryotes, has a high tethering number in eukaryotes (270) and a low tethering number in prokaryotes (96). 3) The tethering number of a domain family is correlated with the number of members (population) of the family: a Pearson correlation coefficient of 0.78 (p ≤ 0.001) is obtained between tethering number and population. 4) Tethering numbers are also well correlated with sequence and functional diversity within families. Thus, domain families with high tethering numbers comprise members showing diversity in both sequence and function. The work presented in the second chapter therefore provides a framework for understanding the tethering preferences of domain families. The use of tethered domain families to identify functional associations amongst proteins is the central theme of the third and fourth chapters. The use of tethered domain families for predicting functionally interacting proteins originates from the “Rosetta stone” approach, proposed in 1999 by Ouzounis and coworkers and by Eisenberg and coworkers. The Rosetta stone approach demonstrated the use of fused genes in predicting functional interactions. It stems from the observation that genes encoding proteins that act in the same metabolic pathway in one organism are often found fused in another organism. Thus, enumeration of fused genes in a template database can provide a good basis for predicting functionally interacting proteins in target organisms in which the homologous genes are not fused. The method has been shown by others to work quite effectively in prokaryotes, especially in identifying interactions between metabolic proteins.
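The definition of tethering number given above can be stated compactly in code. The sketch below is a minimal illustration, assuming each multidomain protein is given as a list of domain family identifiers (for example, from a domain assignment step not shown); duplicate domains within a protein count once, and a family that never co-occurs with another family gets a tethering number of 0. The architectures in the example are hypothetical toy data, not taken from the thesis.

```python
from itertools import combinations


def tethered_pairs(proteins: list[list[str]]) -> set[tuple[str, str]]:
    """All unordered pairs of distinct domain families that co-occur in at least one protein."""
    pairs: set[tuple[str, str]] = set()
    for domains in proteins:
        pairs.update(combinations(sorted(set(domains)), 2))
    return pairs


def tethering_numbers(proteins: list[list[str]]) -> dict[str, int]:
    """Number of distinct other families each family is tethered to."""
    partners: dict[str, set[str]] = {f: set() for domains in proteins for f in domains}
    for a, b in tethered_pairs(proteins):
        partners[a].add(b)
        partners[b].add(a)
    return {family: len(others) for family, others in partners.items()}


if __name__ == "__main__":
    # Hypothetical domain architectures (toy data, not from the thesis).
    architectures = [
        ["SH3", "SH2", "Pkinase"],
        ["SH3", "PH"],
        ["Pkinase"],
        ["SH2", "SH2", "PTPase"],
    ]
    print(tethering_numbers(architectures))
    # {'SH3': 3, 'SH2': 3, 'Pkinase': 2, 'PH': 1, 'PTPase': 1}
```

Tabulating these values over a full proteome dataset gives the kind of distribution summarized in point 1 above, and pairing them with family sizes gives the comparison in point 3.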
Chapter 3 of this thesis explores the idea of “Rosetta stones” at the level of domain families, treating tethered domain families as analogs of fused genes. In this analysis, tethered domain families derived from multidomain proteins comprise the template dataset. If members of two domain families that co-occur in a multidomain protein are found independently in two different proteins in the target organism, an interaction is predicted between these two proteins (the collection of such predicted interactions is henceforth referred to as the TEDIP database, for Tethered Domain-based Interaction Prediction). Care is taken that none of the proteins in the template dataset belongs to the target organisms. The entire analysis was conducted on 6 model organisms, which form the target dataset in which functional interactions between proteins are predicted. The effectiveness of tethered domain families in functional interaction prediction is compared with two other datasets: 1) all experimentally known interactions and 2) interactions predicted on the basis of homology to interacting domain families of known structure. Two questions are then addressed: 1) how effective is information on tethered domain families in predicting functional linkages amongst proteins operating in pathways in eukaryotic organisms? 2) what is the false positive rate of the predictions? The three datasets show very little overlap in their coverage of functional interactions, which is largely attributed to insufficient sampling and the inherent biases of each method. Across the six organisms, the TEDIP datasets yielded on average three-fold more functional interaction predictions in cellular pathways than the other two datasets. Nearly 90% of the interactions predicted from tethered domain families are between proteins in different pathways. 
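The prediction rule above can be sketched in a few lines. This is a minimal illustration, not the TEDIP implementation: the function name `predict_tedip`, the input layout (tethered family pairs plus a dict from target protein to its domain families) and the reading of "found independently" as "each protein carries one family of the pair but not the other" are all assumptions.

```python
from collections import defaultdict


def predict_tedip(tethered_pairs, target_proteome):
    """Predict functional links in a target organism from tethered domain pairs.

    tethered_pairs: iterable of (family_a, family_b) pairs seen together in
        template multidomain proteins (template proteins must not come from
        the target organism).
    target_proteome: dict mapping target protein ID -> set of domain families.
    Returns a set of sorted (protein_1, protein_2) tuples.
    """
    proteins_with = defaultdict(set)  # domain family -> target proteins carrying it
    for protein, families in target_proteome.items():
        for fam in families:
            proteins_with[fam].add(protein)

    predictions = set()
    for fam_a, fam_b in tethered_pairs:
        for p in proteins_with.get(fam_a, ()):
            for q in proteins_with.get(fam_b, ()):
                if p == q:
                    continue
                # Assumed reading of "found independently": p carries fam_a but
                # not fam_b, and q carries fam_b but not fam_a.
                if fam_b in target_proteome[p] or fam_a in target_proteome[q]:
                    continue
                predictions.add(tuple(sorted((p, q))))
    return predictions


# Hypothetical template pairs and target proteome.
tethered = [("Pkinase", "SH2"), ("SH2", "SH3")]
target = {
    "P1": {"Pkinase"},
    "P2": {"SH2"},
    "P3": {"SH3"},
    "P4": {"Pkinase", "SH2"},
}
print(sorted(predict_tedip(tethered, target)))
# [('P1', 'P2'), ('P2', 'P3'), ('P3', 'P4')]
```

Indexing target proteins by domain family keeps the search proportional to the number of family members rather than to all protein pairs in the proteome.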
In yeast, more than 60% of these cross-pathway interactions overlap with a recent large-scale genetic interaction screen based on synthetic lethality, performed specifically for metabolic proteins, which establishes the effectiveness of this approach in understanding pathway crosstalk. In addition to efficacy in identifying functional interactions, the predictions were assessed for co-localization, co-expression and overall functional similarity based on Gene Ontology (GO) terms. Both the TEDIP predictions and the experimentally found interactions show poor correspondence with co-expression and co-localization data (10% and 20%, respectively, for the two methods). Additionally, functional similarity between predicted interacting proteins in the TEDIP dataset is low (5%), comparable to experimentally known interactions, which show 10% similarity in function based on a scoring function for GO term similarity. Chapter 3 concludes that tethered domain families are effective for the exhaustive enumeration of functionally associated proteins. However, the low co-expression and functional similarity measures are a cause for concern. On the one hand, co-expression and GO functional similarity have been found to be weak predictors of functional interactions, which explains the low values obtained for both the TEDIP datasets and the experimentally known interactions. On the other hand, the poorer values of the TEDIP predictions suggest that prediction accuracy can be improved further. Chapter 4 explores the use of machine learning to improve the accuracy of functional interaction prediction based on the TEDIP dataset; two distinct machine learning approaches were employed on a training dataset derived exclusively from yeast. 
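The GO scoring function used in the thesis is not specified in this abstract. As a swapped-in stand-in, the sketch below uses the Jaccard index of two proteins' GO term sets to estimate what fraction of predicted pairs are functionally similar; the threshold of 0.5, the helper names and the annotations are illustrative assumptions.

```python
def jaccard(terms_a, terms_b):
    """Jaccard index of two GO term sets (0.0 when both are empty)."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fraction_similar(pairs, go_terms, threshold=0.5):
    """Fraction of protein pairs whose GO term sets have Jaccard >= threshold.

    go_terms: dict protein -> set of GO IDs; unannotated proteins count as
    dissimilar to everything.
    """
    if not pairs:
        return 0.0
    hits = sum(
        jaccard(go_terms.get(p, ()), go_terms.get(q, ())) >= threshold
        for p, q in pairs
    )
    return hits / len(pairs)


# Illustrative annotations (GO:0006096 glycolytic process, GO:0005829 cytosol,
# GO:0006099 tricarboxylic acid cycle); P4 is deliberately unannotated.
go_terms = {
    "P1": {"GO:0006096", "GO:0005829"},
    "P2": {"GO:0006096", "GO:0005829"},
    "P3": {"GO:0006099"},
}
pairs = [("P1", "P2"), ("P1", "P3"), ("P3", "P4")]
print(fraction_similar(pairs, go_terms))
```

Semantic similarity measures that exploit the GO hierarchy would score related but non-identical terms more generously than this set-overlap measure.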
Since the objective of the work is to improve the accuracy of functional interaction prediction, GO-based functional similarity measures were used to define positive and negative datasets. In the training dataset, positive interactions comprise protein pairs with high GO functional similarity as defined in Chapter 3, 10% of which overlap with experimentally known interactions; the negative dataset consists of protein pairs with no or insignificant functional similarity that additionally show no similarity to any experimentally known interaction. Two machine learning approaches, Support vector machine (SVM) and Random forest, were applied to this training dataset; using two distinct approaches helps offset the weaknesses, if any, of either method. Fourteen carefully chosen features were used during training to help distinguish potentially correct predictions from incorrect ones. Some of these features quantify the extent of similarity between the template proteins containing the fused domain families and the target protein pairs predicted to interact. The analysis also incorporates graph-theoretic parameters derived from a domain family graph, in which each domain family that forms part of a multidomain protein is a node and an edge connects two domain families that co-occur in at least one multidomain protein. Graph-theoretic parameters such as clustering coefficient, degree and topological overlap were employed. These help to down-weight domain family pairs with a large number of associations, which are expected to be functionally promiscuous, and also to identify domain family pairs that are functionally related. 
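The graph-based features can be illustrated on a toy domain family graph. This sketch assumes an undirected graph built from tethered family pairs (single letters stand in for hypothetical families) and uses common textbook definitions of degree, local clustering coefficient and topological overlap; the thesis may normalize or combine them differently.

```python
from collections import defaultdict


def build_graph(tethered_pairs):
    """Undirected domain family graph: node = family, edge = co-occurrence."""
    adj = defaultdict(set)
    for a, b in tethered_pairs:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def degree(adj, node):
    return len(adj[node])


def clustering_coefficient(adj, node):
    """Fraction of possible edges among the node's neighbours that exist."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
    return 2 * links / (k * (k - 1))


def topological_overlap(adj, i, j):
    """TO = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij).

    l_ij is the number of shared neighbours, a_ij is 1 if i and j are directly
    linked (else 0), and k is node degree. The denominator is always >= 1.
    """
    a_ij = 1 if j in adj[i] else 0
    shared = len(adj[i] & adj[j])
    return (shared + a_ij) / (min(len(adj[i]), len(adj[j])) + 1 - a_ij)


adj = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
for fam in sorted(adj):
    print(fam, degree(adj, fam), round(clustering_coefficient(adj, fam), 3))
print(topological_overlap(adj, "A", "D"))  # 0.5
```

A hub family such as C has a high degree but a low clustering coefficient, which is the kind of signal that can be used to down-weight promiscuous families.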
In addition to the similarity-based and graph-theoretic features, coevolution and phylogenetic profiling of tethered domain families are also used to identify functionally related domain family pairs. Using all these features in training, the machine learning approaches yielded an accuracy of 94% with SVM and 92% with Random forest on the training data. Furthermore, the importance of the features was assessed by principal component analysis, by training both SVM and Random forest with one feature removed at a time, and by quantifying the sensitivity obtained using only one feature. All of these analyses suggest that the features provide non-redundant information and contribute significantly to the classification. The resulting models were finally applied to all the predicted functional interactions in yeast after removal of the training dataset. The fraction of interactions classified as true positives was 56% using SVM and 63% using Random forest, with around 80% of the interactions common to the two methods. These interactions were then assigned a confidence score using support vector regression, which provides a probabilistic measure for the SVM classification. Based on a cutoff of 0.5, a total of 62455 interactions were termed high-confidence interactions, and further analysis was carried out on them. Of these, 2855 interactions have both predicted partners associated with a pathway in the KEGG database, and in-depth case studies were performed on this set of 2855 interactions. Literature mining suggests that many known cross-pathway interactions, such as those between the TCA cycle and glycolysis, are captured as high-confidence interactions in the TEDIP dataset. A few other case studies of high-confidence interactions with supporting literature evidence are also presented in the chapter. 
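The post-classification filtering steps (confidence cutoff, then KEGG pathway mapping for both partners) amount to a simple filter. In this sketch the confidence scores are assumed to be already computed, the cutoff is applied inclusively (score >= 0.5, an assumption), and the function name and KEGG identifiers are illustrative.

```python
def high_confidence_pathway_pairs(scored, pathways, cutoff=0.5):
    """Filter scored predictions by confidence and pathway membership.

    scored: iterable of (protein_a, protein_b, confidence) tuples.
    pathways: dict protein -> set of KEGG pathway IDs.
    Returns three lists: pairs with confidence >= cutoff, the subset where both
    proteins map to at least one pathway, and the subset of those that share
    no pathway (candidate cross-pathway links).
    """
    high = [(a, b, s) for a, b, s in scored if s >= cutoff]
    mapped = [(a, b, s) for a, b, s in high if pathways.get(a) and pathways.get(b)]
    cross = [(a, b, s) for a, b, s in mapped if not (pathways[a] & pathways[b])]
    return high, mapped, cross


# Hypothetical scores; sce00010 and sce00020 are the yeast KEGG maps for
# glycolysis/gluconeogenesis and the citrate (TCA) cycle.
scored = [("A", "B", 0.9), ("A", "C", 0.4), ("B", "D", 0.7), ("C", "D", 0.5)]
pathways = {"A": {"sce00010"}, "B": {"sce00020"}, "C": {"sce00010"}, "D": set()}
high, mapped, cross = high_confidence_pathway_pairs(scored, pathways)
print(len(high), len(mapped), len(cross))
```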
The high-confidence predictions could further aid the experimental characterization of cross-talk between important metabolic and signaling pathways. So far, the thesis has discussed analyses of functional interactions and their prediction. The subsequent chapters discuss analyses of two different types of functional interactions. Chapters 5 and 6 analyze metabolic proteins in diverse pathways of the pathogenic organism Plasmodium falciparum: Chapter 5 attempts to improve the coverage of the repertoire of metabolic proteins in P. falciparum, while Chapter 6 deciphers and discusses the interactions and pathways prevalent at different stages of the parasite's life cycle. Apart from functionally interacting proteins in metabolic pathways, physically and transiently interacting proteins are analyzed and discussed in Chapters 7 and 8. In Chapter 5, metabolic proteins participating in pathways in Plasmodium falciparum are analyzed. P. falciparum is the causative agent of malaria, a disease that affects large populations in the subtropical regions. The P. falciparum genome is atypical: it is rich in adenine/thymine (A/T) base pairs, and its protein-coding regions encode large stretches of amino acid repeats. Various sequence-related features of P. falciparum proteins show extensive divergence when compared with those of other organisms. All of this makes reliable function prediction by homology to proteins of known function daunting. Like other P. falciparum proteins, metabolic proteins have diverged significantly from their functional counterparts in model eukaryotes such as yeast. Metabolic pathways play an important role in the survival of the organism and are hence amenable to the identification of proteins susceptible to drugs, thereby helping to combat pathogenesis. 
Chapter 5 of the thesis aims to further knowledge of metabolic proteins by first quantifying the extent of divergence observed in already characterized metabolic proteins. This knowledge is then used to identify potential metabolic proteins that have not been identified as pathway proteins by other annotation efforts for P. falciparum. In the first part of the chapter, the extent of sequence divergence of P. falciparum metabolic proteins is determined by comparing them with their functional counterparts from 34 completely sequenced unicellular eukaryotic organisms. A comparison of domain architectures reveals that in nearly 54% of metabolic pathways, P. falciparum proteins have nearly the same domain architecture as their functional counterparts; inversion, deletion and duplication of domains are observed in the remaining proteins. Further analysis reveals that P. falciparum proteins are longer than their functional counterparts. In nearly 15% of cases, the domains are characterized by large non-conserved or Plasmodium genus-specific inserts within the domain-assigned regions. Unassigned regions are also more prevalent at the N- and C-termini of P. falciparum proteins than in their functional counterparts. Finally, metabolic proteins of P. falciparum show significantly low sequence similarity to their functional counterparts. This analysis clearly shows that the metabolic proteins of P. falciparum have diverged significantly from those of other organisms, making function prediction by homology very difficult. Based on experimental analysis, several steps in P. falciparum metabolic pathways are expected to be active. 
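The domain architecture comparison can be made concrete with a small classifier. The rules below are a simplified, hypothetical scheme for labeling one architecture against one functional counterpart; the thesis's exact criteria for inversion, deletion and duplication are not given in this abstract, and the single-letter families are placeholders.

```python
from collections import Counter


def compare_architectures(query, reference):
    """Classify how a query domain architecture differs from a reference one.

    Architectures are ordered lists of domain family IDs (N- to C-terminus).
    Simplified, illustrative categories:
      "same"        identical order and content
      "inversion"   same domains and copy numbers, different order
      "duplication" same set of families, different copy numbers
      "deletion"    query lacks some reference families and adds none
      "other"       anything else (e.g. families absent from the reference)
    """
    if list(query) == list(reference):
        return "same"
    if Counter(query) == Counter(reference):
        return "inversion"
    q, r = set(query), set(reference)
    if q == r:
        return "duplication"
    if q < r:
        return "deletion"
    return "other"


for query in (["A", "B"], ["B", "A"], ["A", "A", "B"], ["A"], ["C"]):
    print(query, compare_architectures(query, ["A", "B"]))
```

Because only family order and copy number are compared, insertions of non-conserved segments inside a domain would need a separate, residue-level check.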
Some of the proteins expected to carry out experimentally supported steps in P. falciparum metabolic pathways have not yet been identified. One reason for this apparent incompleteness is the high divergence of P. falciparum metabolic proteins. To overcome this limitation, the second part of the chapter uses MulPSSM, a sensitive domain family assignment approach developed in-house, to identify proteins potentially involved in metabolic pathways. The approach is based on reverse PSI-BLAST, in which multiple sequence profiles for each family are used in searches against sequence databases, and it has been shown to perform better than or on par with other remote homology detection procedures. Using this approach, 15 P. falciparum proteins were identified that can potentially function as metabolic proteins and had not previously been characterized in P. falciparum. All the identified proteins show low sequence similarity to other well-characterized proteins and contain significant fractions of unassigned regions, making function recognition non-trivial. Supporting literature and other data are provided to demonstrate the robustness of the homology-based annotation of the identified pathway proteins. Chapter 6 analyzes the dynamic changes occurring in the metabolic network of P. falciparum during its life cycle. This chapter integrates and analyzes two aspects of P. falciparum metabolic proteins: first, protein-protein interactions derived from experimental studies, and second, microarray datasets providing information on stage-specific expression of the P. falciparum genes encoding the metabolic proteins. As a first step, protein-protein interaction information for the metabolic proteins was gathered, yielding a total of 810 interactions in which one or both proteins are involved in a pathway. 
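MulPSSM itself is an in-house tool and is not reproduced here. Assuming its reverse PSI-BLAST searches are run with BLAST+ and produce the default 12-column tabular output (`-outfmt 6`), assigning each query its best-scoring domain family under an E-value cutoff could look like the sketch below; the cutoff of 1e-4, the function name and the profile-to-family mapping are illustrative assumptions.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_family_hits(tabular_text, family_of, max_evalue=1e-4):
    """Return {query_id: (family, evalue)} keeping the best hit per query.

    tabular_text: BLAST+ -outfmt 6 output of queries searched against profiles.
    family_of: dict mapping profile (subject) ID -> domain family ID; unmapped
        profiles fall back to their own ID.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        # Skip blank lines, comment lines and truncated rows.
        if not row or row[0].startswith("#") or len(row) < len(FIELDS):
            continue
        hit = dict(zip(FIELDS, row))
        evalue = float(hit["evalue"])
        if evalue > max_evalue:
            continue
        family = family_of.get(hit["sseqid"], hit["sseqid"])
        query = hit["qseqid"]
        if query not in best or evalue < best[query][1]:
            best[query] = (family, evalue)
    return best
```

Keeping only the single best family per query is a simplification; a multidomain query would normally retain non-overlapping hits along its length.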
The 810 P. falciparum interactions were then compared with 14070 interactions involving metabolic proteins from yeast, a free-living, non-pathogenic unicellular eukaryote. The comparison shows a wide discrepancy between the two organisms in the number of proteins involved in interactions and in the pathways in which they participate. Of the 810 interactions in P. falciparum, 173 are unique to Plasmodium, in that one or both proteins have no identifiable homolog in yeast. Insufficient sampling of interactions in P. falciparum compared with yeast is one reason for the observed discrepancy; however, differences arising from the parasitic lifestyle of P. falciparum could also contribute. Further analysis of the protein-protein interactions of the metabolic proteins revealed that a large fraction of interactions occur between a metabolic protein and a non-metabolic protein, for instance the interaction between the glycolytic protein phosphoglycerate kinase and MAP kinase. This trend is observed in both Plasmodium and yeast, where 65% and 77% of the interactions, respectively, involve proteins not directly participating in metabolic pathways. Interactions between proteins belonging to different pathways and, lastly, interactions between proteins in the same pathway were also uncovered. All of these interactions depict the different modes by which metabolic pathways are regulated through protein-protein interactions. Another aspect explored in this analysis is the stage-specific expression of genes encoding these metabolic proteins. This is especially relevant for the parasite because its entire life cycle is divided into seven distinct stages. Integrating the protein-protein interactions with the gene expression data showed that the trophozoite, schizont and gametocyte stages have large fractions of co-expressed genes encoding interacting proteins within metabolic pathways. 
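The three interaction categories and the stage-wise co-expression counts can be sketched as follows. Pathway memberships, gene names and expression values are invented for illustration, and "expressed in a stage" is assumed to mean an expression level at or above a chosen threshold; the thesis's own co-expression criterion may differ.

```python
from collections import Counter


def classify_interaction(p, q, pathways):
    """Classify a protein pair by pathway membership.

    pathways: dict protein -> set of pathway names; a protein that is absent
    or maps to an empty set is treated as non-metabolic.
    """
    pa, pb = pathways.get(p, set()), pathways.get(q, set())
    if not pa and not pb:
        return "non-metabolic"
    if not pa or not pb:
        return "metabolic-non-metabolic"
    # A shared pathway takes precedence over any additional, different ones.
    return "same pathway" if pa & pb else "cross-pathway"


def stagewise_coexpression(interactions, expression, threshold):
    """Fraction of interacting pairs whose genes are both expressed, per stage.

    expression: dict gene -> {stage: expression level}; a gene counts as
    expressed in a stage when its level is >= threshold (missing = not expressed).
    """
    stages = sorted({s for levels in expression.values() for s in levels})
    fractions = {}
    for stage in stages:
        both = sum(
            1
            for a, b in interactions
            if expression.get(a, {}).get(stage, float("-inf")) >= threshold
            and expression.get(b, {}).get(stage, float("-inf")) >= threshold
        )
        fractions[stage] = both / len(interactions) if interactions else 0.0
    return fractions


# Hypothetical labels: PGK/PYK glycolytic, CS in the TCA cycle, MAPK non-metabolic.
pathways = {"PGK": {"glycolysis"}, "PYK": {"glycolysis"}, "CS": {"TCA cycle"}}
pairs = [("PGK", "MAPK"), ("PGK", "PYK"), ("PGK", "CS")]
print(Counter(classify_interaction(p, q, pathways) for p, q in pairs))
```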
The high preponderance of co-expressed genes encoding interacting protein pairs in the trophozoite, schizont and gametocyte stages is also consistent with the metabolic requirements of Plasmodium in the various stages. The glycolytic pathway is central to energy production in the parasite and is discussed at length in this chapter. Members of this pathway interact with other glycolytic proteins (9 interactions), with proteins involved in other pathways (30 interactions) and with proteins not directly involved in any metabolic pathway (75 interactions). Nearly 70% of the interactions made by the glycolytic proteins involve proteins whose genes are co-expressed across the various stages. Integrating gene expression data with protein-protein interaction information for metabolic pathways such as glycolysis thus highlights the complex regulation underlying this pathway. The analysis in this chapter emphasizes the intricacies involved in the regulation of metabolic proteins in P. falciparum. Chapter 7 describes an in-depth analysis of the basis of interaction specificity between small monomeric GTPases and their regulators, the guanine nucleotide exchange factors (GEFs). Monomeric GTPases bind guanine nucleotides and can bind both GTP and GDP. However, the transition from the GDP-bound to the GTP-bound form involves large conformational changes and requires binding of the GEFs. The conformational changes arising from nucleotide exchange are required for the GTPases to bind their various effectors. The analysis in Chapter 7 considers GTPases belonging to the Ras superfamily, which is subdivided into 5 distinct families based on their functions: Ras, Ran, Rab, Arf and Rho. 
Members of each of these families are involved in a wide array of cellular processes such as signaling and cytoskeletal remodeling. Members of each GTPase family bind structurally distinct GEFs, and in some cases multiple GEFs mediate nucleotide exchange within a family. It is therefore intriguing to understand how GTPases belonging to the same structural family maintain specificity across highly dissimilar GEFs, and this forms the main objective of the analysis. So far, 13 distinct complexes between GTPases and their cognate GEFs have been solved by X-ray crystallography, and this set of structural complexes forms the starting point of the analysis. As a first step, the interfaces of various pairs of complex structures were compared structurally. These comparisons show that most of the interfaces in the GTPase-GEF complexes comprise residue positions that are not topologically equivalent, suggesting different modes of binding across these complexes. Further analysis probed the extent of specificity underlying these complexes by determining interface residues that are conserved in a family-specific manner. Such residue positions were obtained using Contrast Hierarchical Alignment and Interaction Network (CHAIN), a statistically robust algorithm that extracts the sequence patterns that most distinguish two sets of homologous sequences. The analysis indicated the presence of family-specific residues at the GTPase-GEF interface, which could be implicated in maintaining the specific interactions between GTPases and GEFs. The robustness of this specificity was further interrogated by seeking an energetic basis for the interactions mediated by cognate GTPases and GEFs, and by understanding how crosstalk is prevented across non-cognate complexes. 
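CHAIN uses its own statistical model, which is not reproduced here. To show the underlying idea of a family-specific position, the sketch below applies a deliberately simple frequency contrast (a swapped-in heuristic, not CHAIN): a column is flagged when one residue dominates the foreground family but is rare in the background homologs. The thresholds, function name and toy alignments are arbitrary assumptions.

```python
from collections import Counter


def family_specific_columns(foreground, background, min_fg=0.8, max_bg=0.2):
    """Find alignment columns conserved in one family but not in its homologs.

    foreground/background: lists of equal-length aligned sequences ('-' = gap).
    A column is reported when the most common residue in the foreground has
    frequency >= min_fg and that same residue has frequency <= max_bg in the
    background. Returns a list of (column_index, residue) tuples.
    """
    hits = []
    for i in range(len(foreground[0])):
        fg = Counter(seq[i] for seq in foreground)
        residue, count = fg.most_common(1)[0]
        if residue == "-" or count / len(foreground) < min_fg:
            continue
        bg_freq = sum(seq[i] == residue for seq in background) / len(background)
        if bg_freq <= max_bg:
            hits.append((i, residue))
    return hits


# Toy interface segments: foreground family vs. homologs from other families.
fg = ["KDEL", "KDEA", "KDQL"]
bg = ["KAEL", "KAEL", "RAEL"]
print(family_specific_columns(fg, bg))
```

Column 0 is conserved in both sets and therefore not family-specific; only column 1, where D is fixed in the foreground and absent from the background, is reported.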
For each of the 13 cognate complexes, empirical interaction energies were estimated using FoldX. The interaction energy is compared with that of non-cognate complexes, obtained by replacing the interface residues of the cognate GTPase with those of a non-cognate GTPase. For most complexes, the interaction energies of the cognate complexes are much lower than those of the non-cognate complexes. Energy values of the non-cognate complexes usually indicate reduced stability, which would preclude such interactions from occurring. Such large energy differences between cognate and non-cognate interactions arise from drastic substitutions at the interface patch, owing to differences in charge or other stereochemical properties of the amino acids. Both the evolutionary and the energy-based analyses indicate the presence and importance of a few family-specific residues in the cognate complexes, and the presence of unfavorable residues in the non-cognate complexes, which prevents crosstalk. However, apart from changes at the interfaces, many positions outside the interface also vary across homologs within the same GTPase family or subfamily. Coevolutionary analysis of GTPases and GEFs from multiple eukaryotic organisms showed that most of the coevolving positions in these complexes are not found at the interface; many of them lie near the active site or near the interface. Identifying such coevolving positions, where residue variations in the GTPase are strongly coupled to those in the GEF, may provide initial clues to the allosteric path connecting GEF binding to the large structural changes observed during nucleotide exchange in GTPases. Thus, the analysis provides a comprehensive framework for understanding how interaction specificity has evolved in GTPase-GEF complexes. 
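The abstract does not state which coevolution measure was used. One common choice is mutual information between paired alignment columns, where each row holds the GTPase and the GEF from the same organism. The sketch below computes it without any correction for phylogeny or low counts, which a real analysis would need; the function name and toy columns are illustrative.

```python
import math
from collections import Counter


def mutual_information(col_x, col_y):
    """Mutual information (natural log) between two paired alignment columns.

    col_x[k] and col_y[k] are residues of the GTPase and its GEF from the same
    organism k; rows with a gap in either column are skipped.
    """
    pairs = [(x, y) for x, y in zip(col_x, col_y) if x != "-" and y != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        mi += pxy * math.log(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


# Perfectly covarying columns (K<->E, R<->D) vs. a constant GTPase column.
print(mutual_information("KKRR", "EEDD"))  # equals log(2)
print(mutual_information("KKKK", "EDED"))  # 0.0: no variation to couple
```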
Chapter 8 discusses another example of transient protein-protein interaction, between proteins implicated in signaling in Dictyostelium discoideum. The work reported in this chapter was carried out in collaboration with Prof. Nanjundaiah and coworkers from the Molecular Reproduction and Developmental Genetics department, Indian Institute of Science. All the experimental analyses in this chapter were carried out by Prof. Nanjundaiah and coworkers, and the author carried out all the computational analyses. Experimental analysis indicated that a ribosomal protein S4 in D. discoideum mediates interactions with CDC24 and CDC42. The protein is speculated to be a functional analog of the yeast scaffolding protein Bem1. However, the structural and sequence features that allow the protein to perform its non-ribosomal function as a scaffold mediating protein-protein interactions are not clearly understood. A 3-D structure of the C-terminal region of D. discoideum protein S4 was generated by structural modeling. The modeled structure, like the template used for modeling, adopts the SH3 domain fold, which has been shown to be involved in protein-protein interactions. Structural and sequence analyses were carried out to evaluate the potential mode by which this protein could mediate interactions, and the resulting hypothesis was further corroborated by experimental analysis. Thus, both experimental and computational analyses provide evidence for the functional role of ribosomal protein S4 from Dictyostelium discoideum as a scaffold. Chapter 9 summarizes the conclusions reached in the various chapters of the thesis. The thesis embodies analyses probing various aspects of functional interactions between proteins. A framework has been provided to elucidate functional interactions using tethered domain families in multidomain proteins. 
Further, the role of these functional interactions has been explored in different scenarios by exhaustively analyzing metabolic proteins and their regulation in the pathogenic organism Plasmodium falciparum, and by analyzing two distinct types of transient protein-protein interactions.
530

Plasmodium falciparum Histidine-rich Protein 2 Gene Variation and Malaria Detection in Madagascar and Papua New Guinea

Willie, Nigani 04 June 2018
No description available.
