Global ETD Search

11	Accessing complex genomic variation in Plasmodium falciparum natural infections Wendler, Jason Patrick January 2015 Genetic polymorphism in Plasmodium falciparum is a considerable obstacle to malaria intervention. Parasites have repeatedly evolved to overcome every front-line antimalarial deployed throughout history, and artemisinin resistant populations are expanding in Southeast Asia. Promising vaccine candidates routinely fail when challenged by the genetic diversity of natural parasite populations, and a recent trial using a blood-stage antigen showed immunity was allele specific. Modern sequencing technologies have revolutionized our understanding of parasite genomics and population genetics by providing access to single nucleotide variation, but characterizing more complex polymorphism remains a key challenge. Solving this problem is important because the selective pressures from drugs and host immunity often create complex polymorphism in the most clinically relevant genes that is missed using standard genotyping methods. In three sections, this thesis is a narrative about 1) encountering complex variation, 2) overcoming it with novel tools, and then 3) innovatively applying those tools to old and new questions. I first show examples of complex variation in a vaccine candidate (EBA-175) and a drug resistance gene (pfcrt) while reporting SNP based analyses of Kenyan and Tanzanian field isolates. While introducing this complex variation I also describe biological insights discovered in these populations. In Kenya I show evidence that chloroquine resistance selects for parasites that are primaquine sensitive, use a GWAS approach to discover new drug resistance loci, and catalogue variation in known resistance genes. In Tanzania I describe the population structure and allele frequencies of parasites from two geographic regions. 
In the second section of the thesis I develop methods for accessing complex variation and demonstrate their utility by producing de novo assemblies of eba-175, pfcrt, ama1, and msp3.4 from thousands of sequenced samples. Finally, in the third section I apply these tools in depth to eba-175. I comprehensively characterize the SNP and structural variation in eba-175 using an alignment of 1419 de novo assemblies. I use this resource to illustrate the profiles of positive selection across the gene, and corroborate these signals of balancing selection by showing the geographic distribution of the F/C indels and a lesser known 6bp indel positioned between the DBL domains. I then use the alignments to design Sequenom genotyping assays that facilitate a genome wide association study, testing for human associations with the eba-175 indels in the infecting parasite. I close by reporting a potential association on human chromosome 14 with the 6bp indel in eba-175. 616.9
12	The development of rapid genotyping methods for methicillin-resistant Staphylococcus aureus Stephens, Alex J. January 2008 Methicillin-resistant Staphylococcus aureus (MRSA) is an important human pathogen that is endemic in hospitals all over the world. It has more recently emerged as a serious threat to the general public in the form of community-acquired MRSA. MRSA has been implicated in a wide variety of diseases, ranging from skin infections and food poisoning to more severe and potentially fatal conditions, including endocarditis, septicaemia and necrotising pneumonia. Treatment of MRSA disease is complicated and can be unsuccessful due to the bacterium's remarkable ability to develop antibiotic resistance. The considerable economic and public health burden imposed by MRSA has fuelled attempts by researchers to understand the evolution of virulent and antibiotic resistant strains and thereby improve epidemiological management strategies. Central to MRSA transmission management strategies is the implementation of active surveillance programs, via which unique genetic fingerprints, or genotypes, of each strain can be identified. Despite numerous advances in MRSA genotyping methodology, there remains a need for a rapid, reproducible, cost-effective method that is capable of producing a high level of genotype discrimination, whilst being suitable for high throughput use. Consequently, the fundamental aim of this thesis was to develop a novel MRSA genotyping strategy incorporating these benefits. This thesis explored the possibility that the development of more efficient genotyping strategies could be achieved through careful identification, and then simple interrogation, of multiple, unlinked DNA loci that exhibit progressively increasing mutation rates. The baseline component of the MRSA genotyping strategy described in this thesis is the allele-specific real-time PCR interrogation of slowly evolving core single nucleotide polymorphisms (SNPs). 
The genotyping SNP set was identified previously from the Multi-locus sequence typing (MLST) sequence database using an in-house software package named Minimum SNPs. As discussed in Chapter Three, the genotyping utility of the SNP set was validated on 107 diverse Australian MRSA isolates, which were largely clustered into groups of related strains as defined by MLST. To increase the resolution of the SNP genotyping method, a selection of binary virulence genes and antimicrobial resistance plasmids was tested and proved successful at subtyping the SNP groups. A comprehensive MRSA genotyping strategy requires characterisation of the clonal background as well as interrogation of the hypervariable Staphylococcal Cassette Chromosome mec (SCCmec) that carries the β-lactam resistance gene, mecA. SCCmec genotyping defines the MRSA lineages; however, current SCCmec genotyping methods have struggled to handle the increasing number of SCCmec elements resulting from a recent explosion of comparative genomic analyses. Chapter Four of this thesis collates the known SCCmec binary marker diversity and demonstrates the ability of Minimum SNPs to identify systematically a minimal set of binary markers capable of generating maximum genotyping resolution. A number of binary targets were identified that indeed permit high resolution genotyping of the SCCmec element. Furthermore, the SCCmec genotyping targets are amenable to combinatorial use with the MLST genotyping SNPs and therefore are suitable as the second component of the MRSA genotyping strategy. To increase genotyping resolution of the slowly evolving MLST SNPs and the SCCmec binary markers, the analysis of a hypervariable repeat region was required. Sequence analysis of the Staphylococcal protein A (spa) repeat region has been conducted frequently with great success. Chapter Five describes the characterisation of the tandem repeats in the spa gene using real-time PCR and high resolution melting (HRM) analysis. 
Since the melting rate and precise point of dissociation of double stranded DNA are dependent on the size and sequence of the PCR amplicon, the HRM method was used successfully to identify 20 of 22 spa sequence types, without the need for DNA sequencing. The accumulation of comparative genomic information has allowed the systematic identification of key MRSA genomic polymorphisms to genotype MRSA efficiently. If implemented in its entirety, the strategy described in this thesis would produce efficient and deep-rooted genotypes. For example, an unknown MRSA isolate would be positioned within the MLST defined population structure, categorised based on its SCCmec lineage, then subtyped based on the polymorphic spa repeat region. Overall, combining the genotyping methods described here yields an integrated and novel MRSA genotyping strategy that is efficacious for both long and short term investigations. Furthermore, an additional benefit is that each component can be performed easily and cost-effectively on a standard real-time PCR platform.
13	Proteogenomics for personalised molecular profiling Schlaffner, Christoph Norbert January 2018 Technological advancements in mass spectrometry allowing quantification of almost complete proteomes make proteomics a key platform for generating unique functional molecular data. Furthermore, the integrative analysis of genomic and proteomic data, termed proteogenomics, has emerged as a new field revealing insights into gene expression regulation, cell signalling, and disease processes. However, the lack of software tools for high-throughput integration and unbiased modification and variant detection hinders efforts for large-scale proteogenomics studies. The main objectives of this work are to address these issues by developing and applying new software tools and data analysis methods. Firstly, I address mapping of peptide sequences to reference genomes. I introduce a novel tool for high-throughput mapping and highlight its unique features facilitating quantitative and post-translational modification mapping alongside accounting for amino acid substitutions. The performance is benchmarked. Furthermore, I offer an additional tool that permits generation of web accessible hubs of genome wide mappings. To enable unbiased identification of post-translational modifications and amino acid substitutions for high resolution mass spectrometry data, I present algorithmic updates to the mass tolerant blind spectrum comparison tool ‘MS SMiV’. I demonstrate the applicability of the changes by benchmarking against a published mass tolerant database search of a high resolution tandem mass spectrometry dataset. I then present the application of ‘MS SMiV’ on a panel of 50 colorectal cancer cell lines. I show that the adaptation of ‘MS SMiV’ outperforms traditional sequence database based identification of single amino acid variants. 
Furthermore, I highlight the utility of mass tolerant spectrum matching in combination with isobaric labelled quantitative proteomics in distinguishing between post-translational modifications and amino acid variants of similar mass. In the last part of this work I integrate both tools with a high-throughput proteogenomic identification pipeline and apply it to a pilot study of chondrocytes derived from 12 osteoarthritic individuals. I show the value of this approach in identifying variation between individuals and molecular levels and highlight them with individual examples. I show that multi-plexed proteogenomics can be used to infer genotypes of individuals.
14	Database Support for 3D-Protein Data Set Analysis Lehner, Wolfgang, Hinneburg, Alexander 25 May 2022 The progress in genome research demands an adequate infrastructure to analyze the data sets. Database systems are a key technology for organizing data and speeding up the analysis process. This paper discusses the role of a relational database system in the context of the problem of finding frequent substructures in multi-dimensional protein databases. The specific problem consists of producing a set of association rules regarding frequent substructures with different lengths and gaps between the amino acid residues of a protein. From a database point of view, the process of finding association rules, which form the basis for a more in-depth analysis of the data material, is split into two parts. The first part performs a discretization of the conformational angle space of a single amino acid residue by computing the nearest neighbor of a given set of representatives. The second part consists of adapting a well-known association rule algorithm to determine the frequent substructures. Both steps within this comprehensive analysis task require substantial support from the underlying database in order to reduce the programming overhead at the application level. info:eu-repo/classification/ddc/005 ddc:005
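The two-part process described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' database implementation: the representative (phi, psi) pairs, the function names, and the restriction to contiguous runs without gaps are all assumptions made for illustration, and the second step only counts frequent runs (the support-counting stage of association rule mining) rather than deriving rules.

```python
import math
from collections import Counter


def angular_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_representative(angles, representatives):
    """Index of the representative closest to a (phi, psi) pair.

    Distance is Euclidean over the periodic angle differences.
    """
    phi, psi = angles
    return min(
        range(len(representatives)),
        key=lambda i: math.hypot(
            angular_diff(phi, representatives[i][0]),
            angular_diff(psi, representatives[i][1]),
        ),
    )


def frequent_substructures(proteins, representatives, length, min_support):
    """Keep contiguous label runs of a given length found in enough proteins."""
    counts = Counter()
    for residues in proteins:
        # Step 1: discretize each residue to its nearest representative.
        labels = [nearest_representative(r, representatives) for r in residues]
        # Step 2: collect each distinct run once per protein, so the count
        # is the number of proteins that contain the run (its support).
        runs = {tuple(labels[i:i + length])
                for i in range(len(labels) - length + 1)}
        counts.update(runs)
    return {run: n for run, n in counts.items() if n >= min_support}


# Hypothetical representatives and toy proteins, for illustration only.
REPRESENTATIVES = [(-60.0, -45.0), (-120.0, 130.0)]
PROTEINS = [
    [(-62.0, -41.0), (-58.0, -47.0), (-119.0, 128.0)],
    [(-65.0, -40.0), (-60.0, -50.0), (-57.0, -44.0)],
]
FREQUENT = frequent_substructures(PROTEINS, REPRESENTATIVES, length=2, min_support=2)
```

In the setting the abstract describes, both steps would be pushed into the relational database rather than run in application code; the sketch only shows the logic that such support would have to cover.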
15	Discovery and evolutionary dynamics of RBPs and circular RNAs in mammalian transcriptomes Badve, Abhijit 30 March 2015 Indiana University-Purdue University Indianapolis (IUPUI) / RNA-binding proteins (RBPs) are vital post-transcriptional regulatory molecules in the transcriptomes of mammalian species. Studying their expression dynamics is therefore necessary to understand how post-transcriptional networks work in various mammalian tissues. RBPs play important roles in controlling the post-transcriptional fate of RNA molecules, yet their evolutionary dynamics remain largely unknown. As expression profiles of genes encoding RBPs can yield insights about their evolutionary trajectories on the post-transcriptional regulatory networks across species, we performed a comparative analysis of RBP expression profiles across 8 tissues (brain, cerebellum, heart, lung, liver, lung, skeletal muscle, testis) in 11 mammals (human, chimpanzee, gorilla, orangutan, macaque, rat, mouse, platypus, opossum, cow) and chicken & frog (evolutionary outgroups). Notably, orthologous gene expression profiles suggest a significantly higher expression level for RBPs than for their non-RBP gene counterparts, which include other protein-coding and non-coding genes, across all the mammalian tissues studied here. This trend is significant irrespective of the tissue and species being compared, though RBP gene expression distribution patterns were found to be generally diverse in nature. Our analysis also shows that RBPs are expressed at a significantly lower level in human and mouse tissues compared to their expression levels in equivalent tissues in other mammals (chimpanzee, orangutan, rat, etc.), which are all likely exposed to diverse natural habitats and ecological settings, in contrast to the more stable ecological environment to which humans and mice might have been exposed, which would reduce the need for complex and extensive post-transcriptional control. 
Further analysis of the similarity of orthologous RBP expression profiles between all pairs of tissue-mammal combinations clearly showed the grouping of RBP expression profiles across tissues in a given mammal, in contrast to the clustering of expression profiles for non-RBPs, which frequently grouped equivalent tissues across diverse mammalian species together, suggesting a significant evolution of RBP expression after speciation events. Calculation of species specificity indices (SSIs) for RBPs across various tissues, to identify those whose expression is restricted to a few mammals, revealed that about 30% of the RBPs are species-specific in at least one tissue studied here, with lung, liver, kidney & testis exhibiting a significantly higher proportion of species-specifically expressed RBPs. We conducted a differential expression analysis of RBPs in human, mouse and chicken tissues to study the evolution of expression levels in recently evolved species (i.e., humans and mice) relative to an evolutionarily distant species (i.e., chicken). We identified more than 50% of the orthologous RBPs to be differentially expressed in at least one tissue when compared between human and mouse, but not between human and the outgroup chicken, in which RBP expression levels are relatively conserved. Among the studied tissues, brain, liver and kidney showed a higher fraction of differentially expressed RBPs, which may suggest hyper-regulatory activities by RBPs in these tissues with species evolution. Overall, this study forms a foundation for understanding the evolution of expression levels of RBPs in mammals, providing a snapshot of the wiring patterns of post-transcriptional regulatory networks in mammalian genomes. In our second study, we focused on elucidating novel features of post-transcriptional regulatory molecules called circRNAs from LongPolyA RNA-sequence data. 
The debate over the presence of nonlinear exon splicing, such as exon shuffling or the formation of circularized forms, has finally come to an end, as numerous reports have demonstrated its occurrence through transcriptomic analyses. It is evident from previous studies that, along with consensus-site splicing, non-consensus-site splicing occurs robustly in the cell. Also, despite the application of different high-throughput approaches (both computational and experimental) to determine their abundance, the signal is consistent and strongly conforms to the plausible circularization mechanisms. Earlier studies hypothesized, and hence focused on, ribo-minus non-polyA RNA-sequence data to identify circular RNA structures in the cell and compared their abundance levels with those of their linear counterparts. Thus far, studies show that circular RNAs are conserved across tissues and species, are not translated, preferentially lack a poly (A) tail, and are one to five exons long. Much of this initial work was performed using non-polyA sequencing and thus probably underestimates the abundance of circular RNAs originating from long poly (A) RNA isoforms. Our hypothesis is that if circular RNA events are not artifacts of random events but have a structured and defined mechanism of formation, then there should be no bias toward retaining or losing poly (A) tails when circularized isoforms form. We applied an existing computational pipeline from earlier studies by Memczak et al. to long poly (A) RNA-sequence data from ENCODE cell lines. With the same pipeline, we detect a significant number of circular RNA isoforms in the data, some of which overlap with known circular RNA isoforms from the literature. We identified and developed an approach to determine the precise structure of circular RNAs, which is not possible with the existing computational approaches. 
We aim to study their expression profiles in normal and cancer cell-lines, and see if there exists any pattern and functional significance based on their abundance levels in the cell. RNA-protein interactions -- Research Evolutionary genetics -- Research Genomics -- Research Gene expression -- Research Genetic regulation -- Research Computational biology -- Research Genetic translation -- Research Bioinformatics -- Research Genetic transcription -- Regulation RNA splicing Nucleotide sequence -- Research
16	Data-aware SOA for Gene Expression Analysis Processes Lehner, Wolfgang, Habich, Dirk, Richly, Sebastian, Assmann, Uwe, Grasselt, Mike, Maier, Albert, Pilarsky, Christian 11 May 2022 In the context of genome research, the method of gene expression analysis has been used for several years. Related microarray experiments are conducted all over the world, and consequently, a vast number of microarray data sets is produced. Having access to this variety of repositories, researchers would like to incorporate this data in their analysis processes to increase the statistical significance of their results. Such analysis processes are typical examples of data-intensive processes. In general, data-intensive processes are characterized by (i) a sequence of functional operations processing large amounts of data and (ii) the transportation and transformation of huge data sets between the functional operations. To support data-intensive processes, an efficient and scalable environment is required, since performance is a key factor today. The service-oriented architecture (SOA) is beneficial in this area with regard to process orchestration and execution. However, the current realization of SOA with Web services and BPEL has some drawbacks with regard to the performance of data propagation between Web services. Therefore, we present in this paper our data-aware service-oriented approach to efficiently support such data-intensive processes. info:eu-repo/classification/ddc/572.8 ddc:572.8
