Global ETD Search

1	Host-fungal pathogen interactions: A study of Candida albicans and mammalian macrophage and epithelial cells at the transcriptional level Delorey, Toni Marie 31 May 2019 It is estimated that fungal infections kill more than 1.6 million people annually, a number comparable to the number of deaths associated with tuberculosis. Candida species are the fourth leading cause of hospital-acquired blood infections, and Candida albicans is the most common cause of these fungal blood infections (known as candidemia). Immunocompromised individuals, such as those who have HIV/AIDS, those undergoing chemotherapy, or those on broad-spectrum antibiotics, are most likely to develop candidemia. Candidemia is associated with a 20-40% mortality rate. However, when treatment for candidemia is delayed for over 48 hours, associated mortality rates increase to 78%. Blood infections can disseminate Candida albicans throughout the body, eventually leading to infection in vital organs like the liver, kidney and brain. Optimal patient outcomes are achieved if antifungal therapy is given within 12 hours after a blood sample is obtained for culture and testing. However, current blood tests cannot reliably detect Candida this early, and thus antifungals are not routinely given to patients in this time frame. Counterintuitively, it is believed that some fungi, like many bacteria, are harmless residents of the small intestines of most adults, and this hypothesis is supported by the fact that the most common fungal species in the human gut is Candida albicans. However, intestinal overgrowth of C. albicans is linked to Crohn's disease, and disease-causing forms of C. albicans can arise from commensal strains that once resided in the patient’s gastrointestinal tract. The specific molecular mechanisms by which C. albicans interacts with host immune cells versus intestinal cells, and those that trigger Candida pathogenicity, remain unknown.
Many strains of Candida albicans have developed resistance to azoles, the major class of drugs used to treat both superficial and systemic infections. In order to develop new treatments, we must better understand host-fungal pathogen biology to determine novel antifungal targets or therapeutics to fight fungal infections at the early stages of infection. In this work, we developed a novel tool that allows us to measure which genes are important to both the host and Candida albicans simultaneously, in specific infection states. We have applied this tool to measure gene expression in Candida albicans interacting with mammalian macrophages and small intestine epithelial cells, at both the population and single-cell levels. When examining populations of sorted infection samples, we found that host immune cells both exposed to and infected with fungal cells exhibit similar expression patterns. In contrast, phagocytosed C. albicans exhibit unique expression patterns compared to those merely exposed to macrophages. We found that some immune response genes exhibited bimodal expression patterns in single, Candida-infected macrophages. We also observed examples of expression bimodality in live Candida inside single macrophages. Both Candida albicans and host small intestine epithelial cells demonstrate distinct patterns of expression when exposed to each other at the population level, compared to unexposed controls. However, the magnitude of these differences depends on the multiplicity of infection. Some expression programs overlapped with those observed in populations of Candida cells interacting with macrophages, with key differences. We also observed expression bimodality among epithelial cells infected with C. albicans.
We believe the information obtained using this technique could be used when considering new antifungal or therapeutic targets; if uniform and high expression of particular genes in Candida populations phagocytosed by macrophages or invading epithelial cells leads to high and early protein production, these proteins may be effective antifungal targets. Similarly, if some host immune response genes are not expressed in a population of Candida-infected macrophages as uniformly and highly as expected, these genes or proteins could be targets of a therapeutic for patients with Candida infections that are resistant to azoles. Candida albicans RNA-sequencing
2	Computational Processing of Omics Data: Implications for Analysis Benjamin, Ashlee Marie January 2013 In this work, I present four studies across the range of 'omics data types: a genome-wide association study for gene-by-sex interaction of obesity traits, computational models for transcription start site classification, an assessment of reference-based mapping methods for RNA-Seq data from non-model organisms, and a statistical model for open-platform proteomics data alignment.
Obesity is an increasingly prevalent and severe health concern with a substantial heritable component and marked sex differences. We sought to determine if the effect of genetic variants also differed by sex by performing a genome-wide association study modeling the effect of genotype-by-sex interaction on obesity phenotypes. Genotype data from individuals in the Framingham Heart Study Offspring cohort were analyzed across five exams. Although no variants showed genome-wide significant gene-by-sex interaction in any individual exam, four polymorphisms displayed a consistent BMI association (P-values 0.00186 to 0.00010) across all five exams. These variants were clustered downstream of LYPLAL1, which encodes a lipase/esterase expressed in adipose tissue, a locus previously identified as having sex-specific effects on central obesity. Primary effects in males were in the opposite direction from those in females and were replicated in Framingham Generation 3. Our data support a sex-influenced association between genetic variation at the LYPLAL1 locus and obesity-related traits.
The application of deep sequencing to map 5' capped transcripts has confirmed the existence of at least two distinct promoter classes in metazoans: focused promoters with transcription start sites (TSSs) that occur in a narrowly defined genomic span and dispersed promoters with TSSs that are spread over a larger window. Previous studies have explored the presence of genomic features, such as CpG islands and sequence motifs, in these promoter classes, and our collaborators recently investigated the relationship with chromatin features. It was found that promoter classes are significantly differentiated by nucleosome organization and chromatin structure. Here, we present computational models supporting the stronger contribution of chromatin features to the definition of dispersed promoters compared to focused start sites. Specifically, dispersed promoters display enrichment for well-positioned nucleosomes downstream of the TSS and a more clearly defined nucleosome-free region upstream, while focused promoters have a less organized nucleosome structure, yet higher presence of RNA polymerase II. These differences extend to histone variants (H2A.Z) and marks (H3K4 methylation), as well as insulator binding (such as CTCF), independent of the expression levels of affected genes.
The application of next-generation sequencing technology to gene expression quantification, namely RNA-Sequencing, has transformed the way in which gene expression studies are conducted and analyzed. These advances are of particular interest to researchers studying non-model organisms, as the need for prior knowledge of sequence information is overcome. De novo assembly methods have gained widespread acceptance in the RNA-Seq community for non-model organisms with no true reference genome or transcriptome. While such methods have tremendous utility, computational complexity is still a significant challenge for organisms with large and complex genomes. Here we present a comparison of four reference-based mapping methods for non-human primate data. We explore mapping efficacy, correlation between computed expression values, and utility for differential expression analyses. We show that reference-based mapping methods indeed have utility in RNA-Seq analysis of mammalian data with no true reference, and that the details of mapping methods should be carefully considered when doing so. We find that shorter seed sequences, allowance of mismatches, and allowance of gapped alignments in addition to splice junction gaps result in more sensitive alignments of non-human primate RNA-Seq data.
Open-platform proteomics experiments seek to quantify and identify the proteins present in biological samples. As in differential gene expression analyses, it is often of interest to determine how protein abundance differs across physiological conditions. Label-free LC-MS/MS enables the rapid measurement of thousands of proteins, providing a wealth of peptide intensity information for differential analysis. However, the processing of raw proteomics data poses significant challenges that must be overcome prior to analysis. We specifically address the matching of peptide measurements across samples, an essential pre-processing step in every proteomics experiment. Presented here is a novel method for open-platform proteomics data alignment with the ability to incorporate previously unused aspects of the data, particularly ion mobility drift times and product ion data. Our results suggest that the inclusion of additional data results in higher numbers of more confident matches, without increasing the number of mismatches. We also show that the incorporation of product ion data can improve results dramatically. Based on these results, we argue that incorporating ion mobility drift times and product ion information is a worthy pursuit. In addition, alignment methods should be flexible enough to utilize all available data, particularly with recent advancements in experimental separation methods. The addition of drift times and/or high energy to alignment methods and accurate mass and time (AMT) tag databases can greatly improve experimenters' ability to identify measured peptides, reducing analysis costs and potentially the need to run additional experiments. / Dissertation Bioinformatics Alignment Proteomics RNA-Sequencing
3	Global RNA profiling of susceptible and tolerant genotypes of Brassica napus infected with Sclerotinia sclerotiorum and prediction and functional characterization of novel regulators of plant defense Girard, Ian January 2016 Brassica napus (L.) contributes over $19 billion each year to the Canadian economy. However, yields are constantly threatened by Sclerotinia sclerotiorum (Lib) de Bary, the fungus responsible for Sclerotinia stem rot. To date, there are no global RNA profiling data or gene regulatory analyses of plant tissues directly at the main site of foliar infection in the B. napus-S. sclerotiorum pathosystem. Using RNA sequencing and a gene regulatory analysis, I discovered putative transcriptional regulators of biological processes associated with the tolerant phenotype of B. napus cv. Zhongyou821, including subcellular localization of proteins, pathogen detection, and redox homeostasis. Functional characterization of Arabidopsis mutants identified a number of genes that contribute directly to plant defense against S. sclerotiorum. Together, this research expands our understanding of the B. napus-S. sclerotiorum pathosystem and provides a valuable resource to help protect B. napus crops from virulent pathogens such as S. sclerotiorum. / October 2016 RNA sequencing Plant-pathogen interactions
4	Anatomic and transcriptomic characterization of the canola (Brassica napus) funiculus during seed development. Chan, Ainsley January 2013 Canola (Brassica napus) is a $19.3 billion annual industry for the Canadian economy, largely because of the demand for the oil derived from the seeds of this crop plant. Seed development and accumulation of important nutrients require coordinated interactions between all seed regions, including the funiculus. The funiculus is a structure of the seed that serves as the only connection between the filial seed and the parent plant, yet its development and underlying transcriptional programs have not been explored. Using light and transmission electron microscopy, I completed an anatomical study of the funiculus over the course of development, from the mature ovule to the post-mature green seed stage. My results show that all three plant tissue systems of the funiculus undergo profound changes at the histological and ultrastructural level. To understand the programs that orchestrate these changes in the globular stage funiculus, I used laser microdissection coupled with RNA-sequencing. This produced a high-resolution dataset of the mRNAs present in each of the three tissue systems of the funiculus. Various clustering analyses and gene ontology term enrichment analysis identified several important biological processes associated with each tissue system. My data show that cell wall growth occurs in the epidermis, photosynthesis occurs in the cortex, and tissue proliferation and differentiation occur in the vasculature. The importance of these processes in supporting overall seed growth and development is discussed, which can have profound implications for the genetic modification of the canola seed through manipulating the transcriptional activity of the funiculus. canola funiculus seed development transcriptome RNA-sequencing
5	A pipeline for differential expression analysis of RNA-seq data and the effect of filter cutoff on performance Robert, Bonnie-Jean 01 September 2017 RNA sequencing is a powerful new approach to analyzing differential expression of transcripts between treatments. Many statistical methods are now available to test for differential expression, and each reports its results differently. This thesis presents a workflow of five popular methods and discusses the results. A pipeline was built in the R language to analyze four of these packages using a real RNA-seq dataset. At present, researchers must prepare RNA-seq data prior to analysis to achieve reliable results. Filtering is a necessary preparatory step in which transcripts exhibiting low levels of expression are removed from further analysis. Yet little research is available to guide researchers on how best to choose this threshold. This thesis introduces a study designed to determine whether the choice of filter threshold has a significant effect on individual package performance. Increasing the filtering threshold was shown to decrease the sensitivity and increase the specificity of the four statistical methods studied. / Graduate Bioinformatics Statistics RNA-Sequencing Filter Differential Expression
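The filtering step and the sensitivity/specificity trade-off described in the abstract above can be illustrated with a minimal Python sketch. This is not the thesis pipeline (which was written in R); the function names, the summed-count filter rule, the toy counts, and the "flagged" and "truth" sets are all hypothetical and chosen only to show how a filter cutoff might be applied and scored.

```python
from typing import Dict, List, Set, Tuple


def filter_low_expression(counts: Dict[str, List[int]], min_total: int) -> Dict[str, List[int]]:
    """Keep transcripts whose read count summed over all samples is >= min_total."""
    return {tid: row for tid, row in counts.items() if sum(row) >= min_total}


def sensitivity_specificity(called: Set[str], truth: Set[str], universe: Set[str]) -> Tuple[float, float]:
    """Score a set of called transcripts against a known truth set.

    Transcripts removed by the filter are simply absent from `called`,
    so they are counted as negatives (true or false) here.
    """
    tp = len(called & truth)
    fn = len(truth - called)
    fp = len(called - truth)
    tn = len(universe - called - truth)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical toy data: transcript id -> raw counts in four samples.
    counts = {
        "tx1": [0, 1, 0, 2],
        "tx2": [15, 22, 90, 110],
        "tx3": [4, 6, 5, 3],
        "tx4": [200, 180, 210, 190],
    }
    truth = {"tx2", "tx3"}           # assumed truly differentially expressed
    flagged = {"tx1", "tx2", "tx3"}  # assumed output of some test before filtering
    for cutoff in (0, 10, 50):
        kept = set(filter_low_expression(counts, cutoff))
        print(cutoff, sensitivity_specificity(flagged & kept, truth, set(counts)))
```

In this toy setup, raising the cutoff first removes a noisy low-count false positive and then starts removing a true positive, which mirrors the direction of the effect the abstract reports.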
6	RNA sequencing for the study of splicing Gonzàlez-Porta, Mar January 2014 Amongst the many processes that shape the final set of RNA molecules present in eukaryotic cells, splicing emerges as the most prominent mechanism for message diversification. In recent years, applications of high throughput sequencing to RNA, known as RNA sequencing, have opened new avenues for the study of transcriptome composition, and have enabled further characterisation of this mechanism. In this thesis, I focus on the application of this technology to the study of human transcript diversity and its potential impact on the protein repertoire. In the first results chapter, I explore the extent of transcriptome diversity by asking whether there is a preference for the production of specific alternative transcripts within each given gene. I show that while many alternative transcripts can be detected, the expression of most protein coding genes tends to be dominated by one single transcript (major transcript). Such findings are validated in the second chapter, and are further used to explore changes in splicing patterns in a disease context. By analysing healthy and tumor samples from kidney cancer patients, I show that most of the detected splicing alterations do not lead to large changes in the relative abundance of major transcripts, at least in a recurrent manner. In addition, I introduce a framework to visualise the most extreme changes in splicing and to evaluate their potential functional impact. In the third chapter, I investigate the role of spliceosome assembly dynamics in the regulation of splice site choice. I show that depletion of PRPF8, a core spliceosomal component, leads to the preferential retention of a subset of introns with weaker splice sites, and also introduces alterations in the rate of co-transcriptional splicing.
Finally, in the last chapter, I explore the validation of changes in alternative transcript abundance at the protein level, through the integration of results derived from RNA sequencing datasets with those obtained from proteomics experiments. Altogether, the findings described in this thesis provide a global picture on the extent of alternative splicing in the diversification of the transcriptome, expand current knowledge on the splicing reaction, and open new possibilities for the integration of transcriptomics and proteomics data. 572.8
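The notion of a "major transcript" in the abstract above can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the function name, the input layout (a mapping from transcript id to an expression estimate for one gene), and the example values are hypothetical and are not taken from the thesis.

```python
from typing import Dict, Tuple


def major_transcript(expr: Dict[str, float]) -> Tuple[str, float]:
    """Return (transcript id, share of the gene's total expression)
    for the most abundant transcript of a single gene."""
    total = sum(expr.values())
    if total <= 0:
        raise ValueError("gene has no detectable expression")
    top = max(expr, key=expr.get)
    return top, expr[top] / total


if __name__ == "__main__":
    # Hypothetical expression estimates for three transcripts of one gene.
    gene = {"t1": 8.0, "t2": 1.5, "t3": 0.5}
    tid, share = major_transcript(gene)
    print(f"{tid} accounts for {share:.0%} of the gene's expression")
```

A gene would be called "dominated" by its major transcript when this share is high; any specific cutoff for "high" would be a further assumption not given in the abstract.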
7	Correction of batch effects in single cell RNA sequencing data using ComBat-Seq Dullea, Jonathan Tyler 20 February 2021 Single cell RNA sequencing allows expression profiles for individual cells to be obtained, thus offering unprecedented insight into the behavior of individual cells. Insight gained from exploration of individual cells has implications in both cancer and developmental biology. Much of the power of this approach is derived from the sheer amount and granularity of the data that can be collected; however, with this power comes the deleterious introduction of batch effects. Samples sequenced on different days or by different technicians can show variance that cannot be attributed to biological condition but is due only to the batch in which they were sequenced. These batch effects can alter the perceived relationship between the main effect and the outcome of interest; for instance, the main effect of cancer status may be hidden by the unwanted and unmodeled variance. Two known methods for the correction of batch effects in bulk RNA sequencing data are ComBat-Seq and Surrogate Variable Analysis (SVA); in this work, we demonstrate that when cell type is known, inclusion of that covariate in ComBat-Seq results in an appropriate correction of the batch effect. We also demonstrate that when cell type is not known, SVA can be used to infer cell-type information from the latent structure of the count matrix, with some loss of accuracy compared to the correction with cell type. This inferred information can be used in place of the actual cell-type covariate to correct single cell RNA sequencing data with ComBat-Seq; inclusion of surrogate variables helps the accuracy of the correction in certain scenarios. Additionally, in the case where cell type is not known and the cell proportions are balanced between batches, we demonstrate that ComBat-Seq can be used without cell-type information.
The efficacy of this procedure is demonstrated with two simulated datasets and a dataset containing Jurkat and 293T cells. These results are then compared to Harmony, a recently reported batch correction algorithm. The procedure reported here has benefits over Harmony in certain situations, such as when a counts matrix is needed for further analysis or when there is thought to be substantial intra-cell-type variability across different batches. Bioinformatics Batch correction Single cell RNA sequencing
8	Elucidating the Contribution of Stroke-Induced Changes to Neural Stem and Progenitor Cells Associated with a Neuronal Fate Chwastek, Damian 25 February 2021 Following stroke there is a robust increase in the proliferation of neural stem and progenitor cells (NSPCs) that ectopically migrate from the subventricular zone (SVZ) to surround the site of damage induced by stroke (infarct). Previous in vivo studies by our lab and others have shown that a majority of migrating NSPCs, when labelled prior to stroke, become astrocytes surrounding the infarct. In contrast, our lab has shown that the majority of NSPCs, when labelled after stroke, become neurons surrounding the infarct. This thesis aims to elucidate the contributions of intrinsic changes that can alter the temporal fate of the NSPCs. The NSPCs were fate mapped in this study using the nestin-CreERT2 mouse model, and strokes were induced within the cortex using the photothrombosis model. In alignment with our previous findings, fate-mapping the NSPCs using a single tamoxifen injection revealed a temporal-specific switch in neuronal fate when NSPCs were labelled at timepoints greater than 7 days following stroke. Single cell RNA sequencing and histological analysis identified significant differences in the proportion of populations of NSPCs and their progeny labelled at the SVZ in the absence or presence of a stroke. NSPCs labelled after stroke comprised a reduced proportion of quiescent neural stem cells, accompanied by an increase in doublecortin-expressing neuroblasts. The transcriptional profile also revealed that NSPCs and their progeny labelled after stroke had an overall enrichment for a neuronal transcription profile in all of the labelled cells, with a reduction in astrocytic gene expression in quiescent and activated neural stem cells.
Furthermore, we highlight the presence of perturbed transcriptional dynamics of neuronal genes, such as doublecortin, following stroke. Altogether, our study reveals that following a stroke there is a sustained, intrinsically regulated, neuronal-fated response in the NSPCs that reside in the SVZ, which may not be exclusive of extrinsic regulation. This work raises the challenge of learning how to harness the potential of this response to improve recovery following stroke by examining the contributions of these cells to recovery. Stroke Neurogenesis Single Cell RNA Sequencing
9	How does light affect the heat stress response in Arabidopsis? Kim, Eunje 11 1900 Light and temperature are two of the most important environmental factors regulating plant development. Although heat stress has been well studied, little is known about the interaction between light and temperature. In this study, we performed phenotypic assays comparing seedling responses to heat under light and dark conditions. Seedlings exposed to heat in the dark show lower survival rates than seedlings stressed in the light. To identify transcriptional changes underlying light-dependent heat tolerance, we used RNA-sequencing. The light-dependent heat stress responses involved a plethora of genes that could be candidate genes for light-induced heat tolerance, including transcription factors (bHLH) and genes commonly associated with biotic stress. Using the latest high-throughput phenotyping facility, we found that light-dependent heat tolerance is reflected more in the maintenance of photosynthetic capacity than in leaf temperature. These results provide insights into how light increases heat stress tolerance in Arabidopsis seedlings and suggest its underlying mechanisms. Heat stress Arabidopsis RNA-sequencing Photosynthesis
10	Computational approaches for whole-transcriptome cancer analysis based on RNA sequencing data Tan, Yuxiang 12 February 2016 RNA-Seq (Whole Transcriptome Shotgun Sequencing) provides an ideal platform to study the complete set of transcripts for a specific developmental stage or physiological condition. It reveals not only expression-level changes, but also structural changes in the coding sequences, including gene rearrangements. In this dissertation, I present my contributions to the development of computational tools for the robust and efficient analysis of RNA-seq data to support cancer research. To automate the laborious and computationally intensive procedure of RNA-seq data management, I worked on the development of Hydra, an RNA-seq pipeline for the parallel processing and quality control of large numbers of samples. With user-friendly reports on quality control and running checkpoints, Hydra makes the data processing procedure fast, efficient and reliable. Here, I report my application of the pipeline to the analysis of patient-derived lymphoma xenograft samples, to show Hydra’s ability to detect abnormalities (e.g., mouse tissue contamination) in the sequencing data. Because fusions play an important role in carcinogenesis, fusion detection has become an important area of methodological research. Several computational methods have been developed to identify fusion transcripts from RNA-seq data. However, all these methods require realignment to the transcriptome, a computationally expensive task that is unnecessary in many cases. Here, I present QueryFuse, a novel gene-specific fusion-detection algorithm for aligned RNA-seq data. It is designed to help biologists find and/or computationally validate fusions of interest quickly, and to annotate the detected events with visualization and detailed properties of the supporting reads.
By focusing the fusion detection on read pairs aligned to query genes, we can not only reduce realignment time, but also afford to use a more accurate but computationally expensive local aligner. In the extensive evaluation I performed, I obtained comparable or better results compared with two widely adopted tools (deFuse and TophatFusion) on two simulated datasets, as well as on cell line datasets with known fusions. Finally, I contributed to the identification of a novel fusion event in lymphoma, with potential therapeutic implications in clinical samples. I validated this fusion in silico by my putative reference method before experimental validation. Bioinformatics RNA-sequencing Cancer Fusion Pipeline
