Global ETD Search

1	Enhancing preprocessing and clustering of single-cell RNA sequencing data Wang, Zhe 04 October 2021 (has links) Single-cell RNA sequencing (scRNA-seq) is the leading technique for characterizing cellular heterogeneity in biological samples. Various scRNA-seq protocols have been developed that can measure the transcriptome from thousands of cells in a single experiment. With these methods readily available, the ability to transform raw data into biological understanding of complex systems is now a rate-limiting step. In this dissertation, I introduce novel computational software and tools which enhance preprocessing and clustering of scRNA-seq data and evaluate their performance compared to existing methods. First, I present scruff, an R/Bioconductor package that preprocesses data generated from scRNA-seq protocols including CEL-Seq or CEL-Seq2 and reports comprehensive data quality metrics and visualizations. scruff rapidly demultiplexes, aligns, and counts the reads mapped to genomic features with deduplication of unique molecular identifier (UMI) tags and provides novel and extensive functions to visualize both pre- and post-alignment data quality metrics for cells from multiple experiments. Second, I present Celda, a novel Bayesian hierarchical model that can perform simultaneous co-clustering of genes into transcriptional modules and cells into subpopulations for scRNA-seq data. Celda identified novel cell subpopulations in a publicly available peripheral blood mononuclear cell (PBMC) dataset and outperformed a PCA-based approach for gene clustering on simulated data. Third, I extend the application of Celda by developing a multimodal clustering method that utilizes both mRNA and protein expression information generated from single-cell sequencing datasets with multiple modalities, and demonstrate that Celda multimodal clustering captured meaningful biological patterns which are missed by transcriptome- or protein-only clustering methods. 
Collectively, this work addresses limitations present in the computational analyses of scRNA-seq data by providing novel methods and solutions that enhance scRNA-seq data preprocessing and clustering. Bioinformatics Clustering scRNA-seq Single-cell sequencing
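The UMI deduplication step mentioned in this abstract can be illustrated with a minimal Python sketch. It is not scruff's R API: the function name `count_umis`, the read tuples, and the gene and barcode labels are all hypothetical, and it assumes reads have already been demultiplexed and assigned to a gene. Reads sharing the same cell barcode, gene, and UMI are counted once.

```python
from collections import defaultdict

def count_umis(aligned_reads):
    """Count distinct UMIs per (cell, gene) pair.

    aligned_reads: iterable of (cell_barcode, gene, umi) tuples (hypothetical format).
    Reads with an identical (cell, gene, umi) triple are treated as duplicates of
    one original molecule and contribute a single count.
    """
    seen = defaultdict(set)
    for cell, gene, umi in aligned_reads:
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}

# Toy input: the second read duplicates the first and should not be counted again.
reads = [
    ("cell_1", "GENE_A", "AACGT"),
    ("cell_1", "GENE_A", "AACGT"),
    ("cell_1", "GENE_A", "TTGCA"),
    ("cell_2", "GENE_A", "AACGT"),
]
counts = count_umis(reads)
```

This exact-match rule ignores sequencing errors within the UMI itself, which real pipelines may handle differently.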
2	Inferring the Origin of Cells at the Maternal-Fetal Interface (FEMO) Varley, Thomas 23 May 2022 (has links) No description available. Bioinformatics Computer Science
3	Celltyper: A Single-Cell Sequencing Marker Gene Tool Suite Paisley, Brianna Meadow 05 1900 (has links) Indiana University-Purdue University Indianapolis (IUPUI) / Single-cell RNA-sequencing (scRNA-seq) has enabled researchers to study interindividual cellular heterogeneity, to explore disease impact on cellular composition of tissue, and to identify novel cell subtypes. However, a major challenge in scRNA-seq analysis is to identify the cell type of individual cells. Accurate cell type identification is crucial for any scRNA-seq analysis to be valid, as incorrect cell type assignment will reduce statistical robustness and may lead to incorrect biological conclusions. Therefore, accurate and comprehensive cell type assignment is necessary for reliable biological insights into scRNA-seq datasets. With over 200 distinct cell types in humans alone, the space of cell identities is large. Even within the same cell type there exists heterogeneity due to cell cycle phase, cell state, cell subtypes, cell health and the tissue microenvironment. This makes cell type classification a complicated biological problem requiring bioinformatics. One approach to classifying cell type identity is to use marker genes. Marker genes are genes specific for one or a few cell types. When coupled with bioinformatic methods, marker genes show promise of improving cell type classification. However, current scRNA-seq classification methods and databases use marker genes that are non-specific across sources, samples, and/or species, leading to bias and errors. Furthermore, many existing tools require manual intervention by the user to provide training datasets or the expected number and name of cell types, which can introduce selection bias. This selection bias negatively impacts the accuracy of cell type classification methods, as the model cannot extrapolate outside of the user inputs even when it is biologically meaningful to do so. 
In this dissertation I developed CellTypeR, a suite of tools to explore the biology governing cell identity in a “normal” state for humans and mice. The work presented here accomplishes three aims: 1. Develop an ontology-standardized database of published marker gene literature; 2. Develop and apply a marker gene classification algorithm; and 3. Create a user interface and input data structure for scRNA-seq cell type prediction. Bioinformatics Database scRNA-seq Marker gene User interface
4	Building an analytical framework for quality control and meta-analysis of single-cell data to understand heterogeneity in lung cancer cells Hong, Rui 20 March 2024 (has links) Single-cell RNA sequencing (scRNA-seq) has been a powerful technique for characterizing transcriptional heterogeneity related to tumor development and disease pathogenesis. Despite advances in the technology, there is still a lack of software to systematically and easily assess the quality and different types of artifacts present in scRNA-seq data, and of a statistical framework for understanding heterogeneity in the gene programs of cancer cells. In this dissertation, I first introduced SCTK-QC, novel computational software to enhance and streamline the process of quality control for scRNA-seq data. SCTK-QC is a pipeline that performs comprehensive quality control (QC) of scRNA-seq data and runs a multitude of tools to assess various types of noise present in scRNA-seq data as well as quantification of general QC metrics. These metrics are displayed in a user-friendly HTML report and the pipeline has been implemented in two cloud-based platforms. Most scRNA-seq studies have profiled only a small number of tumors and provided a narrow view of the transcriptome in tumor tissue. Next, I developed a novel framework to perform a large-scale meta-analysis of cancer cells from 12 studies with scRNA-seq data from patients with non-small-cell lung cancer (NSCLC). I discovered interpretable gene co-expression modules with celda and demonstrated that the activity of gene modules accounted for both inter- and intra-tumor heterogeneity of NSCLC samples. Furthermore, I used CaDRa to determine that the levels of some gene modules were significantly associated with combinations of underlying genetic alterations. I also showed that other gene modules are associated with immune cell signatures and may be important for communication between the cancer cells and the immune microenvironment. 
Finally, I presented a novel computational method to study the association between copy number variation (CNV) and gene expression at the single-cell level. The diversity of the CNV profile was identified in tumor subclones within each sample and I discovered cis and trans gene signatures which have expression values associated with specific somatic CNV status. This study helped us prioritize the potential cancer driver genes within each CNV region. Collectively, this work addressed the limitation in the quality control of scRNA-seq data and provided insights for understanding the heterogeneity of NSCLC samples. Bioinformatics Multi-omics NSCLC scRNA-seq Transcriptome analysis
5	Convolutional neural network-based program to predict lymph node metastasis of non-small cell lung cancer using ¹⁸F-FDG PET 木寺, 英太郎 23 May 2024 (has links) Kyoto University / Doctoral degree by coursework (new system) / Doctor of Medical Science / Kō No. 25495 / Medical Doctorate No. 5095 / 新制||医||1073 (University Library) / Graduate School of Medicine, Kyoto University, Major in Medicine / (Chief examiner) Prof. 溝脇尚志, Prof. 伊達洋至, Prof. 黒田知宏 / Conferred under Article 4, Paragraph 1 of the Degree Regulations / DFAM iPSC T cell differentiation CD4 3D organoid scRNA seq 490
6	Understanding cell-type diversification during developmental pattern formation in sea urchin embryos using single cell and molecular approaches Hawkins, Dakota Young 26 September 2024 (has links) From the discovery of developmental gradients to pioneering some of the first gene regulatory models, the sea urchin model has played a foundational role in deciphering the complex molecular mechanisms behind the phenomena that underlie pattern formation during embryonic development. Of particular interest to our lab, primary mesenchyme cells (PMCs), a skeletogenic lineage, provide an excellent system for understanding the mechanisms behind skeletal pattern formation. Sea urchin skeletal patterning is driven by ectodermal cues that are differentially expressed in space and time; these cues instruct the PMCs. Originating as a homogeneous population, PMCs diversify in response to patterning cue reception, then produce distinct skeletal elements as a function of the cues that they have received from the ectoderm. However, the exact mechanisms underpinning PMC diversification and the roles that individual ectodermal cues play in mediating this diversification process are poorly understood. To bridge that knowledge gap, this work leverages multiple data modalities, including single-cell RNA sequencing (scRNA-seq) and 3D visualization of gene expression in normal and perturbed embryos, not only to present an exhaustive description of PMC diversification, but also to offer novel computational approaches and resources necessary for these studies. First, we present the novel algorithm ICAT. Created to correctly identify cell states from mixed-condition scRNA-seq experiments, ICAT plays a necessary role in identifying PMC subpopulations affected by ectodermal cue disruption. Using simulated and real datasets, we benchmark ICAT against several state-of-the-art workflows, and find that ICAT provides more robust and sensitive performance compared to current practices. 
We further validate ICAT in vivo using single molecule fluorescent in situ hybridization (FISH) and show that, compared to leading algorithms, ICAT uniquely and correctly characterizes the effects of patterning cue disruption on PMC subpopulation composition. Finally, by combining temporal scRNA-seq data throughout skeletal patterning with a newly generated spatial gene expression reference map, we not only identify distinct PMC subpopulations, but also provide spatial and temporal coherence to each of their developmental trajectories during skeletal pattern formation. We complement this work by inferring the gene regulatory networks underlying PMC diversification and thereby identifying the transcriptional regulators that function as network hubs. We empirically demonstrate that these hubs are required for skeletal patterning, and spatially map their expression within the PMCs. Sequencing single PMCs isolated from embryos in which ectodermal cue function was inhibited, we show that functional loss of each cue uniquely disrupts the PMC gene regulatory network, and we characterize the subsequent effects on PMC subpopulation composition. Taken together, this work defines the spatiotemporal details of PMC diversification in normal embryos and in embryos with individual cue losses, and offers numerous novel computational methods and resources necessary for these advances. / 2026-09-26T00:00:00Z Bioinformatics Diversification scRNA-seq Sea urchin Single-cell Spatial Temporal
7	Structured Bayesian methods for splicing analysis in RNA-seq data Huang, Yuanhua January 2018 (has links) In most eukaryotes, alternative splicing is an important regulatory mechanism of gene expression that results in a single gene coding for multiple protein isoforms, thus greatly increasing the diversity of the proteome. RNA-seq is widely used for genome-wide splicing isoform quantification, and several effective and powerful methods have been developed for splicing analysis with RNA-seq data. However, quantification remains problematic for genes with low coverage or a large number of isoforms. These difficulties may in principle be ameliorated by exploiting correlations encoded in structured data sources. This thesis contributes to the development of Bayesian methods for splicing analysis by leveraging additional information in multiple datasets with structured prior distributions. First, we developed DICEseq, the first isoform quantification method tailored to time-series RNA-seq experiments. DICEseq explicitly models the correlations between experiments at different time points to aid the quantification of isoforms across experiments. Numerical experiments on both simulated and real datasets show that DICEseq yields more accurate results than state-of-the-art methods, an advantage that can become considerable at low coverage levels. Furthermore, DICEseq makes it possible to quantify the trade-off between temporal sampling of RNA and depth of sequencing, frequently an important choice when planning experiments. Second, we developed BRIE (Bayesian Regression for Isoform Estimation), a Bayesian hierarchical model which resolves the difficulties in splicing analysis in single-cell RNA-seq (scRNA-seq) data by learning an informative prior distribution from sequence features. This method combines quantification and imputation for splicing analysis in a Bayesian framework, which is particularly useful for scRNA-seq data because of its extremely low coverage and high technical noise. 
We validated BRIE on several scRNA-seq data sets, showing that BRIE yields reproducible estimates of exon inclusion ratios in single cells. Third, we provided an effective tool that uses Bayes factors to sensitively detect differential splicing between different single cells. When applying BRIE to a few real datasets, we found interesting heterogeneity patterns in splicing events across cell populations, for example alternative exons in DNMT3B. In summary, this thesis proposes structured Bayesian methods to integrate multiple datasets to improve splicing analysis and study its biological functions.
8	Network analysis of human vitiligo scRNA-seq data reveals complex mechanisms of immune activation Gellatly, Kyle 22 November 2021 (has links) The advent of scRNA-seq has rapidly advanced our understanding of complex systems by enabling the researcher to look at the full transcriptional profile within each cell, with the potential to reveal intercellular communications within a tissue. To map these communications, I created SignallingSingleCell, an R package that provides an end-to-end approach for the analysis of scRNA-seq data, with a particular focus on building ligand and receptor signaling networks. Using these powerful techniques, we sought to dissect the heterogeneous population of cells recently reported within the BMDC culture system. From these data we were able to determine the cell type composition, identify the different myeloid responses to similar stimuli, and unify recent conflicting studies about the populations within this system. We then applied these tools to study vitiligo, an autoimmune disease of the skin, to answer fundamental questions about the initiation and progression of disease. We found signatures of increased antigen presentation through MHC-I, loss of immunotolerance cytokines such as TGFB1 and IL-10, and changes in the complex chemokine circuits that influence T cell localization, including an essential role for CCR5 in Treg function. In order to identify and characterize the autoreactive T cells that are responsible for the targeted destruction of melanocytes, we then paired scRNA-seq with TCR-seq and MHC-II complexes loaded with melanocyte antigen. From these data we contrast the transcriptional state of melanocyte-specific T cells with that of bystanders found within the skin and circulation. scRNA-seq Autoimmunity Vitiligo Bioinformatics Integrative Biology OS and Networks Translational Medical Research
9	Utilizing unlabeled data in cell type identification : A semi-supervised learning approach to classification Quast, Thijs January 2020 (has links) Recent research in bioinformatics has presented multiple cell type identification methodologies using single-cell RNA sequencing (scRNA-seq) data. However, a consensus on which cell typing methodology consistently demonstrates superior performance remains absent. Additionally, very few studies approach cell type identification through semi-supervised learning, whereby the information in unlabeled data is leveraged to train an enhanced classifier. This paper presents cell annotation methodologies through self-learning and graph-based semi-supervised learning, in both raw count scRNA-seq data and a latent embedding. I find that a self-learning framework enhances performance compared to a solely supervised learning classifier. Additionally, modelling on the latent data representations consistently outperforms modelling on the original data. The results show an overall accuracy of 96.12%, whereas additional models achieve an average precision rate of 95.12% and an average recall rate of 94.40%. The semi-supervised learning approaches in this thesis compare favourably to scANVI in terms of accuracy, average precision rate, average recall rate and average f1-score. Moreover, results for alternative scenarios, in which cell types among training and test data do not perfectly overlap, are reported in this thesis. Semi-supervised cell type identification scRNA-seq Probability Theory and Statistics
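The self-learning idea described in this abstract can be sketched in a few lines of Python. This is an illustration only, not the thesis's models: a nearest-centroid classifier stands in for the actual classifier, and the function names, the `min_margin` threshold, and the toy 2-D "embedding" points are all assumptions. Each round, unlabeled cells predicted with a large enough margin are added to the training set with their predicted label.

```python
from math import dist

def centroids(points, labels):
    """Mean vector of the points assigned to each label."""
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    return {y: tuple(sum(c) / len(ps) for c in zip(*ps)) for y, ps in groups.items()}

def predict(cents, p):
    """Return (label, margin): nearest centroid and its lead over the runner-up."""
    ranked = sorted((dist(p, c), y) for y, c in cents.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin

def self_train(labeled, labels, unlabeled, min_margin=1.0, max_rounds=10):
    """Iteratively pseudo-label confident unlabeled points and refit the centroids."""
    points, labs, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(points, labs)
        confident, rest = [], []
        for p in pool:
            y, margin = predict(cents, p)
            (confident if margin >= min_margin else rest).append((p, y))
        if not confident:
            break  # nothing new to learn from; stop early
        for p, y in confident:
            points.append(p)
            labs.append(y)
        pool = [p for p, _ in rest]
    return centroids(points, labs), pool

# Toy 2-D embedding: two labeled cells, three unlabeled ones (hypothetical values).
final_centroids, still_unlabeled = self_train(
    labeled=[(0.0, 0.0), (10.0, 10.0)],
    labels=["T cell", "B cell"],
    unlabeled=[(1.0, 1.0), (9.0, 9.0), (5.0, 5.2)],
)
```

In this toy run the ambiguous point near the midpoint never clears the margin threshold, so it stays unlabeled instead of being forced into a class.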
10	Reconstruction of Cell and Tissue-specific Immune-protein Interactomes Using Single-cell RNA Sequencing Data Althobaiti, Atheer 04 1900 (has links) Protein molecules and their interactions via protein-protein interactions (PPIs) are at the core of cellular functions. While global PPI networks have been useful for analyzing gene function and effects of genetic variants, they do not resolve tissue- and cell-type-specific interactions. Here we leverage recent advances in single-cell RNA sequencing (scRNA-seq) to reconstruct cell-type-specific PPI networks across different tissues to enable a context-sensitive analysis of immune cells’ gene-protein pathways. Targeting B cells, T cells, and macrophage cells as a proof-of-principle, we used scRNA-seq data across different tissues from the Tabula Muris mouse consortium. We mapped the protein-coding DEGs to a protein-protein interaction network database (STRING v.11). Topological and global similarity analysis of the networks revealed distinct properties between tissues, highlighting tissue-specific behaviors for each cell type. For example, we found that degree and clustering coefficient distributions were tissue-specific. Different cell types and tissues displayed specific characteristics, and in particular, the splenic PPI networks were different compared to other analyzed tissues for all the immune cell types examined. For example, the pairwise comparison of the Jaccard index for node similarity and the Mantel test correlation analysis showed that the spleen’s nodes and PPI networks differ more from those of the other tissues than any other tissue’s do, for each cell type examined. The physiological and anatomical properties that distinguish the spleen from other examined tissues might explain why the splenic PPI networks tend to be less similar compared to other tissues. 
The cell-type-specific network analyses, which applied different distance measures (Euclidean, Manhattan, Jaccard, and Hamming) to the adjacency matrices of the hub nodes, showed a macrophage-specific behavior not observed in B cells and T cells, confirming their lineage differences. Finally, we explored the rewiring of selected hub nodes and transcription factors in the PPI networks along with their biological enrichments to validate our observations. The suggested biological validity of our results confirms the relevance of data-driven reconstruction of these context-sensitive networks using more advanced network inference algorithms. In conclusion, scRNA-seq enables the refinement of global, non-specific PPI networks into cell- and tissue-specific networks, thereby providing an increased resolution of the biological context. Protein-protein interaction Network Analysis scRNA-seq Immune cells Cell-specific interaction Tissue-specific interaction
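The network comparisons named in this abstract (Jaccard index on node sets; Euclidean, Manhattan, and Hamming distances between adjacency matrices) can be shown on a toy example. The sketch below is a minimal Python illustration under assumed inputs: the function names, the node labels, and the two small edge lists are hypothetical and are not taken from the thesis or from STRING.

```python
from math import sqrt

def jaccard_index(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| for two collections of node names."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def adjacency(edges, nodes):
    """Build a symmetric 0/1 adjacency matrix over a fixed node order."""
    index = {n: i for i, n in enumerate(nodes)}
    m = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        if u in index and v in index:
            m[index[u]][index[v]] = m[index[v]][index[u]] = 1
    return m

def matrix_distances(m1, m2):
    """Entry-wise distances between two adjacency matrices of equal shape."""
    flat1 = [x for row in m1 for x in row]
    flat2 = [x for row in m2 for x in row]
    diffs = [abs(x - y) for x, y in zip(flat1, flat2)]
    return {
        "manhattan": sum(diffs),
        "euclidean": sqrt(sum(d * d for d in diffs)),
        "hamming": sum(1 for d in diffs if d != 0),
    }

# Two hypothetical tissue-specific networks over the same three hub nodes.
hubs = ["A", "B", "C"]
net_1 = adjacency([("A", "B"), ("B", "C")], hubs)
net_2 = adjacency([("A", "B"), ("A", "C")], hubs)
distances = matrix_distances(net_1, net_2)
node_similarity = jaccard_index({"A", "B", "C"}, {"A", "B", "D"})
```

For binary matrices the Manhattan and Hamming values coincide, so comparing several measures is mainly informative once edges carry weights.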
