Single-cell RNA-sequencing (scRNA-seq) has enabled researchers to study interindividual
cellular heterogeneity, to explore disease impact on cellular composition of
tissue, and to identify novel cell subtypes. However, a major challenge in scRNA-seq
analysis is to identify the cell type of individual cells. Accurate cell type identification is
crucial for any scRNA-seq analysis to be valid as incorrect cell type assignment will reduce
statistical robustness and may lead to incorrect biological conclusions. Therefore, accurate
and comprehensive cell type assignment is necessary for reliable biological insights into
scRNA-seq datasets.
With over 200 distinct cell types in humans alone, the space of possible cell identities is
large. Even within the same cell type, heterogeneity exists due to cell cycle phase, cell
state, cell subtypes, cell health, and the tissue microenvironment. This makes cell type
classification a complicated biological problem that calls for bioinformatic methods.
One approach to classify cell type identity is using marker genes. Marker genes are
genes specific for one or a few cell types. When coupled with bioinformatic methods,
marker genes show promise of improving cell type classification. However, current
scRNA-seq classification methods and databases use marker genes that are non-specific
across sources, samples, and/or species, leading to bias and errors. Furthermore, many
existing tools require manual intervention by the user to provide training datasets or the
expected number and names of cell types, which can introduce selection bias. This selection
bias negatively impacts the accuracy of cell type classification methods, as the model cannot
extrapolate outside of the user inputs even when it is biologically meaningful to do so.
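As a rough illustration of the marker-gene idea described above (not the algorithm developed in this dissertation), the sketch below scores one cell against a small set of marker lists and assigns the best-scoring type. The marker sets, the mean-expression score, the `min_score` threshold, and all function names are assumptions chosen for illustration only.

```python
from typing import Dict, List, Optional

# Illustrative marker sets only; a real analysis would draw these from a
# curated, ontology-standardized marker database.
MARKERS: Dict[str, List[str]] = {
    "T cell": ["CD3D", "CD3E", "CD2"],
    "B cell": ["CD79A", "MS4A1", "CD19"],
}


def score_cell(expression: Dict[str, float],
               markers: Dict[str, List[str]]) -> Dict[str, float]:
    """Return the mean expression of each cell type's marker genes in one cell.

    Genes missing from `expression` are treated as not expressed (0.0).
    """
    scores: Dict[str, float] = {}
    for cell_type, genes in markers.items():
        values = [expression.get(gene, 0.0) for gene in genes]
        scores[cell_type] = sum(values) / len(values) if values else 0.0
    return scores


def classify_cell(expression: Dict[str, float],
                  markers: Dict[str, List[str]],
                  min_score: float = 1.0) -> Optional[str]:
    """Assign the highest-scoring cell type, or None if no score reaches min_score.

    Returning None lets a cell stay unassigned instead of forcing it into
    one of the listed types (min_score is an arbitrary, assumed cutoff).
    """
    scores = score_cell(expression, markers)
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_score else None


if __name__ == "__main__":
    # Hypothetical normalized expression values for a single cell.
    cell = {"CD3D": 4.0, "CD3E": 3.0, "CD2": 2.0, "MS4A1": 0.0}
    print(classify_cell(cell, MARKERS))
```

Leaving low-scoring cells unassigned is one simple way to avoid forcing every cell into a user-supplied list of types, which is the selection-bias concern raised above.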
In this dissertation I developed CellTypeR, a suite of tools to explore the biology
governing cell identity in a “normal” state for humans and mice. The work presented here
accomplishes three aims: 1. Develop an ontology-standardized database of published
marker gene literature; 2. Develop and apply a marker gene classification algorithm; and
3. Create a user interface and input data structure for scRNA-seq cell type prediction.
Identifier | oai:union.ndltd.org:IUPUI/oai:scholarworks.iupui.edu:1805/33196 |
Date | 05 1900 |
Creators | Paisley, Brianna Meadow |
Contributors | Liu, Yunlong, Yan, Jingwen, Cao, Sha, Wang, Juexin, Carfagna, Mark |
Source Sets | Indiana University-Purdue University Indianapolis |
Language | en_US |
Detected Language | English |
Type | Dissertation |