1 |
Utilizing unlabeled data in cell type identification: A semi-supervised learning approach to classification
Quast, Thijs, January 2020
Recent research in bioinformatics has presented multiple cell type identification methodologies using single-cell RNA sequencing (scRNA-seq) data. However, there is still no consensus on which cell typing methodology consistently demonstrates superior performance. Additionally, very few studies treat cell type identification as a semi-supervised learning problem, in which the information in unlabeled data is leveraged to train an enhanced classifier. This paper presents cell annotation methodologies based on self-learning and graph-based semi-supervised learning, applied both to raw-count scRNA-seq data and to a latent embedding. I find that a self-learning framework enhances performance compared to a solely supervised classifier. Additionally, modelling on the latent data representations consistently outperforms modelling on the original data. The results show an overall accuracy of 96.12%, while additional models achieve an average precision rate of 95.12% and an average recall rate of 94.40%. The semi-supervised learning approaches in this thesis compare favourably to scANVI in terms of accuracy, average precision rate, average recall rate and average F1-score. Moreover, results are reported for alternative scenarios in which the cell types in the training and test data do not perfectly overlap.
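The self-learning idea described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the nearest-centroid classifier, the softmax-style confidence score, the `threshold` of 0.8 and all function names are hypothetical choices made to keep the example dependency-free.

```python
import math


def centroids(X, y):
    """Compute the mean feature vector of each class label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict_with_confidence(cents, x):
    """Return (label, confidence) from a softmax over negative distances."""
    dists = {c: math.dist(x, m) for c, m in cents.items()}
    d_min = min(dists.values())
    # Shift by the smallest distance so the largest weight is exactly 1.0.
    weights = {c: math.exp(-(d - d_min)) for c, d in dists.items()}
    total = sum(weights.values())
    label = max(weights, key=weights.get)
    return label, weights[label] / total


def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_rounds=10):
    """Self-training: repeatedly pseudo-label confident unlabeled points.

    Assumption: a point is accepted when its confidence reaches `threshold`.
    Returns the class centroids fitted on labeled plus pseudo-labeled data.
    """
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        cents = centroids(X_lab, y_lab)
        remaining, added = [], 0
        for x in pool:
            label, conf = predict_with_confidence(cents, x)
            if conf >= threshold:
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0:  # nothing new was confident enough; stop early
            break
    return centroids(X_lab, y_lab)


# Toy usage: two labeled "cells" and four unlabeled ones in a 2-D embedding.
model = self_train(
    [[0.0, 0.0], [5.0, 5.0]], ["A", "B"],
    [[0.5, 0.2], [4.8, 5.1], [0.1, 0.4], [5.2, 4.7]],
)
```

In practice the base classifier and the confidence rule would be replaced by whatever supervised model is being enhanced; the loop structure (fit, pseudo-label confident points, refit) is the part that carries over.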
|
2 |
Identifying tumor cell types and structural organization based on highly multiplexed fluorescence imaging data
Kang, Ziqi, January 2022
Advances in multiplex fluorescence imaging now allow the measurement of more than 50 protein markers in whole tissue sections at single-cell resolution. This promises to reveal tumor biology at an unprecedented level of detail, both in undisturbed growth and under therapy. However, to analyze these images quantitatively, they must be broken down into the basic units of tumor biology: single cells and their types. In this study, we applied a graph-based unsupervised clustering method, Leiden, to identify cell types in highly multiplexed fluorescence images and, based on the annotated images, ran a tumor microenvironment niche analysis to resolve the recurring patterns of tumor microarchitecture. This thesis first introduces several potentially feasible clustering methods, selected based on the structure of the datasets studied, and compares their performance and stability. The project involved benchmarking different dimensionality reduction and clustering techniques on manually annotated reference datasets and on healthy tissue with known cellular composition. It was ultimately determined that appropriate data transformations combined with Leiden clustering with suitable parameters could automatically identify cells in a way coherent with established marker profiles. The results imply that Leiden clustering can also identify clusters of cells with novel marker combinations. Careful examination of the multiplex images shows that the markers are indeed found in the tumor, leading to new hypotheses regarding tumor biology. The tumor microenvironment niche analysis found several archetypal niches with specific cellular composition, indicating active accumulation of immune cells after radiotherapy and a less vascularized character of rebound glioblastomas after treatment. We hope to further validate our analysis to provide new insights into the pathological process of glioblastoma.
In future research, we plan to improve the analysis pipeline so that it can be used robustly to analyze the growing volume of multiplexed tumor images, both in mouse cancer models and in patient samples.
|