Utilizing unlabeled data in cell type identification : A semi-supervised learning approach to classification

Recent research in bioinformatics has presented multiple cell type identification methodologies using single cell RNA sequence data (scRNA-seq). However, a consensus on which cell typing methodology consistently demonstrates superior performance remains absent. Additionally, very few studies approach cell type identification through a semi-supervised learning study, whereby the information in unlabeled data is leveraged to train an enhanced classifier. This paper presents cell annotation methodologies through self-learning and graph-based semi-supervised learning, in both raw count scRNA-seq data as well as in a latent embedding. I find that a self-learning framework enhances performance compared to a solely supervised learning classifier. Additionally, modelling on the latent data representations consistently outperforms modelling on the original data. The results show an overall accuracy of 96.12%, whereas additional models achieve an average precision rate of 95.12% and an average recall rate of 94.40%. The semi-supervised learning approaches in this thesis compare favourably to scANVI in terms of accuracy, average precision rate, average recall rate and average F1-score. Moreover, results for alternative scenarios, in which cell types among training and test data do not perfectly overlap, are reported in this thesis.

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-165996

Semi-supervised

cell type identification

scRNA-seq

Probability Theory and Statistics

Sannolikhetsteori och statistik

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-165996
Date	January 2020
Creators	Quast, Thijs
Publisher	Linköpings universitet, Statistik och maskininlärning
Source Sets	DiVA Archive at Upsalla University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
