Labeling Clinical Reports with Active Learning and Topic Modeling / Uppmärkning av kliniska rapporter med active learning och topic modeller

Supervised machine learning models require a high-quality labeled data set in order to perform well. Text data is often available in abundance, but it is usually unlabeled. Labeling text data is a time-consuming process, especially when multiple labels can be assigned to a single document. The purpose of this thesis was to make the labeling of clinical reports as effective and effortless as possible by evaluating different multi-label active learning strategies. The goal of the strategies was to reduce the number of labeled documents a model needs and to increase the quality of those documents. With the strategies, an accuracy of 89% was achieved with 2500 reports, compared to 85% with random sampling. In addition, 85% accuracy could be reached after labeling 975 reports, compared to 1700 reports with random sampling.
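To make the idea of choosing which reports to label more concrete, the following is a minimal sketch of one generic multi-label active learning strategy, mean binary-entropy uncertainty sampling. It is not taken from the thesis and is not claimed to be one of the strategies it evaluated; the function names, the input format (one list of per-label probabilities per unlabeled document), and the example numbers are all assumptions made for illustration.

```python
import math


def binary_entropy(p):
    """Entropy in bits of one yes/no label predicted with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_most_uncertain(label_probs, batch_size):
    """Return indices of the documents the model is least certain about.

    label_probs is a hypothetical input: one non-empty list per unlabeled
    document, holding the predicted probability of each label.
    """
    # Score each document by the mean entropy over its labels.
    scores = [
        sum(binary_entropy(p) for p in probs) / len(probs)
        for probs in label_probs
    ]
    # Rank documents from most to least uncertain and keep the top batch.
    ranked = sorted(range(len(label_probs)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


# Illustrative pool of three unlabeled documents with three labels each.
pool = [
    [0.98, 0.02, 0.95],  # model is confident about every label
    [0.55, 0.48, 0.60],  # model is unsure about every label
    [0.90, 0.50, 0.10],  # model is unsure about one label
]
chosen = select_most_uncertain(pool, batch_size=2)
print(chosen)
```

In a labeling loop of this kind, the selected documents would be sent to an annotator, added to the training set, and the model retrained before the next selection round. Random sampling, the baseline mentioned in the abstract, would instead pick documents without looking at the model's predictions.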

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-148463

active learning

topic modeling

topic models

Binary Version Space Minimization

Clinical Reports

Computer Sciences

Datavetenskap (datalogi)

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-148463
Date	January 2018
Creators	Lindblad, Simon
Publisher	Linköpings universitet, Interaktiva och kognitiva system
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
