Textová klasifikace s limitovanými trénovacími daty / Text classification with limited training data

The aim of this thesis is to minimize the manual work needed to create training data for text classification tasks. Various research areas, including weak supervision, interactive learning, and transfer learning, explore how to reduce the effort of creating training data. We combine ideas from the available literature to design a comprehensive text classification framework that employs keyword-based labeling instead of traditional text annotation. Keyword-based labeling assigns labels to texts based on the keywords they contain that are highly correlated with individual classification labels. As noted repeatedly in previous work, coming up with many new keywords is challenging for humans. To address this issue, we propose an interactive keyword labeler that uses word similarity to guide the user in keyword labeling. To verify the effectiveness of our novel approach, we implement a minimum viable prototype of the designed framework and use it to perform a user study on a multi-label restaurant review classification problem.
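
The idea of keyword-based labeling can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the label names, the keyword lists, and the `label_text` function are hypothetical and chosen only to show how a text can receive several labels from the keywords it contains.

```python
import re

# Hypothetical keyword lists for a restaurant-review task (not from the thesis).
KEYWORDS = {
    "food": {"pizza", "delicious", "tasty"},
    "service": {"waiter", "staff", "rude"},
    "price": {"cheap", "expensive", "overpriced"},
}


def label_text(text, keywords=KEYWORDS):
    """Return the sorted labels whose keyword set overlaps the text's tokens."""
    # Lowercase and split into simple word tokens.
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    # A label applies when at least one of its keywords occurs in the text.
    return sorted(label for label, words in keywords.items() if tokens & words)


print(label_text("The pizza was delicious but the waiter was rude."))
print(label_text("We arrived at noon."))
```

A text matching no keyword gets no label in this sketch, which is why a rich keyword list matters and why suggesting further keywords to the user is useful.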

Identifier: oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:448225
Date: January 2021
Creators: Laitoch, Petr
Contributors: Hana, Jiří; Vidová Hladká, Barbora
Source Sets: Czech ETDs
Language: English
Detected Language: English
Type: info:eu-repo/semantics/masterThesis
Rights: info:eu-repo/semantics/restrictedAccess