
Experimental Study on Classifier Design and Text Feature Extraction for Short Text Classification

Text classification is a broad research field with ready-to-use solutions for supervised training of text classifiers. Classifying short texts places demands on the learning system that general text classification does not. This thesis explores that challenge by experimenting with how to design the classification system and which text features give the best results. The experimental study compared a hierarchical design with a flat design, along with different aspects of text features. The method consisted of training and testing on a dataset of 3.2 million samples in total. The test results were evaluated with the quality measures precision, recall, F1-score, and ROC analysis, modified to handle multi-class classification. The results of the experimental study were: a 2-level hierarchical classifier gave better results than a flat classifier in 11 out of 13 cases; integer-represented terms outperformed TFIDF-weighted terms for BOW features; lowercase conversion improved the classification results; and bigram and trigram BOW features achieved better results than unigram BOW features. These results were applied in a case study with Thingmap, which maps natural-language queries to users. The case study showed an improvement over earlier solutions in Thingmap's system.
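As an illustration of the feature choices named in the abstract (lowercase conversion, word n-grams, integer term counts), the following is a minimal sketch and not the thesis's implementation. The function name `ngram_counts`, the whitespace tokenization, and the sample text are assumptions made for this example only.

```python
from collections import Counter


def ngram_counts(text, n_max=2, lowercase=True):
    """Return integer counts of all word n-grams of length 1..n_max.

    Hypothetical helper: whitespace tokenization is an assumption,
    not something the abstract specifies.
    """
    if lowercase:
        text = text.lower()
    tokens = text.split()
    counts = Counter()
    for n in range(1, n_max + 1):
        # Slide a window of n tokens across the text.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Unigram + bigram features for one short text.
features = ngram_counts("Fix my Bike fix", n_max=2)
# Same text without lowercase conversion, for comparison.
cased = ngram_counts("Fix my Bike fix", n_max=1, lowercase=False)
```

With lowercase conversion, "Fix" and "fix" collapse into a single term with count 2; without it they remain two separate terms, which spreads the same evidence over more features in a short text.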

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-323214
Date: January 2017
Creators: Sernheim, Mikael
Publisher: Uppsala universitet, Institutionen för informationsteknologi
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
Relation: UPTEC IT, 1401-5749 ; 17005
