Global ETD Search

Experimental Study on Classifier Design and Text Feature Extraction for Short Text Classification

Text classification is a broad research field with ready-to-use solutions for supervised training of text classifiers. Classifying short texts, however, places demands on the learning system that general text classification does not. This thesis explores this challenge by experimenting with how to design the classification system and which text features give the best results. The experimental study compared a hierarchical design with a flat design, along with different aspects of text features. The method consisted of training and testing on a dataset of 3.2 million samples in total. The test results were evaluated with the quality measures precision, recall, F1-score, and ROC analysis, with a modification to target multi-class classification. The results of the experimental study were: a two-level hierarchical classifier gave better results than a flat classifier in 11 out of 13 cases; integer-represented terms outperformed TF-IDF-weighted terms for bag-of-words (BOW) features; lowercase conversion improved the classification results; and bigram and trigram BOW features achieved better results than unigram BOW features. The results of the experimental study were used in a case study together with Thingmap, which maps natural language queries to users. The case study showed an improvement over earlier solutions in Thingmap's system.
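The feature variants named in the abstract (lowercase conversion, unigram versus bigram BOW terms, integer versus TF-IDF term weights) can be illustrated with a minimal sketch. It is not the thesis implementation: the whitespace tokenizer, the reading of "integer represented" as raw term counts, the `tf * log(N / df)` weighting, the function names and the two example queries are all assumptions made for illustration.

```python
from collections import Counter
import math


def ngrams(text, n_max=2, lowercase=True):
    """Return word n-grams of length 1..n_max (whitespace tokenizer, an assumption)."""
    if lowercase:
        text = text.lower()
    tokens = text.split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def count_features(docs, n_max=2, lowercase=True):
    """Integer BOW features: one Counter of raw term counts per document."""
    return [Counter(ngrams(d, n_max, lowercase)) for d in docs]


def tfidf_features(counts):
    """Re-weight integer counts with a plain tf * log(N / df) scheme (assumed variant)."""
    n_docs = len(counts)
    # Iterating a Counter yields each distinct term once, so df counts documents.
    df = Counter(term for c in counts for term in c)
    return [{t: tf * math.log(n_docs / df[t]) for t, tf in c.items()} for c in counts]


# Hypothetical short queries, not taken from the thesis dataset.
docs = ["Find a plumber", "find a good plumber nearby"]
counts = count_features(docs)      # lowercase, unigrams + bigrams
weights = tfidf_features(counts)   # same terms, TF-IDF weighted
```

With lowercasing, "Find" and "find" collapse into one term, and the bigram "find a" becomes a feature of its own. Under this particular TF-IDF variant, a term that occurs in every document gets weight zero, which is one way the two weightings can diverge on very short texts.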

http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-323214

Natural Language Processing

Engineering and Technology

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-323214
Date	January 2017
Creators	Sernheim, Mikael
Publisher	Uppsala universitet, Institutionen för informationsteknologi
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	UPTEC IT, 1401-5749 ; 17005
