
Predikce povahy spamových krátkých textů textovým klasifikátorem / Machine Learning Text Classifier for Short Texts Category Prediction

This thesis deals with the categorization of short spam texts from SMS messages. The first part summarizes current methods for text classification, followed by a description of several commonly used classifiers. The following chapters describe the analysis of the test data, the implementation of the program, and the results. The program can predict text categories from a predefined set of classes and can also estimate classification accuracy on the training data. For the two category types that I designed, the classifier reached accuracies of 82% and 92%. Both preprocessing and feature selection had a positive impact on the resulting accuracy. This accuracy can be improved further by removing the portion of samples that are difficult to classify: at 80% recall, accuracy can be increased by 8-10%.

Identifier: oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:386005
Date: January 2018
Creators: Drápela, Karel
Contributors: Křena, Bohuslav; Šimková, Hana
Publisher: Vysoké učení technické v Brně. Fakulta informačních technologií
Source Sets: Czech ETDs
Language: Czech
Detected Language: English
Type: info:eu-repo/semantics/masterThesis
Rights: info:eu-repo/semantics/restrictedAccess
