This diploma thesis focuses on automating the generation of stopword lists as one method of pre-processing textual documents. It analyses the influence of stopword removal on the results of data mining tasks (classification and clustering). First, text mining techniques and frequently used algorithms are described. Methods for creating domain-specific stopword lists are then described in detail. Finally, the results of testing on large collections of text files and the implementation of the methods are presented and discussed.
Identifier | oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:178597 |
Date | January 2014 |
Creators | Krupník, Jiří |
Source Sets | Czech ETDs |
Language | Czech |
Detected Language | English |
Type | info:eu-repo/semantics/masterThesis |
Rights | info:eu-repo/semantics/restrictedAccess |