The first part of thesis focuses on theoretical introduction to the methods of text mining -- Information retrieval, classification and clustering. LSA method is presented as an advanced model for representing textual data. Furthermore, the work describes source data and methods for their preprocessing and preparation used to enhance the effectiveness of text mining methods. For each chosen text mining method there are defined evaluation metrics and used already existing, or newly implemented, programs are presented. The results of experiments comparing the effects of different preprocessing type and use of different models of the source data are then demonstrated and discussed in the conclusion.
Identifer | oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:191064 |
Date | January 2015 |
Creators | Řezníček, Pavel |
Source Sets | Czech ETDs |
Language | Czech |
Detected Language | English |
Type | info:eu-repo/semantics/masterThesis |
Rights | info:eu-repo/semantics/restrictedAccess |
Page generated in 0.0013 seconds