The thesis deals with text mining. It describes the theory of text document clustering as well as algorithms used for clustering. This theory serves as a basis for developing an application for clustering text data. The application is developed in Java programming language and contains three methods used for clustering. The user can choose which method will be used for clustering the collection of documents. The implemented methods are K medoids, BiSec K medoids, and SOM (self-organization maps). The application also includes a validation set, which was specially created for the diploma thesis and it is used for testing the algorithms. Finally, the algorithms are compared according to obtained results.
Identifer | oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:218899 |
Date | January 2011 |
Creators | Sedláček, Josef |
Contributors | Burget, Radim, Karásek, Jan |
Publisher | Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií |
Source Sets | Czech ETDs |
Language | Czech |
Detected Language | English |
Type | info:eu-repo/semantics/masterThesis |
Rights | info:eu-repo/semantics/restrictedAccess |
Page generated in 0.0023 seconds