The work deals with automatic summarization of documents in HTML format. As a language of web documents, Czech language has been chosen. The project is focused on algorithms of text summarization. The work also includes document preprocessing for summarization and conversion of text into representation suitable for summarization algorithms. General text mining is also briefly discussed but the project is mainly focused on the automatic document summarization. Two simple summarization algorithms are introduced. Then, the main attention is paid to an advanced algorithm that uses latent semantic analysis. Result of the work is a design and implementation of summarization module for Python language. Final part of the work contains evaluation of summaries generated by implemented summarization methods and their subjective comparison of the author.
Identifer | oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:236386 |
Date | January 2013 |
Creators | Belica, Michal |
Contributors | Očenášek, Pavel, Bartík, Vladimír |
Publisher | Vysoké učení technické v Brně. Fakulta informačních technologií |
Source Sets | Czech ETDs |
Language | Czech |
Detected Language | English |
Type | info:eu-repo/semantics/masterThesis |
Rights | info:eu-repo/semantics/restrictedAccess |
Page generated in 0.0023 seconds