This report deals with segmentation of web pages, which is important discipline of information extraction. In the first part, we describe several general ways to implement it. After that we introduce method Box Clustering Segmentation, which comes with a slightly different approach towards segmentation. In the second half, we describe implementation of this method as a part of framework FITLayout and final testing.
Identifer | oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:363867 |
Date | January 2017 |
Creators | Lengál, Tomáš |
Contributors | Bartík, Vladimír, Burget, Radek |
Publisher | Vysoké učení technické v Brně. Fakulta informačních technologií |
Source Sets | Czech ETDs |
Language | Czech |
Detected Language | English |
Type | info:eu-repo/semantics/masterThesis |
Rights | info:eu-repo/semantics/restrictedAccess |
Page generated in 0.0018 seconds