This work concentrates on the design and implementation of automated support for extracting data from product descriptions, intended for use in e-shops. It introduces current approaches to information extraction from HTML documents, focusing chiefly on wrappers and methods for their induction; the visual approach to information extraction is also mentioned. The design part of the work describes the system requirements and basic principles. Next, a path tracing algorithm in the document object model is described in detail. The last section evaluates the results of experiments performed with the implemented system.
Identifier | oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:235921 |
Date | January 2008 |
Creators | Sláma, Vojtěch |
Contributors | Očenášek, Pavel, Burget, Radek |
Publisher | Vysoké učení technické v Brně. Fakulta informačních technologií |
Source Sets | Czech ETDs |
Language | Czech |
Detected Language | English |
Type | info:eu-repo/semantics/masterThesis |
Rights | info:eu-repo/semantics/restrictedAccess |