Return to search

Rámec pro extrakci informace z WWW / Framework for Information Exctration from WWW

Web environment has developed into the largest source of electronic documents, so it would be very useful, to process this information automatically. This is however not a trivial problem. Most documents are written in HTML (Hypertext Markup Language), which does not support semantic description of the content. The goal of this work is to create modular system for information extraction and further processing of this information from HTML documents. Further processing of information means to store this information in XML document or relational database. System modularity makes it possible to use various information extraction and storing methods, thus the system can be used for various tasks.

Identiferoai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:236703
Date January 2009
CreatorsBrychta, Filip
ContributorsBartík, Vladimír, Burget, Radek
PublisherVysoké učení technické v Brně. Fakulta informačních technologií
Source SetsCzech ETDs
LanguageCzech
Detected LanguageEnglish
Typeinfo:eu-repo/semantics/masterThesis
Rightsinfo:eu-repo/semantics/restrictedAccess

Page generated in 0.0019 seconds