Return to search

Extrakce dat z webu / Web Data Extraction

Creation of web wrappers (i.e programs that extract data from the web) is a subject of study in the field of web data extraction. Designing a domain-specific language for a web wrapper is a challenging task, because it introduces trade-offs between expressiveness of a wrapper's language and safety. In addition, little attention has been paid to execution of a wrapper in restricted environment. In this thesis, we present a new wrapping language -- Serrano -- that has three goals in mind. (1) Ability to run in restricted environment, such as a browser extension, (2) extensibility, to balance the tradeoffs between expressiveness of a command set and safety, and (3) processing capabilities, to eliminate the need for additional programs to clean the extracted data. Serrano has been successfully deployed in a number of projects and provided encouraging results. Powered by TCPDF (www.tcpdf.org)

Identiferoai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:352595
Date January 2016
CreatorsNovella, Tomáš
ContributorsHolubová, Irena, Polák, Marek
Source SetsCzech ETDs
LanguageEnglish
Detected LanguageEnglish
Typeinfo:eu-repo/semantics/masterThesis
Rightsinfo:eu-repo/semantics/restrictedAccess

Page generated in 0.0017 seconds