Creation of web wrappers (i.e programs that extract data from the web) is a subject of study in the field of web data extraction. Designing a domain-specific language for a web wrapper is a challenging task, because it introduces trade-offs between expressiveness of a wrapper's language and safety. In addition, little attention has been paid to execution of a wrapper in restricted environment. In this thesis, we present a new wrapping language -- Serrano -- that has three goals in mind. (1) Ability to run in restricted environment, such as a browser extension, (2) extensibility, to balance the tradeoffs between expressiveness of a command set and safety, and (3) processing capabilities, to eliminate the need for additional programs to clean the extracted data. Serrano has been successfully deployed in a number of projects and provided encouraging results. Powered by TCPDF (www.tcpdf.org)
Identifer | oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:352595 |
Date | January 2016 |
Creators | Novella, Tomáš |
Contributors | Holubová, Irena, Polák, Marek |
Source Sets | Czech ETDs |
Language | English |
Detected Language | English |
Type | info:eu-repo/semantics/masterThesis |
Rights | info:eu-repo/semantics/restrictedAccess |
Page generated in 0.0013 seconds