This work deals with the automation of a web browser - the tools that allow programmatic control of the program for browsing the web pages. First, it discusses the existing solutions with focus on the tools from the Selenium Suite family and PhantomJS. Further, the internal representation of the web pages in the Gecko and WebKit browser engines is discussed. The work then focuses on the web browser application interface available for client-side scripting. The relevant standards are discussed as well. The core part of the thesis is dedicated to the design and implementation of a tool that allows to control a browser using the Selenium WebDriver tool and to extract data about the targert web page. The work presents an internal architecture, configuration files and the application interface of the designed tool. The topic of extracting detailed data about the page and its transformation to a unified structured description is covered as well. Finally, the performed unit tests and tests on real web pages are described.
Identifer | oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:403125 |
Date | January 2019 |
Creators | Bastl, Vojtěch |
Contributors | Polčák, Libor, Burget, Radek |
Publisher | Vysoké učení technické v Brně. Fakulta informačních technologií |
Source Sets | Czech ETDs |
Language | Czech |
Detected Language | English |
Type | info:eu-repo/semantics/masterThesis |
Rights | info:eu-repo/semantics/restrictedAccess |
Page generated in 0.003 seconds