To move toward the Semantic Web, we need ways to retrieve semantic information from documents published on the current Web 2.0. A growing amount of data is published as relational tables. In response, the Odalic system was introduced. It is based on an extended version of the TableMiner+ Semantic Table Interpretation algorithm and offers a convenient way to semantize tabular data through a knowledge base disambiguation process. The goal of this thesis is to propose an extended algorithm for the Odalic system. The algorithm should let the system gather semantic information for tabular data describing products from e-shops, which have very limited presence in knowledge bases. This is to be achieved with a machine learning technique called classification. The thesis consists of several parts: obtaining and preprocessing product data from e-shops; evaluating several classification algorithms to select the best-performing one; describing the design and implementation of the extended Odalic algorithm; describing its integration into the Odalic system; evaluating the improved algorithm on the obtained product data; and semantizing the product data with the new Odalic algorithm. In the end, the results are concluded and possible...
Identifier | oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:387272 |
Date | January 2018 |
Creators | Kadleček, Rastislav |
Contributors | Nečaský, Martin, Svoboda, Martin |
Source Sets | Czech ETDs |
Language | English |
Detected Language | English |
Type | info:eu-repo/semantics/masterThesis |
Rights | info:eu-repo/semantics/restrictedAccess |