Deduplikační metody v databázích / Deduplication methods in databases

This work studies record deduplication as a data quality problem. We define duplicates as records that differ in syntax but share the same semantics, that is, records that represent the same real-world entity. The main goal is to provide an overview of existing deduplication methods with respect to their requirements, results, and usability. We focus on comparing two groups of record deduplication methods: those that use domain knowledge and those that do not. Accordingly, the second part of the work is dedicated to implementing our own method, which uses no domain knowledge, and to comparing its results with those of a commercial tool that relies heavily on domain knowledge.

Identifier: oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:298481
Date: January 2010
Creators: Vávra, Petr
Contributors: Kyjonka, Vladimír; Skopal, Tomáš
Source Sets: Czech ETDs
Language: Czech
Detected Language: English
Type: info:eu-repo/semantics/masterThesis
Rights: info:eu-repo/semantics/restrictedAccess
