
Discovering Ontology Functional Dependencies

Functional Dependencies (FDs) are commonly used in data cleaning to identify dirty
and inconsistent data values. However, many errors require user input for specific
domain knowledge. For example, let us consider the drugs, Advil and Crocin. FDs will
consider these two drugs different because they are not syntactically equal. However,
Advil and Crocin are synonyms as they are two different drugs with similar chemical
compounds but marketed under distinct names in different countries. While FDs
have traditionally been used in existing data cleaning solutions to model syntactic
equivalence, they are not able to model broader relationships (e.g., synonym, Is-A
(Inheritance)) defined by ontologies.
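
The contrast between syntactic equality and synonym-aware comparison can be illustrated with a minimal sketch. It is not the thesis's definition of an OFD: it assumes a hypothetical toy ontology in which each value belongs to at most one synonym class, and the attribute names (symptom, drug), the sample rows, and the helper functions are invented for illustration only.

```python
from collections import defaultdict

# Hypothetical toy ontology: each set groups names treated as synonyms.
# Assumption: a value belongs to at most one class.
SYNONYM_CLASSES = [
    {"Advil", "Crocin"},
]

def canonical(value):
    """Map a value to one representative of its synonym class (itself if none)."""
    for cls in SYNONYM_CLASSES:
        if value in cls:
            return min(cls)
    return value

def violations(rows, lhs, rhs, normalize=lambda v: v):
    """Return the LHS keys whose tuples disagree on rhs after normalization."""
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        seen[key].add(normalize(row[rhs]))
    return sorted(k for k, vals in seen.items() if len(vals) > 1)

# Illustrative data only.
rows = [
    {"symptom": "headache", "drug": "Advil"},
    {"symptom": "headache", "drug": "Crocin"},
    {"symptom": "nausea", "drug": "Gravol"},
]

# Plain FD symptom -> drug: Advil and Crocin differ syntactically.
fd_violations = violations(rows, ["symptom"], "drug")
# Synonym-aware check: both names map to the same representative.
ofd_violations = violations(rows, ["symptom"], "drug", normalize=canonical)
```

Under these assumptions the plain check flags the "headache" tuples as a violation, while the synonym-aware check reports none.
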
In this thesis, we take a first step to discover a new dependency called Ontology
Functional Dependencies (OFDs). OFDs model attribute relationships based on
relationships in a given ontology. We present two effective algorithms to discover OFDs
using synonyms and inheritance relationships. Our discovery algorithms search for
minimal OFDs and prune the redundant ones. Both algorithms traverse the search
lattice in a level-wise Breadth First Search (BFS) manner. In addition, we have
developed a set of pruning rules so that we can avoid considering unnecessary candidates
in the search lattice. We present an experimental study describing the performance
and scalability of our techniques. Experimental results show that both algorithms
are effective in practice and discover OFDs efficiently for large datasets with millions
of tuples. We also present a qualitative study showing that the discovered OFDs are
meaningful with high precision and recall. / Thesis / Master of Science (MSc)
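
The level-wise search for minimal dependencies described in the abstract can be sketched generically as follows. This is a simplified illustration in the style of classic level-wise FD discovery, not the thesis's algorithms or pruning rules: the validity check is a plain FD test standing in for an ontology-aware one, only one pruning rule (skipping supersets of a known left-hand side) is shown, and all names and data are hypothetical.

```python
from itertools import combinations

def holds(rows, lhs, rhs):
    """Plain FD check, used here as a stand-in for an ontology-aware check."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True

def discover_minimal(rows, attributes, check=holds):
    """Test left-hand sides of size 1, 2, ... and keep only minimal dependencies."""
    found = []  # list of (lhs_tuple, rhs) pairs
    for size in range(1, len(attributes)):
        for lhs in combinations(attributes, size):
            for rhs in attributes:
                if rhs in lhs:
                    continue
                # Pruning: if X -> A is already known, every XY -> A is non-minimal.
                if any(r == rhs and set(l) <= set(lhs) for l, r in found):
                    continue
                if check(rows, lhs, rhs):
                    found.append((lhs, rhs))
    return found

# Illustrative data only.
rows = [
    {"country": "CA", "symptom": "headache", "drug": "Advil"},
    {"country": "CA", "symptom": "nausea", "drug": "Gravol"},
    {"country": "IN", "symptom": "headache", "drug": "Crocin"},
    {"country": "IN", "symptom": "nausea", "drug": "Gravol"},
]

result = discover_minimal(rows, ["country", "symptom", "drug"])
```

On this data the sketch reports drug -> symptom at the first level and country, symptom -> drug at the second. The candidate country, drug -> symptom is skipped because drug -> symptom already holds.
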

Identifier: oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/20691
Date: January 2016
Creators: Baskaran, Sridevi
Contributors: Chiang, Fei, Computing and Software
Source Sets: McMaster University
Language: en_US
Detected Language: English
Type: Thesis
