This work can be considered multidisciplinary research in which ideas from Operations Research, Data Science, and Logic come together to solve an inconsistency handling problem in a special type of ontology. / High data quality is a prerequisite for accurate data analysis. However, data inconsistencies
often arise in real data, leading to untrusted decision making downstream in the data
analysis pipeline. In this research, we study the problem of inconsistency detection and
repair of the Ontology Multi-dimensional Data Model (OMD). We propose a framework
for data quality assessment and repair of the OMD. We formally define a weight-based
repair-by-deletion semantics, and present an automatic weight generation mechanism
that considers multiple input criteria. Our methods are rooted in multi-criteria decision
making and account for the correlation, contrast, and conflict that may exist among
multiple criteria, a consideration often needed in the data cleaning domain. After weight generation,
we present a dynamic-programming-based Min-Sum algorithm to identify a minimal-weight
solution. We then apply evolutionary optimization techniques and demonstrate
improved performance using medical datasets, making the approach realizable in practice. / Thesis / Master of Computer Science (MCS) / Accurate data analysis requires high-quality data as input. In this research, we study inconsistency in an ontology known as the Ontology Multi-dimensional Data (OMD) Model and propose algorithms that repair inconsistencies based on automatically generated relative weights. We propose two techniques to restore consistency: one provides optimal results but takes longer, while the other produces sub-optimal results but is fast enough for practical purposes, as shown by experiments on datasets.
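To make the idea of weight-based repair-by-deletion concrete, the following is a minimal sketch under stated assumptions, not the thesis's method. It assumes that inconsistencies are given as sets of mutually conflicting facts and that each fact already has a numeric weight; a repair then deletes a minimum-total-weight set of facts that touches every conflict. The search is plain brute force over subsets (exponential, for illustration only); it is neither the dynamic-programming Min-Sum algorithm nor the evolutionary approach named in the abstract. The function name `min_weight_repair`, the fact labels, and the weights are all hypothetical.

```python
from itertools import combinations


def min_weight_repair(weights, conflicts):
    """Return (deleted_facts, total_weight) for a cheapest deletion set.

    weights   -- dict mapping a fact label to its (hypothetical) weight
    conflicts -- list of sets; each set holds facts that cannot all be kept
    A deletion set is valid when it removes at least one fact from every conflict.
    """
    facts = sorted(weights)
    best, best_cost = None, float("inf")
    # Enumerate every subset of facts, smallest subsets first.
    for size in range(len(facts) + 1):
        for subset in combinations(facts, size):
            chosen = set(subset)
            # Valid only if each conflict loses at least one of its facts.
            if all(chosen & conflict for conflict in conflicts):
                cost = sum(weights[f] for f in chosen)
                if cost < best_cost:
                    best, best_cost = chosen, cost
    return best, best_cost


# Illustrative data: four facts and three pairwise conflicts (all assumed).
weights = {"t1": 9, "t2": 4, "t3": 7, "t4": 2}
conflicts = [{"t1", "t2"}, {"t2", "t3"}, {"t3", "t4"}]

deleted, cost = min_weight_repair(weights, conflicts)
print(sorted(deleted), cost)
```

In this toy instance, deleting `t2` and `t4` resolves all three conflicts at a total weight of 6, which is cheaper than any other valid deletion set. The weights play the role that the automatically generated weights play in the abstract: they decide which facts are cheapest to give up.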
Identifier | oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/25386 |
Date | January 2020 |
Creators | Haque, Enamul |
Contributors | Chiang, Fei, Computing and Software |
Source Sets | McMaster University |
Language | English |
Detected Language | English |
Type | Thesis |