Data Quality Through Active Constraint Discovery and Maintenance

Although integrity constraints are the primary means for enforcing data integrity, there are cases in which they are not defined or are not strictly enforced. This leads to inconsistencies in the data, causing poor data quality. In this thesis, we leverage the power of constraints to improve data quality. To ensure that the data conforms to the intended application domain semantics, we develop two algorithms focusing on constraint discovery. The first algorithm discovers a class of conditional constraints, which hold over a subset of the relation, under specific conditional values. The second algorithm discovers attribute domain constraints, which bind specific values to the attributes of a relation for a given domain. These two types of constraints have been shown to be useful for data cleaning.
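As an illustration of what it means for a constraint to hold only over a subset of a relation, the following is a minimal sketch, not the thesis's discovery algorithm. It assumes a functional-dependency-style rule (the abstract does not specify the constraint class), and the function name, the attribute names, and the sample rows are all hypothetical. It only checks a given conditional rule; it does not discover one.

```python
from collections import defaultdict


def violations(rows, condition, lhs, rhs):
    """Find violations of a conditional rule 'lhs determines rhs'.

    The rule is checked only on rows matching `condition` (a dict of
    attribute -> required value). Hypothetical helper for illustration.
    Returns a dict mapping each lhs value tuple to its set of conflicting
    rhs values.
    """
    groups = defaultdict(set)
    for row in rows:
        # Only rows that satisfy the condition are subject to the rule.
        if all(row[attr] == value for attr, value in condition.items()):
            key = tuple(row[attr] for attr in lhs)
            groups[key].add(row[rhs])
    # More than one rhs value for the same lhs value is a violation.
    return {key: vals for key, vals in groups.items() if len(vals) > 1}


# Invented sample data: within country "CA", area_code should determine city.
rows = [
    {"country": "CA", "area_code": "416", "city": "Toronto"},
    {"country": "CA", "area_code": "416", "city": "Torotno"},
    {"country": "CA", "area_code": "613", "city": "Ottawa"},
    {"country": "US", "area_code": "416", "city": "Springfield"},
]

result = violations(rows, {"country": "CA"}, ["area_code"], "city")
```

The row with country "US" is ignored because it falls outside the condition, so it cannot conflict with the "CA" rows even though it shares an area code with them.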

In practice, weak enforcement of constraints often occurs for performance reasons. This leads to inconsistencies between the data and the set of defined constraints. To resolve this inconsistency, we must determine whether it is the constraints or the data that is incorrect, and then make the necessary corrections. We develop a repair model that considers repairs to the data and repairs to the constraints on an equal footing. We present repair algorithms that find the necessary repairs to bring the data and the constraints back to a consistent state. Finally, we study the efficiency and quality of our techniques. We show that our constraint discovery algorithms find meaningful constraints with good precision and recall. We also show that our repair algorithms resolve many inconsistencies with high quality repairs, and propose repairs that previous algorithms did not consider.

http://hdl.handle.net/1807/33955

data management

data quality

0984

Identifier	oai:union.ndltd.org:TORONTO/oai:tspace.library.utoronto.ca:1807/33955
Date	10 December 2012
Creators	Chiang, Fei Yen
Contributors	Miller, Renee J.
Source Sets	University of Toronto
Language	en_ca
Detected Language	English
Type	Thesis
