Improving Data Quality: Development and Evaluation of Error Detection Methods

High-quality data are essential to decision support in organizations. However, estimates have shown that 15-20% of the data within an organization's databases can be erroneous. Some databases contain a large number of errors, which creates a serious potential problem if they are used for managerial decision-making. To improve data quality, data cleaning efforts are needed and have been initiated by many organizations. Broadly, data quality problems can be classified into three categories: incompleteness, inconsistency, and incorrectness. Of these, data incorrectness is the major source of low-quality data. Thus, this research focuses on error detection for improving data quality. In this study, we developed a set of error detection methods based on the semantic constraint framework. Specifically, the proposed methods include uniqueness detection, domain detection, attribute value dependency detection, attribute domain inclusion detection, and entity participation detection. Empirical evaluation results showed that some of our proposed error detection techniques (i.e., uniqueness detection) achieved low miss rates and low false alarm rates. Overall, our error detection methods together could identify around 50% of the errors introduced by subjects during the experiments.
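The abstract names the detection methods without describing how they work. As a rough illustration only, the sketch below shows what two of them, uniqueness detection and domain detection, might look like as simple constraint checks over a list of records. It is not the thesis's implementation: the function names, the record layout, and the sample values are all hypothetical.

```python
from collections import Counter


def find_uniqueness_violations(records, key):
    """Return indices of records whose value for `key` occurs more than once.

    Hypothetical helper: assumes `key` is meant to be unique across records.
    """
    counts = Counter(record[key] for record in records)
    return [i for i, record in enumerate(records) if counts[record[key]] > 1]


def find_domain_violations(records, attribute, allowed_values):
    """Return indices of records whose `attribute` value is outside the allowed domain.

    Hypothetical helper: assumes the domain is given as an explicit set of values.
    """
    return [
        i for i, record in enumerate(records)
        if record[attribute] not in allowed_values
    ]


# Illustrative data only; not taken from the thesis experiments.
records = [
    {"id": 1, "grade": "A"},
    {"id": 2, "grade": "B"},
    {"id": 2, "grade": "Z"},
]

duplicates = find_uniqueness_violations(records, "id")
out_of_domain = find_domain_violations(records, "grade", {"A", "B", "C", "D", "F"})

print("Uniqueness violations at rows:", duplicates)
print("Domain violations at rows:", out_of_domain)
```

Both checks flag candidate errors rather than confirmed ones. A flagged duplicate may be a legitimate re-entry, and a record can pass both checks yet still hold a wrong value, which fits the abstract's report that the methods together caught only about half of the introduced errors.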

http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0725102-233322

Decision Tree Induction

Identifier	oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0725102-233322
Date	25 July 2002
Creators	Lee, Nien-Chiu
Contributors	Chih-Ping Wei, Chao-Min Chiu, San-Yih Hwang
Publisher	NSYSU
Source Sets	NSYSU Electronic Thesis and Dissertation Archive
Language	English
Detected Language	English
Type	text
Format	application/pdf
Source	http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0725102-233322
Rights	unrestricted, Copyright information available at source archive
