Return to search

Data Editing and Logic: The covering set method from the perspective of logic

Errors in collections of data can cause significant problems when those data are used. Therefore the owners of data find themselves spending much time on data cleaning. This thesis is a theoretical work about one part of the broad subject of data cleaning - to be called the covering set method. More specifically, the covering set method deals with data records that have been assessed by the use of edits, which are rules that the data records are supposed to obey. The problem solved by the covering set method is the error localisation problem, which is the problem of determining the erroneous fields within data records that fail the edits. In this thesis I analyse the covering set method from the perspective of propositional logic. I demonstrate that the covering set method has strong parallels with well-known parts of propositional logic. The first aspect of the covering set method that I analyse is the edit generation function, which is the main function used in the covering set method. I demonstrate that the edit generation function can be formalised as a logical deduction function in propositional logic. I also demonstrate that the best-known edit generation function, written here as FH (standing for Fellegi-Holt), is essentially the same as propositional resolution deduction. Since there are many automated implementations of propositional resolution, the equivalence of FH with propositional resolution gives some hope that the covering set method might be implementable with automated logic tools. However, before any implementation, the other main aspect of the covering set method must also be formalised in terms of logic. This other aspect, to be called covering set correctibility, is the property that must be obeyed by the edit generation function if the covering set method is to successfully solve the error localisation problem. In this thesis I demonstrate that covering set correctibility is a strengthening of the well-known logical properties of soundness and refutation completeness. What is more, the proofs of the covering set correctibility of FH and of the soundness / completeness of resolution deduction have strong parallels: while the proof of soundness / completeness depends on the reduction property for counter-examples, the proof of covering set correctibility depends on the related lifting property. In this thesis I also use the lifting property to prove the covering set correctibility of the function defined by the Field Code Forest Algorithm. In so doing, I prove that the Field Code Forest Algorithm, whose correctness has been questioned, is indeed correct. The results about edit generation functions and covering set correctibility apply to both categorical edits (edits about discrete data) and arithmetic edits (edits expressible as linear inequalities). Thus this thesis gives the beginnings of a theoretical logical framework for error localisation, which might give new insights to the problem. In addition, the new insights will help develop new tools using automated logic tools. What is more, the strong parallels between the covering set method and aspects of logic are of aesthetic appeal.

Identiferoai:union.ndltd.org:ADTP/216870
Date January 2008
CreatorsBoskovitz, Agnes, abvi@webone.com.au
PublisherThe Australian National University. Research School of Information Sciences and Engineering
Source SetsAustraliasian Digital Theses Program
LanguageEnglish
Detected LanguageEnglish
Rightshttp://www.anu.edu.au/legal/copyrit.html), Copyright Agnes Boskovitz

Page generated in 0.0026 seconds