Definition and analysis of population-based data completeness measurement

Emran, Nurul Akmar Binti, January 2011
Poor-quality data, such as data with errors or missing values, has negative consequences in many application domains. An important aspect of data quality is completeness. One problem in data completeness is that of missing individuals in data sets, where the individuals are the real-world entities whose information is recorded. So far, however, completeness studies have given little attention to how missing individuals are assessed. In this thesis, we propose the notion of population-based completeness (PBC), which deals with the missing-individuals problem. Our aims are to investigate what is required to measure PBC and to identify what is needed to support PBC measurements in practice. To achieve these aims, we analyse the elements of PBC and the requirements for PBC measurement, resulting in a definition of the PBC elements and a PBC measurement formula. We propose an architecture for PBC measurement systems and determine the technical requirements of PBC systems in terms of software and hardware components. An analysis of the technical issues that arise in implementing PBC contributes to an understanding of whether PBC measurements can feasibly provide accurate results. Further exploration of one issue discovered in the analysis showed that, when measuring PBC across multiple databases, data from those databases needs to be integrated and materialised. Unfortunately, this requirement may lead to a large internal store for the PBC system that is impractical to maintain. We propose an approach to test the hypothesis that the available storage space can be optimised by materialising only partial information from the contributing databases, while retaining the accuracy of the PBC measurements.
Our approach involves substituting some of the attributes from the contributing databases with smaller alternatives (proxies), by exploiting the approximate functional dependencies (AFDs) that can be discovered within each local database. An analysis of the space-accuracy trade-offs of the approach leads to the development of an algorithm to assess candidate proxies in terms of space saving and accuracy of PBC measurement. The results of several case studies conducted for proxy assessment contribute to an understanding of the space-accuracy trade-offs offered by the proxies. Through the proposal and investigation of PBC, we have achieved a better understanding of how to deal with the completeness problem, in terms of the requirements to measure and to support PBC in practice.
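
The abstract does not reproduce the PBC measurement formula or the proxy-assessment algorithm, so the following is only an illustrative sketch of the two ideas it names, not the thesis's own definitions. It assumes that PBC can be read as the fraction of a reference population found in the data set under measure, and that an AFD can be scored with the commonly used g3 error (the minimum fraction of rows that would have to be removed for the dependency to hold exactly). The function names `pbc` and `afd_error` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def pbc(dataset_ids, reference_ids):
    """Assumed PBC ratio: share of the reference population present in the data set."""
    reference = set(reference_ids)
    if not reference:
        raise ValueError("reference population must not be empty")
    return len(set(dataset_ids) & reference) / len(reference)


def afd_error(rows, lhs, rhs):
    """g3-style error of the dependency lhs -> rhs over a list of dict rows.

    0.0 means the dependency holds exactly; larger values mean more rows
    violate it. This is a standard AFD measure, assumed here for illustration.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1
    # For each lhs value, keep only the rows that agree with its most common rhs value.
    kept = sum(max(counts.values()) for counts in groups.values())
    return 1 - kept / len(rows)


# Hypothetical data: individuals in a data set versus a reference population.
print(pbc(["g1", "g2", "g9"], ["g1", "g2", "g3", "g4"]))

# Hypothetical table: how well does the smaller attribute "code" determine "name"?
rows = [
    {"code": "A", "name": "X"},
    {"code": "A", "name": "X"},
    {"code": "A", "name": "Y"},
    {"code": "B", "name": "Z"},
]
print(afd_error(rows, "code", "name"))
```

Under these assumptions, a low `afd_error` suggests that the smaller attribute could stand in for the larger one with little loss of accuracy, which is the kind of space-accuracy trade-off the abstract describes.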
