Managing query quality in probabilistic databases

In many emerging applications, such as sensor networks, location-based services,

and data integration, the database is inherently uncertain. To handle a large

amount of uncertain data, probabilistic databases have been recently proposed,

where probabilistic queries are enabled to provide answers with statistical guarantees.

In this thesis, we study the important issues of managing the quality of

a probabilistic database. We first address the problem of measuring the ambiguity,

or quality, of a probabilistic query. This is accomplished by computing the

PWS-quality score, a recently proposed measure for quantifying the ambiguity of

query answers under the possible world semantics. We study the computation of

the PWS-quality for the top-k query. This problem is not trivial, since directly

computing the top-k query score is computationally expensive. To tackle this

challenge, we propose efficient approximate algorithms for deriving the quality

score of a top-k query. We have performed experiments on both synthetic and

real data to validate their performance and accuracy.

Our second contribution is to study how to use the PWS-quality score to

coordinate the process of cleaning uncertain data. Removing ambiguous data

from a probabilistic database can often give us a higher-quality query result.

However, this operation requires some external knowledge (e.g., an updated value

from a sensor source), and is thus not without cost. It is important to choose the

correct object to clean, in order to (1) achieve a high quality gain, and (2) incur

a low cleaning cost. In this thesis, we examine different cleaning methods for a

probabilistic top-k query. We also study an interesting problem where different

query users have their own budgets available for cleaning. We demonstrate how

an optimal solution, in terms of the lowest cleaning costs, can be achieved, for

probabilistic range and maximum queries. An extensive evaluation reveals that

these solutions are highly efficient and accurate. / published_or_final_version / Computer Science / Master / Master of Philosophy

  1. 10.5353/th_b4775313
  2. b4775313
Identiferoai:union.ndltd.org:HKU/oai:hub.hku.hk:10722/174493
Date January 2011
CreatorsLi, Xiang, 李想
ContributorsCheng, CK, Cheung, DWL
PublisherThe University of Hong Kong (Pokfulam, Hong Kong)
Source SetsHong Kong University Theses
LanguageEnglish
Detected LanguageEnglish
TypePG_Thesis
Sourcehttp://hub.hku.hk/bib/B47753134
RightsThe author retains all proprietary rights, (such as patent rights) and the right to use in future works., Creative Commons: Attribution 3.0 Hong Kong License
RelationHKU Theses Online (HKUTO)

Page generated in 0.0016 seconds