Return to search

Ranked Retrieval in Uncertain and Probabilistic Databases

Ranking queries are widely used in data exploration, data analysis and decision
making scenarios. While most of the currently proposed ranking techniques focus
on deterministic data, several emerging applications involve data that are imprecise
or uncertain. Ranking uncertain data raises new challenges in query semantics and
processing, making conventional methods inapplicable. Furthermore, the interplay
between ranking and uncertainty models introduces new dimensions for ordering query
results that do not exist in the traditional settings.

This dissertation introduces new formulations and processing techniques for ranking queries on uncertain data. The formulations are based on marriage of traditional ranking semantics with possible worlds semantics under widely-adopted uncertainty models. In particular, we focus on studying the impact of tuple-level and attribute-level uncertainty on the semantics and processing techniques of ranking queries.

Under the tuple-level uncertainty model, we introduce a processing framework leveraging the capabilities of relational database systems to recognize and handle data
uncertainty in score-based ranking. The framework encapsulates a state space model,
and efficient search algorithms that compute query answers by lazily materializing the
necessary parts of the space. Under the attribute-level uncertainty model, we give a new probabilistic ranking model, based on partial orders, to encapsulate the space of possible rankings originating from uncertainty in attribute values. We present a set of efficient query evaluation algorithms, including sampling-based techniques based on the theory of Markov chains and Monte-Carlo method, to compute query answers.

We build on our techniques for ranking under attribute-level uncertainty to support
rank join queries on uncertain data. We show how to extend current rank join methods
to handle uncertainty in scoring attributes. We provide a pipelined query operator
implementation of uncertainty-aware rank join algorithm integrated with sampling
techniques to compute query answers.

Identiferoai:union.ndltd.org:WATERLOO/oai:uwspace.uwaterloo.ca:10012/5724
Date January 2011
CreatorsSoliman, Mohamed
Source SetsUniversity of Waterloo Electronic Theses Repository
LanguageEnglish
Detected LanguageEnglish
TypeThesis or Dissertation

Page generated in 0.002 seconds