The goal of top-k ranking is to rank individuals so that the best k of them can be determined. Depending on the application domain, an individual can be a person, a product, an event, or just a collection of data or information for which an ordering makes sense.
In the context of databases, top-k ranking has been studied in two distinct directions, depending on whether the stored
information is certain or uncertain. In the former, the past research has focused on efficient query processing. In the latter case, a number of semantics based on possible worlds have been proposed and computational mechanisms investigated for what are called uncertain databases or probabilistic databases, where a tuple is associated with a membership probability indicating the level of confidence on the stored information.
In this thesis, we study top-k ranking with uncertain data in two general areas. The first is on pruning for the computation of top-k tuples in a probabilistic database. We investigate the theoretical basis and practical means of pruning for the recently proposed, unifying framework based on parameterized ranking functions. As such, our results are applicable to a wide range of ranking functions. We show experimentally that pruning can generate orders of magnitude performance gains. In the second area of our investigation, we study the problem of top-k ranking for objects with multiple attributes whose values are modeled by probability distributions and constraints. We formulate a theory of top-k ranking for objects by a characterization of what constitutes the strength of an object, and show that a number of previous proposals for top-k ranking are special cases of our theory. We carry out a limited study on computation of top-k objects under our theory. We reveal the close connection between top-k ranking in this context
and high-dimensional space studied in mathematics, in particular, the problem of computing the volumes of high-dimensional polyhedra expressed by linear inequations is a special case of top-k ranking of objects, and as such, the algorithms formulated for the former can be employed for the latter under the same conditions.
Identifer | oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:AEU.10048/1856 |
Date | 06 1900 |
Creators | Wang, Chonghai |
Contributors | You, Jia (Computing Science), Yuan, Li-Yan (Computing Science), Zaiane, Osmar (Computing Science), Sander,Joerg (Computing Science), Chen, Xi (Mathematical and Statistical Sciences), Lin, Xuemin (School of Computer Science and Engineering,University of New South Wales) |
Source Sets | Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada |
Language | en_US |
Detected Language | English |
Type | Thesis |
Format | 891765 bytes, application/pdf |
Page generated in 0.0019 seconds