Return to search

Ranking and similarity queries on complex data types

Ranking queries and similarity queries are elementary operations with many important applications. There are lots of research works investigating efficient evaluation of various ranking and similarity queries in databases over the past few decades. In this thesis, ranking and similarity queries on three interesting complex data types are studied, namely, multidimensional cube, object summary and tree. Efficient and effective solutions are proposed to solve their related applications.

First, the evaluation of ranking queries on multidimensional cubes is studied. In exploratory data analysis, a relation can be considered as a multidimensional cube to investigate the relationship among its attributes. Given a relation with records that can be ranked, an interesting problem is to identify selection conditions for the relation, which result in sub-relations qualified by an input record and render the ranking of the input record as high as possible among the qualifying tuples. The ranking of the input record in a sub-relation measures the quality of the corresponding multidimensional cube of this sub-relation. In this thesis, a standing maximization problem, which aims to identify a multidimensional cube of high quality, is extensively studied. As an immediate consequence of its NP-hardness, three greedy methods are proposed to explore the search space only partially, while striving to identify sub-optimal solutions of high quality.

Next, the efficient evaluation of ranking queries on object summaries is investigated. An object summary is a tree structure of tuples that summarizes the context of a particular data subject tuple. The object summary has been used as a model of keyword search in relational databases; where given a set of keywords, the objective is to identify the data subject tuples relevant to the keywords and their corresponding object summaries. However, a keyword search result may return a large number of object summaries, which brings in the issue of effectively and efficiently ranking them in order to present only the most important ones to the user. In this thesis, a model that ranks object summaries according to their relevance to a set of input thematic keywords is introduced. Efficient algorithms are proposed to answer the proposed thematic ranking query.

Finally, the similarity join query on tree-structured data is studied. Treestructured data are ubiquitous nowadays and a number of applications require efficient management of such data. Given a large collection of tree-structured objects (e.g., XML documents), the similarity join finds the pairs of objects that are similar to each other, based on a similarity threshold and a tree edit distance measure. The state-of-the-art similarity join methods compare simpler approximations of the objects (e.g., strings), in order to prune pairs that cannot be part of the similarity join result based on distance bounds derived by the approximations. In this thesis, we propose a novel similarity join approach, which is based on the dynamic decomposition of the tree objects into subgraphs, according to the similarity threshold. Our technique avoids computing the exact distance between two tree objects, if the objects do not share at least one common subgraph. / published_or_final_version / Computer Science / Doctoral / Doctor of Philosophy
Date January 2014
CreatorsCai, Yilun, 蔡奕倫
PublisherThe University of Hong Kong (Pokfulam, Hong Kong)
Source SetsHong Kong University Theses
Detected LanguageEnglish
RightsCreative Commons: Attribution 3.0 Hong Kong License, The author retains all proprietary rights, (such as patent rights) and the right to use in future works.
RelationHKU Theses Online (HKUTO)

Page generated in 0.0062 seconds