Multidimensional data are ubiquitous and their efficient management and analysis is a core database research problem. There are lots of previous works focusing on indexing, analyzing and querying multidimensional data. In this dissertation, three challenging advanced analysis and join problems in multidimensional spaces are proposed and studied, providing efficient solutions to their related applications.
First, the problem of generalized budget constrained optimization query (Gen-BOQ) is studied. In real life, it is often difficult for manufacturers to create new products dominating their competitors, due to some constraints. These constraints can be modeled by constraint functions, and the problem is then to decide the best possible regions in multidimensional spaces where the features of new products could be placed. Using the number of dominating and dominated objects, the profitability of these regions can be evaluated and the best areas are then returned. Although GenBOQ computation is challenging due to its high complexity, an efficient divide-and-conquer based framework is offered for this problem. In addition, an approximation method is proposed, making tradeoffs between the result quality and the query cost.
Next, the efficient evaluation of all top-k queries (ATOPk) in multidimensional spaces is investigated, which compute the top ranked objects for a group of preference functions simultaneously. As an application of such a query, consider an online store, which needs to provide recommendations for a large number of users simultaneously. This problem is somewhat overlooked by past research; in this thesis, batch algorithms are proposed instead of naïvely evaluating top-k queries individually. Similar preferences are grouped together, and two algorithms are proposed, using block indexed nested loops and a view-based thresholding strategy. The optimized view-based threshold algorithm is demonstrated to be consistently the best. Moreover, an all top-k query helps to evaluate other queries relying on the results of multiple top-k queries, such as reverse top-k queries and top-m influential queries proposed in previous works. It is shown that applying the view-based approach to these queries can improve the performance of the current state-of-the-art by orders of magnitude.
Finally, the problem of spatio-textual similarity joins (ST-SJOIN) on multidimensional data is considered. Given both spatial and textual information, ST-SJOIN retrieves pairs of objects which are both spatially close and textually similar. One possible application of this query is friendship recommendation, by matching people who not only live nearby but also share common interests. By combining the state-of-the-art strategies of spatial distance joins and set similarity joins, efficient query processing algorithms are proposed, taking both spatial and textual constraints into account. A batch processing strategy is also introduced to boost the performance, which is also effective for the original textual-only joins. Using synthetic and real datasets, it is shown that the proposed techniques outperform the baseline solutions. / published_or_final_version / Computer Science / Doctoral / Doctor of Philosophy
Identifer | oai:union.ndltd.org:HKU/oai:hub.hku.hk:10722/181500 |
Date | January 2012 |
Creators | Ge, Shen., 葛屾. |
Contributors | Mamoulis, N, Cheung, DWL |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Source Sets | Hong Kong University Theses |
Language | English |
Detected Language | English |
Type | PG_Thesis |
Source | http://hub.hku.hk/bib/B49799332 |
Rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works., Creative Commons: Attribution 3.0 Hong Kong License |
Relation | HKU Theses Online (HKUTO) |
Page generated in 0.0016 seconds