As the graph data emerging in many domains keeps growing in scale, even a simple query over such large graphs becomes a challenging task. In this thesis, we study three widely applicable queries on large graphs: the shortest distance query, the weight constraint reachability query, and the top-k nearest keyword query. Specifically, the shortest distance query is a fundamental query that computes the shortest distance between two nodes. The weight constraint reachability query determines whether there is a feasible path between two nodes such that every edge weight along the path satisfies a specified condition. For a query node, the top-k nearest keyword query returns the k nearest nodes that carry a specified keyword. When facing a graph with more than one hundred million nodes, we need to develop effective indexing and query optimization algorithms for these queries.

In this thesis, for the shortest distance query, we propose two algorithms based on landmark embedding: an error-bounded landmark embedding algorithm and a local landmark embedding algorithm. The former, by selecting and organizing landmarks, gives an error guarantee for the estimated shortest distance; the localization mechanism proposed in the latter substantially improves estimation accuracy without increasing the preprocessing complexity or the online query complexity. For the weight constraint reachability query, we first propose an in-memory algorithm that guarantees constant query time. In addition, to improve the algorithm's ability to handle large-scale data, we use a coding technique to design an effective external-memory algorithm. For the top-k nearest keyword query, we first develop an effective algorithm on a special graph, namely a tree, that answers top-k nearest keyword queries in constant time, and from it derive an approximate algorithm on graphs; we further use a global storage technique to reduce the index size and shorten the query time. We conducted extensive experiments on real and synthetic data, and the results show that our algorithms perform efficiently on large graphs for all three queries.

Due to the massive size of graphs from various domains nowadays, even simple graph queries become challenging tasks. In this thesis, three queries with a wide range of applications are investigated on large graphs. The first is the shortest distance query, a fundamental query that computes the shortest distance between two nodes. The second, the weight constraint reachability (WCR) query, checks whether there is a feasible path between two nodes such that the edge weights along the path satisfy a side constraint. The third, the top-k nearest keyword (k-NK) query, reports, for a query node, the k nearest nodes bearing some user-specified keywords. When confronted with a large-scale graph with tens of millions of nodes or more, we need to develop efficient indexing and query optimization techniques for these queries.

In this thesis, for the shortest distance query, we devise two landmark embedding schemes: an error-bounded landmark scheme and a local landmark scheme. The former guarantees an error bound on the estimated distance, and the latter significantly improves the distance estimation accuracy without increasing the offline embedding complexity or the online query complexity. For the WCR query, we propose a memory-based approach that guarantees constant query time. In addition, to increase its scalability, we devise an I/O-efficient approach for answering WCR queries on massive graphs. For the k-NK query, we start with the special case in which the graph is a tree, and build on it to present our algorithm for approximate k-NK queries on a graph. A global storage technique is devised to further reduce the index size and the query time. We conducted extensive experiments on each of the three queries to show the effectiveness and efficiency of our methods.

Detailed summary in vernacular field only.

Qiao, Miao.

Thesis (Ph.D.)--Chinese University of Hong Kong, 2013.

Includes bibliographical references (leaves 141-151).

Abstract also in Chinese.

Abstract --- p.i
Abstract in Chinese --- p.ii
Acknowledgements --- p.iii
Contents --- p.v
Chapter 1. --- Introduction --- p.1
Chapter 1.1. --- Motivation --- p.1
Chapter 1.1.1. --- Shortest Distance Query --- p.1
Chapter 1.1.2. --- Weight Constraint Reachability Query --- p.4
Chapter 1.1.3. --- Top-k Nearest Keyword Query --- p.7
Chapter 1.2. --- Contributions --- p.9
Chapter 1.3. --- Roadmap --- p.11
Chapter 2. --- Related Work --- p.12
Chapter 2.1. --- Shortest Distance Query --- p.12
Chapter 2.2. --- Reachability Query --- p.14
Chapter 2.3. --- Keyword Related Query --- p.15
Chapter 3. --- Querying Shortest Distance --- p.17
Chapter 3.1. --- Landmark Embedding --- p.17
Chapter 3.2. --- Error Bounded Landmark Scheme --- p.18
Chapter 3.2.1. --- Problem Statement --- p.18
Chapter 3.2.2. --- Proposed Algorithm --- p.18
Chapter 3.2.3. --- Graph Partitioning-based Heuristic --- p.22
Chapter 3.2.4. --- Experiments --- p.27
Chapter 3.3. --- Query-Dependent Local Landmark Scheme --- p.34
Chapter 3.3.1. --- Problem Statement --- p.34
Chapter 3.3.2. --- Shortest Path Tree Based Local Landmark --- p.37
Chapter 3.3.3. --- Optimization Techniques --- p.41
Chapter 3.3.4. --- Local Landmark Scheme on Relational Database --- p.48
Chapter 3.3.5. --- Experiment --- p.56
Chapter 3.4. --- Summary --- p.64
Chapter 4. --- Querying Weight Constraint Reachability --- p.65
Chapter 4.1. --- Problem Definition --- p.65
Chapter 4.1.1. --- Edge Weight Constraint --- p.65
Chapter 4.1.2. --- Node Weight Constraint --- p.66
Chapter 4.1.3. --- Two Basic Solutions --- p.67
Chapter 4.2. --- An Efficient Memory Algorithm --- p.68
Chapter 4.2.1. --- Properties of WCR --- p.68
Chapter 4.2.2. --- Novel Edge Based Indexing --- p.70
Chapter 4.2.3. --- Extension to Other Constraint Formats --- p.76
Chapter 4.3. --- An I/O-Efficient Index --- p.77
Chapter 4.3.1. --- Vertex Coding --- p.78
Chapter 4.3.2. --- MST Re-balancing --- p.80
Chapter 4.3.3. --- Disk-Based Index Construction --- p.84
Chapter 4.3.4. --- Query Processing --- p.85
Chapter 4.4. --- Experiments --- p.87
Chapter 4.5. --- Summary --- p.101
Chapter 5. --- Querying Top K-Nearest Keyword --- p.102
Chapter 5.1. --- Problem Definition --- p.102
Chapter 5.2. --- Existing Solutions --- p.103
Chapter 5.2.1. --- Approximate k-NK on a Graph --- p.104
Chapter 5.2.2. --- Exact 1-NK on a Tree --- p.106
Chapter 5.3. --- Solution Overview --- p.108
Chapter 5.4. --- K-NK on a Tree for a Small K --- p.110
Chapter 5.4.1. --- Query Processing --- p.110
Chapter 5.4.2. --- Construction of Entry Edge Partition --- p.115
Chapter 5.4.3. --- Construction of Candidate List --- p.118
Chapter 5.5. --- K-NK on a Tree for a Large K --- p.120
Chapter 5.5.1. --- A Basic Pivot Approach --- p.121
Chapter 5.5.2. --- Pivot Approach with Tree Balancing --- p.122
Chapter 5.5.3. --- Index Construction --- p.125
Chapter 5.6. --- Approximate K-NK on a Graph --- p.128
Chapter 5.7. --- Experiments --- p.133
Chapter 5.8. --- Summary --- p.138
Chapter 6. --- Conclusions and Future Work --- p.139
Bibliography --- p.140
Identifier | oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_328761 |
Date | January 2013 |
Contributors | Qiao, Miao., Chinese University of Hong Kong Graduate School. Division of Systems Engineering and Engineering Management. |
Source Sets | The Chinese University of Hong Kong |
Language | English, Chinese |
Detected Language | English |
Type | Text, bibliography |
Format | electronic resource, remote, 1 online resource (vii, 151 leaves) : ill. (some col.) |
Rights | Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/) |