Global ETD Search

51	Query processing in large-scale networks. January 2013 Graphs from many domains are now so large that even simple graph queries are challenging. This thesis investigates three widely applicable queries on large graphs. The first, the shortest distance query, is a fundamental query that computes the shortest distance between two nodes. The second, the weight constraint reachability (WCR) query, checks whether there is a feasible path between two nodes on which every edge weight satisfies a specified constraint. The third, the top-k nearest keyword (k-NK) query, reports, for a query node, the k nearest nodes bearing some user-specified keywords. On a large-scale graph with tens of millions of nodes or more, these queries require efficient indexing and query optimization techniques. / For the shortest distance query, we devise two landmark embedding schemes: an error-bounded landmark scheme, which guarantees an error bound on the estimated distance by selecting and organizing landmarks, and a local landmark scheme, which significantly improves the estimation accuracy without increasing the offline embedding or the online query complexity. For the WCR query, we propose a memory-based approach that guarantees constant query time.
To improve scalability, we also devise an I/O-efficient approach, based on a coding technique, for answering WCR queries on massive graphs. For the k-NK query, we start with the special case in which the graph is a tree, for which we develop an algorithm that answers k-NK queries in constant time, and we build on it to obtain an algorithm for approximate k-NK queries on a graph. A global storage technique further reduces the index size and the query time. We conducted extensive experiments on real and synthetic data for each of the three queries to show the effectiveness and efficiency of our methods. / Detailed summary in vernacular field only. / Qiao, Miao. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves 141-151). / Abstract also in Chinese. / Abstract --- p.i / Abstract in Chinese --- p.ii / Acknowledgements --- p.iii / Contents --- p.v / Chapter 1. --- Introduction --- p.1 / Chapter 1.1. --- Motivation --- p.1 / Chapter 1.1.1. --- Shortest Distance Query --- p.1 / Chapter 1.1.2. --- Weight Constraint Reachability Query --- p.4 / Chapter 1.1.3. --- Top-k Nearest Keyword Query --- p.7 / Chapter 1.2. --- Contributions --- p.9 / Chapter 1.3. --- Roadmap --- p.11 / Chapter 2. --- Related Work --- p.12 / Chapter 2.1. --- Shortest Distance Query --- p.12 / Chapter 2.2. --- Reachability Query --- p.14 / Chapter 2.3. --- Keyword Related Query --- p.15 / Chapter 3. --- Querying Shortest Distance --- p.17 / Chapter 3.1. --- Landmark Embedding --- p.17 / Chapter 3.2. --- Error Bounded Landmark Scheme --- p.18 / Chapter 3.2.1. --- Problem Statement --- p.18 / Chapter 3.2.2. --- Proposed Algorithm --- p.18 / Chapter 3.2.3. --- Graph Partitioning-based Heuristic --- p.22 / Chapter 3.2.4. --- Experiments --- p.27 / Chapter 3.3. --- Query-Dependent Local Landmark Scheme --- p.34 / Chapter 3.3.1. --- Problem Statement --- p.34 / Chapter 3.3.2. --- Shortest Path Tree Based Local Landmark --- p.37 / Chapter 3.3.3. --- Optimization Techniques --- p.41 / Chapter 3.3.4.
--- Local Landmark Scheme on Relational Database --- p.48 / Chapter 3.3.5. --- Experiment --- p.56 / Chapter 3.4. --- Summary --- p.64 / Chapter 4. --- Querying Weight Constraint Reachability --- p.65 / Chapter 4.1. --- Problem Definition --- p.65 / Chapter 4.1.1. --- Edge Weight Constraint --- p.65 / Chapter 4.1.2. --- Node Weight Constraint --- p.66 / Chapter 4.1.3. --- Two Basic Solutions --- p.67 / Chapter 4.2. --- An Efficient Memory Algorithm --- p.68 / Chapter 4.2.1. --- Properties of WCR --- p.68 / Chapter 4.2.2. --- Novel Edge Based Indexing --- p.70 / Chapter 4.2.3. --- Extension to Other Constraint Formats --- p.76 / Chapter 4.3. --- An I/O-Efficient Index --- p.77 / Chapter 4.3.1. --- Vertex Coding --- p.78 / Chapter 4.3.2. --- MST Re-balancing --- p.80 / Chapter 4.3.3. --- Disk-Based Index Construction --- p.84 / Chapter 4.3.4. --- Query Processing --- p.85 / Chapter 4.4. --- Experiments --- p.87 / Chapter 4.5. --- Summary --- p.101 / Chapter 5. --- Querying Top K-Nearest Keyword --- p.102 / Chapter 5.1. --- Problem Definition --- p.102 / Chapter 5.2. --- Existing Solutions --- p.103 / Chapter 5.2.1. --- Approximate k-NK on a Graph --- p.104 / Chapter 5.2.2. --- Exact 1-NK on a Tree --- p.106 / Chapter 5.3. --- Solution Overview --- p.108 / Chapter 5.4. --- K-NK on a Tree for a Small K --- p.110 / Chapter 5.4.1. --- Query Processing --- p.110 / Chapter 5.4.2. --- Construction of Entry Edge Partition --- p.115 / Chapter 5.4.3. --- Construction of Candidate List --- p.118 / Chapter 5.5. --- K-NK on a Tree for a Large K --- p.120 / Chapter 5.5.1. --- A Basic Pivot Approach --- p.121 / Chapter 5.5.2. --- Pivot Approach with Tree Balancing --- p.122 / Chapter 5.5.3. --- Index Construction --- p.125 / Chapter 5.6. --- Approximate K-NK on a Graph --- p.128 / Chapter 5.7. --- Experiments --- p.133 / Chapter 5.8. --- Summary --- p.138 / Chapter 6.
--- Conclusions and Future Work --- p.139 / Bibliography --- p.140 Querying (Computer science) Data structures (Computer science) Graph theory--Data processing Database management
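The landmark embedding idea named in the abstract of record 51 can be illustrated with a minimal sketch. It shows only the generic technique (precompute distances to a few landmarks, then bound a query distance with the triangle inequality); it is not the error-bounded or local landmark scheme of the thesis. The function names, the undirected unweighted toy graph, and the choice of landmark are assumptions made for illustration.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def build_embedding(adj, landmarks):
    """Offline step: one BFS per landmark (hypothetical helper)."""
    return {l: bfs_distances(adj, l) for l in landmarks}

def estimate_distance(embedding, s, t):
    """Online step: upper bound min over landmarks l of d(s, l) + d(l, t).

    Valid for undirected graphs, where d(s, l) == d(l, s).
    """
    best = float("inf")
    for dist in embedding.values():
        if s in dist and t in dist:
            best = min(best, dist[s] + dist[t])
    return best

# Toy undirected graph: path 0-1-2-3-4 with an extra branch 2-5.
adj = {0: [1], 1: [0, 2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
embedding = build_embedding(adj, landmarks=[2])

print(estimate_distance(embedding, 0, 4))  # exact: landmark 2 lies on the shortest path
print(estimate_distance(embedding, 3, 4))  # overestimate: true distance is 1
```

The second query shows why plain landmark estimates can be loose when no landmark lies near the shortest path, which is the kind of inaccuracy the abstract says the two proposed schemes address.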
52	Query processing for graph-structured data. / 圖結構數據的查詢處理 / CUHK electronic theses & dissertations collection / Tu jie gou shu ju de cha xun chu li January 2007 Graph-structured data is increasingly popular in Web technology and in new data management and archiving techniques. Numerous applications work with graphs and need to query reachability among the nodes of a graph. A 2-hop cover can compactly represent the whole edge transitive closure of a graph in O ( / Cheng, Jiefeng. / "August 2007." / Adviser: Jeffrey Xu Yu. / Source: Dissertation Abstracts International, Volume: 69-02, Section: B, page: 1097. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2007. / Includes bibliographical references (p. 134-141). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract in English and Chinese. / School code: 1307. Data structures (Computer science) Database management Graph theory--Data processing Querying (Computer science)
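The 2-hop cover mentioned in the abstract of record 52 can be sketched as follows. Each node u keeps two label sets, L_out(u) and L_in(u), and u reaches v exactly when the two sets share a hop node. The abstract is truncated and does not say how the thesis computes its cover, so the construction below is a swapped-in, assumed one (a pruned breadth-first labeling with a degree-based node order); all names and the toy graph are hypothetical.

```python
from collections import deque

def reaches(l_out, l_in, u, v):
    """2-hop query: u reaches v iff L_out(u) and L_in(v) share a hop node."""
    return not l_out[u].isdisjoint(l_in[v])

def build_two_hop_labels(adj):
    """Build 2-hop labels by pruned BFS (assumed construction, not from the thesis)."""
    radj = {n: [] for n in adj}
    for u in adj:
        for v in adj[u]:
            radj[v].append(u)
    l_out = {n: set() for n in adj}
    l_in = {n: set() for n in adj}
    # Heuristic order: high-degree nodes first, since they tend to cover more pairs.
    order = sorted(adj, key=lambda n: len(adj[n]) + len(radj[n]), reverse=True)
    for hop in order:
        # Forward pass labels nodes reachable from hop; backward pass labels
        # nodes that can reach hop.
        for forward in (True, False):
            graph = adj if forward else radj
            labels = l_in if forward else l_out
            seen = {hop}
            queue = deque([hop])
            while queue:
                w = queue.popleft()
                if forward:
                    covered = reaches(l_out, l_in, hop, w)
                else:
                    covered = reaches(l_out, l_in, w, hop)
                if covered:
                    continue  # an earlier hop already certifies this pair
                labels[w].add(hop)
                for x in graph[w]:
                    if x not in seen:
                        seen.add(x)
                        queue.append(x)
    return l_out, l_in

# Toy directed acyclic graph; every node appears as a key.
adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": [], "f": ["d"]}
l_out, l_in = build_two_hop_labels(adj)

print(reaches(l_out, l_in, "a", "e"))  # a -> b -> d -> e exists
print(reaches(l_out, l_in, "b", "c"))  # no path from b to c
```

A query touches only two label sets, so its cost depends on label sizes rather than on graph size, which is what makes a small cover attractive compared with storing the full transitive closure.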
53	Free tree-featured graph indexing and query processing. / CUHK electronic theses & dissertations collection January 2007 In this dissertation, we develop frequent free tree mining systems, F3TM and CFFTree, to systematically discover the complete set of (closed) frequent free trees from a graph database. Graph indexing, another problem addressed in this dissertation, is of special interest both in academia and in industrial applications. The solutions, (Tree+Delta) and SimTree, propose a free tree featured index, which is built on frequent free tree patterns discovered through a structural mining process. This mining-based indexing methodology leads to the development of a compact but effective graph index structure that is orders of magnitude smaller in size but an order of magnitude faster in performance than traditional approaches. / Scalable structural pattern mining and graph database management tools are becoming increasingly crucial to applications with complex data in domains ranging from software engineering to computational biology. Due to their high complexity, it is often difficult, if not impossible, for human beings to manually analyze any reasonably large collection of graphs. In this dissertation, we investigate two fundamental problems in large scale graph databases: given a graph database, what are the hidden free tree patterns and how can we find them? And how can we index graphs based on free tree patterns and perform similarity search in large graph databases? / The developed concepts, theories, and systems hence increase our understanding of data mining principles in structural pattern discovery, interpretation and search. The formulation of a general free tree featured graph information system through this study could provide fundamental support to graph-intensive applications. / Zhao, Peixiang. / "August 2007." / Adviser: Jeffrey Xu Yu. / Source: Dissertation Abstracts International, Volume: 69-02, Section: B, page: 1127.
/ Thesis (Ph.D.)--Chinese University of Hong Kong, 2007. / Includes bibliographical references (p. 136-145). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract in English and Chinese. / School code: 1307. Data structures (Computer science) Database management Graph theory--Data processing Querying (Computer science)
54	A personalised query expansion approach using context Seher, Indra, University of Western Sydney, College of Health and Science, School of Computing and Mathematics January 2007 Users of the Web usually use search engines to find answers to a variety of questions. Although search engines can rapidly process a large number of Web documents, in many cases the answers they return are not relevant to the user’s information need, even though they contain the same keywords as the query. This is because the Web contains information sources created by numerous authors independently, and the authors’ vocabularies vary greatly. Furthermore, most words in natural languages have inherent ambiguity. This vocabulary mismatch between user queries and Web sources is often addressed through query expansion. Moreover, user questions are often short, and search results can improve when the question is longer. Various query expansion methods that add useful question-related terms before processing the question have been proposed and proven to improve the results. Some of these query expansion methods add contextual information related to the user and the question. On the other hand, human communications are quite successful and seem to be very easy. This is mainly due to the understanding of language and the world knowledge that humans have. Human communication is more successful when there is an implicit understanding of everyday situations of others who take part in the communication. Here the implicit situational information, or the “context” that humans share, enables them to have a more meaningful interaction amongst themselves. Similar to human–human communications, improving computers’ access to context can increase the richness of human–computer communications, giving more useful computational services to users.
Based on the above factors, this research proposes a method to make use of context in order to understand and process user requests. Here, the term “context” means the meanings associated with key query terms and the preferences that have to be decided in order to process the query. As in a natural environment, the results an automated system produces for the same question could vary from user to user. If the automated system knows the user’s preferences related to the question, it can use them to process the query and produce more relevant and useful results. Hence, a new approach for personalised query expansion is proposed in this research, in which user queries are expanded with user preferences, so the expanded queries used for processing vary for different users. An architecture that such a Web application requires to carry out personalised query expansion with contextual information is also proposed in the thesis. The preferences that could be used for the query expansion are therefore user-specific. Users have different sets of preferences depending on the tasks they want to perform. Similar tasks that have the same types of preferences can be grouped into task-based domains. Hence, user preferences will be the same within a domain and will vary across domains. Furthermore, different types of subtasks can be performed within a domain. The set of preferences that could be used for each subtask could vary, and it will be a subset of the set of preferences of the domain. Hence, an approach for personalised query expansion that adds user-, domain- and task-specific preferences to user queries is proposed in this research. The main stages of this expansion are identified and discussed in this thesis. Each of these stages requires different contextual information, which is represented in the context model.
Of the main stages identified in the query expansion process, the first three, namely domain identification, task identification, and missing parameter identification, are explored in the thesis. As the preferences used for the expansion depend on the query domain, the domain of the query must be identified first. Hence, a domain identification algorithm that makes use of eight different features is proposed in the thesis to identify the domains of given queries. This domain identification also reduces the ambiguity of query terms. When the query domain is identified, the context, that is, the associated meanings of the query terms, is known. This limits the scope for misinterpreting query terms. A domain ontology, a domain dictionary, and a user profile are used by the domain identification algorithm. The domain ontology consists of objects and their categories, attributes of objects and their categories, relationships among objects, and instances and their categories in the domain. The domain dictionary consists of objects and attributes and is created automatically from the domain ontology. The user profile holds the long-term preferences of the user, both domain-specific and general. When the domain of the query is known, the task specified in the query has to be identified in order to decide the preferences of the user. This task identification process is found to be similar in domains with similar activities. Hence, domains are grouped at this stage. These domain groups, and the rules that could be used to find the tasks in them, are identified and discussed in the thesis. For each subtask in the domain groups, the types of preferences that could be used to expand user queries are identified and used to expand user queries. An experiment is designed to evaluate the performance of the proposed approach.
The first three stages of the query expansion, the domain identification, task identification, and missing parameter identification, are implemented and evaluated. Samples of five domains are implemented, and queries are collected in these domains from various users. In order to create new domains, a wizard is provided by the system. This system also allows editing the existing domains, domain groups, and types of preferences in sub tasks of the domain groups. Instances of the attributes are manually identified and added to the system using the interface provided by the system. In each of the stages of the query expansion, the results of the queries are manually identified, and are compared with the results produced by the system. The results have confirmed that the proposed method has a positive impact in query expansion. The experiments, results and evaluation of the proposed query expansion approach are also presented in the thesis. The proposed approach for the query expansion could be used by search engines, organisations with a limited set of task domains, and any application that can be improved by making use of personalised query expansion. / Doctor of Philosophy (PhD) internet searching querying (computer science) information organization information retrieval
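The pipeline described in the abstract of record 54 (identify the query's domain, then expand the query with the user's preferences for that domain) can be illustrated with a deliberately simplified sketch. The thesis uses a domain ontology, eight identification features, and task identification; none of those details are given in the abstract, so the sketch below substitutes plain term overlap against a domain dictionary and skips the task stage. The domains, dictionary terms, user profiles, and function names are all hypothetical.

```python
def identify_domain(query_terms, domain_dictionaries):
    """Pick the domain whose dictionary shares the most terms with the query.

    Term overlap is an assumed stand-in for the thesis's eight features.
    """
    scores = {d: len(query_terms & terms) for d, terms in domain_dictionaries.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def expand_query(query, domain_dictionaries, user_profile):
    """Append the user's preferences for the identified domain to the query."""
    terms = set(query.lower().split())
    domain = identify_domain(terms, domain_dictionaries)
    if domain is None:
        return query  # unknown domain: leave the query unchanged
    preferences = user_profile.get(domain, [])
    extra = [p for p in preferences if p.lower() not in terms]
    return " ".join([query] + extra)

# Hypothetical domain dictionaries and per-user, per-domain preferences.
domain_dictionaries = {
    "travel": {"flight", "hotel", "ticket"},
    "cooking": {"recipe", "oven", "bake"},
}
alice = {"travel": ["economy", "direct"]}
bob = {"travel": ["business"]}

print(expand_query("flight to Sydney", domain_dictionaries, alice))
print(expand_query("flight to Sydney", domain_dictionaries, bob))
```

The same query expands differently for the two users, which is the behaviour the abstract describes: the expanded queries used for processing vary from user to user.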
55	Feature extraction and similarity-based analysis for proteome and genome databases Öztürk, Özgür. January 2007 Thesis (Ph. D.)--Ohio State University, 2007. / Title from first page of PDF file. Includes bibliographical references (p. 108-119).
56	Socio-aware random walk search and replication in peer-to-peer networks Xie, Jing, January 2009 Thesis (M. Phil.)--University of Hong Kong, 2009. / Includes bibliographical references (leaves 52-55). Also available in print.
57	An I/O-efficient data structure for querying XML with inherited attributes / Lau, Ching Hin. January 2009 Includes bibliographical references (p. 39-41).
58	Web services query matchmaking with automated knowledge acquisition Gupta, Chaitali. January 2007 Thesis (M.S.)--State University of New York at Binghamton, Department of Computer Science, Thomas J. Watson School of Engineering and Applied Science, 2007. / Includes bibliographical references.
59	SQL front-end for the JRelix relational-programming system Khaya, Ibrahima. January 1900 Thesis (M.Sc.). / Written for the School of Computer Science. Title from title page of PDF (viewed 2009/06/25). Includes bibliographical references.
60	Adaptive query relaxation and processing over heterogeneous xml data sources Li, Jianxin. January 2009 Thesis (Ph.D) - Swinburne University of Technology, Faculty of Information & Communication Technologies, 2009. / A dissertation submitted to the Faculty of Information and Communication Technologies, Swinburne University of Technology in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2009. Typescript. "August 2009". Bibliography p. 161-171.
