61 |
Metadata-Aware Query Processing over Data StreamsDing, Luping 22 April 2008 (has links)
Many modern applications need to process queries over potentially infinite data streams to provide answers in real-time. This dissertation proposes novel techniques to optimize CPU and memory utilization in stream processing by exploiting metadata on streaming data or queries. It focuses on four topics: 1) exploiting stream metadata to optimize SPJ query operators via operator configuration, 2) exploiting stream metadata to optimize SPJ query plans via query-rewriting, 3) exploiting workload metadata to optimize parameterized queries via indexing, and 4) exploiting event constraints to optimize event stream processing via run-time early termination. The first part of this dissertation proposes algorithms for one of the most common and expensive query operators, namely join, to at runtime identify and purge no-longer-needed data from the state based on punctuations. Exploitations of the combination of punctuation and commonly-used window constraints are also studied. Extensive experimental evaluations demonstrate both reduction on memory usage and improvements on execution time due to the proposed strategies. The second part proposes herald-driven runtime query plan optimization techniques. We identify four query optimization techniques, design a lightweight algorithm to efficiently detect the optimization opportunities at runtime upon receiving heralds. We propose a novel execution paradigm to support multiple concurrent logical plans by maintaining one physical plan. Extensive experimental study confirms that our techniques significantly reduce query execution times. The third part deals with the shared execution of parameterized queries instantiated from a query template. We design a lightweight index mechanism to provide multiple access paths to data to facilitate a wide range of parameterized queries. To withstand workload fluctuations, we propose an index tuning framework to tune the index configurations in a timely manner. Extensive experimental evaluations demonstrate the effectiveness of the proposed strategies. The last part proposes event query optimization techniques by exploiting event constraints such as exclusiveness or ordering relationships among events extracted from workflows. Significant performance gains are shown to be achieved by our proposed constraint-aware event processing techniques.
|
62 |
Semantic Query Optimization for Processing XML Streams with Minimized Memory FootprintLi, Ming 25 August 2007 (has links)
"XML streams have become increasingly prevalent in modern applications, ranging from network traffic monitoring to real-time information publishing. XQuery evaluation over XML streams require the temporary buffering of XML elements, which not only utilizes system buffer and CPU resources but also causes un-necessary output latency. This thesis presents a semantic query optimization solution to minimize memory footprint during XQuery evaluation by exploiting XML schema knowledge. In many practical applications, XML streams are generated conforming to pre-defined schema constraints typically expressed via a DTD or an XML schema specification. Utilizing such constraints enables us to on-the-fly predict the non-occurrence of a given pattern within a bound context. This helps us to avoid data buffering and to release buffered data at an earlier moment, thus achieving a minimized memory footprint. In this work, we focus on one particular class of constraints, namely, the Pattern Non-Occurrence (PNO) constraint. We develop an automaton-based technique to detect PNO constraints at runtime. For a given query, optimization opportunities which can be triggered by runtime PNO detection are explored for memory footprint minimization. Optimization decisions are encoded using our proposed Condition-Action Graph (CAG). The optimization-embedded execution strategy is then proposed to execute an optimized plan by detecting PNO constraints at run-time and then triggering the corresponding encoded actions when certain predefined conditions are satisfied. To ensure the efficiency of such PNO-triggered optimization, we propose optimization strategy on shrinking the CAGs by utilizing constraint knowledge during the query plan compiling phase. We implement our optimization technique within the Raindrop XQuery engine. Our system implementation processes XQuery utilizing the Raindrop algebra. It is efficiently augmented by our optimization module, which uses Glushkov automaton technique to capture and monitor PNO constraints in parallel with the query-driven pattern retrieval. Finally, we conduct experimental studies using both real and synthetic data streams to illustrate that our techniques bring significant performance improvement in both memory and CPU usage as well as improved output latency over state-of-the-art solutions, with little overhead."
|
63 |
Sequence queries on temporal graphsZhu, Haohan 21 June 2016 (has links)
Graphs that evolve over time are called temporal graphs. They can be used to describe and represent real-world networks, including transportation networks, social networks, and communication networks, with higher fidelity and accuracy. However, research is still limited on how to manage large scale temporal graphs and execute queries over these graphs efficiently and effectively. This thesis investigates the problems of temporal graph data management related to node and edge sequence queries. In temporal graphs, nodes and edges can evolve over time. Therefore, sequence queries on nodes and edges can be key components in managing temporal graphs. In this thesis, the node sequence query decomposes into two parts: graph node similarity and subsequence matching. For node similarity, this thesis proposes a modified tree edit distance that is metric and polynomially computable and has a natural, intuitive interpretation. Note that the proposed node similarity works even for inter-graph nodes and therefore can be used for graph de-anonymization, network transfer learning, and cross-network mining, among other tasks. The subsequence matching query proposed in this thesis is a framework that can be adopted to index generic sequence and time-series data, including trajectory data and even DNA sequences for subsequence retrieval. For edge sequence queries, this thesis proposes an efficient storage and optimized indexing technique that allows for efficient retrieval of temporal subgraphs that satisfy certain temporal predicates. For this problem, this thesis develops a lightweight data management engine prototype that can support time-sensitive temporal graph analytics efficiently even on a single PC.
|
64 |
Query and mining in large graph databases.January 2013 (has links)
图结构能够描述数据对象之间的复杂关系,因而被广泛应用于多种领域。随着相关应用领域的发展,图数据库的规模变得庞大且仍在不断增长。这给研究者在图查询和图挖掘方面带来新的挑战。本文主要研究以下三个问题:如何确定两个图的顶点对应关系,使得其中一个图的子结构匹配到另一个图的相似子结构;如何从含有多个小图的数据库中,找到与查询图相似的图;如何在由不同类别的图组成的数据库中,选取特征子图并对图进行分类。 / 在本文中,对于第一个问题,我们提出了新的两段式图匹配算法。在第一阶段,我们采用了一个新的启发式策略,能够先选取锚顶点并向外扩展,进而快速得到初始匹配。在第二阶段,我们设计了新的算法对初始匹配加以改进,并且证明了新的匹配优于初始匹配。这个两段式图匹配算法能够快速有效地获得两个图的高质量匹配。为解决第二个问题,我们首先定义一个新的度量以衡量两图间的距离。它基于两图间的最大公共子图,能够很好地捕捉两个图的相同及不同之处。由于最大公共子图的计算是NP完全问题,为了快速回答top-k相似图查询,我们提出了一个高效算法,能够极大地减少最大公共子图的计算次数。这个算法根据距离度量的三种下界进行剪枝以筛选掉不合格的图。其中,前两种下界的计算基于两图的结构信息,第三种下界可由距离度量的三角不等式性质推出。我们还设计了三种不同的索引结构来支持剪枝,它们能够在剪枝效果和索引时间方面达到不同程度的平衡。关于第三个问题,我们发现了目前广泛使用的特征判别函数的两个主要缺陷,并据此提出了一个新的多样性特征判别函数。它不仅能衡量特征的判别性,而且能衡量特征的多样性。我们从多个方面分析了这个函数的性质,发现它能更好地区分不同类别的图。基于这个函数,我们设计了新的特征选取算法,获得很高的分类精度。 / Graph has powerful ability to model complex structural relationships among data objects and has been widely used in various applications. Along with the development of the application domains, graph databases become large and are growing rapidly in size. This brings researchers new challenges on graph query and mining, among which we mainly focus on investigating the following three problems: how to find the correspondence between the nodes of two large graphs so that some substructures in one graph are mapped to similar substructures in the other; another problem is how to retrieve similar graphs for a query graph from a graph database consisting of a large number of graphs; and the last problem is how to extract subgraph features to build an automated classification model for a graph database containing graphs which belong to different classes. / In this thesis, for the first problem, we propose a novel two-step approach which can efficiently match two large graphs over thousands of nodes with high matching quality. In the first stage, we design an anchor-selection/expansion scheme to construct a good initial matching heuristically. In the second stage, we propose a new approach to refine the initial matching and give the optimality of our refinement algorithm. Our approach can produce an approximate matching result with high quality and efficiency. To address the second problem, we introduce a new graph distance measure based on the maximum common subgraphs (MCS) of two graphs which can thoroughly capture the common as well as different structures of two graphs. Since computing the MCS of two graphs is NP-complete, to answer the top-k graph similarity query efficiently, we propose a fast algorithm which can significantly reduce the number of MCS computations. This algorithm prunes the unqualified graphs based on three lower bounds in which the first two are derived based on the structures of two graphs and the third is obtained based on the triangle property of the distance measure. Three index schemes are designed with different tradeoffs between pruning power and construction cost to assist the query processing. For the third problem, we identify two main issues of the current widely-used discriminative score for feature selection, and introduce a new diversified discriminative score to explore the additional value of the diversity together with the discriminativity. We analyze the properties of the newly-proposed diversified discriminative score from several perspectives and demonstrate that this score can make positive/negative graphs more separable. New algorithms are also proposed to select features based on the new score and they are shown to have high classification accuracy. / Detailed summary in vernacular field only. / Detailed summary in vernacular field only. / Zhu, Yuanyuan. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves 137-146). / Abstract also in Chinese. / Abstract --- p.i / Abstract in Chinese --- p.iii / Acknowledgments --- p.iv / Contents --- p.vi / List of Tables --- p.x / List of Figures --- p.xi / Notations --- p.1 / Chapter 1. --- Introduction --- p.1 / Chapter 1.1. --- Motivation --- p.2 / Chapter 1.1.1. --- Large Graph Matching --- p.3 / Chapter 1.1.2. --- Top-k Graph Similarity Query --- p.4 / Chapter 1.1.3. --- Diversified Discriminative Feature Selection --- p.6 / Chapter 1.2. --- Contribution --- p.7 / Chapter 2. --- Preliminaries --- p.10 / Chapter 3. --- Related Work --- p.16 / Chapter 3.1. --- Graph Matching --- p.16 / Chapter 3.1.1. --- Exact Graph Matching --- p.16 / Chapter 3.1.2. --- Approximate Graph Matching --- p.17 / Chapter 3.2. --- Graph Similarity Query --- p.19 / Chapter 3.3. --- Graph Classification --- p.20 / Chapter 4. --- Large Graph Matching --- p.23 / Chapter 4.1. --- Problem Statement --- p.23 / Chapter 4.2. --- An Overview: Construction and Refinement --- p.24 / Chapter 4.3. --- Matching Construction --- p.26 / Chapter 4.3.1. --- Global and Local Node Similarity --- p.26 / Chapter 4.3.2. --- Anchor Selection and Expansion --- p.33 / Chapter 4.3.3. --- Discussion on τ for Anchor Selection --- p.36 / Chapter 4.4. --- Matching Refinement --- p.39 / Chapter 4.4.1. --- Vertex Cover Based Refinement --- p.39 / Chapter 4.4.2. --- Refinement and Its Optimality --- p.41 / Chapter 4.4.3. --- Randomly Refinement Excluding C - F₁ --- p.46 / Chapter 4.4.4. --- Randomly Refinement Including C - F₁ --- p.51 / Chapter 4.5. --- Labeled Graph Handling --- p.54 / Chapter 4.6. --- Experiments --- p.56 / Chapter 4.6.1. --- Comparison with the Approximate Algorithms --- p.59 / Chapter 4.6.2. --- Comparison with the Exact Algorithm --- p.63 / Chapter 4.6.3. --- Parameter and Scalability Testing --- p.65 / Chapter 4.6.4. --- Sensitivity of Randomness (PN) --- p.69 / Chapter 4.6.5. --- Effectiveness of Label Distribution --- p.70 / Chapter 4.7. --- Summary --- p.72 / Chapter 5. --- Top-k Graph Similarity Query --- p.73 / Chapter 5.1. --- Problem Statement --- p.73 / Chapter 5.2. --- The Framework --- p.78 / Chapter 5.3. --- Pruning without Indexing --- p.80 / Chapter 5.3.1. --- Edge Frequency Based Lower Bound --- p.80 / Chapter 5.3.2. --- Adjacency List Based Lower Bound --- p.82 / Chapter 5.3.3. --- Query Processing --- p.84 / Chapter 5.4. --- Pruning with Indexing --- p.85 / Chapter 5.4.1. --- The Triangle Property of Graph Distance --- p.86 / Chapter 5.4.2. --- Query Processing --- p.88 / Chapter 5.4.3. --- Indexing --- p.92 / Chapter 5.4.4. --- Discussion on the Generality of Our Framework --- p.94 / Chapter 5.5. --- Experiments --- p.94 / Chapter 5.5.1. --- Similarity Measures Evaluation --- p.96 / Chapter 5.5.2. --- Query Performance Evaluation --- p.98 / Chapter 5.5.3. --- Indexing Cost Evaluation --- p.102 / Chapter 5.6. --- Summary --- p.103 / Chapter 6. --- Diversified Discriminative Feature Selection --- p.105 / Chapter 6.1. --- Problem Statement --- p.105 / Chapter 6.2. --- Discriminative Score --- p.108 / Chapter 6.2.1. --- The Single Feature Discriminative Score --- p.109 / Chapter 6.2.2. --- A New Diversified Discriminative Score --- p.110 / Chapter 6.3. --- Property Statistics of Discriminative Score --- p.113 / Chapter 6.4. --- The Algorithms --- p.117 / Chapter 6.5. --- Ensemble D&D --- p.121 / Chapter 6.6. --- Experiments --- p.123 / Chapter 6.6.1. --- D&D Performance Analysis --- p.126 / Chapter 6.6.2. --- Comparison with Existing Algorithms --- p.127 / Chapter 6.6.3. --- Performance on Patterns Mined by GAIA --- p.129 / Chapter 6.7. --- Summary --- p.131 / Chapter 7. --- Conclusion and FutureWork --- p.132 / Chapter 7.1. --- Conclusion --- p.132 / Chapter 7.2. --- Future work --- p.134 / Bibliography --- p.136
|
65 |
Query processing in large-scale networks.January 2013 (has links)
由于现今在各个领域涌现的图数据规模都愈加庞大,在这些大规模图数据上进行任何一种简单的查询都成为一件有富有挑战性的工作。在本文中,我们着重在大规模图上研究三个具有广泛应用的查询:最短路查询,权重限制查询和最近k关键字查询。具体来说, 最短路查询是一个计算两点间最短距离的基本查询。而权重限制查询判断两点间是否存在一条沿路边权都满足用指定条件的可行路径。对于一个查询节点,最近k关键字查询返回k个距离最近的带有指定关键字的节点。在面对一个拥有超过一亿节点的图时,我们需要为这些查询开发有效的索引和查询优化算法。 / 在本文中,对于最短路查询,我们提出了两个基于地标嵌入的算法,一个是有误差控制的地标嵌入算法,另一个则是本地化地标嵌入算法。前者通过对地标的筛选和组织,能对估计的最短距离给予一定的误差保证; 而后者提出的本地化机制能够在不增加预处理复杂度和在线查询复杂度的情况下大幅度提高估计的精准度。对于权重限制查询,我们先提出一个能够保证常数查询时间的内存算法。除此之外,为了提高算法对大规模数据的处理能力,我们使用编码技术设计了一个有效的外存算法。对于最近k关键字查询,我们先在一个特殊的图,即一颗树上,开发一个有效算法来在常数时间内回答最近k关键字查询, 并由此得出一个图上的近似算法;此外我们还通过一个全局存储的技术来进一步减少索引大小和缩短查询时间。我们在真实和模拟的数据上做了大量的实验,实验结果证明我们的算法在大图上对上述三个查询都具有高效性能。 / Due to the massive size of graphs from various domains nowadays, even simple graph queries become challenging tasks. In this thesis, three queries with a wide range of applications are investigated on large graphs. One is shortest distance query, a fundamental query which computes the shortest distance between two nodes. Another query, weight constraint reachability (WCR), checks if there is a feasible path between two nodes where edge weights along the path satisfy a side constraint. And the third one, a top-k nearest keywords (k-NK) query, reports, for a query node, the k nearest nodes bearing some user-specified keywords. When confronting with a large-scale graph with over tens of millions of nodes, we need to develop efficient indexing and query optimization techniques for these queries. / In this thesis, for a shortest distance query, we devise two landmark embedding schemes, an error bounded landmark scheme and a local landmark scheme, where the former can guarantee an error bound for estimated distance, and the latter can significantly improve the distance estimation accuracy without increasing the offline embedding or the online query complexity. For a WCR query, we propose a memorybased approach which promises a constant query time. Besides, in order to increase its scalability, we devise an I/O-efficient approach for answering a WCR query on massive graphs. For a k-NK query, we start with a special case when the graph is a tree, based on which we present our algorithm for approximate k-NK query on a graph. A global storage technique is devised to further reduce the index size and the query time. We did extensive experiments on the three queries respectively to show the effectiveness and efficiency of our methods. / Detailed summary in vernacular field only. / Detailed summary in vernacular field only. / Qiao, Miao. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves 141-151). / Abstract also in Chinese. / Abstract --- p.i / Abstract in Chinese --- p.ii / Acknowledgements --- p.iii / Contents --- p.v / Chapter 1. --- Introduction --- p.1 / Chapter 1.1. --- Motivation --- p.1 / Chapter 1.1.1. --- Shortest Distance Query --- p.1 / Chapter 1.1.2. --- Weight Constraint Reachability Query --- p.4 / Chapter 1.1.3. --- Top-k Nearest Keyword Query --- p.7 / Chapter 1.2. --- Contributions --- p.9 / Chapter 1.3. --- Roadmap --- p.11 / Chapter 2. --- RelatedWork --- p.12 / Chapter 2.1. --- Shortest Distance Query --- p.12 / Chapter 2.2. --- Reachability Query --- p.14 / Chapter 2.3. --- Keyword Related Query --- p.15 / Chapter 3. --- Querying Shortest Distance --- p.17 / Chapter 3.1. --- Landmark Embedding --- p.17 / Chapter 3.2. --- Error Bounded Landmark Scheme --- p.18 / Chapter 3.2.1. --- Problem Statement --- p.18 / Chapter 3.2.2. --- Proposed Algorithm --- p.18 / Chapter 3.2.3. --- Graph Partitioning-based Heuristic --- p.22 / Chapter 3.2.4. --- Experiments --- p.27 / Chapter 3.3. --- Query-Dependent Local Landmark Scheme --- p.34 / Chapter 3.3.1. --- Problem Statement --- p.34 / Chapter 3.3.2. --- Shortest Path Tree Based Local Landmark --- p.37 / Chapter 3.3.3. --- Optimization Techniques --- p.41 / Chapter 3.3.4. --- Local Landmark Scheme on Relational Database --- p.48 / Chapter 3.3.5. --- Experiment --- p.56 / Chapter 3.4. --- Summary --- p.64 / Chapter 4. --- QueryingWeight Constraint Reachability --- p.65 / Chapter 4.1. --- Problem Definition --- p.65 / Chapter 4.1.1. --- Edge Weight Constraint --- p.65 / Chapter 4.1.2. --- Node Weight Constraint --- p.66 / Chapter 4.1.3. --- Two Basic Solutions --- p.67 / Chapter 4.2. --- An Efficient Memory Algorithm --- p.68 / Chapter 4.2.1. --- Properties of WCR --- p.68 / Chapter 4.2.2. --- Novel Edge Based Indexing --- p.70 / Chapter 4.2.3. --- Extension to Other Constraint Formats --- p.76 / Chapter 4.3. --- An I/O-Efficient Index --- p.77 / Chapter 4.3.1. --- Vertex Coding --- p.78 / Chapter 4.3.2. --- MST Re-balancing --- p.80 / Chapter 4.3.3. --- Disk-Based Index Construction --- p.84 / Chapter 4.3.4. --- Query Processing --- p.85 / Chapter 4.4. --- Experiments --- p.87 / Chapter 4.5. --- Summary --- p.101 / Chapter 5. --- Querying Top K-Nearest Keyword --- p.102 / Chapter 5.1. --- Problem Definition --- p.102 / Chapter 5.2. --- Existing Solutions --- p.103 / Chapter 5.2.1. --- Approximate k-NK on a Graph --- p.104 / Chapter 5.2.2. --- Exact 1-NK on a Tree --- p.106 / Chapter 5.3. --- Solution Overview --- p.108 / Chapter 5.4. --- K-NK on a Tree for a Small K --- p.110 / Chapter 5.4.1. --- Query Processing --- p.110 / Chapter 5.4.2. --- Construction of Entry Edge Partition --- p.115 / Chapter 5.4.3. --- Construction of Candidate List --- p.118 / Chapter 5.5. --- K-NK on a Tree for a Large K --- p.120 / Chapter 5.5.1. --- A Basic Pivot Approach --- p.121 / Chapter 5.5.2. --- Pivot Approach with Tree Balancing --- p.122 / Chapter 5.5.3. --- Index Construction --- p.125 / Chapter 5.6. --- Approximate K-NK on a Graph --- p.128 / Chapter 5.7. --- Experiments --- p.133 / Chapter 5.8. --- Summary --- p.138 / Chapter 6. --- Conclusions and Future Work --- p.139 / Bibliography --- p.140
|
66 |
Query processing for graph-structured data. / 圖結構數據的查詢處理 / CUHK electronic theses & dissertations collection / Tu jie gou shu ju de cha xun chu liJanuary 2007 (has links)
Graph-structured data is enjoying an increasing popularity among Web technology and new data management and archiving techniques. Numerous applications work with graphs and need to query reachability among nodes in graphs. A 2-hop cover can compactly represent the whole edge transitive closure of a graph in O ( / Cheng, Jiefeng. / "August 2007." / Adviser: Jeffrey Xu Yu. / Source: Dissertation Abstracts International, Volume: 69-02, Section: B, page: 1097. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2007. / Includes bibliographical references (p. 134-141). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract in English and Chinese. / School code: 1307.
|
67 |
Free tree-featured graph indexing and query processing. / CUHK electronic theses & dissertations collectionJanuary 2007 (has links)
In this dissertation, we develop frequent free tree mining systems, F3TM and CFFTree, to systematically discover the complete set of (closed) frequent free trees from a graph database. Graph indexing, another problem addressed in this dissertation, is of special interest both in academia and in industrial applications. The solutions, (Tree+Delta ) and SimTree, propose a free tree featured index, which is built on frequent free tree patterns discovered through a structural mining process. This mining-based indexing methodology leads to the development of a compact but effective graph index structure that is orders of magnitude smaller in size but an order of magnitude faster in performance than traditional approaches. / Scalable structural pattern mining and graph database management tools become increasingly crucial to applications with complex data in domains ranging from software engineering to computational biology. Due to their high complexity, it is often difficult, if not impossible, for human beings to manually analyze any reasonably large collection of graphs. In this dissertation, we investigate two fundamental problems in large scale graph databases: Given a graph database what are the hidden free tree patterns and how can we find them? And how can we index graphs based on free tree patterns and perform similarity search in large graph databases? / The developed concepts, theories, and systems hence increase our understanding of data mining principles in structural pattern discovery, interpretation and search. The formulation of a general free tree featured graph information system through this study could provide fundamental supports to graph-intensive applications. / Zhao, Peixiang. / "August 2007." / Adviser: Jeffrey Xu Yu. / Source: Dissertation Abstracts International, Volume: 69-02, Section: B, page: 1127. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2007. / Includes bibliographical references (p. 136-145). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract in English and Chinese. / School code: 1307.
|
68 |
Area Access Control Systems: Zone Management And Personnel TrackingNatarajan, Bharath 12 July 2005 (has links)
Area access control is defined as the process of mediating requests to enter a physical area through one or more entry points. Area access control database systems are the collections of information required for an access control system to access, query, retrieve and match real time user inputs with persistent data to ensure the integrity of the resources it protects. This thesis presents an object oriented approach to the design and implementation of a centralized area access control database system and focuses on two features, zone management and personnel tracking. Zone management is defined as the process of hierarchically relating a zone to other immediately adjacent zone(s) that a user is required to have prior access to. This feature will automatically generate all zones that a user requires prior access to in order to approach a target zone. To implement zone management, the database system is required to support recursive relationships and recursive querying. The personnel-tracking feature allows the administrator to obtain information such as the movement of persons of interest and their interactions with others in the installation at any particular time. The results of this thesis contribute to the implementation of a sophisticated area access control database system capable of handling multiple installations, and generating the access rules and paths for each new user automatically. In addition, the object oriented area access control database system is able to support unconventional data types such as images and sound which are essential for emerging biometric security systems.
|
69 |
A personalised query expansion approach using contextSeher, Indra, University of Western Sydney, College of Health and Science, School of Computing and Mathematics January 2007 (has links)
Users of the Web usually use search engines to find answers to a variety of questions. Although search engines can rapidly process a large number of Web documents, in many cases, the answers returned by search engines are not relevant to the user’s information need, although they do contain the same keywords as the query. This is because the Web contains information sources created by numerous authors independently, and the authors’ vocabularies vary greatly. Furthermore, most words in natural languages have inherent ambiguity. This vocabulary mismatch between user queries and Web sources is often addressed through query expansion. Moreover, user questions are often short. The results of a search can be improved when the length of the question is long. Various query expansion methods that add useful question-related terms before processing the question have been proposed and proven to increase the performance of the result. Some of these query expansion methods add contextual information related to the user and the question. On the other hand, human communications are quite successful and seem to be very easy. This is mainly due to the understanding of language and the world knowledge that humans have. Human communication is more successful when there is an implicit understanding of everyday situations of others who take part in the communication. Here the implicit situational information, or the “context” that humans share, enables them to have a more meaningful interaction amongst themselves. Similar to human–human communications, improving computers’ access to context can increase the richness of human–computer communications, giving more useful computational services to users. Based on the above factors, this research proposes a method to make use of context in order to understand and process user requests. Here, the term “context” means the meanings associated with key query terms and preferences that have to be decided in order to process the query. As in a natural environment, results produced to different users for the same question could vary in an automated system. If the automated system knows users’ preferences related to the question, then it could make use of these preferences to process user queries, producing more relevant and useful results to the user. Hence, a new approach for a personalised query expansion is proposed in this research, where user queries are expanded with user preferences and hence the expanded queries that will be used for processing vary for different users. An architecture that is required for such a Web application to carryout a personalised query expansion with contextual information is also proposed in the thesis. The preferences that could be used for the query expansion are therefore user-specific. Users have different set of preferences depending on the tasks they want to perform. Similar tasks that have same types of preferences can be grouped into task based domains. Hence, user preferences will be the same in a domain, and will vary across domains. Furthermore, there can be different types of subtasks that could be performed within a domain. The set of preferences that could be used for each sub task could vary, and it will be a sub set of the set of preferences of the domain. Hence, an approach for a personalised query expansion which adds user, domain and task-specific preferences to user queries is proposed in this research. The main stages of this expansion are identified and discussed in this thesis. Each of these stages requires different contextual information which is represented in the context model. Out of the main stages identified in the query expansion process, the first three stages, the domain identification, task identification, and missing parameter identification, are explored in the thesis. As the preferences used for the expansion depend on the query domain, it is necessary to identify the domain of the query at first instance. Hence, a domain identification algorithm which makes use of eight different features is proposed in the thesis to identify domains of given queries. This domain identification also reduces the ambiguity of query terms. When the query domain is identified, context/associating meanings of query terms are known. This limits the scope of the possible misinterpretations of query terms. A domain ontology, domain dictionary, and user profile are used by the domain identification algorithm. The domain ontology consists of objects and their categories, attributes of objects and their categories, relationships among objects, and instances and their categories in the domain. The domain dictionary consists of objects and attributes. This is created automatically from the domain ontology. The user profile has the long term preferences of the user that are domain-specific and general. When the domain of the query is known, in order to decide the preferences of the user, the task specified in the query has to be identified. This task identification process is found to be similar in domains with similar activities. Hence, domains are grouped at this stage. These domain groups and the rules that could be used to find out the tasks in the domain groups are identified and discussed in the thesis. For each sub tasks in the domain groups, the types of preferences that could be used to expand user queries are identified and are used to expand user queries. An experiment is designed to evaluate the performance of the proposed approach. The first three stages of the query expansion, the domain identification, task identification, and missing parameter identification, are implemented and evaluated. Samples of five domains are implemented, and queries are collected in these domains from various users. In order to create new domains, a wizard is provided by the system. This system also allows editing the existing domains, domain groups, and types of preferences in sub tasks of the domain groups. Instances of the attributes are manually identified and added to the system using the interface provided by the system. In each of the stages of the query expansion, the results of the queries are manually identified, and are compared with the results produced by the system. The results have confirmed that the proposed method has a positive impact in query expansion. The experiments, results and evaluation of the proposed query expansion approach are also presented in the thesis. The proposed approach for the query expansion could be used by search engines, organisations with a limited set of task domains, and any application that can be improved by making use of personalised query expansion. / Doctor of Philosophy (PhD)
|
70 |
Feature extraction and similarity-based analysis for proteome and genome databasesÖztürk, Özgür. January 2007 (has links)
Thesis (Ph. D.)--Ohio State University, 2007. / Title from first page of PDF file. Includes bibliographical references (p. 108-119).
|
Page generated in 0.0739 seconds