Global ETD Search

1	Distributed XML Query Processing Kling, Patrick January 2012 (has links) While centralized query processing over collections of XML data stored at a single site is a well understood problem, centralized query evaluation techniques are inherently limited in their scalability when presented with large collections (or a single, large document) and heavy query workloads. In the context of relational query processing, similar scalability challenges have been overcome by partitioning data collections, distributing them across the sites of a distributed system, and then evaluating queries in a distributed fashion, usually in a way that ensures locality between (sub-)queries and their relevant data. This thesis presents a suite of query evaluation techniques for XML data that follow a similar approach to address the scalability problems encountered by XML query evaluation. Due to the significant differences in data and query models between relational and XML query processing, it is not possible to directly apply distributed query evaluation techniques designed for relational data to the XML scenario. Instead, new distributed query evaluation techniques need to be developed. Thus, in this thesis, an end-to-end solution to the scalability problems encountered by XML query processing is proposed. Based on a data partitioning model that supports both horizontal and vertical fragmentation steps (or any combination of the two), XML collections are fragmented and distributed across the sites of a distributed system. Then, a suite of distributed query evaluation strategies is proposed. These query evaluation techniques ensure locality between each fragment of the collection and the parts of the query corresponding to the data in this fragment. Special attention is paid to scalability and query performance, which is achieved by ensuring a high degree of parallelism during distributed query evaluation and by avoiding access to irrelevant portions of the data. 
For maximum flexibility, the suite of distributed query evaluation techniques proposed in this thesis provides several alternative approaches for evaluating a given query over a given distributed collection. Thus, to achieve the best performance, it is necessary to predict and compare the expected performance of each of these alternatives. In this work, this is accomplished through a query optimization technique based on a distribution-aware cost model. The same cost model is also used to fine-tune the way a collection is fragmented to the demands of the query workload evaluated over this collection. To evaluate the performance impact of the distributed query evaluation techniques proposed in this thesis, the techniques were implemented within a production-quality XML database system. Based on this implementation, a thorough experimental evaluation was performed. The results of this evaluation confirm that the distributed query evaluation techniques introduced here lead to significant improvements in query performance and scalability both when compared to centralized techniques and when compared to existing distributed query evaluation techniques. distributed query processing XML query processing Computer Science
3	Resilient sensor network query processing Stokes, Alan Barry January 2014 (has links) Sensor networks comprise a collection of resource-constrained, low-cost, sometimes fragile wireless motes that can gather information about their surroundings through sensors, and can be conceived as a distributed computing platform for applications ranging from event detection to environmental monitoring. A Sensor Network Query Processor (SNQP) is a means of collecting data from sensor networks where the requirements are defined using a declarative query language with a set of Quality of Service (QoS) expectations. As sensor networks are often deployed in hostile environments, there is a high possibility that the motes could break or that the communication links between the motes become unreliable. SNQP Query Execution Plans (QEPs) are often optimised for a specific network deployment and are designed to be as energy efficient as possible whilst ensuring the QEPs meet the QoS expectations. Yet little has been done to handle the situation where the deployment itself has changed since the optimisation in such a way as to make the original QEP inefficient or unable to operate. In this respect, the previous work on SNQPs has not aimed at being resilient to failures in the assumptions used at compilation/optimisation time which result in a QEP terminating earlier than expected. This dissertation presents a collection of approaches that embed resilience into SNQP-generated QEPs in such a way that a QEP operates for longer whilst still meeting the QoS expectations demanded of it, thereby resulting in a more reliable platform that is applicable to a broader range of applications. 
The research contributions reported here include (a) a strategy designed to adapt to predictable node failures due to energy depletion; (b) a collection of strategies designed to adapt to unpredictable node failures; (c) a strategy designed to handle unreliable communication channels; and (d) an empirical evaluation to show the benefits of a resilient SNQP in relation to a representative non-resilient SNQP. 621.382
4	A Framework to Support Spatial, Temporal and Thematic Analytics over Semantic Web Data Perry, Matthew Steven 02 September 2008 (has links) No description available. Computer Science Ontology Semantic Analytics RDF Temporal Query Processing
5	Quality of service aware optimization of sensor network queries Galpin, Ixent January 2010 (has links) Sensor networks comprise resource-constrained wireless nodes with the capability of gathering information about their surroundings and have recently risen to prominence with the promise of being an effective computing platform for diverse applications, ranging from event detection to environmental monitoring. The database community proposed the use of sensor network query processors (SNQPs) as a means to meet data collection requirements using a declarative query language. Declarative queries posed against a sensor network constitute an effective means to repurpose sensor networks and reduce the high software development costs associated with them. The range of sensor network applications is very broad. Such applications have diverse, and often conflicting, QoS expectations in terms of the delivery time of results, the acquisition interval at which data is collected, the total energy consumption of the deployment, or the network lifetime. The conflicting nature of these desiderata is aggravated by the resource-constrained nature of sensor networks as a computing fabric, making it particularly challenging to reconcile the trade-offs that arise. Previously, SNQPs have focussed on evaluating queries as energy-efficiently as possible. There has been comparatively less work on attempting to meet a broad range of optimization goals and constraints that capture these QoS expectations. In this respect, previous work on SNQPs has not aimed at being general purpose across the breadth of applications to which sensor networks have been applied. This PhD dissertation presents an approach for enabling QoS-awareness in SNQPs so that query evaluation plans are generated that exhibit good performance for a broader range of sensor network applications in terms of their QoS expectations. 
The research contributions reported here include (a) a functional decomposition of the decision-making steps required to compile a declarative query into a query evaluation plan in a sensor network setting; (b) algorithms to implement these decision-making steps; and (c) an empirical evaluation to show the benefits of QoS-awareness compared to a representative fixed-goal SNQP. 621.382
6	Mining and Managing Neighbor-Based Patterns in Data Streams Yang, Di 09 January 2012 (has links) The current data-intensive world is continuously producing huge volumes of live streaming data through various kinds of electronic devices, such as sensor networks, smart phones, GPS and RFID systems. To understand these data sources and thus better leverage them to serve human society, the demands for mining complex patterns from these high speed data streams have significantly increased in a broad range of application domains, such as financial analysis, social network analysis, credit fraud detection, and moving object monitoring. In this dissertation, we present a framework to tackle the mining and management problem for the family of neighbor-based patterns in data streams, which covers a broad range of popular pattern types, including clusters, outliers, k-nearest neighbors and others. First, we study the problem of efficiently executing single neighbor-based pattern mining queries. We propose a general optimization principle for incremental pattern maintenance in data streams, called "Predicted Views". This general optimization principle exploits the "predictability" of sliding window semantics to eliminate both the computational and storage effort needed for handling the expiration of stream objects, which usually constitutes the most expensive operations for incremental pattern maintenance. Second, the problem of multiple query optimization for neighbor-based pattern mining queries is analyzed, which aims to efficiently execute a heavy workload of neighbor-based pattern mining queries using shared execution strategies. We present an integrated pattern maintenance strategy to represent and incrementally maintain the patterns identified by queries with different query parameters within a single compact structure. Our solution realizes fully shared execution of multiple queries with arbitrary parameter settings. 
Third, the problem of summarization and matching for neighbor-based patterns is examined. To solve this problem, we first propose a summarization format for each pattern type. Then, we present computation strategies that efficiently summarize the neighbor-based patterns either during or after the online pattern extraction process. Lastly, to compare patterns extracted on different time horizons of the stream, we design an efficient matching mechanism to identify similar patterns in the stream history for any given pattern of interest to an analyst. Our comprehensive experimental studies, using both synthetic and real data from the domains of stock trades and moving object monitoring, demonstrate the superiority of our proposed strategies over alternate methods in both effectiveness and efficiency. Algorithm Streaming Data Query Processing Data Mining
7	Extending Event Sequence Processing: New Models and Optimization Techniques Liu, Mo 25 April 2012 (has links) Many modern applications, including online financial feeds, tag-based mass transit systems and RFID-based supply chain management systems, transmit real-time data streams. There is a need for event stream processing technology to analyze this vast amount of sequential data to enable online operational decision making. This dissertation focuses on innovating several techniques at the core of a scalable E-Analytic system to achieve efficient, scalable and robust methods for in-memory multi-dimensional nested pattern analysis over high-speed event streams. First, I address the problem of processing flat pattern queries on event streams with out-of-order data arrival. I design two alternative solutions: an aggressive strategy and a conservative strategy. The aggressive strategy produces maximal output under the optimistic assumption that out-of-order event arrival is rare. The conservative strategy works under the assumption that out-of-order data may be common, and thus produces output only when its correctness can be guaranteed. Second, I design the integration of CEP and OLAP techniques (the E-Cube model) for efficient multi-dimensional event pattern analysis at different abstraction levels. Strategies of drill-down (refinement from abstract to specific patterns) and of roll-up (generalization from specific to abstract patterns) are developed for efficient workload evaluation. I design a cost-driven adaptive optimizer called Chase that exploits reuse strategies for optimal E-Cube hierarchy execution. Then, I explore novel optimization techniques to support the high-performance processing of powerful nested CEP patterns. A CEP query language called NEEL is designed to express nested CEP pattern queries composed of sequence, negation, AND and OR operators. 
To allow flexible execution ordering, I devise a normalization procedure that employs rewriting rules for flattening a nested complex event expression. To reduce CPU and memory consumption, I propose several strategies for efficient shared processing of groups of normalized NEEL subexpressions. Our comprehensive experimental studies, using both synthetic and real data streams, demonstrate the superiority of our proposed strategies over alternate methods in the literature in both effectiveness and efficiency. Complex Event Processing Optimization Streaming Query Processing
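The conservative strategy described in this abstract emits output only once its correctness can be guaranteed. A minimal sketch of that general idea follows. It is not the dissertation's algorithm: it assumes a known bound on how late an event may arrive (the hypothetical `max_delay` parameter), and the class name `ConservativeReorderBuffer` is invented for illustration. Events are held in a min-heap and released in timestamp order only after no earlier event can still arrive under that assumption.

```python
import heapq


class ConservativeReorderBuffer:
    """Hold events until no earlier event can still arrive.

    Assumption (not from the source): an event is never delayed by more
    than `max_delay` time units relative to the newest timestamp seen.
    """

    def __init__(self, max_delay):
        self.max_delay = max_delay
        self._heap = []                  # min-heap ordered by timestamp
        self._max_seen = float("-inf")   # newest timestamp observed so far

    def push(self, timestamp, payload):
        """Add one event; return the events that are now safe to emit."""
        heapq.heappush(self._heap, (timestamp, payload))
        self._max_seen = max(self._max_seen, timestamp)
        # Anything at or below the watermark can no longer be preceded
        # by a late arrival, under the bounded-delay assumption.
        watermark = self._max_seen - self.max_delay
        released = []
        while self._heap and self._heap[0][0] <= watermark:
            released.append(heapq.heappop(self._heap))
        return released


buf = ConservativeReorderBuffer(max_delay=2)
first = buf.push(1, "a")    # nothing is safe to emit yet
second = buf.push(3, "b")   # watermark is 1, so event 1 is released
third = buf.push(2, "c")    # late arrival, still above the watermark
fourth = buf.push(5, "d")   # watermark is 3, so events 2 and 3 are released
```

An aggressive strategy would instead emit results immediately and correct them if a late event invalidates them; the buffer above trades latency for never having to retract output.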
8	Querying Mediated Web Services Sabesan, Manivasakan January 2007 (has links) Web services provide a framework for data interchange between applications by incorporating standards such as XML Schema, WSDL, SOAP and HTTP. They define operations to be invoked over a network to perform actions. These operations are described publicly in a WSDL document, together with the data types of their arguments and results. Searching data accessible via web services is essential in many applications. However, web services don't provide any general query language or view capabilities. Currently, web service applications that access the data must be developed using a regular programming language such as Java or C#. The thesis provides an approach to simplify querying web services data and proposes efficient processing of database queries to views of wrapped web services. To show the effectiveness of the approach, a prototype, the webService MEDiator system (WSMED), is developed. WSMED provides general view and query capabilities over data accessible through web services by automatically extracting basic meta-data from WSDL descriptions. Based on imported meta-data, the user can then define views that extract data from the results of calls to web service operations. The views can be queried using SQL. A given view can access many different web service operations in different ways depending on what view attributes are known. The views can be specified in terms of several declarative queries to be applied by the query processor. In addition, the user can provide semantic enrichments of the meta-data with key constraints to enable efficient query execution over the views by automatic query transformations. We evaluated the effectiveness of our approach over multilevel views of existing web services and show that the key constraint enrichments substantially improve query performance. / SIDA query processing mediated web services Databases Databaser
9	Semantic query processing in database systems Shenoy, Sreekumar Thrivikrama January 1990 (has links) No description available. Computer Science
10	Efficient Concurrent Operations in Spatial Databases Dai, Jing 16 November 2009 (has links) As demanded by applications such as GIS, CAD, ecology analysis, and space research, efficient spatial data access methods have attracted much research. In particular, moving object management and continuous spatial queries are gaining prominence in the spatial database area. However, most of the existing spatial query processing approaches were designed for single-user environments, and may not ensure correctness and data consistency in multiple-user environments. This research focuses on designing efficient concurrent operations on spatial datasets. Current multidimensional data access methods can be categorized into two types: 1) pure multidimensional indexing structures such as the R-tree family and grid file; 2) linear spatial access methods, represented by the Space-Filling Curve (SFC) combined with B-trees. Concurrency control protocols have been designed for some pure multidimensional indexing structures, but none of them is suitable for variants of R-trees with object clipping, which are efficient in searching. On the other hand, no concurrency control protocol has been designed for linear spatial indexing structures, to which one-dimensional concurrency control protocols cannot be directly applied. Furthermore, the recently designed query processing approaches for moving objects have not been protected by any efficient concurrency control protocols. In this research, solutions for efficient concurrent access frameworks on both types of spatial indexing structures are provided, as well as for continuous query processing on moving objects, for multiple-user environments. These concurrent access frameworks can satisfy the concurrency control requirements, while providing outstanding performance for concurrent queries. 
Major contributions of this research include: (1) a new efficient spatial indexing approach with an object clipping technique, ZR+-tree, that outperforms R-tree and R+-tree on searching; (2) a concurrency control protocol, GLIP, to provide high throughput and phantom update protection on spatial indexing with object clipping; (3) efficient concurrent operations for indices based on linear spatial access methods, which form the CLAM protocol; (4) efficient concurrent continuous query processing on moving objects for both R-tree-based and linear spatial indexing frameworks; (5) a generic access framework, Disposable Index, for optimal location update and parallel search. / Ph. D. Indexing Query Processing Concurrency Control Spatial Database
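The linear spatial access methods mentioned in this abstract map multidimensional points to one-dimensional keys along a space-filling curve, so that an ordinary ordered index such as a B-tree can store them. The sketch below illustrates the idea with the Z-order (Morton) curve, one common space-filling curve; the abstract does not say which curve the dissertation uses, and the function name `z_order_key` and the sample points are assumptions made for illustration. A sorted Python list stands in for the B-tree.

```python
def z_order_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # x bits fill even positions
        key |= ((y >> i) & 1) << (2 * i + 1)   # y bits fill odd positions
    return key


# Hypothetical 2-D points; sorting by key gives the order in which a
# one-dimensional index (for example a B-tree) would store them.
points = [(3, 5), (0, 0), (2, 2), (7, 1)]
ordered = sorted(points, key=lambda p: z_order_key(*p))
keys = [z_order_key(x, y) for x, y in ordered]
```

Because nearby points tend to receive nearby keys, a spatial range query can be translated into one or more key ranges over the one-dimensional index, which is what makes concurrency control for such indexes differ from the purely one-dimensional case.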
