Global ETD Search

1	Scaling Continuous Query Services for Future Computing Platforms and Applications Gedik, Bugra 13 June 2006 (has links) The ever increasing rate of digital information available from on-line sources drives the need for building information monitoring applications to assist users in tracking relevant changes in these sources and accessing information that is of interest to them in a timely manner. Continuous queries (CQs) are standing queries that are continuously evaluated over dynamic sources to track information changes that meet user specified thresholds and notify users of new results in real-time. CQ systems can be considered as powerful middleware for supporting information monitoring applications. A significant challenge in building CQ systems is scalability, caused by the large number of users and queries, and by the large and growing number of information sources with high update rates. In this thesis we use CQs to shepherd through and address the challenges involved in supporting information monitoring applications in future computing platforms. The focus is on P2P web monitoring in Internet systems, location monitoring in mobile systems, and environmental monitoring in sensor systems. Although different computing platforms require different software architectures for building scalable CQ services, there is a common design philosophy that this thesis advocates for making CQ services scalable and efficient. This can be summarized as "move computation close to the places where the data is produced." A common challenge in scaling CQ systems is the resource-intensive nature of query evaluation, which involves continuously checking updates in a large number of data sources and evaluating trigger conditions of a large number of queries over these updates, consuming both cpu and network bandwidth resources. 
If some part of the query evaluation can be pushed close to the sources where the data is produced, the resulting early filtering of updates will save both bandwidth and cpu resources. In summary, in this thesis we show that distributed CQ architectures that are designed to take advantage of the opportunities provided by ubiquitous computing platforms and pervasive networks, while at the same time recognizing and resolving the challenges posed by these platforms, lead to building scalable and effective CQ systems to better support the demanding information monitoring applications of the future. Continuous queries Information monitoring
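The "move computation close to the data" idea in the abstract above can be illustrated with a minimal Python sketch. It is not taken from the thesis: the `ContinuousQuery` class, the `filter_at_source` helper, and the sample values are hypothetical, and the sketch only shows why evaluating trigger conditions at the source reduces what has to be sent over the network.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class ContinuousQuery:
    """A standing query with a trigger condition (hypothetical structure)."""
    name: str
    trigger: Callable[[float], bool]


def filter_at_source(updates: Iterable[float],
                     queries: List[ContinuousQuery]) -> List[Tuple[str, float]]:
    """Evaluate triggers where the data is produced; keep only firing updates."""
    forwarded = []
    for value in updates:
        for query in queries:
            if query.trigger(value):
                forwarded.append((query.name, value))
    return forwarded


# Illustrative data: only updates above the threshold need to leave the source.
updates = [10.0, 10.2, 55.0, 9.9, 61.5]
queries = [ContinuousQuery("value_above_50", lambda v: v > 50.0)]
forwarded = filter_at_source(updates, queries)
print(len(updates), "updates produced,", len(forwarded), "forwarded")
```

Without source-side filtering, all five updates would travel to a central evaluator; here only the two that satisfy the trigger are forwarded.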
2	Using Resampling to Optimizing Continuous Queries in Wireless Sensor Networks Liu, Pin-yu 17 July 2007 (has links) Advances in communication and computer techniques have enabled the development of low-cost, low-power, multifunctional sensor nodes that are small in size and capable of communicating over short distances. A sensor network is composed of a large number of sensor nodes that are densely deployed either inside the phenomenon to be observed or very close to it. Sensor networks open up new opportunities to observe and interact with the physical world around us. Despite the recent advances in sensor network applications and technology, sensor networks still suffer from the major problem of limited energy. This is because most sensor nodes use batteries as their energy source, and the batteries are inconvenient and sometimes difficult to replace when they run out. Understanding the events, measures, and tasks required by certain applications has the potential to provide efficient communication techniques for the sensor network. Our focus in this work is on the efficient processing of continuous queries, for which query results have to be generated according to the sampling rate specified by the user for an extended period of time. In this thesis, we deal with two types of continuous queries. The first type requires data from all sensor nodes, while the other is only interested in the data returned by some selected nodes. To answer these queries, data have to be sent to the base station at some designated rate, which may consume much energy. Previous works have developed two methods to reduce the energy consumption. Both rely on the error range the user can tolerate to determine whether the current sensing data should be transmitted. While the first uses a simple cache method, the second uses a complex multi-dimensional model. However, these methods require the user to specify the error range, which may not be easy to do. 
In addition, the sensed data reported by the sensors were assumed to be accurate, which is by no means true in the real world. This thesis uses a Kalman filter to correct and predict sensing data. As a result, the sampling frequency of each sensor is dynamically adjusted; this process, referred to as resampling, systematically determines the data sensing/transferring rate of sensors. We evaluate our proposed methods using empirical data collected from a real sensor network. Kalman filter Continuous Queries Sensor Networks
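As a rough illustration of Kalman-filter-driven resampling, the Python sketch below runs a scalar Kalman filter over a stream of readings and lengthens the sampling interval while predictions stay close to the measurements. The abstract does not give the thesis's actual resampling rule, so everything here is an assumption: the random-walk model, the noise variances `q` and `r`, the tolerance `tol`, and the doubling/halving policy in `next_interval`. For simplicity the loop still consumes every reading and only records what interval the policy would choose.

```python
class ScalarKalman:
    """One-dimensional Kalman filter with a random-walk state model (assumed)."""

    def __init__(self, x0: float, p0: float = 1.0, q: float = 0.01, r: float = 0.25):
        self.x = x0  # current state estimate
        self.p = p0  # variance of the estimate
        self.q = q   # process noise variance (assumed value)
        self.r = r   # measurement noise variance (assumed value)

    def predict(self) -> float:
        # Random walk: the predicted value is unchanged, uncertainty grows.
        self.p += self.q
        return self.x

    def update(self, z: float) -> float:
        gain = self.p / (self.p + self.r)  # Kalman gain
        self.x += gain * (z - self.x)      # correct estimate with measurement z
        self.p *= 1.0 - gain
        return self.x


def next_interval(interval: int, innovation: float,
                  tol: float = 0.5, lo: int = 1, hi: int = 16) -> int:
    """Hypothetical policy: sample less often while predictions are accurate."""
    if abs(innovation) <= tol:
        return min(interval * 2, hi)
    return max(interval // 2, lo)


readings = [20.0, 20.1, 19.9, 20.0, 25.0, 25.2]
kf = ScalarKalman(x0=readings[0])
interval = 1
intervals = []
for z in readings[1:]:
    predicted = kf.predict()
    interval = next_interval(interval, z - predicted)
    kf.update(z)
    intervals.append(interval)
print(intervals)
```

The interval grows while the signal is stable and shrinks again once the reading jumps, which is the behaviour a resampling scheme is meant to provide: fewer transmissions when the filter can predict the data.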
3	CQ-Buddy: Harnessing Peers For Distributed Continuous Query Processing Ng, Wee Siong, Shu, Yanfeng, Tok, Wee Hyong 01 1900 (has links) In this paper, we present the design and evaluation of CQ-Buddy, a peer-to-peer (p2p) continuous query (CQ) processing system that is distributed, and highly-scalable. CQ-Buddy exploits the differences in capabilities (processing and memory) of peers and load-balances the tasks across powerful and weak peers. Our main contributions are as follows: First, CQ-Buddy introduces the notion of pervasive continuous queries to tackle the frequent disconnected problems common in a peer-to-peer environment. Second, CQ-Buddy allows for inter-sharing and intra-sharing in the processing of continuous queries amongst peers. Third, CQ-Buddy peers perform query-centric load balancing for overloaded data source providers by acting as proxies. We have conducted extensive studies to evaluate CQ-Buddy’s performance. Our results show that CQ-Buddy is highly scalable, and is able to process continuous queries in an effective and efficient manner. / Singapore-MIT Alliance (SMA) CQ-Buddy continuous queries CQ P2P peer-to-peer distributed
4	Dynamic Optimization and Migration of Continuous Queries Over Data Streams Zhu, Yali 23 August 2006 (has links) "Continuous queries process real-time streaming data and output results in streams for a wide range of applications. Due to the fluctuating stream characteristics, a streaming database system needs to dynamically adapt query execution. This dissertation proposes novel solutions to continuous query adaptation in three core areas, namely dynamic query optimization, dynamic plan migration and partitioned query adaptation. Runtime query optimization needs to efficiently generate plans that satisfy both CPU and memory resource constraints. Existing work focus on minimizing intermediate query results, which decreases memory and CPU usages simultaneously. However, doing so cannot assure that both resource constraints are being satisfied, because memory and CPU can be either positively or negatively correlated. This part of the dissertation proposes efficient optimization strategies that utilize both types of correlations to search the entire query plan space in polynomial time when a typical exhaustive search would take at least exponential time. Extensive experimental evaluations have demonstrated the effectiveness of the proposed strategies. Dynamic plan migration is concerned with on-the-fly transition from one continuous plan to a semantically equivalent yet more efficient plan. It is a must to guarantee the continuation and repeatability of dynamic query optimization. However, this research area has been largely neglected in the current literature. The second part of this dissertation proposes migration strategies that dynamically migrate continuous queries while guaranteeing the integrity of the query results, meaning there are no missing, duplicate or incorrect results. 
The extensive experimental evaluations show that the proposed strategies vary significantly in terms of output rates and memory usages given distinct system configurations and stream workloads. Partitioned query processing is effective to process continuous queries with large stateful operators in a distributed system. Dynamic load redistribution is necessary to balance uneven workload across machines due to changing stream properties. However, existing solutions generally assume static query plans without runtime query optimization. This part of the dissertation evaluates the benefits of applying query optimization in partitioned query processing and shows dramatic performance improvement of more than 300%. Several load balancing strategies are then proposed to consider the heterogeneity of plan shapes across machines caused by dynamic query optimization. The effectiveness of the proposed strategies is analyzed through extensive experiments using a cluster." query optimization data streams runtime query adaptations continuous queries plan migration distributed query processing window constraints Querying (Computer science)
5	Multiple Continuous Query Processing with Relative Window Predicates "Juggler" Silva, Asima 27 May 2004 (has links) "Efficient querying over streaming data is a critical technology which requires the ability to handle numerous and possibly similar queries in real-time dynamic environments such as the stock market and medical devices. Existing DBMS technology is not well suited for this domain since it was developed for static historical data. Queries over streams often contain relative window predicates such as in the query: 'Heart rate decreased to fifty-two beats per second within four seconds after the patient's temperature started rising.' Relative window predicates are a specific type of join between streams that is based on the tuple's timestamp. In our operator, called Juggler, predicates are classified into three types: attribute, join, and window. Attribute predicates are stream values compared to a constant. Join predicates are stream values compared to another stream's values. Window predicates are join predicates where the streams' timestamp values are compared. Juggler's composite operator incorporates the processing of similar, though not identical, query functionalities as one complex computation process. This execution strategy handles multi-way joins for multiple selection and join predicates. It adaptively orders the execution of predicates by their selectivity to efficiently process multiple continuous queries based on stream characteristics. In Juggler, all similar predicates are grouped into lists. These indices are represented by a collection of bits. Every tuple contains the bit structure representation of the predicate lists, which encodes the tuple's predicate evaluation history. Every query also contains a similar bit structure to encode the predicate's relationship to the registered queries. The tuple's and query's bit structures are compared to assess if the tuple has satisfied a query. Juggler is designed and implemented in Java. 
Experiments were conducted to verify correctness and to assess the performance of Juggler's three features. Its adaptivity of reordering the evaluation of predicate types performed as well as the most selective predicate ordering. Its ability to exploit similar predicates in multiple queries showed reduction in number of comparisons. Its effectiveness when multiple queries are combined in a single Juggler operator indicated potential performance improvements after optimization of Juggler's data structures." reordering predicates multi-join operator sliding windows window predicates join algorithm continuous queries Query languages (Computer science)
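The bit-structure comparison described in the Juggler abstract can be sketched as follows. This is a simplified Python illustration, not Juggler's Java implementation: the predicate list, the query names, and the bit assignments are hypothetical, only attribute predicates are shown, and adaptive predicate reordering is omitted. Each shared predicate is evaluated once per tuple, and a tuple satisfies a query when every bit the query requires is set in the tuple's evaluation history.

```python
from typing import Callable, Dict, List

Tuple_ = Dict[str, float]

# Shared predicate list; the list position is the bit index (hypothetical examples).
predicates: List[Callable[[Tuple_], bool]] = [
    lambda t: t["heart_rate"] < 60,     # bit 0
    lambda t: t["temperature"] > 38.0,  # bit 1
    lambda t: t["heart_rate"] < 40,     # bit 2
]

# Each registered query is a bitmask of the predicates it requires.
queries: Dict[str, int] = {
    "low_hr_with_fever": 0b011,
    "very_low_hr": 0b100,
}


def evaluate(tuple_: Tuple_) -> int:
    """Evaluate every shared predicate once and record the results as bits."""
    bits = 0
    for index, predicate in enumerate(predicates):
        if predicate(tuple_):
            bits |= 1 << index
    return bits


def matching_queries(tuple_: Tuple_) -> List[str]:
    """A query matches when all of its required bits are set for the tuple."""
    bits = evaluate(tuple_)
    return [name for name, required in queries.items()
            if (bits & required) == required]


print(matching_queries({"heart_rate": 52, "temperature": 38.5}))
print(matching_queries({"heart_rate": 35, "temperature": 36.0}))
```

Because the comparison is a single bitwise AND per query, adding more queries that reuse existing predicates adds no further predicate evaluations, which is the sharing effect the abstract describes.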
6	SNIF TOOL - Sniffing for Patterns in Continuous Streams MUKHERJI, ABHISHEK 11 February 2008 (has links) Recent technological advances in sensor networks and mobile devices give rise to new challenges in the processing of live streams. In particular, time-series sequence matching, namely, the similarity matching of live streams against a set of predefined pattern sequence queries, is an important technology for a broad range of domains that include monitoring the spread of hazardous waste and administering network traffic. In this thesis, I use the time-critical application of monitoring fire growth in an intelligent building as my motivating example. Various measures and algorithms have been established in the current literature for similarity of static time-series data. Matching continuous data poses the following new challenges: 1) fluctuations in stream characteristics, 2) real-time requirements of the application, 3) limited system resources, and 4) noisy data. Thus the matching techniques proposed for static time-series are mostly not applicable to live stream matching. In this thesis, I propose a new generic framework, henceforth referred to as the n-Snippet Indices Framework (in short, SNIF), for discovering the similarity between a live stream and pattern sequences. The framework is composed of two key phases: (1) an off-line preprocessing phase, where the pattern sequences are processed offline and stored in an approximate 2-level index structure; and (2) an on-line live stream matching phase, where the streaming time-series (or the live stream) is matched on-the-fly against the indexed pattern sequences. I introduce the concept of n-Snippets for numeric data as the unit for matching. The insight is to match small snippets of the live stream against prefixes of the patterns and maintain them in succession. The longer the pattern prefixes identified as similar to the live stream, the better the confirmation of the match. 
Thus, the live stream matching is performed in two levels of matching: bag matching for matching snippets and order checking for maintaining the lengths of the match. I propose four variations of matching algorithms that allow the user the capability to choose between the two conflicting characteristics of result accuracy versus response time. The effectiveness of SNIF to detect patterns has been thoroughly tested through extensive experimental evaluations using the continuous query engine CAPE as platform. The evaluations made use of real datasets from multiple domains, including fire monitoring, chlorine monitoring and sensor networks. Moreover, SNIF is demonstrated to be tolerant to noisy datasets. continuous queries streaming time-series similarity queries pattern matching Sequential pattern mining Fire growth Computer simulation
7	Continuous Query Processing on Spatio-Temporal Data Streams Nehme, Rimma V 23 August 2005 (has links) "This thesis addresses important challenges in the areas of streaming and spatio-temporal databases. It focuses on continuous querying of spatio-temporal environments characterized by (1) a large number of moving and stationary objects and queries; (2) the need for near real-time results; (3) limited memory and CPU resources; and (4) different accuracy requirements. The first part of the thesis studies the problem of the performance vs. accuracy tradeoff using different location modelling techniques when processing continuous spatio-temporal range queries on moving objects. Two models for modeling the movement, namely continuous and discrete models, are described. This thesis introduces an accuracy comparison model to estimate the quality of the answers returned by each of the models. Experimental evaluations show the effectiveness of each model given certain characteristics of the spatio-temporal environment (e.g., varying speed, location update frequency). The second part of the thesis introduces SCUBA, a Scalable Cluster Based Algorithm for evaluating a large set of continuous queries over spatio-temporal data streams. Unlike the commonly used static grid indices, the key idea of SCUBA is to group moving objects and queries based on common dynamic properties (e.g., speed, destination, and road network location) at run-time into moving clusters. This results in an improvement in performance, which facilitates scalability. SCUBA exploits shared cluster-based execution consisting of two phases. In phase I, the evaluation of a set of spatio-temporal queries is abstracted as a spatial join between moving clusters for cluster-based filtering of true negatives. Thereafter, in phase II, a fine-grained join process is executed for all pairs identified as potentially joinable by a positive cluster-join match in phase I. 
If the clusters don't satisfy the join predicate, the objects and queries that belong to those clusters can be safely discarded, as they are guaranteed not to join individually either. This provides processing cost savings. Another advantage of SCUBA is that moving cluster-driven load shedding is facilitated. A moving cluster (or its subset, called the nucleus) approximates the locations of its members. As a consequence, relatively accurate answers can be produced using solely the abstracted cluster location information in place of precise object-by-object matches, resulting in savings in memory and improvement in processing time. A theoretical analysis of SCUBA is presented with respect to the memory requirements, number of join comparisons and I/O costs. Experimental evaluations on real datasets demonstrate that SCUBA achieves a substantial improvement when executing continuous queries on highly dense moving objects. The experiments are conducted in a real data streaming system (CAPE) developed at WPI on real datasets generated by the Network-Based Moving Objects Generator." continuous queries moving objects Database management Query languages (Computer science) Global system for mobile communications
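The two-phase cluster join in the SCUBA abstract can be approximated by the Python sketch below. It rests on assumptions that go beyond the abstract: clusters are modeled as circles, queries as circular range queries, and the `Cluster` class and sample coordinates are hypothetical. Phase I prunes cluster pairs whose circles do not overlap; phase II runs the fine-grained object-by-query check only for the surviving pairs.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Cluster:
    """A moving cluster approximated by a circle (assumed geometry).

    The radius must enclose every member object and every member
    query's range, otherwise phase I could prune true matches.
    """
    center: Point
    radius: float
    objects: List[Tuple[str, Point]] = field(default_factory=list)
    queries: List[Tuple[str, Point, float]] = field(default_factory=list)


def dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def clusters_overlap(c1: Cluster, c2: Cluster) -> bool:
    """Phase I test: can any member of c1 possibly join a member of c2?"""
    return dist(c1.center, c2.center) <= c1.radius + c2.radius


def cluster_join(clusters: List[Cluster]) -> List[Tuple[str, str]]:
    results = []
    for query_cluster in clusters:
        for object_cluster in clusters:
            if not clusters_overlap(query_cluster, object_cluster):
                continue  # phase I: whole pair discarded as a true negative
            # Phase II: fine-grained join inside the potentially joinable pair.
            for query_id, query_center, query_range in query_cluster.queries:
                for object_id, position in object_cluster.objects:
                    if dist(query_center, position) <= query_range:
                        results.append((query_id, object_id))
    return results


a = Cluster((0.0, 0.0), 2.0, objects=[("car1", (0.5, 0.0))],
            queries=[("q1", (0.0, 0.0), 1.0)])
b = Cluster((10.0, 10.0), 2.0, objects=[("car2", (10.0, 10.5))])
c = Cluster((3.0, 0.0), 2.5, objects=[("car3", (0.9, 0.0))])
matches = cluster_join([a, b, c])
print(matches)
```

In this example the far-away cluster `b` is eliminated by a single distance computation, so `car2` is never compared against `q1` individually.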
8	QTor : Une approche communautaire pour l'évaluation de requêtes / QTor : Using communities to evaluate queries Dufromentel-Fougerit, Sébastien 09 December 2016 (has links) Cette thèse porte sur la mise en place d'un système de requêtage sur des flux sous contraintes de capacités. Ce système est porté par ses utilisateurs-trices et basé sur les similitudes entre requêtes. Les relations d'équivalences entre les différentes requêtes permettent de réunir les participants au sein de communautés d'intérêt. Celles-ci forment alors une abstraction permettant de séparer le problème d'organisation du système en plusieurs sous-problèmes plus simples et de taille réduite. Afin de garantir une généricité vis-à-vis du langage, l'organisation repose sur une API simple et modulable. Nous avons ainsi recours au mécanisme de réécritures de requêtes utilisant des vues matérialisées, connu en bases de données, pour déterminer les relations possibles entre les communautés. Le choix entre ces différentes possibilités est ensuite effectué à l'aide d'un modèle de coût paramétrable. Les relations entre communautés sont concrétisées par un échange de ressources entre elles, un participant de l'une venant contribuer à l'autre. Cela permet de s'affranchir des limitations de capacités au niveau abstrait, tout en en tenant hautement compte pour la mise en relation effective des participants. Au sein des communautés, un arbre de diffusion permet à l'ensemble des participants de récupérer les résultats requis. L'approche, mise en œuvre de manière incrémentale, permet une réduction efficace des coûts de calcul et de diffusion (l'optimalité est atteinte, notamment, dans le cas de l'inclusion de requête) pour un coût d'organisation limité et une latence raisonnable. Les expérimentations réalisées ont montré une grande adaptabilité aux variations concernant les requêtes exprimées et les capacités des participants. 
Le démonstrateur mis en place peut être utilisé à la fois pour des simulations (automatiques ou interactives) et pour un déploiement réel, par une implémentation commune générique vis-à-vis du langage. / This thesis addresses the problem of organizing a querying system over data streams under capacity constraints, the system being user-powered and based on query similarity. Equivalence relations between queries allow the participants to be grouped into communities. Those communities are then used as an abstraction to split the general organization problem into several easier and smaller subproblems. In order to stay language-independent, the organization is based on a simple and modular API that relies on a mechanism for answering queries using views, well known in databases. The choice between the different rewritten queries is made using an adjustable cost model. Relations between communities are thus materialized by a spreading mechanism, a participant from one community joining the other(s) to contribute. This avoids the capacity problem at the organization's abstract level, while efficiently taking care of it at the concrete one. Inside the communities, all the participants receive the common results they need through a spanning tree. The QTor approach, incrementally built, allows an efficient reduction of the processing and diffusion costs (processing cost being optimal in some cases, e.g. containment) with a reasonable latency, for a limited organization cost. Experiments have shown that the organization is flexible, regarding both the expressed queries and the participants' capacities. A demonstrator was built that allows both performing (automatic or interactive) simulations and deploying the system over a real network, with a single. Informatique Requête Requêts continues sur flux Communauté d'intérêt Réécriture de requêtes Information Technology Query Continuous queries over streams Query rewriting 025.040 72
9	Efficient And Scalable Evaluation Of Continuous, Spatio-temporal Queries In Mobile Computing Environments Cazalas, Jonathan M 01 January 2012 (has links) A variety of research exists for the processing of continuous queries in large, mobile environments. Each method tries, in its own way, to address the computational bottleneck of constantly processing so many queries. For this research, we present a two-pronged approach at addressing this problem. Firstly, we introduce an efficient and scalable system for monitoring traditional, continuous queries by leveraging the parallel processing capability of the Graphics Processing Unit. We examine a naive CPU-based solution for continuous range-monitoring queries, and we then extend this system using the GPU. Additionally, with mobile communication devices becoming commodity, location-based services will become ubiquitous. To cope with the very high intensity of location-based queries, we propose a view oriented approach of the location database, thereby reducing computation costs by exploiting computation sharing amongst queries requiring the same view. Our studies show that by exploiting the parallel processing power of the GPU, we are able to significantly scale the number of mobile objects, while maintaining an acceptable level of performance. Our second approach was to view this research problem as one belonging to the domain of data streams. Several works have convincingly argued that the two research fields of spatiotemporal data streams and the management of moving objects can naturally come together. [IlMI10, ChFr03, MoXA04] For example, the output of a GPS receiver, monitoring the position of a mobile object, is viewed as a data stream of location updates. This data stream of location updates, along with those from the plausibly many other mobile objects, is received at a centralized server, which processes the streams upon arrival, effectively updating the answers to the currently active queries in real time. 
For this second approach, we present GEDS, a scalable, Graphics Processing Unit (GPU)-based framework for the evaluation of continuous spatio-temporal queries over spatiotemporal data streams. Specifically, GEDS employs the computation sharing and parallel processing paradigms to deliver scalability in the evaluation of continuous, spatio-temporal range queries and continuous, spatio-temporal kNN queries. The GEDS framework utilizes the parallel processing capability of the GPU, a stream processor by trade, to handle the computation required in this application. Experimental evaluation shows promising performance and shows the scalability and efficacy of GEDS in spatio-temporal data streaming environments. Additional performance studies demonstrate that, even in light of the costs associated with memory transfers, the parallel processing power provided by GEDS clearly counters and outweighs any associated costs. Finally, in an effort to move beyond the analysis of specific algorithms over the GEDS framework, we take a broader approach in our analysis of GPU computing. What algorithms are appropriate for the GPU? What types of applications can benefit from the parallel and stream processing power of the GPU? And can we identify a class of algorithms that are best suited for GPU computing? To answer these questions, we develop an abstract performance model, detailing the relationship between the CPU and the GPU. From this model, we are able to extrapolate a list of attributes common to successful GPU-based applications, thereby providing insight into which algorithms and applications are best suited for the GPU and also providing an estimated theoretical speedup for said GPU-based applications. Mobile computing continuous queries data streams spatio temporal queries spatio temporal data streams evaluation scalable range query knn gpu geds nvidia cuda Computer Sciences Engineering
