61.
Managing Schema Change in an Heterogeneous Environment. Claypool, Kajal Tilak. 17 June 2002.
"Change is inevitable even for persistent information. Effectively managing change of persistent information, which includes the specification, execution and the maintenance of any derived information, is critical and must be addressed by all database systems. Today, for every data model there exists a well-defined set of change primitives that can alter both the structure (the schema) and the data. Several proposals also exist for incrementally propagating a primitive change to any derived information (or view). However, existing support is lacking in two ways. First, change primitives as presented in literature are very limiting in terms of their capabilities allowing users to simply add or remove schema elements. More complex types of changes such the merging or splitting of schema elements are not supported in a principled manner. Second, algorithms for maintaining derived information often do not account for the potential heterogeneity between the source and the target. The goal of this dissertation is to provide solutions that address these two key issues. The first part of this dissertation addresses the challenge of expressing a rich complex set of changes. We propose the SERF (Schema Evolution through an Extensible, Re-usable and Flexible) framework that allows users to perform a wide range of complex user-defined schema transformations. Our approach combines existing schema evolution primitives using OQL (object query language) as the glue logic. Within the context of this work, we look at the different domains in which SERF can be applied, including web site management. To further enrich our framework, we also investigate the optimization and verification of SERF transformations. The second part of this dissertation addresses the problem of maintaining views in the face of source changes when the source and the view are not in the same data model. With today's increasing heterogeneity in information structure, it is critical that maintenance of views addresses the data model boundaries. However, view definitions that go across data models are limited to hard-coded algorithms, thereby making it difficult to develop general maintenance algorithms. We provide a two-step solution for this problem. We have developed a cross algebra, that defines views such that there is no restriction that forces the view and the source data models to be the same. We then define update propagation algorithms that can propagate changes from source to target irrespective of the exact translation and the data models. We validate our ideas by applying them to translation and change propagation between the XML and relational data models."
62.
VAMANA: A High Performance, Scalable and Cost Driven XPath Engine. Raghavan, Venkatesh. 05 May 2004.
Many applications are migrating to, or beginning to make use of, native XML data. We anticipate that queries will emerge that emphasize the structural semantics of XML query languages like XPath and XQuery. This brings a need for an efficient query engine and database management system tailored for XML data, similar to traditional relational engines. While mapping large XML documents into relational database systems is possible, it poses difficulties in translating XML queries into the less powerful relational query language SQL and creates a data model mismatch between relational tables and semi-structured XML data. Hence native solutions for efficiently storing and querying XML data have recently been developed. However, most of these systems thus far fail to demonstrate scalability with large document sizes, to provide robust support for the XPath query language, or to adequately address costing with respect to query optimization. In this thesis, we propose VAMANA, a novel cost-driven XPath engine to support the scalable evaluation of ad-hoc XPath expressions. VAMANA makes use of the Multi-Axis Storage Structure (MASS), an efficient XML repository for storing and indexing large XML documents developed at WPI. VAMANA extensively uses indexes for query evaluation by considering index-only plans. To the best of our knowledge, it is the only XML query engine that supports an index-plan approach for large XML documents. Our index-oriented query plans allow queries to be evaluated while reading only a fraction of the data, as all tuples for a particular context node are clustered together. The pipelined query framework minimizes the cost of handling intermediate data during query processing. Unlike other native solutions, VAMANA provides support for all 13 XPath axes. Our schema-independent cost model provides dynamically calculated statistics that are then used for intelligent cost-based transformations, further improving performance. Our optimization strategy for improving execution time is affirmed through our experimental studies on XMark benchmark data. VAMANA query execution is significantly faster than that of leading available XML query engines.
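As a rough illustration of the index-only idea, the sketch below (a hypothetical data layout, not MASS's actual structures) evaluates a child-axis step using nothing but interval labels stored in an index, never touching the document itself:

    # Hedged sketch: elements are stored as (start, end, level) interval
    # labels, clustered per element name; a child step is pure index work.

    index = {
        "book":  [(1, 10, 1), (11, 20, 1)],
        "title": [(2, 4, 2), (12, 14, 2), (15, 17, 3)],
    }

    def child_step(context, name):
        """Return labels reachable by a child axis step, index-only."""
        return [(s, e, l) for (s, e, l) in index.get(name, [])
                for (cs, ce, cl) in context
                if cs < s and e < ce and l == cl + 1]   # contained, one level down

    books = index["book"]
    print(child_step(books, "title"))   # [(2, 4, 2), (12, 14, 2)]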
63.
Efficient XML Stream Processing with Automata and Query Algebra. Jian, Jinhui. 27 August 2003.
"XML Stream Processing is an emerging technology designed to support declarative queries over continuous streams of data. The interest in this novel technology is growing due to the increasing number of real world applications such as monitoring systems for stock, email, and sensor data that need to analyze incoming data streams. There are however several open challenges. One, we must develop efficient techniques for pattern matching over the nested tag structure of XML as data streams in token by token. Two, we must develop techniques for query optimization to cope with complex user queries while given only incomplete knowledge of source data. When considering these challenges separately, then automata models have been shown by several recent works to be suited to tackle the first problem, while algebraic query models have been regarded as appropriate foundations to tackle the second problem. The question however remains how best to put these two models together to have an overall effective system. This thesis aims to exactly fill this gap. We propose a unified query framework to augment automata-style processing with algebra-based query optimization capabilities. We use the automata model to handle the token-oriented streaming XML data and use the algebraic model to support set-oriented optimization techniques. The framework has been designed in two layers such that the logical layer provides a uniform abstraction across the two models and any optimization techniques can be applied in either model uniformly using query rewritings. The physical layer, on the other hand, allows us to refine the implementation details after the logical layer optimization. We have successfully applied this framework in the Raindrop stream processing system. We have identified several trade-offs regarding which query functionality should be realized in which specific query model. We have developed novel optimization techniques to exploit these trade-offs. For example, a query rewrite rule can flexibly push down a pattern matching into the automata model when the optimizer decides that it is more efficient to do so. To deal with incomplete knowledge of source data, we have also developed novel techniques to monitor data statistics, based on which we can apply optimization techniques to choose the optimal query plan at runtime. Our experimental study confirms that considerable performance gains are being achieved when these optimization techniques are applied in our system."
64.
Self Maintenance of Materialized XQuery Views via Query Containment and Re-Writing. Nilekar, Shirish K. 24 April 2006.
In recent years, XML, the eXtensible Markup Language, has become the de-facto standard for publishing and exchanging information on the web and in enterprise data integration systems. Materialized views are often used in information integration systems to present a unified schema for efficient querying of distributed and possibly heterogeneous data sources. Along similar lines, ACE-XQ, an XQuery-based semantic caching system, shows the significant performance gains achieved by caching query results (as materialized views) and using these materialized views along with query containment techniques to answer future queries over distributed XML data sources. To keep the data in these materialized views of ACE-XQ up-to-date, the views must be maintained, i.e., whenever the base data changes, the corresponding cached data in the materialized view must also be updated. This thesis builds on the query containment ideas of ACE-XQ and proposes an efficient approach for self-maintenance of materialized views. Our experimental results illustrate the significant performance improvement achieved by this strategy over view re-computation for a variety of situations.
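The containment idea behind such a cache can be illustrated with a deliberately simplified sketch (hypothetical structures; real XQuery containment reasoning is far more involved than a single range test):

    # Hedged sketch: if the new query's range predicate is subsumed by the
    # cached view's predicate on the same path, the cache can answer it
    # with a compensating filter instead of contacting the sources.

    cached_view = {"path": "/bib/book", "pred": ("price", 0, 100)}

    def contained_in_cache(query, view):
        attr, lo, hi = query["pred"]
        vattr, vlo, vhi = view["pred"]
        return (query["path"] == view["path"] and attr == vattr
                and vlo <= lo and hi <= vhi)

    new_query = {"path": "/bib/book", "pred": ("price", 20, 50)}
    if contained_in_cache(new_query, cached_view):
        print("answer from cache, then re-filter price to [20, 50]")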
65.
Multiple Continuous Query Processing with Relative Window Predicates "Juggler". Silva, Asima. 27 May 2004.
"Efficient querying over streaming data is a critical technology which requires the ability to handle numerous and possibly similar queries in real time dynamic environments such as the stock market and medical devices. Existing DBMS technology is not well suited for this domain since it was developed for static historical data. Queries over streams often contain relative window predicates such as in the query: ``Heart rate decreased to fifty-two beats per second within four seconds after the patient's temperature started rising." Relative window predicates are a specific type of join between streams that is based on the tuple's timestamp. In our operator, called Juggler, predicates are classified into three types: attribute, join, and window. Attribute predicates are stream values compared to a constant. Join predicates are stream values compared to another stream's values. Window predicates are join predicates where the streams' timestamp values are compared. Juggler's composite operator incorporates the processing of similar though not identical, query functionalities as one complex computation process. This execution strategy handles multi-way joins for multiple selection and join predicates. It adaptively orders the execution of predicates by their selectivity to efficiently process multiple continuous queries based on stream characteristics. In Juggler, all similar predicates are grouped into lists. These indices are represented by a collection of bits. Every tuple contains the bit structure representation of the predicate lists which encodes tuple predicate evaluation history. Every query also contains a similar bit structure to encode the predicate's relationship to the registered queries. The tuple's and query's bit structures are compared to assess if the tuple has satisfied a query. Juggler is designed and implemented in Java. Experiments were conducted to verify correctness and to assess the performance of Juggler's three features. Its adaptivity of reordering the evaluation of predicate types performed as well as the most selective predicate ordering. Its ability to exploit similar predicates in multiple queries showed reduction in number of comparisons. Its effectiveness when multiple queries are combined in a single Juggler operator indicated potential performance improvements after optimization of Juggler's data structures."
66.
State Spill Policies for State Intensive Continuous Query Plan Evaluation. Jbantova, Mariana G. 02 May 2007.
The needs of modern applications such as network monitoring systems, telecommunications data management, web applications, and remote medical monitoring for near real-time results over continuous data streams have spurred the development of new data management systems called Data Stream Management Systems (DSMS). Unlike traditional database systems, which answer one-time user queries only after the finite data has been captured on disk, DSMSs provide on-the-fly answers to user queries as data arrives at various rates in the form of continuous, potentially infinite streams of tuples. To meet the timeliness requirements of applications, DSMSs aim to keep all data in main memory. Thus, queries with multiple stateful operators pose a major strain on memory. Existing adaptation techniques designed to address this issue are ineffective when faced with continuous bursts of high data rates. When system load exceeds system capacity, a DSMS has three options: 1) discard some new data; 2) crash; or 3) spill data to disk. Only the third option allows it to produce delayed, yet accurate and complete, query results. However, this option involves disk-access overhead and changes the natural order of tuples flowing through the query plan tree. As not all stream operators can correctly process out-of-order tuples, data spilling may have a negative impact on the quality of the final results. Moreover, since operators in a query plan are interconnected, changes in the order of tuple flows inevitably impact the execution stages of affected downstream operators, such as data purging. Data purging is necessary for processing continuous queries composed of stateful operators. The state of such operators is divided into finite non-overlapping sets of tuples called windows. Thus, after all the tuples for a window have been processed and all results output, these tuples can be discarded to free memory for new data. To address these issues, we have redesigned the state structure of continuous operators into smaller, finite, non-overlapping sets of tuples, such as partitioned window groups, which incur less disk-access overhead. Second, we provide continuous operators with the capability to correctly process out-of-order tuples using punctuation pointers. Third, we design methods for downstream operators to synchronize their processing stages with those of upstream operators to achieve optimized query plan throughput. Putting these techniques together, we have designed a consolidated spilling adaptation strategy that considers all aspects of operators' interconnections in a query plan when making adaptation decisions. The effectiveness of our integrated approach was empirically tested in a comparative evaluation study against several alternative spilling adaptation strategies. We conducted our experiments on CAPE, a DSMS developed at WPI, using different types of query plans composed of multiple partitioned window join operators. Our experiments show that despite the higher overhead of a more synchronized adaptation approach, our consolidated strategy provides better query plan performance and higher plan throughput during periods of continuous bursts of high data rates.
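A hedged sketch of the partitioned-state spilling idea follows (illustrative only; CAPE's actual state structures and spill policies differ):

    # Minimal sketch: operator state kept as non-overlapping window
    # partitions; under memory pressure, whole partitions go to disk and
    # are read back only when a punctuation closes their window.

    import os, pickle, tempfile

    class SpillableState:
        def __init__(self, max_windows_in_memory=2):
            self.memory = {}                 # window id -> list of tuples
            self.on_disk = {}                # window id -> temp file path
            self.max_windows = max_windows_in_memory

        def insert(self, window_id, tup):
            self.memory.setdefault(window_id, []).append(tup)
            if len(self.memory) > self.max_windows:
                victim = min(self.memory)    # spill the oldest window first
                data = self.memory.pop(victim)
                if victim in self.on_disk:   # merge with earlier spill
                    old = self.on_disk.pop(victim)
                    with open(old, "rb") as f:
                        data = pickle.load(f) + data
                    os.remove(old)
                fd, path = tempfile.mkstemp()
                with os.fdopen(fd, "wb") as f:
                    pickle.dump(data, f)
                self.on_disk[victim] = path

        def finalize(self, window_id):       # a punctuation closed this window
            tuples = self.memory.pop(window_id, [])
            if window_id in self.on_disk:
                with open(self.on_disk.pop(window_id), "rb") as f:
                    tuples = pickle.load(f) + tuples
            return tuples

    state = SpillableState()
    for w, t in [(1, "a"), (2, "b"), (3, "c"), (1, "d")]:
        state.insert(w, t)
    print(state.finalize(1))                 # ['a', 'd'], read back from disk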
67.
D-CAPE: A Self-Tuning Continuous Query Plan Distribution Architecture. Sutherland, Timothy Michael. 05 May 2004.
The study of systems for querying data streams, coined Data Stream Management Systems (DSMS), has gained in popularity over the last several years. This new area of research for the database community includes studies in areas such as sensor networks, network intrusion detection, and the monitoring of data such as medical, stock, or weather feeds. With this new popularity come increased performance expectations, with larger data sizes and speeds, larger and more complex query plans, and high volumes of possibly small queries. Due to the finite resources of a single query processor, future Data Stream Management Systems must distribute their workload to multiple query processors working together in a synchronized manner. This thesis discusses a new Distributed Continuous Query System (D-CAPE), developed here at WPI, that has the ability to distribute query plans over a large cluster of machines. We describe the architecture of the new system, policies for query plan distribution to improve overall performance, and techniques for self-tuning query plan re-distribution. D-CAPE is designed to be as flexible as possible for future research. We include a multi-tiered architecture that scales to a large number of query processors. D-CAPE has also been designed to minimize the cost of the communications network by bundling synchronization messages, thus minimizing packets sent between query processors. These messages are also incremental at run-time to help minimize the communication cost of D-CAPE. The architecture allows for the flexible incorporation of different distribution algorithms and operator reallocation policies. D-CAPE provides an operator reallocation algorithm that is able to seamlessly move one or more operators across any query processors in our computing cluster. We do so by creating "pipes" between query processors to allow the data streams to flow, and then filling these pipes with data streams once execution begins. Operator redistribution is accomplished by systematically reconnecting these pipes so as not to interrupt the data flow. Experimental evaluation using our real prototype system (not just simulation) shows that executing a query plan distributed over multiple machines causes no more overhead than processing it on a single centralized query processor, even for rather lightly loaded machines. Further, we find that distributing a query plan among a cluster of query processors can boost performance up to twice that of a centralized DSMS. We conclude that the limitation of each query processor within the distributed network of cooperating processors lies primarily not in the volume of the data nor the number of query operators, but rather in the number of data connections per processor and the allocation of the stateful, and thus most costly, operators. We also find that the overhead of distributing query operators is very low, allowing for potentially frequent dynamic redistribution of query plans during execution.
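The reallocation decision can be caricatured in a few lines (the cost weights and names below are hypothetical; D-CAPE's actual policies are more sophisticated and operate on live statistics):

    # Hedged sketch: when one processor's load exceeds a threshold, move an
    # operator to the least-loaded processor; in D-CAPE this is where the
    # "pipes" between processors would be reconnected.

    from dataclasses import dataclass, field

    @dataclass
    class Processor:
        name: str
        operators: list = field(default_factory=list)  # (name, stateful?, connections)

        def load(self):
            # stateful operators weighted heavier, as the thesis observes
            return sum(conn + (5 if stateful else 1)
                       for _, stateful, conn in self.operators)

    def rebalance(processors, threshold):
        busy = max(processors, key=Processor.load)
        idle = min(processors, key=Processor.load)
        if busy.load() > threshold and busy is not idle and busy.operators:
            op = max(busy.operators, key=lambda o: o[2])   # most-connected op
            busy.operators.remove(op)
            idle.operators.append(op)        # reconnect pipes here
            print(f"moved {op[0]} from {busy.name} to {idle.name}")

    p1 = Processor("qp1", [("join1", True, 4), ("select1", False, 2)])
    p2 = Processor("qp2", [("project1", False, 1)])
    rebalance([p1, p2], threshold=8)         # moves join1 to qp2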
68.
Continuous Query Processing on Spatio-Temporal Data Streams. Nehme, Rimma V. 23 August 2005.
"This thesis addresses important challenges in the areas of streaming and spatio-temporal databases. It focuses on continuous querying of spatio-temporal environments characterized by (1) a large number of moving and stationary objects and queries; (2) need for near real-time results; (3) limited memory and cpu resources; and (4) different accuracy requirements. The first part of the thesis studies the problem of performance vs. accuracy tradeoff using different location modelling techniques when processing continuous spatio-temporal range queries on moving objects. Two models for modeling the movement, namely: continuous and discrete models are described. This thesis introduces an accuracy comparison model to estimate the quality of the answers returned by each of the models. Experimental evaluations show the effectiveness of each model given certain characteristics of spatio-temporal environment (e.g., varying speed, location update frequency). The second part of the thesis introduces SCUBA, a Scalable Cluster Based Algorithm for evaluating a large set of continuous queries over spatio-temporal data streams. Unlike the commonly used static grid indices, the key idea of SCUBA is to group moving objects and queries based on common dynamic properties (e.g., speed, destination, and road network location) at run-time into moving clusters. This results in improvement in performance which facilitate scalability. SCUBA exploits shared cluster-based execution consisting of two phases. In phase I, the evaluation of a set of spatio-temporal queries is abstracted as a spatial join between moving clusters for cluster-based filtering of true negatives. There after, in phase II, a fine-grained join process is executed for all pairs identified as potentially joinable by a positive cluster-join match in phase I. If the clusters don’t satisfy the join predicate, the objects and queries that belong to those clusters can be savely discarded as being guaranteed to not join individually either. This provides processing cost savings. Another advantage of SCUBA is that moving cluster-driven load shedding is facilitated. A moving cluster (or its subset, called nucleus)approximates the locations of its members. As a consequence relatively accurate answers can be produced using solely the abstracted cluster location information in place of precise object-by-object matches, resulting in savings in memory and improvement in processing time. A theoretical analysis of SCUBA is presented with respect to the memory requirements, number of join comparisons and I/O costs. Experimental evaluations on real datasets demonstrate that SCUBA achieves a substantial improvement when executing continuous queries on highly dense moving objects. The experiments are conducted in a real data streaming system (CAPE) developed at WPI on real datasets generated by the Network-Based Moving Objects Generator."
69.
Querying Geographically Dispersed, Heterogeneous Data Stores: The PPerfXchange Approach. Colgrove, Matthew Edward. 01 January 2002.
This thesis details PPerfXchange's approach for querying geographically dispersed, heterogeneous data stores. While elements of PPerfXchange's method have been implemented for other application areas, PPerfXchange shows how these elements can be applied to parallel performance analysis. The accomplishments of this thesis are: the design of an architecture for PPerfXchange, giving a uniform method to query heterogeneous data stores; a proof-of-concept prototype implementation of PPerfXchange, including a partial implementation of an XQuery processor and a relational database virtual XML document; and an evaluation of PPerfXchange using example parallel performance analysis data.
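The "relational database virtual XML document" idea can be illustrated with a small sketch (table and element names hypothetical; the prototype's mapping is more general):

    # Hedged sketch: rows of a relational table are exposed as XML elements
    # generated on request, so an XQuery processor can navigate them without
    # the data ever being stored as an XML file.

    import sqlite3
    import xml.etree.ElementTree as ET

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measurements (run INTEGER, metric TEXT, value REAL)")
    conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)",
                     [(1, "mflops", 812.5), (2, "mflops", 820.1)])

    def virtual_document(table):
        """Build an XML view of a relational table on demand."""
        root = ET.Element(table)
        for run, metric, value in conn.execute(
                "SELECT run, metric, value FROM " + table):
            row = ET.SubElement(root, "row", run=str(run), metric=metric)
            row.text = str(value)
        return root

    doc = virtual_document("measurements")   # an XQuery processor navigates this
    print(ET.tostring(doc).decode())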
70.
Concept-Oriented Model and Nested Partially Ordered Sets. Savinov, Alexandr. 24 April 2014.
The concept-oriented model of data (COM) has recently been defined syntactically by means of the concept-oriented query language (COQL). In this paper we propose a formal embodiment of this model, called nested partially ordered sets (nested posets), and demonstrate how it is connected with its syntactic counterpart. A nested poset is a novel formal construct that can be viewed either as a nested set with a partial order relation established on its elements or as a conventional poset whose elements can themselves be posets. An element of a nested poset is defined as a couple consisting of one identity tuple and one entity tuple. We formally define the main operations on nested posets and demonstrate their usefulness in solving typical data management and analysis tasks such as logical navigation, constraint propagation, inference and multidimensional analysis.
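A hedged reconstruction of the central definition, in our own notation (the paper's formalism may differ in details):

    % Sketch of the identity-entity couple and the nesting, notation ours.
    An element of a nested poset is a couple
    \[
        e \;=\; \langle\, i \mid v \,\rangle,
    \]
    where $i$ is the identity tuple and $v$ the entity tuple, and the
    components of $v$ may themselves be elements of (other) nested posets.
    A nested poset is then a pair $\langle P, \leq \rangle$ in which $\leq$
    is a partial order (reflexive, antisymmetric, transitive) on $P$ and
    each $p \in P$ may itself carry a poset structure, giving both the
    "nested set with a partial order" and the "poset of posets" readings
    described in the abstract.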