Global ETD Search

1	Incremental Maintenance Of Materialized XQuery Views El-Sayed, Maged F 23 August 2005 (has links) "Keeping views fresh by maintaining the consistency between materialized views and their base data in the presence of base updates is a critical problem for many applications, including data warehousing and data integration. While heavily studied for traditional databases, the maintenance of XML views remains largely unexplored. Maintaining XML views is complex due to the richness of the XML data model and the powerful capabilities of XML query languages, such as XQuery. This dissertation proposes a comprehensive solution for the general problem of maintaining materialized XQuery views. Our solution is the first to enable the maintenance of a large class of XQuery views including XPath expressions, FLWOR expressions, and Element Constructors. These views may contain arbitrary result construction and arbitrary grouping and join operations. Our solution also supports the unique order requirements of XQuery including source document order and query order. The contributions of this dissertation include: (i) an efficient solution for supporting order in XML query processing and view maintenance, (ii) an identifier-based technique for enabling incremental construction of XML views, (iii) a mechanism for modeling and validating source XML updates, (iv) a counting algorithm for supporting view maintenance on delete and modify updates, (v) an algebraic solution for propagating bulk XML updates, and (vi) an efficient mechanism for refreshing materialized XML views on propagated updates. We provide proofs of correctness of our proposed techniques for materialized XQuery maintenance. We have implemented a prototype of our view maintenance solution on top of the Rainbow XML query engine, developed at WPI. 
Our experiments confirm that our solution provides a practical and efficient solution for maintaining materialized XQuery views even when handling heterogeneous batches of possibly large source updates. Our solution follows the widely adopted propagate-apply framework for view maintenance common to all mainstream query engines. That is, our solution produces incremental maintenance plans in the same algebraic language used to define the views. These plans can thus be optimized and executed by standard query processing techniques. Being compatible with standard frameworks paves the way for our XML view maintenance solution to be easily adopted by existing database engines." XML XQuery Incremental View Maintenance XML (Document markup language) Querying (Computer science)
2	Efficient Incremental View Maintenance for Data Warehousing Chen, Songting 20 December 2005 (has links) "Data warehousing and on-line analytical processing (OLAP) are essential elements for decision support applications. Since most OLAP queries are complex and are often executed over huge volumes of data, the solution in practice is to employ materialized views to improve query performance. One important issue for utilizing materialized views is to maintain the view consistency upon source changes. However, most prior work focused on simple SQL views with distributive aggregate functions, such as SUM and COUNT. This dissertation proposes to consider broader types of views than previous work. First, we study views with complex aggregate functions such as variance and regression. Such statistical functions are of great importance in practice. We propose a workarea function model and design a generic framework to tackle incremental view maintenance and answering queries using views for such functions. We have implemented this approach in a prototype system of IBM DB2. An extensive performance study shows significant performance gains by our techniques. Second, we consider materialized views with PIVOT and UNPIVOT operators. Such operators are widely used for OLAP applications and for querying sparse datasets. We demonstrate that the efficient maintenance of views with PIVOT and UNPIVOT operators requires more generalized operators, called GPIVOT and GUNPIVOT. We formally define and prove the query rewriting rules and propagation rules for such operators. We also design a novel view maintenance framework for applying these rules to obtain an efficient maintenance plan. Extensive performance evaluations reveal the effectiveness of our techniques. Third, materialized views are often integrated from multiple data sources. Due to source autonomicity and dynamicity, concurrency may occur during view maintenance. 
We propose a generic concurrency control framework to solve such maintenance anomalies. This solution extends previous work in that it solves the anomalies under both source data and schema changes and thus achieves full source autonomicity. We have implemented this technique in a data warehouse prototype developed at WPI. The extensive performance study shows that our techniques put little extra overhead on existing concurrent data update processing techniques while allowing for this new functionality." View Matching View Maintenance Materialized View Data Warehouse Information Integration Data warehousing Database searching
3	Order-sensitive View Maintenance of Materialized XQuery Views Dimitrova, Katica 05 May 2003 (has links) Materialized XML views are a popular technique for integrating data from possibly distributed and heterogeneous data sources. However, the problem of the incremental maintenance of such XML views poses new challenges which to date remain unaddressed. One, XML views not only filter the data, but may radically restructure it to construct new XML nested document structures. Moreover, order is inherent in the XML model, and XML views reflect both the implicit document order of the underlying sources and the order explicitly imposed in the view definition. Therefore, order also has to be preserved at view maintenance time. In this thesis we present an algebraic approach for the incremental maintenance of XQuery views, called VOX (View maintenance for Ordered XML). To the best of our knowledge, this is the first solution to order-preserving XML view maintenance. Our strategy correctly transforms an update to source XML data into sequences of updates that refresh the view. Our technique is based on an algebraic representation of the XQuery view expression using an XML algebra. The XML algebra has ordered bag semantics; hence most of the operators logically are order preserving. We propose an order-encoding mechanism that migrates the XML algebra to (non-ordered) bag semantics, no longer requiring most of the operators to be order-aware. Furthermore, this now allows most of the algebra operators to become distributive over update operations. This transformation brings the problem of maintaining XML views one step closer to the problem of maintaining views in other (unordered) data models. We are thus now able to adopt some of the existing (relational) maintenance techniques towards our goal of efficient order-sensitive XQuery view maintenance. In addition we develop a full set of rules for propagating updates through XML specific operations. 
We have proven the correctness of the VOX view maintenance approach. A full implementation of VOX on top of RAINBOW, the XML data management system developed at WPI, has been completed. Our experiments, performed using the data and queries provided by the XMark benchmark, confirm that incremental XML view maintenance is indeed significantly faster than complete recomputation in most cases. Incremental maintenance is shown to outperform recomputation even for large updates. XML algebra order view maintenance propagation rules XML XML (Document markup language)
4	The Impact of Storage Strategies on Maintenance of XML Views Åhgren, Mikael January 2001 (has links) Information in a data warehouse is stored in materialized views, which must be kept consistent with respect to changes made in the sources. This problem has been extensively studied in the relational model. The process is referred to as view maintenance. XML is emerging as the de facto standard for data representation and data exchange of semistructured data. Most discussions involving XML assume the XML data is stored in plain text files. However, there are a number of different approaches for storing XML data, which can be categorized according to the underlying system used. Views and materialized views can also be specified in XML. This dissertation investigates how view maintenance in an XML context is influenced by the utilized approach for storage. We survey existing storage strategies using a relational database as the underlying system for storage, and storage strategies using plain text files. Further, we survey approaches for maintenance in the context of XML. We investigate three selected storage strategies in detail. We conclude with some insights gained during the investigation. XML Storage Strategies Materialized Views View Maintenance Computer and systems science Data- och systemvetenskap
5	Evaluation of view maintenance with complex joins in a data warehouse environment Asthorsson, Kjartan January 2002 (has links) Data warehouse maintenance and maintenance cost have been well studied in the literature. Integrating data sources in a data warehouse environment often requires data cleaning, transformation, or some other function applied to the data. However, the impact on view maintenance when data is integrated with comparison operators other than those defined in a theta join has not been closely examined in previous studies. In this study, the impact of using a complex join in a data warehouse environment is analyzed to measure how different maintenance strategies are affected when data needs to be integrated using comparison operators other than those defined in a theta join. The analysis shows that maintenance cost is greatly increased when using complex joins, since such joins often lack the optimization techniques that are available for a theta join. The study shows, among other things, that the join-aware capability of sources is not of importance when performing complex joins, and that incremental view maintenance is a better approach than recomputation when using complex joins. Strategies for maintaining data warehouses when data is integrated using a complex join therefore differ from those used with a theta join, and different maintenance strategies need to be applied. Data warehouses view maintenance join algorithms Computer and systems science Data- och systemvetenskap
6	Materialized Views over Heterogeneous Structured Data Sources in a Distributed Event Stream Processing Environment January 2011 (has links) abstract: Data-driven applications are becoming increasingly complex with support for processing events and data streams in a loosely-coupled distributed environment, providing integrated access to heterogeneous data sources such as relational databases and XML documents. This dissertation explores the use of materialized views over structured heterogeneous data sources to support multiple query optimization in a distributed event stream processing framework that supports such applications involving various query expressions for detecting events, monitoring conditions, handling data streams, and querying data. Materialized views store the results of the computed view so that subsequent access to the view retrieves the materialized results, avoiding the cost of recomputing the entire view from base data sources. Using a service-based metadata repository that provides metadata level access to the various language components in the system, a heuristics-based algorithm detects the common subexpressions from the queries represented in a mixed multigraph model over relational and structured XML data sources. These common subexpressions can be relational, XML or a hybrid join over the heterogeneous data sources. This research examines the challenges in the definition and materialization of views when the heterogeneous data sources are retained in their native format, instead of converting the data to a common model. LINQ serves as the materialized view definition language for creating the view definitions. An algorithm is introduced that uses LINQ to create a data structure for the persistence of these hybrid views. Any changes to base data sources used to materialize views are captured and mapped to a delta structure. The deltas are then streamed within the framework for use in the incremental update of the materialized view. 
Algorithms are presented that use the magic sets query optimization approach to both efficiently materialize the views and to propagate the relevant changes to the views for incremental maintenance. Using representative scenarios over structured heterogeneous data sources, an evaluation of the framework demonstrates an improvement in performance. Thus, defining the LINQ-based materialized views over heterogeneous structured data sources using the detected common subexpressions and incrementally maintaining the views by using magic sets enhances the efficiency of the distributed event stream processing environment. / Dissertation/Thesis / Ph.D. Computer Science 2011 Computer Science Common Subexpressions Incremental View Maintenance LINQ Magic Sets Materialized Views Metadata Repository
7	Evaluation of view maintenance with complex joins in a data warehouse environment Asthorsson, Kjartan January 2002 (has links) Data warehouse maintenance and maintenance cost have been well studied in the literature. Integrating data sources in a data warehouse environment often requires data cleaning, transformation, or some other function applied to the data. However, the impact on view maintenance when data is integrated with comparison operators other than those defined in a theta join has not been closely examined in previous studies. In this study, the impact of using a complex join in a data warehouse environment is analyzed to measure how different maintenance strategies are affected when data needs to be integrated using comparison operators other than those defined in a theta join. The analysis shows that maintenance cost is greatly increased when using complex joins, since such joins often lack the optimization techniques that are available for a theta join. The study shows, among other things, that the join-aware capability of sources is not of importance when performing complex joins, and that incremental view maintenance is a better approach than recomputation when using complex joins. Strategies for maintaining data warehouses when data is integrated using a complex join therefore differ from those used with a theta join, and different maintenance strategies need to be applied. Data warehouses view maintenance join algorithms Information Systems
8	The Impact of Storage Strategies on Maintenance of XML Views Åhgren, Mikael January 2001 (has links) Information in a data warehouse is stored in materialized views, which must be kept consistent with respect to changes made in the sources. This problem has been extensively studied in the relational model. The process is referred to as view maintenance. XML is emerging as the de facto standard for data representation and data exchange of semistructured data. Most discussions involving XML assume the XML data is stored in plain text files. However, there are a number of different approaches for storing XML data, which can be categorized according to the underlying system used. Views and materialized views can also be specified in XML. This dissertation investigates how view maintenance in an XML context is influenced by the utilized approach for storage. We survey existing storage strategies using a relational database as the underlying system for storage, and storage strategies using plain text files. Further, we survey approaches for maintenance in the context of XML. We investigate three selected storage strategies in detail. We conclude with some insights gained during the investigation. XML Storage Strategies Materialized Views View Maintenance Information Systems
9	Self Maintenance of Materialized XQuery Views via Query Containment and Re-Writing Nilekar, Shirish K. 24 April 2006 (has links) In recent years, XML (the eXtensible Markup Language) has become the de facto standard for publishing and exchanging information on the web and in enterprise data integration systems. Materialized views are often used in information integration systems to present a unified schema for efficient querying of distributed and possibly heterogeneous data sources. Along similar lines, ACE-XQ, an XQuery-based semantic caching system, shows the significant performance gains achieved by caching query results (as materialized views) and using these materialized views along with query containment techniques for answering future queries over distributed XML data sources. To keep data in these materialized views of ACE-XQ up-to-date, the view must be maintained, i.e., whenever the base data changes, the corresponding cached data in the materialized view must also be updated. This thesis builds on the query containment ideas of ACE-XQ and proposes an efficient approach for self-maintenance of materialized views. Our experimental results illustrate the significant performance improvement achieved by this strategy over view re-computation for a variety of situations. XML Query Re-Writing View Maintenance Query Containment XML (Document markup language) Cache memory Database searching Query languages (Computer science)
10	A Declarative Approach to Modeling and Solving the View Selection Problem / Une approche déclarative pour la modélisation et la résolution du problème de la sélection de vues à matérialiser Mami, Imene 15 November 2012 (has links) La matérialisation de vues est une technique très utilisée dans les systèmes de gestion de bases de données ainsi que dans les entrepôts de données pour améliorer les performances des requêtes. Elle permet de réduire de manière considérable le temps de réponse des requêtes en pré-calculant des requêtes coûteuses et en stockant leurs résultats. De ce fait, l'exécution de certaines requêtes nécessite seulement un accès aux vues matérialisées au lieu des données sources. En contrepartie, la matérialisation entraîne un surcoût de maintenance des vues. En effet, les vues matérialisées doivent être mises à jour lorsque les données sources changent afin de conserver la cohérence et l'intégrité des données. De plus, chaque vue matérialisée nécessite également un espace de stockage supplémentaire qui doit être pris en compte au moment de la sélection. Le problème de choisir quelles sont les vues à matérialiser de manière à réduire les coûts de traitement des requêtes étant donné certaines contraintes telles que l'espace de stockage et le coût de maintenance, est connu dans la littérature sous le nom du problème de la sélection de vues. Trouver la solution optimale satisfaisant toutes les contraintes est un problème NP-complet. 
Dans un contexte distribué constitué d'un ensemble de noeuds ayant des contraintes de ressources différentes (CPU, IO, capacité de l'espace de stockage, bande passante réseau, etc.), le problème de la sélection des vues est celui de choisir un ensemble de vues à matérialiser ainsi que les noeuds du réseau sur lesquels celles-ci doivent être matérialisées de manière à optimiser les coûts de maintenance et de traitement des requêtes. Notre étude traite le problème de la sélection de vues dans un environnement centralisé ainsi que dans un contexte distribué. Notre objectif est de fournir une approche efficace dans ces contextes. Ainsi, nous proposons une solution basée sur la programmation par contraintes, connue pour être efficace dans la résolution des problèmes NP-complets et une méthode puissante pour la modélisation et la résolution des problèmes d'optimisation combinatoire. L'originalité de notre approche est qu'elle permet une séparation claire entre la formulation et la résolution du problème. A cet effet, le problème de la sélection de vues est modélisé comme un problème de satisfaction de contraintes de manière simple et déclarative. Puis, sa résolution est effectuée automatiquement par le solveur de contraintes. De plus, notre approche est flexible et extensible, en ce sens que nous pouvons facilement modéliser et gérer de nouvelles contraintes et mettre au point des heuristiques pour un objectif d'optimisation. Les principales contributions de cette thèse sont les suivantes. Tout d'abord, nous définissons un cadre qui permet d'avoir une meilleure compréhension des problèmes que nous abordons dans cette thèse. Nous analysons également l'état de l'art des méthodes de sélection des vues à matérialiser en identifiant leurs points forts ainsi que leurs limites. Ensuite, nous proposons une solution utilisant la programmation par contraintes pour résoudre le problème de la sélection de vues dans un contexte centralisé. 
Nos résultats expérimentaux montrent que notre approche fournit de bonnes performances. Elle permet en effet d'avoir le meilleur compromis entre le temps de calcul nécessaire pour la sélection des vues à matérialiser et le gain de temps de traitement des requêtes à réaliser en matérialisant ces vues. Enfin, nous étendons notre approche pour résoudre le problème de la sélection de vues à matérialiser lorsque celui-ci est étudié sous contraintes de ressources multiples dans un contexte distribué. A l'aide d'une évaluation de performances extensive, nous montrons que notre approche fournit des résultats de qualité et fiables. / View selection is important in many data-intensive systems, e.g., commercial database and data warehousing systems, to improve query performance. View selection can be defined as the process of selecting a set of views to be materialized in order to optimize query evaluation. To support this process, different related issues have to be considered. Whenever a data source is changed, the materialized views built on it have to be maintained in order to compute up-to-date query results. Besides the view maintenance issue, each materialized view also requires additional storage space which must be taken into account when deciding which and how many views to materialize. The problem of choosing which views to materialize to speed up incoming queries, constrained by additional storage overhead and/or maintenance costs, is known as the view selection problem. This is one of the most challenging problems in data warehousing and it is known to be an NP-complete problem. In a distributed environment, the view selection problem becomes more challenging. Indeed, it includes another issue, which is to decide on which computer nodes the selected views should be materialized. 
The view selection problem in a distributed context is now additionally constrained by storage space capacities per computer node, maximum global maintenance costs and the communication cost between the computer nodes of the network. In this work, we deal with the view selection problem in a centralized context as well as in a distributed setting. Our goal is to provide a novel and efficient approach in these contexts. For this purpose, we designed a solution using constraint programming, which is known to be efficient for the resolution of NP-complete problems and a powerful method for modeling and solving combinatorial optimization problems. The originality of our approach is that it provides a clear separation between formulation and resolution of the problem. Indeed, the view selection problem is modeled as a constraint satisfaction problem in an easy and declarative way. Then, its resolution is performed automatically by the constraint solver. Furthermore, our approach is flexible and extensible, in that it can easily model and handle new constraints and new heuristic search strategies for optimization purposes. The main contributions of this thesis are as follows. First, we define a framework that enables a better understanding of the problems we address in this thesis. We also analyze the state of the art in materialized view selection to review the existing methods by identifying their respective potentials and limits. We then design a solution using constraint programming to address the view selection problem in a centralized context. Our experimental results show that our approach provides the best balance between the computing time required for finding the materialized views and the gain realized in query processing by materializing these views. Our approach also guarantees to pick the optimal set of materialized views when no time limit is imposed. 
Finally, we extend our approach to provide a solution to the view selection problem when the latter is studied under multiple resource constraints in a distributed context. Based on our extensive performance evaluation, we show that our approach outperforms the genetic algorithm that has been designed for a distributed setting. Vues matérialisées Optimisation de requêtes Sélection de vues Maintenance de vues Programmation par contraintes Materialized views Query optimization View selection View maintenance Constraint programming
