Global ETD Search

1	ViewDF: a Flexible Framework for Incremental View Maintenance in Stream Data Warehouses Yang, Yuke January 2013 Because of increasing data sizes and demands for low latency in modern data analysis, traditional data warehousing technologies are being pushed beyond their limits. Several stream data warehouse (SDW) systems, which are warehouses that ingest append-only data feeds and support frequent refresh cycles, have been proposed, along with different methods to improve the responsiveness of these systems. Materialized views are critical in large-scale data warehouses due to their ability to speed up queries. Thus an SDW maintains layers of materialized views. Materialized view maintenance in SDW systems introduces new challenges. However, some of the existing SDW systems do not address the maintenance of views, while others employ view maintenance techniques that are not efficient. This thesis presents ViewDF, a flexible framework for incremental maintenance of materialized views in SDW systems that generalizes existing techniques and enables new optimizations for views defined with operators that are common in stream analytics. We give a special view definition (ViewDF) that enhances the traditional way of creating views in SQL by being able to reference any partition of any table. We describe a prototype system based on this idea, which allows users to write ViewDFs directly and can automatically translate a broad class of queries into ViewDFs. Several optimizations are proposed, and experiments show that our proposed system can improve view maintenance time by a factor of two or more in practical settings. materialized view stream data warehouse
2	SPACE ALLOCATION FOR MATERIALIZED VIEWS AND INDEXES USING GENETIC ALGORITHMS MACHIRAJU, SIRISHA 16 September 2002 No description available. data warehousing materialized views genetic algorithms index selection space allocation
3	Efficient Incremental View Maintenance for Data Warehousing Chen, Songting 20 December 2005 "Data warehousing and on-line analytical processing (OLAP) are essential elements for decision support applications. Since most OLAP queries are complex and are often executed over huge volumes of data, the solution in practice is to employ materialized views to improve query performance. One important issue for utilizing materialized views is to maintain view consistency under source changes. However, most prior work focused on simple SQL views with distributive aggregate functions, such as SUM and COUNT. This dissertation proposes to consider broader types of views than previous work. First, we study views with complex aggregate functions such as variance and regression. Such statistical functions are of great importance in practice. We propose a workarea function model and design a generic framework to tackle incremental view maintenance and answering queries using views for such functions. We have implemented this approach in a prototype system of IBM DB2. An extensive performance study shows significant performance gains by our techniques. Second, we consider materialized views with PIVOT and UNPIVOT operators. Such operators are widely used for OLAP applications and for querying sparse datasets. We demonstrate that the efficient maintenance of views with PIVOT and UNPIVOT operators requires more generalized operators, called GPIVOT and GUNPIVOT. We formally define and prove the query rewriting rules and propagation rules for such operators. We also design a novel view maintenance framework for applying these rules to obtain an efficient maintenance plan. Extensive performance evaluations reveal the effectiveness of our techniques. Third, materialized views are often integrated from multiple data sources. Due to source autonomicity and dynamicity, concurrency may occur during view maintenance.
We propose a generic concurrency control framework to solve such maintenance anomalies. This solution extends previous work in that it solves the anomalies under both source data and schema changes and thus achieves full source autonomicity. We have implemented this technique in a data warehouse prototype developed at WPI. The extensive performance study shows that our techniques put little extra overhead on existing concurrent data update processing techniques while allowing for this new functionality." View Matching View Maintenance Materialized View Data Warehouse Information Integration Data warehousing Database searching
4	The Impact of Storage Strategies on Maintenance of XML Views Åhgren, Mikael January 2001 Information in a data warehouse is stored in materialized views, which must be kept consistent with respect to changes made in the sources. This problem has been extensively studied in the relational model. The process is referred to as view maintenance. XML is emerging as the de facto standard for data representation and data exchange of semistructured data. Most discussions involving XML assume the XML data is stored in plain text files. However, there are a number of different approaches for storing XML data, which can be categorized according to the underlying system used. Views and materialized views can also be specified in XML. This dissertation investigates how view maintenance in an XML context is influenced by the utilized approach for storage. We survey existing storage strategies using a relational database as the underlying system for storage, and storage strategies using plain text files. Further, we survey approaches for maintenance in the context of XML. We investigate three selected storage strategies in detail. We conclude with some insights gained during the investigation. XML Storage Strategies Materialized Views View Maintenance Computer and systems science
5	Materialized Views over Heterogeneous Structured Data Sources in a Distributed Event Stream Processing Environment January 2011 Data-driven applications are becoming increasingly complex with support for processing events and data streams in a loosely-coupled distributed environment, providing integrated access to heterogeneous data sources such as relational databases and XML documents. This dissertation explores the use of materialized views over structured heterogeneous data sources to support multiple query optimization in a distributed event stream processing framework that supports such applications involving various query expressions for detecting events, monitoring conditions, handling data streams, and querying data. Materialized views store the results of the computed view so that subsequent access to the view retrieves the materialized results, avoiding the cost of recomputing the entire view from base data sources. Using a service-based metadata repository that provides metadata level access to the various language components in the system, a heuristics-based algorithm detects the common subexpressions from the queries represented in a mixed multigraph model over relational and structured XML data sources. These common subexpressions can be relational, XML or a hybrid join over the heterogeneous data sources. This research examines the challenges in the definition and materialization of views when the heterogeneous data sources are retained in their native format, instead of converting the data to a common model. LINQ serves as the materialized view definition language for creating the view definitions. An algorithm is introduced that uses LINQ to create a data structure for the persistence of these hybrid views. Any changes to base data sources used to materialize views are captured and mapped to a delta structure. The deltas are then streamed within the framework for use in the incremental update of the materialized view.
Algorithms are presented that use the magic sets query optimization approach to both efficiently materialize the views and to propagate the relevant changes to the views for incremental maintenance. Using representative scenarios over structured heterogeneous data sources, an evaluation of the framework demonstrates an improvement in performance. Thus, defining the LINQ-based materialized views over heterogeneous structured data sources using the detected common subexpressions and incrementally maintaining the views by using magic sets enhances the efficiency of the distributed event stream processing environment. / Dissertation/Thesis / Ph.D. Computer Science 2011 Computer Science Common Subexpressions Incremental View Maintenance LINQ Magic Sets Materialized Views Metadata Repository
6	The Impact of Storage Strategies on Maintenance of XML Views Åhgren, Mikael January 2001 Information in a data warehouse is stored in materialized views, which must be kept consistent with respect to changes made in the sources. This problem has been extensively studied in the relational model. The process is referred to as view maintenance. XML is emerging as the de facto standard for data representation and data exchange of semistructured data. Most discussions involving XML assume the XML data is stored in plain text files. However, there are a number of different approaches for storing XML data, which can be categorized according to the underlying system used. Views and materialized views can also be specified in XML. This dissertation investigates how view maintenance in an XML context is influenced by the utilized approach for storage. We survey existing storage strategies using a relational database as the underlying system for storage, and storage strategies using plain text files. Further, we survey approaches for maintenance in the context of XML. We investigate three selected storage strategies in detail. We conclude with some insights gained during the investigation. XML Storage Strategies Materialized Views View Maintenance Information Systems
7	Hypergraphs in the Service of Very Large Scale Query Optimization. Application: Data Warehousing / Les hypergraphes au service de l'optimisation de requêtes à très large échelle. Application : Entrepôt de données Boukorca, Ahcène 12 December 2016 The emergence of Big Data has created growing and urgent needs to share data between users and communities, which has generated a large number of queries that DBMSs must handle. This problem has been compounded by further needs for query recommendation and exploration. Data processing remains feasible thanks to solutions in query optimization, physical design, and deployment architecture, and these solutions result from combinatorial problems based on queries, so it is essential to revisit traditional methods to meet the new scalability needs. This thesis focuses on the problem of numerous queries and proposes a scalable approach, implemented in a framework called Big-queries and based on the hypergraph, a flexible data structure with great modeling power that allows precise formulations of many problems in combinatorial scientific computing. This approach is the result of a collaboration with the company Mentor Graphics. It aims to capture query interaction in a unified query plan and to use partitioning algorithms to ensure scalability and to obtain optimal optimization structures (materialized views and data partitioning). The unified plan is also used in the deployment phase of parallel data warehouses, by partitioning data into fragments and allocating these fragments to the corresponding processing nodes. An intensive experimental study showed the value of our approach in terms of algorithm scalability and reduction of query response time. Physical design Data partitioning Materialized views
8	Vers une conception logique et physique des bases de données avancées dirigée par la variabilité / Towards a Variability-Aware Logical and Physical Database Design Bouarar, Selma 13 December 2016 The evolution of computer technology has strongly impacted the database design process, which now requires more time and resources to encompass the diversity of DB applications. Designers rely essentially on their talent and knowledge, which have proven insufficient to face the increasing diversity of design choices, raising the problem of the reliability and completeness of this knowledge. This problem is well known as variability management in software engineering. While some works manage the variability of the physical and conceptual phases, very few have focused on logical design. Moreover, these works address the design phases separately and thus ignore the interdependencies between them. In this thesis, we first present a methodology to manage the variability of the whole DB design process using the technique of software product lines, so that (i) interdependencies between design phases can be considered, (ii) a holistic vision is provided to the designer, and (iii) process automation is increased. Given the scope of the question, we proceed step by step in implementing this vision, devoting this thesis to a case study chosen to show: (i) the importance of logical design variability, (ii) how to manage it by offering designers an exhaustive set of choices and a reliable selection, (iii) its impact on physical design (multi-phase management), and (iv) the evaluation of logical design, and of the impact of logical variability on physical design (materialized view selection), in terms of non-functional requirements: execution time, energy consumption, and storage space. Variability management Physical design Materialized views
9	Optimierung der materialisierten Sichten in einem Datawarehouse auf der Grundlage der aus einem ERP-System übernommenen operativen Daten / Optimization of Materialized Views in a Data Warehouse Based on Operational Data Taken from an ERP System Achs, Thomas Ludwig 10 1900 Planning and developing an optimal data warehouse system is an aim of many scientists and researchers from various fields. Numerous publications on this topic have been written and published in recent years. This literature attempts to develop a heuristic that delivers a near-optimal solution to the materialization problem in the data warehouse. In the past, numerous publications made assumptions, such as unlimited resources or fast access times, that do not hold in the real world. The vision behind this thesis is to develop an instrument that takes these limiting factors into account, or at least attempts to. The Aggregation Path Array modeling method of Prosser and Ossimitz in particular has proven suitable for finding a solution approach in this problem area. Above all, its simple graphical representation makes this method particularly suitable for information technology presentation. It allows even inexperienced end users to design a warehouse. For this reason, the method is also particularly suitable for training and education purposes. The goal of this thesis is the cost-minimal physical provision of the important information for decision makers in companies. This requires solving an optimization problem that respects limited time and storage resources while taking important information into account. Unfortunately, this information cannot always be regarded as homogeneous. For example, there is important information that is necessary for the survival of an organization, and information that is important but need not be constantly available. The attempt to find a solution approach to this problem is the core of my thesis. (author's abstract)
10	Boa Views: Enabling Modularization and Sharing of Boa Queries Hung, Che Shian 09 August 2019 No description available. Computer Science
