Global ETD Search

1	OLAP query optimization and result visualization / Optimisation de requêtes OLAP et visualisation des résultats Simonenko, Ekaterina 16 September 2011 In this thesis, we explore different aspects of data warehousing and OLAP, the common thread of our proposals being the functional model for data analysis. Our main objective is to use that model to study three different but related aspects: (i) query optimization through rewriting and cache management, (ii) query result visualization, and (iii) the mapping of a relational BCNF schema to a functional schema. Query optimization and cache management are crucial issues in query processing in general, and in data warehousing in particular, and query rewriting is one of the basic techniques for query optimization. We establish derivability conditions for analytic functional queries, using a partial pre-order over the set of queries. We then provide a sound and complete rewriting algorithm, as well as an optimized cache management strategy, both based on the underlying functional model. A second important aspect that we explore in the thesis is query result visualization. We show how important it is for a visualization to reflect essential features of the dataset, such as functional dependencies. We show that the connection between data and visualization is precisely the connection between their functional representations. 
We then define a framework whose objective is to establish such a connection for a given dataset and a set of visualizations. In addition to analyzing the visualization process, we use the functional data model as a guide for interactive visualization, and define what we call parametric visualization. A third important aspect of our work is experimentation with the results obtained in the thesis. These results can be used to analyze the data contained in a Boyce-Codd Normal Form (BCNF) table, provided that the schema of the table can be mapped to a functional schema. We present such a mapping in this thesis. Once the relational schema has been transformed into a functional schema, we can take advantage of the query optimization and result visualization results presented in the thesis. We have used this transformation in the implementation of two prototypes in the context of two different projects. OLAP query, OLAP query rewriting, Query optimization, Data visualization, Visual interaction, Business intelligence, Data warehouse
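The abstract of entry 1 does not spell out its derivability conditions, so the following is only a minimal sketch of the general idea of answering a query from a cached result. It assumes a much simpler setting than the thesis: queries are SUM aggregates identified by their group-by attributes, and a query counts as derivable from a cached one when its group-by attributes are a subset of the cached ones. The names `aggregate` and `QueryCache` are hypothetical.

```python
from collections import defaultdict


def aggregate(rows, group_by, measure):
    """SUM `measure` over `rows`, grouped by the attributes in `group_by`."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[attr] for attr in group_by)
        totals[key] += row[measure]
    return dict(totals)


class QueryCache:
    """Hypothetical cache of SUM results, keyed by (group-by attributes, measure)."""

    def __init__(self):
        self._entries = {}

    def put(self, group_by, measure, result):
        self._entries[(tuple(group_by), measure)] = result

    def answer(self, group_by, measure, base_rows):
        """Return (result, source), where source is "cache" or "base"."""
        for (cached_gb, cached_measure), cached in self._entries.items():
            # Assumed derivability test: same measure, coarser grouping.
            if cached_measure == measure and set(group_by) <= set(cached_gb):
                positions = [cached_gb.index(attr) for attr in group_by]
                totals = defaultdict(int)
                # Rewrite: roll the cached, finer result up to the coarser grouping.
                for key, value in cached.items():
                    totals[tuple(key[p] for p in positions)] += value
                return dict(totals), "cache"
        # No usable cache entry: evaluate on the base data and remember the result.
        result = aggregate(base_rows, group_by, measure)
        self.put(group_by, measure, result)
        return result, "base"
```

The roll-up is only valid here because SUM is distributive; other aggregates would need different conditions, which is the kind of question a derivability analysis has to settle.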
2	Processamento de consultas analíticas com predicados de similaridade entre imagens em ambientes de data warehousing / Processing of analytical queries with similarity search predicates over images in data warehousing environments Teixeira, Jefferson William 29 May 2015 A data warehousing environment offers support to the decision-making process. It consolidates data from distributed, autonomous and heterogeneous information sources into one of its main components, the data warehouse. Furthermore, it provides efficient processing of analytical queries (i.e. OLAP queries). A conventional data warehouse stores only alphanumeric data. An image data warehouse, on the other hand, stores not only alphanumeric data but also intrinsic features of images, thus allowing data warehousing environments to perform analytical similarity queries over images. This requires the development of strategies to provide efficient processing of these complex and costly queries. Although there are a number of approaches in the literature aimed at the development of bitmap indices for data warehouses and of metric access methods for the efficient processing of similarity queries over images, to the best of our knowledge, there is no approach that investigates these two issues in the same setting. 
In this research, we fill this gap in the literature by introducing the following main contributions: (i) the proposal of the ImageDW-index, an optimization mechanism aimed at the efficient processing of analytical queries extended with similarity predicates over images; and (ii) the definition of different processing strategies for image data warehouses using the ImageDW-index. In order to validate these main proposals, we also introduce two secondary contributions: (iii) the ImageDW-Gen, a data generator to populate image data warehouses; and (iv) the proposal of four query classes, each one enforcing different query processing costs associated with the similarity predicates in image data warehousing environments. Using the ImageDW-Gen, performance tests were carried out in order to investigate the advantages introduced by the proposed strategies, according to the query classes. Compared to the most closely related work available in the literature, the ImageDW-index provided a performance gain that varied from 55.57% to 82.16%, considering one of the proposed strategies. Bitmap index, Image data warehouse, Metric access method, OLAP query, Similarity query
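The abstract of entry 2 does not describe the internals of the ImageDW-index, so the sketch below only illustrates the general combination it mentions: a bitmap over a conventional attribute intersected with the set of rows that satisfy a range similarity predicate. It assumes Python integers as bitmaps, Euclidean distance over feature vectors, and a linear scan standing in for a real metric access method. All function and field names are hypothetical.

```python
import math


def bitmap_for(rows, attribute, value):
    """Bitmap (as an int) with bit i set when rows[i][attribute] == value."""
    bits = 0
    for i, row in enumerate(rows):
        if row[attribute] == value:
            bits |= 1 << i
    return bits


def similarity_bitmap(rows, query_features, radius):
    """Bitmap of rows whose feature vector lies within `radius` of the query.

    A linear scan is used here for brevity; a metric access method would
    prune most distance computations.
    """
    bits = 0
    for i, row in enumerate(rows):
        if math.dist(row["features"], query_features) <= radius:
            bits |= 1 << i
    return bits


def count_matches(rows, attribute, value, query_features, radius):
    """COUNT of rows satisfying both the conventional and similarity predicates."""
    combined = bitmap_for(rows, attribute, value) & similarity_bitmap(
        rows, query_features, radius
    )
    return bin(combined).count("1")
```

Representing both predicates as bitmaps lets them be combined with a single bitwise AND, which is the property that makes bitmap indices attractive for multi-predicate analytical queries.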
4	Adaptive work placement for query processing on heterogeneous computing resources Karnagel, Thomas, Habich, Dirk, Wolfgang 10 November 2022 The hardware landscape is currently changing from homogeneous multi-core systems towards heterogeneous systems with many different computing units, each with their own characteristics. This trend is a great opportunity for database systems to increase the overall performance if the heterogeneous resources can be utilized efficiently. To achieve this, the main challenge is to place the right work on the right computing unit. Current approaches tackling this placement for query processing assume that data cardinalities of intermediate results can be correctly estimated. However, this assumption does not hold for complex queries. To overcome this problem, we propose an adaptive placement approach that is independent of cardinality estimation of intermediate results. Our approach is incorporated in a novel adaptive placement sequence. Additionally, we implement our approach as an extensible virtualization layer, to demonstrate the broad applicability with multiple database systems. In our evaluation, we clearly show that our approach significantly improves OLAP query processing on heterogeneous hardware, while being adaptive enough to react to changing cardinalities of intermediate query results. info:eu-repo/classification/ddc/004 ddc:004
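As an illustration of placement that does not rely on estimated intermediate cardinalities, the sketch below decides where each operator runs immediately before it executes, using the actual size of its input. This is an assumption-laden toy rather than the paper's placement sequence: the linear cost model, the `ComputingUnit` class and the unit parameters are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputingUnit:
    """Hypothetical unit with a fixed startup cost and a per-tuple cost."""

    name: str
    startup_cost: float
    cost_per_tuple: float

    def estimated_cost(self, cardinality):
        return self.startup_cost + self.cost_per_tuple * cardinality


def place(units, cardinality):
    """Pick the unit with the lowest modelled cost for this input size."""
    return min(units, key=lambda unit: unit.estimated_cost(cardinality))


def run_pipeline(units, operators, input_rows):
    """Run operators in order, placing each one just before it executes.

    The placement uses len(rows), the real cardinality of the intermediate
    result, so no compile-time estimate is needed.
    """
    placements = []
    rows = input_rows
    for name, operator in operators:
        unit = place(units, len(rows))
        placements.append((name, unit.name, len(rows)))
        rows = operator(rows)
    return rows, placements
```

With a unit that has high startup cost but low per-tuple cost (say, a GPU) and one with the opposite profile, a selective filter early in the pipeline can move the remaining operators to the cheaper-to-start unit once the real intermediate size is known.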
5	Plan Bouquets : An Exploratory Approach to Robust Query Processing Dutt, Anshuman January 2016 Over the last four decades, relational database systems, with their mathematical basis in first-order logic, have provided a congenial and efficient environment to handle enterprise data during its entire life cycle of generation, storage, maintenance and processing. An organic reason for their pervasive popularity is intrinsic support for declarative user queries, wherein the user only specifies the end objectives, and the system takes on the responsibility of identifying the most efficient means, called “plans”, to achieve these objectives. A crucial input to generating efficient query execution plans is the set of compile-time estimates of the data volumes that are output by the operators implementing the algebraic predicates present in the query. These volume estimates are typically computed using the “selectivities” of the predicates. Unfortunately, a pervasive problem encountered in practice is that these selectivities often differ significantly from the values actually encountered during query execution, leading to poor plan choices and grossly inflated response times. While the database research community has spent considerable effort to address the above challenge, the prior techniques all suffer from a systemic limitation: the inability to provide any guarantees on the execution performance. In this thesis, we materially address this long-standing open problem by developing a radically different query processing strategy that lends itself to attractive guarantees on run-time performance. Specifically, in our approach, the compile-time estimation process is completely eschewed for error-prone selectivities. Instead, from the set of optimal plans in the query’s selectivity error space, a limited subset called the “plan bouquet”, is selected such that at least one of the bouquet plans is 2-optimal at each location in the space. 
Then, at run time, an exploratory sequence of cost-budgeted executions from the plan bouquet is carried out, eventually finding a plan that executes to completion within its assigned budget. The duration and switching of these executions is controlled by a graded progression of isosurfaces projected onto the optimal performance profile. We prove that this construction provides viable guarantees on the worst-case performance relative to an oracular system that magically possesses accurate a priori knowledge of all selectivities. Moreover, it ensures repeatable execution strategies across different invocations of a query, an extremely desirable feature in industrial settings. Our second contribution is a suite of techniques that substantively improve on the performance guarantees offered by the basic bouquet algorithm. First, we present an algorithm that skips carefully chosen executions from the basic plan bouquet sequence, leveraging the observation that an expensive execution may provide better coverage as compared to a series of cheaper siblings, thereby reducing the aggregate exploratory overheads. Next, we explore randomized variants with regard to both the sequence of plan executions and the constitution of the plan bouquet, and show that the resulting guarantees are markedly superior, in expectation, to the corresponding worst-case values. From a deployment perspective, the above techniques are appealing since they are completely “black-box”, that is, non-invasive with regard to the database engine, implementable using only API features that are commonly available in modern systems. As a proof of concept, the bouquet approach has been fully prototyped in QUEST, a Java-based tool that provides a visual and interactive demonstration of the bouquet identification and execution phases. 
In similar spirit, we propose an efficient isosurface identification algorithm that avoids exploration of large portions of the error space and drastically reduces the effort involved in bouquet construction. The plan bouquet approach is ideally suited for “canned” query environments, where the computational investment in bouquet identification is amortized over multiple query invocations. The final contribution of this thesis is extending the advantage of compile-time sub-optimality guarantees to ad hoc query environments where the overheads of the off-line bouquet identification may turn out to be impractical. Specifically, we propose a completely revamped bouquet algorithm that constructs the cost-budgeted execution sequence in an “on-the-fly” manner. This is achieved through a “white-box” interaction style with the engine, whereby the plan output cardinalities exposed by the engine are used to compute lower bounds on the error-prone selectivities during plan executions. For this algorithm, the sub-optimality guarantees are in the form of a low-order polynomial of the number of error-prone selectivities in the query. The plan bouquet approach has been empirically evaluated on both PostgreSQL and a commercial engine ComOpt, over the TPC-H and TPC-DS benchmark environments. Our experimental results indicate that it delivers orders of magnitude improvements in the worst-case behavior, without impairing the average-case performance, as compared to the native optimizers of these systems. In absolute terms, the worst-case sub-optimality is upper bounded by 20 across the suite of queries, and the average performance is empirically found to be within a factor of 4 with respect to the optimal. Even with the on-the-fly bouquet algorithm, the guarantees are found to be within a factor of 3 as compared to those achievable in the corresponding canned query environment. 
Overall, the plan bouquet approach provides novel performance guarantees that open up exciting possibilities for robust query processing. Robust Query Processing Plan Bouquets Improve Robustness Bounds Plan Bouquet Architecture Query Processing Monadic Second Order Logic Plan Bouquet Approach Plan Bouquet Sequence Query Processing Computer Science
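The cost-budgeted execution sequence described in entry 5 can be sketched for the simplest case of a single error-prone selectivity. The sketch assumes plan costs that increase with selectivity, cost budgets that double from the cheapest to the most expensive optimal cost, and a discretised selectivity grid. The plan names, cost formulas and helper functions are hypothetical, and execution is simulated by evaluating the cost functions rather than running a real engine.

```python
def optimal(plans, selectivity):
    """Return (cost, plan name) of the cheapest plan at this selectivity."""
    return min((cost(selectivity), name) for name, cost in plans.items())


def build_bouquet(plans, grid):
    """Pick one plan per doubling cost budget (one-dimensional case).

    `grid` is an ascending list of selectivities; plan costs are assumed to
    be non-decreasing in selectivity.
    """
    c_min, _ = optimal(plans, grid[0])
    c_max, _ = optimal(plans, grid[-1])
    bouquet = []
    budget = c_min
    while True:
        # Largest selectivity whose optimal cost still fits in the budget.
        reachable = [s for s in grid if optimal(plans, s)[0] <= budget]
        _, plan = optimal(plans, reachable[-1])
        bouquet.append((budget, plan))
        if budget >= c_max:
            break
        budget *= 2
    return bouquet


def run_bouquet(plans, bouquet, actual_selectivity):
    """Simulate budgeted executions; return (completing plan, total cost spent)."""
    spent = 0.0
    for budget, plan in bouquet:
        cost = plans[plan](actual_selectivity)
        if cost <= budget:
            return plan, spent + cost
        spent += budget  # execution aborted once its budget is exhausted
    raise RuntimeError("actual selectivity lies outside the modelled range")


# Hypothetical plans with linear cost in the selectivity.
plans = {
    "index_nested_loop": lambda s: 10 + 10000 * s,
    "hash_join": lambda s: 500 + 1000 * s,
    "sort_merge": lambda s: 900 + 200 * s,
}
grid = [i / 1000 for i in range(1, 1001)]
bouquet = build_bouquet(plans, grid)
```

Under these assumptions the budgets form a geometric series, so the total cost of the aborted executions plus the completing one stays within a factor of 4 of the optimal cost at the actual selectivity, without the selectivity ever being estimated.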
