Global ETD Search

131	[en] QEEF: AN EXTENSIBLE QUERY EXECUTION ENGINE / [pt] QEEF: UMA MÁQUINA DE EXECUÇÃO DE CONSULTAS FAUSTO VERAS MARANHAO AYRES 30 June 2004 [pt] O processamento de consultas em Sistemas de Gerência de Banco de Dados tradicionais tem sido largamente estudado na literatura e utilizado comercialmente com enorme sucesso. Isso é devido, em parte, à eficiência das Máquinas de Execução de Consultas (MEC) no suporte ao modelo de execução tradicional. Porém, o surgimento de novos cenários de aplicação, principalmente em conseqüência do modelo computacional da web, motivou a pesquisa de novos modelos de execução, tais como: modelo adaptável e modelo contínuo, além da pesquisa de modelos de dados semi-estruturados, tal como o XML, ambos não suportados pelas MEC tradicionais. O objetivo desta tese consiste no desenvolvimento de uma MEC extensível frente a diferentes modelos de execução e de dados. Adicionalmente, esta proposta trata de maneira ortogonal o modelo de execução e o modelo de dados, o que permite a avaliação de planos de execução de consultas (PEC) com fragmentos em diferentes modelos. Utilizou-se a técnica de framework de software para a especificação da MEC extensível, produzindo o framework QEEF (Query Execution Engine Framework). A extensibilidade da solução reflete-se em um meta-modelo, denominado QUEM (QUery Execution Meta-model), capaz de exprimir diferentes modelos em um meta-PEC. O framework QEEF pré-processa um meta-PEC e produz um PEC final a ser avaliado pela MEC instanciada. Como parte da validação desta proposta, instanciou-se o QEEF para diferentes modelos de execução e de dados. / [en] Query processing in traditional Database Management Systems (DBMS) has been extensively studied in the literature and widely adopted in industry. This success is due in part to the performance of their Query Execution Engines (QEE) in supporting the traditional query execution model. 
The advent of new query scenarios, mainly due to the web computational model, has motivated research on new execution models, such as adaptive and continuous ones, and on semi-structured data models, such as XML, neither of which is natively supported by traditional query engines. This thesis proposes the development of a QEE that is extensible with respect to different execution and data models. To achieve this goal, we use a software design approach based on the framework technique to produce the Query Execution Engine Framework (QEEF). Moreover, we treat the execution model and the data model as orthogonal, which allows the evaluation of query execution plans (QEP) with fragments in different models. The extensibility of our solution is specified by an execution meta-model named QUEM (QUery Execution Meta-model), which is used to express different models in a meta-QEP. During query evaluation, the meta-QEP is pre-processed by the QEEF, producing a final QEP to be evaluated by the running QEE. The QEEF is instantiated for different execution and data models as part of the validation of this proposal. [pt] BANCO DE DADOS [en] DATABASE [pt] PROCESSAMENTO DE CONSULTAS [en] QUERY PROCESSING [pt] MAQUINA DE EXECUCAO DE CONSULTAS [en] QUERY EXECUTION ENGINE [pt] MODELO DE EXECUCAO DE CONSULTAS [en] QUERY EXECUTION MODEL [pt] MODELO DE DADOS SEMI-ESTRUTURADO [en] SEMI-STRUCTURED DATA MODEL [pt] FRAMEWORK DE SOFTWARE [en] SOFTWARE FRAMEWORK
132	Balancing Money and Time for OLAP Queries on Cloud Databases Sabih, Rafia January 2016 Enterprise Database Management Systems (DBMSs) have to contend with resource-intensive and time-varying workloads, making them well-suited candidates for migration to cloud platforms. Specifically, they can dynamically leverage resource elasticity while retaining affordability through the pay-as-you-go rental interface. The current design of database engine components emphasizes maximizing computing efficiency, but to fully capitalize on the cloud's benefits, the outlays of these computations also need to be factored into the planning exercise. In this thesis, we investigate this contemporary problem in the context of industrial-strength deployments of relational database systems on real-world cloud platforms. Specifically, we consider how the traditional metric used to compare query execution plans, namely response time, can be augmented to incorporate monetary costs in the decision process. The challenge here is that execution time and monetary cost are adversarial metrics, with a decrease in one entailing a rise in the other. For instance, a Virtual Machine (VM) with rich physical resources (RAM, cores, etc.) decreases the query response time, but is expensive with regard to rental rates. In a nutshell, there is a tradeoff between money and time, and our goal therefore is to identify the VM that offers the best tradeoff between these two competing considerations. In our study, we profile the behavior of money versus time for a given query, and define the best tradeoff as the "knee", that is, the location on the profile with the minimum Euclidean distance from the origin. To study the performance of industrial-strength database engines on real-world cloud infrastructure, we have deployed a commercial DBMS on Google cloud services. 
On this platform, we have carried out extensive experimentation with the TPC-DS decision-support benchmark, an industry-wide standard for evaluating database system performance. Our experiments demonstrate that the choice of VM for hosting the database server is a crucial decision, because: (i) variation in time and money across VMs is significant for a given query, and (ii) no single VM offers the best money-time tradeoff across all queries. To efficiently identify the VM with the best tradeoff from a large suite of available configurations, we propose a technique to characterize the money-time profile for a given query. The core of this technique is a VM pruning mechanism that exploits the fact that the VMs form a partially ordered set (poset) with respect to their resources. It processes the minimal and maximal VMs of this poset for estimated query response time. If the response times on these extreme VMs are similar, then all the VMs sandwiched between them are pruned from further consideration. Otherwise, the already processed VMs are set aside, and the minimal and maximal VMs of the remaining unprocessed VMs are evaluated for their response times. Finally, the knee VM is identified from the processed VMs as the one with the minimum Euclidean distance from the origin in the money-time space. We theoretically prove that this technique always identifies the knee VM; further, if it is acceptable to find a "near-optimal" knee by providing a relaxation factor on the response-time distance from the optimal knee, then it is also capable of finding a satisfactory knee more efficiently under these relaxed conditions. We propose two flavors of this approach: the first prunes the VMs using complete plan information received from the database engine API and is named Plan-based Identification of Knee (PIK). 
To further increase the efficiency of identifying the knee VM, we also propose a sub-plan-based pruning algorithm called Sub-Plan-based Identification of Knee (SPIK), which requires modifications to the query optimizer. We have evaluated PIK on a commercial system and found that it often requires processing only 20% of the total VMs. The efficiency of the algorithm increases significantly further when a 10-20% relaxation in response time is allowed. To evaluate SPIK, we prototyped it on an open-source engine (PostgreSQL 9.3) and also implemented it as a Java wrapper program with the commercial engine. Experimentally, the processing done by SPIK is found to be only 40% of that done by PIK. Therefore, from an overall perspective, this thesis facilitates the desired migration of enterprise databases to cloud platforms, by identifying the VM(s) that offer competitive tradeoffs between money and time for the given query. Database Management System (DBMS) Virtual Machine Google Cloud Services Cloud Platforms Cloud Databases Cloud Query Processing Model Plan-based Identification of Knee (PIK) Knee VM Computational and Data Sciences
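The knee definition in entry 132 (the point on the money-time profile closest to the origin) can be illustrated with a small sketch. This is not the PIK or SPIK algorithm and does no poset pruning; the function name `knee_vm`, the VM names, and the measurements are hypothetical, and the values are assumed to be normalized so that time and money are comparable.

```python
import math


def knee_vm(profile):
    """Return the VM whose (time, money) point lies closest to the origin.

    `profile` maps a VM name to a (response_time, monetary_cost) pair.
    Both values are assumed to be normalized to a common scale.
    """
    return min(profile, key=lambda vm: math.hypot(*profile[vm]))


# Hypothetical, already-normalized measurements for a single query.
profile = {
    "small": (0.90, 0.10),   # slow but cheap
    "medium": (0.40, 0.35),  # balanced
    "large": (0.15, 0.95),   # fast but expensive
}

knee = knee_vm(profile)
```

With these made-up numbers the balanced VM is selected, because its distance from the origin is smaller than that of either extreme.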
133	Suporte a consultas por similaridade unárias em SQL / Extending SQL to support unary similarity queries Mônica Ribeiro Porto Ferreira 15 February 2008 Os operadores convencionais para comparação de dados por igualdade e por relação de ordem total não são adequados para o gerenciamento de dados complexos como, por exemplo, os dados multimídia (imagens, áudio, textos longos), séries temporais e seqüências genéticas. Para comparar dados desses tipos, o grau de similaridade entre suas instâncias é, em geral, o fator mais importante sendo, portanto, indicado que as operações de consulta sejam realizadas utilizando os chamados operadores por similaridade. Existem operadores de busca por similaridade tanto unários quanto binários. Os operadores unários são utilizados para implementar operações de seleção, enquanto os operadores binários destinam-se a operações de junção. A álgebra relacional, usada nos Sistemas de Gerenciamento de Bases de Dados Relacionais, não provê suporte para expressar critérios de busca por similaridade. Para suprir esse suporte, está em desenvolvimento no Grupo de Bases de Dados e Imagens (GBdI-ICMC-USP) uma extensão à álgebra relacional que permite representar as consultas por similaridade em expressões algébricas. Esta dissertação incorpora-se nesse empreendimento, abordando o tratamento aos operadores unários por similaridade na álgebra, bem como a implementação do otimizador de consultas por similaridade no SIREN (Similarity Retrieval Engine) para que as consultas por similaridade possam ser respondidas pelos Sistemas de Gerenciamento de Bases de Dados relacionais. / Conventional operators for data comparison based on exact matching and total order relations are not appropriate for managing complex data, such as multimedia data (e.g. images, audio and large texts), time series and genetic sequences. 
In fact, the most important aspect when comparing complex data is usually the degree of similarity between instances, which leads to the use of similarity operators for search and retrieval operations. Similarity operators can be classified as unary or binary, used to implement selection operations and joins, respectively. However, the Relational Algebra, employed in Relational Database Management Systems (DBMS), does not provide resources to express similarity search criteria. To fill this gap, an extension to the Relational Algebra is under development at GBdI-ICMC-USP (Grupo de Bases de Dados e Imagens), aiming to represent similarity queries in algebraic expressions. This work contributes to that effort by dealing with unary similarity operators in Relational Algebra and by developing a similarity query optimizer for SIREN (Similarity Retrieval Engine), thereby allowing similarity queries to be answered by Relational DBMS. Álgebra por similaridade Consultas por similaridade Interpretação de consultas Método de acesso métrico Otimização de consulta Seleção por similaridade Metric access method Query optimization Query processing Similarity algebra Similarity queries Similarity selection
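To make the notion of a unary similarity operator in entry 133 concrete, here is a minimal sketch of the two selections commonly used in similarity search: range selection and k-nearest-neighbour selection. The abstract does not name these operators, and this is neither SIREN's SQL syntax nor its implementation. The function names are hypothetical, and a linear scan stands in for the metric access method a real engine would use.

```python
import heapq


def range_select(relation, center, radius, dist):
    """Similarity range selection: elements within `radius` of `center`."""
    return [x for x in relation if dist(x, center) <= radius]


def knn_select(relation, center, k, dist):
    """k-nearest-neighbour selection: the k elements closest to `center`."""
    return heapq.nsmallest(k, relation, key=lambda x: dist(x, center))


def l1(a, b):
    """Manhattan distance, used here only as an example metric."""
    return sum(abs(p - q) for p, q in zip(a, b))


# Hypothetical feature vectors extracted from complex objects.
points = [(0, 0), (1, 1), (2, 2), (5, 5)]

in_range = range_select(points, (0, 0), 2, l1)
nearest = knn_select(points, (5, 5), 2, l1)
```

Both functions take one relation and one query centre, which is what makes them unary; a similarity join would instead compare the elements of two relations.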
134	Consulta espacial preferencial por palavra-chave Almeida, João Paulo Dias de 17 December 2015 Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / With the popularity of devices that are able to annotate data with spatial information (latitude and longitude), the processing of spatial queries has received a lot of attention from the research community recently. In this dissertation, we study a new query type named Top-k Spatial Keyword Preference Query that selects objects of interest based on the textual relevance of other spatio-textual objects in their spatial neighborhood. This work introduces this new query type, presents three algorithms for processing the query efficiently and performs an experimental evaluation using real databases to study the performance of the proposed algorithms. / Com a popularidade de dispositivos capazes de anotar dados com coordenadas espaciais (latitude e longitude), o processamento de consultas espaciais tem recebido bastante atenção da comunidade científica recentemente. Esta dissertação apresenta uma nova consulta, chamada Consulta Espacial Preferencial por Palavra-chave, que seleciona objetos de interesse de acordo com a relevância textual de outros objetos espaço-textuais presentes na sua vizinhança espacial. Este trabalho introduz esta nova consulta, apresenta três algoritmos para processá-la de forma eficiente e avalia o desempenho dos algoritmos propostos através de um estudo experimental, utilizando bases de dados reais. 
Processamento de consultas Bases de dados espaciais Índices híbridos Consultas preferenciais Sistemas de informação Recuperação de informação Query processing Spatial databases Hybrid indexes Preference queries Information systems Information retrieval
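The query described in entry 134 can be pictured with a brute-force sketch. It assumes, as an illustration only, that an object's score is the highest textual relevance among the feature objects within a given radius, and that relevance is the fraction of query keywords found in a feature's text. Neither definition is taken from the dissertation, the names are hypothetical, and none of its three algorithms or hybrid indexes is reproduced here.

```python
import heapq
import math


def textual_relevance(text, keywords):
    """Fraction of query keywords that occur in the text (assumed measure)."""
    words = set(text.lower().split())
    return sum(k in words for k in keywords) / len(keywords)


def top_k_preference(objects, features, keywords, radius, k):
    """Rank objects of interest by the best-matching feature nearby."""

    def score(obj):
        _, (ox, oy) = obj
        nearby = [
            textual_relevance(text, keywords)
            for (fx, fy), text in features
            if math.hypot(fx - ox, fy - oy) <= radius
        ]
        return max(nearby, default=0.0)

    return [(obj[0], score(obj)) for obj in heapq.nlargest(k, objects, key=score)]


# Hypothetical data: hotels as objects of interest, restaurants as features.
hotels = [("A", (0, 0)), ("B", (10, 0)), ("C", (20, 0))]
restaurants = [
    ((1, 0), "italian pizza restaurant"),
    ((10, 1), "pizza place"),
    ((50, 50), "italian pizza"),
]

result = top_k_preference(hotels, restaurants, ["italian", "pizza"], radius=2, k=2)
```

Hotel A ranks first because a fully matching restaurant lies within its neighbourhood, while hotel C has no feature nearby and is left out of the top 2.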
135	Traitement de requêtes SPARQL sur des données liées / SPARQL distributed query processing over linked data Macina, Abdoul 17 December 2018 De plus en plus de sources de données liées sont publiées à travers le Web en s'appuyant sur les technologies du Web sémantique, formant ainsi un large réseau de données distribuées. Cependant il est difficile pour les consommateurs de données de profiter de la richesse de ces données, compte tenu de leur distribution, de l'augmentation de leur volume et de l'autonomie des sources de données. Les moteurs fédérateurs de données permettent d'interroger ces sources de données en utilisant des techniques de traitement de requêtes distribuées. Cependant, une mise en œuvre naïve de ces techniques peut générer un nombre considérable de requêtes distantes et de nombreux résultats intermédiaires entraînant ainsi un long temps de traitement des requêtes et des communications réseau coûteuses. Par ailleurs, la sémantique des requêtes distribuées est souvent ignorée. L'expressivité des requêtes, le partitionnement des données et leur réplication sont d'autres défis auxquels doivent faire face les moteurs de requêtes. Pour répondre à ces défis, nous avons d'abord proposé une sémantique des requêtes distribuées compatible avec les standards SPARQL et RDF qui préserve l’expressivité de SPARQL. Nous avons ensuite présenté plusieurs stratégies d'optimisation pour un moteur de requêtes fédérées qui interroge de manière transparente des sources de données distribuées. La performance de ces optimisations est évaluée sur une implémentation d’un moteur de requêtes distribuées SPARQL. / Driven by the Semantic Web standards, an increasing number of RDF data sources are published and connected over the Web by data providers, leading to a large distributed linked data network. 
However, exploiting the wealth of these data sources is very challenging for data consumers considering the data distribution, their volume growth and data sources autonomy. In the Linked Data context, federation engines allow querying these distributed data sources by relying on Distributed Query Processing (DQP) techniques. Nevertheless, a naive implementation of the DQP approach may generate a tremendous number of remote requests towards data sources and numerous intermediate results, thus leading to costly network communications. Furthermore, the distributed query semantics is often overlooked. Query expressiveness, data partitioning, and data replication are other challenges to be taken into account. To address these challenges, we first proposed in this thesis a SPARQL and RDF compliant Distributed Query Processing semantics which preserves the SPARQL language expressiveness. Afterwards, we presented several strategies for a federated query engine that transparently addresses distributed data sources, while managing data partitioning, query results completeness, data replication, and query processing performance. We implemented and evaluated our approach and optimization strategies in a federated query engine to prove their effectiveness. Web sémantique Web de données Données liées Données ouvertes liées Traitement de requêtes distribuées Évaluation des requêtes fédérées SPARQL Semantic Web Web of data Linked data Linked open data Distributed query processing Federated query evaluation SPARQL
136	Analytical Query Processing Based on Continuous Compression of Intermediates Damme, Patrick 02 October 2020 Nowadays, increasingly large amounts of data are being collected in numerous areas ranging from science to industry. To gain valuable insights from these data, the importance of Online Analytical Processing (OLAP) workloads is constantly growing. At the same time, the hardware landscape is continuously evolving. On the one hand, the increasing capacities of DRAM allow database systems to store their entire data in main memory. Furthermore, the performance of microprocessors has improved tremendously in recent years through the use of sophisticated hardware techniques, such as Single Instruction Multiple Data (SIMD) extensions promising hitherto unknown processing speeds. On the other hand, the main memory bandwidth has not increased proportionately, such that data access is now the main bottleneck for efficient data processing. To face these developments, in-memory column-stores have emerged as a new database architecture. These systems store each attribute of a relation separately in memory as a contiguous sequence of values. It is state-of-the-art to encode all values as integers and apply lossless lightweight integer compression to reduce the data size. This offers several advantages ranging from lower transfer times between RAM and CPU over a better utilization of the cache hierarchy to fast direct processing of compressed data. However, compression also incurs a certain computational overhead. State-of-the-art systems focus on the compression of base data. However, intermediate results generated during the execution of complex analytical queries can exceed the base data in number and total size. Since, in in-memory systems, accessing intermediates is as expensive as accessing base data, intermediates should be handled as efficiently as possible, too. 
While there are approaches trying to avoid intermediates whenever possible, we envision the orthogonal approach of efficiently representing intermediates using lightweight integer compression algorithms to reduce memory accesses. More precisely, our vision is a balanced query processing based on lightweight compression of intermediate results in in-memory column-stores. That is, all intermediates shall be represented using a suitable lightweight integer compression algorithm and processed by compression-enabled query operators to avoid a full decompression, whereby compression shall be used in a balanced way to ensure that its benefits outweigh its costs. In this thesis, we address all important aspects of this vision. We provide an extensive overview of existing lightweight integer compression algorithms and conduct a systematic experimental survey of several of these algorithms to gain a deep understanding of their behavior. We propose a novel compression-enabled processing model for in-memory column-stores allowing a continuous compression of intermediates. Additionally, we develop novel cost-based strategies for a compression-aware secondary query optimization to make effective use of our processing model. 
Our end-to-end evaluation using the famous Star Schema Benchmark shows that our envisioned compression of intermediates can improve both the memory footprint and the runtime of complex analytical queries significantly.
Contents:
1 Introduction 1.1 Contributions 1.2 Outline
2 Lightweight Integer Compression 2.1 Foundations 2.1.1 Disambiguation of Lightweight Integer Compression 2.1.2 Overview of Lightweight Integer Compression 2.1.3 State-of-the-Art in Lightweight Integer Compression 2.2 Experimental Survey 2.2.1 Related Work 2.2.2 Experimental Setup and Methodology 2.2.3 Evaluation of the Impact of the Data Characteristics 2.2.4 Evaluation of the Impact of the Hardware Characteristics 2.2.5 Evaluation of the Impact of the SIMD Extension 2.3 Summary and Discussion
3 Processing Compressed Intermediates 3.1 Processing Model for Compressed Intermediates 3.1.1 Related Work 3.1.2 Description of the Underlying Processing Model 3.1.3 Integration of Compression into Query Operators 3.1.4 Integration of Compression into the Overall Query Execution 3.1.5 Efficient Implementation 3.1.6 Evaluation 3.2 Direct Integer Morphing Algorithms 3.2.1 Related Work 3.2.2 Integer Morphing Algorithms 3.2.3 Example Algorithms 3.2.4 Evaluation 3.3 Summary and Discussion
4 Compression-Aware Query Optimization Strategies 4.1 Related Work 4.2 Compression-Aware Secondary Query Optimization 4.2.1 Compression-Level: Selecting a Suitable Algorithm 4.2.2 Operator-Level: Selecting Suitable Input/Output Formats 4.2.3 QEP-Level: Selecting Suitable Formats for All Involved Columns 4.3 Evaluation 4.3.1 Compression-Level: Selecting a Suitable Algorithm 4.3.2 Operator-Level: Selecting Suitable Input/Output Formats 4.3.3 Lessons Learned 4.4 Summary and Discussion
5 End-to-End Evaluation 5.1 Experimental Setup and Methodology 5.2 A Simple OLAP Query 5.3 Complex OLAP Queries: The Star Schema Benchmark 5.4 Summary and Discussion
6 Conclusion 6.1 Summary of this Thesis 6.2 Directions for Future Work
Bibliography
List of Figures
List of Tables
info:eu-repo/classification/ddc/004 ddc:004
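Entry 136 mentions processing compressed data directly, without full decompression. The following minimal sketch illustrates that idea with run-length encoding, chosen here only as a well-known lightweight integer scheme. Whether and how the thesis uses it is not stated in the abstract, and the function names are hypothetical.

```python
from itertools import groupby


def rle_compress(values):
    """Encode a column as (value, run_length) pairs."""
    return [(v, len(list(run))) for v, run in groupby(values)]


def rle_decompress(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [v for v, n in runs for _ in range(n)]


def rle_sum(runs):
    """Sum the column directly on the compressed form, one step per run."""
    return sum(v * n for v, n in runs)


# Hypothetical intermediate result of a query operator.
column = [7, 7, 7, 7, 3, 3, 9, 9, 9]
runs = rle_compress(column)
total = rle_sum(runs)
```

The aggregate touches three runs instead of nine values. Whether that saving outweighs the cost of compressing the intermediate is the kind of trade-off the abstract describes as balanced compression.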
137	On-line analytical processing in distributed data warehouses Lehner, Wolfgang, Albrecht, Jens 14 April 2022 The concepts of 'data warehousing' and 'on-line analytical processing' have attracted growing interest in the research and commercial product communities. Today, the trend moves away from complex centralized data warehouses to distributed data marts integrated in a common conceptual schema. However, as the first part of this paper demonstrates, there are many problems and few solutions for large distributed decision support systems in worldwide operating corporations. After showing the benefits and problems of the distributed approach, this paper outlines possibilities for achieving performance in distributed online analytical processing. Finally, the architectural framework of the prototypical distributed OLAP system CUBESTAR is outlined. info:eu-repo/classification/ddc/005 ddc:005
138	AL: Unified Analytics in Domain Specific Terms Luong, Johannes, Habich, Dirk, Lehner, Wolfgang 13 June 2022 Data-driven organizations gather information on various aspects of their endeavours and analyze that information to gain valuable insights or to increase automation. Today, these organizations can choose from a wealth of specialized analytical libraries and platforms to meet their functional and non-functional requirements. Indeed, many common application scenarios involve the combination of multiple such libraries and platforms in order to provide a holistic perspective. Due to the scattered landscape of specialized analytical tools, this integration can result in complex and hard-to-evolve applications. In addition, the necessary movement of data between tools and formats can introduce a serious performance penalty. In this article we present a unified programming environment for analytical applications. The environment includes AL, a programming language that combines concepts of various common analytical domains. Further, the environment also includes a flexible compilation system that uses a language-, domain-, and platform-independent program intermediate representation to separate high-level application logic and physical organisation. We provide a detailed introduction of AL, establish our program intermediate representation as a generally useful abstraction, and give a detailed explanation of the translation of AL programs into workloads for our experimental shared-memory processing engine. info:eu-repo/classification/ddc/004 ddc:004
139	Shrinked Data Marts Enabled for Negative Caching Lehner, Wolfgang, Thiele, Maik 15 June 2022 Data marts storing pre-aggregated data, prepared for further roll-ups, play an essential role in data warehouse environments and lead to significant performance gains in query evaluation. However, in order to ensure the completeness of query results on the data mart without accessing the underlying data warehouse, null values need to be stored explicitly; this process is denoted as negative caching. Such null values typically occur in multidimensional data sets, which are naturally very sparse. To our knowledge, there is no work on shrinking the null tuples in a multi-dimensional data set within ROLAP. For these tuples, we propose a lossless compression technique, leading to a dramatic reduction in the size of the data mart. Queries depending on null value information can be answered with 100% precision by partially inflating the shrunken data mart. We complement our analytical approach with an experimental evaluation using real and synthetic data sets, and demonstrate our results. info:eu-repo/classification/ddc/004 ddc:004
140	Optimistic Coarse-Grained Cache Semantics for Data Marts Lehner, Wolfgang, Thiele, Maik, Albrecht, Jens 15 June 2022 Data marts and caching are two closely related concepts in the domain of multi-dimensional data. Both store pre-computed data to provide fast response times for complex OLAP queries, and for both it must be guaranteed that every query can be completely processed. However, they differ greatly in their update behaviour, which we utilise to build a specific data mart extended by cache semantics. In this paper, we introduce a novel cache exploitation concept for data marts - coarse-grained caching - in which the containedness check for a multi-dimensional query is done through the comparison of the expected and the actual cardinalities. To this end, we subdivide the multi-dimensional data into coarse partitions, the so-called cubletets, which make it possible to specify the completeness criteria for incoming queries. We show that during query processing, the completeness check is done at no additional cost. info:eu-repo/classification/ddc/004 ddc:004
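The cardinality-based containedness check in entry 140 might be pictured as follows. This is only one reading of the abstract: it assumes that an expected tuple count is known for every coarse partition and that a query can be answered from the data mart when each partition it touches holds exactly that many tuples. The function name `is_complete`, the partition keys, and the data are hypothetical.

```python
def is_complete(query_partitions, expected, cached):
    """Check whether every partition a query touches is fully present.

    `expected` maps a partition key to the number of tuples it should hold.
    `cached` maps a partition key to the tuples actually stored.
    """
    return all(
        len(cached.get(part, ())) == expected[part] for part in query_partitions
    )


# Hypothetical coarse partitions keyed by quarter.
expected = {"2021-Q1": 3, "2021-Q2": 2}
cached = {
    "2021-Q1": [("jan", 10), ("feb", 12), ("mar", 9)],
    "2021-Q2": [("apr", 11)],  # one tuple is missing
}

q1_ok = is_complete(["2021-Q1"], expected, cached)
half_year_ok = is_complete(["2021-Q1", "2021-Q2"], expected, cached)
```

Comparing counts rather than tuple contents keeps the check cheap, which fits the abstract's claim that completeness is verified at no additional cost during query processing.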
