Spelling suggestions: "subject:"query processing"" "subject:"guery processing""
141 |
Consulta espacial preferencial por palavra-chaveAlmeida, Jo?o Paulo Dias de 17 December 2015 (has links)
Submitted by Luis Ricardo Andrade da Silva (lrasilva@uefs.br) on 2016-03-01T21:58:16Z
No. of bitstreams: 1
disserta??o.pdf: 1075417 bytes, checksum: 1ac0911a0f45578306a02c8eae7a090f (MD5) / Made available in DSpace on 2016-03-01T21:58:16Z (GMT). No. of bitstreams: 1
disserta??o.pdf: 1075417 bytes, checksum: 1ac0911a0f45578306a02c8eae7a090f (MD5)
Previous issue date: 2015-12-17 / Coordena??o de Aperfei?oamento de Pessoal de N?vel Superior - CAPES / With the popularity of devices that are able to annotate data with spatial information (latitude and longitude), the processing of spatial queries has received a lot of attention from the research community recently. In this dissertation, we study a new query type named Top-k Spatial Keyword Preference Query that selects objects of interest based on the textual relevance of other spatio-textual objects in their spatial neighborhood. This work introduces this new query type, presents three algorithms for processing the query efficiently and performs an experimental evaluation using real databases to study the performance of the proposed algorithms. / Com a popularidade de dispositivos capazes de anotar dados com coordenadas espaciais (latitude e longitude), o processamento de consultas espaciais tem recebido bastante aten??o da comunidade cient?fica recentemente. Esta disserta??o apresenta uma nova consulta, chamada Consulta Espacial Preferencial por Palavra-chave, que seleciona objetos de interesse de acordo com a relev?ncia textual de outros objetos espa?o-textuais presentes na sua vizinhan?a espacial. Este trabalho introduz esta nova consulta, apresenta tr?s algoritmos para process?-la de forma eficiente e avalia o desempenho dos algoritmos propostos atrav?s de um estudo experimental, utilizando bases de dados reais.
|
142 |
Traitement de requêtes SPARQL sur des données liées / SPARQL distributed query processing over linked dataMacina, Abdoul 17 December 2018 (has links)
De plus en plus de sources de données liées sont publiées à travers le Web en s'appuyant sur les technologies du Web sémantique, formant ainsi un large réseau de données distribuées. Cependant il est difficile pour les consommateurs de données de profiter de la richesse de ces données, compte tenu de leur distribution, de l'augmentation de leur volume et de l'autonomie des sources de données. Les moteurs fédérateurs de données permettent d'interroger ces sources de données en utilisant des techniques de traitement de requêtes distribuées. Cependant, une mise en œuvre naïve de ces techniques peut générer un nombre considérable de requêtes distantes et de nombreux résultats intermédiaires entraînant ainsi un long temps de traitement des requêtes et des communications réseau coûteuse. Par ailleurs, la sémantique des requêtes distribuées est souvent ignorée. L'expressivité des requêtes, le partitionnement des données et leur réplication sont d'autres défis auxquels doivent faire face les moteurs de requêtes. Pour répondre à ces défis, nous avons d'abord proposé une sémantique des requêtes distribuées compatible avec les standards SPARQL et RDF qui préserve l’expressivité de SPARQL. Nous avons ensuite présenté plusieurs stratégies d'optimisation pour un moteur de requêtes fédérées qui interroge de manière transparente des sources de données distribuées. La performance de ces optimisations est évaluée sur une implémentation d’un moteur de requêtes distribuées SPARQL / Driven by the Semantic Web standards, an increasing number of RDF data sources are published and connected over the Web by data providers, leading to a large distributed linked data network. However, exploiting the wealth of these data sources is very challenging for data consumers considering the data distribution, their volume growth and data sources autonomy. In the Linked Data context, federation engines allow querying these distributed data sources by relying on Distributed Query Processing (DQP) techniques. Nevertheless, a naive implementation of the DQP approach may generate a tremendous number of remote requests towards data sources and numerous intermediate results, thus leading to costly network communications. Furthermore, the distributed query semantics is often overlooked. Query expressiveness, data partitioning, and data replication are other challenges to be taken into account. To address these challenges, we first proposed in this thesis a SPARQL and RDF compliant Distributed Query Processing semantics which preserves the SPARQL language expressiveness. Afterwards, we presented several strategies for a federated query engine that transparently addresses distributed data sources, while managing data partitioning, query results completeness, data replication, and query processing performance. We implemented and evaluated our approach and optimization strategies in a federated query engine to prove their effectiveness.
|
143 |
Analytical Query Processing Based on Continuous Compression of IntermediatesDamme, Patrick 02 October 2020 (has links)
Nowadays, increasingly large amounts of data are being collected in numerous areas ranging from science to industry. To gain valueable insights from these data, the importance of Online Analytical Processing (OLAP) workloads is constantly growing. At the same time, the hardware landscape is continuously evolving. On the one hand, the increasing capacities of DRAM allow database systems to store their entire data in main memory. Furthermore, the performance of microprocessors has improved tremendously in recent years through the use of sophisticated hardware techniques, such as Single Instruction Multiple Data (SIMD) extensions promising hitherto unknown processing speeds. On the other hand, the main memory bandwidth has not increased proportionately, such that the data access is now the main bottleneck for an efficient data processing.
To face these developments, in-memory column-stores have emerged as a new database architecture. These systems store each attribute of a relation separately in memory as a contiguous sequence of values. It is state-of-the-art to encode all values as integers and apply lossless lightweight integer compression to reduce the data size. This offers several advantages ranging from lower transfer times between RAM and CPU over a better utilization of the cache hierarchy to fast direct processing of compressed data. However, compression also incurs a certain computational overhead. State-of-the-art systems focus on the compression of base data. However, intermediate results generated during the execution of complex analytical queries can exceed the base data in number and total size. Since in in-memory systems, accessing intermediates is as expensive as accessing base data, intermediates should be handled as efficiently as possible, too.
While there are approaches trying to avoid intermediates whenever it is possible, we envision the orthogonal approach of efficiently representing intermediates using lightweight integer compression algorithms to reduce memory accesses. More precisely, our vision is a balanced query processing based on lightweight compression of intermediate results in in-memory column-stores. That means, all intermediates shall be represented using a suitable lightweight integer compression algorithm and processed by compression-enabled query operators to avoid a full decompression, whereby compression shall be used in a balanced way to ensure that its benefits outweigh its costs.
In this thesis, we address all important aspects of this vision. We provide an extensive overview of existing lightweight integer compression algorithms and conduct a systematical experimental survey of several of these algorithms to gain a deep understanding of their behavior. We propose a novel compression-enabled processing model for in-memory column-stores allowing a continuous compression of intermediates. Additionally, we develop novel cost-based strategies for a compression-aware secondary query optimization to make effective use of our processing model. Our end-to-end evaluation using the famous Star Schema Benchmark shows that our envisioned compression of intermediates can improve both the memory footprint and the runtime of complex analytical queries significantly.:1 Introduction
1.1 Contributions
1.2 Outline
2 Lightweight Integer Compression
2.1 Foundations
2.1.1 Disambiguation of Lightweight Integer Compression
2.1.2 Overview of Lightweight Integer Compression
2.1.3 State-of-the-Art in Lightweight Integer Compression
2.2 Experimental Survey
2.2.1 Related Work
2.2.2 Experimental Setup and Methodology
2.2.3 Evaluation of the Impact of the Data Characteristics
2.2.4 Evaluation of the Impact of the Hardware Characteristics
2.2.5 Evaluation of the Impact of the SIMD Extension
2.3 Summary and Discussion
3 Processing Compressed Intermediates
3.1 Processing Model for Compressed Intermediates
3.1.1 Related Work
3.1.2 Description of the Underlying Processing Model
3.1.3 Integration of Compression into Query Operators
3.1.4 Integration of Compression into the Overall Query Execution
3.1.5 Efficient Implementation
3.1.6 Evaluation
3.2 Direct Integer Morphing Algorithms
3.2.1 Related Work
3.2.2 Integer Morphing Algorithms
3.2.3 Example Algorithms
3.2.4 Evaluation
3.3 Summary and Discussion
4 Compression-Aware Query Optimization Strategies
4.1 Related Work
4.2 Compression-Aware Secondary Query Optimization
4.2.1 Compression-Level: Selecting a Suitable Algorithm
4.2.2 Operator-Level: Selecting Suitable Input/Output Formats
4.2.3 QEP-Level: Selecting Suitable Formats for All Involved Columns
4.3 Evaluation
4.3.1 Compression-Level: Selecting a Suitable Algorithm
4.3.2 Operator-Level: Selecting Suitable Input/Output Formats
4.3.3 Lessons Learned
4.4 Summary and Discussion
5 End-to-End Evaluation
5.1 Experimental Setup and Methodology
5.2 A Simple OLAP Query
5.3 Complex OLAP Queries: The Star Schema Benchmark
5.4 Summary and Discussion
6 Conclusion
6.1 Summary of this Thesis
6.2 Directions for Future Work
Bibliography
List of Figures
List of Tables
|
144 |
On-line analytical processing in distributed data warehousesLehner, Wolfgang, Albrecht, Jens 14 April 2022 (has links)
The concepts of 'data warehousing' and 'on-line analytical processing' have seen a growing interest in the research and commercial product community. Today, the trend moves away from complex centralized data warehouses to distributed data marts integrated in a common conceptual schema. However, as the first part of this paper demonstrates, there are many problems and little solutions for large distributed decision support systems in worldwide operating corporations. After showing the benefits and problems of the distributed approach, this paper outlines possibilities for achieving performance in distributed online analytical processing. Finally, the architectural framework of the prototypical distributed OLAP system CUBESTAR is outlined.
|
145 |
fAST Refresh using Mass Query OptimizationLehner, Wolfgang, Cochrane, Bobbie, Pirahesh, Hamid, Zaharioudakis, Markos 02 June 2022 (has links)
Automatic summary tables (ASTs), more commonly known as materialized views, are widely used to enhance query performance, particularly for aggregate queries. Such queries access a huge number of rows to retrieve aggregated summary data while performing multiple joins in the context of a typical data warehouse star schema. To keep ASTs consistent with their underlying base data, the ASTs are either immediately synchronized or fully recomputed. This paper proposes an optimization strategy for simultaneously refreshing multiple ASTs, thus avoiding multiple scans of a large fact table (one pass for AST computation). A query stacking strategy detects common sub-expressions using the available query matching technology of DB2. Since exact common sub-expressions are rare, the novel query sharing approach systematically generates common subexpressions for a given set of 'related' queries, considering different predicates, grouping expressions, and sets of base tables. The theoretical framework, a prototype implementation of both strategies in the IBM DB2 UDB/UWO database system, and performance evaluations based on the TPC/R data schema are presented in this paper.
|
146 |
AL: Unified Analytics in Domain Specific TermsLuong, Johannes, Habich, Dirk, Lehner, Wolfgang 13 June 2022 (has links)
Data driven organizations gather information on various aspects of their endeavours and analyze that information to gain valuable insights or to increase automatization. Today, these organizations can choose from a wealth of specialized analytical libraries and platforms to meet their functional and non-functional requirements. Indeed, many common application scenarios involve the combination of multiple such libraries and platforms in order to provide a holistic perspective. Due to the scattered landscape of specialized analytical tools, this integration can result in complex and hard to evolve applications. In addition, the necessary movement of data between tools and formats can introduce a serious performance penalty. In this article we present a unified programming environment for analytical applications. The environment includes AL, a programming language that combines concepts of various common analytical domains. Further, the environment also includes a flexible compilation system that uses a language-, domain-, and platform independent program intermediate representation to separate high level application logic and physical organisation. We provide a detailed introduction of AL, establish our program intermediate representation as a generally useful abstraction, and give a detailed explanation of the translation of AL programs into workloads for our experimental shared-memory processing engine.
|
147 |
Shrinked Data Marts Enabled for Negative CachingLehner, Wolfgang, Thiele, Maik 15 June 2022 (has links)
Data marts storing pre-aggregated data, prepared for further roll-ups, play an essential role in data warehouse environments and lead to significant performance gains in the query evaluation. However, in order to ensure the completeness of query results on the data mart without to access the underlying data warehouse, null values need to be stored explicitly; this process is denoted as negative caching. Such null values typically occur in multidimensional data sets, which are naturally very sparse. To our knowledge, there is no work on shrinking the null tuples in a multi-dimensional data set within ROLAP. For these tuples, we propose a lossless compression technique, leading to a dramatic reduction in size of the data mart. Queries depending on null value information can be answered with 100% precision by partially inflating the shrunken data mart. We complement our analytical approach with an experimental evaluation using real and synthetic data sets, and demonstrate our results.
|
148 |
Optimistic Coarse-Grained Cache Semantics for Data MartsLehner, Wolfgang, Thiele, Maik, Albrecht, Jens 15 June 2022 (has links)
Data marts and caching are two closely related concepts in the domain of multi-dimensional data. Both store pre-computed data to provide fast response times for complex OLAP queries, and for both it must be guaranteed that every query can be completely processed. However, they differ extremely in their update behaviour which we utilise to build a specific data mart extended by cache semantics. In this paper, we introduce a novel cache exploitation concept for data marts - coarse-grained caching - in which the containedness check for a multi-dimensional query is done through the comparison of the expected and the actual cardinalities. Therefore, we subdivide the multi-dimensional data into coarse partitions, the so called cubletets, which allow to specify the completeness criteria for incoming queries. We show that during query processing, the completeness check is done with no additional costs.
|
149 |
Conjunctive Queries with Inequalities Under UpdatesIdris, Muhammad, Ugarte, Martín, Vansummeren, Stijn, Voigt, Hannes, Lehner, Wolfgang 15 June 2022 (has links)
Modern application domains such as Composite Event Recognition (CER) and real-time Analytics require the ability to dynamically refresh query results under high update rates. Traditional approaches to this problem are based either on the materialization of subresults (to avoid their recomputation) or on the recomputation of subresults (to avoid the space overhead of materialization). Both techniques have recently been shown suboptimal: instead of materializing results and subresults, one can maintain a data structure that supports efficient maintenance under updates and can quickly enumerate the full query output, as well as the changes produced under single updates. Unfortunately, these data structures have been developed only for aggregate-join queries composed of equi-joins, limiting their applicability in domains such as CER where temporal joins are commonplace. In this paper, we present a new approach for dynamically evaluating queries with multi-way θ-joins under updates that is effective in avoiding both materialization and recomputation of results, while supporting a wide range of applications. To do this we generalize Dynamic Yannakakis, an algorithm for dynamically processing acyclic equi-join queries. In tandem, and of independent interest, we generalize the notions of acyclicity and free-connexity to arbitrary θ-joins. We instantiate our framework to the case where θ-joins are only composed of equalities and inequalities (<, ≤, =, >, ≥) and experimentally compare this algorithm, called IEDyn, to state of the art CER systems as well as incremental view maintenance engines. IEDyn performs consistently better than the competitor systems with up to two orders of magnitude improvements in both time and memory consumption.
|
150 |
Efficient Query Processing for Dynamically Changing DatasetsIdris, Muhammad, Ugarte, Martín, Vansummeren, Stijn, Voigt, Hannes, Lehner, Wolfgang 11 August 2022 (has links)
The ability to efficiently analyze changing data is a key requirement of many real-time analytics applications. Traditional approaches to this problem were developed around the notion of Incremental View Maintenance (IVM), and are based either on the materialization of subresults (to avoid their recomputation) or on the recomputation of subresults (to avoid the space overhead of materialization). Both techniques are suboptimal: instead of materializing results and subresults, one may also maintain a data structure that supports efficient maintenance under updates and from which the full query result can quickly be enumerated. In two previous articles, we have presented algorithms for dynamically evaluating queries that are easy to implement, efficient, and can be naturally extended to evaluate queries from a wide range of application domains. In this paper, we discuss our algorithm and its complexity, explaining the main components behind its efficiency. Finally, we show experiments that compare our algorithm to a state-of-the-art (Higher-order) IVM engine, as well as to a prominent complex event recognition engine. Our approach outperforms the competitor systems by up to two orders of magnitude in processing time, and one order in memory consumption.
|
Page generated in 0.0946 seconds