Global ETD Search

1	CROSS-DB: a feature-extended multidimensional data model for statistical and scientific databases Lehner, Wolfgang, Ruf, Thomas, Teschke, Michael 13 September 2022 (has links) Statistical and scientific computing applications exhibit characteristics that are fundamentally different from classical database system application domains. The CROSS-DB data model presented in this paper is optimized for use in such applications by providing advanced data modelling methods and application-oriented query facilities, thus providing a framework for optimized data management procedures. CROSS-DB (which stands for Classification-oriented, Redundancy-based Optimization of Statistical and Scientific DataBases) is based on a multidimensional data view. The model differs from other approaches by offering two complementary mechanisms for structuring qualifying information: classification and feature description. Using these mechanisms results in a normalized, low-dimensional database schema which ensures both modelling uniqueness and understandability while providing enhanced modelling flexibility. info:eu-repo/classification/ddc/004 ddc:004
2	Cardinality Estimation with Local Deep Learning Models Woltmann, Lucas, Hartmann, Claudio, Thiele, Maik, Habich, Dirk, Lehner, Wolfgang 14 June 2022 (has links) Cardinality estimation is a fundamental task in database query processing and optimization. Unfortunately, the accuracy of traditional estimation techniques is poor, resulting in non-optimal query execution plans. With the recent expansion of machine learning into the field of data management, there is the general notion that data analysis, especially neural networks, can lead to better estimation accuracy. Up to now, all proposed neural network approaches for cardinality estimation follow a global approach considering the whole database schema at once. These global models are prone to sparse data at training, leading to misestimates for queries which were not represented in the sample space used for generating training queries. To overcome this issue, we introduce a novel local-oriented approach in this paper, where the local context is a specific sub-part of the schema. As we will show, this leads to better representation of data correlation and thus better estimation accuracy. Compared to global approaches, our novel approach achieves an improvement by two orders of magnitude in accuracy and by a factor of four in training time performance for local models. info:eu-repo/classification/ddc/004 ddc:004
3	Sample synopses for approximate answering of group-by queries Lehner, Wolfgang, Rösch, Philipp 22 April 2022 (has links) With the amount of data in current data warehouse databases growing steadily, random sampling is continuously gaining in importance. In particular, interactive analyses of large datasets can greatly benefit from the significantly shorter response times of approximate query processing. Typically, those analytical queries partition the data into groups and aggregate the values within the groups. Further, with the commonly used roll-up and drill-down operations a broad range of group-by queries is posed to the system, which makes the construction of highly specialized synopses difficult. In this paper, we propose a general-purpose sampling scheme that is biased in order to answer group-by queries with high accuracy. While existing techniques focus on the size of the group when computing its sample size, our technique is based on its standard deviation. The basic idea is that the more homogeneous a group is, the fewer representatives are required in order to give a good estimate. With an extensive set of experiments, we show that our approach reduces both the estimation error and the construction cost compared to existing techniques. info:eu-repo/classification/ddc/004 ddc:004
4	Derby/S: A DBMS for Sample-Based Query Answering Klein, Anja, Gemulla, Rainer, Rösch, Philipp, Lehner, Wolfgang 10 November 2022 (has links) Although approximate query processing is a prominent way to cope with the requirements of data analysis applications, current database systems do not provide integrated and comprehensive support for these techniques. To improve this situation, we propose an SQL extension, called SQL/S, for approximate query answering using random samples, and present a prototypical implementation, called Derby/S, within the engine of the open-source database system Derby. Our approach significantly reduces the required expert knowledge by enabling the definition of samples in a declarative way; the choice of the specific sampling scheme and its parametrization is left to the system. SQL/S introduces new DDL commands to easily define and administer random samples subject to a given set of optimization criteria. Derby/S automatically takes care of sample maintenance if the underlying dataset changes. Finally, samples are transparently used during query processing, and error bounds are provided. Our extensions do not affect traditional queries and provide the means to integrate sampling as a first-class citizen into a DBMS. info:eu-repo/classification/ddc/004 ddc:004
5	Robust Real-time Query Processing with QStream Schmidt, Sven, Legler, Thomas, Schär, Sebastian, Lehner, Wolfgang 08 August 2023 (has links) Processing data streams with Quality-of-Service (QoS) guarantees is an emerging area in existing streaming applications. Although it is possible to negotiate the result quality and to reserve the required processing resources in advance, it remains a challenge to adapt the DSMS to data stream characteristics which are not known in advance or are difficult to obtain. Within this paper we present the second generation of our QStream DSMS which addresses the above challenge by using a real-time capable operating system environment for resource reservation and by applying an adaptation mechanism if the data stream characteristics change spontaneously. info:eu-repo/classification/ddc/004 ddc:004
6	Query optimization by using derivability in a data warehouse environment Albrecht, Jens, Hümmer, Wolfgang, Lehner, Wolfgang, Schlesinger, Lutz 10 January 2023 (has links) Materialized summary tables and cached query results are frequently used for the optimization of aggregate queries in a data warehouse. Query rewriting techniques are incorporated into database systems to use those materialized views and thus avoid accessing the possibly huge raw data. A rewriting is only possible if the query is derivable from these views. Several approaches can be found in the literature to check the derivability and find query rewritings. The specific application scenario of a data warehouse with its multidimensional perspective allows the consideration of much more semantic information, e.g. structural dependencies within the dimension hierarchies and different characteristics of measures. The motivation of this article is to use this information to present conditions for derivability in a large number of relevant cases which go beyond previous approaches. info:eu-repo/classification/ddc/004 ddc:004
7	AL: Unified Analytics in Domain Specific Terms Luong, Johannes, Habich, Dirk, Lehner, Wolfgang 13 June 2022 (has links) Data-driven organizations gather information on various aspects of their endeavours and analyze that information to gain valuable insights or to increase automatization. Today, these organizations can choose from a wealth of specialized analytical libraries and platforms to meet their functional and non-functional requirements. Indeed, many common application scenarios involve the combination of multiple such libraries and platforms in order to provide a holistic perspective. Due to the scattered landscape of specialized analytical tools, this integration can result in complex and hard-to-evolve applications. In addition, the necessary movement of data between tools and formats can introduce a serious performance penalty. In this article we present a unified programming environment for analytical applications. The environment includes AL, a programming language that combines concepts of various common analytical domains. Further, the environment also includes a flexible compilation system that uses a language-, domain-, and platform-independent program intermediate representation to separate high-level application logic and physical organisation. We provide a detailed introduction of AL, establish our program intermediate representation as a generally useful abstraction, and give a detailed explanation of the translation of AL programs into workloads for our experimental shared-memory processing engine. info:eu-repo/classification/ddc/004 ddc:004
8	Conjunctive Queries with Inequalities Under Updates Idris, Muhammad, Ugarte, Martín, Vansummeren, Stijn, Voigt, Hannes, Lehner, Wolfgang 15 June 2022 (has links) Modern application domains such as Composite Event Recognition (CER) and real-time analytics require the ability to dynamically refresh query results under high update rates. Traditional approaches to this problem are based either on the materialization of subresults (to avoid their recomputation) or on the recomputation of subresults (to avoid the space overhead of materialization). Both techniques have recently been shown to be suboptimal: instead of materializing results and subresults, one can maintain a data structure that supports efficient maintenance under updates and can quickly enumerate the full query output, as well as the changes produced under single updates. Unfortunately, these data structures have been developed only for aggregate-join queries composed of equi-joins, limiting their applicability in domains such as CER where temporal joins are commonplace. In this paper, we present a new approach for dynamically evaluating queries with multi-way θ-joins under updates that is effective in avoiding both materialization and recomputation of results, while supporting a wide range of applications. To do this we generalize Dynamic Yannakakis, an algorithm for dynamically processing acyclic equi-join queries. In tandem, and of independent interest, we generalize the notions of acyclicity and free-connexity to arbitrary θ-joins. We instantiate our framework to the case where θ-joins are only composed of equalities and inequalities (<, ≤, =, >, ≥) and experimentally compare this algorithm, called IEDyn, to state-of-the-art CER systems as well as incremental view maintenance engines. IEDyn performs consistently better than the competitor systems with up to two orders of magnitude improvements in both time and memory consumption. info:eu-repo/classification/ddc/004 ddc:004

Search results