Global ETD Search

1	Sample synopses for approximate answering of group-by queries Lehner, Wolfgang, Rösch, Philipp 22 April 2022 (has links) With the amount of data in current data warehouse databases growing steadily, random sampling is continuously gaining in importance. In particular, interactive analyses of large datasets can greatly benefit from the significantly shorter response times of approximate query processing. Typically, those analytical queries partition the data into groups and aggregate the values within the groups. Further, with the commonly used roll-up and drill-down operations a broad range of group-by queries is posed to the system, which makes the construction of highly-specialized synopses difficult. In this paper, we propose a general-purpose sampling scheme that is biased in order to answer group-by queries with high accuracy. While existing techniques focus on the size of the group when computing its sample size, our technique is based on its standard deviation. The basic idea is that the more homogeneous a group is, the less representatives are required in order to give a good estimate. With an extensive set of experiments, we show that our approach reduces both the estimation error and the construction cost compared to existing techniques. info:eu-repo/classification/ddc/004 ddc:004
2	Data Management Support for Notification Services Lehner, Wolfgang 17 July 2023 (has links) Database management systems are highly specialized to efficiently organize and process huge amounts of data in a transactional manner. During the last years, however, database management systems have been evolving as a central hub for the integration of mostly heterogeneous and autonomous data sources to provide homogenized data access. The next step in pushing database technology forward to play the role of an information marketplace is to actively notify registered users about incoming messages or changes in the underlying data set. Therefore, notification services may be seen as a generic term for subscription systems or, more general, data stream systems which both enable processing of standing queries over transient data. This article gives a comprehensive introduction into the context of notification services by outlining their differences to the classical query/response-based communication pattern, it illustrates potential application areas, and it discusses requirements addressing the underlying data management support. In more depth, this article describes the core concepts of the PubScribe project thereby choosing three different perspectives. From a first perspective, the subscription process and its mapping onto the primitive publish/subscribe communication pattern is explained. The second part focuses on a hybrid subscription data model by describing the basic constructs from a structural as well as an operational point of view. Finally, the PubScribe notification service project is characterized by a storage and processing model based on relational database technology. To summarize, this contribution introduces the idea of notification services from an application point of view by inverting the database approach and dealing with persistent queries and transient data. Moreover, the article provides an insight into database technology, which must be exploited and adopted to provide a solid base for a scalable notification infrastructure, using the PubScribe project as an example. info:eu-repo/classification/ddc/004 ddc:004
3	DrillBeyond: Processing Multi-Result Open World SQL Queries Eberius, Julian, Thiele, Maik, Braunschweig, Katrin, Lehner, Wolfgang 11 July 2022 (has links) In a traditional relational database management system, queries can only be defined over attributes defined in the schema, but are guaranteed to give single, definitive answer structured exactly as specified in the query. In contrast, an information retrieval system allows the user to pose queries without knowledge of a schema, but the result will be a top-k list of possible answers, with no guarantees about the structure or content of the retrieved documents. In this paper, we present DrillBeyond, a novel IR/RDBMS hybrid system, in which the user seamlessly queries a relational database together with a large corpus of tables extracted from a web crawl. The system allows full SQL queries over the relational database, but additionally allows the user to use arbitrary additional attributes in the query that need not to be defined in the schema. The system then processes this semi-specified query by computing a top-k list of possible query evaluations, each based on different candidate web data sources, thus mixing properties of RDBMS and IR systems. We design a novel plan operator that encapsulates a web data retrieval and matching system and allows direct integration of such systems into relational query processing. We then present methods for efficiently processing multiple variants of a query, by producing plans that are optimized for large invariant intermediate results that can be reused between multiple query evaluations. We demonstrate the viability of the operator and our optimization strategies by implementing them in PostgreSQL and evaluating on a standard benchmark by adding arbitrary attributes to its queries. info:eu-repo/classification/ddc/004 ddc:004

1

Page generated in 0.0967 seconds