111

Derby/S: A DBMS for Sample-Based Query Answering

Klein, Anja, Gemulla, Rainer, Rösch, Philipp, Lehner, Wolfgang 10 November 2022 (has links)
Although approximate query processing is a prominent way to cope with the requirements of data analysis applications, current database systems do not provide integrated and comprehensive support for these techniques. To improve this situation, we propose an SQL extension---called SQL/S---for approximate query answering using random samples, and present a prototypical implementation within the engine of the open-source database system Derby---called Derby/S. Our approach significantly reduces the required expert knowledge by enabling the definition of samples in a declarative way; the choice of the specific sampling scheme and its parametrization is left to the system. SQL/S introduces new DDL commands to easily define and administrate random samples subject to a given set of optimization criteria. Derby/S automatically takes care of sample maintenance if the underlying dataset changes. Finally, samples are transparently used during query processing, and error bounds are provided. Our extensions do not affect traditional queries and provide the means to integrate sampling as a first-class citizen into a DBMS.
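The exact SQL/S syntax is not given in the abstract, but the mechanism it relies on (answering an aggregate query from a Bernoulli sample and attaching an error bound) can be sketched in a few lines of Python. The sampling rate, data, and 95% confidence level below are assumptions made for illustration, not Derby/S specifics.

```python
import math
import random

def bernoulli_sample(rows, q, seed=42):
    """Include each row independently with probability q (Bernoulli sampling)."""
    rng = random.Random(seed)
    return [r for r in rows if rng.random() < q]

def approximate_sum(sample, q):
    """Horvitz-Thompson estimate of SUM(x) from a Bernoulli(q) sample,
    with an approximate 95% confidence interval (normal approximation)."""
    estimate = sum(x / q for x in sample)
    variance = sum(x * x * (1 - q) / (q * q) for x in sample)
    return estimate, 1.96 * math.sqrt(variance)

# Illustrative data: 100,000 "revenue" values, sampled at 1%.
rng = random.Random(7)
revenue = [rng.uniform(10, 500) for _ in range(100_000)]
sample = bernoulli_sample(revenue, q=0.01)
estimate, error = approximate_sum(sample, q=0.01)
print(f"estimated SUM = {estimate:,.0f} +/- {error:,.0f} (exact: {sum(revenue):,.0f})")
```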
112

Adaptive work placement for query processing on heterogeneous computing resources

Karnagel, Thomas, Habich, Dirk, Lehner, Wolfgang 10 November 2022 (has links)
The hardware landscape is currently changing from homogeneous multi-core systems towards heterogeneous systems with many different computing units, each with its own characteristics. This trend is a great opportunity for database systems to increase overall performance, provided the heterogeneous resources can be utilized efficiently. To achieve this, the main challenge is to place the right work on the right computing unit. Current approaches tackling this placement for query processing assume that data cardinalities of intermediate results can be correctly estimated. However, this assumption does not hold for complex queries. To overcome this problem, we propose an adaptive placement approach that is independent of cardinality estimation of intermediate results. Our approach is incorporated into a novel adaptive placement sequence. Additionally, we implement our approach as an extensible virtualization layer to demonstrate its broad applicability across multiple database systems. In our evaluation, we clearly show that our approach significantly improves OLAP query processing on heterogeneous hardware while being adaptive enough to react to changing cardinalities of intermediate query results.
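As a rough illustration of placement that reacts to observed rather than estimated cardinalities, the following sketch decides, right before each operator runs, which device to use based on the actual size of its input. The devices, cost numbers, and toy operators are invented for the example; this is not the paper's actual placement algorithm.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Device:
    name: str
    per_tuple_cost: float   # illustrative processing cost per input tuple
    transfer_cost: float    # illustrative fixed cost for moving data to the device

CPU = Device("CPU", per_tuple_cost=1.0, transfer_cost=0.0)
GPU = Device("GPU", per_tuple_cost=0.2, transfer_cost=50_000.0)

def choose_device(observed_cardinality: int, devices: List[Device]) -> Device:
    """Pick the cheapest device for the observed (not estimated) input size."""
    return min(devices,
               key=lambda d: d.transfer_cost + d.per_tuple_cost * observed_cardinality)

def run_pipeline(data: list, operators: List[Callable[[list], list]]) -> list:
    for op in operators:
        device = choose_device(len(data), [CPU, GPU])  # decide just before executing
        print(f"{op.__name__}: |input| = {len(data):>9,} -> placed on {device.name}")
        data = op(data)  # execute, then the next decision sees the real result size
    return data

# Toy stand-ins for the operators of an OLAP query.
def selection(rows):   return [r for r in rows if r % 97 == 0]
def projection(rows):  return [r * 2 for r in rows]
def aggregation(rows): return [sum(rows)]

run_pipeline(list(range(1_000_000)), [selection, projection, aggregation])
```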
113

Robust Real-time Query Processing with QStream

Schmidt, Sven, Legler, Thomas, Schär, Sebastian, Lehner, Wolfgang 08 August 2023 (has links)
Processing data streams with Quality-of-Service (QoS) guarantees is an emerging area in existing streaming applications. Although it is possible to negotiate the result quality and to reserve the required processing resources in advance, it remains a challenge to adapt the data stream management system (DSMS) to data stream characteristics that are not known in advance or are difficult to obtain. In this paper, we present the second generation of our QStream DSMS, which addresses this challenge by using a real-time-capable operating system environment for resource reservation and by applying an adaptation mechanism when the data stream characteristics change spontaneously.
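One plausible form such an adaptation mechanism could take is sketched below: the system monitors the observed input rate and degrades the processed fraction whenever the rate exceeds the capacity reserved in advance. The reservation value and the shedding policy are assumptions for illustration only, not necessarily what QStream does.

```python
RESERVED_TUPLES_PER_SECOND = 10_000  # illustrative capacity reserved in advance

def processing_fraction(observed_rate: float) -> float:
    """Fraction of incoming tuples to process so the load stays within the
    reserved budget; 1.0 means the full stream is processed."""
    if observed_rate <= RESERVED_TUPLES_PER_SECOND:
        return 1.0
    return RESERVED_TUPLES_PER_SECOND / observed_rate

# Simulate a stream whose rate changes spontaneously.
for second, rate in enumerate([8_000, 9_500, 25_000, 40_000, 12_000, 7_000]):
    print(f"t={second}s  rate={rate:>6}/s  processed fraction={processing_fraction(rate):.2f}")
```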
114

Exploring Techniques for Providing Privacy in Location-Based Services Nearest Neighbor Query

Asanya, John-Charles 01 January 2015 (has links)
Increasing numbers of people are subscribing to location-based services, but as the popularity grows, so do the privacy concerns. A variety of research exists to address these concerns, with each technique addressing a different model by which location-based services respond to subscribers. In this work, we present ideas to address privacy concerns for the two main models, namely the snapshot nearest neighbor query model and the continuous nearest neighbor query model. First, we address the snapshot nearest neighbor query model, where the location-based service's response represents a snapshot at a point in time. In this model, we introduce a novel idea based on the concept of an open set in a topological space, where points belong to a subset called the neighborhood of a point. We extend this concept to provide anonymity to real objects, where each object belongs to a disjoint neighborhood such that each neighborhood contains a single object. To help identify the objects, we implement a database which dynamically scales in direct proportion to the size of the neighborhood. To retrieve information secretly and allow the database to expose only requested information, private information retrieval protocols are executed twice on the data. Our study of the implementation shows that the concept of a single-object neighborhood is able to efficiently scale the database with the objects in the area. The size of the database grows with the size of the grid and the objects covered by the location-based services. Typically, creating neighborhoods, computing distances between objects in the area, and running private information retrieval protocols causes the CPU to respond slowly as the database size increases. In order to handle a large number of objects, we explore the concepts of kernels and parallel computing on the GPU. We develop a GPU-parallel implementation of the snapshot query to handle a large number of objects. In our experiment, we exploit parameter tuning. The results show that with parameter tuning and the parallel computing power of the GPU, we are able to significantly reduce the response time as the number of objects increases. To determine the response time of an application without knowledge of the intricacies of the GPU architecture, we extend our analysis to predict GPU execution time. We develop the run-time equation for an operation, extrapolate the run time for a problem set based on the equation, and then provide a model to predict GPU response time. As an alternative, the snapshot nearest neighbor query privacy problem can be addressed using secure hardware computing, which can eliminate the need for protecting the rest of the sub-system and minimize resource usage and network transmission time. In this approach, a secure coprocessor is used to provide privacy. We process all information inside the coprocessor to deny adversaries access to any private information. To obfuscate the access pattern to external memory locations, we use an oblivious random access memory methodology to access the server. Experimental evaluation shows that using a secure coprocessor reduces resource usage and query response time as the size of the coverage area and the number of objects increase. Second, we address privacy concerns in the continuous nearest neighbor query model, where location-based services automatically respond to a change in an object's location. In this model, we present solutions for two different types, known as moving query static object and moving query moving object.
For the solutions, we propose plane partitioning using a Voronoi diagram and a continuous fractal space-filling curve using Hilbert curve ordering to create a continuous nearest neighbor relationship between the points of interest in a path. Specifically, the space-filling curve results in a multi-dimensional to one-dimensional object mapping, where values are assigned to the objects based on proximity. To prevent subscribers from issuing a query each time there is a change in location and to reduce the response time, we introduce the concept of transition and update time to indicate where and when the nearest neighbor changes. We also introduce a database that dynamically scales with the number of objects in a path to help obscure and relate objects. By executing the private information retrieval protocol twice on the data, the user secretly retrieves the requested information from the database. The results of our experiment show that using plane partitioning and a fractal space-filling curve to create nearest neighbor relationships with transition times between objects reduces the total response time.
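The Hilbert-curve step of the continuous model can be illustrated with the standard conversion from grid coordinates to a position on the curve; the grid size and points of interest below are made up, and this is a generic sketch rather than the thesis implementation.

```python
def xy_to_hilbert(n: int, x: int, y: int) -> int:
    """Map a point on an n x n grid (n a power of two) to its position along
    the Hilbert space-filling curve. Nearby cells tend to get nearby positions,
    which is what the multi-dimensional to one-dimensional mapping relies on."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip the quadrant so the pattern nests correctly
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Hypothetical points of interest on a 16 x 16 grid, ordered by curve position.
pois = {"gas_station": (3, 4), "cafe": (3, 5), "hospital": (12, 2)}
for name in sorted(pois, key=lambda p: xy_to_hilbert(16, *pois[p])):
    print(f"{name:12s} {pois[name]} -> {xy_to_hilbert(16, *pois[name])}")
```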
115

Partition-based SIMD Processing and its Application to Columnar Database Systems

Hildebrandt, Juliana, Pietrzyk, Johannes, Krause, Alexander, Habich, Dirk, Lehner, Wolfgang 19 March 2024 (has links)
The Single Instruction Multiple Data (SIMD) paradigm has become a core principle for optimizing query processing in columnar database systems. Until now, only the LOAD/STORE instructions have been considered efficient enough to achieve the expected speedups, while avoiding GATHER/SCATTER has been regarded as almost imperative. However, the GATHER instruction offers a very flexible way to populate SIMD registers with data elements coming from non-consecutive memory locations. As we discuss in this article, the GATHER instruction can achieve the same performance as the LOAD instruction if applied properly. To enable this proper usage, we outline a novel access pattern allowing fine-grained, partition-based SIMD implementations. We then apply this partition-based SIMD processing to two representative examples from columnar database systems to experimentally demonstrate the applicability and efficiency of our new access pattern.
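The article's actual access pattern targets SIMD registers; as a rough analogy, the NumPy sketch below contrasts gathering from positions spread over a whole column with gathering confined to cache-sized partitions. The partition size and the timing harness are assumptions for illustration only.

```python
import time
import numpy as np

N = 20_000_000
PARTITION = 1 << 16  # illustrative partition size intended to fit in cache
column = np.random.randint(0, 1_000, size=N, dtype=np.int32)

def gather_full(col, rng):
    """Gather from positions spread over the whole column (poor locality)."""
    idx = rng.integers(0, col.size, size=col.size, dtype=np.int64)
    return col[idx].sum()

def gather_partitioned(col, rng):
    """Gather partition by partition, so the accessed positions stay inside a
    cache-resident range: the intuition behind partition-based processing."""
    total = 0
    for start in range(0, col.size, PARTITION):
        part = col[start:start + PARTITION]
        idx = rng.integers(0, part.size, size=part.size, dtype=np.int64)
        total += int(part[idx].sum())
    return total

rng = np.random.default_rng(0)
for fn in (gather_full, gather_partitioned):
    start = time.perf_counter()
    fn(column, rng)
    print(f"{fn.__name__}: {time.perf_counter() - start:.2f}s")
```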
116

Sample Footprints für Data-Warehouse-Datenbanken

Rösch, Philipp, Lehner, Wolfgang 20 January 2023 (has links)
With the amount of data in current data warehouse databases growing steadily, random sampling is continuously gaining in importance. In particular, interactive analyses of large datasets can greatly benefit from the significantly shorter response times of approximate query processing. In this scenario, Linked Bernoulli Synopses provide memory-efficient schema-level synopses, i.e., synopses that consist of random samples of each table in the schema with minimal overhead for retaining foreign-key integrity within the synopsis. This provides efficient support to the approximate answering of queries with arbitrary foreign-key joins. In this article, we focus on the application of Linked Bernoulli Synopses in data warehouse environments. On the one hand, we analyze the instantiation of memory-bounded synopses. Among others, we address the following questions: How can the given space be partitioned among the individual samples? What is the impact on the overhead? On the other hand, we consider further adaptations of Linked Bernoulli Synopses for usage in data warehouse databases. We show how synopses can incrementally be kept up-to-date when the underlying data changes. Further, we suggest additional outlier handling methods to reduce the estimation error of approximate answers of aggregation queries with foreign-key joins. With a variety of experiments, we show that Linked Bernoulli Synopses and the proposed techniques have great potential in the context of data warehouse databases.
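A deliberately simplified sketch of the schema-level idea follows: sample the fact table and the dimension table, then add every dimension row referenced by a sampled fact row so that foreign-key integrity holds within the synopsis. The real Linked Bernoulli Synopses correlate the sampling decisions to minimize exactly this reference-only overhead; the code below ignores that optimization.

```python
import random

def linked_sample(fact_rows, dim_keys, q_fact, q_dim, seed=1):
    """Sample the fact and dimension tables, then add every dimension row that
    a sampled fact row references, so foreign-key joins on the synopsis work.
    The rows added only for integrity are the overhead that Linked Bernoulli
    Synopses minimize via correlated sampling decisions."""
    rng = random.Random(seed)
    fact_sample = [row for row in fact_rows if rng.random() < q_fact]
    dim_sample = {key for key in dim_keys if rng.random() < q_dim}
    referenced = {row["dim_key"] for row in fact_sample}
    reference_only = referenced - dim_sample
    return fact_sample, dim_sample | referenced, reference_only

rng = random.Random(0)
dim_keys = list(range(1_000))
fact_rows = [{"dim_key": rng.randrange(1_000), "amount": i} for i in range(100_000)]
facts, dims, overhead = linked_sample(fact_rows, dim_keys, q_fact=0.01, q_dim=0.05)
print(f"{len(facts)} fact rows, {len(dims)} dimension rows "
      f"({len(overhead)} added only to retain referential integrity)")
```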
117

Using Vocabulary Mappings for Federated RDF Query Processing / Att använda vokabulär mappning för federerad RDF frågebehandling

Winneroth, Juliette January 2023 (has links)
Federated RDF querying systems provide an interface to multiple autonomous RDF data sources, allowing a user to execute a SPARQL query on multiple data sources at once and get one unified result. When these autonomous data sources use different vocabularies, the SPARQL query must be rewritten to the vocabulary of the data source in order to get the desired results. This thesis describes how vocabulary mappings can be used to rewrite SPARQL queries for federated RDF query processing. In this thesis, different types of vocabulary mappings are explored to find a suitable vocabulary mapping representation to use in formulating an approach for query rewriting. The approach describes how the SPARQL subqueries and solution mappings can be rewritten in order to handle heterogeneous vocabularies. The thesis then presents how the query federation engine HeFQUIN is extended to rewrite the federated queries and their results. A final evaluation of the implementation shows how implementing a query rewriting approach can improve the federated query engine’s execution times.
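A minimal sketch of the rewriting step, assuming a simple one-to-one vocabulary mapping: triple patterns of a subquery are translated into the data source's vocabulary, and the returned solution mappings are translated back. The IRIs are invented examples, and this is not HeFQUIN's actual API.

```python
# One-to-one mapping between the "global" vocabulary of the federated query and
# the "local" vocabulary of one data source (all IRIs are invented examples).
GLOBAL_TO_LOCAL = {
    "http://example.org/global#author": "http://datasource.example/vocab#creator",
    "http://example.org/global#Book":   "http://datasource.example/vocab#Publication",
}
LOCAL_TO_GLOBAL = {local: global_ for global_, local in GLOBAL_TO_LOCAL.items()}

def rewrite_pattern(triple_pattern, mapping):
    """Rewrite a triple pattern term by term; variables ('?x') and unmapped
    IRIs pass through unchanged."""
    return tuple(mapping.get(term, term) for term in triple_pattern)

def rewrite_solution(binding, mapping):
    """Translate the IRIs in a solution mapping so that all results presented
    to the user speak the global vocabulary."""
    return {var: mapping.get(value, value) for var, value in binding.items()}

# A subquery pattern rewritten into the source's vocabulary before execution...
print(rewrite_pattern(("?book", "http://example.org/global#author", "?who"),
                      GLOBAL_TO_LOCAL))

# ...and a returned binding translated back into the global vocabulary.
binding = {"book": "http://datasource.example/data/book42",
           "type": "http://datasource.example/vocab#Publication"}
print(rewrite_solution(binding, LOCAL_TO_GLOBAL))
```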
118

Query optimization by using derivability in a data warehouse environment

Albrecht, Jens, Hümmer, Wolfgang, Lehner, Wolfgang, Schlesinger, Lutz 10 January 2023 (has links)
Materialized summary tables and cached query results are frequently used for the optimization of aggregate queries in a data warehouse. Query rewriting techniques are incorporated into database systems to use those materialized views and thus avoid the access of the possibly huge raw data. A rewriting is only possible if the query is derivable from these views. Several approaches can be found in the literature to check the derivability and find query rewritings. The specific application scenario of a data warehouse with its multidimensional perspective allows the consideration of much more semantic information, e.g. structural dependencies within the dimension hierarchies and different characteristics of measures. The motivation of this article is to use this information to present conditions for derivability in a large number of relevant cases which go beyond previous approaches.
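A small sketch of one derivability condition, assuming explicit dimension hierarchies: a query grouped at some level is answerable from a materialized aggregate grouped at the same or a finer level, since only a roll-up is needed. The hierarchies are illustrative, and the check ignores the article's further conditions on measures and predicates.

```python
# Illustrative dimension hierarchies, listed from the finest level upwards.
HIERARCHIES = {
    "time":     ["day", "month", "quarter", "year"],
    "location": ["city", "region", "country"],
}

def level_index(dimension: str, level: str) -> int:
    return HIERARCHIES[dimension].index(level)

def derivable(query_groupby: dict, view_groupby: dict) -> bool:
    """The query is answerable from the view if, for every dimension the query
    groups by, the view is grouped at the same or a finer level, so only a
    roll-up along the hierarchy is required."""
    for dimension, query_level in query_groupby.items():
        view_level = view_groupby.get(dimension)
        if view_level is None:
            return False  # the view has already aggregated this dimension away
        if level_index(dimension, view_level) > level_index(dimension, query_level):
            return False  # the view is too coarse for this dimension
    return True

view  = {"time": "month",   "location": "city"}     # materialized summary table
query = {"time": "quarter", "location": "country"}  # incoming aggregate query
print(derivable(query, view))  # True: month->quarter and city->country roll up
```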
119

A database accelerator for energy-efficient query processing and optimization

Lehner, Wolfgang, Haas, Sebastian, Arnold, Oliver, Scholze, Stefan, Höppner, Sebastian, Ellguth, Georg, Dixius, Andreas, Ungethüm, Annett, Mier, Eric, Nöthen, Benedikt, Matúš, Emil, Schiefer, Stefan, Cederstroem, Love, Pilz, Fabian, Mayr, Christian, Schüffny, Renè, Fettweis, Gerhard P. 12 January 2023 (has links)
Data processing on a continuously growing amount of information and the increasing power restrictions have become a ubiquitous challenge in our world today. Besides parallel computing, a promising approach to improving the energy efficiency of current systems is to integrate specialized hardware. This paper presents a Tensilica RISC processor extended with an instruction set to accelerate basic database operators frequently used in modern database systems. The core was taped out in a 28 nm SLP CMOS technology and allows energy-efficient query processing as well as query optimization by applying selectivity estimation techniques. Our chip measurements show a 1000x energy improvement on selected database operators compared to state-of-the-art systems.
120

MorphStore — In-Memory Query Processing based on Morphing Compressed Intermediates LIVE

Habich, Dirk, Damme, Patrick, Ungethüm, Annett, Pietrzyk, Johannes, Krause, Alexander, Hildebrandt, Juliana, Lehner, Wolfgang 15 September 2022 (has links)
In this demo, we present MorphStore, an in-memory column store with a novel compression-aware query processing concept. Lightweight integer compression algorithms already play an important role in existing in-memory column stores, but mainly for base data. The continuous handling of compression from the base data to the intermediate results during query processing has already been discussed, but not investigated in detail, since the computational effort for compression as well as decompression is often assumed to exceed the benefit of reduced transfer costs between CPU and main memory. However, this argument increasingly loses its validity, as we show in our demo. Generally, our novel compression-aware query processing concept is characterized by the fact that we are able to speed up query execution by morphing compressed intermediate results from one scheme to another to dynamically adapt to the changing data characteristics during query processing. Our morphing decisions are made using a cost-based approach.
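A toy version of the morphing decision, assuming just two formats (uncompressed and run-length encoding) and word count as the cost model; MorphStore's actual set of lightweight integer schemes and its cost model are richer than this sketch.

```python
def rle_compress(values):
    """Run-length encode a sequence of integers as [value, run_length] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs

def choose_format(values):
    """Cost-based morphing decision: keep the intermediate uncompressed unless
    run-length encoding would need fewer words to represent it."""
    rle = rle_compress(values)
    if 2 * len(rle) < len(values):
        return "rle", rle
    return "uncompressed", values

# A run-heavy intermediate morphs to RLE; one with distinct values stays flat.
run_heavy = [7] * 500 + [9] * 300 + [12] * 200
distinct  = list(range(1_000))
for name, data in [("run_heavy", run_heavy), ("distinct", distinct)]:
    print(f"{name}: stored as {choose_format(data)[0]}")
```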
