91.
Poster session: Constrained dynamic physical database design
Lehner, Wolfgang; Voigt, Hannes; Salem, Kenneth (12 August 2022)
Physical design has always been an important part of database administration. Today's commercial database management systems offer physical design tools, which recommend a physical design for a given workload. However, these tools work only with static workloads and ignore the fact that workloads, and physical designs, may change over time. Research has now begun to focus on dynamic physical design, which can account for time-varying workloads. In this paper, we consider a dynamic but constrained approach to physical design. The goal is to recommend dynamic physical designs that reflect major workload trends but that are not tailored too closely to the details of the input workloads. To achieve this, we constrain the number of changes that are permitted in the recommended design. In this paper we present our definition of the constrained dynamic physical design problem and discuss several techniques for solving it.
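One way to make the constraint concrete: given per-phase cost estimates for each candidate design, the recommendation problem becomes a small dynamic program over (phase, current design, remaining change budget). The sketch below is a minimal illustration under that framing; the cost table and the budget semantics are assumptions made for this example, not the paper's actual formulation.

```python
from functools import lru_cache

# Hypothetical per-phase costs: COST[p][d] = estimated cost of serving
# workload phase p under candidate physical design d (numbers invented
# purely for illustration).
COST = [
    [10, 25, 40],  # phase 0: design 0 is cheapest
    [30, 12, 45],  # phase 1: design 1 is cheapest
    [28, 14, 50],  # phase 2: design 1 is still cheapest
    [55, 60, 20],  # phase 3: design 2 is cheapest
]
N_PHASES, N_DESIGNS = len(COST), len(COST[0])

def best_plan(max_changes):
    """Pick one design per phase, allowing at most `max_changes` design
    changes between consecutive phases; the initial pick is free."""
    @lru_cache(maxsize=None)
    def go(phase, design, budget):
        if phase == N_PHASES:
            return (0, ())
        candidates = []
        for nxt in range(N_DESIGNS):
            change = 0 if design is None else int(nxt != design)
            if change > budget:
                continue  # no change budget left for this switch
            rest_cost, rest_plan = go(phase + 1, nxt, budget - change)
            candidates.append((COST[phase][nxt] + rest_cost, (nxt,) + rest_plan))
        return min(candidates)
    return go(0, None, max_changes)

print(best_plan(max_changes=0))  # (111, (1, 1, 1, 1)): one static design
print(best_plan(max_changes=1))  # (71, (1, 1, 1, 2)): tracks the major trend
```

With a budget of zero the plan degenerates to classical static design; a small budget lets the plan follow major workload trends while ignoring short-lived details, which is exactly the trade-off the constraint is meant to control.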
92.
Towards Aspectual Component-Based Real-Time System Development
Tešanović, Aleksandra (January 2003)
The increasing complexity of real-time systems, together with demands for configurability and tailorability, strongly motivates the application of new software engineering principles such as aspect-oriented and component-based software development. Integrating these two techniques into real-time system development would enable: (i) efficient system configuration from the components in the component library based on the system requirements, (ii) easy tailoring of components and/or a system for a specific application by changing the behavior (code) of a component through aspect weaving, and (iii) enhanced flexibility of real-time and embedded software through the notion of system configurability and component tailorability.

In this thesis we focus on applying aspect-oriented and component-based software development to real-time system development. We propose a novel concept of aspectual component-based real-time system development (ACCORD). ACCORD introduces the following into real-time system development: (i) a design method that decomposes the real-time system into a set of components and a set of aspects, (ii) a real-time component model, denoted RTCOM, that supports aspect weaving while enforcing information hiding, (iii) a method and a tool for performing worst-case execution time analysis of different configurations of aspects and components, and (iv) a new approach to modelling real-time policies as aspects.

We present a case study of the development of a configurable real-time database system, called COMET, using ACCORD principles. The COMET example shows that applying ACCORD improves real-time system development by enabling efficient configuration of the real-time system; it could thus be a way to improve the reusability and flexibility of real-time software and the modularization of crosscutting concerns. In connection with the development of ACCORD, we identify criteria that a design method for component-based real-time systems needs to address: a well-defined component model for real-time systems, aspect separation, support for system configuration, and analysis of the composed real-time system. Using this set of criteria we evaluate ACCORD. In comparison with other approaches, ACCORD provides a distinct classification of crosscutting concerns in the real-time domain into different types of aspects, a real-time component model that supports weaving of aspects into the code of a component, and a tool for temporal analysis of the woven system.

Report code: LiU-TEK-LIC-2003:23.
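As a loose illustration of item (iv), modelling a policy as an aspect: the sketch below weaves a deadline-monitoring policy around a component operation. This is only a runtime Python analogy; ACCORD's RTCOM weaves aspects into component code before compilation, and all names here are invented for the example.

```python
import functools
import time

# A toy component with a fixed operation interface.
class SensorBuffer:
    def __init__(self):
        self._data = []
    def write(self, value):
        self._data.append(value)
    def read_latest(self):
        return self._data[-1] if self._data else None

# A real-time policy expressed as an aspect: a deadline monitor that can
# be woven around any operation without touching the component's code.
def deadline_aspect(deadline_ms):
    def weave(operation):
        @functools.wraps(operation)
        def advised(*args, **kwargs):
            start = time.perf_counter()
            result = operation(*args, **kwargs)
            elapsed_ms = (time.perf_counter() - start) * 1000
            if elapsed_ms > deadline_ms:
                print(f"deadline miss: {operation.__name__} "
                      f"took {elapsed_ms:.3f} ms")
            return result
        return advised
    return weave

# "Weaving": replace the component's method with the advised version.
SensorBuffer.write = deadline_aspect(deadline_ms=1.0)(SensorBuffer.write)

buffer = SensorBuffer()
buffer.write(42)               # now monitored by the woven policy
print(buffer.read_latest())    # unmodified operation
```

Because the policy lives outside the component, swapping it for another (say, logging or concurrency control) leaves the component's code untouched, which is the tailorability argument the thesis makes.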
93.
Compilation Techniques, Algorithms, and Data Structures for Efficient and Expressive Data Processing Systems
Supun Madusha Bandara Abeysinghe Tennakoon Mudiyanselage (30 November 2023)
The proliferation of digital data, driven by factors like social media and e-commerce, has created an increasing demand for highly processed data at higher levels of fidelity, which puts increasing demands on modern data processing systems. In the past, data processing systems faced bottlenecks due to limited main memory availability. However, as main memory becomes more abundant, their optimization focus has shifted from disk I/O to optimized computation through techniques like compilation. This dissertation addresses several critical limitations within such compilation-based data processing systems.

In modern data analytics pipelines, combining workloads from various paradigms, such as traditional DBMS and machine learning, is common. These pipelines are typically managed by specialized systems designed for specific workload types. While these specialized systems optimize their individual performance, substantial performance loss occurs when they are combined to handle mixed workloads. This loss is mainly due to overheads at system boundaries, including data copying and format conversions, as well as the general inability to perform cross-system optimizations.

This dissertation tackles this problem from two angles. First, it proposes an efficient post-hoc integration of individual systems using generative programming via the construction of common intermediate layers. This approach preserves the best-of-breed performance of individual workloads while achieving state-of-the-art performance for combined workloads. Second, we introduce a high-level query language capable of expressing various workload types, acting as a general substrate for implementing combined workloads. This allows the generation of optimized code for end-to-end workloads through the construction of an intermediate representation (IR).

The dissertation then shifts focus to data processing systems used for incremental view maintenance (IVM). While existing IVM systems achieve high performance through compilation and novel algorithms, they have limitations in handling specific query classes. Notably, they are incapable of handling queries involving correlated nested aggregate subqueries. To address this, our work proposes a novel indexing scheme based on a new data structure and a corresponding set of algorithms that fully incrementalize such queries. This approach results in substantial asymptotic speedups and order-of-magnitude performance improvements for workloads of practical importance.

Finally, the dissertation explores efficient and expressive fixed-point computations, with a focus on Datalog, a language widely used for declarative program analysis. Although existing Datalog engines rely on compilation and specialized code generation to achieve performance, they lack the flexibility to support the extensions required for complex program analysis. Our work introduces a new Datalog engine built using generative programming techniques that offers both flexibility and state-of-the-art performance through specialized code generation.
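For readers unfamiliar with the fixed-point computation at the heart of Datalog, the classic example is transitive closure, evaluated semi-naively so that each round only joins the facts discovered in the previous round. The sketch below is background illustration, not the dissertation's engine, which generates specialized code for loops like this.

```python
# Semi-naive fixed-point evaluation of the Datalog program:
#   path(X, Y) :- edge(X, Y).
#   path(X, Z) :- path(X, Y), edge(Y, Z).
edges = {(1, 2), (2, 3), (3, 4), (2, 5)}

def transitive_closure(edges):
    path = set(edges)      # all facts derived so far
    delta = set(edges)     # facts that were new in the last round
    while delta:
        # Join only the *new* facts against edge, not the whole path
        # relation; this restriction is what makes evaluation semi-naive.
        derived = {(x, z) for (x, y) in delta
                          for (y2, z) in edges if y == y2}
        delta = derived - path      # keep only genuinely new facts
        path |= delta               # reached the fixed point when empty
    return path

print(sorted(transitive_closure(edges)))
```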
94.
A Benchmark Framework for Data Compression Techniques
Damme, Patrick; Habich, Dirk; Lehner, Wolfgang (03 February 2023)
Lightweight data compression is frequently applied in main memory database systems to improve query performance. The data processed by such systems is highly diverse. Moreover, there is a high number of existing lightweight compression techniques. Therefore, choosing the optimal technique for a given dataset is non-trivial. Existing approaches are based on simple rules, which do not suffice for such a complex decision. In contrast, our vision is a cost-based approach. However, this requires a detailed cost model, which can only be obtained from a systematic benchmarking of many compression algorithms on many different datasets. A naïve benchmark evaluates every algorithm under consideration separately. This yields many redundant steps and is thus inefficient. We propose an efficient and extensible benchmark framework for compression techniques. Given an ensemble of algorithms, it minimizes the overall run time of the evaluation. We experimentally show that our approach outperforms the naïve approach.
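The core of the efficiency argument can be seen in miniature: a naive benchmark regenerates (or reloads) each dataset once per algorithm, while an ensemble-driven run produces each dataset once and feeds it to every algorithm. The harness below is a toy sketch of that idea; the placeholder "codecs" stand in for real compression algorithms, and the scheduling is far simpler than the framework's.

```python
import time

# Placeholder "codecs": identity functions standing in for real
# compression algorithms under test.
ALGORITHMS = {
    "rle":   lambda values: values,
    "delta": lambda values: values,
}

def uniform_ints():
    return list(range(100_000))

def naive_benchmark(dataset_generators, algorithms):
    """Evaluates every algorithm separately: each dataset is
    regenerated once per algorithm, which is the redundant work."""
    results = {}
    for algo_name, algo in algorithms.items():
        for generate in dataset_generators:
            data = generate()                    # repeated per algorithm
            start = time.perf_counter()
            algo(data)
            results[(algo_name, generate.__name__)] = time.perf_counter() - start
    return results

def ensemble_benchmark(dataset_generators, algorithms):
    """Evaluates the whole ensemble together: each dataset is
    generated exactly once and reused for all algorithms."""
    results = {}
    for generate in dataset_generators:
        data = generate()                        # generated once, shared
        for algo_name, algo in algorithms.items():
            start = time.perf_counter()
            algo(data)
            results[(algo_name, generate.__name__)] = time.perf_counter() - start
    return results

print(ensemble_benchmark([uniform_ints], ALGORITHMS))
```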
95.
DATA INTEGRITY IN THE HEALTHCARE INDUSTRY: ANALYZING THE EFFECTIVENESS OF DATA SECURITY IN GOOD DATA AND RECORD MANAGEMENT PRACTICES (A CASE STUDY OF COMPUTERIZING THE COMPETENCE MATRIX FOR A QUALITY CONTROL DRUG LABORATORY)
Marcel C Okezue (06 October 2022)
This project analyzes the concept of time efficiency in the data management process associated with personnel training and competence assessments in the quality control (QC) laboratory of Nigeria's food and drug authority (NAFDAC). The laboratory administrators are encumbered with extensive mental and paper-based record keeping because the personnel training data is managed manually. Consequently, the personnel training and competence assessments in the laboratory are not conducted efficiently. The Microsoft Excel spreadsheet provided by an earlier Purdue doctoral dissertation as a remedy is deficient in handling operations on database tables; as a result, that dissertation did not adequately address the inefficiencies.
The problem addressed by this study is the operational inefficiency that results from the manual or Excel-based personnel training data management process in the NAFDAC laboratory. The purpose, therefore, is to reduce the time it takes to generate, obtain, manipulate, exchange, and securely store the personnel competence training and assessment data. To do this, the study developed a software system integrated with a relational database management system (RDBMS) to improve the manual/Microsoft Excel-based data management procedure. To ascertain its validity, this project compares the operational (time) efficiency of the manual and Excel-based formats with that of the newly developed system.
The data used in this qualitative research comes from literary sources and from simulations comparing the time spent administering personnel training and competence assessments with the new system developed by this study against the time spent with the Excel system from the earlier project. The fundamental finding of this study is that the idea of improving the operational (time) efficiency of the personnel training and competence assessment process in the QC laboratory is valid. Doing so will reduce human errors, achieve more time-efficient operation, and improve the personnel training and competence assessment processes.
Recommendations are made as to the procedure the laboratory administrator should adopt to take advantage of the new system. The study also recommends steps for future research to extend the capability of this project.
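To make the move from Excel to an RDBMS concrete, here is a minimal, hypothetical relational schema for training and competence-assessment records, run against SQLite only because it is self-contained; every table and column name is invented for illustration and is not taken from the actual NAFDAC system.

```python
import sqlite3

# Hypothetical minimal schema for training and competence-assessment
# records; all names are illustrative, not from the real system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE analyst (
    analyst_id  INTEGER PRIMARY KEY,
    full_name   TEXT NOT NULL
);
CREATE TABLE training (
    training_id INTEGER PRIMARY KEY,
    topic       TEXT NOT NULL,
    held_on     DATE NOT NULL
);
CREATE TABLE competence_assessment (
    analyst_id  INTEGER NOT NULL REFERENCES analyst(analyst_id),
    training_id INTEGER NOT NULL REFERENCES training(training_id),
    score       INTEGER CHECK (score BETWEEN 0 AND 100),
    assessed_on DATE NOT NULL,
    PRIMARY KEY (analyst_id, training_id)
);
INSERT INTO analyst  VALUES (1, 'A. Analyst');
INSERT INTO training VALUES (1, 'HPLC method validation', '2022-05-10');
INSERT INTO competence_assessment VALUES (1, 1, 87, '2022-05-17');
""")

# A report the administrator would otherwise assemble by hand in Excel:
# each analyst's score per training topic.
for row in conn.execute("""
    SELECT a.full_name, t.topic, c.score
    FROM competence_assessment AS c
    JOIN analyst  AS a ON a.analyst_id  = c.analyst_id
    JOIN training AS t ON t.training_id = c.training_id
    ORDER BY a.full_name, t.topic
"""):
    print(row)
```

Unlike a spreadsheet, the keys and constraints reject inconsistent entries at write time, which is one source of the record-integrity and time savings the study measures.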
96.
Make Larger Vector Register Sizes New Challenges?: Lessons Learned from the Area of Vectorized Lightweight Compression Algorithms
Habich, Dirk; Damme, Patrick; Ungethüm, Annett; Lehner, Wolfgang (15 September 2022)
The exploitation of data as well as hardware properties is a core aspect of efficient data management. This holds in particular for the field of in-memory data processing. Aside from increasing main memory capacities, in-memory data processing also benefits from novel processing concepts based on lightweight compressed data. To speed up compression as well as decompression, an active research field deals with the specialization of these algorithms to hardware features such as vectorization using SIMD instructions. Most of the vectorized implementations have been proposed for 128-bit vector registers. However, hardware vendors keep increasing the vector register sizes, and a straightforward transformation to these wider vector sizes is possible in most cases. Thus, we systematically investigated the impact of different SIMD instruction set extensions with wider vector sizes on the behavior of straightforwardly transformed implementations. In this paper, we describe our evaluation methodology and present selected results of our exhaustive evaluation. In particular, we highlight some challenges and present first approaches to tackling them.
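One trade-off that changes with register width can be seen even without intrinsics. In the frame-of-reference-style sketch below, each NumPy row stands in for one vector register's worth of 32-bit lanes (4 lanes for 128-bit SSE, 8 for 256-bit AVX2, 16 for 512-bit AVX-512); this is an illustrative analogy under those assumptions, not the paper's code.

```python
import numpy as np

# Frame-of-reference (FOR) style coding: subtract a per-block reference
# so only small residuals remain to be bit-packed. Each row stands in
# for one vector register's worth of 32-bit lanes.
def for_compress(values: np.ndarray, lanes: int):
    usable = len(values) - len(values) % lanes   # ignore the tail here
    blocks = values[:usable].reshape(-1, lanes)  # one row per "register"
    refs = blocks.min(axis=1)                    # per-block reference
    residuals = blocks - refs[:, None]           # small values to pack
    return refs, residuals

def for_decompress(refs, residuals):
    return (residuals + refs[:, None]).ravel()

rng = np.random.default_rng(0)
data = rng.integers(1000, 1100, size=1024, dtype=np.int32)
for lanes in (4, 8, 16):                         # "widening the register"
    refs, residuals = for_compress(data, lanes)
    assert np.array_equal(for_decompress(refs, residuals), data)
    print(lanes, int(residuals.max()))           # residual magnitude grows
```

Wider blocks amortize the per-block metadata but enlarge the residuals, since one minimum now covers more values; the achievable bit widths, and with them the compression rate, therefore shift as registers widen, which is one of the behaviors the paper's evaluation surfaces.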
97.
SOFORT: A Hybrid SCM-DRAM Storage Engine for Fast Data Recovery
Oukid, Ismail; Booss, Daniel; Lehner, Wolfgang; Bumbulis, Peter; Willhalm, Thomas (19 September 2022)
Storage Class Memory (SCM) has the potential to significantly improve database performance. This potential has been well documented for throughput [4] and response time [25, 22]. In this paper we show that SCM also has the potential to significantly improve restart performance, a shortcoming of traditional main memory database systems. We present SOFORT, a hybrid SCM-DRAM storage engine that leverages the full capabilities of SCM by doing away with a traditional log and updating the persisted data in place in small increments. We show that we can achieve restart times of a few seconds, independent of instance size and transaction volume, without significantly impacting transaction throughput.
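The log-free idea can be emulated in miniature: persistent state is updated in place in a memory-mapped file, so a "restart" simply maps the file and reads the current state instead of replaying a log. This toy, written under those simplifying assumptions, ignores everything that makes real SCM engines hard (cache-line flushing, failure atomicity, torn writes), which is precisely what SOFORT addresses.

```python
import mmap
import os
import struct

# Toy emulation of log-free persistence: an 8-byte counter is updated in
# place in a memory-mapped file; "recovery" is just mapping the file and
# reading it. Failure atomicity and flushing granularity, the hard parts
# a real SCM engine must handle, are not modeled here.
PATH = "counter.db"
if not os.path.exists(PATH):
    with open(PATH, "wb") as f:
        f.write(struct.pack("<Q", 0))        # initial persisted state

with open(PATH, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 8)
    (value,) = struct.unpack("<Q", mm[:8])   # "restart": no log replay
    mm[:8] = struct.pack("<Q", value + 1)    # small in-place update
    mm.flush()                               # persist the new state
    mm.close()
print("persisted counter is now", value + 1)
```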
98.
Extension of an SQL Library in the Storm Compiler: A Comparative Study of SQL Statements and Data Types in Different Database Management Systems (original title: Utvidgning av SQL-bibliotek i kompilatorn Storm: En jämförande studie av SQL-satser och datatyper i olika databashanterare)
Westerberg Jernström, Simon; Häger, Hanna (January 2023)
This thesis investigates the possibility of further developing an SQL library in the interactive compiler Storm. Currently, the library supports only the SQLite database management system, and the ambition is to add support for more database systems. This addition requires a common SQL syntax that is compatible with different database systems. The main challenge in developing a common SQL syntax is that each database system has its own variations and deviations from standard SQL. The study therefore compares database management systems with the aim of identifying common SQL statements and data types. The goal is to establish a common subset that can serve as the foundation for the general SQL syntax in Storm. The results of the study confirm that it is possible to develop a general SQL syntax in Storm. In addition to the comparative study, the work also involves implementing support for MariaDB in Storm.
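A sketch of what such a common subset looks like in practice: the DDL below restricts itself to statements and type names that both SQLite and MariaDB accept, run here against SQLite because it ships with Python. The specific type choices are my illustration of the subset idea, not the mapping the thesis actually derived, and the two systems' client libraries still differ in details such as parameter placeholders.

```python
import sqlite3

# DDL restricted to a portable subset: INTEGER, VARCHAR(n), and
# DOUBLE PRECISION are accepted by both SQLite and MariaDB.
PORTABLE_SCHEMA = """
CREATE TABLE users (
    id    INTEGER PRIMARY KEY,
    name  VARCHAR(64) NOT NULL,
    score DOUBLE PRECISION
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(PORTABLE_SCHEMA)
conn.execute("INSERT INTO users (id, name, score) VALUES (?, ?, ?)",
             (1, "ada", 9.5))
print(conn.execute("SELECT name, score FROM users WHERE id = ?",
                   (1,)).fetchall())
```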
99.
Multilingual Information Processing On Relational Database Architectures
Kumaran, A (12 1900)
No description available.
100.
Spatial Indexing on Flash-based Solid State Drives
Carniel, Anderson Chaves (21 December 2018)
Spatial database systems widely employ spatial indexing structures to speed up the processing of spatial queries. Many of the spatial indices proposed in the literature, such as the R-tree, assume magnetic disks (i.e., HDDs) as the underlying storage device; they are termed disk-based spatial indices. On the other hand, several spatial database applications increasingly use flash-based Solid State Drives (SSDs), and designing spatial indices for these storage devices has thus gained increasing attention. This is due to the fact that, compared to HDDs, SSDs offer smaller size, lighter weight, lower power consumption, better shock resistance, and faster reads and writes. Hence, specific indices for SSDs, termed flash-aware spatial indices, have been proposed in the literature to deal with the intrinsic characteristics of SSDs, such as the asymmetric costs of reads and writes. However, the research to date has not been able to establish a flash-aware spatial index that actually exploits all the benefits of SSDs. This PhD thesis advances the literature as follows. We first define a methodology for creating spatial datasets for experimental evaluations. We also propose FESTIval, a versatile framework that provides a common and unique environment for executing experimental evaluations. These contributions served as a foundation for the performance analyses conducted throughout this PhD work. Using this foundation, we analyze the performance behavior of spatial indices on different storage devices, such as HDDs and SSDs. Further, we discuss the applicability of flash simulators in the evaluation of spatial indices. The findings of these experiments contributed to the proposal of eFIND, a generic and efficient framework for flash-aware spatial indexing. eFIND is generic because it can port a wide range of disk-based spatial indices to SSDs. eFIND is also efficient because it is based on a set of design goals that exploit SSD performance. Performance tests showed that, compared to the state of the art, eFIND improved the construction of ported disk-based spatial indices and the execution of spatial queries. For porting the R-tree (i.e., the eFIND R-tree), eFIND showed performance reductions from 43% to 77% to build spatial indices, and from 4% to 23% to execute spatial queries. For porting the xBR+-tree (i.e., the eFIND xBR+-tree), eFIND showed reductions from 28% to 83% to build spatial indices and up to 35% in spatial query processing.
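A core design goal of flash-aware indexing is to avoid the small random writes that are expensive on SSDs. The sketch below shows the general write-buffering pattern such frameworks build on: index page modifications accumulate in memory and are flushed in batches, while reads consult the buffer first. It is a simplified illustration of the pattern, with invented names, not eFIND's actual data structures or flushing policy.

```python
# Write-buffering pattern for flash-aware index storage: trade many
# small random page writes (costly on SSDs) for fewer batched ones.
class BufferedIndexStore:
    def __init__(self, flush_threshold=64):
        self.pages = {}            # stands in for pages on the SSD
        self.write_buffer = {}     # pending in-memory page modifications
        self.flush_threshold = flush_threshold

    def read_page(self, page_id):
        # Reads must see buffered modifications before stored pages.
        if page_id in self.write_buffer:
            return self.write_buffer[page_id]
        return self.pages.get(page_id)

    def write_page(self, page_id, page):
        self.write_buffer[page_id] = page
        if len(self.write_buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # One batched write instead of many small random ones.
        self.pages.update(self.write_buffer)
        self.write_buffer.clear()

store = BufferedIndexStore(flush_threshold=2)
store.write_page(1, ["entry-a"])
print(store.read_page(1))          # served from the write buffer
store.write_page(2, ["entry-b"])   # hits the threshold, triggers a flush
print(store.pages)
```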