  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
621

A latency comparison in a sharded database environment : A study between Vitess-MySQL and CockroachDB

Lundh, Filip, Mohlin, Mikael January 2022 (has links)
The world is becoming more and more digitized, which in turn puts pressure on existing applications and systems to handle large quantities of data. In some cases, that data also needs to be handled in secure and isolated environments. To address these needs, a new category of databases has emerged under the name NewSQL. The downside of this new category is that it remains unexplored in some areas, such as how the databases in the category perform relative to one another, or to databases belonging to other categories. One major aspect of performance is latency, since it affects the overall user experience. In order to clear up some of the unexplored areas within NewSQL, two databases were studied with respect to their latency: CockroachDB and Vitess. The study was divided into two main parts. The first was a quantitative study that gathered data on how each database performed in terms of latency when serving create, read, update, and delete operations. No clear differences in latency were found for the create and read operations, while the results for the update and delete operations showed significant differences, with Vitess having lower latency than CockroachDB. The second part was a qualitative study dedicated to analyzing and inspecting each database's architecture and source code, with the intention of identifying potential factors that may affect latency. The analysis identified three main factors. The first is that CockroachDB has a layered architecture and needs to translate SQL queries into a set of key-value operations. The second is that the databases use different storage engines, which in turn can differ in performance. The third and final factor is that MySQL, which is integrated with Vitess, has existed for a longer period of time than CockroachDB, which suggests that it has probably been optimized more over the years.
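The CRUD latency measurements in such a comparison can be reproduced with a small JDBC probe. The sketch below is illustrative only and is not the authors' benchmark harness: the connection URL, table, and column names are hypothetical, and a real experiment would repeat each operation many times and aggregate the results. Vitess exposes a MySQL-compatible endpoint and CockroachDB a PostgreSQL-compatible one, so in practice only the driver and URL would change between the two systems.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

/** Minimal sketch of a per-operation CRUD latency probe over JDBC (hypothetical schema). */
public class CrudLatencyProbe {

    public static void main(String[] args) throws Exception {
        // Hypothetical connection string; swap for the CockroachDB (PostgreSQL protocol)
        // URL and driver to benchmark the other system.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/benchmark", "user", "password")) {

            long create = timed(conn, "INSERT INTO users (id, name) VALUES (?, ?)",
                    ps -> { ps.setInt(1, 1); ps.setString(2, "alice"); });
            long read = timed(conn, "SELECT name FROM users WHERE id = ?",
                    ps -> ps.setInt(1, 1));
            long update = timed(conn, "UPDATE users SET name = ? WHERE id = ?",
                    ps -> { ps.setString(1, "bob"); ps.setInt(2, 1); });
            long delete = timed(conn, "DELETE FROM users WHERE id = ?",
                    ps -> ps.setInt(1, 1));

            System.out.printf("create=%dus read=%dus update=%dus delete=%dus%n",
                    create / 1_000, read / 1_000, update / 1_000, delete / 1_000);
        }
    }

    /** Measures the wall-clock latency of a single prepared-statement execution. */
    private static long timed(Connection conn, String sql, Binder binder) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            binder.bind(ps);
            long start = System.nanoTime();
            ps.execute();
            return System.nanoTime() - start;
        }
    }

    @FunctionalInterface
    interface Binder {
        void bind(PreparedStatement ps) throws Exception;
    }
}
```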
622

A Comparative Analysis of Database Management Systems for Time Series Data / En jämförelse av databashanteringssystem för tidsseriedata

Verner-Carlsson, Tove, Lomanto, Valerio January 2023 (has links)
Time series data refers to data recorded over time, often periodically, and can rapidly accumulate into vast quantities. To effectively present, analyse, or conduct research on such data, it must be stored in an accessible manner. For convenient storage, database management systems (DBMSs) are employed. There are numerous types of such systems, each with their own advantages and disadvantages, making different trade-offs between desired qualities. In this study we conduct a performance comparison between two contrasting DBMSs for time series data. The first system evaluated is PostgreSQL, a popular relational DBMS, equipped with the time series-specific extension TimescaleDB. The second comparand is MongoDB, one of the most well-known and widely used NoSQL systems, with out-of-the-box time series tailoring. We address the question of which of these DBMSs is better suited for time series data by comparing their query execution times. This involves setting up two databases populated with sample time series data, in our case publicly available weather data from the Swedish Meteorological and Hydrological Institute. Subsequently, a set of trial queries designed to mimic real-world use cases is executed against each database while their runtimes are measured. The benchmark results are compared and analysed query by query to identify relative performance differences. Our study finds considerable variation in the relative performance of the two systems, with PostgreSQL outperforming MongoDB in some queries (by up to more than two orders of magnitude) and MongoDB resulting in faster execution in others (by a factor of over 30 in one case). Based on these findings, we conclude that certain queries, and their corresponding real-world use cases, may be better suited to one of the two DBMSs due to the alignment between query structure and the strengths of that system. We further explore other possible explanations for our results, elaborating on factors impacting the efficiency with which each DBMS can execute the provided queries, and consider potential improvements. / As the amount of data worldwide grows exponentially, so does the need for efficient storage methods. One frequently occurring type of data is time series data, where each value is associated with a point in time, for example something measured once a day, once an hour, or at some other interval. Climate and weather data are one example. The Swedish Meteorological and Hydrological Institute collects measurements every minute from thousands of stations around the country, such as air temperature, wind speed, and precipitation. This quickly leads to enormous volumes of data, which must be stored in order to be effectively analysed, passed on, and preserved for posterity. Such storage takes place in databases. There are many different kinds of databases, of which the most common are relational databases and so-called NoSQL databases. In this thesis we examine two database management systems and their suitability for storing time series data. Specifically, we compare the performance of the relational database management system PostgreSQL, extended with the TimescaleDB extension that optimises the system for use with time series data, and the NoSQL system MongoDB, which has built-in time series support. We carry out the comparison by implementing two database instances, one per comparand, filled with SMHI's weather data, and then measuring the execution times of a number of selected tasks related to the processing of time series data.
The study finds that neither system consistently outperforms the other; which performs better varies by task. The results indicate that TimescaleDB is better at complex tasks and at tasks that involve extracting all data within a given time interval, whereas MongoDB performs better when only data from a subset of the measurement stations is requested.
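For the PostgreSQL/TimescaleDB side of such a benchmark, a trial query can be timed over JDBC as sketched below. This is a hypothetical illustration, not a query from the thesis: the table and column names are invented, and an equivalent MongoDB aggregation would be timed the same way through its own driver.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/** Times one time-bucketed aggregation over a hypothetical weather table. */
public class TimescaleQueryTiming {
    public static void main(String[] args) throws Exception {
        // TimescaleDB's time_bucket() groups observations into daily buckets per station.
        String sql = "SELECT station_id, time_bucket('1 day', observed_at) AS day, "
                   + "avg(air_temperature) AS mean_temp "
                   + "FROM weather_observations "
                   + "WHERE observed_at >= now() - interval '30 days' "
                   + "GROUP BY station_id, day ORDER BY station_id, day";

        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/weather", "user", "password");
             Statement stmt = conn.createStatement()) {

            long start = System.nanoTime();
            int rows = 0;
            try (ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    rows++; // drain the result set so the full execution cost is measured
                }
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(rows + " rows in " + elapsedMs + " ms");
        }
    }
}
```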
623

Database Selection Process in Very Small Enterprises in Software Development : A Case Study examining Factors, Methods, and Properties

Adolfsson, Teodor, Sundin, Axel January 2023 (has links)
This thesis investigates the database model selection process in VSEs, looking into how priorities and needs differ from what is proposed by existing theory in the area. The study was conducted as a case study of a two-person company engaged in developing various applications and performing consulting tasks. Data was collected through two semi-structured interviews. The first interview aimed to understand the company's process for selecting a database model, while the second focused on obtaining their perspective on any differences between their selection process and the theoretical recommendations and suggested methodology. The purpose was to investigate the important factors involved in the process and to explore why and how they deviated from what the theory proposes. The study concludes that VSEs have different priorities compared to larger enterprises. Factors like transaction volume do not have to be considered much at the scale of a VSE. It is more important to look at the total cost of the database solution, including making sure that the selected technology is sufficiently efficient to use in development and relatively easy to maintain. Regarding selection methodology, it was concluded that the time investment required to decide on the best available database solution can be better spent elsewhere in the enterprise, and that finding a good enough solution to get the wheels off the ground is likely a more profitable aim.
624

Mobile Framework for Real-Time Database Management

Jansson, Simon, Sandström, Theodor January 2017 (has links)
The primary purpose of this thesis is to explore what issues may arise during development of a framework for the handling and display of streamed real-time data. In addition, it investigates how the display of different types of data, along with a change of execution platform, impacts execution time. Through two case studies, each split into a developmental and an experimental phase, the thesis goes through the development of such a real-time data handling framework. The framework was developed in both stationary and mobile forms, and the developmental issues encountered along each of these paths are highlighted. Afterwards, the results gathered from performance tests run on each framework version were compared, in order to ascertain whether the handling and display of different data types, along with a change in execution platform, had had an impact on the framework's execution time. The developmental observations revealed that the most commonly encountered issues related to program latency, typically due to sub-optimal program architecture along with connectivity issues encountered during data streaming. The second most common issue concerned the choice of an appropriate display method for communicating changes in the displayed data, along with correlations between several tracked data points. The experimental comparisons revealed that while the impact on execution time caused by the use of calculated data, as opposed to raw data values, was marginal at most, a change of execution platform impacted said time drastically. By porting the framework to the mobile platform, the processes whose execution times were measured during the tests experienced increases in execution time ranging from 2405% all the way to 15860%. The authors recommend that the framework be developed towards the ability to connect to any given relational database, and to handle and display the data therein, so that it has application areas other than as a test instrument. Further, the authors recommend that additional tests be run on the framework using a wider variety of stationary and mobile devices, in order to determine whether the conclusions drawn from the results in the thesis hold up in the face of greater hardware variety. / The primary aim of this study is to explore which problems may arise during the development of a framework for handling and displaying streamed real-time data. In addition, it examines how displaying different data types, together with a change of execution platform, affects execution time. Through two case studies, each divided into a development phase and an experimentation phase, the study walks through the development of such a framework for handling real-time data. The framework was developed in both stationary and mobile form, and the development-related problems encountered in each case are highlighted. The results from performance tests run on all framework versions were then compared, to determine whether handling and displaying different data types, as well as a shift of execution platform, had affected the framework's execution time. The observations from development showed that the most common problem related to program latency, usually due to sub-optimal program architecture combined with connectivity problems.
The second most common problem concerned the choice of a suitable display method for conveying changes in the displayed data, as well as correlations between several tracked data points. The experimental comparisons showed that while the impact on execution time from using calculated data, as opposed to raw values, was marginal at most, the change of execution platform affected that time drastically. By porting the framework to the mobile platform, the processes whose execution times were measured during the tests saw increases ranging from 2405% all the way up to 15860%. The authors recommend that the framework be developed towards the ability to connect to an arbitrary database, and to handle and display its data, so that it has a field of use beyond serving as a test instrument. The authors further recommend that additional tests be run on the framework with a greater variety of stationary and mobile devices, to confirm whether the conclusions drawn from the results of this study still hold when exposed to more varied hardware.
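As a rough illustration of the kind of execution-time measurement such a comparison relies on (handling raw values versus calculated data), a minimal timing harness might look like the following. The data and the "calculation" are placeholders and are not taken from the thesis; only the measurement pattern is shown.

```java
/** Times the same data-handling step over raw values and over a derived (calculated) series. */
public class HandlingTimeProbe {
    public static void main(String[] args) {
        double[] raw = new double[100_000];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = Math.sin(i / 100.0);
        }

        long rawTime = time(() -> consume(raw));
        long calcTime = time(() -> consume(movingAverage(raw, 16)));

        System.out.printf("raw=%d us, calculated=%d us, overhead=%.1f%%%n",
                rawTime / 1_000, calcTime / 1_000,
                100.0 * (calcTime - rawTime) / rawTime);
    }

    /** Wall-clock time of a single run; a real benchmark would warm up and repeat. */
    static long time(Runnable r) {
        long start = System.nanoTime();
        r.run();
        return System.nanoTime() - start;
    }

    /** Stands in for "displaying" a batch of samples. */
    static void consume(double[] values) {
        double sink = 0;
        for (double v : values) sink += v;
        if (sink == Double.MIN_VALUE) System.out.println(sink); // keep the loop from being optimised away
    }

    /** A simple derived series, standing in for calculated data. */
    static double[] movingAverage(double[] in, int window) {
        double[] out = new double[in.length];
        double sum = 0;
        for (int i = 0; i < in.length; i++) {
            sum += in[i];
            if (i >= window) sum -= in[i - window];
            out[i] = sum / Math.min(i + 1, window);
        }
        return out;
    }
}
```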
625

Generell DDL-Generering: metodik för olika databashanterare : Undersökning av metoder för generisk DDL-kod-generering över olika databassystem

Gabrielsson, Andreas January 2023 (has links)
The purpose of this study was to develop a generic application that can generate DDL scripts from three different databases, Oracle, SQL Server and DB2, using only a JDBC connection. The need for the study stems from database administrators and developers having to manage databases efficiently across systems with differing syntax and structure. The work was carried out in the IntelliJ IDEA IDE, using the java.sql API for database operations. The results showed that, despite the differences between these databases, it was possible to develop a generic process for extracting DDL code with only a JDBC connection, although some specific adaptations were required for each database system. One observation concerned the handling of primary keys and indexes across the systems. The application has the potential to be developed further into a powerful database management tool that saves time and resources. Areas for future investigation include the handling of complex data types and structures, as well as performance with large databases. / This study aimed to develop a generic application capable of generating DDL code from three different databases, Oracle, SQL Server and DB2, by using JDBC. The need for this research originates from database administrators' and developers' need to manage databases effectively across systems with different syntax and structure. The process was conducted in the IntelliJ IDEA IDE using the java.sql API for database operations. The results showed that despite the differences between these databases it was possible to develop a generic process for extracting DDL code using only a JDBC connection. However, some specific adaptations were required for each database system. One observation concerned the handling of primary keys and indexes across the systems. This application has the potential to be developed further into a powerful tool for database management that saves time and resources. Areas for further investigation include the handling of complex data types and structures, as well as performance with large databases.
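A minimal sketch of the JDBC-metadata approach described above is shown below. It is not the thesis application: the connection URL and table name are hypothetical, and, as the abstract notes, real Oracle, SQL Server and DB2 output still needs system-specific adjustments, for example around primary keys and indexes.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

/** Builds a naive CREATE TABLE statement from java.sql.DatabaseMetaData only. */
public class DdlSketch {

    public static void main(String[] args) throws Exception {
        // Hypothetical DB2 connection; the same code runs against Oracle or SQL Server
        // with their respective JDBC URLs and drivers.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:db2://localhost:50000/sample", "user", "password")) {
            System.out.println(createTableDdl(conn, "EMPLOYEES"));
        }
    }

    static String createTableDdl(Connection conn, String table) throws Exception {
        DatabaseMetaData meta = conn.getMetaData();
        List<String> columns = new ArrayList<>();

        try (ResultSet rs = meta.getColumns(null, null, table, "%")) {
            while (rs.next()) {
                String col = rs.getString("COLUMN_NAME") + " " + rs.getString("TYPE_NAME");
                int size = rs.getInt("COLUMN_SIZE");
                if (size > 0) {
                    col += "(" + size + ")"; // naive: not every type actually takes a length
                }
                if ("NO".equals(rs.getString("IS_NULLABLE"))) {
                    col += " NOT NULL";
                }
                columns.add(col);
            }
        }

        // Primary key columns are reported separately (ordering by KEY_SEQ omitted for brevity).
        List<String> pk = new ArrayList<>();
        try (ResultSet rs = meta.getPrimaryKeys(null, null, table)) {
            while (rs.next()) {
                pk.add(rs.getString("COLUMN_NAME"));
            }
        }
        if (!pk.isEmpty()) {
            columns.add("PRIMARY KEY (" + String.join(", ", pk) + ")");
        }

        StringJoiner body = new StringJoiner(",\n  ", "CREATE TABLE " + table + " (\n  ", "\n)");
        columns.forEach(body::add);
        return body.toString();
    }
}
```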
626

Data Management Support for Notification Services

Lehner, Wolfgang 17 July 2023 (has links)
Database management systems are highly specialized to efficiently organize and process huge amounts of data in a transactional manner. In recent years, however, database management systems have evolved into a central hub for the integration of mostly heterogeneous and autonomous data sources to provide homogenized data access. The next step in pushing database technology forward to play the role of an information marketplace is to actively notify registered users about incoming messages or changes in the underlying data set. Notification services may therefore be seen as a generic term for subscription systems or, more generally, data stream systems, both of which enable processing of standing queries over transient data. This article gives a comprehensive introduction to the context of notification services by outlining how they differ from the classical query/response-based communication pattern, illustrating potential application areas, and discussing requirements on the underlying data management support. In more depth, the article describes the core concepts of the PubScribe project from three different perspectives. From the first perspective, the subscription process and its mapping onto the primitive publish/subscribe communication pattern is explained. The second part focuses on a hybrid subscription data model by describing the basic constructs from a structural as well as an operational point of view. Finally, the PubScribe notification service project is characterized by a storage and processing model based on relational database technology. To summarize, this contribution introduces the idea of notification services from an application point of view by inverting the database approach and dealing with persistent queries and transient data. Moreover, the article provides insight into the database technology that must be exploited and adapted to provide a solid base for a scalable notification infrastructure, using the PubScribe project as an example.
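The inversion described above, persistent queries over transient data, can be illustrated in a few lines of code. The sketch below is a generic publish/subscribe skeleton, not PubScribe's actual subscription model; the event type and the example predicate are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Standing queries (subscriptions) persist; events flow past them and are then discarded. */
public class NotificationSketch {

    record PriceEvent(String symbol, double price) {}

    /** A standing query: a filter plus the action to run on every match. */
    record Subscription(Predicate<PriceEvent> filter, Consumer<PriceEvent> action) {}

    private final List<Subscription> subscriptions = new ArrayList<>();

    void subscribe(Predicate<PriceEvent> filter, Consumer<PriceEvent> action) {
        subscriptions.add(new Subscription(filter, action)); // the query is stored, not the data
    }

    /** Each incoming event is evaluated against all standing queries, then dropped. */
    void publish(PriceEvent event) {
        for (Subscription s : subscriptions) {
            if (s.filter().test(event)) {
                s.action().accept(event);
            }
        }
    }

    public static void main(String[] args) {
        NotificationSketch broker = new NotificationSketch();
        broker.subscribe(e -> e.symbol().equals("ACME") && e.price() > 100.0,
                e -> System.out.println("notify: " + e));

        broker.publish(new PriceEvent("ACME", 99.5));  // no notification
        broker.publish(new PriceEvent("ACME", 101.2)); // subscriber is notified
    }
}
```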
627

State Management for Efficient Event Pattern Detection

Zhao, Bo 20 May 2022 (has links)
Event stream processing (ESP) systems monitor continuous data streams in order to evaluate user-defined queries. The challenge is that query processing is stateful and that the number of partial matches grows exponentially with the number of processed events. The dynamicity of streams and the need to integrate remote data complicate state management. First, heterogeneous event sources deliver streams with unpredictable input rates and query selectivities. During peak times, exhaustive processing is impossible, and systems must fall back to best-effort processing. Second, queries may require external data in order to select a specific event for a query. Such dependencies are problematic: fetching the data interrupts stream processing, yet without event selection based on external data the growth of partial matches is amplified. In this dissertation I present strategies for optimised state management in ESP systems. First, I enable best-effort processing by means of load shedding, in which both input events and partial matches are systematically discarded in order to guarantee a latency bound with minimal loss of quality. Second, I integrate external data by decoupling its retrieval from its use in query processing. An efficient caching mechanism avoids interruptions caused by transmission latencies: external data is prefetched based on its anticipated use and is taken into account during event selection via lazy evaluation. A cost model determines when which external data should be fetched and how long it should be kept in the cache. I evaluated and demonstrated the effectiveness and efficiency of the proposed strategies on synthetic and real-world data. / Event stream processing systems continuously evaluate queries over event streams to detect user-specified patterns with low latency. However, the challenge is that query processing is stateful and maintains partial matches that grow exponentially with the number of processed events. State management is complicated by the dynamicity of streams and the need to integrate remote data. First, heterogeneous event sources yield dynamic streams with unpredictable input rates, data distributions, and query selectivities. During peak times, exhaustive processing is unreasonable, and systems must resort to best-effort processing. Second, queries may require remote data to select a specific event for a pattern. Such dependencies are problematic: fetching the remote data interrupts the stream processing. Yet, without event selection based on remote data, the growth of partial matches is amplified. In this dissertation, I present strategies for optimised state management in event pattern detection. First, I enable best-effort processing with load shedding that discards both input events and partial matches. I carefully select the shedding elements to satisfy a latency bound while striving for a minimal loss in result quality. Second, to efficiently integrate remote data, I decouple the fetching of remote data from its use in query evaluation by a caching mechanism.
To this end, I hide the transmission latency by prefetching remote data based on anticipated use and by lazy evaluation that postpones the event selection based on remote data to avoid interruptions. A cost model is used to determine when to fetch which remote data items and how long to keep them in the cache. I evaluated the above techniques with queries over synthetic and real-world data. I show that the load shedding technique significantly improves the recall of pattern detection over baseline approaches, while the technique for remote data integration significantly reduces the pattern detection latency.
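To make the load-shedding idea concrete, the toy sketch below drops input events once a latency bound would be violated. It assumes a simplified single-queue model with a fixed bound; the dissertation's actual strategies also shed partial matches and choose what to drop based on the estimated loss in result quality.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy input-event load shedder: admit an event only while the queue still meets the latency bound. */
public class LoadSheddingSketch {

    record Event(long id, long arrivalNanos) {}

    private final Deque<Event> queue = new ArrayDeque<>();
    private final long latencyBoundNanos;
    private long dropped = 0;

    LoadSheddingSketch(long latencyBoundMillis) {
        this.latencyBoundNanos = latencyBoundMillis * 1_000_000;
    }

    /** Sheds the incoming event if the oldest queued event already exceeds the bound. */
    void offer(Event e) {
        Event oldest = queue.peekFirst();
        boolean overloaded = oldest != null
                && (e.arrivalNanos() - oldest.arrivalNanos()) > latencyBoundNanos;
        if (overloaded) {
            dropped++; // best-effort processing: shed instead of queueing further
        } else {
            queue.addLast(e);
        }
    }

    Event poll() {
        return queue.pollFirst();
    }

    long droppedCount() {
        return dropped;
    }

    public static void main(String[] args) {
        LoadSheddingSketch shedder = new LoadSheddingSketch(10); // 10 ms bound
        long t0 = System.nanoTime();
        for (int i = 0; i < 1_000; i++) {
            shedder.offer(new Event(i, t0 + i * 50_000L)); // one event every 0.05 ms, never drained
        }
        System.out.println("dropped " + shedder.droppedCount() + " of 1000 events");
    }
}
```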
628

Memory-Efficient Frequent-Itemset Mining

Schlegel, Benjamin, Gemulla, Rainer, Lehner, Wolfgang 15 September 2022 (has links)
Efficient discovery of frequent itemsets in large datasets is a key component of many data mining tasks. In-core algorithms, which operate entirely in main memory and avoid expensive disk accesses, and in particular the prefix-tree-based algorithm FP-growth, are generally among the most efficient of the available algorithms. Unfortunately, their excessive memory requirements render them inapplicable for large datasets with many distinct items and/or itemsets of high cardinality. To overcome this limitation, we propose two novel data structures, the CFP-tree and the CFP-array, which reduce memory consumption by about an order of magnitude. This allows us to process significantly larger datasets in main memory than previously possible. Our data structures are based on structural modifications of the prefix tree that increase compressibility, an optimized physical representation, lightweight compression techniques, and intelligent node ordering and indexing. Experiments with both real-world and synthetic datasets show the effectiveness of our approach.
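The basic reason prefix-tree structures such as the FP-tree are compact, namely that transactions sharing a prefix share nodes, can be seen in the small sketch below. This is not the CFP-tree or CFP-array of the paper, which add structural modifications, physical compression, and node ordering on top; it only shows the shared-prefix idea they build on.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Minimal counting prefix tree: transactions with a common prefix reuse the same nodes. */
public class PrefixTreeSketch {

    static final class Node {
        final String item;
        int count = 0;
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(String item) {
            this.item = item;
        }
    }

    private final Node root = new Node(null);

    /** Items are assumed to be pre-sorted by descending global frequency, as in FP-growth. */
    void insert(List<String> transaction) {
        Node current = root;
        for (String item : transaction) {
            current = current.children.computeIfAbsent(item, Node::new);
            current.count++; // shared prefix nodes just increment their count
        }
    }

    int nodeCount() {
        return countNodes(root) - 1; // exclude the artificial root
    }

    private static int countNodes(Node n) {
        int total = 1;
        for (Node child : n.children.values()) {
            total += countNodes(child);
        }
        return total;
    }

    public static void main(String[] args) {
        PrefixTreeSketch tree = new PrefixTreeSketch();
        tree.insert(Arrays.asList("a", "b", "c"));
        tree.insert(Arrays.asList("a", "b", "d"));
        tree.insert(Arrays.asList("a", "e"));
        // 8 items inserted, but only 5 tree nodes because the shared "a" and "a b" prefixes are stored once.
        System.out.println("nodes: " + tree.nodeCount());
    }
}
```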
629

The design of a database of resources for rational therapy

Steyn, Genevieve Lee 06 1900 (has links)
The purpose of this study is to design a database of resources for rational therapy. An investigation of the current health situation and the reorientation towards primary health care (PHC) in South Africa revealed the need for a database of resources that would meet the demand for rational therapy information placed on the Helderberg College Library by various user groups, as well as contribute to the national health information infrastructure. Rational therapy is viewed as an approach within PHC that is rational, common-sense, holistic and credible, focusing on the prevention and maintenance of health. A model of the steps in database design was developed. A user study identified users' requirements for the design, and the conceptual schema was developed. The entities, attributes, relationships and policies were presented and graphically summarised in an Entity-Relationship (E-R) diagram. The conceptual schema is the blueprint for further design and implementation of the database. / Information Science / M.Inf.
630

Role-based Data Management

Jäkel, Tobias 29 May 2017 (has links) (PDF)
Database systems form an integral component of today's software systems, and as such they are the central point for storing and sharing a software system's data while ensuring global data consistency at the same time. Introducing the primitives of roles and their accompanying metatype distinction in modeling and programming languages results in a novel paradigm for designing, extending, and programming modern software systems. In detail, roles as a modeling concept enable a separation of concerns within an entity. Along with its rigid core, an entity may acquire various roles in different contexts during its lifetime and thus adapt its behavior and structure dynamically at runtime. Unfortunately, database systems, as an important component and global consistency provider of such systems, do not keep pace with this trend. The absence of a metatype distinction, in terms of an entity's separation of concerns, in the database system results in various problems for the software system in general, for the application developers, and finally for the database system itself. In the case of relational database systems, these problems are collected under the term role-relational impedance mismatch. In particular, the whole software system is designed using different semantics on its various layers. For role-based software systems in combination with relational database systems, this gap in semantics between the applications and the database system increases dramatically. Consequently, the database system cannot directly represent the richer semantics of roles or the accompanying consistency constraints. These constraints have to be ensured by the applications, and the database system loses its single-point-of-truth characteristic in the software system. As the applications are in charge of guaranteeing global consistency, their development requires more effort in data management. Moreover, the software system's data management is distributed over several layers, which results in an unstructured software system architecture. To overcome the role-relational impedance mismatch and bring the database system back to its rightful position as the single point of truth in a software system, this thesis introduces the novel, tripartite RSQL approach. It combines a novel database model that represents the metatype distinction as a first-class citizen in a database system, an adapted query language built on that database model, and a proper result representation. More precisely, RSQL's logical database model introduces Dynamic Data Types to directly represent the separation of concerns within an entity type at the schema level. At the instance level, the database model defines the notion of a Dynamic Tuple, which combines an entity with the notion of roles and thus allows for dynamic structure adaptations at runtime without changing an entity's overall type. These definitions form the main data structures on which the database system operates. Moreover, formal operators connecting the query language statements with the database model's data structures complete the database model. The query language, as the external database system interface, features dedicated data definition, data manipulation, and data query languages. Their statements directly represent the metatype distinction in order to address Dynamic Data Types and Dynamic Tuples, respectively. As a consequence of the novel data structures, the query processing of Dynamic Tuples is completely redesigned.
As the last piece of a complete database integration of the role-based notion and its accompanying metatype distinction, we specify the RSQL Result Net as the result representation. It provides a novel result structure and functionality to navigate through query results. Finally, we evaluate all three RSQL components in comparison to a relational database system. This assessment clearly demonstrates the benefits of fully integrating the role concept into the database.
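The role concept that RSQL lifts into the database can be illustrated on the application side with a small sketch like the one below. It is not RSQL or its query language: the Person, Student, and Employee types are invented, and the point is only that a single entity acquires and abandons roles, and with them structure and behavior, at runtime.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** A rigid core entity that plays and drops roles at runtime without changing its own type. */
public class RoleSketch {

    interface Role {}

    /** Context-dependent state and behaviour live in roles, not in the core. */
    record Student(String university) implements Role {}
    record Employee(String company, double salary) implements Role {}

    /** The rigid core: identity plus the roles it currently plays. */
    static final class Person {
        final String name;
        private final Map<Class<? extends Role>, Role> roles = new LinkedHashMap<>();

        Person(String name) {
            this.name = name;
        }

        void acquire(Role role) {
            roles.put(role.getClass(), role); // structure changes at runtime
        }

        void abandon(Class<? extends Role> type) {
            roles.remove(type);
        }

        <R extends Role> Optional<R> as(Class<R> type) {
            return Optional.ofNullable(type.cast(roles.get(type)));
        }
    }

    public static void main(String[] args) {
        Person alice = new Person("Alice");
        alice.acquire(new Student("TU Dresden"));
        alice.acquire(new Employee("ACME", 42_000));

        alice.as(Employee.class)
             .ifPresent(e -> System.out.println(alice.name + " earns " + e.salary()));

        alice.abandon(Student.class); // same entity, adapted dynamically
        System.out.println("still a student? " + alice.as(Student.class).isPresent());
    }
}
```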
