Global ETD Search

41	Object oriented database management systems Nassis, Antonios 11 1900 (has links) Modern data-intensive applications, such as multimedia systems, require the ability to store and manipulate complex data. Classical database management systems (DBMSs), such as relational databases, cannot support these types of applications efficiently. This dissertation presents the salient features of Object Database Management Systems (ODBMS) and Persistent Programming Languages (PPL), which have been developed to address the data management needs of these demanding applications. An 'impedance mismatch' problem occurs in traditional DBMSs because the data and computational aspects of an application are implemented using two different systems: a query language and a programming language. PPLs provide facilities for both persistent and transient data within the same language, thereby avoiding the impedance mismatch problem. This dissertation presents a method of implementing a PPL by extending the C++ language with pre-compiled classes. The classes are first developed and then used to implement object persistence in two simple applications. / Computing / M. Sc. (Information Systems) Database management systems Database programming languages Object orientated management systems Object orientated technology Object persistence Persistence Persistent programming languages Pointer persistence 005.757 Database management Object-oriented databases
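The persistent-programming idea above, handling transient and persistent objects in one language so that no translation into a separate query language is needed, can be illustrated with a minimal Python sketch. The thesis itself extends C++ with pre-compiled classes; the `ObjectStore` and `Persistent` classes below are hypothetical stand-ins that only show the concept, with an in-memory dictionary in place of a real persistent store.

```python
import pickle
from typing import Dict


class ObjectStore:
    """Hypothetical in-memory stand-in for a persistent object store."""

    def __init__(self) -> None:
        self._pages: Dict[str, bytes] = {}

    def write(self, oid: str, data: bytes) -> None:
        self._pages[oid] = data

    def read(self, oid: str) -> bytes:
        return self._pages[oid]


class Persistent:
    """Hypothetical base class: subclasses gain save/restore by object id."""

    def persist(self, store: ObjectStore, oid: str) -> None:
        # Serialise the object graph rooted at self; no mapping to tables or SQL.
        store.write(oid, pickle.dumps(self))

    @staticmethod
    def restore(store: ObjectStore, oid: str) -> "Persistent":
        return pickle.loads(store.read(oid))


class Customer(Persistent):
    def __init__(self, name: str, orders: list) -> None:
        self.name = name
        self.orders = orders


store = ObjectStore()
alice = Customer("Alice", ["book", "lamp"])  # an ordinary transient object
alice.persist(store, "customer:1")           # the same kind of object, now persisted
restored = Persistent.restore(store, "customer:1")
print(restored.name, restored.orders)
```

Some persistent programming languages make persistence more transparent, for example by treating every object reachable from a persistent root as persistent; the explicit `persist` call here is a simplification.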
42	An evaluation of non-relational database management systems as suitable storage for user generated text-based content in a distributed environment Du Toit, Petrus 07 October 2016 (has links) Non-relational database management systems address some of the limitations that relational database management systems have when storing large volumes of unstructured, user-generated text-based data in distributed environments. They differ in the data model they use, their ability to scale data storage over distributed servers, and the programming interface they provide. An experimental approach was used to measure how well these alternative database management systems address the limitations of relational databases in four areas: storing unstructured text-based data, data warehousing, scaling data storage across distributed servers, and the level of programming abstraction they provide. The results of the research highlighted the limitations of relational database management systems. The different non-relational systems address certain limitations, but not all. Document-oriented databases provide the best results and successfully address the need to store large volumes of user-generated text-based data in a distributed environment. / School of Computing / M. Sc. (Computer Science) Relational databases Database performance measurement Distributed databases Column-oriented databases Key/value databases Database benchmarking Database management systems Horizontal scalability 005.756 Non-relational databases Database management
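The two properties compared above, a schema-less document model and horizontal scaling across servers, can be sketched together in Python. This sketch assumes hash-based sharding by document key, a common strategy but not necessarily the one used by the systems evaluated in the thesis; `DocumentCluster` and its methods are hypothetical.

```python
import hashlib
from typing import Any, Dict, List


class DocumentCluster:
    """Hypothetical schema-less document store spread over n in-memory 'servers'."""

    def __init__(self, num_servers: int) -> None:
        self.servers: List[Dict[str, Dict[str, Any]]] = [{} for _ in range(num_servers)]

    def _shard(self, key: str) -> int:
        # A stable hash (unlike Python's salted hash()) keeps placement identical across runs.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.servers)

    def put(self, key: str, document: Dict[str, Any]) -> None:
        # Documents need no fixed schema: each one may carry different fields.
        self.servers[self._shard(key)][key] = document

    def get(self, key: str) -> Dict[str, Any]:
        return self.servers[self._shard(key)][key]

    def find(self, field: str, value: Any) -> List[Dict[str, Any]]:
        # Scatter-gather query: every server is scanned and the results are merged.
        return [doc for server in self.servers for doc in server.values()
                if doc.get(field) == value]


cluster = DocumentCluster(num_servers=3)
cluster.put("post:1", {"author": "ana", "text": "First post", "tags": ["intro"]})
cluster.put("post:2", {"author": "ben", "text": "Reply", "reply_to": "post:1"})
cluster.put("post:3", {"author": "ana", "text": "Follow-up"})
print(len(cluster.find("author", "ana")))
```

Plain modulo hashing relocates most documents when a server is added; consistent hashing is one common way to reduce that movement.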
43	A Comparative Analysis of Database Management Systems for Time Series Data / En jämförelse av databashanteringssystem för tidsseriedata Verner-Carlsson, Tove, Lomanto, Valerio January 2023 (has links) Time series data refers to data recorded over time, often periodically, and can rapidly accumulate into vast quantities. To effectively present, analyse, or conduct research on such data, it must be stored in an accessible manner. For convenient storage, database management systems (DBMSs) are employed. There are numerous types of such systems, each with its own advantages and disadvantages, making different trade-offs between desired qualities. In this study, we conduct a performance comparison between two contrasting DBMSs for time series data. The first system evaluated is PostgreSQL, a popular relational DBMS, equipped with the time series-specific extension TimescaleDB. The second comparand is MongoDB, one of the most well-known and widely used NoSQL systems, with out-of-the-box time series support. We address the question of which of these DBMSs is better suited for time series data by comparing their query execution times. This involves setting up two databases populated with sample time series data, in our case publicly available weather data from the Swedish Meteorological and Hydrological Institute (SMHI). Subsequently, a set of trial queries designed to mimic real-world use cases is executed against each database while their runtimes are measured. The benchmark results are compared and analysed query by query to identify relative performance differences. Our study finds considerable variation in the relative performance of the two systems, with PostgreSQL outperforming MongoDB in some queries (by more than two orders of magnitude) and MongoDB executing faster in others (by a factor of over 30 in one case).
Based on these findings, we conclude that certain queries, and their corresponding real-world use cases, may be better suited to one of the two DBMSs due to the alignment between query structure and the strengths of that system. We further explore other possible explanations for our results, elaborating on factors that affect how efficiently each DBMS can execute the provided queries, and consider potential improvements. / As the amount of data worldwide grows exponentially, so does the need for efficient storage methods. A common type of data is time series data, where each value is associated with a point in time. It may be, for example, something measured once a day, once an hour, or at some other interval. One example of such data is climate and weather data. Every minute, the Swedish Meteorological and Hydrological Institute (SMHI) collects measurements such as air temperature, wind speed, and precipitation from thousands of weather stations across the country. This quickly produces enormous volumes of data, which must be stored so that they can be efficiently analysed, disseminated, and preserved for posterity. Such storage takes place in databases. There are many types of databases, the most common being relational databases and so-called NoSQL databases. In this thesis, we examine two database management systems and their suitability for storing time series data. Specifically, we compare the performance of the relational database management system PostgreSQL, extended with TimescaleDB, which optimises the system for time series data, and the NoSQL system MongoDB, which has built-in time series support. We carry out the comparison by setting up two database instances, one per comparand, populated with SMHI's weather data, and then measuring the execution times of a selected set of tasks related to time series processing.
The study finds that neither system consistently outperforms the other; the outcome depends on the task. The results indicate that TimescaleDB is better at complex tasks and at tasks that involve retrieving all data within a given time interval, whereas MongoDB performs better when only data from a subset of the weather stations is requested. Database Management Systems PostgreSQL TimescaleDB MongoDB Time Series Database Comparison Performance Analysis Databashanteringssystem PostgreSQL TimescaleDB MongoDB Tidsserier Databasjämförelse Prestandaanalys Computer and Information Sciences Data- och informationsvetenskap
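Many of the time series queries described above aggregate readings into fixed time intervals, restricted either to a time range or to a subset of stations. A minimal Python sketch of such a bucketed aggregation, loosely modelled on TimescaleDB's `time_bucket` function, is shown below; the `Reading` record and the `bucket_average` helper are hypothetical and are not the queries used in the study.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, NamedTuple, Optional, Set, Tuple


class Reading(NamedTuple):
    station: str
    ts: datetime
    temperature: float


def bucket_start(ts: datetime, width: timedelta, origin: datetime) -> datetime:
    # Align a timestamp to the start of its fixed-width bucket (counted from origin).
    return origin + ((ts - origin) // width) * width


def bucket_average(readings: Iterable[Reading], width: timedelta,
                   start: datetime, end: datetime,
                   stations: Optional[Set[str]] = None,
                   ) -> Dict[Tuple[str, datetime], float]:
    """Mean temperature per (station, bucket) for readings in [start, end)."""
    sums: Dict[Tuple[str, datetime], float] = defaultdict(float)
    counts: Dict[Tuple[str, datetime], int] = defaultdict(int)
    for r in readings:
        if not start <= r.ts < end:
            continue  # time-range predicate
        if stations is not None and r.station not in stations:
            continue  # optional station-subset predicate
        key = (r.station, bucket_start(r.ts, width, start))
        sums[key] += r.temperature
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


t0 = datetime(2023, 1, 1)
data = [
    Reading("A", t0 + timedelta(minutes=10), 1.0),
    Reading("A", t0 + timedelta(minutes=50), 3.0),
    Reading("B", t0 + timedelta(minutes=70), 5.0),
]
hourly = bucket_average(data, timedelta(hours=1), t0, t0 + timedelta(hours=2))
only_b = bucket_average(data, timedelta(hours=1), t0, t0 + timedelta(hours=2), {"B"})
print(hourly)
print(only_b)
```

In a real deployment these filters would be pushed into the database query (for example a `WHERE` clause on the time column and station identifier) so that indexes or time-based partitions can prune data before aggregation.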
44	State Management for Efficient Event Pattern Detection Zhao, Bo 20 May 2022 (has links) Event stream processing (ESP) systems monitor continuous data streams to evaluate user-defined queries. The challenge is that query processing is stateful and the number of partial matches grows exponentially with the number of processed events. The dynamics of streams and the need to integrate remote data complicate state management. First, heterogeneous event sources deliver streams with unpredictable input rates and query selectivities. During peak times, exhaustive processing is impossible, and systems must resort to best-effort processing. Second, queries may require external data to select a particular event for a query. Such dependencies are problematic: fetching the data interrupts stream processing. Without event selection based on external data, the growth of partial matches is amplified. In this dissertation, I present strategies for optimised state management in ESP systems. First, I enable best-effort processing by means of load shedding, in which both input events and partial matches are systematically discarded to guarantee a latency threshold with minimal loss of quality. Second, I integrate external data by decoupling its retrieval from its use in query processing. An efficient caching mechanism avoids interruptions caused by transmission latencies: external data is prefetched based on its expected use and taken into account during event selection through lazy evaluation. A cost model determines when to fetch which external data and how long to keep it in the cache.
I evaluated and demonstrated the effectiveness and efficiency of the proposed strategies using synthetic and real-world data. / Event stream processing systems continuously evaluate queries over event streams to detect user-specified patterns with low latency. However, query processing is stateful, and it maintains partial matches that grow exponentially in the number of processed events. State management is complicated by the dynamic nature of streams and the need to integrate remote data. First, heterogeneous event sources yield dynamic streams with unpredictable input rates, data distributions, and query selectivities. During peak times, exhaustive processing is infeasible, and systems must resort to best-effort processing. Second, queries may require remote data to select a specific event for a pattern. Such dependencies are problematic: fetching the remote data interrupts the stream processing. Yet, without event selection based on remote data, the growth of partial matches is amplified. In this dissertation, I present strategies for optimised state management in event pattern detection. First, I enable best-effort processing with load shedding that discards both input events and partial matches. I carefully select the elements to shed so as to satisfy a latency bound while striving for a minimal loss in result quality. Second, to efficiently integrate remote data, I decouple the fetching of remote data from its use in query evaluation by means of a caching mechanism. To this end, I hide the transmission latency by prefetching remote data based on anticipated use and by lazy evaluation that postpones event selection based on remote data to avoid interruptions. A cost model is used to determine when to fetch which remote data items and how long to keep them in the cache. I evaluated the above techniques with queries over synthetic and real-world data.
I show that the load shedding technique significantly improves the recall of pattern detection over baseline approaches, while the technique for remote data integration significantly reduces the pattern detection latency. Datenstromverarbeitung Complex event processing Mustererkennung Datenbankmanagementsystem Data stream processing Complex event processing Pattern detection Database management systems 004 Informatik ST 265 ddc:004
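The load-shedding idea, discarding partial matches when their number threatens a latency bound, can be illustrated with a simplified Python sketch. It assumes the latency bound can be approximated by a maximum number of live partial matches (`budget`) and that each partial match has a utility score estimating its chance of completing; the pattern, the budget, and the utility function are hypothetical simplifications, not the cost model from the dissertation, and unlike the dissertation the sketch sheds only partial matches, not input events.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, List, Tuple

PATTERN = ("A", "B", "C")  # hypothetical sequence pattern SEQ(A, B, C)


@dataclass
class PartialMatch:
    events: Tuple[str, ...]  # event types matched so far, e.g. ("A", "B")


def detect(stream: List[str], budget: int,
           utility: Callable[[PartialMatch], float]) -> Tuple[int, int]:
    """Count matches of PATTERN under skip-till-any-match semantics (no time
    window), shedding the lowest-utility partial matches whenever more than
    `budget` are alive. Returns (complete matches, partial matches shed)."""
    partials: List[PartialMatch] = []
    complete = shed = 0
    for event in stream:
        new: List[PartialMatch] = []
        for pm in partials:
            if PATTERN[len(pm.events)] == event:
                extended = pm.events + (event,)
                if len(extended) == len(PATTERN):
                    complete += 1
                else:
                    new.append(PartialMatch(extended))
        if event == PATTERN[0]:
            new.append(PartialMatch((event,)))
        # Any-match semantics: old partial matches stay alive next to their extensions.
        partials.extend(new)
        if len(partials) > budget:
            shed += len(partials) - budget
            partials = heapq.nlargest(budget, partials, key=utility)
    return complete, shed


def closer_to_done(pm: PartialMatch) -> float:
    # Hypothetical utility: prefer partial matches that are closer to completion.
    return float(len(pm.events))


stream = ["A", "A", "B", "A", "B", "C"]
print(detect(stream, budget=100, utility=closer_to_done))
print(detect(stream, budget=3, utility=closer_to_done))
```

With a budget of 3, this sketch finds 3 of the 5 matches that exhaustive processing finds on the sample stream; choosing which elements to shed so that this loss stays small is where the design effort of a real load shedder lies.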
45	Density-Aware Linear Algebra in a Column-Oriented In-Memory Database System Kernert, David 20 September 2016 (has links) (PDF) Linear algebra operations appear in nearly every application in advanced analytics and machine learning, and in various scientific domains. To date, many data analysts and scientists tend to use statistics software packages or hand-crafted solutions for their analysis. In the era of data deluge, however, external statistics packages and custom analysis programs, which often run on single workstations, cannot keep up with the vast increase in data volume. In particular, scientists increasingly demand large-scale data manipulation, orchestration, and advanced data management capabilities. These are among the key features of a mature relational database management system (DBMS). With the rise of main-memory database systems, it has now become feasible to also consider applications built on linear algebra. This thesis presents a deep integration of linear algebra functionality into an in-memory column-oriented database system. In particular, this work shows that it has become feasible to execute linear algebra queries on large data sets directly in a DBMS-integrated engine (LAPEG), without transferring data and without being restricted by hard disk latencies. From various application examples cited in this work, we deduce a number of requirements that are relevant for a database system that includes linear algebra functionality. Besides the deep integration of matrices and numerical algorithms, these include optimization of expressions, transparent matrix handling, scalability and data parallelism, and data manipulation capabilities. These requirements are addressed by our linear algebra engine.
In particular, the core contributions of this thesis are as follows. First, we show that the columnar storage layer of an in-memory DBMS makes it easy to adopt efficient sparse matrix data types and algorithms. Furthermore, we show that the execution of linear algebra expressions benefits significantly from various techniques inspired by database technology. In a novel way, we implemented several of these optimization strategies in LAPEG’s optimizer (SpMachO), which uses an advanced density estimation method (SpProdest) to predict the matrix density of intermediate results. Moreover, we present an adaptive matrix data type, AT Matrix, that spares scientists from having to select appropriate matrix representations. The tiled substructure of AT Matrix is exploited by our matrix multiplication to saturate the different sockets of a multicore main-memory platform, achieving a speed-up of up to 6x compared to alternative approaches. Finally, a major part of this thesis is devoted to data manipulation, where we propose a matrix manipulation API and present different mutable matrix types to enable fast insertions and deletions. We conclude that our linear algebra engine is well suited to processing dynamic, large matrix workloads in an optimized way. In particular, the DBMS-integrated LAPEG fills the linear algebra gap and makes columnar in-memory DBMSs attractive as efficient, scalable ad-hoc analysis platforms for scientists. Main-Memory Datenbanksysteme Spaltenorientierte DBMS Lineare Algebra Implementierung Matrixdatenstrukturen Ausdrucksoptimierung in-memory database management systems column-oriented DBMS linear algebra implementation matrix data structures expression optimization ddc:004 rvk:ST 270
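To see why predicting the density of intermediate results matters, consider the simplest estimator for the product C = AB of an m×k matrix A with density ρ_A and a k×n matrix B with density ρ_B. Assuming non-zeros are placed uniformly and independently and ignoring numerical cancellation, c_ij is non-zero if at least one of the k products a_ip·b_pj is non-zero, giving ρ_C ≈ 1 − (1 − ρ_A ρ_B)^k. The Python sketch below implements only this baseline; it is not SpProdest, which the thesis describes as a more advanced method. The dense/sparse threshold in `choose_representation` is a hypothetical value, loosely in the spirit of (but not identical to) the adaptive AT Matrix type.

```python
import random
from typing import List

Matrix = List[List[float]]


def estimate_product_density(rho_a: float, rho_b: float, k: int) -> float:
    """Baseline density estimate for C = A @ B (A is m x k, B is k x n),
    assuming independent, uniformly placed non-zeros and no cancellation."""
    return 1.0 - (1.0 - rho_a * rho_b) ** k


def choose_representation(density: float, threshold: float = 0.3) -> str:
    # Hypothetical cut-off: above it, dense storage is assumed to be cheaper.
    return "dense" if density > threshold else "sparse"


def random_sparse(rows: int, cols: int, rho: float, rng: random.Random) -> Matrix:
    return [[1.0 if rng.random() < rho else 0.0 for _ in range(cols)]
            for _ in range(rows)]


def product_density(a: Matrix, b: Matrix) -> float:
    # Structural density of A @ B: c_ij counts as non-zero if any a_ip * b_pj != 0.
    m, k, n = len(a), len(b), len(b[0])
    nnz = sum(1 for i in range(m) for j in range(n)
              if any(a[i][p] != 0.0 and b[p][j] != 0.0 for p in range(k)))
    return nnz / (m * n)


rng = random.Random(7)
a = random_sparse(100, 50, 0.05, rng)
b = random_sparse(50, 100, 0.05, rng)
estimate = estimate_product_density(0.05, 0.05, 50)
print(round(estimate, 3), round(product_density(a, b), 3),
      choose_representation(estimate))
```

On random matrices the estimate tracks the measured density closely; on real matrices with skewed non-zero patterns the independence assumption can fail badly, which is the motivation for more advanced estimators.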
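46	網際網路資料庫系統績效評估方法之研究--以電子商務為例 / Web database systems benchmark method - electronic commerce orientation 程文成, Cheng, Wen-Cheng Unknown Date (has links) Since the Internet was opened to commercial use, all kinds of business opportunities have emerged online. Businesses large and small have set up a presence on the web, whether to advertise and promote their corporate image or to showcase their products and product features, and some go further by letting consumers transact with them directly online; we refer to all of this broadly as electronic commerce. Behind electronic commerce, every website must be backed by a database in order to deliver good efficiency and service. This study takes Internet databases as its subject and their performance as its objective. Using benchmarking theory as the foundation throughout, it builds, step by step, a model suited to evaluating the performance of Internet databases and their operations. Once the model was complete, it was implemented as a program and tested using the data obtained. / Many business opportunities have emerged since commercial activity was officially allowed on the Internet. The power and potential of this market are reflected in the many websites being set up. "Electronic commerce" is spreading worldwide. However, what counts is the database that keeps business websites running. Based on benchmarking theory, we seek ways to measure the performance of web databases. After the model was set up and the workload prepared, an implementation was made. 網際網路關連式資料庫電子商務績效評估 WWW Relational database management systems Electronic commerce Database benchmark Internet Performance measurement and evaluation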
47	Energy-Efficient In-Memory Database Computing Lehner, Wolfgang 27 June 2013 (has links) (PDF) The efficient and flexible management of large datasets is one of the core requirements of modern business applications. Access to consistent and up-to-date information is the foundation for operational, tactical, and strategic decision making. Within the last few years, the database community has launched a large number of highly innovative research projects to push the envelope of modern database system architectures. In this paper, we outline requirements and influencing factors to identify some of the hot research topics in database management systems. We argue that, even after 30 years of active database research, the time is right to rethink some of the core architectural principles and come up with novel approaches to meet the data management requirements of the coming decades. The sheer number of diverse and novel (e.g., scientific) application areas, the availability of modern hardware capabilities, and the need for large data centers to become more energy-efficient will drive database research in the years to come. Datenbank Datenbanksystem DBS Datenbankmanagementsystem DBMS Sonderforschungsbereich 912 Hochadaptive Energieeffiziente Systeme database system database management systems Big Data Collaborative Research Centre 912 ddc:004 rvk:ST 270
48	Evaluating Mitigations For Meltdown and Spectre : Benchmarking performance of mitigations against database management systems with OLTP workload / Bedömining Av Mitigering Mot Spectre och Meltdown : Prestandamätningar av databashanteringssystem efter mitigering mot Spectre och Meltdown med OLTP arbetsbelastning Nilsson, Victor January 2018 (has links) Once Spectre and Meltdown became public, operating system vendors rushed to patch these vulnerabilities. However, the mitigations against these vulnerabilities come with some performance impact. This study aims to find out how much impact the software mitigations against Spectre and Meltdown have on database management systems under an online transaction processing (OLTP) workload. An experiment was carried out to evaluate two popular open-source database management systems and to see how they were affected before and after the software mitigations against Spectre and Meltdown were applied. The study found an average performance impact of 4-5% when the software mitigations are applied. The study also compared the two database management systems with each other and found that PostgreSQL's performance can drop by about 27% when both the hypervisor and the operating system are patched against Spectre and Meltdown. / When Spectre and Meltdown were announced, operating system vendors made a rapid effort to fix these vulnerabilities. However, the mitigations against these vulnerabilities bring some form of performance impact. This study aims to determine how much the updates against Spectre and Meltdown affect database management systems under an online transaction processing workload. An experiment was conducted to evaluate two popular database management systems based on free software and to see how they were affected before and after the updates against Spectre and Meltdown were applied on a Linux machine.
The study found an average performance impact of 4-5% when the updates are applied. The study also compared the two database management systems with each other and found that PostgreSQL's performance can be reduced by about 27% when both the hypervisor and the operating system are updated against Spectre and Meltdown. Spectre Meltdown Database Database Management Systems MariaDB PostgreSQL OLTP Transactions Per Minute Performance Spectre Meltdown Databas Databashanteringssystem MariaDB PostgreSQL OLTP Transaktioner per minut Prestanda Computer and Information Sciences Data- och informationsvetenskap
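The percentages reported above are relative changes in a throughput metric such as transactions per minute (TPM). A minimal Python sketch of that calculation follows; the sample TPM values are invented for illustration and are not measurements from the study, and averaging over repeated runs is an assumed way of damping run-to-run noise.

```python
from statistics import mean
from typing import Sequence


def relative_impact(before: Sequence[float], after: Sequence[float]) -> float:
    """Percentage drop in mean throughput (e.g. TPM) after patching.
    Positive values mean the patched system is slower."""
    baseline = mean(before)
    return (baseline - mean(after)) / baseline * 100.0


# Hypothetical TPM values from three repeated runs; not data from the study.
unpatched = [10_000.0, 10_200.0, 9_800.0]
patched = [9_000.0, 9_200.0, 8_800.0]
print(f"{relative_impact(unpatched, patched):.1f}% lower throughput")
```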
49	Density-Aware Linear Algebra in a Column-Oriented In-Memory Database System Kernert, David 20 September 2016 (has links) info:eu-repo/classification/ddc/004 ddc:004
50	Energy-Efficient In-Memory Database Computing Lehner, Wolfgang January 2013 (has links) info:eu-repo/classification/ddc/004 ddc:004
