Global ETD Search

21	An evaluation of non-relational database management systems as suitable storage for user generated text-based content in a distributed environment Du Toit, Petrus 07 October 2016 Non-relational database management systems address some of the limitations relational database management systems have when storing large volumes of unstructured, user generated text-based data in distributed environments. They follow different approaches through the data model they use, their ability to scale data storage over distributed servers and the programming interface they provide. An experimental approach was followed to measure the capabilities these alternative database management systems present in their approach to address the limitations of relational databases in terms of their capability to store unstructured text-based data, data warehousing capabilities, ability to scale data storage across distributed servers and the level of programming abstraction they provide. The results of the research highlighted the limitations of relational database management systems. The different database management systems do address certain limitations, but not all. Document-oriented databases provide the best results and successfully address the need to store large volumes of user generated text-based data in a distributed environment. / School of Computing / M. Sc. (Computer Science) Relational databases Database performance measurement Distributed databases Column-oriented databases Key/value databases Database benchmarking Database management systems Horizontal scalability 005.756 Non-relational databases Database management
22	Avaliação do consumo de energia em sistemas de gerenciamento de banco de dados NoSQL / Evaluation of energy consumption in NoSQL database management systems ARAÚJO, Carlos Gomes 08 August 2016 NoSQL is an emerging database management system (DBMS) technology with flexible models focused on performance and scalability, proposed for manipulating massive amounts of data. NoSQL is not intended to replace relational database management systems, but to overcome constraints related to massive data manipulation. The technology is already applied in well-known systems around the world, such as e-commerce and middleware services. Its importance has motivated many studies, mainly on performance. Few studies characterize and compare energy consumption in NoSQL DBMSs, despite its importance. Energy consumption cannot be neglected, given its impact on financial and environmental costs. To address this issue, this work evaluates both the performance and the energy consumption of NoSQL DBMSs, specifically Cassandra (column-oriented), MongoDB (document-oriented) and Redis (key-value), chosen as representative examples of this technology. The methodology is based on Design of Experiments: workloads are generated by the Yahoo! Cloud Serving Benchmark (YCSB), producing reads, writes and updates in cycles of 1,000, 10,000 and 100,000 operations. As a result, twenty-seven treatments are evaluated. Energy consumption is measured with a specific framework named Emeter, which captures metrics such as execution time and energy consumption for the treatments under analysis. In addition to the individual evaluation, performance and energy consumption are analyzed across relevant scenarios, as well as the trends as the workload increases. The results demonstrate that energy consumption can vary significantly between DBMSs for different commands and workloads. Additionally, the results indicate that, despite the positive correlation between energy consumption and execution time, the fastest DBMS is not necessarily the one that uses the least energy. NoSQL DBMS Column Oriented Document Oriented Key-Value Energy Consumption Performance Evaluation
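The twenty-seven treatments in the record above are consistent with a full factorial design of three DBMSs, three operation types and three workload sizes. The following is a minimal sketch under that assumption; the variable and function names are illustrative and do not come from the dissertation, and no benchmark is actually run.

```python
from itertools import product

# Factor levels as listed in the abstract; the names below are illustrative.
DBMS = ["Cassandra", "MongoDB", "Redis"]
OPERATIONS = ["read", "write", "update"]
OPERATION_COUNTS = [1_000, 10_000, 100_000]


def build_treatments():
    """Return every (dbms, operation, count) combination of a full factorial design."""
    return list(product(DBMS, OPERATIONS, OPERATION_COUNTS))


if __name__ == "__main__":
    treatments = build_treatments()
    print(len(treatments))  # 3 * 3 * 3 combinations
    for dbms, operation, count in treatments[:3]:
        print(dbms, operation, count)
```

Each tuple would correspond to one benchmark run whose execution time and energy consumption are recorded.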
23	Výhody a nevýhody relačních a nerelačních (noSQL) databází pro analytické úlohy / Advantages and disadvantages of relational and non-relational (NoSQL) databases for analytical tasks Klapač, Milan January 2015 This work focuses on NoSQL databases, their use for analytical tasks, and their comparison with relational and OLAP databases. The aim is to analyse the benefits of NoSQL databases and their use for analytical purposes. The first part presents the basic principles of Business Intelligence, Data Warehousing, and Big Data. The second part deals with the key features of relational and NoSQL databases. The last part of the thesis describes the properties of four basic types of NoSQL databases and analyses their advantages, disadvantages and areas of application. The end of this part includes specific examples of the use of NoSQL databases, together with the reasons for the selection of those solutions.
24	An analysis of LSM caching in NVRAM Lersch, Lucas, Oukid, Ismail, Lehner, Wolfgang, Schreter, Ivan 13 June 2022 The rise of NVRAM technologies promises to change the way we think about system architectures. To fully exploit its advantages, systems must be developed that are specially tailored for NVRAM devices. Not only does this impose great challenges, but developing full system architectures from scratch is undesirable in many scenarios due to prohibitive development costs. Instead, in this paper we analyze the behavior of an existing log-structured persistent key-value store, namely LevelDB, when run on top of an emulated NVRAM device. We investigate initial opportunities for improvement when adapting a system tailored for HDDs/SSDs to run on top of an NVRAM environment. Furthermore, we analyze the behavior of the legacy DRAM caching component of LevelDB and whether more suitable caching policies are required. ddc:004
25	Robust, fault-tolerant majority based key-value data store supporting multiple data consistency Khan, Tareq Jamal January 2011 Web 2.0 has significantly transformed the way modern society works. In today's Web, information not only flows top down from web sites to readers, but also flows bottom up, contributed by the mass of users. Hugely popular Web 2.0 applications such as wikis, social applications (e.g. Facebook, MySpace), media sharing applications (e.g. YouTube, Flickr), blogs and numerous others generate large amounts of user-generated content and make heavy use of the underlying storage. The data storage system is the heart of these applications, as all user activities are translated into read and write requests and directed to the database for further action. Hence the focus is on the storage that serves data to support the applications, and its reliable and efficient design is instrumental for applications to perform in line with expectations. Large-scale storage systems are used by popular social networking services such as Facebook and MySpace, where millions of users' data are stored and fully accessible to these companies. From the users' point of view, however, there has been justified concern about user data ownership and lack of control over personal data. For example, on more than one occasion Facebook has exercised its control over users' data without respecting users' rights to ownership of their own content, and manipulated data for its own business interest without users' knowledge or consent. The thesis proposes, designs and implements a large-scale, robust and fault-tolerant key-value data storage prototype that is peer-to-peer based and intends to move away from the client-server paradigm, with a view to relieving companies of data storage and management responsibilities and letting users control their own personal data. Several read and write APIs (similar to Yahoo!'s PNUTS but different in terms of underlying design and the environment they are targeted for) with various data consistency guarantees are provided, from which a wide range of web applications can choose according to their data consistency, performance and availability requirements. An analytical comparison is also made against the PNUTS system, which targets a more stable environment. For evaluation, simulation has been carried out to test the system's availability, scalability and fault tolerance in a dynamic environment. The results are then analyzed, and the conclusion is drawn that the system is scalable and available and shows acceptable performance. Web 2.0 applications peer-to-peer (P2P) system key-value data store relaxed consistency Distributed Hash Table (DHT) majority based quorum technique Engineering and Technology
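The record above names a majority based quorum technique. The sketch below illustrates the general idea only: a write is applied to a majority of replicas, a read consults a majority and returns the value with the highest version, and because any two majorities overlap, a read observes the latest completed write. It is not the thesis's design; the class names, the in-memory replicas and the sequential, failure-free setting are all assumptions made for illustration.

```python
import random


class Replica:
    """One in-memory replica holding key -> (version, value)."""

    def __init__(self):
        self.store = {}

    def write(self, key, version, value):
        # Keep only the newest version seen for a key.
        if version > self.store.get(key, (0, None))[0]:
            self.store[key] = (version, value)

    def read(self, key):
        return self.store.get(key, (0, None))


class QuorumStore:
    """Hypothetical coordinator that contacts a random majority per operation."""

    def __init__(self, n_replicas=5, seed=0):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.majority = n_replicas // 2 + 1
        self.rng = random.Random(seed)

    def _pick_majority(self):
        return self.rng.sample(self.replicas, self.majority)

    def put(self, key, value):
        # Phase 1: learn the highest version from a majority.
        version = max(r.read(key)[0] for r in self._pick_majority()) + 1
        # Phase 2: store the new version on a (possibly different) majority.
        for replica in self._pick_majority():
            replica.write(key, version, value)

    def get(self, key):
        # Any majority intersects the majority that holds the latest write.
        answers = [r.read(key) for r in self._pick_majority()]
        return max(answers, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    store = QuorumStore()
    store.put("profile:42", "first draft")
    store.put("profile:42", "second draft")
    print(store.get("profile:42"))
```

A real peer-to-peer deployment would also have to handle concurrent writers, node churn and message loss, none of which this sketch attempts.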
26	Performance comparison between multi-model, key-value and documental NoSQL database management systems Jansson, Jens, Vukosavljevic, Alexandar, Catovic, Ismet January 2021 This study conducted an experiment that compares the multi-model NoSQL DBMS ArangoDB with other NoSQL DBMSs in terms of the average response time of queries. The DBMSs compared in this experiment are Redis, MongoDB, Couchbase, and OrientDB. The hypothesis tested in this study is: “There is a significant difference between ArangoDB, OrientDB, Couchbase, Redis, MongoDB in terms of the average response time of queries”. This is examined by comparing the average response time of 1 000, 100 000, and 1 000 000 queries between these database systems. The results show that ArangoDB performs worse than the other DBMSs. Examples of future work include using additional DBMSs in the same experiment and replacing ArangoDB with another multi-model DBMS to decide whether such a DBMS, in general, performs worse than single-model DBMSs. Database system performance comparison ArangoDB multi-model database system documental database system key-value database system Information Systems, Social aspects
27	Scaling Apache Hudi by boosting query performance with RonDB as a Global Index : Adopting a LATS data store for indexing / Skala Apache Hudi genom att öka frågeprestanda med RonDB som ett globalt index : Antagande av LATS-datalager för indexering Zangis, Ralfs January 2022 The storage and use of voluminous data are perplexing issues, the resolution of which has become more pressing with the exponential growth of information. Lakehouses are relatively new approaches that try to accomplish this while hiding the complexity from the user. They provide similar capabilities to a standard database while operating on top of low-cost storage and open file formats. An example of such a system is Hudi, which internally uses indexing to improve the performance of data management in tabular format. This study investigates whether execution times could be decreased by introducing a new engine option for indexing in Hudi. The thesis therefore proposes the use of RonDB as a global index, and expands on this by investigating the viability of the different connectors available for communication. The research was conducted using both practical experiments and the study of relevant literature. The analysis involved observations made over multiple workloads to document how adequately the solutions can adapt to changes in requirements and types of actions. The thesis recorded the results, visualized them for the convenience of the reader, and made them available in a public repository. The conclusions did not coincide with the author's hypothesis that RonDB would provide the fastest indexing solution for all scenarios. Nonetheless, it was observed to be the most consistent approach, potentially making it the best general-purpose solution. As an example, it was noted that RonDB is capable of dealing with read- and write-heavy workloads while consistently providing low query latency independent of the file count.
Apache Hudi Lakehouse RonDB Performance Index Key-value store Software Engineering
28	adXtractor – Automated and Adaptive Generation of Wrappers for Information Retrieval Ademi, Muhamet January 2017 The aim of this project is to investigate the feasibility of retrieving unstructured automotive listings from structured web pages on the Internet. The research has two major purposes: (1) to investigate whether it is feasible to pair information extraction algorithms and compute wrappers, and (2) to demonstrate the results of pairing these techniques and evaluate the measurements. We merge two training sets available on the web to construct reference sets, which are the basis for the information extraction. The wrappers are computed by using information extraction to identify data properties with a variety of techniques such as fuzzy string matching, regular expressions and document tree analysis. The results demonstrate that it is possible to pair these techniques successfully and retrieve the majority of the listings. The findings also suggest that many platforms utilise lazy loading to populate image resources, which the algorithm is unable to capture. In conclusion, the study demonstrated that it is possible to use information extraction to compute wrappers dynamically by identifying data properties. Furthermore, the study demonstrates the ability to open non-queryable domain data through a unified service. wrapper generation information extraction content of interest identification wrapper rules text extraction key value pair wrapper generate main content identification web scraping information extraction algorithms web extraction dom tree analysis dom analysis Engineering and Technology
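The record above mentions matching listing text against reference sets with fuzzy string matching. As a rough illustration of that one technique, the sketch below uses Python's standard `difflib` module; the reference set of vehicle makes, the function name and the cutoff value are hypothetical and are not taken from the thesis.

```python
import difflib

# Hypothetical reference set of vehicle makes (not taken from the thesis).
REFERENCE_MAKES = ["Volkswagen", "Mercedes-Benz", "Toyota", "Volvo"]


def match_make(raw, cutoff=0.8):
    """Return the reference make closest to `raw`, or None if nothing is similar enough."""
    # Compare case-insensitively, then map back to the canonical spelling.
    lowered = {make.lower(): make for make in REFERENCE_MAKES}
    hits = difflib.get_close_matches(
        raw.strip().lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[hits[0]] if hits else None


if __name__ == "__main__":
    for text in ["Volkswagon", " TOYOTA ", "bicycle"]:
        print(repr(text), "->", match_make(text))
```

A higher cutoff reduces false matches at the cost of missing heavily misspelled values; the value used here is an arbitrary starting point.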
29	Programming Model and Protocols for Reconfigurable Distributed Systems Arad, Cosmin January 2013 Distributed systems are everywhere. From large datacenters to mobile devices, an ever richer assortment of applications and services relies on distributed systems, infrastructure, and protocols. Despite their ubiquity, testing and debugging distributed systems remains notoriously hard. Moreover, aside from inherent design challenges posed by partial failure, concurrency, or asynchrony, there remain significant challenges in the implementation of distributed systems. These programming challenges stem from the increasing complexity of the concurrent activities and reactive behaviors in a distributed system on the one hand, and the need to effectively leverage the parallelism offered by modern multi-core hardware, on the other hand. This thesis contributes Kompics, a programming model designed to alleviate some of these challenges. Kompics is a component model and programming framework for building distributed systems by composing message-passing concurrent components. Systems built with Kompics leverage multi-core machines out of the box, and they can be dynamically reconfigured to support hot software upgrades. A simulation framework enables deterministic execution replay for debugging, testing, and reproducible behavior evaluation for large-scale Kompics distributed systems. The same system code is used for both simulation and production deployment, greatly simplifying the system development, testing, and debugging cycle. We highlight the architectural patterns and abstractions facilitated by Kompics through a case study of a non-trivial distributed key-value storage system. CATS is a scalable, fault-tolerant, elastic, and self-managing key-value store which trades off service availability for guarantees of atomic data consistency and tolerance to network partitions. 
We present the composition architecture for the numerous protocols employed by the CATS system, as well as our methodology for testing the correctness of key CATS algorithms using the Kompics simulation framework. Results from a comprehensive performance evaluation attest that CATS achieves its claimed properties and delivers a level of performance competitive with similar systems which provide only weaker consistency guarantees. More importantly, this testifies that Kompics admits efficient system implementations. Its use as a teaching framework as well as its use for rapid prototyping, development, and evaluation of a myriad of scalable distributed systems, both within and outside our research group, confirm the practicality of Kompics. / Kompics / CATS / REST distributed systems programming model message-passing concurrency nested hierarchical composition reactive components software architecture dynamic reconfiguration multi-core discrete-event simulation peer-to-peer testing debugging distributed key-value stores data replication consistency linearizability network partition tolerance consistent hashing self-organization scalability elasticity fault tolerance consistent quorums
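The keywords of the record above include consistent hashing. The following minimal sketch shows the generic technique only, with one point per node on a hash ring and each key owned by the first node clockwise from the key's hash. It is not the CATS implementation; the class name, the node names and the choice of SHA-256 are assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Hypothetical consistent-hashing ring with one point per node."""

    def __init__(self, nodes=()):
        self._points = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        bisect.insort(self._points, (_hash(node), node))

    def remove(self, node):
        self._points.remove((_hash(node), node))

    def lookup(self, key):
        """Return the node owning `key`: the first point at or after its hash."""
        if not self._points:
            raise LookupError("ring is empty")
        index = bisect.bisect_left(self._points, (_hash(key), ""))
        return self._points[index % len(self._points)][1]  # wrap around


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    before = {f"key-{i}": ring.lookup(f"key-{i}") for i in range(1000)}
    ring.add("node-d")
    moved = sum(1 for k, owner in before.items() if ring.lookup(k) != owner)
    print(f"{moved} of {len(before)} keys moved after adding node-d")
```

Adding a node moves only the keys that now fall to the new node, which is why the technique suits elastic key-value stores; production systems usually place several virtual points per node to even out the load.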
30	Programming Model and Protocols for Reconfigurable Distributed Systems Arad, Cosmin Ionel January 2013 Distributed systems are everywhere. From large datacenters to mobile devices, an ever richer assortment of applications and services relies on distributed systems, infrastructure, and protocols. Despite their ubiquity, testing and debugging distributed systems remains notoriously hard. Moreover, aside from inherent design challenges posed by partial failure, concurrency, or asynchrony, there remain significant challenges in the implementation of distributed systems. These programming challenges stem from the increasing complexity of the concurrent activities and reactive behaviors in a distributed system on the one hand, and the need to effectively leverage the parallelism offered by modern multi-core hardware, on the other hand. This thesis contributes Kompics, a programming model designed to alleviate some of these challenges. Kompics is a component model and programming framework for building distributed systems by composing message-passing concurrent components. Systems built with Kompics leverage multi-core machines out of the box, and they can be dynamically reconfigured to support hot software upgrades. A simulation framework enables deterministic execution replay for debugging, testing, and reproducible behavior evaluation for large-scale Kompics distributed systems. The same system code is used for both simulation and production deployment, greatly simplifying the system development, testing, and debugging cycle. We highlight the architectural patterns and abstractions facilitated by Kompics through a case study of a non-trivial distributed key-value storage system. CATS is a scalable, fault-tolerant, elastic, and self-managing key-value store which trades off service availability for guarantees of atomic data consistency and tolerance to network partitions. 
We present the composition architecture for the numerous protocols employed by the CATS system, as well as our methodology for testing the correctness of key CATS algorithms using the Kompics simulation framework. Results from a comprehensive performance evaluation attest that CATS achieves its claimed properties and delivers a level of performance competitive with similar systems which provide only weaker consistency guarantees. More importantly, this testifies that Kompics admits efficient system implementations. Its use as a teaching framework as well as its use for rapid prototyping, development, and evaluation of a myriad of scalable distributed systems, both within and outside our research group, confirm the practicality of Kompics. / QC 20130520 distributed systems programming model message-passing concurrency nested hierarchical composition reactive components software architecture dynamic reconfiguration multi-core discrete-event simulation peer-to-peer testing debugging distributed key-value stores data replication consistency linearizability network partition tolerance consistent hashing self-organization scalability elasticity fault tolerance consistent quorums
