Global ETD Search

1	Performance Evaluation of Cassandra in a Virtualized Environment Vellanki, Mohit January 2017 (has links) Context. Apache Cassandra is an open-source, scalable NoSQL database that distributes data over many commodity servers. It avoids a single point of failure by copying and storing the data in different locations. Cassandra uses a ring design rather than the traditional master-slave design. Virtualization is the technique by which the physical resources of a machine are divided among and used by multiple virtual machines. It is the fundamental technology that allows cloud computing to share resources among users. Objectives. This research observes the effects of virtualization on Cassandra by comparing a virtual machine arrangement with a physical machine arrangement, along with the overhead caused by virtualization. Methods. An experiment is conducted to identify the aforementioned effects of virtualization on Cassandra compared to physical machines. Cassandra runs on physical machines with Ubuntu 14.04 LTS arranged in a multi-node cluster. Results are obtained by executing the mixed, read-only, and write-only operations of the Cassandra stress tool on the data populated in this cluster. This procedure is repeated for 100% and 66% workload. The same procedure is repeated in a cluster of virtual machines and the results are compared. Results. Virtualization overhead has been identified in terms of CPU utilization, and the effects of virtualization on Cassandra have been determined in terms of disk utilization, throughput, and latency. Conclusions. The overhead caused by virtualization is observed and the effect of this overhead on the performance of Cassandra has been identified. The consequence of the virtualization overhead has been related to the change in performance of Cassandra. Cassandra Virtualization NoSQL databases Computer Systems Datorsystem
2	Creating a NoSQL database for the Internet of Things : Creating a key-value store on the SensibleThings platform Zhu, Sainan January 2015 (has links) Because of the requirements of Web 2.0 applications and the limited horizontal scalability of relational databases, NoSQL databases have become more and more popular in recent years. However, it is not easy to select a database that is suitable for a specific use. This thesis describes the detailed design, implementation, and final performance evaluation of a key-value NoSQL database for the SensibleThings platform, which is an Internet of Things platform. The thesis starts by comparing the different types of NoSQL databases to select the most appropriate one. During the implementation of the database, the algorithms for data partitioning, data access, replication, addition and removal of nodes, and failure detection and handling are dealt with. The final results for the load distribution and the performance evaluation are also presented in this paper. At the end of the thesis, some problems and improvements that need to be taken into consideration in the future are discussed. NoSQL databases key-value Internet of Things SensibleThings platform
3	Access Control and Storage of Distributed IoT Data Mends, Diana 03 April 2018 (has links) A class of databases known as Not only SQL (NoSQL) databases has grown in recent years. This quick growth has been fueled by high demand from businesses, as NoSQL offers a convenient way to store data and differs significantly from traditional relational databases. It makes unstructured data easy to process, offers a cloud-friendly approach, and grows through the distribution of data over many commodity computers. Most of these NoSQL databases are distributed across several locations, spanning countries, and are known as geo-distributed cloud datastores. We work to customize one of these, known as Cassandra. Given the size of the database and the size of the applications accessing the stored data, it has been challenging to customize it to meet existing application Service Level Agreements (SLAs). We live in an era of data breaches, and even though some types of information are stripped of all sensitive data, there are ways to easily identify and link it to data of real persons or governments. Data saved in different countries is subject to the rules and regulations of each specific country and to the security measures employed to safeguard consumer data. In this thesis, we describe mechanisms for selectively replicating data in a large-scale NoSQL datastore with respect to privacy and legal regulations. We introduce an easily extensible constraint language to implement these policy constraints through the creation of a pluggable topology provider in the configuration files of Cassandra. Experiments using the modified Cassandra trunk demonstrate that our techniques work well, respect response times, and improve read and write latencies. IoT NoSQL databases Distributed databases Cassandra Policy Constraints Access Control Selective replication
4	Srovnání distribuovaných "No-SQL" databází s důrazem na výkon a škálovatelnost / Comparison of Distributed "No-SQL" Databases with an Emphasis on Performance and Scalability Petera, Martin January 2014 (has links) This thesis deals with the issue of NoSQL database performance. The aim of the paper is to compare the most common prototypes of distributed database systems with an emphasis on performance and scalability. The Yahoo! Cloud Serving Benchmark (YCSB) is used to accomplish this aim. The YCSB tool allows performance testing through performance indicators such as throughput or response time. The comparison is followed by a thorough explanation of how to work with this tool, which gives readers an opportunity to test the performance of, or do a performance comparison of, distributed database systems other than those described in this thesis. It also helps readers create a testing environment and apply the testing method described in this thesis should they need it. This paper can serve as an aid when making the arduous choice of a specific system for an intended solution from the wide variety of NoSQL database systems.
5	Podpora MongoDB pro UnifiedPush Server / MongoDB Support for UnifiedPush Server Pecsérke, Róbert January 2016 (has links) This master's thesis deals with the design and implementation of an extension for UnifiedPush Server that enables the server to access the non-relational database MongoDB and exploits the horizontal scalability potential of non-relational databases. The thesis also includes the design of performance tests and a comparison of performance when running on one node and on multiple nodes, the design of a migration scenario from MySQL to MongoDB, and the identification of bottlenecks. The application is implemented in Java and uses the Java Persistence API to access databases. To access non-relational databases, it uses Hibernate OGM, an implementation of the JPA standard.
6	Developing Random Compaction Strategy for Apache Cassandra database and Evaluating performance of the strategy Surampudi, Roop Sai January 2021 (has links) Introduction: Nowadays, the data generated by global communication systems is increasing enormously. Telecommunication industries need to monitor and manage this data generation efficiently. Apache Cassandra is a NoSQL database that efficiently manages data of any format and massive data flows. Aim: This project is focused on developing a new random compaction strategy (RCS) and evaluating its performance. In this study, the limitations of the generic compaction strategies, Size Tiered Compaction Strategy (STCS) and Leveled Compaction Strategy (LCS), will be investigated. A new random compaction strategy will be developed to address the limitations of the generic compaction strategies. Important performance metrics required for the evaluation of the strategy will be studied. Method: In this study, a grey literature review is done to understand the working of Apache Cassandra and the APIs of the different compaction strategies. A random compaction strategy is developed in two phases of development. A testing environment is created consisting of a 4-node cluster and a simulator. The performance is evaluated by stress-testing the cluster using different workloads. Conclusions: A stable RCS artifact is developed. This artifact also includes support for generating a random threshold from any user-defined distribution. Currently, only Uniform, Geometric, and Poisson distributions are supported. The RCS-Uniform's performance is found to be better than both STCS and LCS. The RCS-Poisson's performance is found to be not better than both STCS and LCS. The RCS-Geometric's performance is found to be better than STCS. Apache Cassandra Compaction Random Probability Distributions IBM Cloud NoSQL databases Telecommunications Telekommunikation
7	A Study of Migrating Biological Data from Relational Databases to NoSQL Databases Moatassem, Nawal N. 18 September 2015 (has links) No description available. Information Science Computer Science Database management systems Relational databases NoSQL databases Migration
8	Odvozování schématu v NoSQL databázích / Schema Inference for NoSQL Databases Veinhardt Latták, Ivan January 2021 (has links) NoSQL databases are becoming increasingly more popular due to their undeniable advantages in the context of storing and processing big data, mainly horizontal scalability and the lack of a requirement to define a data schema upfront. In the absence of explicit schema, however, an implicit schema inherent to the stored data still exists and can be inferred. Once inferred, a schema is of great value to the stakeholders and database maintainers. Nevertheless, the problem of schema inference is non-trivial and is still the subject of ongoing research. We explore the many aspects of NoSQL schema inference and data modeling, analyze a number of existing schema inference solutions in terms of their inner workings and capabilities, point out their shortcomings, and devise (1) a novel horizontally scalable approach based on the Apache Spark platform and (2) a new NoSQL Schema metamodel capable of modeling i.a. inter-entity referential relationships and deeply nested JSON constructs. We then experimentally evaluate the newly designed approach along with the preexisting solutions with respect to their functional and performance capabilities.
9	A Cloud Based Platform for Big Data Science Islam, Md. Zahidul January 2014 (has links) With the advent of cloud computing, resizable, scalable infrastructures for data processing are now available to everyone. Software platforms and frameworks that support data-intensive distributed applications, such as Amazon Web Services and Apache Hadoop, give users the necessary tools and infrastructure to work with thousands of scalable computers and process terabytes of data. However, writing scalable applications that run on top of these distributed frameworks is still a demanding and challenging task. The thesis aimed to advance the core scientific and technological means of managing, analyzing, visualizing, and extracting useful information from large data sets, collectively known as “big data”. The term “big data” in this thesis refers to large, diverse, complex, longitudinal and/or distributed data sets generated from instruments, sensors, internet transactions, email, social networks, Twitter streams, and/or all digital sources available today and in the future. We introduced architectures and concepts for implementing a cloud-based infrastructure for analyzing large volumes of semi-structured and unstructured data. We built and evaluated an application prototype for collecting, organizing, processing, visualizing, and analyzing data from the retail industry gathered from indoor navigation systems and social networks (Twitter, Facebook, etc.). Our finding was that developing a large-scale data analysis platform is often quite complex when the processed data is expected to grow continuously in the future. The architecture varies depending on the requirements. If we want to build a data warehouse and analyze the data afterwards (batch processing), the best choices will be Hadoop clusters and Pig or Hive. This architecture has been proven at Facebook and Yahoo for years.
On the other hand, if the application involves real-time data analytics then the recommendation will be Hadoop clusters with Storm which has been successfully used in Twitter. After evaluating the developed prototype we introduced a new architecture which will be able to handle large scale batch and real-time data. We also proposed an upgrade of the existing prototype to handle real-time indoor navigation data. Big Data Data Analysis Hadoop Hive Sentiment Analysis Predictive Analysis Fraud Detection Big data concepts NoSQL Databases Amazon AWS Windows Azure Data Visualization Lambda architecture Software Engineering Programvaruteknik
10	Prevention of Privilege Abuse on NoSQL Databases : Analysis on MongoDB access control / Förebyggande av Privilegier Missbruk på NoSQL-databaser : Analys på MongoDB-åtkomstkontroll Ishak, Marwah January 2021 (has links) Database security is vital to retain confidentiality and integrity of data as well as prevent security threats such as privilege abuse. The most common form of privilege abuse is excessive privilege abuse, which entails assigning users excessive privileges beyond their job function, which can be abused deliberately or inadvertently. The thesis’s objective is to determine how to prevent privilege abuse in the NoSQL database MongoDB. Prior studies have noted the importance of access control to secure databases from privilege abuse. Access control is essential to manage and protect the accessibility of the data stored and restrict unauthorised access. Therefore, the study analyses MongoDB’s embedded access control through experimental testing to test various built-in and advanced privilege roles in preventing privilege abuse. The results indicate that privilege abuse can be prevented if users are granted roles composed of the least privileges. Additionally, the results indicate that assigning users excessive privileges exposes the system to privilege abuse. The study also underlines that an inaccurate allocation of privileges or permissions to users of databases may have profound consequences for the system and organisation, such as data breaches and data manipulation. Hence, organisations that utilise information technology should be obliged to protect their interests and databases from others and their members through access control policies. / Data security is crucial for preserving the confidentiality and integrity of data and for preventing security threats such as privilege abuse. Abuse of excessive privileges is the most common form of privilege abuse.
This means that a user is granted unrestricted permissions beyond what their work requires, which can be abused deliberately or by mistake. The goal of the thesis is to determine how privilege abuse can be prevented in the NoSQL database MongoDB. Previous studies have noted the importance of access control for securing databases against privilege abuse. Access control is important for managing and protecting the accessibility of the stored data and for limiting unrestricted access. The work therefore analyses MongoDB's embedded access control through experimental testing of various built-in and advanced privileged roles in order to prevent privilege abuse. The results indicate that privilege abuse can be prevented if users are given roles that have fewer privileges. In addition, the results show that granting users unrestricted privileges exposes the system to privilege abuse. The study also emphasises that an incorrect allocation of privileges or permissions to database users can have serious consequences for the system and the organisation, such as data breaches and data manipulation. Organisations that use information technology should therefore have a duty to protect their assets and databases, through access control policies, from unauthorised parties and also from company employees who do not depend on the data. NoSQL databases MongoDB Access control Privilege abuse Role-based access control NoSQL-databaser MongoDB Åtkomstkontroll Missbruk av privilegier Computer Engineering Datorteknik

Search results