Global ETD Search

1	Evaluating NOSQL Technologies for Historical Financial Data Rafique, Ansar January 2013 Businesses and organizations today generate huge volumes of data, and applications such as Web 2.0 services and social networks require processing petabytes of data. Stock exchange systems are among those that process large numbers of quotes and trades on a daily basis. Limited database storage capacity is a major bottleneck in meeting the challenge of providing efficient access to information. Furthermore, varying data are the major source of information for the financial industry. This data needs to be read from and written to the database efficiently, which is quite costly with a traditional Relational Database Management System (RDBMS). An RDBMS suits many scenarios and handles certain types of data very well, but it isn’t always the perfect choice. Innovative architectures allow large data sets to be stored efficiently. “Not only SQL” (NOSQL) offers an effective solution by providing efficient information storage. NOSQL is an umbrella term for various new data stores. NOSQL databases are rapidly gaining popularity because of the advantages they offer over RDBMS, including their open source nature, non-relational data stores, high performance, fault tolerance, and scalability. The major aim of this research is to find an efficient solution for storing and processing huge volumes of data for certain variants. The study is based on choosing a reliable, distributed, and efficient NOSQL database at Cinnober Financial Technology AB. The research explores NOSQL databases and discusses issues with RDBMS, eventually selecting the database best suited for financial data management. 
It attempts to contribute to current research in the field of NOSQL databases by comparing one such NOSQL database, Apache Cassandra, with Apache Lucene and the traditional relational database MySQL for financial management. The main focus is to find out which database is the preferred choice for different variants. In this regard, a performance test framework for a selected set of candidates has also been taken into consideration. NOSQL Apache Cassandra MySQL Financial data Historical data Benchmark performance
2	Compactions in Apache Cassandra : Performance Analysis of Compaction Strategies in Apache Cassandra Kona, Srinand January 2016 Context: The global communication system is growing tremendously, leading to a wide range of data generation. Telecom operators generate large amounts of data and need to manage this data efficiently. As database management technology advances, NoSQL databases have seen remarkable growth in the 21st century. Apache Cassandra is an advanced NoSQL database system that is popular for handling semi-structured and unstructured Big Data. Cassandra has an effective way of compacting data by using different compaction strategies. This research focuses on analyzing the performance of different compaction strategies in different use cases for the default Cassandra stress model. The analysis can suggest better usage of compaction strategies in Cassandra for a write-heavy workload. Objectives: In this study, we investigate the appropriate performance metrics for evaluating compaction strategies. We provide a detailed analysis of Size Tiered Compaction Strategy, Date Tiered Compaction Strategy, and Leveled Compaction Strategy for a write-heavy (90/10) workload, using the default Cassandra stress tool. Methods: A detailed literature study has been conducted on NoSQL databases and the workings of different compaction strategies in Apache Cassandra. The performance metrics were selected based on the literature study and on the opinions of the supervisors and Ericsson’s Apache Cassandra team. Two tools were developed to collect the selected metrics. The first tool was developed in the Jython scripting language to collect the Cassandra metrics, and the second was developed in the Python scripting language to collect the operating system metrics. 
The graphs were generated in Microsoft Excel, using the values obtained from the scripts. Results: Date Tiered Compaction Strategy and Size Tiered Compaction Strategy showed more or less similar behaviour during the stress tests. Leveled Compaction Strategy showed some remarkable results that affected system performance, compared to the date tiered and size tiered compaction strategies. Date Tiered Compaction Strategy does not perform well for the default Cassandra stress model. Size Tiered Compaction Strategy can be preferred for the default Cassandra stress model, but should not be considered for big data. Conclusions: With a detailed analysis and logical comparison of metrics, we conclude that Leveled Compaction Strategy performs better for a write-heavy (90/10) workload with the default Cassandra stress model than the size tiered and date tiered compaction strategies. Apache Cassandra Compaction Strategies Default Cassandra Stress model Performance NoSQL Database
3	Výpočetní úlohy pro řešení paralelního zpracování dat / Computational tasks for solving parallel data processing Rexa, Denis January 2019 The goal of this diploma thesis was to create four laboratory exercises for the subject "Parallel Data Processing", in which students try out the options and capabilities of Apache Spark as a parallel computing platform. The work also includes the basic setup and use of Apache Kafka and the NoSQL database Apache Cassandra. The other two lab assignments focus on the Travelling Salesman Problem. The first was designed to demonstrate the difficulty of the task, with the student facing an exponential increase in complexity. The second consists of an optimization algorithm that solves the problem on a cluster. This algorithm is subjected to performance measurements on clusters. The conclusion of the thesis contains recommendations for optimization as well as a comparison of runs with different numbers of computing devices.
4	Developing Random Compaction Strategy for Apache Cassandra database and Evaluating performance of the strategy Surampudi, Roop Sai January 2021 Introduction: The data generated by global communication systems is increasing enormously. Telecommunication industries need to monitor and manage this data generation efficiently. Apache Cassandra is a NoSQL database that efficiently manages data in any format and massive data flows. Aim: This project focuses on developing a new random compaction strategy and evaluating its performance. In this study, the limitations of the generic compaction strategies Size Tiered Compaction Strategy (STCS) and Leveled Compaction Strategy (LCS) will be investigated. A new random compaction strategy (RCS) will be developed to address these limitations. Important performance metrics required for evaluating the strategy will be studied. Method: In this study, a grey literature review is done to understand the workings of Apache Cassandra and the APIs of different compaction strategies. A random compaction strategy is developed in two phases. A testing environment is created consisting of a 4-node cluster and a simulator. The performance is evaluated by stress-testing the cluster using different workloads. Conclusions: A stable RCS artifact is developed. This artifact also supports generating a random threshold from any user-defined distribution. Currently, only Uniform, Geometric, and Poisson distributions are supported. RCS-Uniform is found to perform better than both STCS and LCS. RCS-Poisson is found not to perform better than both STCS and LCS. RCS-Geometric is found to perform better than STCS. Apache Cassandra Compaction Random Probability Distributions IBM Cloud NoSQL databases Telecommunications Telekommunikation
5	Improving Efficiency of Data Compaction by Creating & Evaluating a Random Compaction Strategy in Apache Cassandra KATIKI REDDY, RAHUL REDDY January 2020 Background: Cassandra is a NoSQL database in which data is stored in immutable tables called SSTables. These SSTables are subjected to a process called compaction to reclaim disk space and to improve read performance. Size Tiered Compaction Strategy and Leveled Compaction Strategy are the most used generic compaction strategies for different use cases. Space amplification and write amplification are the main limitations of these compaction strategies, respectively. This research aims to address the limitations of existing generic compaction strategies. Objectives: A new random compaction strategy will be created to improve the efficiency and effectiveness of compaction. This strategy will be evaluated by comparing its read, write, and space amplification with those of the existing generic compaction strategies for different use cases. Methods: In this study, Design Science has been used as the research method to answer both research questions. Focus group meetings have been conducted to gain knowledge of the limitations of existing compaction strategies, the newly created random compaction strategy, and its appropriate solutions. During the evaluation, the metrics have been collected from a Prometheus server and visualized in a Grafana server. The compaction strategies are compared for significant differences using statistical tests. Results: The results of this study showed that the random compaction strategy performs almost similarly to Leveled Compaction Strategy. The Random Compaction Strategy solves the space amplification problem of the Size Tiered Compaction Strategy and the write amplification problem of the Leveled Compaction Strategy. 
Eight important metrics have been analyzed for all three compaction strategies. Conclusions: The main artefact of this research is a new Random Compaction Strategy. After two iterations, a new stable random compaction strategy was designed. The results were analyzed by comparing the Size Tiered Compaction Strategy, Leveled Compaction Strategy, and Random Compaction Strategy on two different use cases. The new random compaction strategy performed very well for the Ericsson buffer management use case. Apache Cassandra Compaction Strategy Random Compaction NoSQL Design Science. Software Engineering Programvaruteknik
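The compaction step described in the abstract above (merging immutable SSTables to reclaim disk space and speed up reads) can be illustrated with a minimal sketch. This is not Cassandra's implementation and not the strategy developed in the thesis: the `SSTable` type, the `compact` function, and the in-memory dictionaries are hypothetical stand-ins chosen for illustration. The sketch also assumes deleted keys can be dropped immediately, whereas a real cluster retains deletion markers for a grace period.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory stand-in for an SSTable: key -> (timestamp, value).
# A value of None marks a tombstone, i.e. a deleted key (an assumption of this sketch).
SSTable = Dict[str, Tuple[int, Optional[str]]]


def compact(sstables: List[SSTable]) -> SSTable:
    """Merge several immutable tables into one, keeping the newest cell per key."""
    merged: SSTable = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            # Keep a cell only if the key is new or this write has a later timestamp.
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    # Drop tombstones so the space held by deleted keys is reclaimed.
    return {k: cell for k, cell in merged.items() if cell[1] is not None}


# Two tables holding overlapping keys: "a" is overwritten, "b" is deleted.
old = {"a": (1, "x"), "b": (1, "y")}
new = {"a": (2, "z"), "b": (3, None)}
result = compact([old, new])
```

After the merge, a read for a key consults one table instead of two, and the superseded and deleted cells no longer occupy space. Which tables are merged, and when, is what distinguishes the size tiered, leveled, and random strategies discussed in these abstracts; the sketch leaves that selection out.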
6	Anomaly Detection in Wait Reports and its Relation with Apache Cassandra Statistics Madhu, Abheyraj Singh, Rapolu, Sreemayi January 2021 Background: Apache Cassandra is a highly scalable distributed system that can handle large amounts of data across several nodes / virtual machines grouped together as Apache Cassandra clusters. When a node in an Apache Cassandra cluster is down, there is a need for a tool or an approach that can identify the failed virtual machine by analyzing the data generated by each of the virtual machines in the cluster. Manual analysis of this data is tedious and can be quite strenuous. Objectives: The objective of the thesis is to identify, build, and evaluate a solution that can detect and report the behaviour of the erroneous or failed virtual machine by analyzing the data generated by each virtual machine in an Apache Cassandra cluster. In the study, we analyzed two specific data sources from each virtual machine, i.e., the wait reports and Apache Cassandra statistics, and proposed a tool named AnoDect to realize this objective. The tool was built using input provided by the technical support team at Ericsson through interviews and was also evaluated by them for its reliability, usability, and usefulness in an industrial setting. Methods: A case study methodology has been piloted at Ericsson, and semi-structured interviews have been conducted to identify the key features in the data along with the functionalities AnoDect needs to perform to help the CIL team (the technical support team at Ericsson) rectify the erroneous virtual machine in the cluster. 
An experimental evaluation and a static user evaluation have been conducted as part of the case study evaluation: the experimental evaluation identifies the best technique for AnoDect's anomaly detection in wait reports, and the static evaluation assesses AnoDect's reliability and usability once it is deployed for use. Results: The feedback provided by the CIL team through the questionnaire indicates that the results provided by the tool are quite satisfactory in terms of usability and reliability. Wait reports analysis time-series anomaly detection Apache Cassandra statistics anomaly detection behavior reporting tool Computer Sciences Datavetenskap (datalogi)
7	Energy-Efficient Key/Value Store Tena, Frezewd Lemma 11 September 2017 Energy conservation is a major concern in today's data centers, which are the 21st century's data processing factories, where large and complex software systems such as distributed data management stores run and serve billions of users. The two main drivers of this concern are the environmental pollution caused by data centers' waste heat and the high cost data centers incur due to their enormous energy demand. Among the many subsystems of data centers, the storage system is one of the main sources of energy consumption. Among the many types of storage systems, key/value stores are widely used in data centers. In this work, I investigate energy-saving techniques that enable a consistent-hash-based key/value store to save energy during low-activity times and whenever there is an opportunity to reuse the waste heat of data centers. Energieeffizienz Schlüssel / Wert-Shop konsistentes Hashing Apache Cassandra Reihenpartition Micro-Clouds-System Kantenwolken Rechenzentrum Energieeffizienz Datenplatzierung Replik-Platzierung Energy efficiency key/value store consistent hashing Apache Cassandra row partitioned micro-clouds system edge clouds data center energy efficiency data placement replica placement ddc:004 rvk:ST 265
8	Energy-Efficient Key/Value Store Tena, Frezewd Lemma 29 August 2017 Energy conservation is a major concern in today's data centers, which are the 21st century's data processing factories, where large and complex software systems such as distributed data management stores run and serve billions of users. The two main drivers of this concern are the environmental pollution caused by data centers' waste heat and the high cost data centers incur due to their enormous energy demand. Among the many subsystems of data centers, the storage system is one of the main sources of energy consumption. Among the many types of storage systems, key/value stores are widely used in data centers. In this work, I investigate energy-saving techniques that enable a consistent-hash-based key/value store to save energy during low-activity times and whenever there is an opportunity to reuse the waste heat of data centers. info:eu-repo/classification/ddc/004 ddc:004
9	Methods for Comparing Database Management Systems Törnqvist, Jakob January 2023 This thesis was carried out in collaboration with Zenon AB, an IT company. Zenon AB has clients that generate large amounts of data, so it is important for the company to make competent choices of database management systems (DBMS) when designing systems for its clients. This thesis therefore investigates the comparison of DBMSs. A large variety of DBMSs exists today. Despite this, there seems to be a lack of comparisons between types of DBMS and therefore a lack of clarity about when each type should be used. This thesis aims to highlight the differences between DBMS types by creating a tailored test for each type and comparing how each type performs in the others' areas of specialization. This process will show how big the differences can be and highlight the importance of the choice of DBMS. The time it takes to implement a DBMS, and how simple it is to do so, seems to be a factor most developers consider when choosing a DBMS, but there is little research on how to compare this aspect. Therefore, this thesis will investigate the viability of a method for comparing how easy DBMSs are to implement in systems by querying programming help forums such as Stack Overflow. / The thesis work was carried out at the Department of Science and Technology (ITN), Faculty of Science and Engineering, Linköping University. DBMS mongodb postgresql apache cassandra memcached neo4j ease of implementation performance tests graph DBMS document DBMS wide-column DBMS relational DBMS cache database database management system Media and Communication Technology Medieteknik

Search results