  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Monitoring and Analysis of CPU Utilization, Disk Throughput and Latency in servers running Cassandra database : An Experimental Investigation

Chekkilla, Avinash Goud January 2017
Context: Lightweight process virtualization has been used in the past, e.g., Solaris Zones, FreeBSD jails and Linux containers (LXC). But only since 2013 has there been kernel support for user namespaces and process grouping control, which makes lightweight virtualization an interesting way to create virtual environments comparable to virtual machines. Telecom providers have to handle the massive growth of information due to the growing number of customers and devices. Traditional databases are not designed to handle such massive data growth. NoSQL databases were developed for this purpose. Cassandra, with its high read and write throughput, is a popular NoSQL database for handling this kind of data. Running the database using operating system virtualization or containerization would offer a significant performance gain compared to virtual machines, and also gives the benefits of migration, fast boot-up and shutdown times, lower latency and less use of the servers' physical resources. Objectives: This thesis aims to investigate the trade-off in performance while loading a Cassandra cluster in bare-metal and containerized environments. The effect of loading the cluster on each individual node is studied in detail in terms of latency, CPU utilization and disk throughput. Method: We implement the physical model of the Cassandra cluster based on realistic and commonly used scenarios or database analysis for our experiment. We generate different load cases on the cluster for bare metal and Docker and observe the values of CPU utilization, disk throughput and latency using standard tools such as sar and iostat. Statistical analysis (mean value analysis, higher moment analysis and confidence intervals) is performed on measurements at specific interfaces in order to show the reliability of the results.
Results: Experimental results show a quantitative analysis of measurements consisting of latency, CPU utilization and disk throughput while running a Cassandra cluster in bare-metal and container environments. A statistical analysis summarizing the performance of the Cassandra cluster while running a single Cassandra instance is presented. Conclusions: The detailed analysis shows that the resource utilization of the database was similar in both the bare-metal and container scenarios. The results show that CPU utilization for the bare-metal servers is equivalent for mixed, read and write loads. The latency values inside the container are slightly higher in all cases. The mean value analysis and higher moment analysis allow a finer analysis of the results. The calculated confidence intervals show that there is a lot of variation in disk performance, which might be due to compactions happening randomly. Further work can be done by configuring the compaction strategies, memory, and read and write rates.
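The statistical analysis named in the abstract above (mean value, higher moments, confidence intervals) can be illustrated with a minimal Python sketch. The abstract does not state which estimators were used, so the formulas here are assumptions: sample variance with n-1, moment-based skewness and excess kurtosis, and a normal-approximation 95% interval (z = 1.96). The function name `summarize` and the sample values are hypothetical.

```python
import math
import statistics


def summarize(samples, z=1.96):
    """Summarize measurements: mean, variance, skewness, excess kurtosis, CI.

    Assumes a normal-approximation confidence interval (z = 1.96 for ~95%);
    the thesis may have used a different method.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")

    mean = statistics.fmean(samples)
    variance = statistics.variance(samples, mean)  # sample variance (n - 1)

    # Central moments (divided by n) used for the higher-moment measures.
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    excess_kurtosis = m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0

    half_width = z * math.sqrt(variance / n)
    return {
        "mean": mean,
        "variance": variance,
        "skewness": skewness,
        "excess_kurtosis": excess_kurtosis,
        "ci_low": mean - half_width,
        "ci_high": mean + half_width,
    }


# Hypothetical disk-throughput samples (units are illustrative only).
stats = summarize([1.0, 2.0, 3.0, 4.0, 5.0])
print(stats)
```

A wide interval relative to the mean would correspond to the high variation in disk performance that the conclusions attribute to compactions.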
2

Compactions in Apache Cassandra : Performance Analysis of Compaction Strategies in Apache Cassandra

Kona, Srinand January 2016
Context: The global communication system is growing tremendously, leading to the generation of a wide range of data. Telecom operators that generate large amounts of data need to manage this data efficiently. As database management technology advances, there has been remarkable growth of NoSQL databases in the 21st century. Apache Cassandra is an advanced NoSQL database system, which is popular for handling semi-structured and unstructured Big Data. Cassandra has an effective way of compacting data by using different compaction strategies. This research focuses on analyzing the performance of different compaction strategies in different use cases for the default Cassandra stress model. The analysis can suggest better usage of compaction strategies in Cassandra for a write-heavy workload. Objectives: In this study, we investigate the appropriate performance metrics to evaluate the performance of compaction strategies. We provide a detailed analysis of Size Tiered Compaction Strategy, Date Tiered Compaction Strategy, and Leveled Compaction Strategy for a write-heavy (90/10) workload, using the default Cassandra stress tool. Methods: A detailed literature study was conducted on NoSQL databases and on the working of the different compaction strategies in Apache Cassandra. The performance metrics were selected based on the literature study and on the opinions of the supervisors and Ericsson's Apache Cassandra team. Two different tools were developed for collecting the selected metrics. The first tool was developed in the Jython scripting language to collect the Cassandra metrics, and the second tool was developed in the Python scripting language to collect the operating system metrics. The graphs were generated in Microsoft Excel, using the values obtained from the scripts.
Results: Date Tiered Compaction Strategy and Size Tiered Compaction Strategy showed more or less similar behaviour during the stress tests conducted. Leveled Compaction Strategy showed some remarkable results that affected system performance, as compared to the Date Tiered and Size Tiered Compaction Strategies. Date Tiered Compaction Strategy does not perform well for the default Cassandra stress model. Size Tiered Compaction Strategy can be preferred for the default Cassandra stress model, but is not worth considering for big data. Conclusions: With a detailed analysis and logical comparison of metrics, we conclude that Leveled Compaction Strategy performs better for a write-heavy (90/10) workload with the default Cassandra stress model, as compared to the Size Tiered and Date Tiered Compaction Strategies.
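As a rough illustration of how a table is switched between the three strategies compared above, the following Python sketch builds the corresponding CQL `ALTER TABLE` statements. The helper name `compaction_cql` and the short strategy keys are hypothetical, and the keyspace and table names (`keyspace1.standard1`) are assumed here for illustration; the abstract does not say how the strategies were configured in the experiments.

```python
# Map of short (hypothetical) keys to Cassandra compaction strategy class names.
COMPACTION_CLASSES = {
    "size_tiered": "SizeTieredCompactionStrategy",
    "date_tiered": "DateTieredCompactionStrategy",
    "leveled": "LeveledCompactionStrategy",
}


def compaction_cql(keyspace, table, strategy):
    """Return a CQL statement that sets the compaction strategy of a table."""
    try:
        cls = COMPACTION_CLASSES[strategy]
    except KeyError:
        raise ValueError(f"unknown strategy: {strategy!r}") from None
    return (
        f"ALTER TABLE {keyspace}.{table} "
        f"WITH compaction = {{'class': '{cls}'}};"
    )


# Print one statement per strategy; the target table is an assumption.
for name in COMPACTION_CLASSES:
    print(compaction_cql("keyspace1", "standard1", name))
```

The statements are only generated as strings here; running them against a cluster would require a CQL client, which is outside this sketch.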
3

Performance Evaluation of Cassandra Scalability on Amazon EC2

Srinadhuni, Siddhartha January 2018
Context. In the fields of communication systems and computer science, Infrastructure as a Service (IaaS) provides the building blocks for cloud computing along with robust network features. AWS is one such IaaS offering; among its several services, Elastic Compute Cloud (EC2) is used to deploy virtual machines across several data centers and provides fault-tolerant storage for applications across the cloud. Apache Cassandra is one of the many NoSQL databases that provide fault tolerance and elasticity across servers. It has a ring structure, which helps make communication between the nodes in a cluster effective. Cassandra is robust, which means that there is no downtime when adding new Cassandra nodes to an existing cluster. Objectives. This study aims to quantify the latency in adding Cassandra nodes on Amazon EC2 instances and to assess the impact of replication factors (RF) and consistency levels (CL) on autoscaling. Methods. First, a literature review is conducted on how the experiment with the above-mentioned constraints can be carried out. Then an experiment is conducted to address the latency and the effects of autoscaling. A 3-node Cassandra cluster runs on Amazon EC2 with Ubuntu 14.04 LTS as the operating system. A threshold value is identified for each Cassandra-specific configuration, and the cluster is scaled out to five nodes on AWS, using the Cassandra stress tool for benchmarking. This procedure is repeated for a 5-node Cassandra cluster and for each of the configurations, with a mixed workload of equal reads and writes. Results. Latency has been identified in adding Cassandra nodes on Amazon EC2 instances, and the impacts of replication factors and consistency levels on autoscaling have been quantified. Conclusions. It is concluded that latency decreases after autoscaling for all the configurations of Cassandra, and that changing the replication factors and consistency levels also results in a change in Cassandra's performance.
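To make the interaction between replication factor (RF) and consistency level (CL) concrete, here is a minimal Python sketch of how many replicas must acknowledge a request at the common levels ONE, QUORUM and ALL, where a quorum is floor(RF / 2) + 1. The function names are hypothetical, and the RF and CL values in the example are illustrative; the abstract does not list the configurations that were actually tested.

```python
def replicas_required(consistency_level, replication_factor):
    """Number of replicas that must respond for a given CL and RF."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    level = consistency_level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level!r}")


def reads_see_latest_write(write_cl, read_cl, replication_factor):
    """True when read and write replica sets must overlap (R + W > RF)."""
    w = replicas_required(write_cl, replication_factor)
    r = replicas_required(read_cl, replication_factor)
    return r + w > replication_factor


# Illustrative values only, not the configurations used in the thesis.
for rf in (1, 3, 5):
    for cl in ("ONE", "QUORUM", "ALL"):
        print(f"RF={rf} CL={cl}: {replicas_required(cl, rf)} replica(s)")
```

Higher consistency levels wait on more replicas per request, which is one reason the choice of RF and CL can change measured latency.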
