1 |
Compaction Strategies in Apache Cassandra: Analysis of Default Cassandra stress model
Ravu, Venkata Sathya Sita J S. January 2016
Context. Applications ranging from the web and social networking to telecommunications increasingly gather and process very large, fast-growing amounts of information, leading to a common set of problems known collectively as “Big Data”. Over the last decade, the ability to run large-scale analytics over many data sets has proved to be a competitive advantage in industries such as retail, telecom, and defense. In response to this trend, the research community and the IT industry have proposed a number of platforms to facilitate large-scale data analytics. Such platforms include a new class of databases, often referred to as NoSQL data stores. Apache Cassandra is one such NoSQL data store. This research analyzes the performance of different compaction strategies in different use cases for the default Cassandra stress model.
Objectives. The performance of the compaction strategies is observed in three use cases for the default Cassandra stress model: write heavy (90/10), read heavy (10/90), and balanced (50/50). The aim is to provide the events and specifications that suggest when to switch from one compaction strategy to another.
Methods. A single-node Cassandra deployment is set up on a web server, and its read and write performance under different compaction strategies is studied with read-heavy, write-heavy, and balanced workloads. Its performance metrics are collected and analyzed.
Results. Performance metrics of the different compaction strategies are evaluated and analyzed.
Conclusions. Based on a detailed analysis and comparison, we conclude that the Leveled Compaction Strategy performs better than the size-tiered and date-tiered compaction strategies for a read-heavy (10/90) workload under the default Cassandra stress model.
For the balanced (50/50) workload, the date-tiered compaction strategy performs better than the size-tiered compaction strategy.
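The three workload mixes and the compaction strategies compared in this abstract could be driven with the `cassandra-stress` tool roughly as sketched below. This is a minimal illustration, not the author's test harness: the helper name `stress_command`, the node address, the operation count, and the thread count are all assumptions chosen for the example, and a `mixed` run normally expects data to have been written beforehand.

```python
# Hypothetical helper: builds cassandra-stress argument lists for the three
# workload mixes named in the abstract. Node address, operation count, and
# thread count are illustrative assumptions, not values from the thesis.

WORKLOADS = {
    "write_heavy": (9, 1),  # 90% writes / 10% reads
    "read_heavy": (1, 9),   # 10% writes / 90% reads
    "balanced": (1, 1),     # 50% writes / 50% reads
}

STRATEGIES = (
    "SizeTieredCompactionStrategy",
    "LeveledCompactionStrategy",
    "DateTieredCompactionStrategy",
)


def stress_command(workload, strategy, node="127.0.0.1",
                   ops=1_000_000, threads=50):
    """Return the argument list for one mixed-workload stress run."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown compaction strategy: {strategy}")
    write, read = WORKLOADS[workload]
    return [
        "cassandra-stress", "mixed",
        f"ratio(write={write},read={read})",
        f"n={ops}",
        "-schema", f"compaction(strategy={strategy})",
        "-rate", f"threads={threads}",
        "-node", node,
    ]


# Print one command per workload/strategy pair; passing the list to
# subprocess.run would avoid the need for shell escaping of parentheses.
for workload in WORKLOADS:
    for strategy in STRATEGIES:
        print(" ".join(stress_command(workload, strategy)))
```

Keeping the ratio and the strategy as separate parameters makes it straightforward to run every workload against every strategy and compare the collected metrics side by side.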
|
2 |
Performance Tuning of Big Data Platform: Cassandra Case Study
Sathvik, Katam. January 2016
Cloud-based storage systems have gained considerable prominence in the past few years. Every day, millions of files are uploaded to and downloaded from cloud storage. Data at this scale cannot be handled by traditional databases and is considered Big Data. Powerful new platforms, called Big Data systems, have been developed to store and organize large and unstructured data. Some of the most popular Big Data platforms are MongoDB, Hadoop, and Cassandra. This study uses the Cassandra database management system because it is an open-source platform developed in Java. Cassandra has a masterless ring architecture, and data is replicated among all the nodes for fault tolerance. Unlike MySQL, Cassandra stores data on a per-column basis. Cassandra is a NoSQL database system that can handle unstructured data. Most Cassandra parameters are scalable and easy to configure.
Amazon provides a cloud computing platform, known as Amazon Web Services (AWS), that lets a user perform heavy computing tasks on remote hardware. AWS also includes database deployment and network management services with a simple user experience. This document gives a detailed explanation of Cassandra deployment on the AWS platform, followed by Cassandra performance tuning. The study investigates how changing Cassandra parameters affects read and write performance when Cassandra is deployed on the Elastic Compute Cloud (EC2) platform. The performance of a three-node Cassandra cluster is evaluated; using knowledge of the configuration parameters, the cluster is tuned and a draft model is proposed.
A cloud environment suitable for the experiment is created on AWS, and a three-node Cassandra cluster is deployed in it. The performance of this three-node architecture is evaluated and tested with different configuration parameters. The configuration parameters are selected based on how the Cassandra metrics respond to changes in them. The selected parameters are then changed, and the resulting performance difference is observed and analyzed. Using this analysis, a draft model is developed by tuning the selected parameters. This draft model is tested with different workloads and compared with the default Cassandra model.
Changes to the key cache and memtable parameters improved the performance metrics. Increasing the key cache size and save period improved read performance; it also affected system metrics, increasing CPU load and disk throughput and decreasing operation time. Changes to the memtable parameters affected write performance and disk space utilization. Increasing the memtable flush writer threshold increased disk throughput and decreased operation time. The draft model derived from the performance evaluation has better write and read performance than the default Cassandra model.
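The key cache and memtable settings discussed above are normally set in `cassandra.yaml`. The sketch below shows one way such overrides could be applied to a configuration file's text. The parameter names follow the `cassandra.yaml` shipped with Cassandra releases of that period; the helper `apply_overrides`, the sample file contents, and every numeric value are hypothetical and are not the tuned values from the thesis.

```python
import re

# Hypothetical tuned values, for illustration only; the thesis abstract does
# not state the values it used.
TUNED = {
    "key_cache_size_in_mb": 200,
    "key_cache_save_period": 28800,
    "memtable_flush_writers": 4,
}


def apply_overrides(yaml_text, overrides):
    """Return yaml_text with simple `key: value` lines replaced.

    A key that appears commented out (`# key: value`) is uncommented and
    set; a key that is not found at all is appended at the end.
    """
    lines = yaml_text.splitlines()
    remaining = dict(overrides)
    for i, line in enumerate(lines):
        match = re.match(r"^#?\s*([A-Za-z_]+):", line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            lines[i] = f"{key}: {remaining.pop(key)}"
    lines.extend(f"{key}: {value}" for key, value in remaining.items())
    return "\n".join(lines) + "\n"


# Made-up stand-in for a fragment of cassandra.yaml.
sample = (
    "key_cache_size_in_mb:\n"
    "key_cache_save_period: 14400\n"
    "# memtable_flush_writers: 8\n"
)
print(apply_overrides(sample, TUNED))
```

Working on plain text rather than a parsed YAML tree keeps the example dependency-free; a real tuning script would more likely use a YAML library and restart each node after the change.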
|