1

Analysis and comparison of interfacing, data generation and workload implementation in BigDataBench 4.0 and Intel HiBench 7.0

Barosen, Alexander; Dalin, Sadok. January 2018
One of the major challenges in Big Data is the accurate and meaningful assessment of system performance. Unlike in other systems, minor differences in efficiency can escalate into large differences in cost and power consumption. While there are several tools on the market for measuring the performance of Big Data systems, few of them have been explored in depth. This report investigated the interfacing, data generation and workload implementations of two Big Data benchmarking suites, BigDataBench and HiBench. The purpose of the study was to establish the capabilities of each tool with regard to these criteria. An exploratory and qualitative approach was used to gather information and analyze each benchmarking tool. Source code, documentation, and reports written and published by the developers were used as information sources. The results showed that BigDataBench and HiBench were designed similarly with regard to interfacing and data flow during the execution of a workload, with the exception of streaming workloads. BigDataBench provided more realistic data generation, while data generation in HiBench was easier to control. With regard to workload design, the workloads in BigDataBench were designed to be applicable to multiple frameworks, while the workloads in HiBench were focused on the Hadoop family. In conclusion, neither benchmarking suite was superior to the other. They were designed for different purposes and should be applied on a case-by-case basis.
2

Performance Evaluation of Hadoop based Big Data Applications with HiBench Benchmarking tool on IaaS Cloud Platforms

Muthiah, Karthika, Ms. 01 January 2017
Cloud computing is a computing paradigm where large numbers of devices are connected through networks that provide a dynamically scalable infrastructure for applications, data and storage. Currently, many businesses, from small companies to large corporations and industries, are changing their operations to use cloud services, because cloud platforms can increase a company's growth through process efficiency and reduced information technology spending [Coles16]. Companies rely on cloud platforms such as Amazon Web Services, Google Compute Engine, and Microsoft Azure for their business development. Due to the emergence of new technologies, devices, and communications, the amount of data produced is growing rapidly every day. Big data is a collection of large datasets, typically hundreds of gigabytes, terabytes or petabytes in size. Storing and analyzing this huge volume of data is a great challenge for companies and new businesses, and it is the primary focus of this paper. This research was conducted on Amazon's Elastic Compute Cloud (EC2) and Microsoft Azure platforms using the HiBench Hadoop Big Data Benchmark suite [HiBench16]. Processing huge volumes of data is a tedious task that is normally handled through traditional database servers. In contrast, Hadoop is a powerful framework that handles applications with big data requirements efficiently by using the MapReduce algorithm to run them on systems with many commodity hardware nodes. Hadoop's distributed file system facilitates rapid storage and high data transfer rates among the nodes, and it remains operational even when a node in the cluster fails. HiBench is a big data benchmarking tool used to evaluate the performance of big data applications whose data are handled and controlled by a Hadoop cluster. A Hadoop cluster environment was set up and evaluated on the two cloud platforms.
A quantitative comparison was performed on Amazon EC2 and Microsoft Azure along with a study of their pricing models. Measures are suggested for future studies and research.
