Global ETD Search

Hadoop scalability evaluation for machine learning algorithms on physical machines : Parallel machine learning on computing clusters

The amount of available data has allowed the field of machine learning to flourish. But with growing data set sizes comes an increase in algorithm execution times. Cluster computing frameworks provide tools for distributing data and processing power across several computer nodes and allow algorithms to run in feasible time frames when data sets are large. Different cluster computing frameworks come with different trade-offs. In this thesis, the scalability of the execution time of machine learning algorithms running on the Hadoop cluster computing framework is investigated. A recent version of Hadoop and algorithms relevant to machine learning in industry, namely K-means, latent Dirichlet allocation, and naive Bayes, are used in the experiments. The thesis provides valuable information to anyone choosing between different cluster computing frameworks. The results range from moderate scalability to no scalability at all. These results indicate that Hadoop as a framework may have serious restrictions in how well tasks are actually parallelized. Possible scalability improvements could be achieved by modifying the machine learning library algorithms or by tuning Hadoop parameters.

http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-20102

Parallel machine learning

cluster computing

Hadoop

performance analysis

Information Systems, Social aspects

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:his-20102
Date	January 2021
Creators	Roderus, Jens; Larson, Simon; Pihl, Eric
Publisher	Högskolan i Skövde, Institutionen för informationsteknologi
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
