Master's / National Taiwan University / Graduate Institute of Networking and Multimedia / 102 / Logistic regression and linear SVM are useful methods for large-scale classification. However, their distributed implementations have not been well studied. Recently, because the MapReduce framework is inefficient for iterative algorithms, Spark, an in-memory cluster-computing platform, has been proposed. It has emerged as a popular framework for large-scale data processing and analytics. In this work, we consider a distributed Newton method for solving logistic regression as well as linear SVM and implement it on Spark. We carefully examine many implementation issues that significantly affect running time and propose our solutions. After conducting thorough empirical investigations, we release an efficient and easy-to-use tool for the Spark community.
Identifier | oai:union.ndltd.org:TW/102NTU05641017 |
Date | January 2014 |
Creators | Chieh-Yen Lin, 林玠言 |
Contributors | Chih-Jen Lin, 林智仁 |
Source Sets | National Digital Library of Theses and Dissertations in Taiwan |
Language | en_US |
Detected Language | English |
Type | Degree thesis ; thesis |
Format | 41 |