Large-scale Logistic Regression and Linear Support Vector Machines Using Spark / 大規模羅吉斯回歸與線性支持向量機在Spark上之應用

Master's / National Taiwan University / Graduate Institute of Networking and Multimedia / 102 / Logistic regression and linear SVM are useful methods for large-scale classification. However, their distributed implementations have not been well studied. Recently, because of the inefficiency of the MapReduce framework on iterative algorithms, Spark, an in-memory cluster-computing platform, has been proposed. It has emerged as a popular framework for large-scale data processing and analytics. In this work, we consider a distributed Newton method for solving logistic regression as well as linear SVM and implement it on Spark. We carefully examine many implementation issues that significantly affect running time and propose solutions. After conducting thorough empirical investigations, we release an efficient and easy-to-use tool for the Spark community.

Identifier oai:union.ndltd.org:TW/102NTU05641017
Date January 2014
Creators Chieh-Yen Lin, 林玠言
Contributors Chih-Jen Lin, 林智仁
Source Sets National Digital Library of Theses and Dissertations in Taiwan
Language en_US
Detected Language English
Type degree thesis (學位論文) ; thesis
Format 41
