Global ETD Search

Algorithmes d'apprentissage pour les grandes masses de données : Application à la classification multi-classes et à l'optimisation distribuée asynchrone / Scalable algorithms for large-scale machine learning problems : Application to multiclass classification and asynchronous distributed optimization

This thesis develops scalable algorithms for large-scale machine learning and presents two perspectives on handling large data. The first part considers multiclass classification with a large number of classes. To address the challenges this setting raises, we propose an algorithm that reduces the original multiclass problem to an equivalent binary one, which we then subsample drastically. Building on this reduction, we introduce a scalable method for multiclass classification with a very large number of classes and provide detailed theoretical and empirical analyses.

The second part addresses distributed machine learning by introducing an asynchronous framework for distributed optimization. We apply this framework to two popular domains: matrix factorization for large-scale recommender systems and large-scale binary classification. For matrix factorization, we run Stochastic Gradient Descent (SGD) in an asynchronous distributed manner. For large-scale binary classification, we use SVRG, a variance-reduced variant of SGD, as the optimization algorithm.
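The abstract names SVRG as the optimizer for the binary classification application but gives no details of the asynchronous distributed version. As a rough illustration only, the following is a serial, single-machine sketch of the standard SVRG update applied to logistic regression. It is not the thesis's implementation: the function names, the synthetic data, and the hyperparameters (step size, number of epochs, inner-loop length) are all assumptions chosen for the example.

```python
import numpy as np


def make_toy_data(n=200, d=5, seed=0):
    """Hypothetical synthetic data: linearly separable labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    y = np.where(X @ w_true >= 0, 1.0, -1.0)
    return X, y


def logistic_loss(w, X, y):
    """Mean logistic loss, log(1 + exp(-y * <x, w>)), over the rows of X."""
    return float(np.mean(np.logaddexp(0.0, -y * (X @ w))))


def logistic_grad(w, X, y):
    """Gradient of the mean logistic loss over the rows of X."""
    margins = y * (X @ w)
    # 1 / (1 + exp(m)) written with tanh to avoid overflow for large |m|.
    coeff = -y * 0.5 * (1.0 - np.tanh(0.5 * margins))
    return X.T @ coeff / X.shape[0]


def svrg(X, y, step=0.1, epochs=20, inner_steps=None, seed=0):
    """Serial SVRG sketch; all defaults are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = inner_steps or n
    w_snapshot = np.zeros(d)
    for _ in range(epochs):
        # Full gradient at the snapshot, computed once per outer epoch.
        full_grad = logistic_grad(w_snapshot, X, y)
        w = w_snapshot.copy()
        for _ in range(m):
            i = rng.integers(n)
            xi, yi = X[i:i + 1], y[i:i + 1]
            # Variance-reduced direction: the stochastic gradient at w,
            # corrected by the same sample's gradient at the snapshot.
            g = logistic_grad(w, xi, yi) - logistic_grad(w_snapshot, xi, yi) + full_grad
            w -= step * g
        w_snapshot = w  # take the last inner iterate as the next snapshot
    return w_snapshot


if __name__ == "__main__":
    X, y = make_toy_data()
    w = svrg(X, y)
    print("loss at zero weights:", logistic_loss(np.zeros(X.shape[1]), X, y))
    print("loss after SVRG:     ", logistic_loss(w, X, y))
```

The correction term keeps each step an unbiased estimate of the full gradient while shrinking its variance as the iterate approaches the snapshot, which is what distinguishes SVRG from plain SGD. How the thesis distributes these updates across workers asynchronously is not described in this record and is not attempted here.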

http://www.theses.fr/2017GREAM046/document

Machine learning

Collaborative filtering

Distributed Framework

004

Identifier	oai:union.ndltd.org:theses.fr/2017GREAM046
Date	26 September 2017
Creators	Joshi, Bikash
Contributors	Grenoble Alpes, Amini, Massih-Reza, Iutzeler, Franck
Source Sets	Dépôt national des thèses électroniques françaises
Language	English
Detected Language	English
Type	Electronic Thesis or Dissertation, Text
