1.
New Optimization Methods for Modern Machine Learning. Reddi, Sashank Jakkam. 01 July 2017.
Modern machine learning systems pose several new statistical, scalability, privacy and ethical challenges. With the advent of massive datasets and increasingly complex tasks, scalability has especially become a critical issue in these systems. In this thesis, we focus on fundamental challenges related to scalability, such as computational and communication efficiency, in modern machine learning applications. The central message of this thesis is that classical statistical thinking leads to highly effective optimization methods for modern big data applications.

The first part of the thesis investigates optimization methods for solving large-scale nonconvex Empirical Risk Minimization (ERM) problems. Such problems have surged into prominence, notably through deep learning, and have led to exciting progress. However, our understanding of optimization methods suitable for these problems is still very limited. We develop and analyze a new line of optimization methods for nonconvex ERM problems, based on the principle of variance reduction. We show that our methods exhibit fast convergence to stationary points and improve the state of the art in several nonconvex ERM settings, including nonsmooth and constrained ERM. Using similar principles, we also develop novel optimization methods that provably converge to second-order stationary points. Finally, we show that the key principles behind our methods can be generalized to overcome challenges in other important problems such as Bayesian inference.

The second part of the thesis studies two critical aspects of modern distributed machine learning systems: asynchronicity and communication efficiency of optimization methods. We study various asynchronous stochastic algorithms with fast convergence for convex ERM problems and show that these methods achieve near-linear speedups in sparse settings common to machine learning. Another key factor governing the overall performance of a distributed system is its communication efficiency. Traditional optimization algorithms used in machine learning are often ill-suited for distributed environments with high communication cost. To address this issue, we discuss two different paradigms to achieve communication efficiency of algorithms in distributed environments and explore new algorithms with better communication complexity.
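The variance-reduction principle referenced above is commonly instantiated by SVRG-style methods. The following minimal sketch illustrates that template on a hypothetical least-squares finite sum; it is an assumption of what such a method looks like, not the exact algorithms developed in the thesis, and the step size, epoch count and data below are illustrative only.

```python
import numpy as np

def svrg(grad_i, x0, n, eta=0.01, epochs=20, inner_steps=None, rng=None):
    """Minimal SVRG sketch for a finite-sum objective f(x) = (1/n) * sum_i f_i(x).

    grad_i(x, i) returns the gradient of the i-th component at x.
    Illustrative sketch only, not the thesis's exact algorithms.
    """
    rng = rng or np.random.default_rng(0)
    m = inner_steps or n                 # inner-loop length, commonly O(n)
    x = x0.copy()
    for _ in range(epochs):
        x_snap = x.copy()
        # Full gradient at the snapshot: the anchor used for variance reduction.
        full_grad = np.mean([grad_i(x_snap, i) for i in range(n)], axis=0)
        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient: unbiased, with variance that
            # shrinks as the iterate stays close to the snapshot.
            v = grad_i(x, i) - grad_i(x_snap, i) + full_grad
            x -= eta * v
    return x

# Usage on hypothetical least-squares components f_i(x) = 0.5 * (a_i^T x - b_i)^2.
rng = np.random.default_rng(1)
A, b = rng.standard_normal((100, 5)), rng.standard_normal(100)
grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i]
x_hat = svrg(grad_i, np.zeros(5), n=100)
```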
2.
Large-Scale Optimization With Machine Learning Applications. Van Mai, Vien. January 2019.
This thesis aims at developing efficient algorithms for solving some fundamental engineering problems in data science and machine learning. We investigate a variety of acceleration techniques for improving the convergence times of optimization algorithms. First, we investigate how problem structure can be exploited to accelerate the solution of highly structured problems such as the generalized eigenvalue problem and elastic-net regression. We then consider Anderson acceleration, a generic and parameter-free extrapolation scheme, and show how it can be adapted to accelerate practical convergence of proximal gradient methods for a broad class of non-smooth problems. For all the methods developed in this thesis, we design novel algorithms, perform mathematical analysis of convergence rates, and conduct practical experiments on real-world data sets.
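As a concrete illustration of Anderson acceleration, the sketch below applies a type-II Anderson update to a generic fixed-point map. The quadratic example map and the memory parameter are hypothetical, and the safeguards needed to adapt the scheme to proximal gradient methods, as studied in the thesis, are omitted.

```python
import numpy as np

def anderson_fixed_point(g, x0, m=5, iters=100, tol=1e-10):
    """Type-II Anderson acceleration of the fixed-point iteration x <- g(x).

    Minimal sketch: the next iterate extrapolates the last m map evaluations,
    with mixing weights from a small least-squares problem on the residuals.
    """
    x = np.asarray(x0, dtype=float)
    F_hist, G_hist = [], []                    # residuals f = g(x) - x, and g(x) values
    for _ in range(iters):
        gx = g(x)
        f = gx - x
        if np.linalg.norm(f) < tol:
            return gx
        F_hist.append(f); G_hist.append(gx)
        if len(F_hist) > m + 1:
            F_hist.pop(0); G_hist.pop(0)
        if len(F_hist) == 1:
            x = gx                             # plain fixed-point step to start
            continue
        dF = np.column_stack([F_hist[j + 1] - F_hist[j] for j in range(len(F_hist) - 1)])
        dG = np.column_stack([G_hist[j + 1] - G_hist[j] for j in range(len(G_hist) - 1)])
        gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
        x = gx - dG @ gamma                    # Anderson-extrapolated iterate
    return x

# Usage: gradient-descent map for a hypothetical quadratic, g(x) = x - eta * (Q x - b).
Q = np.array([[3.0, 1.0], [1.0, 2.0]]); b = np.array([1.0, -1.0])
g = lambda x, eta=0.1: x - eta * (Q @ x - b)
x_star = anderson_fixed_point(g, np.zeros(2))
```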
3.
Non-Convex Optimization for Latent Data Models: Algorithms, Analysis and Applications. Karimi, Belhal. 19 September 2019.
Many problems in machine learning amount to minimizing a possibly non-convex and non-smooth function defined on a Euclidean space; maximum-likelihood estimation, empirical risk minimization, topic models, neural networks and sparse logistic regression are examples. Optimization methods used to solve such problems have been widely studied in the literature for convex objective functions and are extensively used in practice. However, recent breakthroughs in statistical modeling, such as deep learning, coupled with an explosion in the number of data samples, call for non-convex optimization procedures suited to large datasets. This thesis addresses these two challenges by developing algorithms with cheaper updates, ideally independent of the number of samples, and by improving the still rather limited theoretical understanding of non-convex optimization. In this manuscript, we are interested in the minimization of such objective functions for latent data models, i.e., when the data is partially observed, which includes the conventional sense of missing data but is much broader than that.

In the first part, we consider the minimization of a (possibly) non-convex and non-smooth objective function using incremental and online updates. To that end, we propose several algorithms exploiting the latent structure to efficiently optimize the objective and illustrate our findings with numerous applications.

In the second part, we focus on the maximization of a non-convex likelihood using the EM algorithm and its stochastic variants. We analyze several faster and cheaper variants and propose two new EM-type algorithms aimed at speeding up the convergence of the estimated parameters.
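To make the stochastic-EM idea concrete, the sketch below runs an online EM update, maintaining running sufficient statistics with a decreasing step size, on a toy one-dimensional Gaussian mixture. The model, step-size schedule and data are hypothetical, and the algorithm is a standard baseline rather than one of the new variants proposed in the thesis.

```python
import numpy as np

def online_em_gmm(y, K=2, passes=5, seed=0):
    """Online EM sketch for a 1-D Gaussian mixture.

    Illustrative only: a single-observation E-step updates running sufficient
    statistics via stochastic approximation, and the M-step maps them back to
    the parameters (weights, means, variances).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    w = np.full(K, 1.0 / K)
    mu = rng.choice(y, K, replace=False).astype(float)
    var = np.full(K, np.var(y) + 1e-6)
    # Running sufficient statistics per component: E[1], E[y], E[y^2].
    s0, s1, s2 = w.copy(), w * mu, w * (var + mu**2)
    t = 0
    for _ in range(passes):
        for i in rng.permutation(n):
            t += 1
            rho = 1.0 / (t + 10) ** 0.6          # decreasing step size
            # E-step for one observation: posterior responsibilities.
            logp = -0.5 * ((y[i] - mu) ** 2 / var + np.log(2 * np.pi * var)) + np.log(w)
            r = np.exp(logp - logp.max()); r /= r.sum()
            # Stochastic-approximation update of the running statistics.
            s0 += rho * (r - s0)
            s1 += rho * (r * y[i] - s1)
            s2 += rho * (r * y[i] ** 2 - s2)
            # M-step: parameters as a function of the running statistics.
            w = s0 / s0.sum()
            mu = s1 / s0
            var = np.maximum(s2 / s0 - mu**2, 1e-6)
    return w, mu, var

# Usage on hypothetical data drawn from two well-separated Gaussians.
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 0.5, 500)])
weights, means, variances = online_em_gmm(y)
```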