1 |
First-Order Algorithms for Communication Efficient Distributed Learning
Khirirat, Sarit. January 2019
Advances in devices and storage have made large data collections more accessible than ever. This transformation leads to optimization problems whose data are massive in both volume and dimension. In response to this trend, optimization on high-performance computing architectures has become more popular than ever. These scalable optimization solvers can achieve high efficiency by splitting the computational load among multiple machines. However, these methods also incur a large communication overhead. When solving optimization problems with millions of parameters, communication between machines has been reported to consume up to 80% of the training time. To alleviate this communication bottleneck, many optimization algorithms with data compression techniques have been studied. In practice, they have been reported to save significant communication costs while converging almost as well as their full-precision counterparts. To understand this behavior, we develop theory and techniques in this thesis to design communication-efficient optimization algorithms. In the first part, we analyze the convergence of optimization algorithms with direct compression. First, we outline definitions of compression techniques that cover many compressors of practical interest. Then, we provide a unified analysis framework for optimization algorithms with compressors, which can be either deterministic or randomized. In particular, we show how the tuning parameters of compressed optimization algorithms must be chosen to guarantee performance. Our results show an explicit dependence on the compression accuracy and on the delays caused by the asynchrony of the algorithms. This allows us to characterize the trade-off between iteration and communication complexity under gradient compression. In the second part, we study how error compensation schemes can improve the performance of compressed optimization algorithms. 
Even though convergence guarantees for optimization algorithms with error compensation have been established, there is very little theory that guarantees improved solution accuracy. We therefore develop theoretical explanations showing that error compensation guarantees arbitrarily high solution accuracy from compressed information. In particular, error compensation helps remove accumulated compression errors, thus improving solution accuracy, especially for ill-conditioned problems. We also provide a strong convergence analysis of error compensation for parallel stochastic gradient descent across multiple machines. In particular, the error-compensated algorithms, unlike direct compression, result in a significant reduction of the compression error. Applications of the algorithms in this thesis to real-world problems with benchmark data sets validate our theoretical results. / The development of communication technology and data storage has made large volumes of data collections more accessible than ever. This change leads to numerical optimization problems with data sets that are large in both volume and dimension. In response to this trend, the popularity of high-performance computing architectures has grown more than ever before. Scalable optimization tools can achieve high efficiency by distributing the computational load across multiple machines. In practice, however, they come at the price of substantial communication overhead. This shifts the performance bottleneck from computation to communication. When real-world optimization problems with a large number of parameters are solved, communication between machines accounts for almost 80% of the training time. To reduce the communication load, a number of compression techniques have been proposed in the literature. 
Although optimization algorithms that use these compressors are reported to be as competitive as their full-precision counterparts, they suffer from a loss of accuracy. To shed light on this, we develop in this thesis theory and techniques for designing communication-efficient optimization algorithms that use only low-precision information. In the first part, we analyze the convergence of optimization algorithms with direct compression. First, we give an overview of compression techniques that covers many compressors of practical interest. Then, we present a unified analysis framework for optimization algorithms with compressors, which may be either deterministic or randomized. In particular, we show how the parameters of compressed optimization algorithms, which are determined by the compressor's parameters, must be chosen to guarantee convergence. Our convergence guarantees show the dependence on the compressor's accuracy and on delay effects due to the asynchrony of the algorithms. This allows us to characterize the trade-off between iteration complexity and communication complexity when compression is used. In the second part, we study the high performance of error-compensation methods for compressed optimization algorithms. Although convergence guarantees with error compensation have been established, there is very limited theoretical support for competitive convergence guarantees with error compensation. We therefore develop theoretical explanations showing that using error compensation guarantees arbitrarily high solution accuracy from compressed information. In particular, error compensation helps remove accumulated compression errors and thereby improves solution accuracy, especially for ill-conditioned quadratic optimization problems. We also present a strong convergence analysis for error compensation applied to stochastic gradient methods over a communication network of multiple machines. 
Unlike direct compression, the error-compensated algorithms yield a significant reduction in the compression error. Simulations of the algorithms in this thesis on real-world problems with benchmark data sets validate our theoretical results. / QC20191120
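The contrast between direct compression and error compensation described in this abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not taken from the thesis: the diagonal quadratic objective, the rounding quantizer `quantize`, the helper `compressed_gd`, and the step size and grid spacing.

```python
import math

def quantize(v, delta):
    """Deterministic compressor (assumed): round each entry to the nearest multiple of delta."""
    return [delta * math.floor(x / delta + 0.5) for x in v]

def gradient(a, x):
    """Gradient of the toy objective f(x) = 0.5 * sum_i a[i] * x[i]**2."""
    return [ai * xi for ai, xi in zip(a, x)]

def objective(a, x):
    return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))

def compressed_gd(a, x0, step, delta, n_iter, compensate):
    """Gradient descent that only ever applies a quantized gradient."""
    x = list(x0)
    memory = [0.0] * len(x)  # compression error carried over from earlier rounds
    for _ in range(n_iter):
        g = gradient(a, x)
        # With compensation, the stored error is added back before compressing.
        target = [gi + ei for gi, ei in zip(g, memory)] if compensate else g
        sent = quantize(target, delta)  # what would actually be communicated
        if compensate:
            memory = [ti - si for ti, si in zip(target, sent)]
        x = [xi - step * si for xi, si in zip(x, sent)]
    return x

a = [1.0, 100.0]  # ill-conditioned: curvatures differ by a factor of 100
x0 = [0.4, 0.4]
direct = compressed_gd(a, x0, step=0.01, delta=1.0, n_iter=2000, compensate=False)
compensated = compressed_gd(a, x0, step=0.01, delta=1.0, n_iter=2000, compensate=True)
f_direct = objective(a, direct)
f_compensated = objective(a, compensated)
```

In this toy run, direct compression stalls in the low-curvature coordinate: its gradient, 0.4, always rounds to zero, so `direct[0]` never moves. The compensated run keeps accumulating the rounding error until it is large enough to be transmitted, and so ends much closer to the minimizer.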
|
2 |
Elimination dynamique : accélération des algorithmes d'optimisation convexe pour les régressions parcimonieuses / Dynamic screening: accelerating convex optimization algorithms for sparse regressions
Bonnefoy, Antoine. 15 April 2016
Convex algorithms for solving sparse linear regressions have good practical and theoretical performance. However, they all suffer from the dimensions of the problem, which dictate the complexity of each of their iterations. We propose an approach to reduce this computational cost at the iteration level. Recent strategies relying on variable-elimination (screening) tests have been proposed to accelerate the solution of penalized sparse regression problems such as the LASSO. These approaches rest on the idea that it is worthwhile to spend a small computational effort locating inactive atoms in order to remove them from the dictionary in a preprocessing step. The solver using the reduced dictionary then converges faster to the solution of the initial problem. We believe there is a more efficient way to reduce the dictionary and thus obtain a greater acceleration: inside each iteration of the algorithm, the computations originally dedicated to the algorithm can be reused to obtain, at low cost, a new screening test whose elimination effect increases progressively over the iterations. The dictionary is then reduced dynamically instead of statically, once and for all, before the first iteration. We formalize this dynamic screening principle through a generic algorithmic formulation, and apply it by integrating existing screening tests into several first-order algorithms for solving the LASSO and Group-LASSO problems. / Applications in signal processing and machine learning make frequent use of sparse regressions. The resulting convex problems, such as the LASSO, can be solved efficiently by first-order algorithms, which are general and have good convergence properties. 
However, those algorithms suffer from the dimension of the problem, which dictates the complexity of their iterations. In this thesis we study approaches, based on screening tests, aimed at reducing the computational cost at the iteration level. Such approaches build upon the idea that it is worth dedicating a small computational effort to locating inactive atoms and removing them from the dictionary in a preprocessing stage, so that the regression algorithm, working with a smaller dictionary, then converges faster to the solution of the initial problem. We believe that there is an even more efficient way to screen the dictionary and obtain a greater acceleration: inside each iteration of the regression algorithm, one may take advantage of the algorithm's computations to obtain a new screening test for free, with screening effects that increase over the iterations. The dictionary is thus screened dynamically instead of statically, once and for all, before the first iteration. Our first contribution is the formalisation of this principle and its application to first-order algorithms for solving the LASSO and Group-LASSO. In a second contribution, this general principle is combined with active-set methods, whose goal is also to accelerate the solution of sparse regressions. Applying the two complementary methods to first-order algorithms leads to large speed-ups.
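The dynamic screening principle described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's own method: it uses plain ISTA as the first-order algorithm and swaps in the Gap Safe sphere test as the screening rule, whereas the thesis integrates other existing tests. The function names, the conservative step size, and the toy dictionary are all hypothetical.

```python
import math

def soft_threshold(v, t):
    """Proximal operator of t * |.|."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def ista_dynamic_screening(atoms, y, lam, n_iter=500):
    """ISTA for 0.5*||y - D b||^2 + lam*||b||_1 with one sphere test per iteration.

    atoms: columns of the dictionary D, each a list of floats.
    Returns (coefficients, sorted indices of atoms never eliminated).
    """
    p = len(atoms)
    norms = [math.sqrt(sum(a * a for a in d)) for d in atoms]
    # Conservative step: the squared Frobenius norm upper-bounds the Lipschitz constant.
    step = 1.0 / sum(nrm * nrm for nrm in norms)
    beta = [0.0] * p
    active = set(range(p))
    y_sq = sum(v * v for v in y)
    for _ in range(n_iter):
        if not active:
            break
        # Quantities the solver needs anyway: residual and correlations.
        fit = [sum(atoms[j][i] * beta[j] for j in active) for i in range(len(y))]
        r = [yi - fi for yi, fi in zip(y, fit)]
        corr = {j: sum(a * ri for a, ri in zip(atoms[j], r)) for j in active}
        # Reuse them: rescale the residual into a dual feasible point, get the gap.
        scale = max(lam, max(abs(c) for c in corr.values()))
        primal = 0.5 * sum(ri * ri for ri in r) + lam * sum(abs(beta[j]) for j in active)
        dist_sq = sum((ri / scale - yi / lam) ** 2 for ri, yi in zip(r, y))
        dual = 0.5 * y_sq - 0.5 * lam * lam * dist_sq
        gap = max(primal - dual, 1e-12)  # small floor guards against round-off
        radius = math.sqrt(2.0 * gap) / lam
        # Regular ISTA update on the atoms still in play.
        for j in active:
            beta[j] = soft_threshold(beta[j] + step * corr[j], step * lam)
        # Screening: atom j is provably inactive at the optimum if this test holds.
        screened = {j for j in active if abs(corr[j]) / scale + radius * norms[j] < 1.0}
        for j in screened:
            beta[j] = 0.0
        active -= screened
    return beta, sorted(active)

# Toy dictionary (hypothetical): three canonical atoms plus one correlated atom.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.6, 0.8, 0.0]]
y = [3.0, 0.2, 0.1]
beta, kept = ista_dynamic_screening(atoms, y, lam=1.0)
```

For this toy problem the optimality conditions give the coefficients (2, 0, 0, 0), so only the first atom should survive screening. The sphere shrinks as the duality gap closes, which is why more atoms are eliminated in later iterations than a single test before the first iteration could remove.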
|
3 |
Reconstruction adaptative des signaux par optimisation convexe / Adaptive signals recovery by convex optimization
Ostrovskii, Dmitrii. 11 January 2018
We consider the problem of denoising a signal or an image observed in Gaussian noise. In this problem, classical linear estimators are quasi-optimal when the set of signals, which must be convex and compact, is known a priori. If this set is not specified, designing an adaptive estimator that does not "know" the hidden structure of the signal remains a difficult problem. In this thesis, we study a new family of estimators for signals satisfying certain time-invariance properties. Such signals are characterized by their harmonic structure, which is generally unknown in practice. We propose new estimators capable of exploiting the unknown harmonic structure of the signal to be reconstructed. We show that these estimators obey various oracle inequalities, and we propose a numerically efficient algorithmic implementation of these estimators based on first-order optimization algorithms. We evaluate these estimators on synthetic data and on real signals and images. / We consider the problem of denoising a signal observed in Gaussian noise. In this problem, classical linear estimators are quasi-optimal provided that the set of possible signals is convex, compact, and known a priori. However, when the set is unspecified, designing an estimator which does not "know" the underlying structure of a signal yet has favorable theoretical guarantees of statistical performance remains a challenging problem. In this thesis, we study a new family of estimators for statistical recovery of signals satisfying certain time-invariance properties. Such signals are characterized by their harmonic structure, which is usually unknown in practice. We propose new estimators which are capable of exploiting the unknown harmonic structure of the signal to be reconstructed. 
We demonstrate that these estimators admit theoretical performance guarantees, in the form of oracle inequalities, in a variety of settings. We provide efficient algorithmic implementations of these estimators via first-order optimization algorithms with non-Euclidean geometry, and evaluate them on synthetic data as well as on some real-world signals and images.
|