Spelling suggestions: "subject:"descent methods"" "subject:"crescent methods""
1 |
Randomized coordinate descent methods for big data optimizationTakac, Martin January 2014 (has links)
This thesis consists of 5 chapters. We develop new serial (Chapter 2), parallel (Chapter 3), distributed (Chapter 4) and primal-dual (Chapter 5) stochastic (randomized) coordinate descent methods, analyze their complexity and conduct numerical experiments on synthetic and real data of huge sizes (GBs/TBs of data, millions/billions of variables). In Chapter 2 we develop a randomized coordinate descent method for minimizing the sum of a smooth and a simple nonsmooth separable convex function and prove that it obtains an ε-accurate solution with probability at least 1 - p in at most O((n/ε) log(1/p)) iterations, where n is the number of blocks. This extends recent results of Nesterov [43], which cover the smooth case, to composite minimization, while at the same time improving the complexity by the factor of 4 and removing ε from the logarithmic term. More importantly, in contrast with the aforementioned work in which the author achieves the results by applying the method to a regularized version of the objective function with an unknown scaling factor, we show that this is not necessary, thus achieving first true iteration complexity bounds. For strongly convex functions the method converges linearly. In the smooth case we also allow for arbitrary probability vectors and non-Euclidean norms. Our analysis is also much simpler. In Chapter 3 we show that the randomized coordinate descent method developed in Chapter 2 can be accelerated by parallelization. The speedup, as compared to the serial method, and referring to the number of iterations needed to approximately solve the problem with high probability, is equal to the product of the number of processors and a natural and easily computable measure of separability of the smooth component of the objective function. In the worst case, when no degree of separability is present, there is no speedup; in the best case, when the problem is separable, the speedup is equal to the number of processors. Our analysis also works in the mode when the number of coordinates being updated at each iteration is random, which allows for modeling situations with variable (busy or unreliable) number of processors. We demonstrate numerically that the algorithm is able to solve huge-scale l1-regularized least squares problems with a billion variables. In Chapter 4 we extended coordinate descent into a distributed environment. We initially partition the coordinates (features or examples, based on the problem formulation) and assign each partition to a different node of a cluster. At every iteration, each node picks a random subset of the coordinates from those it owns, independently from the other computers, and in parallel computes and applies updates to the selected coordinates based on a simple closed-form formula. We give bounds on the number of iterations sufficient to approximately solve the problem with high probability, and show how it depends on the data and on the partitioning. We perform numerical experiments with a LASSO instance described by a 3TB matrix. Finally, in Chapter 5, we address the issue of using mini-batches in stochastic optimization of Support Vector Machines (SVMs). We show that the same quantity, the spectral norm of the data, controls the parallelization speedup obtained for both primal stochastic subgradient descent (SGD) and stochastic dual coordinate ascent (SCDA) methods and use it to derive novel variants of mini-batched (parallel) SDCA. Our guarantees for both methods are expressed in terms of the original nonsmooth primal problem based on the hinge-loss. Our results in Chapters 2 and 3 are cast for blocks (groups of coordinates) instead of coordinates, and hence the methods are better described as block coordinate descent methods. While the results in Chapters 4 and 5 are not formulated for blocks, they can be extended to this setting.
|
2 |
Stochastische Gradientenverfahren zur Optimierung unter Echtzeitbedingungen in der adaptiven OptikLüdke, Johannes 20 October 2017 (has links)
Diese Diplomarbeit befasst sich mit stochastischen Gradientenverfahren als Methoden der numerischen Optimierung für die Anwendung in der adaptiven Optik. Die adaptive Optik wird insbesondere im Fall der Propagation von Licht durch die Atmosphäre zur Verbesserung der Abbildungseigenschaften eines optischen Systems verwendet. Als Verfahren der Wahl wird das Simultaneous Perturbation
Stochastic Approximation Verfahren (SPSA-Verfahren) betrachtet, das aufgrund seiner geringen Anzahl von benötigten Zielfunktionsauswertungen pro Iteration sehr gut zu den Echtzeit-Anforderungen der Anwendung passt. Neben der Darstellung des physikalischen Hintergrunds wird Wert auf eine fundierte Verankerung der Analyse der Verfahren in der Wahrscheinlichkeitstheorie gelegt. Die verwendeten Resultate u.a. der Martingaltheorie werden im Rahmen der Arbeit in die benötigte Form gebracht. Als Hintergrund wird ein Einblick in die Theorie der deterministischen, ableitungsfreienGradientenverfahren gegeben. Die Konvergenz des SPSA-Verfahrens wird auf ein Konvergenztheorem für Stochastic-Approximation-Verfahren von Kushner und Clark zurückgeführt. Dabei wurde in der Konvergenztheorie zunächst Spall gefolgt, aufgrund von wahrscheinlichkeitstheoretischen Überlegungen werden dann aber teilweise geänderte Voraussetzungen verwendet.Der praktische Teil der Arbeit beschäftigt sich mit der Anwendung des Verfahrens in der adaptiven Optik am Beispiel eines Labor-Testsystems. Die Voraussetzungen der Theorie werden für die Anwendung gewürdigt. Zur Umsetzung des Verfahrens in einen konkreten Algorithmus wird auf die Parameterwahl und mögliche Erweiterungen eingegangen. Die Wahl der Parameter nach Spall wird um eine Überlegung aus der Theorie der numerischen Differentiation erweitert, so dass sich eine andere Empfehlung hinsichtlich der Wahl der Schrittweite für die Gradientenschätzung ergibt. Ein Optimierungsdurchgang wird dargestellt und bewertet und anschließend ein Ausblick für die mögliche Verwendung des Verfahrens in einem adaptiv-optischen System gegeben.
|
3 |
Hybrid Steepest-Descent Methods for Variational InequalitiesHuang, Wei-ling 26 June 2006 (has links)
Assume that F is a nonlinear operator on a real Hilbert space H which is strongly monotone and Lipschitzian on a nonempty closed convex subset C of H. Assume also that C is the intersection of the fixed point sets of a finite number of nonexpansive mappings on H. We make a slight modification of the iterative algorithm in Xu and Kim (Journal of Optimization Theory and Applications, Vol. 119, No. 1, pp. 185-201, 2003), which generates a sequence {xn} from an arbitrary initial point x0 in H. The sequence {xn} is shown to converge in norm to the unique solution u* of the variational inequality, under the conditions different from Xu and Kim¡¦s ones imposed on the parameters. Applications to constrained generalized pseudoinverse are included. The results presented in this paper are complementary ones to Xu and Kim¡¦s theorems (Journal of Optimization Theory and Applications, Vol. 119, No. 1, pp. 185-201, 2003).
|
4 |
Continuous steepest descent path for traversing non-convex regionsBeddiaf, Salah January 2016 (has links)
In this thesis, we investigate methods of finding a local minimum for unconstrained problems of non-convex functions with n variables, by following the solution curve of a system of ordinary differential equations. The motivation for this was the fact that existing methods (e.g. those based on Newton methods with line search) sometimes terminate at a non-stationary point when applied to functions f(x) that do not a have positive-definite Hessian (i.e. ∇²f → 0) for all x. Even when methods terminate at a stationary point it could be a saddle or maximum rather than a minimum. The only method which makes intuitive sense in non-convex region is the trust region approach where we seek a step which minimises a quadratic model subject to a restriction on the two-norm of the step size. This gives a well-defined search direction but at the expense of a costly evaluation. The algorithms derived in this thesis are gradient based methods which require systems of equations to be solved at each step but which do not use a line search in the usual sense. Progress along the Continuous Steepest Descent Path (CSDP) is governed both by the decrease in the function value and measures of accuracy of a local quadratic model. Numerical results on specially constructed test problems and a number of standard test problems from CUTEr [38] show that the approaches we have considered are more promising when compared with routines in the optimization tool box of MATLAB [46], namely the trust region method and the quasi-Newton method. In particular, they perform well in comparison with the, superficially similar, gradient-flow method proposed by Behrman [7].
|
5 |
Descent dynamical systems and algorithms for tame optimization, and multi-objective problems / Systèmes dynamiques de descente et algorithmes pour l'optimisation modérée, et les problèmes multi-objectifGarrigos, Guillaume 02 November 2015 (has links)
Dans une première partie, nous nous intéressons aux systèmes dynamiques gradients gouvernés par des fonctions non lisses, mais aussi non convexes, satisfaisant l'inégalité de Kurdyka-Lojasiewicz. Après avoir obtenu quelques résultats préliminaires pour la dynamique de la plus grande pente continue, nous étudions un algorithme de descente général. Nous prouvons, sous une hypothèse de compacité, que tout suite générée par ce schéma général converge vers un point critique de la fonction. Nous obtenons aussi de nouveaux résultats sur la vitesse de convergence, tant pour les valeurs que pour les itérés. Ce schéma général couvre en particulier des versions parallélisées de la méthode forward-backward, autorisant une métrique variable et des erreurs relatives. Cela nous permet par exemple de proposer une version non convexe non lisse de l'algorithme Levenberg-Marquardt. Enfin, nous proposons quelques applications de ces algorithmes aux problèmes de faisabilité, et aux problèmes inverses. Dans une seconde partie, cette thèse développe une dynamique de descente associée à des problèmes d'optimisation vectoriels sous contrainte. Pour cela, nous adaptons la dynamique de la plus grande pente usuelle aux fonctions à valeurs dans un espace ordonné par un cône convexe fermé solide. Cette dynamique peut être vue comme l'analogue continu de nombreux algorithmes développés ces dernières années. Nous avons un intérêt particulier pour les problèmes de décision multi-objectifs, pour lesquels cette dynamique de descente fait décroitre toutes les fonctions objectif au cours du temps. Nous prouvons l'existence de trajectoires pour cette dynamique continue, ainsi que leur convergence vers des points faiblement efficients. Finalement, nous explorons une nouvelle dynamique inertielle pour les problèmes multi-objectif, avec l'ambition de développer des méthodes rapides convergeant vers des équilibres de Pareto. / In a first part, we focus on gradient dynamical systems governed by non-smooth but also non-convex functions, satisfying the so-called Kurdyka-Lojasiewicz inequality.After obtaining preliminary results for a continuous steepest descent dynamic, we study a general descent algorithm. We prove, under a compactness assumption, that any sequence generated by this general scheme converges to a critical point of the function.We also obtain new convergence rates both for the values and the iterates. The analysis covers alternating versions of the forward-backward method, with variable metric and relative errors. As an example, a non-smooth and non-convex version of the Levenberg-Marquardt algorithm is detailed.Applications to non-convex feasibility problems, and to sparse inverse problems are discussed.In a second part, the thesis explores descent dynamics associated to constrained vector optimization problems. For this, we adapt the classic steepest descent dynamic to functions with values in a vector space ordered by a solid closed convex cone. It can be seen as the continuous analogue of various descent algorithms developed in the last years.We have a particular interest for multi-objective decision problems, for which the dynamic make decrease all the objective functions along time.We prove the existence of trajectories for this continuous dynamic, and show their convergence to weak efficient points.Then, we explore an inertial dynamic for multi-objective problems, with the aim to provide fast methods converging to Pareto points.
|
6 |
Sobre a convergência de métodos de descida em otimização não-suave: aplicações à ciência comportamental / On the convergence of descent methods in nonsmooth optimization: applications to behavioral scienceSousa Júnior, Valdinês Leite de 03 February 2017 (has links)
Submitted by Cássia Santos (cassia.bcufg@gmail.com) on 2017-02-22T12:12:47Z
No. of bitstreams: 2
Tese - Valdinês Leite de Sousa Júnior - 2017.pdf: 2145153 bytes, checksum: 388666d9bc1ff5aa261882785a3cc5e0 (MD5)
license_rdf: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5) / Approved for entry into archive by Luciana Ferreira (lucgeral@gmail.com) on 2017-02-22T13:04:40Z (GMT) No. of bitstreams: 2
Tese - Valdinês Leite de Sousa Júnior - 2017.pdf: 2145153 bytes, checksum: 388666d9bc1ff5aa261882785a3cc5e0 (MD5)
license_rdf: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5) / Made available in DSpace on 2017-02-22T13:04:40Z (GMT). No. of bitstreams: 2
Tese - Valdinês Leite de Sousa Júnior - 2017.pdf: 2145153 bytes, checksum: 388666d9bc1ff5aa261882785a3cc5e0 (MD5)
license_rdf: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5)
Previous issue date: 2017-02-03 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / Conselho Nacional de Pesquisa e Desenvolvimento Científico e Tecnológico - CNPq / In this work, we investigate four different types of descent methods: a dual descent method in the scalar context and a multiobjective proximal point methods (one exact and two inexact versions). The first one is restricted to functions that satisfy the Kurdyka-Lojasiewicz property, where it is used a quasi-distance as a regularization function. In the next three methods, the objective is to study the convergence of a multiobjective proximal methods (exact an inexact) for a particular class of multiobjective functions that are not necessarily differentiable. For the inexact methods, we choose a proximal distance as the regularization term. Such a well-known distance allows us to analyze the convergence of the method under various settings. Applications in behavioral sciences are analyzed in the sense of the variational rationality approach. / Neste trabalho, investigaremos quatro tipos diferentes de métodos de descida: um método de descida dual e três versões do método do ponto proximal (exato e inexato) em otimização multiobjetivo. No primeiro, a análise de convergência será restrita a funções que satisfazem a propriedade Kurdyka-Lojasiewicz, onde é usada uma quase-distância como função regularizadora. Nos seguintes, o objetivo é estudar a convergência de uma versão exata e duas versões inexatas do método de ponto proximal em otimização multiobjetivo para uma classe particular de funções multiobjetivo que não são necessariamente diferenciáveis. Para os métodos inexatos, escolhemos uma distância proximal como termo regularizador. Aplicações em ciência comportamental serão analisadas no sentido da abordagem da teoria de racionalidade variacional.
|
Page generated in 0.0515 seconds