1 |
Tópicos em métodos ótimos para otimização convexa / Topics in optimal methods for convex optimization
Rossetto, Diane Rizzotto, 29 March 2012
In this work we introduce a new optimal method for constrained differentiable convex optimization, based on earlier ideas by Nesterov and by Auslender and Teboulle. The method proposed by the latter authors uses a coercive Bregman distance to ensure that the iterates remain in the interior of the feasible set. Our results extend this method to allow the use of the squared Euclidean distance. We also show how to estimate the Lipschitz constant of the gradient of the objective function, which improves the numerical behavior of the method. Finally, we present numerical experiments to validate our approach and compare it to Nesterov's algorithm.
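To make the two ingredients mentioned in this abstract concrete (a projection step, which is what the squared Euclidean distance amounts to, and an on-the-fly estimate of the Lipschitz constant of the gradient), here is a minimal sketch. It is not the method of the thesis: it is a standard FISTA-style accelerated projected gradient loop with doubling backtracking, and the names `accelerated_projected_gradient`, `project_box`, and the test problem are illustrative assumptions.

```python
import math

def project_box(x, lo=0.0, hi=1.0):
    # Euclidean projection onto the box [lo, hi]^n (assumed feasible set).
    return [min(max(xi, lo), hi) for xi in x]

def accelerated_projected_gradient(f, grad, project, x0, L0=1.0, iters=200):
    # FISTA-style scheme; L is a running estimate of the Lipschitz constant
    # of grad, doubled whenever the quadratic upper bound is violated.
    x, y, t, L = list(x0), list(x0), 1.0, L0
    for _ in range(iters):
        g, fy = grad(y), f(y)
        while True:
            x_new = project([yi - gi / L for yi, gi in zip(y, g)])
            d = [a - b for a, b in zip(x_new, y)]
            bound = (fy + sum(gi * di for gi, di in zip(g, d))
                     + 0.5 * L * sum(di * di for di in d))
            if f(x_new) <= bound + 1e-12:
                break          # current L is large enough for this step
            L *= 2.0           # otherwise increase the estimate
        t_new = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))
        # Extrapolation point; it may leave the feasible set, so f and grad
        # are assumed to be defined everywhere.
        y = [xn + (t - 1.0) / t_new * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x, L

# Illustrative problem: a separable quadratic over the unit box.
w, c = [1.0, 10.0], [2.0, -0.5]
f = lambda x: 0.5 * sum(wi * (xi - ci) ** 2 for wi, xi, ci in zip(w, x, c))
grad = lambda x: [wi * (xi - ci) for wi, xi, ci in zip(w, x, c)]
x_star, L_est = accelerated_projected_gradient(f, grad, project_box, [0.5, 0.5])
```

Because the example objective is separable, its constrained minimizer is simply `c` clipped to the box, which makes the result easy to check by hand. The doubling rule never overshoots the true Lipschitz constant (here 10) by more than a factor of two when `L0` starts below it.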
|
3 |
Optimal stochastic and distributed algorithms for machine learning
Ouyang, Hua, 20 September 2013
Stochastic and data-distributed optimization algorithms have received considerable attention from the machine learning community because of the demand created by large-scale learning and big-data optimization. Many stochastic and deterministic learning algorithms have recently been proposed for various application scenarios. Nevertheless, many of these algorithms are based on heuristics, and their optimality in terms of generalization error is not sufficiently justified. In this talk, I will explain the concept of an optimal learning algorithm and show that, given a time budget and a proper hypothesis space, only those algorithms that achieve the lower bounds of the estimation error and the optimization error are optimal. Guided by this concept, we investigate the stochastic minimization of nonsmooth convex loss functions, a central problem in machine learning. We propose a novel algorithm named Accelerated Nonsmooth Stochastic Gradient Descent, which exploits the structure of common nonsmooth loss functions to achieve optimal convergence rates for a class of problems including SVMs. It is the first stochastic algorithm that can achieve the optimal O(1/t) rate for minimizing nonsmooth loss functions. The fast rates are confirmed by empirical comparisons with state-of-the-art algorithms, including averaged SGD.
The Alternating Direction Method of Multipliers (ADMM) is another flexible method to explore function structures. In the second part, we propose a stochastic ADMM that can be applied to a general class of convex and nonsmooth functions, beyond the smooth and separable least squares loss used in lasso. We also establish the rates of convergence of our algorithm under various structural assumptions on the stochastic function: O(1/√t) for convex functions and O(log t/t) for strongly convex functions. A novel application named Graph-Guided SVM is proposed to demonstrate the usefulness of our algorithm.
We also extend the scalability of stochastic algorithms to nonlinear kernel machines, where the problem is formulated as a constrained dual quadratic optimization. The simplex constraint can be handled by the classic Frank-Wolfe method. The proposed stochastic Frank-Wolfe methods achieve comparable or even better accuracies than state-of-the-art batch and online kernel SVM solvers, and are significantly faster. The last part investigates the problem of data-distributed learning. We formulate it as a consensus-constrained optimization problem and solve it with ADMM. It turns out that the underlying communication topology is a key factor in achieving a balance between a fast learning rate and computation resource consumption. We analyze the linear convergence behavior of consensus ADMM so as to characterize the interplay between the communication topology and the penalty parameters used in ADMM. We observe that given optimal parameters, the complete bipartite and the master-slave graphs exhibit the fastest convergence, followed by bi-regular graphs.
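As a rough illustration of why the simplex constraint is convenient for Frank-Wolfe, the sketch below runs the classic deterministic method on a small quadratic. It is not the stochastic variant proposed in the thesis; the function name `frank_wolfe_simplex`, the step size 2/(k+2), and the diagonal quadratic are assumptions chosen for the example. Over the simplex, the linear subproblem is solved by picking the vertex with the smallest gradient coordinate, so each iteration needs no projection.

```python
def frank_wolfe_simplex(grad, n, iters=2000):
    # Classic Frank-Wolfe over the probability simplex in R^n.
    a = [1.0 / n] * n                      # start at the barycenter
    for k in range(iters):
        g = grad(a)
        # Linear minimization oracle: the best vertex e_i has i = argmin g.
        i = min(range(n), key=lambda j: g[j])
        gamma = 2.0 / (k + 2.0)            # standard open-loop step size
        # Convex combination of the current point and the vertex e_i.
        a = [(1.0 - gamma) * aj for aj in a]
        a[i] += gamma
    return a

# Illustrative objective: f(a) = 0.5 * sum(q_i * a_i^2), a diagonal quadratic
# standing in for a dual quadratic program (hypothetical data).
q = [1.0, 2.0, 4.0]
f_q = lambda a: 0.5 * sum(qi * ai * ai for qi, ai in zip(q, a))
grad_q = lambda a: [qi * ai for qi, ai in zip(q, a)]
a_hat = frank_wolfe_simplex(grad_q, len(q))
```

For this diagonal problem the exact minimizer has weights proportional to 1/q_i, with optimal value 2/7, so the quality of `a_hat` can be checked directly. Every iterate is a convex combination of simplex vertices and therefore stays feasible by construction.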
|