1

Provably efficient algorithms for decentralized optimization

Liu, Changxin 31 August 2021 (has links)
Decentralized multi-agent optimization has emerged as a powerful paradigm that finds broad applications in engineering design, including federated machine learning and control of networked systems. In these setups, a group of agents is connected via a network with general topology. Under communication constraints, they aim to solve a global optimization problem that is characterized collectively by their individual interests. Of particular importance are the computation and communication efficiency of decentralized optimization algorithms. Due to the heterogeneity of local objective functions, fostering cooperation among the agents over a possibly time-varying network is challenging yet necessary to achieve fast convergence to the global optimum. Furthermore, real-world communication networks are subject to congestion and bandwidth limits. To alleviate this difficulty, it is highly desirable to design communication-efficient algorithms that proactively reduce the utilization of network resources.

This dissertation tackles four concrete settings in decentralized optimization and develops four provably efficient algorithms for solving them. Chapter 1 presents an overview of decentralized optimization, introducing preliminaries, problem settings, and state-of-the-art algorithms. Chapter 2 introduces the notation and reviews key concepts used throughout the dissertation. In Chapter 3, we investigate non-smooth cost-coupled decentralized optimization and a special instance, namely the dual form of constraint-coupled decentralized optimization. We develop a decentralized subgradient method with double averaging that guarantees last-iterate convergence, which is crucial to solving decentralized dual Lagrangian problems with a convergence rate guarantee. Chapter 4 studies composite cost-coupled decentralized optimization over stochastic networks, for which existing algorithms do not guarantee linear convergence. We propose a new decentralized dual averaging (DDA) algorithm to solve this problem. Under a rather mild condition on the stochastic networks, we show that the proposed DDA attains an $\mathcal{O}(1/t)$ rate of convergence in the general case and a global linear rate of convergence if each local objective function is strongly convex. Chapter 5 tackles the smooth cost-coupled decentralized constrained optimization problem. We leverage the extrapolation technique and the average consensus protocol to develop an accelerated DDA algorithm. The rate of convergence is proved to be $\mathcal{O}\left( \frac{1}{t^2}+ \frac{1}{t(1-\beta)^2} \right)$, where $\beta$ denotes the second largest singular value of the mixing matrix. To proactively reduce the utilization of network resources, a communication-efficient decentralized primal-dual algorithm based on an event-triggered broadcasting strategy is developed in Chapter 6. In this algorithm, each agent locally decides whether to broadcast over the network by comparing a pre-defined threshold with the deviation between its current iterate and the one it last broadcast. Provided that the threshold sequence is summable over time, we prove an $\mathcal{O}(1/t)$ rate of convergence for convex composite objectives. For strongly convex and smooth problems, linear convergence is guaranteed if the threshold sequence diminishes geometrically. Finally, Chapter 7 provides concluding remarks and directions for future study. / Graduate
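To make the event-triggered communication rule of Chapter 6 concrete, here is a minimal, primal-only sketch of a decentralized gradient method with that rule: each agent re-broadcasts its iterate only when it has drifted from its last broadcast copy by more than a pre-defined threshold. All function and variable names are illustrative assumptions; the dissertation's actual algorithm is a primal-dual method for composite objectives, which this toy version does not reproduce.

```python
import numpy as np

def event_triggered_decentralized_gd(grad_fns, W, x0, steps, alpha, thresholds):
    """Simplified sketch of decentralized gradient descent with
    event-triggered broadcasting (hypothetical names and parameters).

    grad_fns   : list of local gradient oracles, one per agent
    W          : doubly stochastic mixing matrix encoding the network
    x0         : array of shape (n_agents, dim) with initial iterates
    thresholds : threshold sequence eps[t]; summable -> O(1/t) rate,
                 geometrically decaying -> linear rate (per the abstract)
    """
    x = x0.copy()
    x_bcast = x0.copy()          # last value each agent actually broadcast
    n = x.shape[0]
    for t in range(steps):
        for i in range(n):
            # local triggering rule: broadcast only on sufficient deviation
            if np.linalg.norm(x[i] - x_bcast[i]) > thresholds[t]:
                x_bcast[i] = x[i].copy()
        # consensus step uses the (possibly stale) broadcast values
        mixed = W @ x_bcast
        # local gradient step on each agent's own objective
        grads = np.array([grad_fns[i](x[i]) for i in range(n)])
        x = mixed - alpha * grads
    return x
```

With a summable threshold sequence (e.g. proportional to $1/t^2$), the abstract's $\mathcal{O}(1/t)$ guarantee applies to convex composite objectives; a geometrically decaying sequence corresponds to the linear-convergence regime for strongly convex, smooth problems.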
2

HIGH-DIMENSIONAL INFERENCE OVER NETWORKS: STATISTICAL AND COMPUTATIONAL GUARANTEES

Yao Ji (19697335) 19 September 2024 (has links)
<p dir="ltr">Distributed optimization problems defined over mesh networks are ubiquitous in signal processing, machine learning, and control. In contrast to centralized approaches where all information and computation resources are available at a centralized server, agents on a distributed system can only use locally available information. As a result, efforts have been put into the design of efficient distributed algorithms that take into account the communication constraints and make coordinated decisions in a fully distributed manner from a pure optimization perspective. Given the massive sample size and high-dimensionality generated by distributed systems such as social media, sensor networks, and cloud-based databases, it is essential to understand the statistical and computational guarantees of distributed algorithms to solve such high-dimensional problems over a mesh network.</p><p dir="ltr">A goal of this thesis is a first attempt at studying the behavior of distributed methods in the high-dimensional regime. It consists of two parts: (I) distributed LASSO and (II) distributed stochastic sparse recovery.</p><p dir="ltr">In Part (I), we start by studying linear regression from data distributed over a network of agents (with no master node) by means of LASSO estimation, in high-dimension, which allows the ambient dimension to grow faster than the sample size. While there is a vast literature of distributed algorithms applicable to the problem, statistical and computational guarantees of most of them remain unclear in high dimensions. This thesis provides a first statistical study of the Distributed Gradient Descent (DGD) in the Adapt-Then-Combine (ATC) form. Our theory shows that, under standard notions of restricted strong convexity and smoothness of the loss functions--which hold with high probability for standard data generation models--suitable conditions on the network connectivity and algorithm tuning, DGD-ATC converges globally at a linear rate to an estimate that is within the centralized statistical precision of the model. In the worst-case scenario, the total number of communications to statistical optimality grows logarithmically with the ambient dimension, which improves on the communication complexity of DGD in the Combine-Then-Adapt (CTA) form, scaling linearly with the dimension. This reveals that mixing gradient information among agents, as DGD-ATC does, is critical in high-dimensions to obtain favorable rate scalings. </p><p dir="ltr">In Part (II), we focus on addressing the problem of distributed stochastic sparse recovery through stochastic optimization. We develop and analyze stochastic optimization algorithms for problems over a network, modeled as an undirected graph (with no centralized node), where the expected loss is strongly convex with respect to the Euclidean norm, and the optimum is sparse. Assuming agents only have access to unbiased estimates of the gradients of the underlying expected objective, and stochastic gradients are sub-Gaussian, we use distributed stochastic dual averaging (DSDA) as a building block to develop a fully decentralized restarting procedure for recovery of sparse solutions over a network. We show that with high probability, the iterates generated by all agents linearly converge to an approximate solution, eliminating fast the initial error; and then converge sublinearly to the exact sparse solution in the steady-state stages owing to observation noise. 
The algorithm asymptotically achieves the optimal convergence rate and favorable dimension dependence enjoyed by a non-Euclidean centralized scheme. Further, we precisely identify its non-asymptotic convergence rate as a function of characteristics of the objective functions and the network, and we characterize the transient time needed for the algorithm to approach the optimal rate of convergence. We illustrate the performance of the algorithm in application to classical problems of sparse linear regression, sparse logistic regression and low rank matrix recovery. Numerical experiments demonstrate the tightness of the theoretical results.</p>
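As a hedged illustration of the Adapt-Then-Combine versus Combine-Then-Adapt distinction that drives the communication-complexity comparison above, the snippet below sketches a single DGD iteration in both forms for a generic smooth local loss; the thesis analyzes the full high-dimensional LASSO objective, which this sketch does not reproduce, and all names are assumptions.

```python
import numpy as np

def dgd_step_atc(X, grads, W, alpha):
    """Adapt-Then-Combine: each agent first takes a local gradient step,
    then mixes the *adapted* iterates with its neighbours.
    X, grads : arrays of shape (n_agents, dim); W : mixing matrix."""
    adapted = X - alpha * grads          # adapt: local gradient step
    return W @ adapted                   # combine: mix gradient information

def dgd_step_cta(X, grads, W, alpha):
    """Combine-Then-Adapt: agents first average neighbours' raw iterates,
    then apply their own local gradient."""
    combined = W @ X                     # combine: mix raw iterates
    return combined - alpha * grads      # adapt: local gradient step
```

In the ATC form the mixing matrix is applied to the locally adapted iterates, so gradient information is averaged across neighbours at every round; the abstract credits exactly this mixing for the logarithmic, rather than linear, dependence of the communication complexity on the ambient dimension.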
3

Stochastic approximation and least-squares regression, with applications to machine learning

Flammarion, Nicolas 24 July 2017 (has links)
Many problems in machine learning are naturally cast as the minimization of a smooth function defined on a Euclidean space. For supervised learning, this includes least-squares regression and logistic regression. While small problems are efficiently solved by classical optimization algorithms, large-scale problems are typically solved with first-order techniques based on gradient descent. In this manuscript, we consider the particular case of the quadratic loss. In the first part, we are interested in its minimization when its gradients are only accessible through a stochastic oracle. In the second part, we consider two applications of the quadratic loss in machine learning: clustering and estimation with shape constraints. The first main contribution provides a unified framework for optimizing non-strongly convex quadratic functions, which encompasses accelerated gradient descent and averaged gradient descent. This new framework suggests an alternative algorithm that exhibits the positive behavior of both averaging and acceleration. The second main contribution obtains the optimal prediction error rates for least-squares regression, both in terms of dependence on the noise of the problem and of forgetting the initial conditions. Our new algorithm rests upon averaged accelerated gradient descent. The third main contribution deals with the minimization of composite objective functions composed of the expectation of quadratic functions and a convex function. We extend earlier results on least-squares regression to any regularizer and any geometry represented by a Bregman divergence. As a fourth contribution, we consider the discriminative clustering framework. We propose its first theoretical analysis, a novel sparse extension, a natural extension to the multi-label scenario, and an efficient iterative algorithm with better running-time complexity than existing methods. The fifth main contribution deals with the seriation problem. We propose a statistical approach to this problem in which the matrix is observed with noise, and we study the corresponding minimax rate of estimation. We also suggest a computationally efficient estimator whose performance is studied both theoretically and experimentally.
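As a rough, hypothetical illustration of the second contribution's theme of combining averaging with acceleration, the sketch below applies Nesterov-style extrapolation together with Polyak-Ruppert averaging to a least-squares objective using exact gradients; the thesis's algorithm works with a stochastic oracle and a specific parameter coupling that this toy version does not reproduce, and all names are assumptions.

```python
import numpy as np

def averaged_accelerated_lsq(A, b, steps, step_size):
    """Minimal sketch (not the thesis's exact scheme): accelerated gradient
    descent on f(x) = ||A x - b||^2 / (2 n), with a running average of the
    iterates returned as the final estimate."""
    n, d = A.shape
    x = np.zeros(d)
    y = np.zeros(d)          # extrapolation sequence
    avg = np.zeros(d)        # running (Polyak-Ruppert) average of iterates
    for t in range(1, steps + 1):
        grad = A.T @ (A @ y - b) / n
        x_next = y - step_size * grad                   # gradient step at y
        y = x_next + (t - 1) / (t + 2) * (x_next - x)   # momentum/extrapolation
        x = x_next
        avg += (x - avg) / t                            # online averaging
    return avg
```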
