11

Parallel and Decentralized Algorithms for Big-data Optimization over Networks

Amir Daneshmand (11153640) 22 July 2021
<p>Recent decades have witnessed a deluge of data generated by heterogeneous sources, e.g., social networks, streaming, and marketing services, which has naturally created a surge of interest in the theory and applications of large-scale convex and non-convex optimization. For example, real-world instances of statistical learning problems such as deep learning and recommendation systems can generate sheer volumes of spatially/temporally diverse data (up to petabytes in commercial applications) with millions of decision variables to be optimized. Such problems are often referred to as big-data problems. Solving these problems with standard optimization methods demands an intractable amount of centralized storage and computational resources, which is infeasible; overcoming this limitation is the foremost purpose of the parallel and decentralized algorithms developed in this thesis.</p><p><br></p><p>This thesis consists of two parts: (I) Distributed Nonconvex Optimization and (II) Distributed Convex Optimization.</p><p><br></p><p>In Part (I), we start by studying a winning paradigm in big-data optimization, the Block Coordinate Descent (BCD) algorithm, which ceases to be effective when problem dimensions grow overwhelmingly. In particular, we consider a general family of constrained non-convex composite large-scale problems defined on multicore computing machines equipped with shared memory. We design a hybrid deterministic/random parallel algorithm to efficiently solve such problems, combining Successive Convex Approximation (SCA) with greedy/random dimensionality reduction techniques. We provide theoretical and empirical results showing the efficacy of the proposed scheme on huge-scale problems. The next step is to broaden the network setting to general mesh networks modeled as directed graphs and to propose a class of gradient-tracking-based algorithms with global convergence guarantees to critical points of the problem. 
We further explore the landscape geometry of the non-convex problems to establish second-order guarantees and to strengthen our convergence results from local optimal solutions to global optimal solutions for a wide range of machine learning problems.</p><p><br></p><p>In Part (II), we focus on a family of distributed convex optimization problems defined over meshed networks. Relevant state-of-the-art algorithms often consider limited problem settings and have pessimistic communication complexities relative to the complexity of their centralized variants, which raises an important question: can one achieve the rate of centralized first-order methods over networks and, moreover, can one improve upon their communication costs by using higher-order local solvers? To answer these questions, we propose an algorithm that utilizes surrogate objective functions in local solvers (hence going beyond first-order methods such as proximal-gradient) coupled with a perturbed (push-sum) consensus mechanism that aims to track locally the gradient of the central objective function. The algorithm is proved to match the convergence rate of its centralized counterparts, up to multiplicative network factors. When considering, in particular, Empirical Risk Minimization (ERM) problems with statistically homogeneous data across the agents, our algorithm employing high-order surrogates provably achieves faster rates than those achievable by first-order methods. Such improvements are made without exchanging any Hessian matrices over the network.</p><p><br></p><p>Finally, we focus on the ill-conditioning issue that impairs the efficiency of decentralized first-order methods over networks, rendering them impractical in terms of both computation and communication cost. A natural solution is to develop distributed second-order methods, but their need for Hessian information incurs substantial communication overhead on the network. 
To work around such exorbitant communication costs, we propose a “statistically informed” preconditioned cubic-regularized Newton method that provably improves upon the rates of first-order methods. The proposed scheme does not require communication of Hessian information in the network and yet achieves the iteration complexity of centralized second-order methods up to the statistical precision. In addition, the (second-order) approximate nature of the utilized surrogate functions improves upon the per-iteration computational cost of our earlier proposed scheme in this setting.</p>
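The gradient-tracking idea mentioned in the abstract above can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes an undirected ring with a doubly stochastic mixing matrix (rather than directed graphs with push-sum), scalar quadratic local objectives, and plain gradient steps instead of surrogate-based local solvers. The names `gradient_tracking`, `ring_weights`, and `local_targets`, along with the step size and iteration count, are hypothetical choices for illustration.

```python
# Illustrative sketch only: names, weights, and step size are assumptions,
# not taken from the thesis.

def gradient_tracking(targets, weights, step=0.1, iters=300):
    """Agents cooperatively minimize the sum of 0.5 * (x - targets[i])**2."""
    n = len(targets)

    def grad(i, x):
        # Gradient of agent i's local quadratic objective.
        return x - targets[i]

    x = [0.0] * n                            # local copies of the decision variable
    y = [grad(i, x[i]) for i in range(n)]    # local trackers of the average gradient

    for _ in range(iters):
        # Consensus step on x, then a move along the tracked gradient.
        x_new = [
            sum(weights[i][j] * x[j] for j in range(n)) - step * y[i]
            for i in range(n)
        ]
        # Tracker update: mix with neighbors, then add the local gradient change.
        y = [
            sum(weights[i][j] * y[j] for j in range(n))
            + grad(i, x_new[i]) - grad(i, x[i])
            for i in range(n)
        ]
        x = x_new
    return x


# Four agents on a ring; every row and column of the mixing matrix sums to 1.
ring_weights = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]
local_targets = [1.0, 2.0, 3.0, 6.0]
estimates = gradient_tracking(local_targets, ring_weights)
print(estimates)  # every entry should be close to the average target, 3.0
```

Each agent only exchanges its scalar iterate and tracker with its two ring neighbors, yet all local copies agree on the minimizer of the summed objective, which for these quadratics is the average of the targets.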
12

DISTRIBUTED MACHINE LEARNING OVER LARGE-SCALE NETWORKS

Frank Lin (16553082) 18 July 2023
<p>The swift emergence and wide-ranging utilization of machine learning (ML) across various industries, including healthcare, transportation, and robotics, have underscored the escalating need for efficient, scalable, and privacy-preserving solutions. Recognizing this, we present an integrated examination of three novel frameworks, each addressing different aspects of distributed learning and privacy issues: Two Timescale Hybrid Federated Learning (TT-HF), Delay-Aware Federated Learning (DFL), and Differential Privacy Hierarchical Federated Learning (DP-HFL). TT-HF introduces a semi-decentralized architecture that combines device-to-server and device-to-device (D2D) communications. Devices execute multiple stochastic gradient descent iterations on their datasets and sporadically synchronize model parameters via D2D communications. A unique adaptive control algorithm optimizes step size, D2D communication rounds, and global aggregation period to minimize network resource utilization and achieve a sublinear convergence rate. TT-HF outperforms conventional FL approaches in terms of model accuracy, energy consumption, and resilience against outages. DFL focuses on enhancing distributed ML training efficiency by accounting for communication delays between edge and cloud. It also uses multiple stochastic gradient descent iterations and periodically consolidates model parameters via edge servers. The adaptive control algorithm for DFL mitigates energy consumption and edge-to-cloud latency, resulting in faster global model convergence, reduced resource consumption, and robustness against delays. Lastly, DP-HFL is introduced to combat privacy vulnerabilities in FL. Merging the benefits of FL and Hierarchical Differential Privacy (HDP), DP-HFL significantly reduces the need for differential privacy noise while maintaining model performance, exhibiting an optimal privacy-performance trade-off. 
Theoretical analysis under both convex and nonconvex loss functions confirms DP-HFL’s effectiveness regarding convergence speed, the privacy-performance trade-off, and potential performance enhancement with appropriate network configuration. In sum, the study thoroughly explores TT-HF, DFL, and DP-HFL, and their unique solutions to distributed learning challenges such as efficiency, latency, and privacy concerns. These advanced FL frameworks have considerable potential to further enable effective, efficient, and secure distributed learning.</p>
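The semi-decentralized pattern described for TT-HF in the abstract above (local descent steps, sporadic D2D synchronization, periodic global aggregation) can be sketched roughly as follows. This is a toy illustration under strong assumptions, not the thesis's method: it uses exact gradients of scalar quadratic losses instead of stochastic gradients, fixed schedules instead of the adaptive control algorithm, and a plain server average over all devices. The function `hybrid_federated_training` and all parameter names and values are hypothetical.

```python
# Illustrative sketch only: schedules, names, and losses are assumptions,
# not taken from the thesis.

def hybrid_federated_training(cluster_targets, lr=0.1, global_rounds=50,
                              local_steps=5, d2d_every=2):
    """A device with target c minimizes 0.5 * (w - c)**2; returns the global model."""
    global_model = 0.0
    for _ in range(global_rounds):
        # Every device starts the round from the broadcast global model.
        models = [[global_model for _ in cluster] for cluster in cluster_targets]
        for step in range(1, local_steps + 1):
            # Local gradient step on each device's own loss.
            models = [
                [w - lr * (w - c) for w, c in zip(ws, cs)]
                for ws, cs in zip(models, cluster_targets)
            ]
            # Periodic device-to-device averaging inside each cluster.
            if step % d2d_every == 0:
                models = [[sum(ws) / len(ws)] * len(ws) for ws in models]
        # Server aggregation: plain average over all devices.
        flat = [w for ws in models for w in ws]
        global_model = sum(flat) / len(flat)
    return global_model


# Two clusters of two devices each, with different local targets.
clusters = [[1.0, 2.0], [4.0, 9.0]]
final_model = hybrid_federated_training(clusters)
print(final_model)  # should be close to 4.0, the mean of all device targets
```

Because every toy loss here has the same curvature, the averaged model follows ordinary gradient descent on the global loss and recovers its minimizer; with heterogeneous curvature or stochastic gradients that would generally not hold exactly, which is part of what adaptive tuning of step size and aggregation periods is meant to address.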
