• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 1
  • Tagged with
  • 2
  • 2
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

New Optimization Methods for Modern Machine Learning

Reddi, Sashank Jakkam 01 July 2017 (has links)
Modern machine learning systems pose several new statistical, scalability, privacy and ethical challenges. With the advent of massive datasets and increasingly complex tasks, scalability has especially become a critical issue in these systems. In this thesis, we focus on fundamental challenges related to scalability, such as computational and communication efficiency, in modern machine learning applications. The underlying central message of this thesis is that classical statistical thinking leads to highly effective optimization methods for modern big data applications. The first part of the thesis investigates optimization methods for solving large-scale nonconvex Empirical Risk Minimization (ERM) problems. Such problems have surged into prominence, notably through deep learning, and have led to exciting progress. However, our understanding of optimization methods suitable for these problems is still very limited. We develop and analyze a new line of optimization methods for nonconvex ERM problems, based on the principle of variance reduction. We show that our methods exhibit fast convergence to stationary points and improve the state-of-the-art in several nonconvex ERM settings, including nonsmooth and constrained ERM. Using similar principles, we also develop novel optimization methods that provably converge to second-order stationary points. Finally, we show that the key principles behind our methods can be generalized to overcome challenges in other important problems such as Bayesian inference. The second part of the thesis studies two critical aspects of modern distributed machine learning systems — asynchronicity and communication efficiency of optimization methods. We study various asynchronous stochastic algorithms with fast convergence for convex ERM problems and show that these methods achieve near-linear speedups in sparse settings common to machine learning. Another key factor governing the overall performance of a distributed system is its communication efficiency. Traditional optimization algorithms used in machine learning are often ill-suited for distributed environments with high communication cost. To address this issue, we dis- cuss two different paradigms to achieve communication efficiency of algorithms in distributed environments and explore new algorithms with better communication complexity.
2

First-Order Algorithms for Communication Efficient Distributed Learning

Khirirat, Sarit January 2019 (has links)
Technological developments in devices and storages have made large volumes of data collections more accessible than ever. This transformation leads to optimization problems with massive data in both volume and dimension. In response to this trend, the popularity of optimization on high performance computing architectures has increased unprecedentedly. These scalable optimization solvers can achieve high efficiency by splitting computational loads among multiple machines. However, these methods also incur large communication overhead. To solve optimization problems with millions of parameters, communication between machines has been reported to consume up to 80% of the training time. To alleviate this communication bottleneck, many optimization algorithms with data compression techniques have been studied. In practice, they have been reported to significantly save communication costs while exhibiting almost comparable convergence as the full-precision algorithms. To understand this intuition, we develop theory and techniques in this thesis to design communication-efficient optimization algorithms. In the first part, we analyze the convergence of optimization algorithms with direct compression. First, we outline definitions of compression techniques which cover many compressors of practical interest. Then, we provide the unified analysis framework of optimization algorithms with compressors which can be either deterministic or randomized. In particular, we show how the tuning parameters of compressed optimization algorithms must be chosen to guarantee performance. Our results show explicit dependency on compression accuracy and delay effect due to asynchrony of algorithms. This allows us to characterize the trade-off between iteration and communication complexity under gradient compression. In the second part, we study how error compensation schemes can improve the performance of compressed optimization algorithms. Even though convergence guarantees of optimization algorithms with error compensation have been established, there is very limited theoretical support which guarantees improved solution accuracy. We therefore develop theoretical explanations, which show that error compensation guarantees arbitrarily high solution accuracy from compressed information. In particular, error compensation helps remove accumulated compression errors, thus improving solution accuracy especially for ill-conditioned problems. We also provide strong convergence analysis of error compensation on parallel stochastic gradient descent across multiple machines. In particular, the error-compensated algorithms, unlike direct compression, result in significant reduction in the compression error. Applications of the algorithms in this thesis to real-world problems with benchmark data sets validate our theoretical results. / Utvecklandet av kommunikationsteknologi och datalagring har gjort stora mängder av datasamlingar mer tillgängliga än någonsin. Denna förändring leder till numeriska optimeringsproblem med datamängder med stor skala i volym och dimension. Som svar på denna trend har populariteten för högpresterande beräkningsarkitekturer ökat mer än någonsin tidigare. Skalbara optimeringsverktyg kan uppnå hög effektivitet genom att fördela beräkningsbördan mellan ett flertal maskiner. De kommer dock i praktiken med ett pris som utgörs av betydande kommunikationsomkostnader. Detta orsakar ett skifte i flaskhalsen för prestandan från beräkningar till kommunikation. När lösning av verkliga optimeringsproblem sker med ett stort antal parametrar, dominerar kommunikationen mellan maskiner nästan 80% av träningstiden. För att minska kommunikationsbelastningen, har ett flertal kompressionstekniker föreslagits i litteraturen. Även om optimeringsalgoritmer som använder dessa kompressorer rapporteras vara lika konkurrenskraftiga som sina motsvarigheter med full precision, dras de med en förlust av noggrannhet. För att ge en uppfattning om detta, utvecklar vi i denna avhandling teori och tekniker för att designa kommunikations-effektiva optimeringsalgoritmer som endast använder information med låg precision. I den första delen analyserar vi konvergensen hos optimeringsalgoritmer med direkt kompression. Först ger vi en översikt av kompressionstekniker som täcker in många kompressorer av praktiskt intresse. Sedan presenterar vi ett enhetligt analysramverk för optimeringsalgoritmer med kompressorer, som kan vara antingen deterministiska eller randomiserade. I synnerhet visas val av parametrar i komprimerade optimeringsalgoritmer som avgörs av kompressorns parametrar som garanterar konvergens. Våra konvergensgarantier visar beroende av kompressorns noggrannhet och fördröjningseffekter på grund av asynkronicitet hos algoritmer. Detta låter oss karakterisera avvägningen mellan iterations- och kommunikations-komplexitet när kompression används. I den andra delen studerarvi hög prestanda hos felkompenseringsmetoder för komprimerade optimeringsalgoritmer. Även om konvergensgarantier med felkompensering har etablerats finns det väldigt begränsat teoretiskt stöd för konkurrenskraftiga konvergensgarantier med felkompensering. Vi utvecklar därför teoretiska förklaringar, som visar att användande av felkompensering garanterar godtyckligt hög lösningsnoggrannhet från komprimerad information. I synnerhet bidrar felkompensering till att ta bort ackumulerade kompressionsfel och förbättrar därmed lösningsnoggrannheten speciellt för illa konditionerade kvadratiska optimeringsproblem. Vi presenterar också stark konvergensanalys för felkompensering tillämpat på stokastiska gradientmetoder med ett kommunikationsnätverk innehållande ett flertal maskiner. De felkompenserade algoritmerna resulterar, i motsats till direkt kompression, i betydande reducering av kompressionsfelet. Simuleringar av algoritmer i denna avhandling på verkligaproblem med referensdatamängder validerar våra teoretiska resultat. / <p>QC20191120</p>

Page generated in 0.08 seconds