Global ETD Search

1	New Optimization Methods for Modern Machine Learning
Reddi, Sashank Jakkam, 01 July 2017
Modern machine learning systems pose several new statistical, scalability, privacy, and ethical challenges. With the advent of massive datasets and increasingly complex tasks, scalability has become an especially critical issue in these systems. This thesis focuses on fundamental scalability challenges in modern machine learning applications, such as computational and communication efficiency. Its central message is that classical statistical thinking leads to highly effective optimization methods for modern big-data applications. The first part of the thesis investigates optimization methods for large-scale nonconvex Empirical Risk Minimization (ERM) problems. Such problems have surged into prominence, notably through deep learning, and have led to exciting progress, yet our understanding of optimization methods suited to them is still very limited. We develop and analyze a new line of optimization methods for nonconvex ERM based on the principle of variance reduction. We show that these methods converge quickly to stationary points and improve on the state of the art in several nonconvex ERM settings, including nonsmooth and constrained ERM. Using similar principles, we also develop novel optimization methods that provably converge to second-order stationary points. Finally, we show that the key principles behind our methods generalize to other important problems such as Bayesian inference. The second part of the thesis studies two critical aspects of modern distributed machine learning systems: the asynchronicity and the communication efficiency of optimization methods.
We study several asynchronous stochastic algorithms with fast convergence for convex ERM problems and show that they achieve near-linear speedups in the sparse settings common in machine learning. Another key factor governing the overall performance of a distributed system is its communication efficiency: traditional optimization algorithms used in machine learning are often ill-suited to distributed environments with high communication cost. To address this issue, we discuss two paradigms for achieving communication efficiency in distributed environments and explore new algorithms with better communication complexity.
Keywords: Machine Learning; Optimization; Large-scale; Distributed optimization; Communication-efficient; Finite-sum
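The variance-reduction principle behind the first part can be illustrated with a textbook SVRG-style loop. This is a minimal sketch under stated assumptions, not the thesis's algorithms: it uses a convex toy least-squares problem (the thesis targets nonconvex ERM), and the names `grad_i`, `full_grad`, `svrg`, the step size, and the loop lengths are illustrative choices.

```python
import random


def grad_i(w, a, b):
    """Gradient of f_i(w) = 0.5 * (a . w - b)**2 for one example (a, b)."""
    r = sum(wj * aj for wj, aj in zip(w, a)) - b
    return [r * aj for aj in a]


def full_grad(w, data):
    """Average of the per-example gradients: the full ERM gradient."""
    g = [0.0] * len(w)
    for a, b in data:
        for j, gij in enumerate(grad_i(w, a, b)):
            g[j] += gij / len(data)
    return g


def svrg(data, w0, step, epochs, inner, seed=0):
    """SVRG-style loop (illustrative): one full gradient per epoch at a
    snapshot, then cheap variance-reduced stochastic steps."""
    rng = random.Random(seed)
    w = list(w0)
    for _ in range(epochs):
        snapshot = list(w)                  # anchor point for this epoch
        mu = full_grad(snapshot, data)      # one full pass per epoch
        for _ in range(inner):
            a, b = data[rng.randrange(len(data))]
            # Variance-reduced estimate: unbiased for the full gradient at w,
            # with variance that shrinks as w and snapshot approach the optimum.
            v = [gw - gs + m for gw, gs, m in
                 zip(grad_i(w, a, b), grad_i(snapshot, a, b), mu)]
            w = [wj - step * vj for wj, vj in zip(w, v)]
    return w


# Hypothetical toy data with an exact solution w* = [1, 2].
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0), ([1.0, 1.0], 3.0), ([2.0, 1.0], 4.0)]
w = svrg(data, [0.0, 0.0], step=0.02, epochs=40, inner=100)
```

Each epoch costs one full pass plus `inner` single-example steps. In general finite-sum problems, the correction term `grad_i(snapshot) - mu` is what allows a constant step size where plain SGD would need a decaying one.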
2	First-Order Algorithms for Communication Efficient Distributed Learning
Khirirat, Sarit, January 2019
Technological developments in devices and storage have made large data collections more accessible than ever. This transformation leads to optimization problems with massive data in both volume and dimension. In response, optimization on high-performance computing architectures has become more popular than ever. These scalable solvers achieve high efficiency by splitting the computational load among multiple machines, but they also incur a large communication overhead, which shifts the performance bottleneck from computation to communication: for optimization problems with millions of parameters, communication between machines has been reported to consume up to 80% of the training time. To alleviate this bottleneck, many optimization algorithms that compress the exchanged data have been studied. In practice, they have been reported to save significant communication while converging almost as well as their full-precision counterparts, albeit with some loss of accuracy. To explain this observation, this thesis develops theory and techniques for designing communication-efficient optimization algorithms. In the first part, we analyze the convergence of optimization algorithms with direct compression. We first give definitions of compression that cover many compressors of practical interest. We then provide a unified analysis framework for optimization algorithms with compressors that can be either deterministic or randomized. In particular, we show how the tuning parameters of compressed optimization algorithms must be chosen to guarantee performance. Our results show an explicit dependence on the compression accuracy and on the delay caused by asynchrony, which allows us to characterize the trade-off between iteration and communication complexity under gradient compression.
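Direct compression can be sketched as gradient descent that passes every gradient through a compressor before using it (in a distributed setting, before sending it). The sketch below assumes a deterministic scaled-sign compressor and a hypothetical two-dimensional quadratic; neither is taken from the thesis, and the names `scaled_sign`, `compressed_gd`, and the step size are illustrative.

```python
def scaled_sign(g):
    """Deterministic compressor: one shared magnitude plus one sign per entry."""
    scale = sum(abs(x) for x in g) / len(g)
    return [scale * ((x > 0) - (x < 0)) for x in g]


def compressed_gd(grad, x0, step, iters, compress=scaled_sign):
    """Direct compression: x_{k+1} = x_k - step * Q(grad(x_k))."""
    x = list(x0)
    for _ in range(iters):
        q = compress(grad(x))           # only q would be communicated
        x = [xi - step * qi for xi, qi in zip(x, q)]
    return x


# Hypothetical quadratic f(x) = 0.5 * sum_j h_j * (x_j - c_j)**2.
h, c = [1.0, 4.0], [1.0, -2.0]


def grad(x):
    return [hj * (xj - cj) for hj, xj, cj in zip(h, x, c)]


x = compressed_gd(grad, [0.0, 0.0], step=0.1, iters=500)
```

The scaled-sign output needs one full-precision number plus one sign per coordinate instead of a full-precision vector; a randomized quantizer could be passed as `compress` instead, which is the kind of choice the tuning-parameter analysis above is concerned with.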
In the second part, we study how error compensation can improve the performance of compressed optimization algorithms. Although convergence guarantees for optimization algorithms with error compensation have been established, there is very limited theory showing that it improves solution accuracy. We therefore develop theoretical explanations showing that error compensation guarantees arbitrarily high solution accuracy from compressed information. In particular, error compensation removes accumulated compression errors, which improves solution accuracy especially for ill-conditioned problems. We also provide a strong convergence analysis of error compensation for parallel stochastic gradient descent across multiple machines; unlike direct compression, the error-compensated algorithms significantly reduce the compression error. Applications of the algorithms in this thesis to real-world problems with benchmark data sets validate our theoretical results.
QC20191120
Keywords: Communication efficient learning; Optimization algorithms; Quantization; Error compensation; First-order algorithms; Stochastic gradient descent; Control Engineering
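Error compensation (often called error feedback) can be sketched by storing the part of each update that the compressor discarded and adding it back before the next compression. This is a minimal single-machine sketch assuming the same scaled-sign compressor and a hypothetical quadratic; it is a generic error-feedback loop, not the specific algorithms or the multi-machine setting analyzed in the thesis, and `ef_compressed_gd` is an illustrative name.

```python
def scaled_sign(v):
    """Deterministic compressor: one shared magnitude plus one sign per entry."""
    scale = sum(abs(x) for x in v) / len(v)
    return [scale * ((x > 0) - (x < 0)) for x in v]


def ef_compressed_gd(grad, x0, step, iters, compress=scaled_sign):
    """Error feedback: compress (step * gradient + residual), apply the
    compressed update, and carry the new residual to the next round."""
    x = list(x0)
    e = [0.0] * len(x)                  # accumulated compression error
    for _ in range(iters):
        p = [step * gi + ei for gi, ei in zip(grad(x), e)]
        delta = compress(p)             # what would actually be communicated
        x = [xi - di for xi, di in zip(x, delta)]
        e = [pi - di for pi, di in zip(p, delta)]
    return x, e


# Hypothetical quadratic f(x) = 0.5 * sum_j h_j * (x_j - c_j)**2.
h, c = [1.0, 4.0], [1.0, -2.0]


def grad(x):
    return [hj * (xj - cj) for hj, xj, cj in zip(h, x, c)]


x, e = ef_compressed_gd(grad, [0.0, 0.0], step=0.01, iters=5000)
```

One way to see why this helps: the virtual iterate `x - e` follows plain gradient descent (with the gradient evaluated at `x`), so the discarded information is not lost but stays in `e` and is eventually applied, and `e` shrinks as the gradients shrink.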
3	Learning optimizers for communication-efficient distributed learning
Joseph, Charles-Étienne, 07 1900
This thesis proposes the use of learned optimizers, a meta-learning approach, to improve distributed optimization. We present two learned optimizer architectures and show that they outperform state-of-the-art baselines while generalizing to unseen datasets and architectures.
We thus establish learned optimization as a promising direction for communication-efficient distributed learning. We also apply learned optimizers to federated learning, a privacy-oriented setting in which data remains on individual devices, and find that they perform well there, including in setups where data is distributed heterogeneously among clients. Finally, this thesis investigates combining learned optimizers with gradient sparsification, a technique that reduces communication by transmitting only a subset of the gradient's entries. Our results show that learned optimizers can indeed take advantage of sparsification to improve communication efficiency. Overall, this thesis demonstrates the effectiveness of learned optimizers for communication-efficient distributed learning and paves the way for combining them with other techniques that target communication efficiency.
Keywords: Learned optimization; Federated learning; Meta-learning
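Gradient sparsification can be sketched as top-k selection on each worker: only the k largest-magnitude entries are sent as (index, value) pairs, and a server averages the sparse messages. The function names, the top-k selection rule, and the plain averaging are illustrative assumptions; the thesis combines sparsification with learned optimizers, which this sketch does not model.

```python
def topk_encode(grad, k):
    """Keep the k largest-magnitude entries as (index, value) pairs."""
    idx = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    return [(i, grad[i]) for i in sorted(idx)]


def decode(message, dim):
    """Rebuild a dense vector, with zeros for entries that were not sent."""
    dense = [0.0] * dim
    for i, v in message:
        dense[i] = v
    return dense


def aggregate(messages, dim):
    """Server side: average the sparse messages received from all workers."""
    avg = [0.0] * dim
    for msg in messages:
        for i, v in msg:
            avg[i] += v / len(messages)
    return avg


# Two hypothetical worker gradients; each worker sends 2 of 5 entries.
g1 = [0.1, -2.0, 0.03, 1.5, -0.2]
g2 = [1.0, 0.0, -0.5, 0.2, 0.0]
msgs = [topk_encode(g, 2) for g in (g1, g2)]
print(msgs)                 # [[(1, -2.0), (3, 1.5)], [(0, 1.0), (2, -0.5)]]
print(decode(msgs[0], 5))   # [0.0, -2.0, 0.0, 1.5, 0.0]
print(aggregate(msgs, 5))   # [0.5, -1.0, -0.25, 0.75, 0.0]
```

Each message carries 2 indices and 2 values instead of 5 values, at the price of dropping the smaller entries; how an optimizer copes with that dropped information is where approaches such as learned optimizers or error compensation come in.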