Non-convex optimization problems appear in various applications of machine learning. Because of their practical importance, these problems gained a lot of attention in recent years, leading to the rapid development of new efficient stochastic gradient-type methods. In the quest to improve the generalization performance of modern deep learning models, practitioners are resorting to using larger and larger datasets in the training process, naturally distributed across a number of edge devices. However, with the increase of trainable data, the computational costs of gradient-type methods increase significantly. In addition, distributed methods almost invariably suffer from the so-called communication bottleneck: the cost of communication of the information necessary for the workers to jointly solve the problem is often very high, and it can be orders of magnitude higher than the cost of computation. This thesis provides a study of first-order stochastic methods addressing these issues. In particular, we structure this study by considering certain classes of methods. That allowed us to understand current theoretical gaps, which we successfully filled by providing new efficient algorithms.
Identifer | oai:union.ndltd.org:kaust.edu.sa/oai:repository.kaust.edu.sa:10754/676647 |
Date | 03 1900 |
Creators | Sokolov, Igor |
Contributors | Richtarik, Peter, Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division, Wang, Di, Moshkov, Mikhail |
Source Sets | King Abdullah University of Science and Technology |
Language | English |
Detected Language | English |
Type | Thesis |
Rights | 2024-04-27, At the time of archiving, the student author of this thesis opted to temporarily restrict access to it. The full text of this thesis will become available to the public after the expiration of the embargo on 2024-04-27. |
Page generated in 0.2366 seconds