About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
1

Automatic step-size adaptation in incremental supervised learning

Mahmood, Ashique Unknown Date
No description available.
2

Automatic step-size adaptation in incremental supervised learning

Mahmood, Ashique 11 1900 (has links)
The performance and stability of many iterative algorithms, such as stochastic gradient descent, largely depend on a fixed, scalar step-size parameter, and using such a step size may limit performance on many problems. We study several existing step-size adaptation algorithms on nonstationary, supervised learning problems using simulated and real-world data. We find that the existing step-size adaptation algorithms are effective only if their meta-parameter is tuned for each problem. We introduce a new algorithm, Autostep, by combining several new techniques with an existing algorithm, and demonstrate that it can effectively adapt a vector step-size parameter on all of our training and test problems without tuning its meta-parameter across them. Autostep is the first step-size adaptation algorithm that can be used on widely different problems with a single setting of all of its parameters.
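
For readers unfamiliar with the idea, the following is a minimal sketch of what a vector (per-weight) step-size adaptation rule looks like, in the spirit of IDBD-style meta-gradient updates. It is illustrative only and is not the thesis's Autostep algorithm; the update rule, function names, and default values below are assumptions.

```python
import numpy as np

def idbd_style_update(w, alpha_log, h, x, y, meta_rate=0.01):
    """One update of a simplified IDBD-style rule: each weight w[i] gets its own
    step size exp(alpha_log[i]), adapted from the correlation of the current and
    past gradients tracked in h. Illustrative sketch, not the thesis's Autostep."""
    delta = y - w @ x                        # prediction error of the linear model
    alpha_log += meta_rate * delta * x * h   # meta-gradient step on log step sizes
    alpha = np.exp(alpha_log)                # per-weight step sizes, kept positive
    w += alpha * delta * x                   # ordinary LMS step, but with a vector step size
    h = h * np.clip(1.0 - alpha * x * x, 0.0, None) + alpha * delta * x
    return w, alpha_log, h

# toy usage: track a noisy linear target with per-weight step sizes
rng = np.random.default_rng(0)
d = 5
w_true = rng.normal(size=d)
w, alpha_log, h = np.zeros(d), np.full(d, np.log(0.05)), np.zeros(d)
for t in range(2000):
    x = rng.normal(size=d)
    y = w_true @ x + 0.1 * rng.normal()
    w, alpha_log, h = idbd_style_update(w, alpha_log, h, x, y)
```
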
3

Stochastic, distributed and federated optimization for machine learning

Konečný, Jakub January 2017 (has links)
We study optimization algorithms for the finite-sum problems frequently arising in machine learning applications. First, we propose novel variants of stochastic gradient descent with a variance-reduction property that enables linear convergence for strongly convex objectives. Second, we study the distributed setting, in which the data describing the optimization problem do not fit on a single computing node. In this case, traditional methods are inefficient, as the communication costs inherent in distributed optimization become the bottleneck. We propose a communication-efficient framework that iteratively forms local subproblems solvable with arbitrary local optimization algorithms. Finally, we introduce the concept of Federated Optimization/Learning, in which we solve machine learning problems without storing the data in any centralized manner. The main motivation comes from industrial settings involving user-generated data: the current prevalent practice is for companies to collect vast amounts of user data and store them in datacenters. The alternative we propose is not to collect the data in the first place, and instead to occasionally use the computational power of users' devices to solve the very same optimization problems, alleviating privacy concerns at the same time. In such a setting, minimizing the number of communication rounds is the primary goal, and we demonstrate that solving the optimization problems in these circumstances is conceptually tractable.
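
As a point of reference, here is a minimal sketch of the federated-optimization pattern the abstract describes: data stays on user devices, each device runs a few local SGD steps, and a server only averages the returned models. The least-squares loss, function names, and schedule are illustrative assumptions, not the algorithms developed in the thesis.

```python
import numpy as np

def local_sgd(w, X, y, lr=0.1, local_steps=10):
    """Run a few SGD steps on one device's private data (least-squares loss)."""
    w = w.copy()
    for _ in range(local_steps):
        i = np.random.randint(len(y))
        grad = (w @ X[i] - y[i]) * X[i]   # stochastic gradient of 0.5*(x·w - y)^2
        w -= lr * grad
    return w

def federated_round(w_global, devices):
    """One communication round: each device refines the global model locally,
    the server averages the returned models. Raw data never leaves a device."""
    local_models = [local_sgd(w_global, X, y) for X, y in devices]
    return np.mean(local_models, axis=0)

# toy usage: data partitioned across simulated user devices
rng = np.random.default_rng(1)
w_true = rng.normal(size=3)
devices = []
for _ in range(5):
    X = rng.normal(size=(50, 3))
    devices.append((X, X @ w_true + 0.01 * rng.normal(size=50)))

w = np.zeros(3)
for _ in range(30):   # few communication rounds, many cheap local steps
    w = federated_round(w, devices)
```
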
4

Multiple Kernel Learning with Many Kernels

Afkanpour, Arash Unknown Date
No description available.
5

Retrospective Approximation for Smooth Stochastic Optimization

David T Newton (15369535) 30 April 2023 (has links)
Stochastic Gradient Descent (SGD) is a widely used iterative algorithm for solving stochastic optimization problems with a smooth (and possibly non-convex) objective function via queries to a first-order stochastic oracle.

In this dissertation, we critically examine SGD's choice of executing a single step as opposed to multiple steps between subsample updates. Our investigation leads naturally to generalizing SGD into Retrospective Approximation (RA) where, during each iteration, a deterministic solver executes possibly multiple steps on a subsampled deterministic problem and stops when further solving is deemed unnecessary from the standpoint of statistical efficiency. RA thus leverages what is appealing for implementation: during each iteration, a solver, e.g., L-BFGS with backtracking line search, is used as is, and the subsampled objective function is solved only to the extent necessary. We develop a complete theory using the relative error of the observed gradients as the principal object, demonstrating that almost-sure and L1 consistency of RA are preserved under especially weak conditions when sample sizes are increased at appropriate rates. We also characterize the iteration and oracle complexity (for linear and sub-linear solvers) of RA, and identify two practical termination criteria, one of which we show leads to optimal complexity rates. The message from extensive numerical experiments is that the ability of RA to incorporate existing second-order deterministic solvers in a strategic manner is useful both in terms of algorithmic trajectory and from the standpoint of dispensing with hyper-parameter tuning.
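
To make the RA pattern concrete, the sketch below assumes a least-squares objective, a geometric sample-size schedule, and SciPy's L-BFGS-B as the inner deterministic solver; the loose gradient tolerance is a stand-in for the statistical-efficiency stopping rules analysed in the dissertation, not the actual termination criteria.

```python
import numpy as np
from scipy.optimize import minimize

def retrospective_approximation(sample_oracle, x0, m0=32, growth=2.0, outer_iters=8):
    """Sketch of the RA loop: at outer iteration k, draw m_k samples, build the
    subsampled deterministic objective, and solve it with a deterministic
    solver (L-BFGS), warm-starting from the previous iterate."""
    x, m = np.asarray(x0, dtype=float), m0
    for _ in range(outer_iters):
        A, b = sample_oracle(int(m))                 # subsampled problem data
        def f_and_grad(z, A=A, b=b):
            r = A @ z - b
            return 0.5 * np.mean(r ** 2), A.T @ r / len(b)
        # inner deterministic solve; the tolerance loosens with small samples
        res = minimize(f_and_grad, x, jac=True, method="L-BFGS-B",
                       options={"gtol": 1.0 / np.sqrt(m)})
        x, m = res.x, m * growth                     # warm start, grow sample size
    return x

# toy usage: an oracle that draws fresh samples of a noisy linear regression
rng = np.random.default_rng(2)
w_true = rng.normal(size=4)
def sample_oracle(m):
    A = rng.normal(size=(m, 4))
    return A, A @ w_true + 0.1 * rng.normal(size=m)

w_hat = retrospective_approximation(sample_oracle, np.zeros(4))
```
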
6

Gradient Temporal-Difference Learning Algorithms

Maei, Hamid Reza Unknown Date
No description available.
7

First-order distributed optimization methods for machine learning with linear speed-up

Spiridonoff, Artin 27 September 2021 (has links)
This thesis considers the problem of average consensus, distributed centralized and decentralized Stochastic Gradient Descent (SGD), and their communication requirements. Namely, (i) an algorithm for achieving consensus among a collection of agents is studied and its convergence to the average is shown in the presence of link failures and delays. The new results improve upon prior work by relaxing some of the restrictive assumptions on communication, such as bounded link failures and intercommunication intervals, as well as allowing for message delays. Next, (ii) a Robust Asynchronous Stochastic Gradient Push (RASGP) algorithm is proposed to minimize the separable objective F(z) = Σ_{i=1}^n f_i(z) in a harsh network setting characterized by asynchronous updates, message losses and delays, and directed communication. RASGP is shown to asymptotically perform as well as the best bounds on centralized gradient descent that takes steps in the direction of the sum of the noisy gradients of all local functions f_i(z). Next, (iii) a new communication strategy is proposed for Local SGD, a centralized optimization algorithm in which workers make local updates and average their values only once in a while. It is shown that linear speed-up in the number of workers N is possible using only O(N) communication (averaging) rounds, independent of the total number of iterations T. Empirical evidence suggests this bound is close to tight, as it is further shown that √N or N^{3/4} communication rounds fail to achieve linear speed-up. Finally, (iv) under mild assumptions, the main one being twice differentiability in a neighborhood of the optimal solution, one-shot averaging, which uses only a single round of communication, is shown to have an asymptotically optimal convergence rate.
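
The communication pattern behind Local SGD and one-shot averaging can be sketched as follows, under simplifying assumptions not in the abstract (a shared least-squares objective, synchronous workers, arbitrary step sizes and schedules); setting rounds=1 in this sketch corresponds to one-shot averaging. This is an illustration of the setting, not the thesis's algorithms or analysis.

```python
import numpy as np

def local_sgd_with_rounds(data_per_worker, dim, total_steps=1000, rounds=10, lr=0.05):
    """N workers run SGD on their own samples and average their iterates only
    `rounds` times in total; rounds=1 is one-shot averaging."""
    rng = np.random.default_rng(3)
    workers = [np.zeros(dim) for _ in data_per_worker]
    steps_per_round = total_steps // rounds
    for _ in range(rounds):
        for w_idx, (X, y) in enumerate(data_per_worker):
            w = workers[w_idx]
            for _ in range(steps_per_round):          # local, communication-free steps
                i = rng.integers(len(y))
                w -= lr * (w @ X[i] - y[i]) * X[i]
            workers[w_idx] = w
        avg = np.mean(workers, axis=0)                # one communication (averaging) round
        workers = [avg.copy() for _ in workers]
    return workers[0]

# toy usage: 8 workers, each holding its own noisy sample of the same problem
rng = np.random.default_rng(4)
w_true = rng.normal(size=3)
data = []
for _ in range(8):
    X = rng.normal(size=(200, 3))
    data.append((X, X @ w_true + 0.1 * rng.normal(size=200)))
w_hat = local_sgd_with_rounds(data, dim=3)
```
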
8

On the Modelling of Stochastic Gradient Descent with Stochastic Differential Equations

Leino, Martin January 2023 (has links)
Stochastic gradient descent (SGD) is arguably the most important algorithm used in optimization problems for large-scale machine learning. Its behaviour has been studied extensively from the viewpoint of mathematical analysis and probability theory; it is widely held that, in the limit where the learning rate tends to zero, a specific stochastic differential equation becomes an adequate model of the algorithm's dynamics. This study exhibits some of the research in this field by analyzing the application of a recently proven theorem to the problem of tensor principal component analysis. The results, originally discovered in a 2022 article by Gérard Ben Arous, Reza Gheissari and Aukosh Jagannath, illustrate how the phase diagram of functions of SGD in the high-dimensional regime differs from that of the classical fixed-dimensional setting.
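
A small numerical illustration of the modelling idea (not taken from the thesis, and assuming a one-dimensional quadratic loss with Gaussian gradient noise): SGD with a small learning rate behaves like a discretisation of an Ornstein-Uhlenbeck SDE, and the empirical variance of its iterates approaches the SDE's stationary-variance prediction.

```python
import numpy as np

# Assumed toy model: for the quadratic loss f(x) = 0.5*lam*x^2 with Gaussian
# gradient noise, SGD with learning rate eta,
#     x_{k+1} = x_k - eta * (lam * x_k + sigma * Z_k),
# can be read as a discretisation of the Ornstein-Uhlenbeck SDE
#     dX_t = -lam * X_t dt + sqrt(eta) * sigma dW_t,
# whose stationary variance is eta * sigma^2 / (2 * lam).

rng = np.random.default_rng(5)
lam, sigma, eta = 2.0, 1.0, 1e-3
x, xs = 1.0, []
for k in range(200_000):
    x -= eta * (lam * x + sigma * rng.normal())   # one noisy SGD step
    if k > 50_000:                                # discard the transient phase
        xs.append(x)

print("empirical variance of SGD iterates:", np.var(xs))
print("SDE (OU) prediction eta*sigma^2/(2*lam):", eta * sigma**2 / (2 * lam))
```
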
9

Back propagation control of model-based multi-layer adaptive filters for optical communication systems

Arikawa, Manabu 25 September 2023 (has links)
Kyoto University / New-system doctoral program / Doctor of Informatics / 甲第24937号 / 情博第848号 / 新制||情||142 (University Library) / Department of Advanced Mathematical Sciences, Graduate School of Informatics, Kyoto University / Examining committee: Professor 林 和則 (chief examiner), Professor 青柳 富誌生, Associate Professor 寺前 順之介 / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
10

Latent Factor Models for Recommender Systems and Market Segmentation Through Clustering

Zeng, Jingying 29 August 2017 (has links)
No description available.
