  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Automatic step-size adaptation in incremental supervised learning

Mahmood, Ashique Unknown Date
No description available.
2

Automatic step-size adaptation in incremental supervised learning

Mahmood, Ashique 11 1900 (has links)
The performance and stability of many iterative algorithms, such as stochastic gradient descent, depend largely on a fixed scalar step-size parameter, and using a fixed scalar step-size may limit performance on many problems. We study several existing step-size adaptation algorithms on nonstationary, supervised learning problems using simulated and real-world data. We find that the effectiveness of the existing step-size adaptation algorithms requires tuning a meta-parameter across problems. We introduce a new algorithm, Autostep, by combining several new techniques with an existing algorithm, and demonstrate that it can effectively adapt a vector step-size parameter on all of our training and test problems without tuning its meta-parameter across them. Autostep is the first step-size adaptation algorithm that can be used on widely different problems with the same setting of all of its parameters.
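As a rough illustration of the vector step-size idea this abstract describes, here is a minimal IDBD-style sketch (Sutton, 1992) for linear regression, where each weight carries its own adapted step-size. This is an illustrative toy related to, but not the same as, the Autostep algorithm of the thesis; all names and constants are invented for the example.

```python
import numpy as np

def idbd_style_sgd(X, y, meta_rate=0.01, init_log_step=-3.0, epochs=1):
    """Linear regression with a per-weight (vector) step-size adapted online.

    A simplified IDBD-style meta-update: NOT the thesis's Autostep algorithm,
    just a sketch of the idea of adapting a vector step-size."""
    n_features = X.shape[1]
    w = np.zeros(n_features)                         # weights
    log_alpha = np.full(n_features, init_log_step)   # per-weight log step-sizes
    h = np.zeros(n_features)                         # trace of recent updates
    for _ in range(epochs):
        for x, target in zip(X, y):
            err = target - x @ w
            # meta step: correlate the current gradient with the update trace
            log_alpha += meta_rate * err * x * h
            alpha = np.exp(log_alpha)
            w += alpha * err * x
            h = h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * err * x
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=500)
w_hat = idbd_style_sgd(X, y, epochs=5)
```

Weights whose inputs are consistently useful have their step-sizes grown; the others shrink, which is what makes a per-weight step-size more robust than a single scalar one.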
3

Stochastic, distributed and federated optimization for machine learning

Konečný, Jakub January 2017 (has links)
We study optimization algorithms for the finite-sum problems that frequently arise in machine learning applications. First, we propose novel variants of stochastic gradient descent with a variance-reduction property that enables linear convergence for strongly convex objectives. Second, we study the distributed setting, in which the data describing the optimization problem does not fit into a single computing node. In this case, traditional methods are inefficient, as the communication costs inherent in distributed optimization become the bottleneck. We propose a communication-efficient framework that iteratively forms local subproblems solvable with arbitrary local optimization algorithms. Finally, we introduce the concept of Federated Optimization/Learning, in which we try to solve machine learning problems without storing data in any centralized manner. The main motivation comes from industrial settings that handle user-generated data. The current prevalent practice is for companies to collect vast amounts of user data and store them in datacenters. The alternative we propose is not to collect the data in the first place, and instead to occasionally use the computational power of users' devices to solve the very same optimization problems, alleviating privacy concerns at the same time. In such a setting, minimizing the number of communication rounds is the primary goal, and we demonstrate that solving the optimization problems in such circumstances is conceptually tractable.
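The federated idea described above, where raw data never leaves the devices and only model parameters travel, can be sketched with a FedAvg-style loop (McMahan et al., 2017). This is a minimal illustration of the paradigm, not the thesis's own algorithms; all names and constants are invented for the example.

```python
import numpy as np

def local_step(w, X, y, lr):
    # one full-batch gradient step on a client's local least-squares loss
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def fedavg(client_data, rounds=50, local_steps=5, lr=0.1):
    """FedAvg-style sketch: clients train locally on their own data and the
    server only ever sees (and averages) model weights, one round at a time."""
    dim = client_data[0][0].shape[1]
    w_global = np.zeros(dim)
    for _ in range(rounds):
        local_models = []
        for X, y in client_data:       # in practice: a sampled subset of devices
            w = w_global.copy()
            for _ in range(local_steps):
                w = local_step(w, X, y, lr)
            local_models.append(w)
        w_global = np.mean(local_models, axis=0)   # one communication round
    return w_global

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(4):
    X = rng.normal(size=(100, 2))
    clients.append((X, X @ true_w + 0.05 * rng.normal(size=100)))
w = fedavg(clients)
```

Running several local steps between averaging rounds is exactly the lever that trades extra local computation for fewer communication rounds, the primary goal named in the abstract.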
4

Multiple Kernel Learning with Many Kernels

Afkanpour, Arash Unknown Date
No description available.
5

Optimization for Supervised Machine Learning: Randomized Algorithms for Data and Parameters

Hanzely, Filip 20 August 2020 (has links)
Many key problems in machine learning and data science are routinely modeled as optimization problems and solved via optimization algorithms. With the increase of the volume of data and the size and complexity of the statistical models used to formulate these often ill-conditioned optimization tasks, there is a need for new efficient algorithms able to cope with these challenges. In this thesis, we deal with each of these sources of difficulty in a different way. To efficiently address the big data issue, we develop new methods which in each iteration examine a small random subset of the training data only. To handle the big model issue, we develop methods which in each iteration update a random subset of the model parameters only. Finally, to deal with ill-conditioned problems, we devise methods that incorporate either higher-order information or Nesterov’s acceleration/momentum. In all cases, randomness is viewed as a powerful algorithmic tool that we tune, both in theory and in experiments, to achieve the best results. Our algorithms have their primary application in training supervised machine learning models via regularized empirical risk minimization, which is the dominant paradigm for training such models. However, due to their generality, our methods can be applied in many other fields, including but not limited to data science, engineering, scientific computing, and statistics.
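The two randomization ideas in this abstract, examining a random subset of the data and updating a random subset of the parameters, can be sketched together in one loop. This is an illustrative toy on least squares, not one of the thesis's methods; all names and constants are invented for the example.

```python
import numpy as np

def doubly_random_sgd(X, y, iters=4000, batch=32, block=2, lr=0.5, seed=0):
    """Each iteration looks at a random subset of the DATA (a minibatch) and
    updates a random subset of the PARAMETERS (a coordinate block).
    A sketch of the two randomization ideas, not the thesis's algorithms."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        rows = rng.choice(n, size=batch, replace=False)    # data subsample
        coords = rng.choice(d, size=block, replace=False)  # parameter subsample
        residual = X[rows] @ w - y[rows]
        grad = X[rows][:, coords].T @ residual / batch     # partial gradient
        w[coords] -= lr * grad                             # update only the block
    return w

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 4))
true_w = np.array([1.0, 0.5, -1.5, 2.0])
y = X @ true_w + 0.05 * rng.normal(size=400)
w_hat = doubly_random_sgd(X, y)
```

Each iteration touches only `batch` rows and `block` coordinates, which is what makes such methods cheap per step on big data and big models.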
6

Retrospective Approximation for Smooth Stochastic Optimization

David T Newton (15369535) 30 April 2023 (has links)
Stochastic Gradient Descent (SGD) is a widely used iterative algorithm for solving stochastic optimization problems with a smooth (and possibly non-convex) objective function via queries to a first-order stochastic oracle.

In this dissertation, we critically examine SGD's choice of executing a single step, as opposed to multiple steps, between subsample updates. Our investigation leads naturally to generalizing SGD into Retrospective Approximation (RA), where, during each iteration, a deterministic solver executes possibly multiple steps on a subsampled deterministic problem and stops when further solving is deemed unnecessary from the standpoint of statistical efficiency. RA thus leverages what is appealing for implementation: during each iteration, a solver, e.g. L-BFGS with backtracking line search, is used as is, and the subsampled objective function is solved only to the extent necessary. We develop a complete theory using the relative error of the observed gradients as the principal object, demonstrating that almost-sure and L1 consistency of RA are preserved under especially weak conditions when sample sizes are increased at appropriate rates. We also characterize the iteration and oracle complexity of RA (for linear and sublinear solvers), and identify two practical termination criteria, one of which we show leads to optimal complexity rates. The message from extensive numerical experiments is that RA's ability to incorporate existing second-order deterministic solvers in a strategic manner is useful both in terms of the algorithmic trajectory and in dispensing with hyperparameter tuning.
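The RA template described above, an off-the-shelf deterministic solver applied to subsampled problems of growing size, can be sketched on a one-dimensional toy problem with SciPy's L-BFGS-B as the inner solver. The dissertation's statistical stopping rules and complexity analysis are omitted; all names and constants here are invented for the example.

```python
import numpy as np
from scipy.optimize import minimize

def retrospective_approximation(sample_loss, x0, sample_sizes, rng):
    """RA sketch: at each outer iteration, draw a subsample, build a
    deterministic objective, and hand it to an off-the-shelf solver,
    warm-starting at the previous iterate; sample sizes grow over time."""
    x = np.asarray(x0, dtype=float)
    for m in sample_sizes:                 # growing sample-size schedule
        xi = rng.normal(size=m)            # subsample from the stochastic oracle
        f = lambda x, xi=xi: sample_loss(x, xi)
        x = minimize(f, x, method="L-BFGS-B").x   # deterministic inner solve
    return x

# toy problem: minimize E[(x - (3 + xi))^2] over scalar x; the optimum is x* = 3
def sample_loss(x, xi):
    return np.mean((x[0] - (3.0 + xi)) ** 2)

rng = np.random.default_rng(3)
x_star = retrospective_approximation(sample_loss, [0.0], [10, 100, 1000, 10000], rng)
```

Early outer iterations are cheap and coarse; later ones use larger subsamples, so the inner solver's effort is spent only where statistical accuracy warrants it.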
7

Breeding white storks in former East Prussia : comparing predicted relative occurrences across scales and time using a stochastic gradient boosting method (TreeNet), GIS and public data

Wickert, Claudia January 2007 (has links)
Different habitat models were created for the White Stork (Ciconia ciconia) in the region of the former German province of East Prussia (approximately the current Russian oblast of Kaliningrad and the Polish voivodeship of Warmia-Masuria). Several historical data sets describing the occurrence of the White Stork in the 1930s, as well as selected variables describing landscape and habitat, were employed. The processing and modeling of the data sets were done with a geographical information system (ArcGIS) and a statistical modeling approach from the disciplines of machine learning and data mining (TreeNet by Salford Systems Ltd.). Applying the historical habitat descriptors and the data on the occurrence of the White Stork, models were created on two scales: (i) a point-scale model using a raster with a cell size of 1 km², and (ii) an administrative-district-scale model based on the division of the former province of East Prussia into its districts. The evaluation of the models shows that the occurrence of White Stork nesting grounds in former East Prussia is for the most part determined by the variables 'forest', 'settlement area', 'pasture land' and 'proximity to coastline'. From this set of variables it can be inferred that a good food supply, such as the White Stork finds in pastures and meadows, and proximity to human settlements are crucial factors in the White Stork's choice of nesting site in East Prussia. Dense forest areas appear unsuitable as nesting grounds for White Storks. The strong influence of the variable 'coastline' is most likely explained by the landscape composition of East Prussia running parallel to the coastline, and should be seen as a proximate factor for explaining the distribution of breeding White Storks.
In a second step, the models created in this study were used on both scales to make predictions for the period 1981-1993. On the point scale, a decline in potential nesting habitat was predicted. In contrast, the predicted White Stork density increases under the administrative-district-scale model. The difference between the two predictions presumably stems from the use of different scales (density versus suitability as breeding ground) and partly dissimilar explanatory variables; further studies are needed to clarify this. The model predictions for 1981-1993 were also compared descriptively with the censuses available for that period. The predicted figures are higher than those established by the censuses, so the models describe the capacity of the habitat (the potential niche) rather than realized population size. Other factors affecting population size, such as breeding success or mortality, should be included in future investigations. This work demonstrates a feasible approach to building valuable habitat models from historical data with the methods presented here, and to assessing the effects of land-use change on the White Stork. The models are a first step of their kind and could be refined with further data on habitat structure and more exact, spatially explicit information on the locations of White Stork nesting sites. In a further step, a habitat model for the present day should also be created. This would allow a more precise comparison of the effects of changes in land use and relevant environmental conditions on the White Stork in the region of former East Prussia, as well as across its entire range, e.g. in light of coming landscape changes brought about by the European Union (EU).
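As a rough illustration of the stochastic gradient boosting idea behind TreeNet (Friedman, 2002), here is a minimal least-squares boosting loop with decision stumps, where each stage fits the residuals of a random subsample of the data. It is a toy sketch on synthetic data, not the TreeNet software or the habitat models of the thesis; all names and constants are invented for the example.

```python
import numpy as np

def fit_stump(X, r):
    """Best single-split decision stump for squared error on residuals r."""
    best = (np.inf, 0, 0.0, 0.0, 0.0)   # (loss, feature, threshold, left, right)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            mask = X[:, j] <= t
            if mask.all() or not mask.any():
                continue
            left, right = r[mask].mean(), r[~mask].mean()
            loss = ((r[mask] - left) ** 2).sum() + ((r[~mask] - right) ** 2).sum()
            if loss < best[0]:
                best = (loss, j, t, left, right)
    return best[1:]

def stochastic_gradient_boosting(X, y, n_trees=100, lr=0.1, subsample=0.5, seed=6):
    """Each stage fits a stump to the residuals of a random SUBSAMPLE of the
    data, then shrinks its contribution by the learning rate: the core of
    Friedman's stochastic gradient boosting, in toy form."""
    rng = np.random.default_rng(seed)
    base = y.mean()
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_trees):
        rows = rng.choice(len(y), size=int(subsample * len(y)), replace=False)
        j, t, left, right = fit_stump(X[rows], (y - pred)[rows])
        pred += lr * np.where(X[:, j] <= t, left, right)
        stumps.append((j, t, left, right))
    return base, stumps, pred

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] > 0.0, 2.0, -1.0)   # toy "presence/absence-like" signal
base, stumps, pred = stochastic_gradient_boosting(X, y)
```

The subsampling at each stage is what makes the method "stochastic"; in the thesis the same kind of ensemble is fit to stork occurrence against habitat variables rather than to this synthetic signal.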
8

Gradient Temporal-Difference Learning Algorithms

Maei, Hamid Reza Unknown Date
No description available.
9

First-order distributed optimization methods for machine learning with linear speed-up

Spiridonoff, Artin 27 September 2021 (has links)
This thesis considers the problem of average consensus, distributed centralized and decentralized Stochastic Gradient Descent (SGD), and their communication requirements. Namely, (i) an algorithm for achieving consensus among a collection of agents is studied and its convergence to the average is shown in the presence of link failures and delays. The new results improve upon prior work by relaxing some of the restrictive assumptions on communication, such as bounded link failures and intercommunication intervals, and by allowing for message delays. Next, (ii) a Robust Asynchronous Stochastic Gradient Push (RASGP) algorithm is proposed to minimize the separable objective F(z) = Σ_{i=1}^n f_i(z) in a harsh network setting characterized by asynchronous updates, message losses and delays, and directed communication. RASGP is shown to asymptotically perform as well as the best bounds on centralized gradient descent taking steps in the direction of the sum of the noisy gradients of all local functions f_i(z). Next, (iii) a new communication strategy is proposed for Local SGD, a centralized optimization algorithm in which workers make local updates and average their values only once in a while. It is shown that a linear speed-up in the number of workers N is possible using only O(N) communication (averaging) rounds, independent of the total number of iterations T. Empirical evidence suggests this bound is close to tight, as it is further shown that √N or N^{3/4} communication rounds fail to achieve a linear speed-up. Finally, (iv) under mild assumptions, the main one being twice differentiability in a neighborhood of the optimal solution, one-shot averaging, which uses only a single round of communication, is shown to achieve an asymptotically optimal convergence rate.
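The extreme point of the communication spectrum described in (iv), one-shot averaging, is simple enough to sketch directly: each worker runs SGD on its own data with no communication at all, and the models are averaged once at the end. This is an illustrative toy, not the thesis's analysis; all names and constants are invented for the example.

```python
import numpy as np

def one_shot_average(worker_data, iters=2000, lr=0.05, seed=4):
    """One-shot averaging sketch: every worker runs SGD independently on its
    local data, and a single averaging step (one communication round)
    produces the final model."""
    rng = np.random.default_rng(seed)
    finals = []
    for X, y in worker_data:
        w = np.zeros(X.shape[1])
        for _ in range(iters):
            i = rng.integers(len(y))                 # one-sample stochastic gradient
            w -= lr * (X[i] @ w - y[i]) * X[i]
        finals.append(w)
    return np.mean(finals, axis=0)                   # the only averaging step

rng = np.random.default_rng(44)
true_w = np.array([1.5, -0.5])
workers = []
for _ in range(4):
    X = rng.normal(size=(200, 2))
    workers.append((X, X @ true_w + 0.05 * rng.normal(size=200)))
w_avg = one_shot_average(workers)
```

Averaging independent runs reduces the variance of the workers' noise, which is the intuition behind the linear speed-up results for Local SGD with few communication rounds.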
10

Non-convex Stochastic Optimization With Biased Gradient Estimators

Sokolov, Igor 03 1900 (has links)
Non-convex optimization problems appear in various applications of machine learning. Because of their practical importance, these problems have gained a lot of attention in recent years, leading to the rapid development of new, efficient stochastic gradient-type methods. In the quest to improve the generalization performance of modern deep learning models, practitioners are resorting to larger and larger datasets in the training process, naturally distributed across a number of edge devices. However, as the volume of training data increases, the computational costs of gradient-type methods grow significantly. In addition, distributed methods almost invariably suffer from the so-called communication bottleneck: the cost of communicating the information the workers need to jointly solve the problem is often very high, and it can be orders of magnitude higher than the cost of computation. This thesis provides a study of first-order stochastic methods addressing these issues. In particular, we structure the study around certain classes of methods, which allowed us to identify current theoretical gaps and fill them with new, efficient algorithms.
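A standard example of the biased gradient estimators and communication compression this abstract refers to is Top-k sparsification with error feedback (in the style of Stich et al., 2018): only the k largest-magnitude gradient coordinates are transmitted, and a local memory accumulates what was dropped. This sketch shows the class of methods, not the thesis's own algorithms; all names and constants are invented for the example.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries, zero the rest (a biased compressor)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef_sgd(X, y, iters=3000, lr=0.1, k=1, seed=5):
    """Error-feedback SGD with Top-k compression: the transmitted update is a
    BIASED estimator of the gradient step, and the error-feedback memory e
    re-injects the dropped coordinates in later iterations."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    e = np.zeros(d)                      # error-feedback memory
    for _ in range(iters):
        i = rng.integers(n)
        g = (X[i] @ w - y[i]) * X[i]     # stochastic gradient (least squares)
        p = top_k(e + lr * g, k)         # transmit only k coordinates
        w -= p
        e = e + lr * g - p               # remember what was not transmitted
    return w

rng = np.random.default_rng(55)
true_w = np.array([1.0, -1.0, 0.5])
X = rng.normal(size=(300, 3))
y = X @ true_w + 0.02 * rng.normal(size=300)
w_hat = ef_sgd(X, y)
```

With k = 1 out of 3 coordinates, each step sends a third of the vector, yet the error-feedback memory lets the iterates still converge, which is why biased compressors need this kind of correction in theory and practice.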
