21 |
Feasibility Study of Implementation of Machine Learning Models on Card Transactions / Genomförbarhetsstudie på Implementering av Maskininlärningsmodeller på Korttransaktioner
Alzghaier, Samhar, Can Kaya, Mervan January 2022
Machine learning has been studied extensively, and its methods have been applied across a wide spectrum of fields. However, a thorough feasibility study of machine learning classifiers within the payment processing industry has yet to be carried out. Here, we construct a rule-based response vector and combine it with a multitude of feature vectors across different machine learning classifiers to try to determine whether individual transactions can be considered profitable from a business point of view. The algorithms are Naive Bayes, AdaBoost, Stochastic Gradient Descent, K-Nearest Neighbors, Decision Trees and Random Forests. Together they helped us build a high-performing model that serves both as confirmation of the benefits of machine learning in the payment processing industry and as a theoretical guide to implementing it. The results confirm the benefits of data-intensive models, even in complex industries such as the one in which Swedbank Pay operates. These implications can help boost innovation and revenue, as they offer a better understanding of current pricing mechanisms. / Många studier har utförts inom ämnet maskininlärning, och olika variationer har applicerats på ett brett spektrum av andra ämnen. Däremot har en ordentlig genomförbarhetsstudie inom betalningsleveransindustrin med hjälp av klassificeringsalgoritmer ännu ej utforskats. Här har vi konstruerat en regelbaserad responsvektor och använt den, tillsammans med en rad olika och varierande egenskapvektorer på olika maskininlärningsklassificeringsalgoritmer för att försöka avgöra ifall individuella transaktioner är lönsamma utifrån företagets perspektiv. Dessa algoritmer är Naive-Bayes, AdaBoosting, Stokastisk gradient medåkning, K-närmaste grannar, beslutsträd och slumpmässiga beslutsskogar. 
Alla dessa har hjälpt oss bygga en teoretisk vägledning om implementering av maskininlärningsalgoritmer inom betalningsleveransindustrin. Dessa resultat är en robust bekräftelse på fördelarna av dataintensiva modeller även inom sådana komplexa industrier Swedbank Pay är verksamma inom. Implikationerna hjälper vidare att förstärka innovationen och öka intäkterna eftersom de erbjuder en bättre förståelse för deras nuvarande prissättningsmekanism.
|
22 |
ONLINE STATISTICAL INFERENCE FOR LOW-RANK REINFORCEMENT LEARNING
Qiyu Han (18284758) 01 April 2024
We propose a fully online procedure for statistical inference with adaptively collected data. The low-rank structure of the model parameter and the adaptive nature of the data collection process make this task challenging: standard low-rank estimators are biased and cannot be obtained sequentially, while existing inference approaches for sequential decision-making algorithms fail to account for the low-rank structure and are also biased. To tackle these challenges, we first develop an online low-rank estimation procedure that applies Stochastic Gradient Descent to noisy observations. Next, to enable statistical inference with the online low-rank estimator, we introduce a novel online debiasing technique that addresses both sources of bias simultaneously and yields an unbiased estimator suitable for parameter inference. Finally, we develop an inferential framework that provides an online estimator for inference on the optimal policy value. In theory, we establish the asymptotic normality of the proposed online debiased estimators and prove the validity of the constructed confidence intervals for both inference tasks. Our inference results build on a newly developed low-rank stochastic gradient descent estimator and its non-asymptotic convergence result, which is also of independent interest.
|
23 |
Recommender System for Gym Customers
Sundaramurthy, Roshni January 2020
Recommender systems provide new opportunities for retrieving personalized information on the Internet. With big data now available, the fitness industry is focusing on building efficient recommender systems for its end users. This thesis investigates the possibility of building an efficient recommender system for gym users. BRP Systems AB provided the gym data used for evaluation, which consists of approximately 896,000 customer interactions with 8 features. Four matrix factorization methods based on implicit feedback are applied to the data: latent semantic analysis using singular value decomposition, alternating least squares, Bayesian personalized ranking, and logistic matrix factorization. These methods decompose the implicit data matrix of user-gym group activity interactions into the product of two lower-dimensional matrices. The factors are used to score the similarity between users and activities, and the top-k recommendations are produced from these scores. The methods are evaluated with ranking metrics: Precision@k, mean average precision (MAP)@k, area under the curve (AUC), and normalized discounted cumulative gain (NDCG)@k. A qualitative analysis is also performed to evaluate the recommendations. For this dataset, the best method is alternating least squares, which achieved around 90% AUC for the overall system and gave personalized recommendations to the users.
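To make two of the ranking metrics named above concrete, here is a minimal sketch of Precision@k and binary-relevance NDCG@k in plain Python. It is an illustration only, not the thesis's evaluation code: the function names, the activity names, and the choice of binary relevance (an item is either relevant or not) are assumptions made for this example.

```python
import math


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    # A relevant item at 0-based position `rank` contributes 1 / log2(rank + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    # The ideal ranking places all relevant items (up to k) at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical ranked recommendations for one user and the activities
# that user actually attended (made-up names, for illustration only).
recommended = ["yoga", "spinning", "pilates", "boxing"]
relevant = {"yoga", "boxing"}

p_at_2 = precision_at_k(recommended, relevant, 2)
ndcg_at_4 = ndcg_at_k(recommended, relevant, 4)
print(f"Precision@2 = {p_at_2:.3f}, NDCG@4 = {ndcg_at_4:.3f}")
```

In an evaluation like the one described, such per-user scores would typically be averaged over all users; how the thesis aggregates them is not stated in the abstract.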
|
24 |
Non-convex Bayesian Learning via Stochastic Gradient Markov Chain Monte Carlo
Wei Deng (11804435) 18 December 2021
The rise of artificial intelligence (AI) hinges on the efficient training of modern deep neural networks (DNNs) for non-convex optimization and uncertainty quantification, which boils down to a non-convex Bayesian learning problem. A standard tool for this problem is Langevin Monte Carlo, which approximates the posterior distribution with theoretical guarantees. However, non-convex Bayesian learning in real big data applications can be arbitrarily slow and often fails to capture the uncertainty or the informative modes in a limited time. As a result, advanced techniques are still required.

In this thesis, we start with replica exchange Langevin Monte Carlo (also known as parallel tempering), a Markov jump process that proposes appropriate swaps between exploration and exploitation to achieve acceleration. However, the naïve extension of swaps to big data problems leads to a large bias, so bias-corrected swaps are required. Such a mechanism leads to few effective swaps and insignificant acceleration. To alleviate this issue, we first propose a control variates method to reduce the variance of the noisy energy estimators and show its potential to accelerate the exponential convergence. We also present population-chain replica exchange and propose a generalized deterministic even-odd scheme to track the non-reversibility and obtain an optimal round trip rate. Further approximations based on stochastic gradient descent make the method user-friendly for large-scale uncertainty approximation tasks, without much tuning cost.

In the second part of the thesis, we study scalable dynamic importance sampling algorithms based on stochastic approximation. Traditional dynamic importance sampling algorithms have been successful in bioinformatics and statistical physics; however, their lack of scalability has greatly limited their extension to big data applications. To handle this scalability issue, we resolve the vanishing gradient problem and propose two dynamic importance sampling algorithms based on stochastic gradient Langevin dynamics. Theoretically, we establish the stability condition for the underlying ordinary differential equation (ODE) system and guarantee the asymptotic convergence of the latent variable to the desired fixed point. Interestingly, this result still holds for non-convex energy landscapes. In addition, we propose a pleasingly parallel version of these algorithms with interacting latent variables. We show that the interacting algorithm can be theoretically more efficient than the single-chain alternative with an equivalent computational budget.
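As background for the methods above, the sketch below shows plain stochastic gradient Langevin dynamics (SGLD), the building block that the thesis's algorithms extend. It is not the replica exchange or dynamic importance sampling method itself. The toy problem (posterior of a Gaussian mean with unit variance and a flat prior), the function name `sgld_sample_mean`, and all hyperparameter values are assumptions chosen for illustration.

```python
import math
import random


def sgld_sample_mean(data, n_steps=5000, batch_size=10, step_size=1e-3, seed=0):
    """Draw approximate posterior samples of a Gaussian mean with SGLD."""
    rng = random.Random(seed)
    n = len(data)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        batch = rng.sample(data, batch_size)
        # Unbiased mini-batch estimate of the gradient of the energy
        # U(theta) = sum_i (theta - x_i)^2 / 2 (unit-variance likelihood, flat prior).
        grad = (n / batch_size) * sum(theta - x for x in batch)
        # Langevin update: gradient step plus Gaussian noise of variance 2 * step_size.
        noise = rng.gauss(0.0, 1.0)
        theta = theta - step_size * grad + math.sqrt(2.0 * step_size) * noise
        samples.append(theta)
    return samples


# Synthetic data with a true mean of 2.0 (made up for this example).
data_rng = random.Random(1)
data = [data_rng.gauss(2.0, 1.0) for _ in range(100)]

samples = sgld_sample_mean(data)
burn_in = 1000  # discard early iterates before the chain settles
kept = samples[burn_in:]
posterior_mean = sum(kept) / len(kept)
print(f"SGLD posterior mean estimate: {posterior_mean:.3f}")
```

With a constant step size and mini-batch gradients, the samples are only approximately distributed according to the posterior; the estimate should land close to the sample mean of the data, which is the exact posterior mean in this toy setting.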
|
25 |
A deep learning theory for neural networks grounded in physics
Scellier, Benjamin 12 1900
Au cours de la dernière décennie, l'apprentissage profond est devenu une composante majeure de l'intelligence artificielle, ayant mené à une série d'avancées capitales dans une variété de domaines. L'un des piliers de l'apprentissage profond est l'optimisation de fonction de coût par l'algorithme du gradient stochastique (SGD). Traditionnellement en apprentissage profond, les réseaux de neurones sont des fonctions mathématiques différentiables, et les gradients requis pour l'algorithme SGD sont calculés par rétropropagation. Cependant, les architectures informatiques sur lesquelles ces réseaux de neurones sont implémentés et entraînés souffrent d’inefficacités en vitesse et en énergie, dues à la séparation de la mémoire et des calculs dans ces architectures. Pour résoudre ces problèmes, le neuromorphique vise à implémenter les réseaux de neurones dans des architectures qui fusionnent mémoire et calculs, imitant plus fidèlement le cerveau. Dans cette thèse, nous soutenons que pour construire efficacement des réseaux de neurones dans des architectures neuromorphiques, il est nécessaire de repenser les algorithmes pour les implémenter et les entraîner. Nous présentons un cadre mathématique alternatif, compatible lui aussi avec l’algorithme SGD, qui permet de concevoir des réseaux de neurones dans des substrats qui exploitent mieux les lois de la physique. Notre cadre mathématique s'applique à une très large classe de modèles, à savoir les systèmes dont l'état ou la dynamique sont décrits par des équations variationnelles. La procédure pour calculer les gradients de la fonction de coût dans de tels systèmes (qui dans de nombreux cas pratiques ne nécessite que de l'information locale pour chaque paramètre) est appelée “equilibrium propagation” (EqProp). 
Comme beaucoup de systèmes en physique et en ingénierie peuvent être décrits par des principes variationnels, notre cadre mathématique peut potentiellement s'appliquer à une grande variété de systèmes physiques, dont les applications vont au delà du neuromorphique et touchent divers champs d'ingénierie. / In the last decade, deep learning has become a major component of artificial intelligence, leading to a series of breakthroughs across a wide variety of domains. The workhorse of deep learning is the optimization of loss functions by stochastic gradient descent (SGD). Traditionally in deep learning, neural networks are differentiable mathematical functions, and the loss gradients required for SGD are computed with the backpropagation algorithm. However, the computer architectures on which these neural networks are implemented and trained suffer from speed and energy inefficiency issues, due to the separation of memory and processing in these architectures. To solve these problems, the field of neuromorphic computing aims at implementing neural networks on hardware architectures that merge memory and processing, just like brains do. In this thesis, we argue that building large, fast and efficient neural networks on neuromorphic architectures also requires rethinking the algorithms to implement and train them. We present an alternative mathematical framework, also compatible with SGD, which offers the possibility to design neural networks in substrates that directly exploit the laws of physics. Our framework applies to a very broad class of models, namely those whose state or dynamics are described by variational equations. This includes physical systems whose equilibrium state minimizes an energy function, and physical systems whose trajectory minimizes an action functional (principle of least action). 
We present a simple procedure to compute the loss gradients in such systems, called equilibrium propagation (EqProp), which requires solely locally available information for each trainable parameter. Since many models in physics and engineering can be described by variational principles, our framework has the potential to be applied to a broad variety of physical systems, whose applications extend to various fields of engineering, beyond neuromorphic computing.
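For readers who want the procedure in symbols, the following is a sketch of how the EqProp gradient is commonly written for the energy-minimizing (equilibrium) case. The notation is an assumption of this sketch and does not come from the abstract: $E(\theta, s)$ is the energy with parameters $\theta$ and state $s$, $C(s)$ is the cost, $\beta$ is a small nudging strength, and $s_\theta^{\beta}$ is the equilibrium state of the total energy $F = E + \beta C$.

```latex
\begin{aligned}
F(\theta, \beta, s) &= E(\theta, s) + \beta\, C(s), \\
s_\theta^{\beta} &= \arg\min_{s} F(\theta, \beta, s), \\
\mathcal{L}(\theta) &= C\!\left(s_\theta^{0}\right), \\
\frac{d\mathcal{L}}{d\theta} &= \lim_{\beta \to 0} \frac{1}{\beta}
\left[ \frac{\partial E}{\partial \theta}\!\left(\theta, s_\theta^{\beta}\right)
     - \frac{\partial E}{\partial \theta}\!\left(\theta, s_\theta^{0}\right) \right].
\end{aligned}
```

Under this formulation, the gradient is estimated from two equilibrium states: a free one ($\beta = 0$) and a slightly nudged one ($\beta \neq 0$). If $E$ is a sum of terms that each couple one parameter to nearby state variables, each partial derivative depends only on those local quantities, which is consistent with the locality property the abstract describes.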
|