  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

On the Modelling of Stochastic Gradient Descent with Stochastic Differential Equations

Leino, Martin January 2023 (has links)
Stochastic gradient descent (SGD) is arguably the most important algorithm used in optimization problems for large-scale machine learning. Its behaviour has been studied extensively from the viewpoint of mathematical analysis and probability theory; it is widely held that in the limit where the learning rate of the algorithm tends to zero, a specific stochastic differential equation becomes an adequate model of the dynamics of the algorithm. This study presents some of the research in this field by analyzing the application of a recently proven theorem to the problem of tensor principal component analysis. The results, originally obtained in a 2022 article by Gérard Ben Arous, Reza Gheissari and Aukosh Jagannath, illustrate how the phase diagram of functions of SGD differs in the high-dimensional regime from that of the classical fixed-dimensional setting.
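A hedged aside for context: the small-learning-rate SDE model alluded to in this abstract is usually written as follows (a standard first-order weak approximation from this literature; the notation is ours, not the thesis's).

```latex
% SGD iterates x_{k+1} = x_k - \eta \widehat{\nabla f}(x_k) with learning rate
% \eta and gradient-noise covariance \Sigma(x) are commonly modelled, for small
% \eta, by the Ito diffusion
\[
  \mathrm{d}X_t = -\nabla f(X_t)\,\mathrm{d}t
                  + \sqrt{\eta}\,\Sigma(X_t)^{1/2}\,\mathrm{d}W_t ,
\]
% read on the time scale t = k\eta; a first-order weak approximation of SGD.
```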
12

Back propagation control of model-based multi-layer adaptive filters for optical communication systems / 光通信のためのモデルベース適応多層フィルタの誤差逆伝播による制御

Arikawa, Manabu 25 September 2023 (has links)
Kyoto University / New-system doctoral programme / Doctor of Informatics / Degree No. Kō 24937 / Jōhaku No. 848 / Shinsei||Jō||142 (University Library) / Department of Advanced Mathematical Sciences, Graduate School of Informatics, Kyoto University / Examining committee: Professor Kazunori Hayashi (chair), Professor Toshio Aoyagi, Associate Professor Jun-nosuke Teramae / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
13

Latent Factor Models for Recommender Systems and Market Segmentation Through Clustering

Zeng, Jingying 29 August 2017 (has links)
No description available.
14

A Study of the Loss Landscape and Metastability in Graph Convolutional Neural Networks / En studie av lösningslandskapet och metastabilitet i grafiska faltningsnätverk

Larsson, Sofia January 2020 (has links)
Many novel graph neural network models have reported impressive performance on benchmark datasets, but the theory behind these networks is still being developed. In this thesis, we study the trajectories of gradient descent (GD) and stochastic gradient descent (SGD) in the loss landscape of graph neural networks by replicating the study of feed-forward networks by Xing et al. [1]. Furthermore, we empirically examine whether the training process can be accelerated by an optimization algorithm inspired by stochastic gradient Langevin dynamics, and what effect the topology of the graph has on the convergence of GD by perturbing its structure. We find that the loss landscape is relatively flat and that SGD does not encounter any significant obstacles during its propagation. The noise induced in the gradient appears to aid SGD in finding a stationary point with desirable generalisation capabilities when the learning rate is poorly optimized. Additionally, we observe that the topological structure of the graph plays a part in the convergence of GD, but further research is required to understand how.
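As a point of reference for the SGLD-inspired optimizer mentioned above, here is a minimal sketch of a stochastic gradient Langevin dynamics update; the step size, temperature, and toy loss are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def sgld_step(w, grad_fn, step_size, temperature=1.0, rng=None):
    """One SGLD update: a gradient step plus properly scaled Gaussian noise.

    w           -- current parameter vector (np.ndarray)
    grad_fn     -- stochastic gradient of the loss at w (e.g. minibatch gradient)
    step_size   -- learning rate eta
    temperature -- scales the injected noise (temperature=0 recovers plain SGD)
    """
    if rng is None:
        rng = np.random.default_rng()
    noise = rng.normal(size=w.shape)
    return w - step_size * grad_fn(w) + np.sqrt(2.0 * step_size * temperature) * noise

# Toy usage: sample around the minimum of the quadratic loss 0.5 * ||w||^2.
w = np.ones(3)
for _ in range(1000):
    w = sgld_step(w, grad_fn=lambda w: w, step_size=1e-2)
```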
15

First-Order Algorithms for Communication Efficient Distributed Learning

Khirirat, Sarit January 2019 (has links)
Technological developments in devices and storage have made large volumes of data more accessible than ever. This transformation leads to optimization problems with data that are massive in both volume and dimension. In response to this trend, optimization on high-performance computing architectures has become unprecedentedly popular. These scalable optimization solvers achieve high efficiency by splitting computational loads among multiple machines, but they also incur large communication overhead: for optimization problems with millions of parameters, communication between machines has been reported to consume up to 80% of the training time. To alleviate this communication bottleneck, many optimization algorithms with data-compression techniques have been studied; in practice, they have been reported to save communication costs significantly while converging almost as well as their full-precision counterparts. To explain this behaviour, we develop theory and techniques in this thesis to design communication-efficient optimization algorithms. In the first part, we analyze the convergence of optimization algorithms with direct compression. First, we outline definitions of compression techniques that cover many compressors of practical interest. Then, we provide a unified analysis framework for optimization algorithms with compressors, which can be either deterministic or randomized. In particular, we show how the tuning parameters of compressed optimization algorithms must be chosen to guarantee performance. Our results show explicit dependence on compression accuracy and on delay effects due to the asynchrony of the algorithms, which allows us to characterize the trade-off between iteration and communication complexity under gradient compression. In the second part, we study how error-compensation schemes improve the performance of compressed optimization algorithms. Even though convergence guarantees for optimization algorithms with error compensation have been established, there is very limited theoretical support for the improved solution accuracy observed in practice. We therefore develop theoretical explanations, which show that error compensation guarantees arbitrarily high solution accuracy from compressed information. In particular, error compensation helps remove accumulated compression errors, improving solution accuracy especially for ill-conditioned problems. We also provide a strong convergence analysis of error compensation for parallel stochastic gradient descent across multiple machines; unlike direct compression, the error-compensated algorithms achieve a significant reduction of the compression error. Applications of the algorithms in this thesis to real-world problems with benchmark data sets validate our theoretical results.
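To make the two ingredients of the thesis concrete, the sketch below combines a top-k sparsifying compressor with error compensation (error feedback) for a single worker; the function names and toy problem are our assumptions, not the thesis's implementation.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def ef_sgd(w, grad_fn, steps, lr=0.1, k=2):
    """SGD where each update is compressed before being 'communicated';
    what the compressor drops is remembered and added back next round."""
    e = np.zeros_like(w)  # accumulated compression error (the 'memory')
    for _ in range(steps):
        g = grad_fn(w)
        c = top_k(lr * g + e, k)   # compress the error-corrected update
        e = lr * g + e - c         # store what the compressor dropped
        w = w - c
    return w

# Toy usage: minimize 0.5 * ||w||^2 in 10 dimensions with k=2 sparsification.
w = ef_sgd(np.ones(10), grad_fn=lambda w: w, steps=200)
```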
16

Analysis and Numerics of Stochastic Gradient Flows

Kunick, Florian 22 September 2022 (has links)
In this thesis we study three stochastic partial differential equations (SPDE) that arise as stochastic gradient flows via the fluctuation-dissipation principle. For the first equation we establish a finer regularity statement based on a generalized Taylor expansion which is inspired by the theory of rough paths. The second equation is the thin-film equation with thermal noise, which is a singular SPDE. In order to circumvent the issue of dealing with possible renormalization, we discretize the gradient flow structure of the deterministic thin-film equation. Choosing a specific discretization of the metric tensor, we rediscover a well-known discretization of the thin-film equation introduced by Grün and Rumpf that satisfies a discrete entropy estimate. By proving a stochastic entropy estimate in this discrete setting, we obtain positivity of the scheme in the case of no-slip boundary conditions. Moreover, we analyze the associated rate functional and perform numerical experiments which suggest that the scheme converges. The third equation is the massive $\varphi^4_2$-model on the torus, which is also a singular SPDE. In the spirit of Bakry and Émery, we obtain a gradient bound on the Markov semigroup. The proof relies on an $L^2$-estimate for the linearization of the equation. Due to the required renormalization, we use a stopping time argument in order to ensure stochastic integrability of the random constant in the estimate. A postprocessing of this estimate yields an even sharper gradient bound. As a corollary, for large enough mass, we establish a local spectral gap inequality which by ergodicity yields a spectral gap inequality for the $\varphi^4_2$-measure.
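For orientation, this is roughly how an SPDE arises as a stochastic gradient flow via the fluctuation-dissipation principle; a generic sketch, with the thesis's three equations being specific instances with their own energies and metrics.

```latex
% A gradient flow \partial_t u = -\nabla_g E(u) of an energy E with respect to
% a metric g becomes a stochastic gradient flow by injecting noise whose
% covariance matches the dissipation mechanism (fluctuation-dissipation):
\[
  \partial_t u = -\nabla_g E(u) + \sqrt{2\beta^{-1}}\,\xi ,
\]
% where \xi is space-time white noise (with respect to g) and \beta^{-1} the
% temperature; the invariant measure is formally \propto e^{-\beta E(u)}.
```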
17

Deep Learning for Ordinary Differential Equations and Predictive Uncertainty

Yijia Liu (17984911) 19 April 2024 (has links)
<p dir="ltr">Deep neural networks (DNNs) have demonstrated outstanding performance in numerous tasks such as image recognition and natural language processing. However, in dynamic systems modeling, the tasks of estimating and uncovering the potentially nonlinear structure of systems represented by ordinary differential equations (ODEs) pose a significant challenge. In this dissertation, we employ DNNs to enable precise and efficient parameter estimation of dynamic systems. In addition, we introduce a highly flexible neural ODE model to capture both nonlinear and sparse dependent relations among multiple functional processes. Nonetheless, DNNs are susceptible to overfitting and often struggle to accurately assess predictive uncertainty despite their widespread success across various AI domains. The challenge of defining meaningful priors for DNN weights and characterizing predictive uncertainty persists. In this dissertation, we present a novel neural adaptive empirical Bayes framework with a new class of prior distributions to address weight uncertainty.</p><p dir="ltr">In the first part, we propose a precise and efficient approach utilizing DNNs for estimation and inference of ODEs given noisy data. The DNNs are employed directly as a nonparametric proxy for the true solution of the ODEs, eliminating the need for numerical integration and resulting in significant computational time savings. We develop a gradient descent algorithm to estimate both the DNNs solution and the parameters of the ODEs by optimizing a fidelity-penalized likelihood loss function. This ensures that the derivatives of the DNNs estimator conform to the system of ODEs. Our method is particularly effective in scenarios where only a set of variables transformed from the system components by a given function are observed. We establish the convergence rate of the DNNs estimator and demonstrate that the derivatives of the DNNs solution asymptotically satisfy the ODEs determined by the inferred parameters. Simulations and real data analysis of COVID-19 daily cases are conducted to show the superior performance of our method in terms of accuracy of parameter estimates and system recovery, and computational speed.</p><p dir="ltr">In the second part, we present a novel sparse neural ODE model to characterize flexible relations among multiple functional processes. This model represents the latent states of the functions using a set of ODEs and models the dynamic changes of these states utilizing a DNN with a specially designed architecture and sparsity-inducing regularization. Our new model is able to capture both nonlinear and sparse dependent relations among multivariate functions. We develop an efficient optimization algorithm to estimate the unknown weights for the DNN under the sparsity constraint. Furthermore, we establish both algorithmic convergence and selection consistency, providing theoretical guarantees for the proposed method. We illustrate the efficacy of the method through simulation studies and a gene regulatory network example.</p><p dir="ltr">In the third part, we introduce a class of implicit generative priors to facilitate Bayesian modeling and inference. These priors are derived through a nonlinear transformation of a known low-dimensional distribution, allowing us to handle complex data distributions and capture the underlying manifold structure effectively. 
Our framework combines variational inference with a gradient ascent algorithm, which serves to select the hyperparameters and approximate the posterior distribution. Theoretical justification is established through both the posterior and classification consistency. We demonstrate the practical applications of our framework through extensive simulation examples and real-world datasets. Our experimental results highlight the superiority of our proposed framework over existing methods, such as sparse variational Bayesian and generative models, in terms of prediction accuracy and uncertainty quantification.</p>
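A rough sketch of the fidelity-penalized likelihood idea from the first part, assuming a toy one-dimensional ODE x' = theta * x; the network architecture, penalty weight, and optimizer are illustrative choices, not the dissertation's.

```python
import torch

# DNN surrogate for the ODE solution x(t), and the unknown ODE parameter theta.
net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
theta = torch.nn.Parameter(torch.tensor(0.0))
opt = torch.optim.Adam(list(net.parameters()) + [theta], lr=1e-2)

# Noisy observations of x(t) = exp(0.5 t) on [0, 1] (true theta = 0.5).
t_obs = torch.linspace(0, 1, 50).unsqueeze(1)
y_obs = torch.exp(0.5 * t_obs) + 0.05 * torch.randn_like(t_obs)

lam = 1.0  # fidelity penalty weight (illustrative)
for _ in range(2000):
    opt.zero_grad()
    t = t_obs.clone().requires_grad_(True)
    x = net(t)
    # Derivative of the network output via autograd, so no numerical ODE solver.
    dx = torch.autograd.grad(x, t, torch.ones_like(x), create_graph=True)[0]
    data_fit = ((net(t_obs) - y_obs) ** 2).mean()  # likelihood (data-fit) term
    fidelity = ((dx - theta * x) ** 2).mean()      # derivative must satisfy the ODE
    loss = data_fit + lam * fidelity
    loss.backward()
    opt.step()
```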
18

Non-convex Bayesian Learning via Stochastic Gradient Markov Chain Monte Carlo

Wei Deng (11804435) 18 December 2021 (has links)
The rise of artificial intelligence (AI) hinges on the efficient training of modern deep neural networks (DNNs) for non-convex optimization and uncertainty quantification, which boils down to a non-convex Bayesian learning problem. A standard tool to handle the problem is Langevin Monte Carlo, which proposes to approximate the posterior distribution with theoretical guarantees. However, non-convex Bayesian learning in real big data applications can be arbitrarily slow and often fails to capture the uncertainty or informative modes given a limited time. As a result, advanced techniques are still required.

In this thesis, we start with replica exchange Langevin Monte Carlo (also known as parallel tempering), a Markov jump process that proposes appropriate swaps between exploration and exploitation to achieve accelerations. However, the naïve extension of swaps to big data problems leads to a large bias, and bias-corrected swaps are required. Such a mechanism leads to few effective swaps and insignificant accelerations. To alleviate this issue, we first propose a control variates method to reduce the variance of noisy energy estimators and show a potential to accelerate the exponential convergence. We also present the population-chain replica exchange and propose a generalized deterministic even-odd scheme to track the non-reversibility and obtain an optimal round trip rate. Further approximations are conducted based on stochastic gradient descent, which yields a user-friendly nature for large-scale uncertainty approximation tasks without much tuning cost.

In the second part of the thesis, we study scalable dynamic importance sampling algorithms based on stochastic approximation. Traditional dynamic importance sampling algorithms have achieved successes in bioinformatics and statistical physics; however, the lack of scalability has greatly limited their extensions to big data applications. To handle this scalability issue, we resolve the vanishing gradient problem and propose two dynamic importance sampling algorithms based on stochastic gradient Langevin dynamics. Theoretically, we establish the stability condition for the underlying ordinary differential equation (ODE) system and guarantee the asymptotic convergence of the latent variable to the desired fixed point. Interestingly, such a result still holds given non-convex energy landscapes. In addition, we also propose a pleasingly parallel version of such algorithms with interacting latent variables. We show that the interacting algorithm can be theoretically more efficient than the single-chain alternative with an equivalent computational budget.
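For intuition, a minimal sketch of the replica-exchange swap with exact energies; the bias discussed above arises precisely when these exact energies are replaced by noisy minibatch estimates. The energy function and temperatures are illustrative assumptions.

```python
import numpy as np

def swap_replicas(x_lo, x_hi, energy, T_lo, T_hi, rng=None):
    """Propose swapping the states of a low- and a high-temperature chain.

    Accept with the Metropolis probability that leaves the joint distribution
    exp(-E(x_lo)/T_lo) * exp(-E(x_hi)/T_hi) invariant.
    """
    if rng is None:
        rng = np.random.default_rng()
    log_acc = (1.0 / T_lo - 1.0 / T_hi) * (energy(x_lo) - energy(x_hi))
    if np.log(rng.uniform()) < min(0.0, log_acc):
        return x_hi, x_lo   # swap: the exploitation chain adopts the explorer's state
    return x_lo, x_hi

# Toy usage with a double-well energy, where the hot chain hops between modes.
energy = lambda x: (x**2 - 1.0) ** 2
x_lo, x_hi = swap_replicas(np.array(-1.0), np.array(1.0), energy, T_lo=0.1, T_hi=1.0)
```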
19

Deep learning for portfolio optimization

MBITI, JOHN N. January 2021 (has links)
In this thesis, an optimal investment problem is studied for an investor who can only invest in a financial market modelled by an Itô-Lévy process, with one risk-free (bond) and one risky (stock) investment possibility. We present the dynamic programming method and the associated Hamilton-Jacobi-Bellman (HJB) equation to solve this problem explicitly. It is shown that, with purification and simplification to the standard jump-diffusion process, closed-form solutions for the optimal investment strategy and for the value function are attainable. It is also shown that, for a specific case, an explicit solution can be obtained via finite training of a neural network using stochastic gradient descent (SGD).
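A hedged sketch of the dynamic-programming equation referred to above, in generic form; the thesis's Itô-Lévy setting adds a jump-integral term to the generator, and the notation here is ours, not the thesis's.

```latex
% For a controlled wealth process with control u, terminal utility U, and value
% function V(t, x) = \sup_u \mathbb{E}[\,U(X_T) \mid X_t = x\,], dynamic
% programming yields the HJB equation
\[
  \partial_t V(t,x) + \sup_{u}\, \mathcal{L}^{u} V(t,x) = 0,
  \qquad V(T,x) = U(x),
\]
% where \mathcal{L}^{u} is the generator of the controlled Itô-Lévy process
% (a second-order differential operator plus a jump-integral term).
```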
20

Decentralized Learning over Wireless Networks with Imperfect and Constrained Communication : To broadcast, or not to broadcast, that is the question!

Dahl, Martin January 2023 (has links)
The ever-expanding volume of data generated by network devices such as smartphones, personal computers, and sensors has significantly contributed to the remarkable advancements in artificial intelligence (AI) and machine learning (ML) algorithms. However, effectively processing and learning from this extensive data usually requires substantial computational capabilities centralized in a server. Moreover, concerns regarding data privacy arise when collecting training data from distributed network devices. To address these challenges, collaborative ML with decentralized data has emerged as a promising solution for large-scale machine learning across distributed devices, driven by the trends of parallel computing and learning. Collaborative and distributed ML can be broadly classified into two types, server-based and fully decentralized, depending on whether model aggregation is coordinated by a parameter server or performed in a decentralized manner through peer-to-peer communication. When communication between devices occurs over wireless links, which are inherently imperfect, unreliable, and resource-constrained, how can we design communication protocols to achieve the best learning performance? This thesis investigates decentralized learning using decentralized stochastic gradient descent, an established algorithm for decentralized ML, in a novel setting with imperfect and constrained communication: "imperfect" means that communication can fail, and "constrained" means that communication resources are limited. Communication across a link between two devices is modeled as a binary event, success or failure, depending on whether multiple neighbouring devices are transmitting simultaneously. To compensate for communication failures, every communication round can have multiple communication slots, which are limited and must be carefully allocated over the learning process. The quality of communication is quantified by the normalized throughput, the ratio of successful links in a communication round. To decide when devices should broadcast, both random and deterministic medium-access policies are developed with the goal of maximizing throughput, and they show very efficient learning performance. Finally, two schemes for allocating communication slots over communication rounds are defined and simulated, Delayed-Allocation and Periodic-Allocation, showing that it is better to allocate slots late rather than early, and neither too frequently nor too infrequently; the optimal allocation depends on several factors and requires further study.
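To make the algorithmic setting concrete, below is a minimal sketch of one round of decentralized SGD with gossip averaging, under the assumption that each device only mixes with the neighbors it successfully hears from; the function names and mixing weight are illustrative, not the thesis's implementation.

```python
import numpy as np

def dsgd_round(models, grads, neighbors, lr=0.1, weight=0.5):
    """One round of decentralized SGD: local gradient step, then gossip averaging.

    models    -- list of per-device parameter vectors
    grads     -- grads[i](w) returns device i's stochastic gradient at w
    neighbors -- neighbors[i] lists the devices i successfully hears from
                 (an empty list models a round where all of i's links failed)
    """
    # Local gradient step on each device.
    local = [w - lr * grads[i](w) for i, w in enumerate(models)]
    # Gossip: mix own parameters with the average of received neighbor parameters.
    mixed = []
    for i, w in enumerate(local):
        if neighbors[i]:
            nbr_avg = np.mean([local[j] for j in neighbors[i]], axis=0)
            w = (1 - weight) * w + weight * nbr_avg
        mixed.append(w)
    return mixed

# Toy usage: three devices on a line graph minimizing 0.5 * ||w||^2.
models = [np.array([1.0]), np.array([-2.0]), np.array([3.0])]
grads = [lambda w: w] * 3
models = dsgd_round(models, grads, neighbors=[[1], [0, 2], [1]])
```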
