Non-convex Bayesian Learning via Stochastic Gradient Markov Chain Monte Carlo

The rise of artificial intelligence (AI) hinges on the efficient training of modern deep neural networks (DNNs) for non-convex optimization and uncertainty quantification, which boils down to a non-convex Bayesian learning problem. A standard tool for this problem is Langevin Monte Carlo, which approximates the posterior distribution with theoretical guarantees. However, non-convex Bayesian learning in real big data applications can be arbitrarily slow and often fails to capture the uncertainty or informative modes within a limited time. As a result, advanced techniques are still required.

In this thesis, we start with the replica exchange Langevin Monte Carlo (also known as parallel tempering), a Markov jump process that proposes appropriate swaps between exploration and exploitation to achieve acceleration. However, the naive extension of swaps to big data problems leads to a large bias, so bias-corrected swaps are required. Such a mechanism leads to few effective swaps and insignificant acceleration. To alleviate this issue, we first propose a control variates method to reduce the variance of noisy energy estimators and show a potential to accelerate the exponential convergence. We also present the population-chain replica exchange and propose a generalized deterministic even-odd scheme to track the non-reversibility and obtain an optimal round trip rate. Further approximations are conducted based on stochastic gradient descent, which yield a user-friendly method for large-scale uncertainty approximation tasks without much tuning cost.

In the second part of the thesis, we study scalable dynamic importance sampling algorithms based on stochastic approximation. Traditional dynamic importance sampling algorithms have achieved success in bioinformatics and statistical physics; however, their lack of scalability has greatly limited their extension to big data applications. To handle this scalability issue, we resolve the vanishing gradient problem and propose two dynamic importance sampling algorithms based on stochastic gradient Langevin dynamics. Theoretically, we establish the stability condition for the underlying ordinary differential equation (ODE) system and guarantee the asymptotic convergence of the latent variable to the desired fixed point. Interestingly, this result still holds for non-convex energy landscapes. In addition, we propose a pleasingly parallel version of these algorithms with interacting latent variables. We show that the interacting algorithm can be theoretically more efficient than the single-chain alternative with an equivalent computational budget.
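
To make the swap mechanism described in the abstract concrete, the following is a minimal sketch of two-chain replica exchange Langevin Monte Carlo. It is not the thesis's algorithm: it assumes a toy one-dimensional double-well energy U(x) = (x^2 - 1)^2 that can be evaluated exactly, so the standard swap rule applies and none of the bias correction or variance reduction needed for noisy mini-batch energy estimators appears. The function names, temperatures, and step size are all illustrative assumptions.

```python
import math
import random


def energy(x):
    """Toy double-well energy with modes at x = -1 and x = +1 (an assumption)."""
    return (x * x - 1.0) ** 2


def grad_energy(x):
    """Exact gradient of the toy energy."""
    return 4.0 * x * (x * x - 1.0)


def swap_probability(u_low, u_high, tau_low, tau_high):
    """Standard exact-energy swap rule: min(1, exp((1/tau_low - 1/tau_high) * (u_low - u_high)))."""
    log_ratio = (1.0 / tau_low - 1.0 / tau_high) * (u_low - u_high)
    return math.exp(min(0.0, log_ratio))


def replica_exchange(n_steps, step_size=0.01, tau_low=0.1, tau_high=1.0, seed=0):
    """Run two Langevin chains and return the low-temperature chain's samples."""
    rng = random.Random(seed)
    x_low, x_high = 1.0, -1.0  # exploitation chain and exploration chain
    samples = []
    for _ in range(n_steps):
        # Langevin update: gradient step plus temperature-scaled Gaussian noise.
        x_low += -step_size * grad_energy(x_low) \
            + math.sqrt(2.0 * step_size * tau_low) * rng.gauss(0.0, 1.0)
        x_high += -step_size * grad_energy(x_high) \
            + math.sqrt(2.0 * step_size * tau_high) * rng.gauss(0.0, 1.0)
        # Propose exchanging the two chains' positions.
        p_swap = swap_probability(energy(x_low), energy(x_high), tau_low, tau_high)
        if rng.random() < p_swap:
            x_low, x_high = x_high, x_low
        samples.append(x_low)
    return samples
```

In this sketch the high-temperature chain crosses the energy barrier more easily, and an accepted swap hands its position to the low-temperature chain. With mini-batch energy estimates the same rule would be biased, which is the problem the abstract's bias-corrected swaps and control variates address.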

10.25394/pgs.17161718.v1

Statistics

Stochastic Analysis and Modelling

Monte Carlo Algorithm

Artificial intelligence

Importance sampling

Computer vision

Langevin Dynamics

Variance reduction techniques

Wang-Landau algorithm

Interacting particles

Hamiltonian Monte Carlo

Log-Sobolev inequality

Metropolis-Hastings

Deep neural network

Stochastic variance-reduced gradient

Wasserstein distance

Convolutional neural network

Deterministic even-odd scheme

Non-reversibility

Stochastic approximation Monte Carlo

Stochastic differential equation

Stochastic gradient descent

Parallel tempering

Stochastic approximation

Replica exchange

Stochastic gradient Langevin dynamics

Markov Chain Monte Carlo

Identifier	oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/17161718
Date	18 December 2021
Creators	Wei Deng (11804435)
Source Sets	Purdue University
Detected Language	English
Type	Text, Thesis
Rights	CC BY 4.0
Relation	https://figshare.com/articles/thesis/Non-convex_Bayesian_Learning_via_Stochastic_Gradient_Markov_Chain_Monte_Carlo/17161718
