
Hamiltonian Monte Carlo and consistent sampling for score matching based generative modeling

Piché-Taillefer, Rémi
Foreword: This work is based in part on research conducted in 2020 in collaboration with Alexia Jolicoeur-Martineau, Ioannis Mitliagkas, and Rémi Tachet des Combes, published at the International Conference on Learning Representations (ICLR 2021). The analyses presented in the following pages substantively deepen, correct, and extend that work, without relying on it or on any knowledge covered by it.

This thesis presents analyses of generative models of the Denoising Score Matching family, with the goal of better understanding how they work and of improving existing methods. These methods synthesize images by gradually reducing noise with deep neural networks. While the first chapters contextualize the Denoising Score Matching problem, the following chapters reformulate the training objective of the neural network and analyse the iterative generative process. I then introduce the founding concepts of Markov chain Monte Carlo (MCMC) with Hamiltonian dynamics and adapt them to image synthesis by annealing of Gaussian noise. While Langevin dynamics have so far dominated the generative processes in the Denoising Score Matching literature, Hamiltonian dynamics have drawn sustained interest for their superior convergence rate. I demonstrate their efficiency in the following chapters and specify, for complex image generation, the contexts in which their use is advantageous. In a complete ablation study, I present the independent and combined gains from each proposed improvement, thereby deepening our understanding of Denoising Score Matching methods.
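A minimal sketch of the annealed Langevin sampler that this abstract contrasts with Hamiltonian dynamics is given below. It is not the thesis's implementation: the trained score network is replaced by the exact score of a toy 2-D Gaussian so the example runs on its own, and the noise schedule, step size, and iteration counts are illustrative choices only.

```python
import numpy as np

# Sketch of annealed Langevin dynamics, the baseline sampler in score-based
# (Denoising Score Matching) generation. The "score network" is replaced by
# the exact score of a toy 2-D Gaussian N(MU, I); all constants are
# illustrative, not the thesis's settings.

MU = np.array([1.0, -2.0])  # mean of the toy "data" distribution

def score(x, sigma):
    """Exact score of the data distribution perturbed with N(0, sigma^2 I)."""
    return -(x - MU) / (1.0 + sigma ** 2)

def annealed_langevin(noise_levels, n_steps=100, base_step=1e-4, rng=None):
    """Run Langevin updates at each noise level, from coarse to fine."""
    rng = rng if rng is not None else np.random.default_rng(0)
    x = noise_levels[0] * rng.normal(size=2)            # start from broad noise
    for sigma in noise_levels:                          # anneal sigma downward
        eps = base_step * (sigma / noise_levels[-1]) ** 2
        for _ in range(n_steps):
            z = rng.normal(size=2)
            x = x + 0.5 * eps * score(x, sigma) + np.sqrt(eps) * z
    return x

if __name__ == "__main__":
    sigmas = np.geomspace(10.0, 0.01, num=10)           # decreasing noise levels
    samples = np.stack([annealed_langevin(sigmas, rng=np.random.default_rng(i))
                        for i in range(500)])
    print("sample mean:", samples.mean(axis=0))         # should be close to MU
```

The Hamiltonian variants studied in the thesis would add a momentum variable and a leapfrog-style integrator on top of this loop; the sketch keeps only the Langevin baseline.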

Non-convex Bayesian Learning via Stochastic Gradient Markov Chain Monte Carlo

Wei Deng, 18 December 2021
The rise of artificial intelligence (AI) hinges on the efficient training of modern deep neural networks (DNNs) for non-convex optimization and uncertainty quantification, which boils down to a non-convex Bayesian learning problem. A standard tool for this problem is Langevin Monte Carlo, which approximates the posterior distribution with theoretical guarantees. However, non-convex Bayesian learning in real big-data applications can be arbitrarily slow and often fails to capture the uncertainty or the informative modes within a limited time. As a result, advanced techniques are still required.

In this thesis, we start with replica exchange Langevin Monte Carlo (also known as parallel tempering), a Markov jump process that proposes appropriate swaps between exploration and exploitation to achieve acceleration. However, the naïve extension of swaps to big-data problems leads to a large bias, and bias-corrected swaps are required. Such a mechanism leads to few effective swaps and insignificant acceleration. To alleviate this issue, we first propose a control-variates method to reduce the variance of the noisy energy estimators and show its potential to accelerate the exponential convergence. We also present population-chain replica exchange and propose a generalized deterministic even-odd scheme to track the non-reversibility and obtain an optimal round-trip rate. Further approximations are based on stochastic gradient descent, which yields a user-friendly method for large-scale uncertainty approximation tasks without much tuning cost.

In the second part of the thesis, we study scalable dynamic importance sampling algorithms based on stochastic approximation. Traditional dynamic importance sampling algorithms have achieved success in bioinformatics and statistical physics; however, their lack of scalability has greatly limited their extension to big-data applications. To handle this scalability issue, we resolve the vanishing-gradient problem and propose two dynamic importance sampling algorithms based on stochastic gradient Langevin dynamics. Theoretically, we establish the stability condition for the underlying ordinary differential equation (ODE) system and guarantee the asymptotic convergence of the latent variable to the desired fixed point. Interestingly, this result still holds for non-convex energy landscapes. In addition, we propose a pleasingly parallel version of these algorithms with interacting latent variables and show that the interacting algorithm can be theoretically more efficient than the single-chain alternative given an equivalent computational budget.
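To make the replica-exchange idea concrete, here is a minimal toy sketch of replica exchange Langevin dynamics on a 1-D double-well energy. It is not the thesis's algorithm: gradients and energies are exact, so the bias correction needed for noisy stochastic-gradient energy estimators, which the abstract discusses, is deliberately omitted, and the temperatures, step size, and iteration count are illustrative choices.

```python
import numpy as np

# Toy replica exchange Langevin dynamics on a non-convex double-well energy.
# Exact gradients/energies are used, so the swap needs no bias correction;
# the thesis targets the stochastic-gradient setting where it does.

def U(x):                         # double-well energy with minima at x = ±1
    return (x ** 2 - 1.0) ** 2

def grad_U(x):
    return 4.0 * x * (x ** 2 - 1.0)

def replica_exchange_langevin(n_iters=20000, eps=5e-3, temps=(0.1, 1.0), seed=0):
    rng = np.random.default_rng(seed)
    x = np.array([1.0, 1.0])      # one replica per temperature (cold, hot)
    samples = []
    for _ in range(n_iters):
        # Langevin step for each replica at its own temperature
        for k, tau in enumerate(temps):
            x[k] += -eps * grad_U(x[k]) + np.sqrt(2.0 * eps * tau) * rng.normal()
        # propose to swap the two replicas (Metropolis acceptance)
        log_acc = (1.0 / temps[0] - 1.0 / temps[1]) * (U(x[0]) - U(x[1]))
        if np.log(rng.uniform()) < log_acc:
            x[0], x[1] = x[1], x[0]
        samples.append(x[0])      # the cold chain targets the distribution of interest
    return np.array(samples)

if __name__ == "__main__":
    s = replica_exchange_langevin()
    # with swaps, the cold chain visits both wells near -1 and +1
    print("fraction of cold-chain samples in the left well:", np.mean(s < 0))
```

Without the swap step, the cold chain started in the right well would almost never cross the energy barrier; the swaps transfer the hot chain's exploration to the cold chain, which is the source of the acceleration the abstract refers to.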
