1. Are Particle-Based Methods the Future of Sampling in Joint Energy Models? A Deep Dive into SVGD and SGLD
   Shah, Vedant Rajiv, 19 August 2024
This thesis investigates the integration of Stein Variational Gradient Descent (SVGD) with Joint Energy Models (JEMs), comparing its performance to Stochastic Gradient Langevin Dynamics (SGLD). We incorporated a generative loss term with an entropy component to enhance diversity and a smoothing factor to mitigate the numerical instability commonly associated with the energy function in energy-based models. Experiments on the CIFAR-10 dataset demonstrate that SGLD, particularly with Sharpness-Aware Minimization (SAM), outperforms SVGD in classification accuracy. However, SVGD without SAM, despite its lower classification accuracy, exhibits lower calibration error, underscoring its potential for developing the well-calibrated classifiers required in safety-critical applications. Our results emphasize the importance of adaptively tuning the SVGD smoothing factor ($\alpha$) to balance the generative and classification objectives. This thesis highlights the trade-offs between computational cost and performance, with SVGD demanding significant resources. Our findings stress the need for adaptive scaling and robust optimization techniques to enhance the stability and efficacy of JEMs. This thesis lays the groundwork for exploring more efficient and robust sampling techniques within the JEM framework, offering insights into the integration of SVGD with JEMs.

/ Master of Science /

This thesis explores advanced techniques for improving machine learning models, with a focus on developing well-calibrated and robust classifiers. We concentrated on two methods, Stein Variational Gradient Descent (SVGD) and Stochastic Gradient Langevin Dynamics (SGLD), evaluating their effectiveness in enhancing classification accuracy and reliability. Our research introduced a new mathematical approach to improve the stability and performance of Joint Energy Models (JEMs). By leveraging the generative capabilities of SVGD, the model is guided to learn better data representations, which are crucial for robust classification. Using the CIFAR-10 image dataset, we confirmed prior research indicating that SGLD, particularly when combined with an optimization method called Sharpness-Aware Minimization (SAM), delivered the best results in terms of accuracy and stability. Notably, SVGD without SAM, despite yielding slightly lower classification accuracy, exhibited significantly lower calibration error, making it particularly valuable for safety-critical applications. However, SVGD required careful hyperparameter tuning and substantial computational resources. This study lays the groundwork for future efforts to enhance the efficiency and reliability of these advanced sampling techniques, with the overarching goal of improving classifier calibration and robustness with JEMs.
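For readers unfamiliar with SVGD, the following is a minimal NumPy sketch of a single SVGD update with an RBF kernel. The alpha argument is only a hypothetical stand-in for the thesis's smoothing factor $\alpha$, and the Gaussian toy target replaces the JEM energy function; the thesis's actual loss terms and tuning scheme are not reproduced here.

    import numpy as np

    def svgd_step(X, grad_logp, step=0.1, alpha=1.0):
        # X: (n, d) array of particles; grad_logp returns (n, d) scores.
        # alpha is a hypothetical stand-in for the thesis's smoothing
        # factor applied to the energy gradient.
        n = X.shape[0]
        diff = X[:, None, :] - X[None, :, :]            # pairwise x_i - x_j
        sq_dist = np.sum(diff ** 2, axis=-1)
        h = np.median(sq_dist) / np.log(n + 1) + 1e-8   # median-heuristic bandwidth
        K = np.exp(-sq_dist / h)                        # RBF kernel matrix
        attract = K @ (alpha * grad_logp(X))            # drives particles to high density
        repulse = (2.0 / h) * np.einsum('ij,ijd->id', K, diff)  # keeps particles spread out
        return X + step * (attract + repulse) / n

    # Toy target: standard 2-D Gaussian, so grad log p(x) = -x.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((30, 2)) * 0.5 + 4.0        # start far from the mode
    for _ in range(1000):
        X = svgd_step(X, lambda Z: -Z, step=0.5)
    print(X.mean(axis=0), X.std(axis=0))                # roughly [0, 0] and [1, 1]

The attractive term transports particles toward high-density regions while the kernel-gradient term repels them from one another, which is the diversity mechanism the abstract's entropy component is meant to reinforce.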
2. The applicability and scalability of probabilistic inference in deep-learning-assisted geophysical inversion applications
   Izzatullah, Muhammad, 04 1900
Probabilistic inference, especially in the Bayesian framework, is a foundation for quantifying uncertainty in geophysical inversion applications. However, because of high-dimensional datasets and the large-scale nature of geophysical inverse problems, the applicability and scalability of probabilistic inference face significant challenges in such applications. This thesis is dedicated to improving the scalability of probabilistic inference algorithms and demonstrating their applicability to large-scale geophysical inversion. In it, I delve into three widely applied approaches for computing the Bayesian posterior distribution in geophysical inversion: Laplace's approximation, Markov chain Monte Carlo (MCMC), and variational Bayesian inference.
The first approach, Laplace's approximation, is the simplest approximation of an intractable Bayesian posterior, but its accuracy hinges on the estimate of the posterior covariance matrix. I study visualizations of the misfit landscape in low-dimensional subspaces and low-rank approximations of the covariance for full waveform inversion (FWI). I demonstrate that a non-optimal truncation of the Hessian eigenvalues in the low-rank approximation degrades the accuracy of the estimated standard deviation, leading to biased statistical conclusions. Through the same approach, I also demonstrate the propagation of uncertainties within Bayesian physics-informed neural networks for hypocenter localization applications.
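As a toy illustration of how the truncation rank biases the standard deviation, the sketch below inverts a synthetic damped Hessian with a decaying spectrum; the matrix, its spectrum, and the rank choices are invented for the example and merely stand in for the FWI misfit Hessian studied in the thesis.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 200

    # Synthetic SPD stand-in for a damped FWI misfit Hessian.
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    spectrum = 1.0 / (1.0 + np.arange(d)) ** 1.5 + 1e-2
    H = (Q * spectrum) @ Q.T

    # Exact Laplace posterior standard deviations: sqrt(diag(H^{-1})).
    std_exact = np.sqrt(np.diag(np.linalg.inv(H)))

    def truncated_std(H, r):
        # Low-rank inverse keeping the r smallest Hessian eigenvalues,
        # i.e. the directions contributing the most posterior variance.
        w, V = np.linalg.eigh(H)  # eigh returns ascending eigenvalues
        cov_r = (V[:, :r] / w[:r]) @ V[:, :r].T
        return np.sqrt(np.diag(cov_r))

    for r in (10, 50, 150, 200):
        rel_err = (np.linalg.norm(truncated_std(H, r) - std_exact)
                   / np.linalg.norm(std_exact))
        print(f"rank {r:3d}: relative std error = {rel_err:.3f}")  # shrinks as r grows

Every discarded eigenpair removes a positive contribution to the diagonal of the covariance, so an overly aggressive truncation systematically understates the standard deviation, which is the biased conclusion the abstract warns about.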
For the MCMC approach, I develop approximate Langevin MCMC algorithms that provide fast sampling at low computational cost for large-scale Bayesian FWI; the speed, however, comes with an asymptotic bias that inflates the variance. To account for this bias and assess sample quality, I introduce the kernelized Stein discrepancy (KSD) as a diagnostic tool. When larger computational resources are available, exact MCMC algorithms (i.e., those with a Metropolis-Hastings correction) should be favored for an accurate statistical analysis of the posterior distribution.
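For intuition about this trade-off, here is a minimal sketch contrasting an unadjusted Langevin step with its Metropolis-adjusted counterpart (MALA) on a 1-D Gaussian placeholder target; the step size and target are assumptions for illustration, not the thesis's FWI setup. On this target the unadjusted chain's stationary variance is analytically 1/(1 - step/4), so it overshoots the true variance of 1, mirroring the inflated variance noted above.

    import numpy as np

    def ula_step(x, grad_logp, step, rng):
        # Unadjusted Langevin: cheap per step, but the discretization error
        # is never corrected, so the chain carries an asymptotic bias.
        return x + 0.5 * step * grad_logp(x) + np.sqrt(step) * rng.standard_normal(x.shape)

    def mala_step(x, logp, grad_logp, step, rng):
        # Metropolis-adjusted Langevin: the accept/reject test removes the
        # bias at the cost of extra log-density evaluations.
        prop = x + 0.5 * step * grad_logp(x) + np.sqrt(step) * rng.standard_normal(x.shape)
        def log_q(a, b):  # log density of proposing a when at b
            mean = b + 0.5 * step * grad_logp(b)
            return -np.sum((a - mean) ** 2) / (2.0 * step)
        log_acc = logp(prop) + log_q(x, prop) - logp(x) - log_q(prop, x)
        return prop if np.log(rng.uniform()) < log_acc else x

    # Placeholder target: standard 1-D Gaussian.
    logp = lambda x: -0.5 * np.sum(x ** 2)
    grad_logp = lambda x: -x
    rng = np.random.default_rng(1)
    x_ula = x_mala = np.array([5.0])
    ula_samples, mala_samples = [], []
    for _ in range(5000):
        x_ula = ula_step(x_ula, grad_logp, step=0.5, rng=rng)
        x_mala = mala_step(x_mala, logp, grad_logp, step=0.5, rng=rng)
        ula_samples.append(x_ula[0]); mala_samples.append(x_mala[0])
    # ULA's variance estimate is biased (about 1.14 here); MALA's is near 1.
    print(np.var(ula_samples[500:]), np.var(mala_samples[500:]))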
For variational Bayesian inference, I propose a regularized variational inference framework that performs posterior inference by implicitly regularizing the Kullback-Leibler divergence loss with a deep denoiser through a Plug-and-Play method. I also develop Plug-and-Play Stein Variational Gradient Descent (PnP-SVGD), a novel algorithm for sampling the regularized posterior distribution. In a post-stack seismic inversion application, PnP-SVGD produces high-resolution, trustworthy samples representative of the subsurface structures.
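To sketch the plug-and-play idea only, the snippet below alternates a standard SVGD move with a denoising step. The shrinkage "denoiser", its strength, and the interleaving schedule are all invented for illustration; the thesis's PnP-SVGD uses a learned deep denoiser and a regularized posterior, which this toy does not reproduce.

    import numpy as np

    def svgd_direction(X, grad_logp):
        # Standard SVGD direction with an RBF kernel and median-heuristic bandwidth.
        n = X.shape[0]
        diff = X[:, None, :] - X[None, :, :]
        sq_dist = np.sum(diff ** 2, axis=-1)
        h = np.median(sq_dist) / np.log(n + 1) + 1e-8
        K = np.exp(-sq_dist / h)
        repulse = (2.0 / h) * np.einsum('ij,ijd->id', K, diff)
        return (K @ grad_logp(X) + repulse) / n

    def toy_denoiser(X, strength=0.2):
        # Stand-in for a learned deep denoiser: shrink each particle toward
        # the ensemble mean. A real PnP prior would be a trained network.
        return X + strength * (X.mean(axis=0, keepdims=True) - X)

    def pnp_svgd_step(X, grad_logp, step=0.2, strength=0.05):
        # One hypothetical PnP-SVGD iteration: a likelihood-driven SVGD move,
        # followed by the denoiser acting as an implicit regularizer.
        X = X + step * svgd_direction(X, grad_logp)
        return toy_denoiser(X, strength)

    # Placeholder "posterior": a 2-D Gaussian likelihood; the denoiser supplies the prior.
    rng = np.random.default_rng(2)
    X = rng.standard_normal((40, 2)) + 3.0
    for _ in range(500):
        X = pnp_svgd_step(X, lambda Z: -Z)
    print(X.mean(axis=0), X.std(axis=0))

The design point this illustrates is that the denoiser never needs an explicit prior density: it only acts on the particles between updates, which is what makes the Plug-and-Play construction attractive for priors learned from data.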