71 |
推薦系統資料插補改良法-電影推薦系統應用 / Improving recommendations through data imputation - with application for movie recommendation. 楊智博, Yang, Chih Po. Unknown Date
Today, many online stores and e-commerce platforms rely on recommender systems to increase sales when marketing products to consumers. Companies such as Amazon and Netflix study their customers' usage habits in depth, build dedicated recommender systems, and deliver personalized product recommendations to each customer.
Recommender system techniques fall into two broad categories, collaborative filtering and content filtering. This study focuses on latent factor models for collaborative filtering, which use matrix factorization to reconstruct the rating matrix. Koren et al. (2009) divide matrix factorization algorithms into two main types: stochastic gradient descent and alternating least squares. This study has three goals: first, to compare the predictive ability of alternating least squares and stochastic gradient descent; second, to examine how both algorithms perform after bias terms are added; and third, to run alternating least squares and stochastic gradient descent first, impute the missing values of the original data with their predictions, then apply singular value decomposition to the completed data and compare the results before and after imputation.
The results show that stochastic gradient descent requires less computation time than alternating least squares. In addition, after running the two matrix factorization algorithms and imputing the missing values with their predictions, singular value decomposition on the completed data also shows improved predictive ability. / Recommender systems have been widely used by Internet companies such as Amazon and Netflix to make recommendations to Internet users. Techniques for recommender systems can be divided into content filtering and collaborative filtering approaches. Matrix factorization is a popular method for the collaborative filtering approach. It minimizes the objective function through stochastic gradient descent or alternating least squares.
This thesis has three goals. First, we compare the alternating least squares method and the stochastic gradient descent method. Second, we compare the performance of the matrix factorization method with and without bias terms. Third, we combine singular value decomposition and matrix factorization.
As expected, we found that stochastic gradient descent takes less time than the alternating least squares method, and that the matrix factorization method with bias terms gives more accurate predictions. We also found that combining singular value decomposition with matrix factorization can improve the predictive accuracy.
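As a rough illustration of the kind of model being compared here, the sketch below factorizes a ratings matrix with user and item bias terms and learns the parameters by stochastic gradient descent. The rank, learning rate, regularization weight and epoch count are illustrative assumptions, not values taken from the thesis.

```python
import numpy as np

def sgd_mf(ratings, n_users, n_items, rank=10, lr=0.01, reg=0.05, epochs=20):
    """Matrix factorization with biases, r_ui ~ mu + b_u + b_i + p_u . q_i,
    trained by SGD on (user, item, rating) triples with 0-based indices."""
    rng = np.random.default_rng(0)
    P = 0.1 * rng.standard_normal((n_users, rank))   # user factors
    Q = 0.1 * rng.standard_normal((n_items, rank))   # item factors
    bu, bi = np.zeros(n_users), np.zeros(n_items)    # user and item biases
    mu = np.mean([r for _, _, r in ratings])         # global mean rating
    for _ in range(epochs):
        for u, i, r in ratings:
            e = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])   # prediction error
            bu[u] += lr * (e - reg * bu[u])
            bi[i] += lr * (e - reg * bi[i])
            # tuple assignment so both updates use the old factor values
            P[u], Q[i] = (P[u] + lr * (e * Q[i] - reg * P[u]),
                          Q[i] + lr * (e * P[u] - reg * Q[i]))
    return mu, bu, bi, P, Q
```

The imputation step described above would then fill each missing entry with mu + bu[u] + bi[i] + P[u] @ Q[i] and run a singular value decomposition on the completed matrix.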
|
72 |
Evaluation of computational methods for data prediction. Erickson, Joshua N. 03 September 2014
Given the overall increase in the availability of computational resources, and the importance of forecasting the future, it should come as no surprise that prediction is considered one of the most compelling and challenging problems in data analytics, for both academia and industry. But how is prediction done, what factors make it easier or harder, how accurate can we expect the results to be, and can we harness the available computational resources in meaningful ways? With efforts ranging from those designed to save lives in the moments before a near-field tsunami to others attempting to predict the performance of Major League Baseball players, future generations need to have realistic expectations about prediction methods and analytics. This thesis takes a broad look at the problem, including motivation, methodology, accuracy, and infrastructure. In particular, it provides a careful study involving experiments in regression (the prediction of continuous numerical values) and classification (the assignment of a class to each sample). The results and conclusions of these experiments cover only the included data sets and the applied algorithms as implemented by the Python library used. The evaluation includes the accuracy and running time of different algorithms across several data sets, to establish trade-offs between the approaches and determine the impact of variations in the size of the data sets involved. As scalability is a key characteristic required to meet the needs of future prediction problems, a discussion of some of the challenges associated with parallelization is included.
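The abstract names only "the Python library"; as a hedged illustration of the kind of comparison it describes (accuracy and wall-clock time for regression and classification across data sets), the snippet below uses scikit-learn and two small built-in data sets. The library choice, models and data sets are assumptions for illustration, not details confirmed by the thesis.

```python
import time
import numpy as np
from sklearn.datasets import load_diabetes, load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def benchmark(models, X, y, scoring):
    """Report cross-validated score and running time for each model."""
    for name, model in models:
        start = time.time()
        scores = cross_val_score(model, X, y, cv=5, scoring=scoring)
        print(f"{name:20s} {scoring}: {scores.mean():.3f}  time: {time.time() - start:.2f}s")

# Regression: predicting a continuous numerical value.
X, y = load_diabetes(return_X_y=True)
benchmark([("ridge", Ridge()),
           ("random forest", RandomForestRegressor(random_state=0))], X, y, scoring="r2")

# Classification: assigning a class label to each sample.
X, y = load_breast_cancer(return_X_y=True)
benchmark([("logistic regression", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
           ("random forest", RandomForestClassifier(random_state=0))], X, y, scoring="accuracy")
```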
|
73 |
Uma estratégia para predição da taxa de aprendizagem do gradiente descendente para aceleração da fatoração de matrizes / A strategy for predicting the gradient descent learning rate to accelerate matrix factorization / Une stratégie pour prédire le taux d'apprentissage du gradient descendant pour l'accélération de la factorisation matricielle. NÓBREGA, Caio Santos Bezerra. 11 April 2018
Suggesting the most appropriate products to different types of consumers is not a trivial task, even though it is a key factor in increasing their satisfaction and loyalty. For this reason, recommender systems have become an important tool for many applications, such as e-commerce, personalized websites and social networks. Recently, matrix factorization has become the most successful technique for implementing recommender systems. The parameters of the matrix factorization model are typically learned through numerical methods such as gradient descent. The performance of gradient descent is directly related to the setting of the learning rate, which is typically set to small values so as not to miss a local minimum. Consequently, the algorithm can take many iterations to converge. Ideally, one wants a learning rate that leads to a local minimum within the first iterations, but this is very difficult to achieve given the high complexity of the space of values to be searched. Starting with an exploratory study on several recommender system datasets, we observed that, for most of them, there is a linear pattern between the learning rate and the number of iterations needed to reach convergence. Based on this, we propose using simple linear regression models to predict, for an unknown dataset, a good value for the initial learning rate. The idea is to estimate a learning rate that drives gradient descent to a local minimum within the first iterations. We evaluate our technique on 8 real recommender system datasets and compare it with the standard algorithm, which uses a fixed value for the learning rate, and with learning-rate adaptation techniques from the literature. We show that we can reduce the number of iterations by up to 40% compared with the standard approach. / Suggesting the most suitable products to different types of consumers is not a trivial task, despite being a key factor for increasing their satisfaction and loyalty. Due to this fact, recommender systems have become an important tool for many applications, such as e-commerce, personalized websites and social networks. Recently, matrix factorization has become the most successful technique to implement recommendation systems. The parameters of this model are typically learned by means of numerical methods, like gradient descent. The performance of gradient descent is directly related to the configuration of the learning rate, which is typically set to small values in order not to miss a local minimum. As a consequence, the algorithm may take several iterations to converge. Ideally, one wants to find a learning rate that will lead to a local minimum in the early iterations, but this is
very difficult to achieve given the high complexity of the search space. Starting with an exploratory study on several recommender system datasets, we observed that there is an overall linear relationship between the learning rate and the number of iterations needed until convergence. From this, we propose to use simple linear regression models to predict, for an unknown dataset, a good value for the initial learning rate. The idea is to estimate a learning rate that drives the gradient descent as close as possible to a local minimum in the first iterations. We evaluate our technique on 8 real-world recommender datasets and compare it with the standard matrix factorization learning algorithm, which uses a fixed value for the learning rate over all iterations, and with techniques from the literature that adapt the learning rate. We show that we can reduce the number of iterations by up to 40% compared to the standard approach.
|
74 |
Feature Selection under Multicollinearity & Causal Inference on Time Series. Bhattacharya, Indranil. January 2017
In this work, we study and extend algorithms for Sparse Regression and Causal Inference problems. Both problems are fundamental in the area of data science.
The goal of the regression problem is to find the "best" relationship between an output variable and input variables, given samples of the input and output values. We consider sparse regression under a high-dimensional linear model with strongly correlated variables, a situation that cannot be handled well by many existing model selection algorithms. We study the performance of popular feature selection algorithms such as LASSO, Elastic Net, BoLasso, Clustered Lasso, as well as Projected Gradient Descent algorithms, in this setting, in terms of their running time, stability and consistency in recovering the true support. We also propose a new feature selection algorithm, BoPGD, which first clusters the features based on their sample correlation and then performs sparse estimation using a bootstrapped variant of the projected gradient descent method with projection onto the non-convex L0 ball. We attempt to characterize the efficiency and consistency of our algorithm by performing a host of experiments on both synthetic and real-world datasets.
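To make the projection step concrete, below is a minimal sketch of projected gradient descent for sparse least squares with projection onto the non-convex L0 ball (hard thresholding to the k largest-magnitude coefficients). It deliberately omits the correlation-based clustering and bootstrapping that distinguish BoPGD, and the step-size choice is a standard assumption rather than a detail from the thesis.

```python
import numpy as np

def project_l0(w, k):
    """Keep the k largest-magnitude coefficients, zero the rest (L0 ball projection)."""
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-k:]
    out[idx] = w[idx]
    return out

def projected_gradient_descent(X, y, k, lr=None, iters=500):
    """Sparse least squares: min ||y - Xw||^2 subject to ||w||_0 <= k."""
    if lr is None:
        lr = 1.0 / np.linalg.norm(X, 2) ** 2   # safe step from the Lipschitz constant
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ w - y)               # gradient of the squared loss
        w = project_l0(w - lr * grad, k)       # gradient step, then projection
    return w

# Synthetic demo: 5-sparse signal, Gaussian design.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 100))
w_true = np.zeros(100)
w_true[:5] = 3.0
y = X @ w_true + 0.1 * rng.standard_normal(200)
print(np.nonzero(projected_gradient_descent(X, y, k=5))[0])   # ideally indices 0..4
```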
Discovering causal relationships, beyond mere correlation, is widely recognized as a fundamental problem. Causal inference problems use observations to infer the underlying causal structure of the data generating process. The input to these problems is either a multivariate time series or i.i.d. sequences, and the output is a feature causal graph where the nodes correspond to the variables and the edges capture the direction of causality. For high-dimensional datasets, determining the causal relationships becomes a challenging task because of the curse of dimensionality. Graphical modeling of temporal data based on the concept of "Granger causality" has gained much attention in this context. The blend of Granger methods with model selection techniques, such as LASSO, enables efficient discovery of a "sparse" subset of causal variables in high-dimensional settings. However, these temporal causal methods use an input parameter, L, the maximum time lag: the maximum gap in time between the occurrence of the output phenomenon and the causal input stimulus. In many situations of interest, however, the maximum time lag is not known, and indeed, finding the range of causal effects is an important problem. In this work, we propose and evaluate a data-driven and computationally efficient method for Granger causality inference in the Vector Auto Regressive (VAR) model without foreknowledge of the maximum time lag. We present two algorithms, Lasso Granger++ and Group Lasso Granger++, which not only construct the hypothesis feature causal graph, but also simultaneously estimate a value of the maximum lag (L) for each variable by balancing the trade-off between "goodness of fit" and "model complexity".
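For orientation, here is a hedged sketch of plain Lasso-Granger selection with a fixed maximum lag, the baseline that the ++ variants above improve on. The per-variable lag estimation of Lasso Granger++ and Group Lasso Granger++ is not reproduced, and the regularization weight and synthetic data are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_granger(series, target, max_lag, alpha=0.05):
    """Regress the target variable on lagged values of all variables and return
    the set of variables that keep a non-zero coefficient at any lag <= max_lag.

    series: (T, p) array holding a multivariate time series.
    """
    T, p = series.shape
    rows, ys = [], []
    for t in range(max_lag, T):
        # stack lags 1..max_lag of every variable into one feature vector
        rows.append(series[t - max_lag:t][::-1].ravel())
        ys.append(series[t, target])
    X, y = np.asarray(rows), np.asarray(ys)
    coef = Lasso(alpha=alpha, max_iter=10000).fit(X, y).coef_.reshape(max_lag, p)
    return {j for lag in range(max_lag) for j in range(p) if abs(coef[lag, j]) > 1e-8}

# Synthetic example: variable 0 drives variable 1 with a lag of 2.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
z = np.zeros(500)
z[2:] = 0.8 * x[:-2] + 0.1 * rng.standard_normal(498)
print(lasso_granger(np.column_stack([x, z]), target=1, max_lag=5))   # expect {0}
```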
|
75 |
Aplicação do Word2vec e do Gradiente descendente estocástico em tradução automática / Application of word2vec and stochastic gradient descent to machine translation. Aguiar, Eliane Martins de. 30 May 2016
Word2vec is a neural-network-based system that processes text and represents words as vectors, using a distributed representation. A notable property is the semantic relationships found in the resulting models. The goal of this work is to train two word2vec models, one for Portuguese and one for English, and to use stochastic gradient descent to find a translation matrix between these two vector spaces.
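A minimal sketch of that idea follows: given embedding pairs from a bilingual seed lexicon, a linear map W is learned by minimizing the squared error with mini-batch stochastic gradient descent. The random vectors, dimensions and hyperparameters below are placeholders standing in for real word2vec embeddings, not details from the dissertation.

```python
import numpy as np

def learn_translation_matrix(X_src, Z_tgt, lr=0.05, epochs=100, batch=64, seed=0):
    """Learn W minimizing sum_i ||W x_i - z_i||^2 over a bilingual seed lexicon,
    where x_i is a source-language embedding and z_i its target-language pair."""
    rng = np.random.default_rng(seed)
    n, d_src = X_src.shape
    W = 0.01 * rng.standard_normal((Z_tgt.shape[1], d_src))
    for _ in range(epochs):
        for start in range(0, n, batch):
            xb = X_src[start:start + batch]
            zb = Z_tgt[start:start + batch]
            err = xb @ W.T - zb                  # (b, d_tgt) residuals
            W -= lr * (err.T @ xb) / len(xb)     # gradient step on the mean squared error
    return W

# Toy usage with random vectors standing in for Portuguese/English word2vec vectors.
rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 100))     # "Portuguese" embeddings
true_W = rng.standard_normal((100, 100))
Z = X @ true_W.T                         # "English" embeddings of the same word pairs
W = learn_translation_matrix(X, Z)
print(np.abs(W - true_W).mean())         # approaches 0, recovering the underlying map
```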
|
76 |
Contributions à l'apprentissage grande échelle pour la classification d'images / Contributions to large-scale learning for image classification. Akata, Zeynep. 06 January 2014
Building algorithms that classify images at large scale has become an essential task because of the difficulty of searching the immense collections of unlabeled visual data on the Internet. The goal is to classify images according to their content in order to simplify the management of such databases. Large-scale image classification is a complex problem, owing to the size of the datasets in terms of both the number of images and the number of classes. Some of these classes are "fine-grained" (semantically close to one another) and may not even contain a single labeled representative. In this thesis, we use state-of-the-art image representations and concentrate on efficient learning methods. Our contributions are (1) a benchmark of learning algorithms for large-scale classification and (2) a new algorithm based on label embedding for learning from scarce data. First, we introduce a benchmark of learning algorithms for large-scale classification in a fully supervised setting. It compares several objective functions for learning linear classifiers, such as one-vs-rest, multiclass, ranking and weighted ranking, optimized by stochastic gradient descent. The benchmark concludes with a set of recommendations for large-scale classification. With a simple reweighting of the data, the one-vs-rest strategy performs better than all the others. Moreover, in online learning, a sufficiently small learning step proves enough to reach state-of-the-art results. Finally, early stopping of stochastic gradient descent introduces a regularization that improves training speed as well as generalization. Second, when faced with thousands of classes, it is sometimes difficult to gather enough training data for each class. In particular, some classes may have no examples at all. We therefore propose a new algorithm suited to this so-called "zero-shot" learning scenario. Our algorithm uses parallel data, such as attributes, to embed the classes in a Euclidean space. We also introduce a function to measure the compatibility between an image and a label; its parameters are learned with a ranking objective. Our algorithm outperforms the state of the art for zero-shot learning and is highly flexible, allowing other sources of side information, such as hierarchies, to be incorporated. It also allows a smooth transition from the zero-shot setting to the case where few examples are available. / Building algorithms that classify images on a large scale is an essential task due to the difficulty of searching the massive amount of unlabeled visual data available on the Internet. We aim at classifying images based on their content to simplify the manageability of such large-scale collections. Large-scale image classification is a difficult problem as datasets are large with respect to both the number of images and the number of classes. Some of these classes are fine-grained and may not contain any labeled representatives.
In this thesis, we use state-of-the-art image representations and focus on efficient learning methods. Our contributions are (1) a benchmark of learning algorithms for large-scale image classification, and (2) a novel learning algorithm based on label embedding for learning with scarce training data. First, we propose a benchmark of learning algorithms for large-scale image classification in the fully supervised setting. It compares several objective functions for learning linear classifiers, such as one-vs-rest, multiclass, ranking and weighted average ranking, using stochastic gradient descent optimization. The output of this benchmark is a set of recommendations for large-scale learning. We experimentally show that online learning is well suited for large-scale image classification. With simple data rebalancing, one-vs-rest performs better than all other methods. Moreover, in online learning, using a small enough step size is sufficient for state-of-the-art performance. Finally, regularization through early stopping results in fast training and good generalization performance. Second, when dealing with thousands of classes, it is difficult to collect sufficient labeled training data for each class. For some classes we might not even have a single training example. We propose a novel algorithm for this zero-shot learning scenario. Our algorithm uses side information, such as attributes, to embed classes in a Euclidean space. We also introduce a function to measure the compatibility between an image and a label. The parameters of this function are learned using a ranking objective. Our algorithm outperforms the state of the art for zero-shot learning. It is flexible and can accommodate other sources of side information such as hierarchies. It also allows for a smooth transition from zero-shot to few-shot learning.
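As a hedged illustration of the fully supervised baseline described above, the snippet below trains one linear classifier per class (one-vs-rest) by stochastic gradient descent with a small constant step size, class rebalancing and early stopping. scikit-learn, the tiny digits dataset (standing in for large-scale image features) and all hyperparameters are assumptions for illustration, not the thesis' actual setup.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One linear classifier per class, trained by SGD on the hinge loss with a small
# constant step size; class_weight rebalances the data and early stopping
# acts as a regularizer.
clf = make_pipeline(
    StandardScaler(),
    OneVsRestClassifier(
        SGDClassifier(loss="hinge", learning_rate="constant", eta0=1e-3,
                      class_weight="balanced", early_stopping=True,
                      validation_fraction=0.1, n_iter_no_change=5,
                      random_state=0)))
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```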
|
77 |
Algorithms For Stochastic Games And Service Systems. Prasad, H L. 05 1900
This thesis is organized into two parts, one for my main area of research in the field of stochastic games, and the other for my contributions in the area of service systems. We first provide an abstract of the work on stochastic games.
The field of stochastic games has been actively pursued over the last seven decades because of several of its important applications in oligopolistic economics. In the past, zero-sum stochastic games have been modelled and solved for Nash equilibria using the standard techniques of Markov decision processes. General-sum stochastic games, on the contrary, have posed difficulty as they cannot be reduced to Markov decision processes. Over the past few decades the quest for algorithms to compute Nash equilibria in general-sum stochastic games has intensified, and several important algorithms, such as the stochastic tracing procedure [Herings and Peeters, 2004], NashQ [Hu and Wellman, 2003], FFQ [Littman, 2001], etc., and their generalised representations, such as the optimization problem formulations for various reward structures [Filar and Vrieze, 1997], have been proposed. However, they either lack generality or are intractable for even medium-sized problems, or both. In our venture towards algorithms for stochastic games, we start with a non-linear optimization problem and then design a simple gradient descent procedure for it. Though this procedure gives the Nash equilibrium for a sample problem of terrain exploration, we observe that, in general, this need not be true. We characterize the necessary conditions and define the notion of a KKT-N point. KKT-N points are those Karush-Kuhn-Tucker (KKT) points which correspond to Nash equilibria. Thus, for a simple gradient-based algorithm to guarantee convergence to a Nash equilibrium, all KKT points of the optimization problem need to be KKT-N points, which restricts the applicability of such algorithms.
We then take a step back and look for a better characterization of those points of the optimization problem which correspond to Nash equilibria of the underlying game. As a result of this exploration, we derive two sets of necessary and sufficient conditions. The first set, the KKT-SP conditions, is inspired by the KKT conditions themselves and is obtained by breaking down the main optimization problem into several sub-problems and then applying the KKT conditions to each of those sub-problems. The second set, the SG-SP conditions, is a simplified set of conditions which characterizes those Nash points more compactly. Using both the KKT-SP and SG-SP conditions, we propose three algorithms, OFF-SGSP, ON-SGSP and DON-SGSP, which we show provide Nash equilibrium strategies for general-sum discounted stochastic games. Here OFF-SGSP is an off-line algorithm while ON-SGSP and DON-SGSP are on-line algorithms. In particular, we believe that DON-SGSP is the first decentralized on-line algorithm for general-sum discounted stochastic games. We show that both our on-line algorithms are computationally efficient. In fact, we show that DON-SGSP is not only applicable to multi-agent scenarios but is also directly applicable to the single-agent case, i.e., MDPs (Markov Decision Processes).
The second part of the thesis focuses on formulating and solving the problem of minimizing the labour cost in service systems. We define the setting of service systems and then model the labour-cost problem as a constrained discrete-parameter Markov-cost process. This Markov process is parametrized by the number of workers in various shifts and with various skill levels. With the number of workers as optimization variables, we provide a detailed formulation of a constrained optimization problem in which the objective is the expected long-run average of the single-stage labour costs, and the main set of constraints is the expected long-run average of the aggregate SLAs (Service Level Agreements). For this constrained optimization problem, we provide two stochastic optimization algorithms, SASOC-SF-N and SASOC-SF-C, which use smoothed functional approaches to estimate the gradient and perform gradient descent. SASOC-SF-N uses the Gaussian distribution for smoothing, while SASOC-SF-C uses the Cauchy distribution. SASOC-SF-C is the first Cauchy-based smoothing algorithm that requires a fixed number (two) of simulations independent of the number of optimization variables. We show that these algorithms provide an order-of-magnitude better performance than the existing industry-standard tool, OptQuest. We also show that SASOC-SF-C gives better overall performance.
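To illustrate the smoothed functional idea in its simplest (Gaussian, unconstrained) form, the sketch below estimates a gradient from just two cost simulations per step, regardless of the parameter dimension, and performs gradient descent. The projection onto feasible integer staffing levels and the SLA constraints handled by SASOC-SF-N/C are omitted, and the toy cost function and step sizes are assumptions.

```python
import numpy as np

def sf_gradient(cost, theta, beta=0.1, rng=None):
    """Two-simulation Gaussian smoothed-functional gradient estimate.

    cost  : maps a parameter vector to a (possibly noisy) scalar cost, e.g. the
            simulated long-run average labour cost of a staffing configuration.
    theta : current parameter vector (e.g. workers per shift and skill level).
    """
    rng = rng or np.random.default_rng()
    eta = rng.standard_normal(theta.shape)            # Gaussian perturbation
    # symmetric two-sided estimate: exactly two simulations per iteration
    return eta * (cost(theta + beta * eta) - cost(theta - beta * eta)) / (2 * beta)

# Toy usage: a noisy quadratic stands in for the simulated cost.
rng = np.random.default_rng(0)
target = np.array([4.0, 2.0, 6.0])
noisy_cost = lambda th: np.sum((th - target) ** 2) + 0.1 * rng.standard_normal()

theta = np.zeros(3)
for _ in range(5000):
    theta -= 0.01 * sf_gradient(noisy_cost, theta, rng=rng)   # small constant step
print(np.round(theta, 2))    # drifts toward the minimizer [4, 2, 6]
```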
|
78 |
Formes d'ondes MSPSR, traitements et performances associés / MSPSR (Multi-Static Primary Surveillance Radar) waveforms, related processing and performances. Arlery, Fabien. 01 December 2017
Today, passive MSPSR (Multi-Static Primary Surveillance Radar) systems have become a lasting part of the air surveillance landscape [1]. The interest in these new systems comes from the fact that, compared with the mono-static radars currently in use, MSPSR systems rely on a spatial distribution of transmitters and receivers, which offers advantages in terms of reliability (redundancy), cost (no rotating joints and less powerful transmitters) and performance (spatial diversity). However, the major drawback of passive MSPSR is the absence of dedicated waveforms, since it exploits transmitters of opportunity such as FM (Frequency Modulation) radio and/or DVB-T (Digital Video Broadcasting-Terrestrial) transmitters [2]. To overcome this drawback, the use of dedicated transmitters is envisaged, allowing waveforms optimized for a radar application; this is referred to as active MSPSR. This thesis is set in that framework and aims to study and define the waveform(s) and the associated processing that achieve better performance: more flexibility in the layout of the system (free positioning of the transmitters), continuity of service (no dependence on a third-party system) and better radar performance (e.g. in terms of measurement accuracy, detection, ...). To this end, this thesis studies: the criteria for selecting codes (behaviour of the ambiguity functions, PAPR (Peak to Average Power Ratio), spectral efficiency, etc.); the waveforms used in telecommunications (scrambling codes, OFDM), to identify their possible reuse for a radar application; the use of cyclic algorithms to generate families of sequences suited to our problem; an approach based on gradient descent to generate families of codes more efficiently; and the evaluation of the performance of these different algorithms, through the derivation of an upper bound on the maximum sidelobe level and through the analysis of data recorded during trial campaigns. / Nowadays, MSPSR (Multi-Static Primary Surveillance Radar) systems are firmly established in air surveillance programs [1]. Compared to the mono-static radars currently in use, an MSPSR system is based on a sparse network of transmitters (Tx) and receivers (Rx) interconnected to a Central Unit, and offers advantages in terms of reliability, cost and performance. Two kinds of MSPSR systems exist: the passive form and the active one. While passive MSPSR uses transmitters of opportunity, such as radio Frequency Modulation (FM) transmitters and/or Digital Video Broadcasting-Terrestrial (DVB-T) transmitters [2], active MSPSR uses dedicated transmitters, which emit a waveform that is controlled and designed for a radar application. Each receiver processes the signal coming from all transmitters and reflected off the targets, and the Central Unit recovers the target location by intersecting "ellipsoids" from all (transmitter, receiver) pairs.
Compared to passive MSPSR, the main advantages of active MSPSR are the use of dedicated waveforms, which allows better performance to be reached (such as a better association of the transmitters' contributions at the receiver level); more flexibility in the deployment of transmitter and receiver stations (in order to meet the requirements in localisation accuracy and in horizontal and altitude coverage); and the guarantee of service continuity. To this end, this thesis analyses the different code criteria, such as the ambiguity function behaviour, the PAPR (Peak to Average Power Ratio), the spectral efficiency, etc. Then, in order to find dedicated waveforms for MSPSR systems, one solution is to find easily constructed families of sequences. Thus, building on work carried out in the telecommunications field for solving multi-user issues, this document investigates the application of spreading codes and OFDM signals to the MSPSR concept. Besides, another solution is to directly generate a set of sequences. Building on the cyclic algorithms in [3], we derive a new algorithm that optimizes sets of sequences. Similarly, using a gradient descent approach, we develop a more efficient algorithm than the cyclic one. Finally, in order to evaluate the performance of the different algorithms, this thesis generalizes the Levenshtein bound, establishes new lower bounds on the PSLR (Peak Sidelobe Level Ratio) in the mismatched filter case, and studies real data recorded during trials.
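As a very rough illustration of the gradient descent approach to sequence design, the sketch below lowers the integrated sidelobe level of a single unimodular code by descending on its phases. It is only a toy: the gradient is taken by finite differences rather than analytically, only one sequence is optimized (not a family), and PAPR or spectral constraints are ignored; the length, step size and iteration count are assumptions.

```python
import numpy as np

def isl(phase):
    """Integrated sidelobe level of the unit-modulus sequence s = exp(i*phase)."""
    s = np.exp(1j * phase)
    r = np.correlate(s, s, mode="full")      # aperiodic autocorrelation
    r = np.delete(r, len(phase) - 1)         # drop the zero-lag mainlobe
    return float(np.sum(np.abs(r) ** 2))

def design_sequence(n=64, iters=200, base_step=1e-3, seed=0):
    """Gradient descent on the phases of a unimodular code to lower its sidelobes."""
    rng = np.random.default_rng(seed)
    phase = rng.uniform(0.0, 2.0 * np.pi, n)
    eps = 1e-6
    for _ in range(iters):
        grad = np.empty(n)
        for m in range(n):                   # central finite differences
            d = np.zeros(n)
            d[m] = eps
            grad[m] = (isl(phase + d) - isl(phase - d)) / (2 * eps)
        step = base_step
        while isl(phase - step * grad) > isl(phase) and step > 1e-12:
            step *= 0.5                      # crude backtracking keeps the descent monotone
        phase -= step * grad
    return phase

start = np.random.default_rng(0).uniform(0.0, 2.0 * np.pi, 64)
print("ISL before:", isl(start), "after:", isl(design_sequence()))
```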
|
79 |
STATISTICAL PHYSICS OF CELL ADHESION COMPLEXES AND MACHINE LEARNING. Adhikari, Shishir Raj. 26 August 2019
No description available.
|
80 |
The applicability and scalability of probabilistic inference in deep-learning-assisted geophysical inversion applications. Izzatullah, Muhammad. 04 1900
Probabilistic inference, especially in the Bayesian framework, is a foundation for quantifying uncertainties in geophysical inversion applications. However, due to the presence of high-dimensional datasets and the large-scale nature of geophysical inverse problems, the applicability and scalability of probabilistic inference face significant challenges in such applications. This thesis is dedicated to improving the scalability of probabilistic inference algorithms and demonstrating their applicability to large-scale geophysical inversion. In this thesis, I examine three leading applied approaches to computing the Bayesian posterior distribution in geophysical inversion applications: Laplace's approximation, Markov chain Monte Carlo (MCMC), and variational Bayesian inference.
The first approach, Laplace's approximation, is the simplest form of approximation for intractable Bayesian posteriors. However, its accuracy relies on the estimation of the posterior covariance matrix. I study the visualization of the misfit landscape in a low-dimensional subspace and low-rank approximations of the covariance for full waveform inversion (FWI). I demonstrate that a non-optimal truncation of the Hessian's eigenvalues in the low-rank approximation affects the accuracy of the approximated standard deviation, leading to biased statistical conclusions. Furthermore, I also demonstrate the propagation of uncertainties within Bayesian physics-informed neural networks for hypocenter localization applications through this approach.
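A small numerical sketch of that sensitivity is given below: the posterior covariance of a Laplace approximation is built from a rank-k eigenvalue truncation of a misfit Hessian under a Gaussian prior, and the reported standard deviations change with the truncation level. The synthetic spectrum, prior variance and dimensions are illustrative assumptions, not quantities from the thesis.

```python
import numpy as np

def laplace_lowrank_std(hessian_misfit, k, prior_var=1.0):
    """Posterior standard deviations from a Laplace approximation with a rank-k
    eigenvalue truncation of the data-misfit Hessian and prior covariance prior_var*I:
        C_post = (H_k + prior_var^-1 I)^-1   with   H_k = V_k diag(lam_k) V_k^T
    """
    eigval, eigvec = np.linalg.eigh(hessian_misfit)
    idx = np.argsort(eigval)[::-1][:k]               # keep the k largest eigenvalues
    lam, V = eigval[idx], eigvec[:, idx]
    # Woodbury identity: C_post = prior_var*I - V diag(d) V^T
    d = prior_var * lam / (lam + 1.0 / prior_var)
    post_var = prior_var - np.einsum("ij,j,ij->i", V, d, V)
    return np.sqrt(post_var)

# Toy Hessian with a decaying spectrum: truncating too aggressively drops
# informative directions and inflates the reported standard deviations.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((50, 50)))
H = Q @ np.diag(100.0 * 0.7 ** np.arange(50)) @ Q.T
for k in (5, 20, 50):
    print(k, laplace_lowrank_std(H, k).mean())
```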
For the MCMC approach, I develop approximate Langevin MCMC algorithms that provide fast, computationally efficient sampling for large-scale Bayesian FWI; however, this comes with an asymptotic bias that inflates the variance. To account for this asymptotic bias and assess sample quality, I introduce the kernelized Stein discrepancy (KSD) as a diagnostic tool. When larger computational resources are available, exact MCMC algorithms (i.e., with a Metropolis-Hastings criterion) should be favored for accurate statistical analysis of the posterior distribution.
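For reference, the unadjusted Langevin algorithm below is the kind of approximate Langevin sampler being discussed: cheap per step, but with a discretization bias that, for instance, inflates the variance of a Gaussian target; adding a Metropolis-Hastings accept/reject step (MALA) removes that bias at extra cost. The step size and toy target are assumptions for illustration.

```python
import numpy as np

def ula_sample(grad_log_post, x0, step=1e-3, n_samples=5000, rng=None):
    """Unadjusted Langevin algorithm: x <- x + step*grad log p(x) + sqrt(2*step)*noise."""
    rng = rng or np.random.default_rng(0)
    x = np.array(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        x = x + step * grad_log_post(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
        samples.append(x.copy())
    return np.array(samples)

# Toy posterior: standard Gaussian in 2D, so grad log p(x) = -x.
samples = ula_sample(lambda x: -x, x0=np.zeros(2), step=0.05, n_samples=20000)
print(samples.mean(axis=0), samples.var(axis=0))
# ULA's stationary variance for this target is 1/(1 - step/2), i.e. slightly
# inflated; a Metropolis-Hastings correction (MALA) would remove this bias.
```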
For variational Bayesian inference, I propose a regularized variational inference framework that performs posterior inference by implicitly regularizing the Kullback-Leibler divergence loss with a deep denoiser through a Plug-and-Play method. I also develop Plug-and-Play Stein Variational Gradient Descent (PnP-SVGD), a novel algorithm to sample the regularized posterior distribution. PnP-SVGD demonstrates its ability to produce high-resolution, trustworthy samples representative of the subsurface structures for a post-stack seismic inversion application.
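As background for PnP-SVGD, here is a hedged sketch of the vanilla SVGD update it builds on: a set of particles is transported by a kernelized gradient of the log posterior plus a repulsive term. The Plug-and-Play denoiser regularization, the seismic forward operator and all hyperparameters are omitted or assumed; the toy Gaussian target is purely illustrative.

```python
import numpy as np

def rbf_kernel(X, h=None):
    """RBF kernel matrix and its gradient w.r.t. the first argument (median heuristic bandwidth)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    if h is None:
        h = np.median(sq) / np.log(len(X) + 1) + 1e-8
    K = np.exp(-sq / h)
    gradK = -2.0 / h * (X[:, None, :] - X[None, :, :]) * K[:, :, None]
    return K, gradK

def svgd(grad_log_post, particles, step=0.1, iters=500):
    """Stein variational gradient descent: deterministic particle transport toward the posterior."""
    X = particles.copy()
    for _ in range(iters):
        K, gradK = rbf_kernel(X)
        glp = np.array([grad_log_post(x) for x in X])        # (n, d) log-posterior gradients
        phi = (K @ glp + gradK.sum(axis=0)) / len(X)          # driving term + repulsive term
        X += step * phi
    return X

# Toy posterior: 2D Gaussian with mean (1, -1), so grad log p(x) = -(x - mu).
mu = np.array([1.0, -1.0])
rng = np.random.default_rng(0)
out = svgd(lambda x: -(x - mu), rng.standard_normal((100, 2)))
print(out.mean(axis=0))    # close to mu
```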
|