81 |
Data-driven test case design of automatic test cases using Markov chains and a Markov chain Monte Carlo method / Datadriven testfallsdesign av automatiska testfall med Markovkedjor och en Markov chain Monte Carlo-metod
Lindahl, John, Persson, Douglas January 2021 (has links)
Large and complex software that is frequently changed leads to testing challenges. It is well established that the later a fault is detected in software development, the more it costs to fix. This thesis aims to research and develop a method of generating relevant and non-redundant test cases for a regression test suite, to catch bugs as early in the development process as possible. The research was conducted at Axis Communications AB with their products and systems in mind. The approach utilizes user data to dynamically generate a Markov chain model and strengthens that model with a Markov chain Monte Carlo method. The model generates test case proposals, detects test gaps, and identifies redundant test cases based on the user data and data from a test suite. The sampling in the Markov chain Monte Carlo method can be modified to bias the model toward test coverage or relevancy. The model is generated generically and can therefore be implemented in other API-driven systems. It was designed with scalability in mind, and additional implementations can be made to increase its complexity and specialize it for individual needs.
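As an illustration of the approach described in this abstract, the following minimal Python sketch builds a Markov chain from hypothetical user sessions (lists of API call names) and random-walks it to propose a test case. The session format, the function names, and the uniform treatment of all sessions are illustrative assumptions rather than details from the thesis, and the MCMC refinement step is omitted.

```python
import random
from collections import defaultdict

def build_markov_chain(sessions):
    """Estimate transition probabilities between API calls from recorded user sessions."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            counts[src][dst] += 1
    chain = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        chain[src] = {dst: n / total for dst, n in dsts.items()}
    return chain

def propose_test_case(chain, start, max_steps=10, seed=None):
    """Random-walk the chain to propose one test case (a sequence of API calls)."""
    rng = random.Random(seed)
    state, path = start, [start]
    for _ in range(max_steps):
        if state not in chain:
            break  # reached a state with no observed outgoing transitions
        next_states = list(chain[state])
        state = rng.choices(next_states, weights=[chain[state][s] for s in next_states])[0]
        path.append(state)
    return path

# Hypothetical user data: three recorded sessions against an API-driven system.
sessions = [
    ["login", "list_devices", "get_stream", "logout"],
    ["login", "list_devices", "set_config", "get_stream", "logout"],
    ["login", "get_stream", "logout"],
]
chain = build_markov_chain(sessions)
print(propose_test_case(chain, "login", seed=1))
```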
|
82 |
Cyclic Particle Systems on Finite Graphs and Cellular Automata on Rooted, Regular Trees and Galton-Watson Trees
Bello, Jason 01 October 2021 (has links)
No description available.
|
83 |
Bayesian Solution to the Analysis of Data with Values below the Limit of Detection (LOD)
Jin, Yan January 2008 (has links)
No description available.
|
84 |
Deep Time: Deep Learning Extensions to Time Series Factor Analysis with Applications to Uncertainty Quantification in Economic and Financial Modeling
Miller, Dawson Jon 12 September 2022 (has links)
This thesis establishes methods to quantify and explain uncertainty through high-order moments in time series data, along with first-principles-based improvements on the standard autoencoder and variational autoencoder. While the first-principles improvements on the standard variational autoencoder provide additional means of explainability, we ultimately look to non-variational methods for quantifying uncertainty under the autoencoder framework.
We utilize Shannon's differential entropy to accomplish the task of uncertainty quantification in a general nonlinear and non-Gaussian setting. Together with previously established connections between autoencoders and principal component analysis, we motivate the focus on differential entropy as a proper abstraction of principal component analysis to this more general framework, where nonlinear and non-Gaussian characteristics in the data are permitted.
Furthermore, we are able to establish explicit connections between high-order moments in the data and those in the latent space, which induce a natural latent space decomposition, and by extension, an explanation of the estimated uncertainty. The proposed methods are intended to be utilized in economic and financial factor models in state space form, building on recent developments in the application of neural networks to factor models with applications to financial and economic time series analysis. Finally, we demonstrate the efficacy of the proposed methods on high frequency hourly foreign exchange rates, macroeconomic signals, and synthetically generated autoregressive data sets. / Master of Science / This thesis establishes methods to quantify and explain uncertainty in time series data, along with improvements on some latent variable neural networks called autoencoders and variational autoencoders. Autoencoders and variational autoencoders are called latent variable neural networks since they can estimate a representation of the data that has lower dimension than the original data. These neural network architectures have a fundamental connection to a classical latent variable method called principal component analysis, which performs a similar task of dimension reduction but under more restrictive assumptions than autoencoders and variational autoencoders. In contrast to principal component analysis, a common ailment of neural networks is the lack of explainability, which accounts for the colloquial term black-box models. While the improvements on the standard autoencoders and variational autoencoders help with the problem of explainability, we ultimately look to alternative probabilistic methods for quantifying uncertainty. To accomplish this task, we focus on Shannon's differential entropy, which is entropy applied to continuous domains such as time series data. Entropy is intricately connected to the notion of uncertainty, since it depends on the amount of randomness in the data. Together with previously established connections between autoencoders and principal component analysis, we motivate the focus on differential entropy as a proper abstraction of principal component analysis to a general framework that does not require the restrictive assumptions of principal component analysis.
Furthermore, we are able to establish explicit connections between high-order moments in the data and the estimated latent variables (i.e., the reduced dimension representation of the data). Estimating high-order moments allows for a more accurate estimation of the true distribution of the data. By connecting the estimated high-order moments in the data to the latent variables, we obtain a natural decomposition of the uncertainty surrounding the latent variables, which allows for increased explainability of the proposed autoencoder. The methods introduced in this thesis are intended to be utilized in a class of economic and financial models called factor models, which are frequently used in policy and investment analysis.
A factor model is another type of latent variable model, which, in addition to estimating a reduced-dimension representation of the data, provides a means to forecast future observations. Finally, we demonstrate the efficacy of the proposed methods on high frequency hourly foreign exchange rates, macroeconomic signals, and synthetically generated autoregressive data sets. The results support the superiority of the entropy-based autoencoder over the standard variational autoencoder in both capability and computational expense.
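To make the role of differential entropy in this work concrete, here is a small sketch that computes the entropy of autoencoder latent codes under a simplifying Gaussian assumption, using the closed form h = ½ (d·log(2πe) + log det Σ). The abstract emphasizes a nonlinear, non-Gaussian setting, so this closed form is only an illustrative special case; in general a nonparametric estimator would be needed, and the latent codes shown are synthetic.

```python
import numpy as np

def gaussian_differential_entropy(latent_codes):
    """Differential entropy (in nats) of latent codes under a Gaussian assumption.

    latent_codes: array of shape (n_samples, latent_dim), e.g. encoder outputs.
    Uses the closed form h = 0.5 * (d * log(2*pi*e) + log det(Sigma)).
    """
    z = np.asarray(latent_codes, dtype=float)
    d = z.shape[1]
    cov = np.atleast_2d(np.cov(z, rowvar=False))
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2.0 * np.pi * np.e) + logdet)

# Synthetic "latent codes": a wider latent distribution carries more uncertainty (higher entropy).
rng = np.random.default_rng(0)
z_narrow = rng.normal(scale=0.5, size=(1000, 3))
z_wide = rng.normal(scale=2.0, size=(1000, 3))
print(gaussian_differential_entropy(z_narrow) < gaussian_differential_entropy(z_wide))  # True
```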
|
85 |
Semiparametric Bayesian Approach using Weighted Dirichlet Process Mixture For Finance Statistical Models
Sun, Peng 07 March 2016 (has links)
The Dirichlet process mixture (DPM) has been widely used as a flexible prior in the nonparametric Bayesian literature, and the weighted Dirichlet process mixture (WDPM) can be viewed as an extension of the DPM that relaxes model distribution assumptions. However, the WDPM requires setting weight functions and can cause an extra computational burden. In this dissertation, we develop more efficient and flexible WDPM approaches under three research topics. The first is semiparametric cubic spline regression, where we adopt a nonparametric prior for the error terms in order to automatically handle heterogeneity of measurement errors or an unknown mixture distribution; the second provides an innovative way to construct the weight function and illustrates some decent properties and the computational efficiency of this weight under a semiparametric stochastic volatility (SV) model; and the last develops a WDPM approach for the Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) model (as an alternative to the SV model) and proposes a new model evaluation approach for GARCH that produces easier-to-interpret results compared to the canonical marginal likelihood approach.
In the first topic, the response variable is modeled as the sum of three parts. One part is a linear function of covariates that enter the model parametrically. The second part is an additive nonparametric model: covariates whose relationships to the response variable are unclear are included in the model nonparametrically using Lancaster and Šalkauskas bases. The third part consists of error terms whose means and variances are assumed to follow nonparametric priors. We therefore describe our model as dual-semiparametric regression, because nonparametric ideas are used both for the mean and for the error terms. Instead of assuming that all error terms follow the same prior, as in the DPM, our WDPM provides multiple candidate priors for each observation to select with certain probabilities. Each such probability (or weight) is modeled through relevant predictive covariates using a Gaussian kernel. We propose several different WDPMs using different weights that depend on distances in the covariates. We provide efficient Markov chain Monte Carlo (MCMC) algorithms and compare our WDPMs to the parametric model and the DPM model in terms of Bayes factors using simulation and an empirical study.
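A minimal sketch of the kind of distance-based weight described in this paragraph: the probability that an observation selects each candidate prior is taken as a normalized Gaussian kernel of the distance between its covariates and a reference point associated with that prior. The reference points, bandwidth, and normalization shown here are illustrative assumptions rather than the dissertation's exact construction.

```python
import numpy as np

def gaussian_kernel_weights(x, reference_points, bandwidth=1.0):
    """Selection probabilities over candidate priors for one observation.

    x: covariate vector of one observation, shape (p,).
    reference_points: one reference point per candidate prior, shape (K, p).
    Returns a length-K probability vector (the weights sum to one).
    """
    x = np.asarray(x, dtype=float)
    refs = np.asarray(reference_points, dtype=float)
    sq_dist = np.sum((refs - x) ** 2, axis=1)       # squared distances in covariate space
    kernel = np.exp(-0.5 * sq_dist / bandwidth**2)  # Gaussian kernel of the distances
    return kernel / kernel.sum()

# Example: three candidate priors, two covariates; the observation is closest to prior 0.
refs = np.array([[0.0, 0.0], [2.0, 2.0], [5.0, 0.0]])
print(gaussian_kernel_weights(np.array([0.2, -0.1]), refs, bandwidth=1.5))
```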
In the second topic, we propose an innovative way to construct the weight function for the WDPM and apply it to the SV model. The SV model is adopted for time series data where the constant variance assumption is violated. One essential issue is to specify the distribution of the conditional return. We assume a WDPM prior for the conditional return and propose a new way to model the weights. Our approach has several advantages, including computational efficiency compared to the weight constructed using a Gaussian kernel. We list six properties of the proposed weight function and provide proofs of them. Because of the additional Metropolis-Hastings steps introduced by the WDPM prior, we find conditions that ensure uniform geometric ergodicity of the transition kernel in our MCMC. Due to the existence of zero values in asset price data, our SV model is semiparametric: we employ a WDPM prior for the non-zero values and a parametric prior for the zero values.
In the third project, we develop a WDPM approach for GARCH-type models and compare different types of weight functions, including the innovative method proposed in the second topic. The GARCH model can be viewed as an alternative to SV for analyzing daily stock price data where the constant variance assumption does not hold. While the response variable of our SV models is the transformed log return (based on a log-square transformation), GARCH directly models the log return itself. This means that, theoretically speaking, we are able to predict stock returns using GARCH models, while this is not feasible with the SV model, because SV models ignore the sign of the log returns and provide predictive densities for the squared log return only. Motivated by this property, we propose a new model evaluation approach called back-testing return (BTR) specifically for GARCH. The BTR approach produces model evaluation results that are easier to interpret than the marginal likelihood, and it is straightforward to draw conclusions about model profitability by applying it. Since the BTR approach is only applicable to GARCH, we also illustrate how to properly calculate the marginal likelihood to make comparisons between GARCH and SV. Based on our MCMC algorithms and model evaluation approaches, we have conducted a large number of model fittings to compare models in both simulation and empirical studies. / Ph. D.
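For reference, the core of a GARCH model mentioned in this abstract is the conditional variance recursion shown in the sketch below; the parameter values and simulated returns are purely illustrative, and the dissertation's WDPM prior and back-testing return (BTR) evaluation are not reproduced here.

```python
import numpy as np

def garch11_variance(returns, omega, alpha, beta):
    """Conditional variance path of a GARCH(1,1) model for log returns.

    sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]
    In practice omega, alpha, beta would come from an MCMC or maximum likelihood fit.
    """
    r = np.asarray(returns, dtype=float)
    sigma2 = np.empty_like(r)
    sigma2[0] = r.var()  # simple initialization with the sample variance
    for t in range(1, len(r)):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

# Illustrative parameters applied to simulated daily log returns.
rng = np.random.default_rng(42)
r = rng.normal(scale=0.01, size=500)
sigma2 = garch11_variance(r, omega=1e-6, alpha=0.05, beta=0.90)
print(sigma2[:3])
```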
|
86 |
Approche stochastique de l'analyse du « residual moveout » pour la quantification de l'incertitude dans l'imagerie sismique / A stochastic approach to uncertainty quantification in residual moveout analysis
Tamatoro, Johng-Ay 09 April 2014 (has links)
Le principale objectif de l'imagerie sismique pétrolière telle qu'elle est réalisée de nos jours est de fournir une image représentative des quelques premiers kilomètres du sous-sol. Cette image permettra la localisation des structures géologiques formant les réservoirs où sont piégées les ressources en hydrocarbures. Pour pouvoir caractériser ces réservoirs et permettre la production des hydrocarbures, le géophysicien utilise la migration-profondeur qui est un outil d'imagerie sismique qui sert à convertir des données-temps enregistrées lors des campagnes d'acquisition sismique en des images-profondeur qui seront exploitées par l'ingénieur-réservoir avec l'aide de l'interprète sismique et du géologue. Lors de la migration profondeur, les évènements sismiques (réflecteurs,…) sont replacés à leurs positions spatiales correctes. Une migration-profondeur pertinente requiert une évaluation précise modèle de vitesse. La précision du modèle de vitesse utilisé pour une migration est jugée au travers l'alignement horizontal des évènements présents sur les Common Image Gather (CIG). Les évènements non horizontaux (Residual Move Out) présents sur les CIG sont dus au ratio du modèle de vitesse de migration par la vitesse effective du milieu. L'analyse du Residual Move Out (RMO) a pour but d'évaluer ce ratio pour juger de la pertinence du modèle de vitesse et permettre sa mise à jour. Les CIG qui servent de données pour l'analyse du RMO sont solutions de problèmes inverses mal posés, et sont corrompues par du bruit. Une analyse de l'incertitude s'avère nécessaire pour améliorer l'évaluation des résultats obtenus. Le manque d'outils d'analyse de l'incertitude dans l'analyse du RMO en fait sa faiblesse. L'analyse et la quantification de l'incertitude pourrait aider à la prise de décisions qui auront des impacts socio-économiques importantes. Ce travail de thèse a pour but de contribuer à l'analyse et à la quantification de l'incertitude dans l'analyse des paramètres calculés pendant le traitement des données sismiques et particulièrement dans l'analyse du RMO. Pour atteindre ces objectifs plusieurs étapes ont été nécessaires. Elles sont entre autres :- L’appropriation des différents concepts géophysiques nécessaires à la compréhension du problème (organisation des données de sismique réflexion, outils mathématiques et méthodologiques utilisés);- Présentations des méthodes et outils pour l'analyse classique du RMO;- Interprétation statistique de l’analyse classique;- Proposition d’une approche stochastique;Cette approche stochastique consiste en un modèle statistique hiérarchique dont les paramètres sont :- la variance traduisant le niveau de bruit dans les données estimée par une méthode basée sur les ondelettes, - une fonction qui traduit la cohérence des amplitudes le long des évènements estimée par des méthodes de lissages de données,- le ratio qui est considéré comme une variable aléatoire et non comme un paramètre fixe inconnue comme c'est le cas dans l'approche classique de l'analyse du RMO. Il est estimé par des méthodes de simulations de Monte Carlo par Chaîne de Markov.L'approche proposée dans cette thèse permet d'obtenir autant de cartes de valeurs du paramètre qu'on le désire par le biais des quantiles. La méthodologie proposée est validée par l'application à des données synthétiques et à des données réelles. Une étude de sensibilité de l'estimation du paramètre a été réalisée. 
L'utilisation de l'incertitude de ce paramètre pour quantifier l'incertitude des positions spatiales des réflecteurs est présentée dans ce travail de thèse. / The main goal of seismic imaging for oil exploration and production, as it is done nowadays, is to provide an image of the first kilometers of the subsurface to allow the localization and an accurate estimation of hydrocarbon resources. The reservoirs where these hydrocarbons are trapped are structures with a more or less complex geology. To characterize these reservoirs and allow the production of hydrocarbons, the geophysicist uses depth migration, a seismic imaging tool that converts time data recorded during seismic surveys into depth images, which are then exploited by the reservoir engineer with the help of the seismic interpreter and the geologist. During depth migration, seismic events (reflectors, diffractions, faults, ...) are moved to their correct locations in space. Relevant depth migration requires accurate knowledge of the vertical and horizontal seismic velocity variations (the velocity model). Usually the so-called Common Image Gathers (CIGs) serve as a tool to verify the correctness of the velocity model. Often the CIGs are computed in the surface-offset (distance between shot point and receiver) domain, and their flatness serves as a criterion of velocity model correctness. Residual moveout (RMO) of the events on CIGs, due to the ratio of the migration velocity model to the effective velocity model, indicates an incorrect velocity model and is used for velocity model updating. The post-stack images forming the CIGs, which are used as data for the RMO analysis, are the results of an inverse problem and are corrupted by noise. An uncertainty analysis is therefore necessary to improve the evaluation of the results. Dealing with this uncertainty is a major issue, since it supports decisions that have important social and commercial implications. The goal of this thesis is to contribute to the analysis and quantification of uncertainty in the various parameters computed during seismic processing, and particularly in RMO analysis. To reach this goal several stages were necessary. We began by appropriating the various geophysical concepts necessary for understanding the organization of the seismic data, the various processing steps, and the mathematical and methodological tools that are used (chapters 2 and 3). In chapter 4, we present the different tools used for conventional RMO analysis. In the fifth chapter, we give a statistical interpretation of the conventional RMO analysis and propose a stochastic approach to this analysis. This approach consists of a hierarchical statistical model whose parameters are: the variance, which expresses the noise level in the data; a functional parameter, which expresses the coherency of the amplitudes along events; and the ratio, which is assumed to be a random variable and not an unknown fixed parameter as in the conventional approach. The adjustment of the data to the model, done using data-smoothing methods combined with a wavelet-based estimate of the variance, allows the posterior distribution of the ratio given the data to be computed by empirical Bayes methods. An estimate of the parameter is obtained using Markov Chain Monte Carlo simulations of its posterior distribution. The various quantiles of these simulations provide different estimates of the ratio.
The proposed methodology is validated in the sixth chapter by its application to synthetic data and real data. A sensitivity analysis of the parameter estimate was performed. The use of the uncertainty of this parameter to quantify the uncertainty of the spatial positions of the reflectors is presented in this thesis.
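The hierarchical model itself (wavelet-based variance estimate, functional amplitude term) is not reproduced here; the sketch below only illustrates the final ingredient described above, a random-walk Metropolis sampler for a scalar ratio parameter under an assumed Gaussian likelihood, whose posterior quantiles then give the kind of interval estimates discussed in the abstract.

```python
import numpy as np

def metropolis_ratio(data, sigma, n_iter=5000, step=0.02, gamma0=1.0, seed=0):
    """Random-walk Metropolis sampler for a scalar ratio parameter gamma.

    Assumed toy model: data[i] ~ N(gamma, sigma**2) with a flat prior on gamma > 0.
    Returns the chain of posterior draws; quantiles of the draws give interval estimates.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)

    def log_post(g):
        if g <= 0:
            return -np.inf
        return -0.5 * np.sum((data - g) ** 2) / sigma**2

    draws = np.empty(n_iter)
    g, lp = gamma0, log_post(gamma0)
    for i in range(n_iter):
        prop = g + step * rng.standard_normal()
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:  # Metropolis acceptance step
            g, lp = prop, lp_prop
        draws[i] = g
    return draws

# Toy data: noisy observations around a true ratio of 0.95.
rng = np.random.default_rng(1)
obs = 0.95 + 0.05 * rng.standard_normal(200)
draws = metropolis_ratio(obs, sigma=0.05)
print(np.quantile(draws[1000:], [0.05, 0.50, 0.95]))  # discard burn-in, report quantiles
```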
|
87 |
Numerical Methods for Bayesian Inference in Hilbert Spaces / Numerische Methoden für Bayessche Inferenz in Hilberträumen
Sprungk, Björn 15 February 2018 (has links) (PDF)
Bayesian inference occurs when prior knowledge about uncertain parameters in mathematical models is merged with new observational data related to the model outcome. In this thesis we focus on models given by partial differential equations where the uncertain parameters are coefficient functions belonging to infinite dimensional function spaces. The result of the Bayesian inference is then a well-defined posterior probability measure on a function space describing the updated knowledge about the uncertain coefficient.
For decision making and post-processing it is often required to sample or integrate with respect to the posterior measure. This calls for sampling or numerical methods which are suitable for infinite dimensional spaces. In this work we focus on Kalman filter techniques based on ensembles or polynomial chaos expansions as well as Markov chain Monte Carlo methods.
We analyze the Kalman filters by proving convergence and discussing their applicability in the context of Bayesian inference. Moreover, we develop and study an improved dimension-independent Metropolis-Hastings algorithm. Here, we show geometric ergodicity of the new method by a spectral gap approach using a novel comparison result for spectral gaps. Besides that, we observe and further analyze the robustness of the proposed algorithm with respect to decreasing observational noise. This robustness is another desirable property of numerical methods for Bayesian inference.
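As background for the dimension-independent Metropolis-Hastings algorithm discussed above, the sketch below implements the standard preconditioned Crank-Nicolson (pCN) proposal on a finite discretization of a Gaussian prior. It is the well-known baseline of this class of methods, not the improved algorithm developed in the thesis, and the toy likelihood and prior are assumptions made purely for illustration.

```python
import numpy as np

def pcn_mcmc(log_likelihood, prior_cov_sqrt, n_iter=2000, beta=0.2, seed=0):
    """Preconditioned Crank-Nicolson (pCN) Metropolis-Hastings on a discretized
    Gaussian prior N(0, C), with C = L @ L.T and L = prior_cov_sqrt.

    Proposal: u_prop = sqrt(1 - beta**2) * u + beta * xi,  xi ~ N(0, C),
    accepted with probability min(1, exp(loglik(u_prop) - loglik(u))),
    which leaves the prior invariant and is robust under mesh refinement.
    """
    rng = np.random.default_rng(seed)
    d = prior_cov_sqrt.shape[0]
    u = prior_cov_sqrt @ rng.standard_normal(d)
    ll = log_likelihood(u)
    samples = np.empty((n_iter, d))
    for i in range(n_iter):
        xi = prior_cov_sqrt @ rng.standard_normal(d)
        u_prop = np.sqrt(1.0 - beta**2) * u + beta * xi
        ll_prop = log_likelihood(u_prop)
        if np.log(rng.random()) < ll_prop - ll:
            u, ll = u_prop, ll_prop
        samples[i] = u
    return samples

# Toy setup: a 50-dimensional coefficient vector observed at three indices with noise.
d = 50
prior_cov_sqrt = np.diag(1.0 / np.arange(1, d + 1))   # decaying prior standard deviations
obs_idx = np.array([5, 20, 40])
obs = np.array([0.3, -0.1, 0.2])
noise = 0.05
log_likelihood = lambda u: -0.5 * np.sum((u[obs_idx] - obs) ** 2) / noise**2
samples = pcn_mcmc(log_likelihood, prior_cov_sqrt)
print(samples[-5:, 5].round(3))
```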
The work concludes with the application of the discussed methods to a real-world groundwater flow problem illustrating, in particular, the Bayesian approach for uncertainty quantification in practice. / Bayessche Inferenz besteht daraus, vorhandenes a-priori Wissen über unsichere Parameter in mathematischen Modellen mit neuen Beobachtungen messbarer Modellgrößen zusammenzuführen. In dieser Dissertation beschäftigen wir uns mit Modellen, die durch partielle Differentialgleichungen beschrieben sind. Die unbekannten Parameter sind dabei Koeffizientenfunktionen, die aus einem unendlich dimensionalen Funktionenraum kommen. Das Resultat der Bayesschen Inferenz ist dann eine wohldefinierte a-posteriori Wahrscheinlichkeitsverteilung auf diesem Funktionenraum, welche das aktualisierte Wissen über den unsicheren Koeffizienten beschreibt.
Für Entscheidungsverfahren oder Postprocessing ist es oft notwendig die a-posteriori Verteilung zu simulieren oder bzgl. dieser zu integrieren. Dies verlangt nach numerischen Verfahren, welche sich zur Simulation in unendlich dimensionalen Räumen eignen. In dieser Arbeit betrachten wir Kalmanfiltertechniken, die auf Ensembles oder polynomiellen Chaosentwicklungen basieren, sowie Markowketten-Monte-Carlo-Methoden.
Wir analysieren die erwähnte Kalmanfilter, indem wir deren Konvergenz zeigen und ihre Anwendbarkeit im Kontext Bayesscher Inferenz diskutieren. Weiterhin entwickeln und studieren wir einen verbesserten dimensionsunabhängigen Metropolis-Hastings-Algorithmus. Hierbei weisen wir geometrische Ergodizität mit Hilfe eines neuen Resultates zum Vergleich der Spektrallücken von Markowketten nach. Zusätzlich beobachten und analysieren wir die Robustheit der neuen Methode bzgl. eines fallenden Beobachtungsfehlers. Diese Robustheit ist eine weitere wünschenswerte Eigenschaft numerischer Methoden für Bayessche Inferenz.
Den Abschluss der Arbeit bildet die Anwendung der diskutierten Methoden auf ein reales Grundwasserproblem, was insbesondere den Bayesschen Zugang zur Unsicherheitsquantifizierung in der Praxis illustriert.
|
88 |
Numerical Computations for Backward Doubly Stochastic Differential Equations and Nonlinear Stochastic PDEs / Calculs numériques des équations différentielles doublement stochastiques rétrogrades et EDP stochastiques non-linéaires
Bachouch, Achref 01 October 2014 (has links)
L’objectif de cette thèse est l’étude d’un schéma numérique pour l’approximation des solutions d’équations différentielles doublement stochastiques rétrogrades (EDDSR). Durant les deux dernières décennies, plusieurs méthodes ont été proposées afin de permettre la résolution numérique des équations différentielles stochastiques rétrogrades standards. Dans cette thèse, on propose une extension de l’une de ces méthodes au cas doublement stochastique. Notre méthode numérique nous permet d’attaquer une large gamme d’équations aux dérivées partielles stochastiques (EDPS) nonlinéaires. Ceci est possible par le biais de leur représentation probabiliste en termes d’EDDSRs. Dans la dernière partie, nous étudions une nouvelle méthode des particules dans le cadre des études de protection en neutroniques. / The purpose of this thesis is to study a numerical method for backward doubly stochastic differential equations (BDSDEs in short). In the last two decades, several methods were proposed to approximate solutions of standard backward stochastic differential equations. In this thesis, we propose an extension of one of these methods to the doubly stochastic framework. Our numerical method allows us to tackle a large class of nonlinear stochastic partial differential equations (SPDEs in short), thanks to their probabilistic interpretation. In the last part, we study a new particle method in the context of shielding studies.
|
89 |
Numerical Methods for Bayesian Inference in Hilbert Spaces
Sprungk, Björn 15 February 2018 (has links)
Bayesian inference occurs when prior knowledge about uncertain parameters in mathematical models is merged with new observational data related to the model outcome. In this thesis we focus on models given by partial differential equations where the uncertain parameters are coefficient functions belonging to infinite dimensional function spaces. The result of the Bayesian inference is then a well-defined posterior probability measure on a function space describing the updated knowledge about the uncertain coefficient.
For decision making and post-processing it is often required to sample or integrate with respect to the posterior measure. This calls for sampling or numerical methods which are suitable for infinite dimensional spaces. In this work we focus on Kalman filter techniques based on ensembles or polynomial chaos expansions as well as Markov chain Monte Carlo methods.
We analyze the Kalman filters by proving convergence and discussing their applicability in the context of Bayesian inference. Moreover, we develop and study an improved dimension-independent Metropolis-Hastings algorithm. Here, we show geometric ergodicity of the new method by a spectral gap approach using a novel comparison result for spectral gaps. Besides that, we observe and further analyze the robustness of the proposed algorithm with respect to decreasing observational noise. This robustness is another desirable property of numerical methods for Bayesian inference.
The work concludes with the application of the discussed methods to a real-world groundwater flow problem illustrating, in particular, the Bayesian approach for uncertainty quantification in practice. / Bayessche Inferenz besteht daraus, vorhandenes a-priori Wissen über unsichere Parameter in mathematischen Modellen mit neuen Beobachtungen messbarer Modellgrößen zusammenzuführen. In dieser Dissertation beschäftigen wir uns mit Modellen, die durch partielle Differentialgleichungen beschrieben sind. Die unbekannten Parameter sind dabei Koeffizientenfunktionen, die aus einem unendlich dimensionalen Funktionenraum kommen. Das Resultat der Bayesschen Inferenz ist dann eine wohldefinierte a-posteriori Wahrscheinlichkeitsverteilung auf diesem Funktionenraum, welche das aktualisierte Wissen über den unsicheren Koeffizienten beschreibt.
Für Entscheidungsverfahren oder Postprocessing ist es oft notwendig die a-posteriori Verteilung zu simulieren oder bzgl. dieser zu integrieren. Dies verlangt nach numerischen Verfahren, welche sich zur Simulation in unendlich dimensionalen Räumen eignen. In dieser Arbeit betrachten wir Kalmanfiltertechniken, die auf Ensembles oder polynomiellen Chaosentwicklungen basieren, sowie Markowketten-Monte-Carlo-Methoden.
Wir analysieren die erwähnte Kalmanfilter, indem wir deren Konvergenz zeigen und ihre Anwendbarkeit im Kontext Bayesscher Inferenz diskutieren. Weiterhin entwickeln und studieren wir einen verbesserten dimensionsunabhängigen Metropolis-Hastings-Algorithmus. Hierbei weisen wir geometrische Ergodizität mit Hilfe eines neuen Resultates zum Vergleich der Spektrallücken von Markowketten nach. Zusätzlich beobachten und analysieren wir die Robustheit der neuen Methode bzgl. eines fallenden Beobachtungsfehlers. Diese Robustheit ist eine weitere wünschenswerte Eigenschaft numerischer Methoden für Bayessche Inferenz.
Den Abschluss der Arbeit bildet die Anwendung der diskutierten Methoden auf ein reales Grundwasserproblem, was insbesondere den Bayesschen Zugang zur Unsicherheitsquantifizierung in der Praxis illustriert.
|
90 |
Optimization and Bayesian Modeling of Road Distance for Inventory of Potholes in Gävle Municipality / Optimering och bayesiansk modellering av bilvägsavstånd för inventering av potthål i Gävle kommun
Lindblom, Timothy Rafael, Tollin, Oskar January 2022 (has links)
Time management and distance evaluation have long been a difficult task for workers and companies. This thesis studies 6712 pothole coordinates in Gävle municipality, and evaluates the minimal total road distance needed to visit each pothole once, and return to an initial pothole. Road distance is approximated using the flight distance and a simple random sample of 113 road distances from Google Maps. Thereafter, the data from the sample along with a Bayesian approach is used to find a distribution of the ratio between road distance and flight distance. Lastly, a solution to the shortest distance is devised using the Nearest Neighbor algorithm (NNA) and Simulated Annealing (SA). Computational work is performed with Markov Chain Monte Carlo (MCMC). The results provide a minimal road distance of 717 km. / Tidshantering och distansutvärdering är som regel en svår uppgift för arbetare och företag. Den här uppsatsen studerar 6712 potthål i Gävle kommun, och utvärderar den bilväg som på kortast sträcka besöker varje potthål och återgår till den ursprungliga startpunkten. Bilvägsavståndet mellan potthålen uppskattas med hjälp av flygavståndet, där ett obundet slumpmässigt urval av 113 bilvägsavstånd mellan potthålens koordinatpunkter dras. Bilvägsdistanser hittas med hjälp av Google Maps. Därefter används data från urvalet tillsammans med en bayesiansk modell för att hitta en fördelning för förhållandet mellan bilvägsavstånd och flygavstånd. Slutligen framförs en lösning på det kortaste bilvägsavståndet med hjälp av en Nearest Neighbour algoritm (NNA) samt Simulated Annealing (SA). Statistiskt beräkningsarbete utförs med Markov Chain Monte Carlo (MCMC). Resultaten ger en kortaste bilvägssträcka på 717 km.
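The sketch below illustrates two of the building blocks described in this abstract: a flight (great-circle) distance between pothole coordinates, scaled by an assumed road-to-flight ratio, and a nearest-neighbour tour that returns to the start. The coordinates and the constant ratio of 1.3 are placeholders for illustration; the thesis models the ratio with a posterior distribution and refines the tour with simulated annealing, neither of which is shown here.

```python
import numpy as np

def haversine_km(p, q):
    """Great-circle ("flight") distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(np.radians, (p[0], p[1], q[0], q[1]))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def nearest_neighbor_tour(points, road_ratio=1.3):
    """Nearest-neighbour tour over (lat, lon) points, returning to the starting point.

    Road distance is approximated as flight distance times road_ratio; a single
    constant stands in for the posterior distribution of the ratio used in the thesis.
    """
    n = len(points)
    unvisited = set(range(1, n))
    tour, total = [0], 0.0
    while unvisited:
        cur = tour[-1]
        nxt = min(unvisited, key=lambda j: haversine_km(points[cur], points[j]))
        total += haversine_km(points[cur], points[nxt]) * road_ratio
        tour.append(nxt)
        unvisited.remove(nxt)
    total += haversine_km(points[tour[-1]], points[0]) * road_ratio  # close the loop
    return tour, total

# A handful of made-up coordinates near Gävle, for illustration only.
pts = [(60.675, 17.141), (60.680, 17.150), (60.670, 17.120), (60.690, 17.160)]
tour, dist_km = nearest_neighbor_tour(pts)
print(tour, round(dist_km, 2))
```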
|