1 
Continuous reservoir simulation model updating and forecasting using a Markov chain Monte Carlo method. Liu, Chang, 15 May 2009
Effective reservoir management systems currently play a very important part in exploiting reservoirs. Fully exploring all possible scenarios for a petroleum reservoir is a challenge because reservoir parameters can be combined in infinitely many ways, and much remains unknown about the underlying reservoir model, which has many uncertain parameters. MCMC (Markov chain Monte Carlo) is a more statistically rigorous sampling method, with a stronger theoretical basis than other methods, and its performance on high-dimensional problems is a timely topic in the statistics field.
This thesis suggests a way to quantify uncertainty for high-dimensional problems by using the MCMC sampling process within a Bayesian framework. Building on this improved method, the thesis reports a new approach that uses a continuous MCMC method for automatic history matching, in which data are assimilated sequentially rather than simultaneously. Running the process continuously also makes the MCMC method more applicable in industry: the long time needed to run a single realization is no longer a major obstacle during sampling, and newly observed data are incorporated as soon as they become available, leading to better estimates.
The PUNQ-S3 reservoir model is used to test two methods in this thesis: the STATIC (traditional) SIMULATION PROCESS and the CONTINUOUS SIMULATION PROCESS. The continuous process provides continuously updated probabilistic forecasts of well and reservoir performance that are accessible at any time, and it can be used to optimize long-term reservoir performance at the field scale.
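The sequential-assimilation idea can be illustrated, under heavy simplification, by a random-walk Metropolis chain that keeps running as new observations arrive instead of restarting. This sketch is hypothetical: it replaces the reservoir simulator with a one-parameter Gaussian model, and the names `log_posterior` and `continuous_mcmc`, the prior and noise scales, and the step size are all assumptions. It shows only how the chain continues under an updated posterior, not the thesis's history-matching workflow.

```python
import math
import random


def log_posterior(theta, data, prior_sd=10.0, noise_sd=1.0):
    """Unnormalized log posterior of a scalar theta given all data seen so far.

    Toy stand-in for a reservoir model (assumption): y ~ N(theta, noise_sd^2),
    theta ~ N(0, prior_sd^2).
    """
    lp = -0.5 * (theta / prior_sd) ** 2
    lp += sum(-0.5 * ((y - theta) / noise_sd) ** 2 for y in data)
    return lp


def continuous_mcmc(batches, steps_per_batch=2000, step=0.3, seed=0):
    """Random-walk Metropolis that keeps running as new data batches arrive.

    The chain is never restarted: after each batch is appended, sampling
    continues from the current state under the updated posterior. Returns the
    posterior-mean estimate available after each batch.
    """
    rng = random.Random(seed)
    data, theta = [], 0.0
    estimates = []
    for batch in batches:
        data.extend(batch)  # assimilate the newly observed data
        current_lp = log_posterior(theta, data)  # re-score current state under the new posterior
        samples = []
        for _ in range(steps_per_batch):
            proposal = theta + rng.gauss(0.0, step)
            proposal_lp = log_posterior(proposal, data)
            # 1 - random() lies in (0, 1], so the log is always defined
            if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
                theta, current_lp = proposal, proposal_lp
            samples.append(theta)
        kept = samples[steps_per_batch // 2:]  # discard the first half as burn-in
        estimates.append(sum(kept) / len(kept))
    return estimates


# Usage: three batches of synthetic observations arriving over time (true theta = 2.0).
data_rng = random.Random(1)
batches = [[data_rng.gauss(2.0, 1.0) for _ in range(20)] for _ in range(3)]
estimates = continuous_mcmc(batches)
print(estimates)  # one updated estimate per batch
```

Because each batch starts from the previous state, the chain is already near the bulk of the posterior when new data arrive, which is the practical advantage the abstract describes.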

2 
Uncertainty quantification using multiscale methods for porous media flows. Dostert, Paul Francis, 15 May 2009
In this dissertation we discuss numerical methods for uncertainty quantification in applications involving flow in porous media. We consider stochastic flow equations that contain both a spatial and a random component, both of which must be resolved in our numerical models. When solving for flow and transport through heterogeneous porous media, some type of upscaling or coarsening is needed because of the disparity in scales. We describe multiscale techniques for solving the spatial component of the stochastic flow equations. These techniques allow us to simulate the flow and transport processes on the coarse grid and thus reduce the computational cost. Additionally, we discuss techniques for combining multiscale methods with stochastic solution techniques, specifically polynomial chaos methods and sparse grid collocation methods.
We apply the proposed methods to uncertainty quantification problems where the
goal is to sample porous media properties given an integrated response. We propose
several efficient sampling algorithms based on Langevin diffusion and the Markov
chain Monte Carlo method. Analysis and detailed numerical results are presented
for applications in multiscale immiscible flow and water infiltration into a porous
medium.
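The sampling algorithms in this dissertation are based on Langevin diffusion. As a minimal sketch of that idea, assuming a toy two-dimensional Gaussian target in place of a porous-media posterior, the Metropolis-adjusted Langevin algorithm (MALA) drifts each proposal along the gradient of the log target and then applies a Metropolis-Hastings correction. The function names and the step size `h` are illustrative assumptions; the multiscale ingredients of the dissertation are not shown.

```python
import math
import random


def mala(log_target, grad_log_target, x0, h=0.5, n_steps=5000, seed=0):
    """Metropolis-adjusted Langevin algorithm (MALA).

    Proposal: y = x + (h / 2) * grad log pi(x) + sqrt(h) * N(0, I), i.e. one
    Euler step of the Langevin diffusion, followed by a Metropolis-Hastings
    accept/reject step that corrects the discretization error.
    """
    rng = random.Random(seed)
    x = list(x0)
    lp_x, g_x = log_target(x), grad_log_target(x)

    def log_q(to, frm, g_frm):
        # log density (up to a constant) of proposing `to` from `frm`
        return -sum((t - f - 0.5 * h * g) ** 2 for t, f, g in zip(to, frm, g_frm)) / (2.0 * h)

    samples, accepted = [], 0
    for _ in range(n_steps):
        y = [xi + 0.5 * h * gi + math.sqrt(h) * rng.gauss(0.0, 1.0) for xi, gi in zip(x, g_x)]
        lp_y, g_y = log_target(y), grad_log_target(y)
        log_alpha = lp_y - lp_x + log_q(x, y, g_y) - log_q(y, x, g_x)
        if math.log(1.0 - rng.random()) < log_alpha:
            x, lp_x, g_x = y, lp_y, g_y
            accepted += 1
        samples.append(list(x))
    return samples, accepted / n_steps


def log_target(x):
    # toy target: independent unit-variance Gaussians centred at (1, -1)
    return -0.5 * ((x[0] - 1.0) ** 2 + (x[1] + 1.0) ** 2)


def grad_log_target(x):
    return [-(x[0] - 1.0), -(x[1] + 1.0)]


samples, acceptance = mala(log_target, grad_log_target, [0.0, 0.0])
kept = samples[1000:]
mean = [sum(s[i] for s in kept) / len(kept) for i in range(2)]
print(mean, acceptance)
```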

3 
An Approximate MCMC Method for Convex Hulls. Wang, Pengfei, 20 August 2019
Markov chain Monte Carlo (MCMC) is an extremely popular class of algorithms for computing summaries of posterior distributions. One problem for MCMC in the so-called Big Data regime is the growing computational cost of most MCMC algorithms. The most popular and basic MCMC algorithms, such as the Metropolis-Hastings (MH) algorithm and the Gibbs sampler, must take the full data set into account in every iteration. In the Big Data setting, datasets of more than 100 GB are now fairly common, and the running time of standard MCMC on such datasets is prohibitively long.
To address this problem, several papers develop algorithms that use only a subset of the data at each step to obtain an approximate or exact posterior distribution. Korattikara et al. (2013) estimate the transition probabilities of a typical MH chain using only a subset of the data at each step, with a controllable error. The FireFly Monte Carlo (FLYMC) algorithm, presented by Maclaurin and Adams, augments the original dataset and explicitly evaluates only an "active" subset at each step; they show that the marginal distribution of the FLYMC algorithm at stationarity is still exactly the posterior distribution of interest. However, both of these papers, like the other literature discussed in this thesis, are restricted to a special class of posteriors with "product-form" likelihoods, which require all data points to be conditionally independent and to share the same likelihood.
The problem we want to solve, however, targets a uniform distribution on a convex hull, to which the product form does not apply. We focus on this problem because in statistics we sometimes need to compute the volume of distributions that have a convex-hull shape or can be transformed into one. It is impossible to compute such volumes by decomposing and reducing high-dimensional convex hulls. According to Barany et al. (1987), the ratio between the estimated upper and lower bounds on the volume of a certain convex hull is quite large, so the volume cannot be estimated well that way either. Fast-mixing Markov chains are essentially the only practical way to carry out such volume computations.
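One classical fast-mixing chain for a uniform distribution on a convex body is hit-and-run: pick a random direction, then jump to a uniformly chosen point on the chord through the current point. The sketch below assumes the body is given by linear inequalities `A x <= b` (an H-representation); for a convex hull of data points, as in the thesis, the chord endpoints would instead have to come from a membership test. It illustrates the volume idea through a ratio: the fraction of samples that fall in a sub-body estimates the ratio of the two volumes. It is not the approximate algorithm developed in the thesis, and `hit_and_run` is a hypothetical name.

```python
import math
import random


def hit_and_run(A, b, x0, n_steps=10000, seed=0):
    """Hit-and-run sampler for the uniform distribution on {x : A x <= b}.

    Assumes the polytope is bounded and that x0 lies inside it.
    """
    rng = random.Random(seed)
    x = list(x0)
    dim = len(x)
    samples = []
    for _ in range(n_steps):
        # uniformly random direction on the unit sphere
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(di * di for di in d))
        d = [di / norm for di in d]
        # the chord x + t d stays feasible for t in [t_lo, t_hi]
        t_lo, t_hi = -math.inf, math.inf
        for row, bi in zip(A, b):
            ad = sum(a * di for a, di in zip(row, d))
            slack = bi - sum(a * xi for a, xi in zip(row, x))
            if ad > 1e-12:
                t_hi = min(t_hi, slack / ad)
            elif ad < -1e-12:
                t_lo = max(t_lo, slack / ad)
        t = rng.uniform(t_lo, t_hi)  # uniform point on the chord
        x = [xi + t * di for xi, di in zip(x, d)]
        samples.append(x)
    return samples


# Usage: the triangle x >= 0, y >= 0, x + y <= 1.
A = [[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]]
b = [0.0, 0.0, 1.0]
samples = hit_and_run(A, b, [0.25, 0.25])
# The sub-triangle x + y <= 0.5 has exactly 1/4 of the area, so this ratio should be near 0.25.
ratio = sum(1 for x, y in samples if x + y <= 0.5) / len(samples)
mean_x = sum(x for x, _ in samples) / len(samples)
print(ratio, mean_x)
```

Chaining such ratios across a sequence of nested bodies is the usual way sampling turns into a volume estimate.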
The first contribution of this thesis is to define a data-augmentation algorithm along the lines of FLYMC. Like FLYMC, we introduce an auxiliary random variable to mark subsets; however, because our setting is more complicated, we use one more variable than FLYMC to help select subsets. For this extra variable we use the pseudo-marginal Metropolis-Hastings algorithm (PMMH), which lets us replace the distribution of the parameter of interest, conditional on the augmented variable, with an estimator. Our algorithm is not a standard instance of PMMH because our estimator is biased; nevertheless, bounds on the individual approximating measures of the parameter of interest translate directly into bounds on the error in the stationary measure of the algorithm.
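The pseudo-marginal idea can be sketched on a toy latent-variable model (an assumption; it is not the convex-hull target of the thesis). Each Metropolis-Hastings step replaces the intractable likelihood with a noisy importance-sampling estimate and stores that estimate alongside the current state. With an unbiased estimator, as here, the chain targets the exact posterior; the estimator in the thesis is biased, which is why error bounds of the kind described above are needed. All function names and tuning values are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def log_lik_estimate(theta, ys, n_particles, rng):
    """Log of an unbiased importance-sampling estimate of p(ys | theta).

    Toy latent-variable model (assumed for illustration):
    z_i ~ N(theta, 1) and y_i | z_i ~ N(z_i, 1), so exactly y_i ~ N(theta, 2).
    """
    total = 0.0
    for y in ys:
        weights = [normal_pdf(y, rng.gauss(theta, 1.0), 1.0) for _ in range(n_particles)]
        total += math.log(sum(weights) / n_particles)
    return total


def pmmh(ys, n_steps=2000, step=0.5, n_particles=20, seed=0):
    """Pseudo-marginal Metropolis-Hastings with a N(0, 10^2) prior on theta."""
    rng = random.Random(seed)

    def log_prior(t):
        return -0.5 * (t / 10.0) ** 2

    theta = 0.0
    log_lik = log_lik_estimate(theta, ys, n_particles, rng)
    chain = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step)
        proposal_log_lik = log_lik_estimate(proposal, ys, n_particles, rng)
        log_alpha = (proposal_log_lik + log_prior(proposal)) - (log_lik + log_prior(theta))
        if math.log(1.0 - rng.random()) < log_alpha:
            # The noisy estimate travels with the accepted state and is never
            # recomputed; this is what keeps the exact posterior invariant.
            theta, log_lik = proposal, proposal_log_lik
        chain.append(theta)
    return chain


data_rng = random.Random(42)
ys = [data_rng.gauss(1.5, math.sqrt(2.0)) for _ in range(20)]
chain = pmmh(ys)
posterior_mean = sum(chain[500:]) / len(chain[500:])
print(posterior_mean, sum(ys) / len(ys))  # the two should be close under the nearly flat prior
```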
After obtaining an implementable algorithm, we use two techniques, Locality-Sensitive Hashing (LSH) and Taylor expansion, to improve it. LSH raises the efficiency of proposing new samples of the augmented variable, and the Taylor expansion produces a more accurate estimator for the parameter of interest.
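The abstract does not say which LSH family is used. As an illustration only, the sketch below uses random-hyperplane hashing (SimHash), a common family for angular similarity: each bit records which side of a random hyperplane a vector falls on, so vectors pointing in nearby directions tend to share a bucket, and a lookup only has to examine that bucket instead of every stored point. The class `HyperplaneLSH` and its methods are hypothetical.

```python
import random


class HyperplaneLSH:
    """Random-hyperplane LSH (SimHash) for angular similarity."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = random.Random(seed)
        # one random Gaussian normal vector per hash bit
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets = {}

    def hash(self, v):
        # bit k is 1 if v lies on the non-negative side of hyperplane k
        return tuple(int(sum(p * x for p, x in zip(plane, v)) >= 0.0) for plane in self.planes)

    def add(self, key, v):
        self.buckets.setdefault(self.hash(v), []).append(key)

    def candidates(self, v):
        """Keys stored in v's bucket: likely, but not guaranteed, near neighbours."""
        return self.buckets.get(self.hash(v), [])


lsh = HyperplaneLSH(dim=3, n_bits=6, seed=1)
points = {"a": [1.0, 0.2, 0.1], "b": [0.9, 0.25, 0.05], "c": [-1.0, 0.5, -0.3]}
for key, vec in points.items():
    lsh.add(key, vec)
query = [2.0 * v for v in points["a"]]  # same direction as "a", so it hashes to a's bucket
print(lsh.candidates(query))
```

More bits make buckets more selective but raise the chance that true neighbours are split; practical schemes use several independent tables to balance the two.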
Our main theoretical result is a bound on the pointwise bias of our estimator, which yields a bound on the total error of the chain's stationary measure. We prove that the total error converges under a certain condition. Our simulation results illustrate this, and we use a large collection of simulations to offer practical guidance on choosing parameters and chain lengths in real applications.

4 
Statistical methods for the analysis of DSMC simulations of hypersonic shocks. Strand, James Stephen, 25 June 2012
In this work, statistical techniques were employed to study the modeling of a hypersonic
shock with the Direct Simulation Monte Carlo (DSMC) method, and to gain insight into how the
model interacts with a set of physical parameters.
Direct Simulation Monte Carlo (DSMC) is a particle-based method that is useful for simulating gas dynamics in rarefied and/or highly nonequilibrium flowfields. A DSMC code was written and optimized for use in this research. The code was developed with shock tube simulations in mind, and it includes a number of improvements that allow for the efficient simulation of 1D hypersonic shocks. Most importantly, a moving sampling region is used to obtain an accurate steady shock profile from an unsteady, moving shock wave. The code is MPI-parallel, and an adaptive load-balancing scheme ensures that the workload is distributed properly between processors over the course of a simulation.
Global, Monte Carlo-based sensitivity analyses were performed in order to determine
which of the parameters examined in this work most strongly affect the simulation results for
two scenarios: a 0D relaxation from an initial high-temperature state and a hypersonic shock.
The 0D relaxation scenario was included in order to examine whether, with appropriate initial
conditions, it can be viewed in some regards as a substitute for the 1D shock in a statistical
sensitivity analysis. In both analyses, sensitivities were calculated based on both the square of the
Pearson correlation coefficient and the mutual information. The quantity of interest (QoI)
chosen for these analyses was the NO density profile. This vector QoI was broken into a set of
scalar QoIs, each representing the density of NO at a specific point in time (for the relaxation) or
a specific streamwise location (for the shock), and sensitivities were calculated for each scalar
QoI based on both measures of sensitivity. The sensitivities were then integrated over the set of
scalar QoIs to determine an overall sensitivity for each parameter. A weighting function was
used in the integration in order to emphasize sensitivities in the region of greatest thermal and
chemical nonequilibrium. The six parameters that most strongly affect the NO density profile
were found to be the same for both scenarios, which provides justification for the claim that a 0D
relaxation can in some situations be used as a substitute model for a hypersonic shock. These six
parameters are the pre-exponential constants in the Arrhenius rate equations for the N₂
dissociation reaction N₂ + N ⇄ 3N, the O₂ dissociation reaction O₂ + O ⇄ 3O, the NO
dissociation reactions NO + N ⇄ 2N + O and NO + O ⇄ N + 2O, and the exchange reactions
N₂ + O ⇄ NO + N and NO + O ⇄ O₂ + N.
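The sensitivity workflow described in this abstract (a sensitivity measure per scalar QoI, then a weighted integral over the scalar QoIs) can be sketched as follows. This is a hypothetical illustration, not the thesis's code: a toy linear model replaces the DSMC simulation, the weights stand in for the unspecified weighting function, and the mutual information uses a simple binned plug-in estimate, which is only one of several possible estimators.

```python
import math
import random


def pearson_r2(xs, ys):
    """Squared Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def binned_mutual_information(xs, ys, n_bins=10):
    """Plug-in mutual information estimate (in nats) from an equal-width 2-D histogram."""
    def bin_index(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / n_bins or 1.0
        return [min(int((v - lo) / width), n_bins - 1) for v in values]

    bx, by = bin_index(xs), bin_index(ys)
    n = len(xs)
    joint, px, py = {}, [0] * n_bins, [0] * n_bins
    for i, j in zip(bx, by):
        joint[(i, j)] = joint.get((i, j), 0) + 1
        px[i] += 1
        py[j] += 1
    # sum over occupied cells of p(i, j) * log(p(i, j) / (p(i) * p(j)))
    return sum(c / n * math.log(c * n / (px[i] * py[j])) for (i, j), c in joint.items())


def integrated_sensitivity(params, qois, weights, measure):
    """Weighted average over scalar QoIs of a per-QoI sensitivity measure.

    params[s][p]: value of parameter p in sample s; qois[s][k]: scalar QoI k in
    sample s; weights[k]: emphasis given to QoI k.
    """
    total_weight = sum(weights)
    result = []
    for p in range(len(params[0])):
        xs = [sample[p] for sample in params]
        score = sum(w * measure(xs, [q[k] for q in qois]) for k, w in enumerate(weights))
        result.append(score / total_weight)
    return result


# Usage with a hypothetical model: every QoI depends strongly on parameter 0, weakly on parameter 1.
rng = random.Random(0)
params = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(500)]
qois = [[k * a + 0.2 * b for k in range(1, 6)] for a, b in params]
weights = [1.0, 2.0, 4.0, 2.0, 1.0]  # hypothetical weighting that emphasizes the middle QoIs
r2_sens = integrated_sensitivity(params, qois, weights, pearson_r2)
mi_sens = integrated_sensitivity(params, qois, weights, binned_mutual_information)
print(r2_sens, mi_sens)
```

The squared correlation only detects linear dependence, while the mutual information also captures nonlinear dependence, which is presumably why the abstract reports both measures.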
After identification of the most sensitive parameters, a synthetic data calibration was
performed to demonstrate that the statistical inverse problem could be solved for the 0D
relaxation scenario. The calibration was performed using the QUESO code, developed at the
PECOS center at UT Austin, which employs the Delayed Rejection Adaptive Metropolis
(DRAM) algorithm. The six parameters identified by the sensitivity analysis were calibrated
successfully with respect to a group of synthetic datasets.
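DRAM combines delayed rejection (DR) with adaptive Metropolis (AM). The sketch below shows only the delayed-rejection half, as a two-stage Metropolis-Hastings sampler on a toy one-dimensional Gaussian target. It does not use or reproduce the QUESO API, and the function name and the tuning values `s1` and `gamma` are assumptions chosen for illustration. The adaptive half would additionally update the proposal covariance from the chain history.

```python
import math
import random


def delayed_rejection_mh(log_target, x0, s1=2.0, gamma=0.2, n_steps=10000, seed=0):
    """Two-stage delayed-rejection Metropolis-Hastings for a 1-D target.

    Stage 1 proposes y1 ~ N(x, s1^2). If y1 is rejected, stage 2 proposes a more
    cautious y2 ~ N(x, (gamma * s1)^2) and accepts it with the delayed-rejection
    probability, which keeps the target distribution invariant.
    """
    rng = random.Random(seed)
    s2 = gamma * s1

    def log_q1(frm, to):
        # log N(to; frm, s1^2) up to an additive constant
        return -0.5 * ((to - frm) / s1) ** 2

    def alpha1(lp_from, lp_to):
        # first-stage Metropolis acceptance probability (symmetric proposal)
        return 1.0 if lp_to >= lp_from else math.exp(lp_to - lp_from)

    x, lp_x = x0, log_target(x0)
    chain = []
    for _ in range(n_steps):
        y1 = x + rng.gauss(0.0, s1)
        lp_y1 = log_target(y1)
        a1 = alpha1(lp_x, lp_y1)
        if rng.random() < a1:
            x, lp_x = y1, lp_y1
        else:
            # Reaching here implies a1 < 1, so log(1 - a1) below is finite.
            y2 = x + rng.gauss(0.0, s2)
            lp_y2 = log_target(y2)
            reverse_reject = 1.0 - alpha1(lp_y2, lp_y1)
            if reverse_reject > 0.0:
                # The symmetric stage-2 proposal densities cancel in the ratio.
                log_num = lp_y2 + log_q1(y2, y1) + math.log(reverse_reject)
                log_den = lp_x + log_q1(x, y1) + math.log(1.0 - a1)
                if math.log(1.0 - rng.random()) < log_num - log_den:
                    x, lp_x = y2, lp_y2
        chain.append(x)
    return chain


def log_target(x):
    return -0.5 * (x - 3.0) ** 2  # toy N(3, 1) target


chain = delayed_rejection_mh(log_target, 0.0)
kept = chain[1000:]
mean = sum(kept) / len(kept)
var = sum((x - mean) ** 2 for x in kept) / len(kept)
print(mean, var)
```

The bold first stage explores widely, and the cautious second stage recovers many moves that would otherwise be wasted rejections.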

5 
Understanding approximate Bayesian computation (ABC). Lim, Boram, 16 March 2015
The Bayesian approach has been developed in various areas and has become part of mainstream statistical research. Markov chain Monte Carlo (MCMC) methods have freed us from computational constraints for a wide class of models, and several MCMC methods are now available for sampling from posterior distributions. However, when the data are large, the models are complex, and the likelihood function is intractable, our use of MCMC is limited, especially because it requires evaluating the likelihood function. As a solution to this problem, researchers have put forward approximate Bayesian computation (ABC), also known as a likelihood-free method. In this report I introduce the ABC algorithm and show its implementation for a stochastic volatility (SV) model. Although there are alternative methods for analyzing SV models, such as particle filters and other MCMC methods, I apply the ABC method to an SV model and compare it, using the same data and SV model, with an approach based on a mixture of normals and MCMC.
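A minimal rejection-ABC sketch follows. It replaces the stochastic volatility model with a toy normal-mean model so that the mechanics are easy to check: draw a parameter from the prior, simulate a dataset, and keep the draw only if a summary statistic of the simulation lies within `eps` of the observed one. The model, prior, summary statistic and tolerance are all assumptions; note that no likelihood is ever evaluated.

```python
import random


def abc_rejection(observed, prior_sample, simulate, summary, eps, n_draws, seed=0):
    """Rejection ABC: keep prior draws whose simulated summary is within eps of the observed one."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        sim = simulate(theta, len(observed), rng)
        if abs(summary(sim) - s_obs) < eps:
            accepted.append(theta)
    return accepted


def prior_sample(rng):
    return rng.uniform(-5.0, 5.0)  # hypothetical flat prior on the mean


def simulate(theta, n, rng):
    return [rng.gauss(theta, 1.0) for _ in range(n)]  # toy model: N(theta, 1) observations


def summary(data):
    return sum(data) / len(data)  # sample mean as the summary statistic


obs_rng = random.Random(7)
observed = [obs_rng.gauss(2.0, 1.0) for _ in range(30)]
accepted = abc_rejection(observed, prior_sample, simulate, summary, eps=0.1, n_draws=20000)
abc_mean = sum(accepted) / len(accepted)
print(len(accepted), abc_mean, summary(observed))
```

Shrinking `eps` brings the accepted draws closer to the true posterior but lowers the acceptance rate, the basic trade-off of ABC.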

6 
Ergodicity of Adaptive MCMC and its Applications. Yang, Chao, 28 September 2009
Markov chain Monte Carlo (MCMC) algorithms and adaptive Markov chain Monte Carlo (AMCMC) algorithms are among the most important methods for approximately sampling from complicated probability distributions, and they are widely used in statistics, computer science, chemistry, physics, and other fields. A core problem in using these algorithms is establishing their asymptotic theory.
In this thesis, we show the Central Limit Theorem (CLT) for uniformly ergodic Markov chains using the regeneration method. We exploit the weakest uniform drift conditions to ensure the ergodicity and the weak law of large numbers (WLLN) of AMCMC.
Further, we answer open problem 21 in Roberts and Rosenthal [48] by constructing a counterexample and finding a stronger condition that ensures the ergodicity of AMCMC.
We find that conditions (a) and (b) in [46] are not sufficient for the WLLN to hold when the functional is unbounded, and we prove the WLLN for unbounded functions under some stronger conditions.
Finally, we consider the practical aspects of AMCMC. We use some toy examples to show that the general adaptive random walk Metropolis algorithm is not efficient for sampling from multimodal targets. We therefore discuss mixed regional adaptation (MRAPT) on a compact state space and a modified mixed regional adaptation on a general state space, in which the regional proposal distributions are optimal and switching between different modes is very efficient. The theoretical proofs show that the proposed algorithms fall within the scope of the general theorems used to validate AMCMC. As an application of our theoretical results, we analyze real "Loss of Heterozygosity" (LOH) data using MRAPT.
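For contrast with the regional schemes, the sketch below is a plain one-dimensional adaptive random walk Metropolis algorithm; it is not MRAPT. The proposal scale is tuned toward an acceptance rate of about 0.44 (a common heuristic for one-dimensional targets) with a step size that shrinks over time, so the amount of adaptation diminishes, and the log-scale is clamped to a fixed interval so the adaptation stays bounded. The function name, the schedule `n ** -0.6` and the clamp bounds are assumptions.

```python
import math
import random


def adaptive_rwm(log_target, x0, n_steps=20000, target_accept=0.44, seed=0):
    """1-D random-walk Metropolis whose proposal scale adapts on the fly.

    The log-scale is nudged toward a target acceptance rate with a step size
    that shrinks like n**-0.6 (diminishing adaptation) and is clamped to
    [-10, 10] (a simple way to keep the adaptation bounded).
    """
    rng = random.Random(seed)
    x, lp_x = x0, log_target(x0)
    log_scale = 0.0
    chain = []
    for n in range(1, n_steps + 1):
        proposal = x + math.exp(log_scale) * rng.gauss(0.0, 1.0)
        lp_prop = log_target(proposal)
        accept_prob = 1.0 if lp_prop >= lp_x else math.exp(lp_prop - lp_x)
        if rng.random() < accept_prob:
            x, lp_x = proposal, lp_prop
        # Robbins-Monro update of the log proposal scale
        log_scale += (accept_prob - target_accept) / n ** 0.6
        log_scale = min(max(log_scale, -10.0), 10.0)
        chain.append(x)
    return chain, math.exp(log_scale)


def log_target(x):
    return -0.5 * (x / 5.0) ** 2  # toy N(0, 5^2) target


chain, scale = adaptive_rwm(log_target, 0.0)
kept = chain[5000:]
chain_mean = sum(kept) / len(kept)
chain_sd = math.sqrt(sum((x - chain_mean) ** 2 for x in kept) / len(kept))
print(scale, chain_mean, chain_sd)
```

On a unimodal target like this one the adapted scale works well; on a well-separated multimodal target a single global scale cannot suit both local moves and jumps between modes, which is the inefficiency that motivates regional adaptation.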

8 
Approaches For Inferring Past Population Size Changes From Genome-wide Genetic Data. Theunert, Christoph, 02 September 2014
The history of populations or species is of fundamental importance in a variety of areas. Gaining details about demographic, cultural, climatic or political aspects of the past may provide insights that improve the understanding of how populations have evolved over time and how they may evolve in the future. Different types of resources can be informative about different periods of time.
One especially important resource is genetic data, either from a single individual or a group of organisms. Environmental conditions and circumstances can directly affect the existence and success of a group of individuals. Since genetic material gets passed on from generation to generation, traces of past events can still be detected in today's genetic data. For many decades scientists have tried to understand the principles of how external influences can directly affect the appearance and features of populations, leading to theoretical models that can interpret modern day genetic variation in the light of past events.
Alongside other influencing factors such as migration and natural selection, population size changes can have a great impact on the genetic diversity of a group of organisms. For example, in the field of conservation biology, gaining insights into how the size of a population evolves may assist in detecting past or ongoing temporal reductions of population size. This seems crucial, since a reduction in size also correlates with a reduction in genetic diversity, which in turn might negatively affect the evolutionary potential of a population. Using computational and population genetics methods, sequences from whole genomes can be scanned for traces of such events, assisting in new interpretations of historical details of populations or groups of interest.
This thesis focuses on the detection and interpretation of past population size changes. Two approaches to infer particular parameters from underlying demographic models are described. The first part of this thesis introduces two summary statistics designed to detect fluctuations in size from genome-wide Single Nucleotide Polymorphism (SNP) data. Demographic inference from such data is inherently complicated by recombination and ascertainment bias. Hence, two new statistics are introduced: allele frequency-identity by descent (AF-IBD) and allele frequency-identity by state (AF-IBS). Both make use of linkage disequilibrium information and exhibit defined relationships to the time of the underlying mathematical process. A fast and efficient Approximate Bayesian Computation framework based on AF-IBD and AF-IBS is constructed that can accurately estimate demographic parameters. The two statistics were tested for the biasing effects of hidden recombination events, ascertainment bias and phasing errors, and were found to be robust to a variety of these biases. The inference approach was then applied to genome-wide SNP data to infer the demographic histories of two human populations: (i) Yoruba from Africa and (ii) French from Europe. Results suggest that AF-IBD and AF-IBS capture sufficient information from the underlying data sets to accurately infer parameters of interest, such as the beginning, end and strength of periods of varying size. Additionally, the results from empirical data suggest a rather stable ancestral population size with a mild recent expansion for Yoruba, whereas the French apparently experienced a rather long-lasting strong bottleneck followed by drastic population growth.
The second part of this thesis introduces a new way of summarizing information from the site frequency spectrum. Commonly applied site frequency spectrum-based inference methods use allele frequency information from individual segregating sites. Our newly developed method, the 2 point spectrum, summarizes allele frequency information from all possible pairs of segregating sites, thereby increasing the number of potentially informative values obtained from the same underlying data set. This additional information is then incorporated into a Markov chain Monte Carlo framework, which allows a high degree of flexibility and provides an efficient method to infer population size trajectories over time. We tested the method on a variety of simulated data sets from underlying demographic models, and we compared its performance and accuracy with established methods such as PSMC and diCal. Results indicate that this nonparametric 2 point spectrum method can accurately infer the extent and times of past population size changes and therefore correctly estimates the history of temporal size fluctuations. Furthermore, initial results suggest that the amount of data required and the accuracy of the final results are comparable with those of other publicly available nonparametric methods. An easy-to-use command line program was implemented and will be made publicly available.
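The abstract does not define the 2 point spectrum, so the sketch below only illustrates the general point that summarizing pairs of segregating sites multiplies the number of summary values (S segregating sites give S(S-1)/2 pairs). It computes an ordinary unfolded site frequency spectrum and a hypothetical pairwise tally of derived-allele counts; the pairwise summary and all function names are assumptions and do not reproduce the thesis's statistic.

```python
from collections import Counter
from itertools import combinations


def derived_counts(genotypes):
    """Derived-allele count at each segregating site.

    genotypes[s][c] is 1 if chromosome c carries the derived allele at site s,
    else 0. Monomorphic sites (count 0 or n) are dropped.
    """
    n = len(genotypes[0])
    return [sum(site) for site in genotypes if 0 < sum(site) < n]


def site_frequency_spectrum(genotypes):
    """Unfolded SFS: derived-allele count -> number of segregating sites."""
    return Counter(derived_counts(genotypes))


def pairwise_count_spectrum(genotypes):
    """Hypothetical pairwise summary: how often each (i, j) pair of derived-allele
    counts occurs over all pairs of segregating sites."""
    return Counter(tuple(sorted(pair)) for pair in combinations(derived_counts(genotypes), 2))


genotypes = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 1, 1, 1],  # fixed derived allele: not segregating, so it is ignored
]
sfs = site_frequency_spectrum(genotypes)
pairs = pairwise_count_spectrum(genotypes)
print(sfs)
print(pairs)
```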
In summary, we introduced three highly sensitive summary statistics and proposed different approaches to infer parameters from demographic models of interest. Both methods provide powerful frameworks for accurate parameter inference from genome-wide genetic data. They were tested on a variety of demographic models and provide highly accurate results. They may be used in the settings described above or incorporated into existing inference frameworks. Overall, the statistics should prove useful for gaining new insights into populations, especially those with complex demographic histories.

9 
Modeling the NCAA Tournament Through Bayesian Logistic Regression. Nelson, Bryan, 18 July 2012
Many rating systems, such as seeding teams on an S-curve and the Pomeroy and Sagarin ratings, order the Division I men's college basketball teams that compete in the NCAA Tournament, reducing the choice of a winner to a comparison of two numbers. Rather than creating a rating system, we analyze each matchup using the differences between the two teams' individual regular-season statistics as the independent variables. We use an MCMC approach and logistic regression, along with several model selection techniques, to arrive at models for predicting the winner of each game. When given the 63 actual games of the 2012 tournament, eight of our models performed as well as Pomeroy's rating system and four performed as well as Sagarin's. When the models were not allowed to fix their mistakes, only one model outperformed both Pomeroy's and Sagarin's systems. / McAnulty College and Graduate School of Liberal Arts / Computational Mathematics / MS / Thesis
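The matchup-level model can be sketched as Bayesian logistic regression on differences in team statistics, sampled with random-walk Metropolis. Everything here is a hypothetical stand-in: a single synthetic statistic difference replaces the real regular-season data, the abstract does not say which MCMC sampler was used, the N(0, 10^2) priors are assumptions, and no model selection is performed.

```python
import math
import random


def log_sigmoid(z):
    # numerically stable log(1 / (1 + exp(-z)))
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def log_posterior(beta, diffs, wins, prior_sd=10.0):
    """beta = (intercept, slope); diffs = stat differences (team A minus team B); wins = 1 if A won."""
    lp = -0.5 * (beta[0] ** 2 + beta[1] ** 2) / prior_sd ** 2
    for d, w in zip(diffs, wins):
        eta = beta[0] + beta[1] * d
        lp += log_sigmoid(eta) if w else log_sigmoid(-eta)
    return lp


def sample_posterior(diffs, wins, n_steps=3000, step=0.3, seed=0):
    """Random-walk Metropolis over (intercept, slope)."""
    rng = random.Random(seed)
    beta = [0.0, 0.0]
    lp = log_posterior(beta, diffs, wins)
    draws = []
    for _ in range(n_steps):
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        lp_prop = log_posterior(proposal, diffs, wins)
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            beta, lp = proposal, lp_prop
        draws.append(list(beta))
    return draws


def win_probability(draws, diff):
    """Posterior predictive probability that team A wins, averaged over draws."""
    return sum(math.exp(log_sigmoid(b0 + b1 * diff)) for b0, b1 in draws) / len(draws)


# Usage with synthetic games (assumed true slope 1.5, no home advantage).
rng = random.Random(3)
diffs = [rng.gauss(0.0, 1.0) for _ in range(150)]
wins = [1 if rng.random() < 1.0 / (1.0 + math.exp(-1.5 * d)) else 0 for d in diffs]
draws = sample_posterior(diffs, wins)[1000:]
slope = sum(b1 for _, b1 in draws) / len(draws)
print(slope, win_probability(draws, 2.0))
```

Averaging the predicted probability over posterior draws, rather than plugging in a single point estimate, carries the parameter uncertainty into each game prediction.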

10 
Vehicle Tracking in Occlusion and Clutter. McBride, Kurtis, January 2007
Vehicle tracking in environments containing occlusion and clutter is an active research area, and the problem presents a variety of challenges. These include vehicle track initialization, tracking an unknown number of targets, and variations in real-world lighting, scene conditions and camera vantage. Scene clutter and target occlusion present additional challenges. A stochastic framework is proposed that allows vehicle tracks to be identified from a sequence of images. The work focuses on identifying vehicle tracks in transportation scenes, namely vehicle movements at intersections. The framework combines background subtraction and motion-history-based approaches to deal with the segmentation problem. The tracking problem is solved using a Monte Carlo Markov Chain Data Association (MCMCDA) method, which introduces the novel concept of incorporating discrete, independent regions into the MCMC scoring function. Results are presented that show the framework is capable of tracking vehicles in scenes containing multiple vehicles that occlude one another and that are occluded by foreground scene objects.
