11
Applying MCMC methods to multi-level models
Browne, William J. January 1998 (has links)
No description available.
12
A Multi-GPU Compute Solution for Optimized Genomic Selection Analysis
Devore, Trevor 01 June 2014 (has links) (PDF)
Many modern-day bioinformatics algorithms rely heavily on statistical models to analyze their biological data. Some of these statistical models lend themselves nicely to standard high-performance computing optimizations such as parallelism, while others do not. One such method is Markov Chain Monte Carlo (MCMC). In this thesis, we present a heterogeneous compute solution for optimizing GenSel, a genetic selection analysis tool. GenSel utilizes an MCMC algorithm to perform Bayesian inference using Gibbs sampling.
Optimizing an MCMC algorithm is a difficult problem because it is inherently sequential, containing a loop-carried dependence between Markov chain iterations. The optimization presented in this thesis utilizes GPU computing to exploit the data-level parallelism within each of these iterations. In addition, it allows for the efficient management of memory, the pipelining of CUDA kernels, and the use of multiple GPUs. The optimizations presented show performance improvements of up to 1.84 times that of the original algorithm.
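GenSel's internals are not reproduced here, but the structure described above can be sketched in a few lines of Python. The toy Gibbs sampler below (a hypothetical, simplified Bayesian regression, not GenSel's actual model) shows the loop-carried dependence across chain iterations and the vectorised per-marker work inside each iteration, which is the part a GPU implementation would offload.

import numpy as np

def toy_gibbs(y, X, n_iter=1000, seed=0):
    # Toy Gibbs sampler for a Bayesian linear model with per-marker effects.
    # The outer loop is inherently sequential (loop-carried dependence),
    # while the vectorised work inside each iteration is data-parallel.
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)       # marker effects
    sigma2 = 1.0             # residual variance
    tau2 = 1.0               # prior variance of each effect
    xtx = np.einsum("ij,ij->j", X, X)   # per-column sums of squares
    resid = y - X @ beta
    draws = np.empty((n_iter, p))
    for it in range(n_iter):            # sequential Markov chain iterations
        for j in range(p):              # per-marker updates, each vectorised over samples
            resid += X[:, j] * beta[j]
            v = 1.0 / (xtx[j] / sigma2 + 1.0 / tau2)
            m = v * (X[:, j] @ resid) / sigma2
            beta[j] = rng.normal(m, np.sqrt(v))
            resid -= X[:, j] * beta[j]
        sigma2 = (resid @ resid) / rng.chisquare(n)   # scaled inverse-chi-square draw
        draws[it] = beta
    return draws

# Hypothetical usage with simulated marker data:
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))
y = X[:, 0] * 0.5 + rng.normal(size=200)
print(toy_gibbs(y, X, n_iter=200).mean(axis=0)[:3])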
13
On Stochastic Volatility Models as an Alternative to GARCH Type Models
Nilsson, Oscar January 2016 (links)
For the purpose of modelling and predicting volatility, the family of Stochastic Volatility (SV) models is an alternative to the extensively used ARCH-type models. SV models differ in their assumption that volatility itself follows a latent stochastic process. This reformulation of the volatility process, however, makes model estimation distinctly more complicated for SV-type models, which in this paper is conducted through Markov Chain Monte Carlo methods. The aim of this paper is to assess the standard SV model and the SV model with t-distributed errors, and to compare the results with their corresponding GARCH(1,1) counterparts. The data examined cover daily closing prices of the Swedish stock index OMXS30 for the period 2010-01-05 to 2016-03-02. The evaluation shows that both SV models outperform the two GARCH(1,1) models, with the SV model with t-distributed errors giving the smallest forecast errors.
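For reference, the two model families being compared are usually written as follows (a standard parameterisation, shown here for orientation; the thesis's exact specification may differ). In the SV model the log-variance h_t is itself a latent AR(1) process, whereas in GARCH(1,1) the conditional variance is a deterministic function of past observations:

y_t = \exp(h_t/2)\,\varepsilon_t, \qquad h_t = \mu + \phi\,(h_{t-1}-\mu) + \sigma_\eta\,\eta_t, \qquad \varepsilon_t,\ \eta_t \sim \mathcal{N}(0,1),

y_t = \sigma_t\,\varepsilon_t, \qquad \sigma_t^2 = \omega + \alpha\,y_{t-1}^2 + \beta\,\sigma_{t-1}^2.

The t-distributed-error variant replaces the Gaussian \varepsilon_t in the return equation with a Student-t innovation.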
14
A statistical model for locating regulatory regions in novel DNA sequences
Byng, Martyn Charles January 2001 (links)
No description available.
15
Methods for Bayesian inversion of seismic data
Walker, Matthew James January 2015 (links)
The purpose of Bayesian seismic inversion is to combine information derived from seismic data and prior geological knowledge to determine a posterior probability distribution over parameters describing the elastic and geological properties of the subsurface. Typically the subsurface is modelled by a cellular grid model containing thousands or millions of cells within which these parameters are to be determined. Such inversions are therefore computationally expensive, because the size of the parameter space over which the posterior is to be determined is proportional to the number of grid cells. In practice, approximations to Bayesian seismic inversion must therefore be considered. A particular, existing approximate workflow is described in this thesis: the so-called two-stage inversion method explicitly splits the inversion problem into elastic and geological inversion stages. These two stages sequentially estimate the elastic parameters given the seismic data, and then the geological parameters given the elastic parameter estimates, respectively. In this thesis a number of methodologies are developed which enhance the accuracy of this approximate workflow. To reduce computational cost, existing elastic inversion methods often incorporate only simplified prior information about the elastic parameters. A method is therefore introduced which transforms such results, obtained using prior information specified with only two-point geostatistics, into new estimates containing sophisticated multi-point geostatistical prior information. The method uses a so-called deep neural network, trained using only synthetic instances (or 'examples') of these two estimates, to apply this transformation. The method is shown to improve the resolution and accuracy (by comparison to well measurements) of elastic parameter estimates determined for a real hydrocarbon reservoir. It has been shown previously that so-called mixture density network (MDN) inversion can be used to solve geological inversion analytically (and thus very rapidly and efficiently), but only under certain assumptions about the geological prior distribution. A so-called prior replacement operation is developed here which can be used to relax these requirements. It permits the efficient MDN method to be incorporated into general stochastic geological inversion methods which are free from the restrictive assumptions. Such methods rely on Markov-chain Monte-Carlo (MCMC) sampling, which estimates the posterior (over the geological parameters) by producing a correlated chain of samples from it. It is shown that this approach can yield biased estimates of the posterior. An alternative method which obtains a set of non-correlated samples from the posterior is therefore developed, avoiding the possibility of bias in the estimate. The new method was tested on a synthetic geological inversion problem; its results compared favourably to those of Gibbs sampling (an MCMC method) on the same problem, which exhibited very significant bias. The geological prior information used in seismic inversion can be derived from real images which bear similarity to the geology anticipated within the target region of the subsurface. Such so-called training images, from which this information (in the form of geostatistics) may be extracted, are not always available. In this case appropriate training images may be generated by geological experts; however, this process can be costly and difficult.
An elicitation method (based on a genetic algorithm) is therefore developed here which obtains the appropriate geostatistics reliably and directly from a geological expert, without the need for training images. Twelve experts were asked to use the algorithm (individually) to determine the appropriate geostatistics for a physical (target) geological image. The majority of the experts were able to obtain a set of geostatistics consistent with the true (measured) statistics of the target image.
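In outline (notation introduced here for illustration, not drawn from the thesis), the inversion targets the posterior over subsurface parameters m given seismic data d, and the two-stage workflow splits m into elastic parameters e and geological parameters g:

p(\mathbf{m} \mid \mathbf{d}) \;\propto\; p(\mathbf{d} \mid \mathbf{m})\, p(\mathbf{m}),

p(\mathbf{g} \mid \mathbf{d}) \;\approx\; \int p(\mathbf{g} \mid \mathbf{e})\, p(\mathbf{e} \mid \mathbf{d})\, \mathrm{d}\mathbf{e},

with the second stage in practice often evaluated at a point estimate of e obtained in the first stage rather than by integrating over the full elastic posterior.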
16
A Bayesian approach to phylogenetic networks
Radice, Rosalba January 2011 (links)
Traditional phylogenetic inference assumes that the history of a set of taxa can be explained by a tree. This assumption is often violated, as some biological entities can exchange genetic material, giving rise to non-treelike events often called reticulations. Failure to consider these events might result in incorrectly inferred phylogenies, with further consequences, for example stagnant and less targeted drug development. Phylogenetic networks provide a flexible tool which allows us to model the evolutionary history of a set of organisms in the presence of reticulation events. In recent years, a number of methods addressing phylogenetic network reconstruction and evaluation have been introduced. One such method has been proposed by Moret et al. (2004). They defined a phylogenetic network as a directed acyclic graph obtained by positing a set of edges between pairs of the branches of an underlying tree to model reticulation events. Recently, two works by Jin et al. (2006) and Snir and Tuller (2009), respectively, using this definition of phylogenetic network, have appeared. Both works demonstrate the potential of using maximum likelihood estimation for phylogenetic network reconstruction. We propose a Bayesian approach to the estimation of phylogenetic network parameters. We allow for different phylogenies to be inferred at different parts of our DNA alignment in the presence of reticulation events, at the species level, by using the idea that a phylogenetic network can be naturally decomposed into trees. A Markov chain Monte Carlo algorithm is provided for posterior computation of the phylogenetic network parameters. A more general algorithm is also proposed which allows the data to dictate how many phylogenies are required to explain the data; this is achieved by using stochastic search variable selection. Both algorithms are tested on simulated data and also demonstrated on the ribosomal protein gene rps11 data from five flowering plants. The proposed approach can be applied to a wide variety of problems which aim at exploring the possibility of reticulation events in the history of a set of taxa.
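One way to make the "decomposed into trees" idea concrete (notation assumed here for illustration, not necessarily the thesis's exact formulation) is to model the likelihood of each region of the alignment as a mixture over the trees induced by the network:

P(D \mid N, \theta) \;=\; \prod_{b=1}^{B} \sum_{k=1}^{K} w_k \, P(D_b \mid T_k, \theta), \qquad \sum_{k=1}^{K} w_k = 1,

where D_b denotes the b-th block of the alignment, T_1, ..., T_K are the trees obtained from the network N, and the weights w_k (and how many trees are actually needed) are what the stochastic search variable selection step explores.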
17
A Bayesian Analysis of a Multiple Choice Test
Luo, Zhisui 24 April 2013 (links)
In a multiple choice test, examinees gain points based on how many correct responses they give. In this traditional grading, however, it is assumed that the questions in the test are replications of each other. We apply an item response theory (IRT) model, which characterizes each item by its own features, to estimate students' abilities on a midterm test. Our Bayesian logistic item response theory model describes the relation between the probability of a correct response and three parameters: one measures the student's ability, and the other two measure an item's difficulty and its discriminatory power. In this model the ability and the discrimination parameters are not identifiable. To address this issue, we construct a hierarchical Bayesian model to nullify the effects of non-identifiability. A Gibbs sampler is used to make inference and to obtain posterior distributions of the three parameters. For a "nonparametric" approach, we implement the item response theory model using a Dirichlet process mixture model. This new approach enables us to grade and cluster students based on their "ability" automatically. Although the Dirichlet process mixture model has very good clustering properties, it suffers from expensive and complicated computations; a slice sampling algorithm has been proposed to address this issue. We apply our methodology to a real dataset obtained from a multiple choice test in WPI's Applied Statistics I (Spring 2012), which illustrates how a student's ability relates to the observed scores.
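The logistic IRT model with one ability and two item parameters is commonly written in the two-parameter logistic form below (a standard parameterisation, shown for orientation; the thesis's exact form may differ):

P(y_{ij} = 1 \mid \theta_i, a_j, b_j) \;=\; \frac{\exp\{a_j(\theta_i - b_j)\}}{1 + \exp\{a_j(\theta_i - b_j)\}},

where \theta_i is student i's ability, b_j is item j's difficulty, and a_j is its discrimination. Multiplying every a_j by a constant while dividing every (\theta_i - b_j) by the same constant leaves these probabilities unchanged, which is the kind of non-identifiability the hierarchical priors are meant to resolve.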
18
Deterioration model for ports in the Republic of Korea using Markov chain Monte Carlo with multiple imputation
Jeon, Juncheol January 2019 (links)
The condition of infrastructure deteriorates over time as it ages. A deterioration model predicts how and when facilities will deteriorate, and it is a crucial element of most infrastructure management systems. Using a deterioration model, it is possible to estimate when repairs will be carried out, how much will be needed for the maintenance of the entire stock of facilities, and what maintenance costs will be required during the life cycle of a facility. However, the study of deterioration models for the civil infrastructure of ports is still in its infancy; in particular, there is almost no related research in South Korea. This study therefore aims to develop a deterioration model for the civil infrastructure of ports in South Korea. Various approaches, such as deterministic, stochastic, and artificial intelligence methods, can be used to develop deterioration models. In this research, a Markov model based on Markov chain theory, one of the stochastic methods, is used to develop the deterioration model for ports in South Korea. A Markov chain is a probabilistic process over a set of states: transitions between states occur according to probabilities known as transition probabilities. The key step in developing a Markov model is to estimate these transition probabilities, a process called calibration. In this study, the existing calibration methods, optimization and Markov Chain Monte Carlo (MCMC), are reviewed, and improvements to both are presented. In addition, only a small amount of data is available in this study, which can distort the model, so supplementary techniques are presented to compensate for the small sample size. To address the shortcomings of the existing methods and the lack of data, deterioration models developed with four calibration methods are finally proposed: optimization, optimization with bootstrap, MCMC, and MCMC with multiple imputation. The four models are compared and the best-performing model is identified. This research provides a deterioration model for ports in South Korea together with a more accurate calibration technique, and combines methods for supplementing insufficient data with the existing calibration techniques.
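A minimal sketch of the Markov deterioration idea follows, in Python. The condition states and annual transition probabilities are hypothetical illustration values, not the thesis's calibrated results; calibration is precisely the problem of estimating the matrix P from inspection data.

import numpy as np

# Condition states 1 (best) to 4 (worst). P[i, j] is the assumed annual
# probability of moving from state i+1 to state j+1 (hypothetical values).
P = np.array([
    [0.85, 0.15, 0.00, 0.00],
    [0.00, 0.80, 0.20, 0.00],
    [0.00, 0.00, 0.75, 0.25],
    [0.00, 0.00, 0.00, 1.00],   # the worst state is absorbing until repair
])

state = np.array([1.0, 0.0, 0.0, 0.0])   # a new facility starts in state 1
for year in range(1, 31):
    state = state @ P                    # propagate the state distribution one year ahead
    if year % 10 == 0:
        expected = state @ np.array([1, 2, 3, 4])
        print(f"year {year:2d}: state probabilities {np.round(state, 3)}, "
              f"expected condition {expected:.2f}")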
19
Bayesian Models for Repeated Measures Data Using Markov Chain Monte Carlo Methods
Li, Yuanzhi 01 May 2016 (links)
Bayesian models for repeated measures data are fitted to three different data analysis projects. Markov Chain Monte Carlo (MCMC) methodology is applied in each case, with Gibbs sampling and/or an adaptive Metropolis-Hastings (MH) algorithm used to simulate the posterior distribution of parameters. We fit a Bayesian model with different variance-covariance structures to an audit fee data set. Block structures and linear models for variances are used to examine the linear trend and the different behaviors before and after the regulatory change during 2004-2005. We propose a Bayesian hierarchical model with latent teacher effects to determine whether teacher professional development (PD) utilizing cyber-enabled resources leads to meaningful student learning outcomes, measured by 8th grade student end-of-year scores (CRT scores) for students whose teachers underwent PD. Bayesian variable selection methods are applied to select teacher learning instrument variables to predict teacher effects. We fit a Bayesian two-part model, with a multivariate probit model as the first part and a log-normal regression as the second part, to a repeated measures health care data set to analyze the relationship between Body Mass Index (BMI) and health care expenditures, and the correlation between the probability of expenditures and the dollar amount spent given that expenditures occur. Models were fitted to a training set and predictions were made on both the training set and the test set.
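The adaptive Metropolis-Hastings ingredient can be illustrated with a short, generic sketch: a random-walk sampler whose step size is tuned toward a target acceptance rate. This is illustrative only and not the thesis's exact sampler; the models and data above are not reproduced here.

import numpy as np

def adaptive_rw_mh(log_post, x0, n_iter=5000, target_accept=0.44, seed=0):
    # Random-walk Metropolis-Hastings whose proposal scale adapts toward
    # a target acceptance rate (a common, generic form of adaptive MH).
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    log_p = log_post(x)
    step = 0.1
    draws = np.empty((n_iter, x.size))
    for i in range(n_iter):
        prop = x + step * rng.standard_normal(x.size)
        log_p_prop = log_post(prop)
        accept = np.log(rng.uniform()) < log_p_prop - log_p
        if accept:
            x, log_p = prop, log_p_prop
        # Robbins-Monro style adaptation of the step size
        step *= np.exp((float(accept) - target_accept) / np.sqrt(i + 1))
        draws[i] = x
    return draws

# Hypothetical usage: sample a correlated bivariate normal "posterior".
prec = np.linalg.inv(np.array([[1.0, 0.8], [0.8, 1.0]]))
draws = adaptive_rw_mh(lambda x: -0.5 * x @ prec @ x, x0=np.zeros(2))
print("posterior mean estimate:", draws[2500:].mean(axis=0))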
20
Bayesian Inference for Stochastic Volatility Models
Men, Zhongxian January 2012 (links)
Stochastic volatility (SV) models provide a natural framework for representing time series of financial asset returns. As a result, they have become increasingly popular in the finance literature, although they have also been applied in other fields such as signal processing, telecommunications, engineering, and biology.
In working with SV models, an important issue is how to estimate their parameters efficiently and to assess how well they fit real data. Commonly used estimation methods for SV models in the literature include the generalized method of moments, simulated maximum likelihood, quasi-maximum likelihood, and Markov Chain Monte Carlo (MCMC) methods. Among these approaches, MCMC methods are the most flexible in dealing with the complicated structure of the models. However, because of the difficulty of selecting the proposal distribution for Metropolis-Hastings methods, they are in general not easy to implement, and in some cases convergence problems may also be encountered at the implementation stage. In light of these concerns, we propose in this thesis new estimation methods for univariate and multivariate SV models. In the simulation of the latent states of the heavy-tailed SV models, we recommend the slice sampler algorithm as the main tool to sample the proposal distribution when the Metropolis-Hastings method is applied. For the SV models without heavy tails, a simple Metropolis-Hastings method is developed for simulating the latent states. Since the slice sampler can adapt to the analytical structure of the underlying density, it is more efficient: a sample point can be obtained from the target distribution within a few iterations of the sampler, whereas in the original Metropolis-Hastings method many sampled values often need to be discarded.
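The single-variable slice sampler referred to above can be sketched generically as follows (stepping-out and shrinkage as in Neal, 2003). The thesis applies it to the latent log-volatilities, which are not reproduced here, so a Student-t density stands in as the target.

import numpy as np

def slice_sample(log_f, x0, n_iter=2000, w=1.0, seed=0):
    # Univariate slice sampler with stepping-out and shrinkage (Neal, 2003).
    rng = np.random.default_rng(seed)
    x = float(x0)
    draws = np.empty(n_iter)
    for i in range(n_iter):
        log_y = log_f(x) + np.log(rng.uniform())   # auxiliary "height" under the density
        L = x - w * rng.uniform()                  # step out to bracket the slice
        R = L + w
        while log_f(L) > log_y:
            L -= w
        while log_f(R) > log_y:
            R += w
        while True:                                # shrink until a point lies in the slice
            x_new = rng.uniform(L, R)
            if log_f(x_new) > log_y:
                x = x_new
                break
            if x_new < x:
                L = x_new
            else:
                R = x_new
        draws[i] = x
    return draws

# Hypothetical usage: a Student-t target stands in for a heavy-tailed
# conditional density of a latent log-volatility.
nu = 5.0
samples = slice_sample(lambda x: -0.5 * (nu + 1.0) * np.log1p(x * x / nu), x0=0.0)
print("sample mean and standard deviation:", samples.mean(), samples.std())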
In the analysis of multivariate time series, multivariate SV models with more general specifications have been proposed to capture the correlations between the innovations of the asset returns and those of the latent volatility processes. Due to restrictions on the variance-covariance matrix of the innovation vectors, the estimation of the multivariate SV (MSV) model is challenging. To tackle this issue, for a very general setting of an MSV model we propose a straightforward MCMC method in which a Metropolis-Hastings step is employed to sample the constrained variance-covariance matrix, with an inverse Wishart distribution as the proposal distribution. Again, the log-volatilities of the asset returns can then be simulated via a single-move slice sampler.
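A generic sketch of such a Metropolis-Hastings update with an inverse-Wishart proposal is shown below. It is illustrative only: log_prior is a placeholder for whatever constraints or prior the MSV model imposes on the matrix, and the Gaussian likelihood over residuals is an assumption made here for the example, not the thesis's exact target.

import numpy as np
from scipy.stats import invwishart, multivariate_normal

def mh_cov_step(Sigma, resid, log_prior, df=50, rng=None):
    # One Metropolis-Hastings update of a covariance matrix using an
    # inverse-Wishart proposal centred at the current value.
    rng = np.random.default_rng() if rng is None else rng
    d = Sigma.shape[0]

    def log_lik(S):
        # Gaussian innovations with covariance S; resid holds the residuals.
        return multivariate_normal(mean=np.zeros(d), cov=S).logpdf(resid).sum()

    scale_cur = (df - d - 1) * Sigma                # so that E[proposal] = Sigma
    prop = invwishart.rvs(df=df, scale=scale_cur, random_state=rng)

    # The move is not symmetric, so both proposal densities enter the ratio.
    log_q_fwd = invwishart.logpdf(prop, df=df, scale=scale_cur)
    log_q_rev = invwishart.logpdf(Sigma, df=df, scale=(df - d - 1) * prop)

    log_alpha = (log_lik(prop) + log_prior(prop) + log_q_rev
                 - log_lik(Sigma) - log_prior(Sigma) - log_q_fwd)
    return prop if np.log(rng.uniform()) < log_alpha else Sigma

# Hypothetical usage inside an MCMC sweep, with resid the current innovation
# residuals of the return equations:
#   Sigma = mh_cov_step(Sigma, resid, log_prior=lambda S: 0.0)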
Recently, factor SV models have been proposed to extract hidden market changes. Geweke and Zhou (1996) propose a factor SV model based on factor analysis to measure pricing errors in the context of the arbitrage pricing theory, letting the factors follow the univariate standard normal distribution. Modifications of this model have been proposed, among others, by Pitt and Shephard (1999a) and Jacquier et al. (1999). The main feature of these factor SV models is that the factors follow univariate SV processes, while the loading matrix is a lower triangular matrix with unit entries on the main diagonal. Although the factor SV models have been successful in practice, it has been recognized that the ordering of the components may affect the sample likelihood and the selection of the factors. Therefore, in applications, the component order has to be considered carefully; for instance, the factor SV model should be fitted to several permutations of the data to check whether the ordering affects the estimation results. In this thesis, a new factor SV model is proposed. Instead of setting the loading matrix to be lower triangular, we set it to be column-orthogonal and assume that each column has unit length. Our method removes the permutation problem, since the model does not need to be refitted when the order is changed. Because a strong assumption is imposed on the loading matrix, the estimation appears even harder than for the previous factor models; for example, we have to sample the columns of the loading matrix while keeping them orthonormal. To tackle this issue, we use the Metropolis-Hastings method to sample the loading matrix one column at a time, while the orthonormality between the columns is maintained using the technique proposed by Hoff (2007): a vector is drawn from a von Mises-Fisher distribution and accepted or rejected through the Metropolis-Hastings algorithm.
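The geometric constraint being maintained can be illustrated with a small sketch: a proposed replacement for one column is projected onto the orthogonal complement of the remaining columns and normalised, so the loading matrix stays column-orthonormal. This is illustrative only; a faithful implementation would draw the column from a von Mises-Fisher distribution on the appropriate subspace as in Hoff (2007) and account for the proposal density in the Metropolis-Hastings acceptance ratio.

import numpy as np

def propose_orthonormal_column(B, j, kappa=50.0, rng=None):
    # Propose a replacement for column j of a column-orthonormal matrix B,
    # keeping unit length and orthogonality to the other columns.
    rng = np.random.default_rng() if rng is None else rng
    p, k = B.shape
    others = np.delete(B, j, axis=1)             # the remaining k-1 columns
    # A direction concentrated around the current column (kappa controls spread)
    v = kappa * B[:, j] + rng.standard_normal(p)
    # Project onto the orthogonal complement of the other columns ...
    v = v - others @ (others.T @ v)
    # ... and normalise to unit length
    return v / np.linalg.norm(v)

# Hypothetical check: the proposal preserves column orthonormality.
rng = np.random.default_rng(1)
B, _ = np.linalg.qr(rng.standard_normal((6, 2)))  # a random 6 x 2 orthonormal matrix
B[:, 0] = propose_orthonormal_column(B, 0, rng=rng)
print(np.round(B.T @ B, 6))                       # approximately the identity matrix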
Simulation studies and applications to real data are conducted to examine our inference methods and to test the fit of our models. Empirical evidence illustrates that our slice sampler within MCMC methods works well in terms of parameter estimation and volatility forecasting. Examples using financial asset return data are provided to demonstrate that the proposed factor SV model is able to characterize the hidden market factors that mainly govern the financial time series. Kolmogorov-Smirnov tests conducted on the estimated models indicate that the models do a reasonable job of describing real data.