Global ETD Search

1	Uncertainty Analysis in Upscaling Well Log data By Markov Chain Monte Carlo Method Hwang, Kyubum 16 January 2010 More difficulties are now expected in exploring economically valuable reservoirs because most reservoirs have already been developed since seismic exploration of the subsurface began. In order to efficiently analyze heterogeneous fine-scale properties in subsurface layers, one ongoing challenge is accurately upscaling fine-scale (high frequency) logging measurements to coarse-scale data, such as surface seismic images. In addition, numerically efficient modeling cannot use models defined on the scale of log data. At this point, we need an upscaling method that replaces the small-scale data with simple large-scale models. However, numerous unavoidable uncertainties still exist in the upscaling process, and these problems have been an important emphasis in geophysics for years. Upscaling processes involve both predictable and unpredictable uncertainties, arising from, for example, the averaging method, the upscaling algorithm, and the analysis of results. To minimize the uncertainties, a Bayesian framework could be a useful tool for providing the posterior information to give a better estimate for a chosen model with a conditional probability. In addition, the likelihood of a Bayesian framework plays an important role in quantifying misfits between the measured data and the calculated parameters. Therefore, Bayesian methodology can provide a good solution for quantification of uncertainties in upscaling. When analyzing many uncertainties in porosities, wave velocities, densities, and thicknesses of rocks through upscaling well log data, the Markov Chain Monte Carlo (MCMC) method is a potentially beneficial tool that uses randomly generated parameters with a Bayesian framework producing the posterior information. 
In addition, the method provides reliable model parameters to estimate economic values of hydrocarbon reservoirs, even though log data include numerous unknown factors due to geological heterogeneity. In this thesis, fine layered well log data from the North Sea were selected with a depth range of 1600m to 1740m for upscaling using an MCMC implementation. The results allow us to automatically identify important depths where interfaces should be located, along with quantitative estimates of uncertainty in data. Specifically, interfaces in the example are required near depths of 1,650m, 1,695m, 1,710m, and 1,725m. Therefore, the number and location of blocked layers can be effectively quantified in spite of uncertainties in upscaling log data.
2	A comparison of Bayesian model selection based on MCMC with an application to GARCH-type models Miazhynskaia, Tatiana, Frühwirth-Schnatter, Sylvia, Dorffner, Georg January 2003 This paper presents a comprehensive review and comparison of five computational methods for Bayesian model selection, based on MCMC simulations from posterior model parameter distributions. We apply these methods to a well-known and important class of models in financial time series analysis, namely GARCH and GARCH-t models for conditional return distributions (assuming normal and t-distributions). We compare their performance vis-à-vis the more common maximum likelihood-based model selection on both simulated and real market data. All five MCMC methods proved feasible in both cases, although differing in their computational demands. Results on simulated data show that for large degrees of freedom (where the t-distribution becomes more similar to a normal one), Bayesian model selection results in better decisions in favour of the true model than maximum likelihood. Results on market data show the feasibility of all model selection methods, mainly because the distributions appear to be decisively non-Gaussian. / Series: Report Series SFB "Adaptive Information Systems and Modelling in Economics and Management Science"
3	Classification of phylogenetic data via Bayesian mixture modelling Loza Reyes, Elisa January 2010 Conventional probabilistic models for phylogenetic inference assume that an evolutionary tree, and a single set of branch lengths and stochastic process of DNA evolution are sufficient to characterise the generating process across an entire DNA alignment. Unfortunately such a simplistic, homogeneous formulation may be a poor description of reality when the data arise from heterogeneous processes. A well-known example is when sites evolve at heterogeneous rates. This thesis is a contribution to the modelling and understanding of heterogeneity in phylogenetic data. We propose a method for the classification of DNA sites based on Bayesian mixture modelling. Our method not only accounts for heterogeneous data but also identifies the underlying classes and enables their interpretation. We also introduce novel MCMC methodology with the same, or greater, estimation performance than existing algorithms but with lower computational cost. We find that our mixture model can successfully detect evolutionary heterogeneity and demonstrate its direct relevance by applying it to real DNA data. One of these applications is the analysis of sixteen strains of one of the bacterial species that cause Lyme disease. Results from that analysis have helped in understanding the evolutionary paths of these bacterial strains and, therefore, the dynamics of the spread of Lyme disease. Our method is discussed in the context of DNA but it may be extended to other types of molecular data. Moreover, the classification scheme that we propose is evidence of the breadth of application of mixture modelling and a step forwards in the search for more realistic models of the processes that underlie phylogenetic data. 519
4	Ideology and interests : a hierarchical Bayesian approach to spatial party preferences Mohanty, Peter Cushner 04 December 2013 This paper presents a spatial utility model of support for multiple political parties. The model includes a "valence" term, which I reparameterize to include both party competence and the voters' key sociodemographic concerns. The paper shows how this spatial utility model can be interpreted as a hierarchical model using data from the 2009 European Elections Study. I estimate this model via Bayesian Markov Chain Monte Carlo (MCMC) using a block Gibbs sampler and show that the model can capture broad European-wide trends while allowing for significant amounts of heterogeneity. This approach, however, which assumes a normal dependent variable, is only able to partially reproduce the data generating process. I show that the data generating process can be reproduced more accurately with an ordered probit model. Finally, I discuss trade-offs between parsimony and descriptive richness and other practical challenges that may be encountered when building models of party support and make recommendations for capturing the best of both approaches. / text Hierarchical models Generalized linear models Markov Chain Monte Carlo (MCMC) Public opinion Political parties European Union
5	Bayesian Analysis for Large Spatial Data Park, Jincheol August 2012 The Gaussian geostatistical model has been widely used in Bayesian modeling of spatial data. A core difficulty for this model is inverting the n x n covariance matrix, where n is the sample size. The computational complexity of matrix inversion increases as O(n^3). This difficulty arises in almost all statistical inference approaches for the model, such as Kriging and Bayesian modeling. In Bayesian inference, the inverse of the covariance matrix needs to be evaluated at each iteration in posterior simulations, so the Bayesian approach is infeasible for large sample size n given current limits on computational power. In this dissertation, we propose two approaches to address this computational issue, namely, the auxiliary lattice model (ALM) approach and the Bayesian site selection (BSS) approach. The key feature of ALM is to introduce a latent regular lattice which links a Gaussian Markov Random Field (GMRF) with the Gaussian Field (GF) of the observations. The GMRF on the auxiliary lattice represents an approximation to the Gaussian process. What distinguishes ALM from other approximations is that it completely avoids the problem of matrix inversion by using the analytical likelihood of the GMRF. The computational complexity of ALM is rather attractive, as it increases linearly with sample size. The second approach, Bayesian site selection (BSS), attempts to reduce the dimension of the data through a smart selection of a representative subset of the observations. The BSS method first splits the observations into two parts, the observations near the target prediction sites (part I) and the remaining observations (part II). Then, by treating the observations in part I as the response variable and those in part II as explanatory variables, BSS forms a regression model which relates all observations through a conditional likelihood derived from the original model. 
The dimension of the data can then be reduced by applying a stochastic variable selection procedure to the regression model, which selects only a subset of the part II data as explanatory data. BSS can give us a better understanding of the underlying true Gaussian process, as it works directly on the original process without any approximations involved. The practical performance of ALM and BSS will be illustrated with simulated data and real data sets. Gaussian Random Field Kriging Markov Chain Monte Carlo (MCMC) Markov Random Field Matrix Inversion Spatial Data
6	Multiple imputation for marginal and mixed models in longitudinal data with informative missingness Deng, Wei 07 October 2005 No description available. Generalized estimating equations non-ignorable missingness dropout Markov chain Monte Carlo (MCMC) Gibbs' sampler
7	Computational Modeling for Differential Analysis of RNA-seq and Methylation data Wang, Xiao 16 August 2016 Computational systems biology is an inter-disciplinary field that aims to develop computational approaches for a system-level understanding of biological systems. Advances in high-throughput biotechnology offer broad scope and high resolution in multiple disciplines. However, it is still a major challenge to extract biologically meaningful information from the overwhelming amount of data generated from biological systems. Effective computational approaches are urgently needed to reveal the functional components. Thus, in this dissertation work, we aim to develop computational approaches for differential analysis of RNA-seq and methylation data to detect aberrant events associated with cancers. We develop a novel Bayesian approach, BayesIso, to identify differentially expressed isoforms from RNA-seq data. BayesIso features a joint model of the variability of RNA-seq data and the differential state of isoforms. BayesIso not only accounts for the variability of RNA-seq data but also incorporates the differential states of isoforms as hidden variables for differential analysis. The differential states of isoforms are estimated jointly with other model parameters through a sampling process, providing improved performance in detecting isoforms that are less differentially expressed. We propose to develop a novel probabilistic approach, DM-BLD, in a Bayesian framework to identify differentially methylated genes. The DM-BLD approach features a hierarchical model, built upon Markov random field models, to capture both the local dependency of measured loci and the dependency of methylation change. A Gibbs sampling procedure is designed to estimate the posterior distribution of the methylation change of CpG sites. 
Then, the differential methylation score of a gene is calculated from the estimated methylation changes of the involved CpG sites and the significance of genes is assessed by permutation-based statistical tests. We have demonstrated the advantage of the proposed Bayesian approaches over conventional methods for differential analysis of RNA-seq data and methylation data. The joint estimation of the posterior distributions of the variables and model parameters using a sampling procedure has demonstrated an advantage in detecting isoforms or genes that are less differentially expressed or methylated. The applications to breast cancer data shed light on the molecular mechanisms underlying breast cancer recurrence, with the aim of identifying new molecular targets for breast cancer treatment. / Ph. D. Differential Analysis Bayesian Modeling Markov Random Field RNA-seq Data Analysis Markov Chain Monte Carlo (MCMC)
8	Bayesian Alignment Model for Analysis of LC-MS-based Omic Data Tsai, Tsung-Heng 22 May 2014 Liquid chromatography coupled with mass spectrometry (LC-MS) has been widely used in various omic studies for biomarker discovery. Appropriate LC-MS data preprocessing steps are needed to detect true differences between biological groups. Retention time alignment is one of the most important yet challenging preprocessing steps, as it ensures that ion intensity measurements among multiple LC-MS runs are comparable. In this dissertation, we propose a Bayesian alignment model (BAM) for analysis of LC-MS data. BAM uses Markov chain Monte Carlo (MCMC) methods to draw inference on the model parameters and provides estimates of the retention time variability along with uncertainty measures, enabling a natural framework to integrate information from various sources. From methodology development to practical application, we investigate the alignment problem through three research topics: 1) development of a single-profile Bayesian alignment model, 2) development of a multi-profile Bayesian alignment model, and 3) application to biomarker discovery research. Chapter 2 introduces the profile-based Bayesian alignment using a single chromatogram, e.g., the base peak chromatogram from each LC-MS run. The single-profile alignment model improves on existing MCMC-based alignment methods through 1) the implementation of an efficient MCMC sampler using a block Metropolis-Hastings algorithm, and 2) an adaptive mechanism for knot specification using stochastic search variable selection (SSVS). Chapter 3 extends the model to integrate complementary information that better captures the variability in chromatographic separation. We use Gaussian process regression on the internal standards to derive a prior distribution for the mapping functions. In addition, a clustering approach is proposed to identify multiple representative chromatograms for each LC-MS run. 
With the Gaussian process prior, these chromatograms are simultaneously considered in the profile-based alignment, which greatly improves the model estimation and facilitates the subsequent peak matching process. Chapter 4 demonstrates the applicability of the proposed Bayesian alignment model to biomarker discovery research. We integrate the proposed Bayesian alignment model into a rigorous preprocessing pipeline for LC-MS data analysis. Through the developed analysis pipeline, candidate biomarkers for hepatocellular carcinoma (HCC) are identified and confirmed on a complementary platform. / Ph. D. alignment Bayesian inference biomarker discovery Markov chain Monte Carlo (MCMC)
9	Bayesian Modeling for Isoform Identification and Phenotype-specific Transcript Assembly Shi, Xu 24 October 2017 The rapid development of biotechnology has enabled researchers to collect high-throughput data for studying various biological processes at the genomic level, transcriptomic level, and proteomic level. Due to the large noise in the data and the high complexity of diseases (such as cancer), it is a challenging task for researchers to extract biologically meaningful information that can help reveal the underlying molecular mechanisms. These challenges call for more effort in developing efficient and effective computational methods to analyze the data at different levels so as to understand different aspects of the biological systems. In this dissertation research, we have developed novel Bayesian approaches to infer alternative splicing mechanisms in biological systems using RNA sequencing data. Specifically, we focus on two research topics in this dissertation: isoform identification and phenotype-specific transcript assembly. For isoform identification, we develop a computational approach, SparseIso, to jointly model the existence and abundance of isoforms in a Bayesian framework. A spike-and-slab prior is incorporated into the model to enforce the sparsity of expressed isoforms. A Gibbs sampler is developed to sample the existence and abundance of isoforms iteratively. For transcript assembly, we develop a Bayesian approach, IntAPT, to assemble phenotype-specific transcripts from multiple RNA sequencing profiles. A two-layer Bayesian framework is used to model the existence of phenotype-specific transcripts and the transcript abundance in individual samples. Based on the hierarchical Bayesian model, a Gibbs sampling algorithm is developed to estimate the joint posterior distribution for phenotype-specific transcript assembly. 
The performance of our proposed methods is evaluated with simulation data, compared with existing methods and benchmarked with real cell line data. We then apply our methods to breast cancer data to identify biologically meaningful splicing mechanisms associated with breast cancer. In future work, we will extend our methods for de novo transcript assembly to identify novel isoforms in biological systems; we will also incorporate isoform-specific networks into our methods to better understand splicing mechanisms in biological systems. / Ph. D. / The next-generation sequencing technology has significantly improved the resolution of biomedical research at the genomic level and transcriptomic level. Due to the large noise in the data and the high complexity of diseases (such as cancer), it is a challenging task for researchers to extract biologically meaningful information that can help reveal the underlying molecular mechanisms. In this dissertation, we have developed two novel Bayesian approaches to infer alternative splicing mechanisms in biological systems using RNA sequencing data. We have demonstrated the advantages of our proposed approaches over existing methods on both simulation data and real cell line data. Furthermore, the application of our methods to real breast cancer data and glioblastoma tissue data has further shown the efficacy of our methods in real biological applications. Transcriptome Assembly RNA-seq Data Analysis Bayesian Inference Gibbs Sampling Markov Chain Monte Carlo (MCMC)
10	A Bayesian Approach to Estimating Background Flows from a Passive Scalar Krometis, Justin 26 June 2018 We consider the statistical inverse problem of estimating a background flow field (e.g., of air or water) from the partial and noisy observation of a passive scalar (e.g., the concentration of a pollutant). Here the unknown is a vector field that is specified by a large or infinite number of degrees of freedom. We show that the inverse problem is ill-posed, i.e., there may be many or no background flows that match a given set of observations. We therefore adopt a Bayesian approach, incorporating prior knowledge of background flows and models of the observation error to develop probabilistic estimates of the fluid flow. In doing so, we leverage frameworks developed in recent years for infinite-dimensional Bayesian inference. We provide conditions under which the inference is consistent, i.e., the posterior measure converges to a Dirac measure on the true background flow as the number of observations of the solute concentration grows large. We also define several computationally-efficient algorithms adapted to the problem. One is an adjoint method for computation of the gradient of the log likelihood, a key ingredient in many numerical methods. A second is a particle method that allows direct computation of point observations of the solute concentration, leveraging the structure of the inverse problem to avoid approximation of the full infinite-dimensional scalar field. Finally, we identify two interesting example problems with very different posterior structures, which we use to conduct a large-scale benchmark of the convergence of several Markov Chain Monte Carlo methods that have been developed in recent years for infinite-dimensional settings. / Ph. D. / We consider the problem of estimating a fluid flow (e.g., of air or water) from partial and noisy observations of the concentration of a solute (e.g., a pollutant) dissolved in the fluid. 
Because of observational noise, and because there are cases where the fluid flow will not affect the movement of the pollutant, the fluid flow cannot be uniquely determined from the observations. We therefore adopt a statistical (Bayesian) approach, developing probabilistic estimates of the fluid flow using models of observation error and our understanding of the flow before measurements are taken. We provide conditions under which, as the number of observations grows large, the approach is able to identify the fluid flow that generated the observations. We define several efficient algorithms for computing statistics of the fluid flow, one of which involves approximating the movement of individual solute particles to estimate concentrations only where required by the inverse problem. We identify two interesting example problems for which the statistics of the fluid flow are very different. The first case produces an approximately normal distribution. The second example exhibits highly non-Gaussian structure, where several different classes of fluid flows match the data very well. We use these examples to test the functionality and efficiency of several numerical (Markov Chain Monte Carlo) methods developed in recent years to compute the solution to similar problems. Bayesian Statistical Inversion Bayesian Consistency Markov Chain Monte Carlo (MCMC) Passive Scalars Fluid Turbulence