91
Capture-recapture Estimation for Conflict Data and Hierarchical Models for Program Impact Evaluation. Mitchell, Shira Arkin. 07 June 2014.
A relatively recent increase in the popularity of evidence-based activism has created a higher demand for statisticians to work on human rights and economic development projects. The statistical challenges of revealing patterns of violence in armed conflict require efficient use of the data and careful consideration of the implications of modeling decisions on estimates. Impact evaluation of a complex economic development project requires a careful consideration of causality and transparency to donors and beneficiaries. In this dissertation, I compare marginal and conditional models for capture-recapture, and develop new hierarchical models that accommodate challenges in data from the armed conflict in Colombia and, more generally, in many other capture-recapture settings. Additionally, I propose a study design for a non-randomized impact evaluation of the Millennium Villages Project (MVP), to be carried out during my postdoctoral fellowship. The design includes small area estimation of baseline variables, propensity score matching, and hierarchical models for causal inference.
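As a hedged illustration of the basic idea behind capture-recapture (not the hierarchical models developed in the dissertation), the sketch below applies Chapman's two-list estimator to made-up counts from two overlapping victim documentation lists:

```python
import numpy as np

# Hypothetical two-list capture-recapture illustration (Chapman's
# bias-corrected Lincoln-Petersen estimator); the counts are invented.
n1 = 420   # victims recorded on list A
n2 = 310   # victims recorded on list B
m = 95     # victims appearing on both lists

# Estimate of the total population size N and its standard error
N_hat = (n1 + 1) * (n2 + 1) / (m + 1) - 1
se = np.sqrt((n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m) /
             ((m + 1) ** 2 * (m + 2)))
print(f"Estimated total: {N_hat:.0f} (+/- {1.96 * se:.0f})")
```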
92
Bayesian analysis of the complex Bingham distribution. Leu, Richard Hsueh-Yee. 21 February 2011.
While most statistical applications involve real numbers, some demand complex numbers; statistical shape analysis is one such area. The complex Bingham distribution is utilized in the shape analysis of landmark data in two dimensions. Previous analyses of data arising from this distribution involved classical statistical techniques. In this report, a full Bayesian inference was carried out on the posterior distribution of the parameter matrix when data arise from a complex Bingham distribution. We utilized a Markov chain Monte Carlo algorithm to sample the posterior distribution of the parameters: a Metropolis-Hastings algorithm sampled the posterior conditional distribution of the eigenvalues, while a successive conditional Monte Carlo sampler was used to sample the eigenvectors. The method was successfully verified on simulated data, using both flat and informative priors.
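A minimal, generic sketch of the kind of Metropolis-Hastings sampler described above is given below; the target log-posterior, proposal, and step size are illustrative placeholders, not the complex Bingham conditionals used in the report:

```python
import numpy as np

# Random-walk Metropolis-Hastings for an arbitrary log-posterior (toy target).
def metropolis_hastings(log_post, x0, n_iter=5000, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_iter, x.size))
    lp = log_post(x)
    for i in range(n_iter):
        prop = x + step * rng.standard_normal(x.size)   # symmetric proposal
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:        # accept/reject step
            x, lp = prop, lp_prop
        samples[i] = x
    return samples

# Example: a standard normal log-posterior as a stand-in target.
draws = metropolis_hastings(lambda x: -0.5 * np.sum(x ** 2), x0=[0.0, 0.0])
print(draws.mean(axis=0), draws.std(axis=0))
```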
93
Statistical Study of Magnetic Field Reversals in Geodynamo Models and Paleomagnetic Data. Meduri, Domenico Giovanni. 29 October 2014.
No description available.
94
Bayesian Hierarchical Models for Model Choice. Li, Yingbo. January 2013.
With the development of modern data collection approaches, researchers may collect hundreds to millions of variables, yet may not need to utilize all explanatory variables available in predictive models. Hence, choosing models that consist of a subset of variables often becomes a crucial step. In linear regression, variable selection not only reduces model complexity, but also prevents over-fitting. From a Bayesian perspective, prior specification of model parameters plays an important role in model selection as well as parameter estimation, and often prevents over-fitting through shrinkage and model averaging.

We develop two novel hierarchical priors for selection and model averaging, for Generalized Linear Models (GLMs) and normal linear regression, respectively. They can be considered "spike-and-slab" prior distributions or, more appropriately, "spike-and-bell" distributions. Under these priors we achieve dimension reduction, since their point masses at zero allow predictors to be excluded with positive posterior probability. In addition, these hierarchical priors have heavy tails to provide robustness when MLEs are far from zero.

Zellner's g-prior is widely used in linear models. It preserves the correlation structure among predictors in its prior covariance, and yields closed-form marginal likelihoods, which leads to huge computational savings by avoiding sampling in the parameter space. Mixtures of g-priors avoid fixing g in advance, and can resolve consistency problems that arise with fixed g. For GLMs, we show that the mixture of g-priors using a Compound Confluent Hypergeometric distribution unifies existing choices in the literature and maintains their good properties, such as tractable (approximate) marginal likelihoods and asymptotic consistency for model selection and parameter estimation under specific values of the hyperparameters.

While the g-prior is invariant under rotation within a model, a potential problem with the g-prior is that it inherits the instability of ordinary least squares (OLS) estimates when predictors are highly correlated. We build a hierarchical prior based on scale mixtures of independent normals, which incorporates invariance under rotations within models like ridge regression and the g-prior, but has heavy tails like the Zellner-Siow Cauchy prior. We find this method outperforms the gold-standard mixture of g-priors and other methods in the case of highly correlated predictors in Gaussian linear models. We incorporate a non-parametric structure, the Dirichlet Process (DP), as a hyperprior, to allow more flexibility and adaptivity to the data.
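As a hedged sketch of why the g-prior is computationally attractive, the snippet below evaluates the closed-form log Bayes factor of a Gaussian linear model against the null (intercept-only) model under Zellner's g-prior; the simulated data and the choice g = 100 are illustrative assumptions, not material from the dissertation:

```python
import numpy as np

# log BF(M : null) = ((n - p - 1)/2) log(1 + g) - ((n - 1)/2) log(1 + g (1 - R^2))
def log_bayes_factor_gprior(y, X, g):
    n, p = X.shape
    Xc = X - X.mean(axis=0)          # centre predictors; intercept handled implicitly
    yc = y - y.mean()
    beta_hat, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    r2 = 1.0 - np.sum((yc - Xc @ beta_hat) ** 2) / np.sum(yc ** 2)
    return 0.5 * (n - p - 1) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1 - r2))

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
y = X @ np.array([1.0, 0.0, -0.5]) + rng.standard_normal(100)
print(log_bayes_factor_gprior(y, X, g=100))   # evidence for the full model vs the null
```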
95
On Bayesian Analyses of Functional Regression, Correlated Functional Data and Non-homogeneous Computer Models. Montagna, Silvia. January 2013.
Current frontiers in complex stochastic modeling of high-dimensional processes include major emphases on so-called functional data: problems in which the data are snapshots of curves and surfaces representing fundamentally important scientific quantities. This thesis explores new Bayesian methodologies for functional data analysis.

The first part of the thesis places emphasis on the role of factor models in functional data analysis. Data reduction becomes mandatory when dealing with such high-dimensional data, more so when data are available on a large number of individuals. In Chapter 2 we present a novel Bayesian framework which employs a latent factor construction to represent each variable by a low-dimensional summary. Further, we explore the important issue of modeling and analyzing the relationship of functional data with other covariate and outcome variables simultaneously measured on the same subjects.

The second part of the thesis is concerned with the analysis of circadian data. The focus is on the identification of circadian genes, that is, genes whose expression levels appear to be rhythmic through time with a period of approximately 24 hours. While addressing this goal, most of the current literature does not account for the potential dependence across genes. In Chapter 4, we propose a Bayesian approach which employs latent factors to accommodate dependence and verify patterns and relationships between genes, while representation of the true gene expression trajectories in the Fourier domain allows for inference on the period, phase, and amplitude of the signal.

The third part of the thesis is concerned with the statistical analysis of computer models (simulators). The heavy computational demand of these input-output maps calls for statistical techniques that quickly estimate the output surface at untried inputs given a few preliminary runs of the simulator at a set of design points. In this regard, we propose a Bayesian methodology based on a non-stationary Gaussian process. Relying on a model-based assessment of uncertainty, we envision a sequential design technique which helps choose input points where the simulator should be run to minimize the uncertainty in posterior surface estimation in an optimal way. The proposed non-stationary approach adapts well to output surfaces of unconstrained shape.
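A minimal sketch of the standard stationary Gaussian process emulator that such non-stationary approaches build on is shown below; the RBF kernel, design points, and toy simulator are assumptions for illustration only, not the thesis methodology:

```python
import numpy as np

# Noise-free GP regression used as an emulator of an expensive simulator.
def rbf_kernel(a, b, ell=1.0, sf2=1.0):
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return sf2 * np.exp(-0.5 * d2 / ell ** 2)

def gp_emulator_predict(X, y, X_star, jitter=1e-6):
    K = rbf_kernel(X, X) + jitter * np.eye(len(X))
    Ks = rbf_kernel(X_star, X)
    alpha = np.linalg.solve(K, y)
    mean = Ks @ alpha
    var = rbf_kernel(X_star, X_star).diagonal() - np.einsum(
        "ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, var          # predictive variance can guide sequential design

# Toy simulator evaluated at 6 design points in two inputs.
rng = np.random.default_rng(2)
X = rng.uniform(size=(6, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2    # stand-in for an expensive simulator
mean, var = gp_emulator_predict(X, y, rng.uniform(size=(3, 2)))
print(mean, var)
```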
96
Monitoring and Improving Markov Chain Monte Carlo Convergence by Partitioning. VanDerwerken, Douglas. January 2015.
Since Bayes' Theorem was first published in 1763, many have argued for the Bayesian paradigm on purely philosophical grounds. For much of this time, however, practical implementation of Bayesian methods was limited to a relatively small class of "conjugate" or otherwise computationally tractable problems. With the development of Markov chain Monte Carlo (MCMC) and improvements in computers over the last few decades, the number of problems amenable to Bayesian analysis has increased dramatically. The ensuing spread of Bayesian modeling has led to new computational challenges as models become more complex and higher-dimensional, and both parameter sets and data sets become orders of magnitude larger. This dissertation introduces methodological improvements to deal with these challenges. These include methods for enhanced convergence assessment, for parallelization of MCMC, for estimation of the convergence rate, and for estimation of normalizing constants. A recurring theme across these methods is the utilization of one or more chain-dependent partitions of the state space.
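For context, the sketch below computes the familiar Gelman-Rubin potential scale reduction factor across parallel chains, one standard convergence check; it is only the usual baseline, not the partition-based machinery introduced in the dissertation:

```python
import numpy as np

# Gelman-Rubin R-hat for one scalar parameter traced by m parallel chains.
def gelman_rubin(chains):
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)            # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()      # within-chain variance
    var_hat = (n - 1) / n * W + B / n
    return np.sqrt(var_hat / W)                # values near 1 suggest convergence

rng = np.random.default_rng(0)
chains = rng.standard_normal((4, 1000))        # four well-mixed toy chains
print(gelman_rubin(chains))
```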
97
OBSCURATION IN ACTIVE GALACTIC NUCLEI. Nikutta, Robert. 01 January 2012.
All classes of Active Galactic Nuclei (AGN) are fundamentally powered by accretion of gas onto a supermassive black hole. The process converts the potential energy of the infalling matter to X-ray and ultraviolet (UV) radiation, releasing up to several 10^12 solar luminosities.
Observations show that the accreting "central engines" in AGN are surrounded by dusty matter. The dust occupies a "torus" around the AGN which is composed of discrete clumps. If the AGN radiation propagates through the torus on its way to an observer, it is heavily re-processed by the dust, i.e. converted from UV to infrared (IR) wavelengths. Much of the information about the input radiation is lost in this conversion process, while an imprint of the dusty torus is left in the released IR photons.
Our group was the first to formulate a consistent treatment of radiative transfer in a clumpy medium, an important improvement over simpler models with smooth dust distributions previously used by researchers. Our code CLUMPY computes spectral energy distributions (SEDs) for any set of model parameter values. Fitting these models to observed AGN SEDs allows us to determine important quantities, such as the torus size, the spatial distribution of clumps, the torus covering factor, or the intrinsic AGN luminosity. Detailed modeling also permits us to study the complex behavior of certain spectral features.
IR radiative transfer introduces degeneracies to the solution space: different parameter values can yield similar SEDs. The geometry of the torus further exacerbates the problem. Knowing the amount of parameter degeneracy present in our models is important for quantifying the confidence in data fits. When matching the models to observed SEDs we must employ modern statistical methods. In my research I use Bayesian statistics to determine the likely ranges of parameter values. I have developed all tools required for fitting observed SEDs with our large model database: the latest implementation of CLUMPY, the fit algorithms, the Markov Chain Monte Carlo sampler, and the Bayesian estimator. In collaboration with observing groups we have applied our methods to a multitude of real-life AGN.
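A hedged sketch of grid-based Bayesian parameter estimation in this spirit is given below; the Gaussian likelihood, flat prior, and toy "model database" are illustrative assumptions, not the CLUMPY setup or its sampler:

```python
import numpy as np

# Posterior over a discrete grid of precomputed model SEDs given one observed SED.
def posterior_over_grid(obs_flux, obs_err, model_fluxes, log_prior=None):
    # model_fluxes: (n_models, n_wavelengths) array of precomputed SEDs
    chi2 = np.sum(((obs_flux - model_fluxes) / obs_err) ** 2, axis=1)
    log_post = -0.5 * chi2
    if log_prior is not None:
        log_post = log_post + log_prior
    log_post -= log_post.max()          # stabilise the exponentiation
    post = np.exp(log_post)
    return post / post.sum()            # normalised over the model grid

# Toy example: a 3-model "database" observed at 4 wavelengths.
models = np.array([[1.0, 2.0, 3.0, 2.5],
                   [1.2, 1.8, 3.1, 2.4],
                   [0.5, 1.0, 1.5, 1.2]])
obs = np.array([1.1, 1.9, 3.0, 2.45])
err = np.full(4, 0.1)
print(posterior_over_grid(obs, err, models))
```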
98
Computational Systems Biology of Saccharomyces cerevisiae Cell Growth and Division. Mayhew, Michael Benjamin. January 2014.
Cell division and growth are complex processes fundamental to all living organisms. In the budding yeast, Saccharomyces cerevisiae, these two processes are known to be coordinated with one another as a cell's mass must roughly double before division. Moreover, cell-cycle progression is dependent on cell size, with smaller cells at birth generally taking more time in the cell cycle. This dependence is a signature of size control. Systems biology is an emerging field that emphasizes connections or dependencies between biological entities and processes over the characteristics of individual entities. Statistical models provide a quantitative framework for describing and analyzing these dependencies. In this dissertation, I take a statistical systems biology approach to study cell division and growth and the dependencies within and between these two processes, drawing on observations from richly informative microscope images and time-lapse movies. I review the current state of knowledge on these processes, highlighting key results and open questions from the biological literature. I then discuss my development of machine learning and statistical approaches to extract cell-cycle information from microscope images and to better characterize the cell-cycle progression of populations of cells. In addition, I analyze single cells to uncover correlation in cell-cycle progression, evaluate potential models of dependence between growth and division, and revisit classical assertions about budding yeast size control. This dissertation presents a unique perspective and approach towards comprehensive characterization of the coordination between growth and division.
99
Desenvolvimento de interfaces gráficas para estatística Bayesiana aplicada à comparação mista de tratamentos / Graphical User Interface development for Bayesian Statistics applied to Mixed Treatment Comparison. Marcelo Goulart Correia. 12 September 2013.
Advances in the pharmaceutical industry have produced a variety of medications to combat diseases. These drugs have similar therapeutic effects but subtle differences in their biochemical structure, so competition between pharmaceutical companies becomes increasingly fierce. To compare the effectiveness of these drugs, several methodologies have emerged, with the goal of finding which would be the best medication for a given situation. One of the methodologies studied is Mixed Treatment Comparison (MTC), whose objective is to estimate the effectiveness of particular drugs from studies and/or clinical trials that address, even if indirectly, the drugs under consideration. Using this methodology is quite complex, because it requires knowledge of programming languages in statistical environments, in addition to mastery of the methods underlying the technique. The main objective of this study is to create a graphical user interface (GUI) that facilitates the use of MTC for users who have no knowledge of programming languages, and that is open source and cross-platform. The expectation is that such an interface will make the use of more comprehensive and advanced techniques easier, and will also make teaching the topic easier for people who do not yet know the method.
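As a hedged illustration of the simplest building block behind mixed treatment comparison, the sketch below performs a fixed-effect indirect comparison of two treatments through a common comparator (Bucher's method); the effect sizes are made up, not results from this work:

```python
import numpy as np

# Indirect comparison of A vs C through common comparator B (log odds ratios).
d_AB, se_AB = -0.40, 0.15     # A vs B, pooled from one set of trials
d_CB, se_CB = -0.10, 0.20     # C vs B, pooled from another set of trials

d_AC = d_AB - d_CB                          # indirect estimate of A vs C
se_AC = np.sqrt(se_AB ** 2 + se_CB ** 2)    # variances add for the indirect contrast
ci = (d_AC - 1.96 * se_AC, d_AC + 1.96 * se_AC)
print(f"A vs C (indirect): {d_AC:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```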
100
Efficient deterministic approximate Bayesian inference for Gaussian process models. Bui, Thang Duc. January 2018.
Gaussian processes are powerful nonparametric distributions over continuous functions that have become a standard tool in modern probabilistic machine learning. However, the applicability of Gaussian processes in the large-data regime and in hierarchical probabilistic models is severely limited by analytic and computational intractabilities. It is, therefore, important to develop practical approximate inference and learning algorithms that can address these challenges. To this end, this dissertation provides a comprehensive and unifying perspective of pseudo-point based deterministic approximate Bayesian learning for a wide variety of Gaussian process models, which connects previously disparate literature, greatly extends it and allows new state-of-the-art approximations to emerge.

We start by building a posterior approximation framework based on Power-Expectation Propagation for Gaussian process regression and classification. This framework relies on a structured approximate Gaussian process posterior based on a small number of pseudo-points, which are judiciously chosen to summarise the actual data and enable tractable and efficient inference and hyperparameter learning. Many existing sparse approximations are recovered as special cases of this framework, and can now be understood as performing approximate posterior inference using a common approximate posterior. Critically, extensive empirical evidence suggests that new approximation methods arising from this unifying perspective outperform existing approaches in many real-world regression and classification tasks.

We explore the extensions of this framework to Gaussian process state space models, Gaussian process latent variable models and deep Gaussian processes, which also unify many recently developed approximation schemes for these models. Several mean-field and structured approximate posterior families for the hidden variables in these models are studied. We also discuss several methods for approximate uncertainty propagation in recurrent and deep architectures based on Gaussian projection, linearisation, and simple Monte Carlo. The benefits of the unified inference and learning frameworks for these models are illustrated in a variety of real-world state-space modelling and regression tasks.
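A minimal sketch of pseudo-point (inducing-point) Gaussian process regression in the DTC/SoR style is shown below; it only illustrates how a small set of pseudo-points summarises the data, and is not the Power-EP framework developed in the thesis:

```python
import numpy as np

# Sparse GP predictive mean using M pseudo-points instead of all N data points.
def rbf(a, b, ell=1.0, sf2=1.0):
    d = (a[:, None] - b[None, :]) ** 2
    return sf2 * np.exp(-0.5 * d / ell ** 2)

def sparse_gp_mean(x, y, z, x_star, noise=0.1):
    # z: pseudo-inputs summarising the training set (x, y); noise is sigma^2
    Kuu = rbf(z, z) + 1e-8 * np.eye(len(z))
    Kuf = rbf(z, x)
    Ksu = rbf(x_star, z)
    A = Kuu + Kuf @ Kuf.T / noise          # K_uu + sigma^-2 K_uf K_fu
    return Ksu @ np.linalg.solve(A, Kuf @ y) / noise

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + 0.3 * rng.standard_normal(200)
z = np.linspace(0, 10, 15)                  # 15 pseudo-points summarise 200 observations
x_star = np.linspace(0, 10, 5)
print(sparse_gp_mean(x, y, z, x_star, noise=0.09))
```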