71 |
Seed Dispersal, Gene Flow, and Hybridization in Red Oak
Moran, Emily Victoria, January 2010
<p>Understanding the ecological and evolutionary responses of plant species to shifts in climate (and other rapid environmental perturbations) will require an improved knowledge of interactions between ecological and evolutionary processes as mediated by reproduction and gene flow. This dissertation research examines the processes of seed dispersal, intra- and inter-specific gene flow, and reproductive success in two red oak populations in North Carolina; the variation in these processes from site to site; and their influence on genetic structure, population dynamics, and migration potential.</p>
<p>Using genetic and ecological data collected from two large long-term study sites, I develop a hierarchical Bayesian model to identify the parents of sampled seedlings and characterize the scale of effective seed and pollen dispersal. I examine differences in scale of dispersal between the Appalachian and Piedmont sites in light of the spatial genetic structure and ecological differences of the two sites. I then use the pedigree and dispersal estimates derived from these analyses to examine variation in reproductive success and to test hypotheses about the causes and consequences of such variation. Using parentage estimates and measures of genetic differentiation between species, I study the likely extent of hybridization in these mixed-species secondary forests. Finally, using the SLIP stand simulator, I explore the implications of new genetic dispersal estimates for migration potential in oaks.</p>
<p>I find that effective seed dispersal distances are longer than estimated using seed trap data. While at the Piedmont site the large number of seedlings found >100 m from their mother trees suggests that animal dispersers play a vital role, at the Appalachian site seedling distributions conform more closely to the original gravity-created pattern of seed density. Individual trees vary widely in their reproductive success. Seedling production was found to be positively associated with annual seed production, but exhibited hump-shaped or reversing relationships with age (suggesting the effect of senescence) and growth rate (suggesting tradeoffs in allocation). Germination fraction was negatively associated with fecundity, suggesting that density-dependent mortality may be acting on the high concentrations of seeds near highly fecund adults. Due to overlapping generations and variation in individual reproductive success, effective population size is estimated to be less than half the size that numbers of "adult" individuals would suggest, with consequences for the relative strength of drift and selection. Hybridization may boost effective population size somewhat; my analyses suggest that inter-specific gene flow is common at both study sites. Finally, simulations show that dispersal has a relatively stronger effect on migration rate and population growth than fecundity or size at maturity, and that genetic estimates of seed dispersal can yield significantly higher rates of migration and/or population persistence than seed-trap based estimates under both competitive and non-competitive conditions.</p> / Dissertation
|
72 |
Bayesian learning methods for potential energy parameter inference in coarse-grained models of atomistic systems
Wright, Eric Thomas, 27 August 2015
The present work addresses issues related to the derivation of reduced models of atomistic systems, their statistical calibration, and their relation to atomistic models of materials. The reduced model, known in the chemical physics community as a coarse-grained model, is calibrated within a Bayesian framework. Particular attention is given to developing likelihood functions, assigning priors on coarse-grained model parameters, and using data from molecular dynamics representations of atomistic systems to calibrate coarse-grained models such that certain physically relevant atomistic observables are accurately reproduced. The developed Bayesian framework is then applied in three case studies of increasing complexity and practical application. A freely jointed chain model is considered first for illustrative purposes. The next example entails the construction of a coarse-grained model for a liquid heptane system, with the explicit design goal of accurately predicting a vapor-liquid transfer free energy. Finally, a coarse-grained model is developed for an alkylthiophene polymer that has been shown to have practical use in certain types of photovoltaic cells. The development therein employs Bayesian decision theory to select an optimal CG potential energy function. Subsequently, this model is subjected to validation tests in a prediction scenario that is relevant to the performance of a polyalkylthiophene-based solar cell. / text
|
73 |
Bayesian Methods and Computation for Large Observational Datasets
Watts, Krista Leigh, 30 September 2013
Much health-related research depends heavily on the analysis of a rapidly expanding universe of observational data. A challenge in the analysis of such data is the lack of sound statistical methods and tools that can address multiple facets of estimating treatment or exposure effects in observational studies with a large number of covariates. We sought to advance methods to improve analysis of large observational datasets with an end goal of understanding the effect of treatments or exposures on health. First we compared existing methods for propensity score (PS) adjustment, specifically Bayesian propensity scores. This concept had previously been introduced (McCandless et al., 2009), but no rigorous evaluation had assessed the impact of feedback when fitting the joint likelihood for both the PS and outcome models. We determined that unless specific steps are taken to mitigate the impact of feedback, it has the potential to distort estimates of the treatment effect. Next, we developed a method for accounting for uncertainty in confounding adjustment in the context of multiple exposures. Our method allows us to select confounders based on their association with the joint exposure and the outcome while also accounting for the uncertainty in the confounding adjustment. Finally, we developed two methods to combine heterogeneous sources of data for effect estimation, specifically information coming from a primary data source that provides information for treatments, outcomes, and a limited set of measured confounders on a large number of people and smaller supplementary data sources containing a much richer set of covariates. Our methods avoid the need to specify the full joint distribution of all covariates.
|
74 |
Rich Linguistic Structure from Large-Scale Web Data
Yamangil, Elif, 18 October 2013
The past two decades have shown an unexpected effectiveness of Web-scale data in natural language processing. Even the simplest models, when paired with unprecedented amounts of unstructured and unlabeled Web data, have been shown to outperform sophisticated ones. It has been argued that the effectiveness of Web-scale data has undermined the necessity of sophisticated modeling or laborious data set curation. In this thesis, we argue for and illustrate an alternative view, that Web-scale data not only serves to improve the performance of simple models, but also can allow the use of qualitatively more sophisticated models that would not be deployable otherwise, leading to even further performance gains. / Engineering and Applied Sciences
|
75 |
Capture-recapture Estimation for Conflict Data and Hierarchical Models for Program Impact Evaluation
Mitchell, Shira Arkin, 07 June 2014
A relatively recent increase in the popularity of evidence-based activism has created a higher demand for statisticians to work on human rights and economic development projects. The statistical challenges of revealing patterns of violence in armed conflict require efficient use of the data, and careful consideration of the implications of modeling decisions on estimates. Impact evaluation of a complex economic development project requires a careful consideration of causality and transparency to donors and beneficiaries. In this dissertation, I compare marginal and conditional models for capture-recapture, and develop new hierarchical models that accommodate challenges in data from the armed conflict in Colombia, and more generally, in many other capture-recapture settings. Additionally, I propose a study design for a non-randomized impact evaluation of the Millennium Villages Project (MVP), to be carried out during my postdoctoral fellowship. The design includes small area estimation of baseline variables, propensity score matching, and hierarchical models for causal inference.
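For readers unfamiliar with capture-recapture, the simplest setting may help: two lists of cases assumed to be independent, with the overlap between them used to estimate the total population size. The sketch below shows the classical Lincoln-Petersen estimator and Chapman's bias-corrected variant. It is not the hierarchical modeling developed in the dissertation, and the counts are hypothetical values chosen only for illustration.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Two-list estimate of total population size.

    n1: cases recorded on the first list
    n2: cases recorded on the second list
    m:  cases recorded on both lists
    """
    if m <= 0:
        raise ValueError("need at least one case appearing on both lists")
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's bias-corrected variant; defined even when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical counts, for illustration only.
n1, n2, m = 200, 150, 30
print(lincoln_petersen(n1, n2, m))  # 1000.0
print(chapman(n1, n2, m))
```

Both estimators rest on the assumption that the lists are independent and that every case is equally likely to be recorded, which is often implausible for conflict data and is one reason richer models are used in practice.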
|
76 |
Bayesian analysis of the complex Bingham distribution
Leu, Richard Hsueh-Yee, 21 February 2011
While most statistical applications involve real numbers, some demand complex numbers. Statistical shape analysis is one such area. The complex Bingham distribution is utilized in the shape analysis of landmark data in two dimensions. Previous analysis of data arising from this distribution involved classical statistical techniques. In this report, a full Bayesian inference was carried out on the posterior distribution of the parameter matrix when data arise from a complex Bingham distribution. We utilized a Markov chain Monte Carlo algorithm to sample the posterior distribution of the parameters. A Metropolis-Hastings algorithm sampled the posterior conditional distribution of the eigenvalues while a successive conditional Monte Carlo sampler was used to sample the eigenvectors. The method was successfully verified on simulated data, using both flat and informative priors. / text
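As a rough illustration of the Metropolis-Hastings step mentioned above, the following sketch runs a random-walk sampler on a toy one-dimensional target. The target (a standard normal known up to a constant), the proposal scale, the burn-in length, and all function names are assumptions made for this example; the report's sampler for the eigenvalues of the complex Bingham parameter matrix is not reproduced here.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a one-dimensional target.

    log_target: log of the (unnormalized) target density.
    step: standard deviation of the Gaussian proposal (a tuning choice).
    """
    rng = random.Random(seed)
    x, log_p = x0, log_target(x0)
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # The proposal is symmetric, so the acceptance probability
        # reduces to min(1, target(proposal) / target(x)).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        samples.append(x)
    return samples


# Toy target: a standard normal, known only up to a constant.
draws = metropolis_hastings(lambda x: -0.5 * x * x, x0=3.0, n_steps=20000)
kept = draws[2000:]  # discard an initial burn-in segment
post_mean = sum(kept) / len(kept)
post_var = sum((d - post_mean) ** 2 for d in kept) / (len(kept) - 1)
print(round(post_mean, 2), round(post_var, 2))
```

The printed mean and variance should land near 0 and 1 for this target; a rejected proposal repeats the current state, which is what keeps the chain's stationary distribution correct.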
|
77 |
Statistical Study of Magnetic Field Reversals in Geodynamo Models and Paleomagnetic Data
Meduri, Domenico Giovanni, 29 October 2014
No description available.
|
78 |
Bayesian Hierarchical Models for Model Choice
Li, Yingbo, January 2013
<p>With the development of modern data collection approaches, researchers may collect hundreds to millions of variables, yet may not need to utilize all explanatory variables available in predictive models. Hence, choosing models that consist of a subset of variables often becomes a crucial step. In linear regression, variable selection not only reduces model complexity, but also prevents over-fitting. From a Bayesian perspective, prior specification of model parameters plays an important role in model selection as well as parameter estimation, and often prevents over-fitting through shrinkage and model averaging.</p><p>We develop two novel hierarchical priors for selection and model averaging, for Generalized Linear Models (GLMs) and normal linear regression, respectively. They can be considered as "spike-and-slab" prior distributions or more appropriately "spike-and-bell" distributions. Under these priors we achieve dimension reduction, since their point masses at zero allow predictors to be excluded with positive posterior probability. In addition, these hierarchical priors have heavy tails to provide robustness when MLEs are far from zero.</p><p>Zellner's g-prior is widely used in linear models. It preserves correlation structure among predictors in its prior covariance, and yields closed-form marginal likelihoods, which lead to huge computational savings by avoiding sampling in the parameter space. Mixtures of g-priors avoid fixing g in advance, and can resolve consistency problems that arise with fixed g. For GLMs, we show that the mixture of g-priors using a Compound Confluent Hypergeometric distribution unifies existing choices in the literature and maintains their good properties such as tractable (approximate) marginal likelihoods and asymptotic consistency for model selection and parameter estimation under specific values of the hyperparameters.</p><p>While the g-prior is invariant under rotation within a model, a potential problem with the g-prior is that it inherits the instability of ordinary least squares (OLS) estimates when predictors are highly correlated. We build a hierarchical prior based on scale mixtures of independent normals, which incorporates invariance under rotations within models like ridge regression and the g-prior, but has heavy tails like the Zellner-Siow Cauchy prior. We find this method outperforms the gold standard mixture of g-priors and other methods in the case of highly correlated predictors in Gaussian linear models. We incorporate a non-parametric structure, the Dirichlet Process (DP) as a hyperprior, to allow more flexibility and adaptivity to the data.</p> / Dissertation
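To make the "closed-form marginal likelihoods" remark concrete, here is a small sketch of the standard fixed-g result for the normal linear model with an intercept: the Bayes factor of a model with p predictors against the intercept-only model depends on the data only through the coefficient of determination R². The function name and the example values (n, p, R², and the choice g = n) are assumptions for illustration; the mixture-of-g priors and the GLM extension developed in the dissertation are not implemented here.

```python
import math


def log_bf_gprior(r2: float, n: int, p: int, g: float) -> float:
    """Log Bayes factor of a p-predictor model against the null model.

    Uses the fixed-g closed form
        BF = (1 + g)^((n - 1 - p) / 2) * (1 + g * (1 - R^2))^(-(n - 1) / 2)
    for a normal linear model with an intercept under Zellner's g-prior.
    """
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("R^2 must lie in [0, 1]")
    return (0.5 * (n - 1 - p) * math.log1p(g)
            - 0.5 * (n - 1) * math.log1p(g * (1.0 - r2)))


# Hypothetical example: n = 50 observations, p = 3 predictors, g = n.
for r2 in (0.0, 0.2, 0.5):
    print(r2, log_bf_gprior(r2, n=50, p=3, g=50.0))
```

Because no sampling over regression coefficients is needed, many candidate models can be scored cheaply. The sketch also shows the price of fixing g: with R² = 0 the log Bayes factor is -(p/2)·log(1 + g), so the penalty on extra predictors is set entirely by the chosen g.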
|
79 |
On Bayesian Analyses of Functional Regression, Correlated Functional Data and Non-homogeneous Computer Models
Montagna, Silvia, January 2013
<p>Current frontiers in complex stochastic modeling of high-dimensional processes include major emphases on so-called functional data: problems in which the data are snapshots of curves and surfaces representing fundamentally important scientific quantities. This thesis explores new Bayesian methodologies for functional data analysis.</p><p>The first part of the thesis places emphasis on the role of factor models in functional data analysis. Data reduction becomes mandatory when dealing with such high-dimensional data, more so when data are available on a large number of individuals. In Chapter 2 we present a novel Bayesian framework which employs a latent factor construction to represent each variable by a low dimensional summary. Further, we explore the important issue of modeling and analyzing the relationship of functional data with other covariate and outcome variables simultaneously measured on the same subjects.</p><p>The second part of the thesis is concerned with the analysis of circadian data. The focus is on the identification of circadian genes, that is, genes whose expression levels appear to be rhythmic through time with a period of approximately 24 hours. While addressing this goal, most of the current literature does not account for the potential dependence across genes. In Chapter 4, we propose a Bayesian approach which employs latent factors to accommodate dependence and verify patterns and relationships between genes, while representing the true gene expression trajectories in the Fourier domain allows for inference on period, phase, and amplitude of the signal.</p><p>The third part of the thesis is concerned with the statistical analysis of computer models (simulators). The heavy computational demand of these input-output maps calls for statistical techniques that quickly estimate the surface output at untried inputs given a few preliminary runs of the simulator at a set of design points. In this regard, we propose a Bayesian methodology based on a non-stationary Gaussian process. Relying on a model-based assessment of uncertainty, we envision a sequential design technique which helps choose input points where the simulator should be run to minimize the uncertainty in posterior surface estimation in an optimal way. The proposed non-stationary approach adapts well to output surfaces of unconstrained shape.</p> / Dissertation
|
80 |
Monitoring and Improving Markov Chain Monte Carlo Convergence by Partitioning
VanDerwerken, Douglas, January 2015
<p>Since Bayes' Theorem was first published in 1762, many have argued for the Bayesian paradigm on purely philosophical grounds. For much of this time, however, practical implementation of Bayesian methods was limited to a relatively small class of "conjugate" or otherwise computationally tractable problems. With the development of Markov chain Monte Carlo (MCMC) and improvements in computers over the last few decades, the number of problems amenable to Bayesian analysis has increased dramatically. The ensuing spread of Bayesian modeling has led to new computational challenges as models become more complex and higher-dimensional, and both parameter sets and data sets become orders of magnitude larger. This dissertation introduces methodological improvements to deal with these challenges. These include methods for enhanced convergence assessment, for parallelization of MCMC, for estimation of the convergence rate, and for estimation of normalizing constants. A recurring theme across these methods is the utilization of one or more chain-dependent partitions of the state space.</p> / Dissertation
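For context on what "convergence assessment" usually means in practice, the sketch below computes the widely used Gelman-Rubin potential scale reduction factor from several chains of equal length. This is a standard baseline diagnostic, not the partition-based method introduced in the dissertation, and the function name and toy chains are assumptions for illustration.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for equal-length chains.

    chains: list of m >= 2 sequences, each holding n >= 2 draws.
    Values well above 1 suggest the chains have not mixed.
    """
    m, n = len(chains), len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    within = mean(variance(c) for c in chains)          # W
    between = n * variance([mean(c) for c in chains])   # B
    if within == 0:
        raise ValueError("within-chain variance is zero")
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


# Toy chains: the first pair overlaps, the second pair sits far apart.
print(gelman_rubin([[0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0]]))
print(gelman_rubin([[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]]))
```

The diagnostic compares between-chain and within-chain variance, so it can miss modes that no chain has visited, which is one motivation for diagnostics that look at the state space more directly.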
|