41 |
Analysis of Otolith Microchemistry Using Bayesian Hierarchical Mixture Models
Pflugeisen, Bethann Mangel, 01 September 2010
No description available.
|
42 |
Bridging data gaps for strategic conservation of Gulf of Mexico coastal region landscapes
Shamaskin, Andrew Challen, 30 April 2021
The Gulf Coast Region (GCR) of the United States holds immense ecological and cultural value. However, constant environmental changes, from sea-level rise and hurricanes to the Deepwater Horizon oil spill in 2010, threaten many of the values that define the region. Additionally, recent financial settlements from civil and criminal penalties related to the Deepwater Horizon oil spill have created an unprecedented opportunity to fund conservation throughout the region. With such a large area of interest (over 700,000 km²) and so many conservation priorities throughout the GCR, there is a great need to identify which lands are most effective for conservation in order to optimize the protection of ecological and socioeconomic values. Given the importance of ecologically sound data for informing conservation planning, I directed my dissertation toward developing gulf-wide datasets to be used in a geospatial tool supporting land conservation actions in the GCR. My dissertation addresses three fundamental objectives: 1) assessing how landscapes are associated with estuarine biotic health; 2) mapping hydrologic response to changes in land use; and 3) creating indices of land conservation value based on the modeled associations (from objective 1) with estuarine biotic health. For objective 1, I constructed three hierarchical models across 33 GCR estuaries and their associated watersheds. I estimated the expected number of fish and shrimp species observed in a trawl sample based on temperature, salinity, and runoff volume per catchment area across six different land-use/land-cover (LULC) classes. These models can provide a quantitative basis for assigning offsite values to lands for conservation potential within the GCR. For objective 2, I assessed associations of different LULC classes with hydrologic changes, measured by peak flow in cubic feet per second (cfs), from 1996 to 2016 within each GCR watershed; this can be valuable to conservation planning that focuses on preserving or restoring more typical flow regimes. For my third objective, I developed an index of conservation value that incorporates relationships among LULC, hydrologic connectivity, and estuarine biotic health for lands within the GCR. These elements will help address lesser-understood land conservation needs in the GCR to better enable conservation planners to protect the values of this region in the face of inevitable change.
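The hierarchical estuarine-health models described in objective 1 relate trawl species counts to temperature, salinity, and catchment runoff. A minimal sketch of that style of model, assuming PyMC is available and using simulated data and illustrative variable names rather than the dissertation's actual datasets, might look like this:

```python
# Hypothetical sketch of a hierarchical count model for trawl species richness;
# the data, priors, and covariates are illustrative, not taken from the dissertation.
import numpy as np
import pymc as pm

rng = np.random.default_rng(1)
n_estuaries, n_obs = 33, 500
estuary = rng.integers(0, n_estuaries, n_obs)          # estuary index for each trawl sample
X = rng.normal(size=(n_obs, 3))                        # temperature, salinity, runoff/area (standardized)
y = rng.poisson(np.exp(1.5 + 0.3 * X[:, 0]))           # simulated species counts

with pm.Model():
    mu_a = pm.Normal("mu_a", 0.0, 2.0)                 # shared mean of estuary-level intercepts
    sd_a = pm.HalfNormal("sd_a", 1.0)
    a = pm.Normal("a", mu_a, sd_a, shape=n_estuaries)  # partial pooling across estuaries
    beta = pm.Normal("beta", 0.0, 1.0, shape=3)        # covariate effects
    lam = pm.math.exp(a[estuary] + pm.math.dot(X, beta))
    pm.Poisson("y", mu=lam, observed=y)                # expected species count per sample
    idata = pm.sample(1000, tune=1000, target_accept=0.9)
```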
|
43 |
Modeling Error in Geographic Information Systems
Love, Kimberly R., 09 January 2008
Geographic information systems (GISs) are a highly influential tool in today's society and are used in a growing number of applications, including planning, engineering, land management, and environmental study. As the field of GISs continues to expand, it is very important to observe and account for the error that is unavoidable in computerized maps. Currently, both statistical and non-statistical models are available to do so, although these methods see very little implementation.
In this dissertation, I have focused on improving the methods available for analyzing error in GIS vector data. In particular, I incorporate Bayesian methodology into the currently popular G-band error model through the inclusion of a prior distribution on point locations. This has the advantage of working well with a small number of points and of being able to synthesize information from multiple sources. I have also calculated the boundary of the confidence region explicitly, which has not been done before; this will aid the eventual inclusion of these methods in GIS software. Finally, I have included a statistical point deletion algorithm designed for use in situations where map precision has surpassed map accuracy. It is very similar to the Douglas-Peucker algorithm and can be used for general line simplification, but it has the advantage that it works with the error information already known about a map rather than adding unknown error. These contributions will make it more realistic for GIS users to implement techniques for error analysis. / Ph. D.
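The point deletion algorithm above is described as very similar to Douglas-Peucker. For reference, a minimal sketch of the classical Douglas-Peucker simplification (the baseline, not the error-aware variant developed in the dissertation) could look like this; the tolerance value and toy line are made up:

```python
import numpy as np

def perpendicular_distance(pt, start, end):
    """Distance from pt to the infinite line through start and end."""
    if np.allclose(start, end):
        return np.linalg.norm(pt - start)
    d, v = end - start, pt - start
    # 2-D cross-product magnitude / base length = triangle height
    return abs(d[0] * v[1] - d[1] * v[0]) / np.linalg.norm(d)

def douglas_peucker(points, tol):
    """Classical Douglas-Peucker line simplification on an (n, 2) array."""
    points = np.asarray(points, dtype=float)
    if len(points) < 3:
        return points
    dists = [perpendicular_distance(p, points[0], points[-1]) for p in points[1:-1]]
    i_max = int(np.argmax(dists)) + 1
    if dists[i_max - 1] > tol:
        left = douglas_peucker(points[: i_max + 1], tol)
        right = douglas_peucker(points[i_max:], tol)
        return np.vstack([left[:-1], right])   # drop the duplicated split point
    return np.vstack([points[0], points[-1]])  # everything in between is within tolerance

line = np.array([[0, 0], [1, 0.1], [2, -0.1], [3, 5], [4, 6], [5, 7], [6, 8.1], [7, 9]])
print(douglas_peucker(line, tol=1.0))          # simplified vertex list
```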
|
44 |
Setting location priors using beamforming improves model comparison in MEG-DCM
Carter, Matthew Edward, 25 August 2014
Modelling neuronal interactions as a directed network can provide insight into the activity of the brain during experimental tasks. Magnetoencephalography (MEG) allows for the observation of the fast neuronal dynamics necessary to characterize the activity of sources and their interactions. A network representation of these sources and their connections can be formed by mapping them to nodes and their connection strengths to edge weights. Dynamic Causal Modelling (DCM) presents a Bayesian framework to estimate the parameters of these networks, as well as the ability to test hypotheses on the structure of the network itself using Bayesian model comparison. DCM uses a neurologically informed representation of the active neural sources, which leads to an underdetermined system and increased complexity in estimating the network parameters. This work shows that informing the MEG DCM source locations with prior distributions defined using a MEG source localization algorithm improves model selection accuracy. DCM inversion of a group of candidate models shows an enhanced ability to identify a ground-truth network structure when source-localized prior means are used. / Master of Science
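In DCM, candidate network structures are typically compared through their approximate log model evidence (the variational free energy returned by model inversion). A toy sketch of fixed-effects Bayesian model comparison from a set of free-energy values, with made-up numbers, is:

```python
import numpy as np

# Hypothetical free energies (approximate log evidences) for three candidate networks.
F = np.array([-1523.4, -1520.1, -1531.8])

# Fixed-effects comparison: posterior model probabilities under a uniform model prior.
# Subtracting the maximum first keeps the exponentials numerically stable.
shifted = F - F.max()
post = np.exp(shifted) / np.exp(shifted).sum()
for i, p in enumerate(post):
    print(f"model {i}: posterior probability {p:.3f}")

# Log Bayes factor of the best model over the runner-up; values above about 3
# are conventionally read as strong evidence in favour of the winning model.
best, second = np.argsort(F)[::-1][:2]
print("log Bayes factor:", F[best] - F[second])
```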
|
45 |
FORECASTING INSURANCE CLAIMS RELATED TO WASPS
Peetre Malthe, Olivia Linda Evelina, January 2024
Many insurance companies offer coverage for damages caused by wasps to living environments. The frequency of insurance claims related to wasps varies from year to year, and anticipating the exact frequency is a complex task. By forecasting insurance claims, companies can optimize their resource allocation and management. The objective of this thesis is to forecast the frequency of insurance claims related to wasps using weather data collected over time and space, by developing probabilistic models within the Bayesian framework. Weather data are used because they are assumed to capture environmental conditions that affect wasp behavior, as well as conditions that increase the chances of damage caused by wasps being detected. The Bayesian framework is employed as it offers an efficient way to model uncertainty by treating parameters as random. Twelve models were fitted, and their predictive performance for June, July, and August of 2022 and 2023 was evaluated. For June, a Negative Binomial model incorporating a spatial adjustment component with a CAR prior, together with weather covariates, demonstrated the highest predictive performance. For July, a model incorporating an autoregressive parameter and the weather effects from three weeks preceding the insurance claims performed best. For August, a model incorporating only weather covariates outperformed the others. The differing results suggest that the models capture different underlying processes across the months.
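A stripped-down sketch of the Negative Binomial claim-frequency regression described here, with weather covariates but omitting the spatial CAR component and the autoregressive term, could be written in PyMC as follows; the data and covariate names are illustrative rather than taken from the thesis:

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
n = 300
weather = rng.normal(size=(n, 2))                          # e.g. weekly temperature, precipitation
claims = rng.poisson(np.exp(1.0 + 0.5 * weather[:, 0]))    # simulated weekly claim counts

with pm.Model():
    intercept = pm.Normal("intercept", 0.0, 2.0)
    beta = pm.Normal("beta", 0.0, 1.0, shape=2)            # weather effects
    alpha = pm.HalfNormal("alpha", 2.0)                    # Negative Binomial dispersion
    mu = pm.math.exp(intercept + pm.math.dot(weather, beta))
    pm.NegativeBinomial("claims", mu=mu, alpha=alpha, observed=claims)
    idata = pm.sample(1000, tune=1000)
    # Posterior predictive draws give the forecast distribution of claim counts.
    pm.sample_posterior_predictive(idata, extend_inferencedata=True)
```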
|
46 |
Inference on Markov random fields: methods and applications
Lienart, Thibaut, January 2017
This thesis considers the problem of performing inference on undirected graphical models with continuous state spaces. These models represent conditional independence structures that can appear in the context of Bayesian machine learning. In the thesis, we focus on computational methods and applications. The aim of the thesis is to demonstrate that the factorisation structure corresponding to the conditional independence structure present in high-dimensional models can be exploited to decrease the computational complexity of inference algorithms. First, we consider the smoothing problem on Hidden Markov Models (HMMs) and discuss novel algorithms that have sub-quadratic computational complexity in the number of particles used. We show that they perform on par with existing state-of-the-art algorithms of quadratic complexity. Further, a novel class of rejection-free samplers for graphical models, known as the Local Bouncy Particle Sampler (LBPS), is explored and applied to a very large instance of the Probabilistic Matrix Factorisation (PMF) problem. We show that the method performs slightly better than Hamiltonian Monte Carlo (HMC); it is also the first practical application of the method to a statistical model with hundreds of thousands of dimensions. In the second part of the thesis, we consider approximate Bayesian inference methods and, in particular, the Expectation Propagation (EP) algorithm. We show that it can be applied as the backbone of a novel distributed Bayesian inference mechanism. Further, we discuss novel variants of the EP algorithm and show that a specific type of update mechanism, analogous to the mirror descent algorithm, outperforms all existing variants and is robust to Monte Carlo noise. Lastly, we show that EP can be used within the Particle Belief Propagation (PBP) algorithm to form cheap and adaptive proposals that significantly outperform classical PBP.
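One building block of the distributed EP scheme mentioned above is that, for a Gaussian approximating family, site approximations combine by simply adding natural parameters. A one-dimensional toy sketch of this bookkeeping (the numbers are arbitrary, and a flat prior is assumed for simplicity):

```python
import numpy as np

# Each site i contributes a Gaussian approximation with natural parameters
# (precision tau_i, precision-adjusted mean nu_i = tau_i * mu_i).
site_means = np.array([0.8, 1.2, 0.9])
site_vars = np.array([0.5, 0.4, 0.6])

tau = 1.0 / site_vars             # site precisions
nu = site_means / site_vars       # site precision-adjusted means

# Global Gaussian approximation: natural parameters are the sums over sites.
tau_global, nu_global = tau.sum(), nu.sum()
print(f"global mean {nu_global / tau_global:.3f}, variance {1.0 / tau_global:.3f}")

# The "cavity" distribution used when refining site 0 removes that site's contribution.
tau_cav, nu_cav = tau_global - tau[0], nu_global - nu[0]
print(f"cavity for site 0: mean {nu_cav / tau_cav:.3f}, variance {1.0 / tau_cav:.3f}")
```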
|
47 |
Bayesian meta-analysis models for heterogeneous genomics data
Zheng, Lingling, January 2013
The accumulation of high-throughput data from vast sources has drawn considerable attention to developing methods for extracting meaningful information from such massive data. More interesting questions arise from how to combine the disparate information, which goes beyond modeling sparsity and dimension reduction. This dissertation focuses on innovations in the area of heterogeneous data integration. Chapter 1 contextualizes this dissertation by introducing different aspects of meta-analysis and model frameworks for high-dimensional genomic data. Chapter 2 introduces a novel technique, a joint Bayesian sparse factor analysis model, to vertically integrate multi-dimensional genomic data from different platforms. Chapter 3 extends the above model to a nonparametric Bayes formulation, which directly infers the number of factors in a model-based manner. Chapter 4, on the other hand, deals with horizontal integration of diverse gene expression data; the model infers pathway activities across various experimental conditions. All the methods mentioned above are demonstrated in both simulation studies and real data applications in Chapters 2-4. Finally, Chapter 5 summarizes the dissertation and discusses future directions. / Dissertation
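A small simulation of the generative structure behind a sparse factor model (a loadings matrix with many exact zeros linking a few latent factors to many observed features) gives a sense of what the joint Bayesian sparse factor analysis in Chapter 2 estimates; the dimensions, sparsity level, and noise scale below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_features, n_factors = 100, 500, 5

# Sparse loadings: most entries are exactly zero, so each factor touches few features.
loadings = rng.normal(size=(n_features, n_factors))
loadings *= rng.random(size=loadings.shape) < 0.1      # keep roughly 10% of the entries

factors = rng.normal(size=(n_samples, n_factors))      # latent factor scores per sample
noise = 0.3 * rng.normal(size=(n_samples, n_features))
X = factors @ loadings.T + noise                       # observed expression-like data matrix

print(X.shape)                                         # (100, 500)
print("fraction of nonzero loadings:", (loadings != 0).mean())
```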
|
48 |
Statistical methods for mapping complex traits
Allchin, Lorraine Doreen May, January 2014
The first section of this thesis addresses the problem of simultaneously identifying multiple loci that are associated with a trait, using a Bayesian Markov chain Monte Carlo method. It is applicable to both case/control and quantitative data. I present simulations comparing the method to standard frequentist methods in human case/control and mouse QTL datasets, and show that in the case/control simulations the standard frequentist method outperforms my model for all but the highest-effect simulations, while for the mouse QTL simulations my method performs as well as the frequentist method in some cases and worse in others. I also present analyses of real data and simulations applying my method to a simulated epistasis data set. The next section was inspired by the challenges involved in applying a Markov chain Monte Carlo method to genetic data. It is an investigation into the performance and benefits of the Matlab Parallel Computing Toolbox, specifically its exposure of the CUDA programming language through Matlab's higher-level language. CUDA is a language that allows computations to be carried out on the computer's graphics processing unit (GPU) rather than its central processing unit (CPU). The appeal of this toolbox is its ease of use, as few code adaptations are needed. The final project of this thesis was to develop an HMM for reconstructing the founders of sparsely sequenced inbred populations. The motivation here is that, whilst sequencing costs are rapidly decreasing, it is still prohibitively expensive to fully sequence a large number of individuals. It was proposed that, for populations descended from a known number of founders, it would be possible to sequence these individuals at very low coverage, use a hidden Markov model (HMM) to represent the chromosomes as mosaics of the founders, and then use these states to impute the missing data. For this I developed a Viterbi algorithm with a transition probability matrix, based on recombination rate, that changes for each observed state.
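A generic Viterbi decoder of the kind underlying the founder-reconstruction HMM, though with a fixed transition matrix rather than one driven by local recombination rate, can be sketched as below; the two-founder toy data are made up:

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """Most likely hidden-state path.
    log_init: (K,) log initial probabilities
    log_trans: (K, K) log transition probabilities
    log_emit: (T, K) log emission probabilities of the observations
    """
    T, K = log_emit.shape
    score = np.zeros((T, K))
    back = np.zeros((T, K), dtype=int)
    score[0] = log_init + log_emit[0]
    for t in range(1, T):
        cand = score[t - 1][:, None] + log_trans   # (prev state, next state)
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[t]
    path = np.zeros(T, dtype=int)
    path[-1] = score[-1].argmax()
    for t in range(T - 2, -1, -1):                 # backtrace
        path[t] = back[t + 1, path[t + 1]]
    return path

# Toy example: two founder states, three observed markers.
log_init = np.log([0.5, 0.5])
log_trans = np.log([[0.95, 0.05], [0.05, 0.95]])   # low switch rate ~ low recombination
log_emit = np.log([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8]])
print(viterbi(log_init, log_trans, log_emit))      # most likely founder path
```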
|
49 |
Statistical methods for quantifying uncertainty in climate projections from ensembles of climate models
Sansom, Philip George, January 2014
Appropriate and defensible statistical frameworks are required in order to make credible inferences about future climate based on projections derived from multiple climate models. It is shown that a two-way analysis of variance framework can be used to estimate the response of the actual climate, if all the climate models in an ensemble simulate the same response. The maximum likelihood estimate of the expected response provides a set of weights for combining projections from multiple climate models. Statistical F tests are used to show that the differences between the climate responses of the North Atlantic storm track simulated by a large ensemble of climate models cannot be distinguished from internal variability. When climate models simulate different responses, the differences between the responses represent an additional source of uncertainty. Projections simulated by climate models that share common components cannot be considered independent. Ensemble thinning is advocated in order to obtain a subset of climate models whose outputs are judged to be exchangeable and can be modelled as a random sample. It is shown that the agreement between models on the climate response in the North Atlantic storm track is overestimated due to model dependence. Correlations between the climate responses and historical climates simulated by climate models can be used to constrain projections of future climate. It is shown that the estimate of any such emergent relationship will be biased if internal variability is large compared to the model uncertainty about the historical climate. A Bayesian hierarchical framework is proposed that is able to separate model uncertainty from internal variability, and to estimate emergent constraints without bias. Conditional cross-validation is used to show that an apparent emergent relationship in the North Atlantic storm track is not robust. The uncertain relationship between an ensemble of climate models and the actual climate can be represented by a random discrepancy. It is shown that identical inferences are obtained whether the climate models are treated as predictors for the actual climate or vice versa, provided that the discrepancy is assumed to be symmetric. Emergent relationships are reinterpreted as constraints on the discrepancy between the expected response of the ensemble and the actual climate response, conditional on observations of the recent climate. A simple method is proposed for estimating observation uncertainty from reanalysis data. It is estimated that natural variability accounts for 30-45% of the spread in projections of the climate response in the North Atlantic storm track.
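The framework's central question, whether between-model differences in the simulated response can be distinguished from internal variability, reduces to an analysis-of-variance F test. A toy one-way version on simulated data with hypothetical factor names (the thesis's actual framework is two-way and applied to storm-track responses) might look like:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(7)
rows = []
for m_i in range(5):                        # five hypothetical climate models
    for run in range(4):                    # four ensemble members (runs) per model
        # response = common climate response + model-specific offset + internal variability
        rows.append({"model": f"model_{m_i}",
                     "response": 1.0 + 0.2 * m_i + rng.normal(scale=0.5)})
df = pd.DataFrame(rows)

# F test: can differences between model responses be distinguished from the
# run-to-run spread (internal variability) captured by the residual term?
fit = ols("response ~ C(model)", data=df).fit()
print(sm.stats.anova_lm(fit, typ=2))
```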
|
50 |
Aspects of probabilistic modelling for data analysis
Delannay, Nicolas, 23 October 2007
Computer technologies have revolutionised the processing of information and the search for knowledge. With ever-increasing computational power, it is becoming possible to tackle new data analysis applications as diverse as mining Internet resources, analysing drug effects on the organism, or assisting wardens with autonomous video detection techniques.
Fundamentally, the principle of any data analysis task is to fit a model that encodes well the dependencies (or patterns) present in the data. The difficulty, however, is precisely to define such a model when data are noisy, dependencies are highly stochastic, and there is no simple physical rule to represent them.
The aim of this work is to discuss the principles, advantages, and weaknesses of the probabilistic modelling framework for data analysis. The main idea of the framework is to model the dispersion of the data, as well as uncertainty about the model itself, with probability distributions. Three data analysis tasks are presented, and for each of them the discussion is based on experimental results from real datasets.
The first task considers the problem of linear subspace identification. We show how one can replace a Gaussian noise model with a Student-t noise model to make the identification more robust to atypical samples while still keeping the learning procedure simple. The second task is regression, applied specifically to near-infrared spectroscopy datasets. We show how spectra should be pre-processed before entering the regression model. We then analyse the validity of the Bayesian model selection principle for this application (in particular within the Gaussian Process formulation) and compare this principle to the resampling selection scheme. The final task considered is Collaborative Filtering, which is related to applications such as recommendation for e-commerce and text mining. This task illustrates the way intuitive considerations can guide the design of the model and the choice of the probability distributions appearing in it. We compare the intuitive approach with a simpler matrix factorisation approach.
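A bare-bones version of the matrix factorisation baseline mentioned for the collaborative filtering task, using alternating ridge-regression updates on a fully observed toy rating matrix (real collaborative filtering data have missing entries, and the dimensions and regularisation weight here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n_users, n_items, k = 20, 15, 3
R = rng.integers(1, 6, size=(n_users, n_items)).astype(float)   # toy dense rating matrix

U = 0.1 * rng.normal(size=(n_users, k))    # user factors
V = 0.1 * rng.normal(size=(n_items, k))    # item factors
lam = 0.1                                  # ridge regularisation weight

for _ in range(50):
    # Alternating least squares: solve for U with V fixed, then for V with U fixed.
    U = R @ V @ np.linalg.inv(V.T @ V + lam * np.eye(k))
    V = R.T @ U @ np.linalg.inv(U.T @ U + lam * np.eye(k))

rmse = np.sqrt(np.mean((R - U @ V.T) ** 2))
print(f"reconstruction RMSE after ALS: {rmse:.3f}")
```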
|