291 |
Topics in Modern Bayesian Computation
Qamar, Shaan January 2015
<p>Collections of large volumes of rich and complex data have become ubiquitous in recent years, posing new challenges in methodological and theoretical statistics alike. Today, statisticians are tasked with developing flexible methods capable of adapting to the degree of complexity and noise in increasingly rich data gathered across a variety of disciplines and settings. This has spurred the need for novel multivariate regression techniques that can efficiently capture a wide range of naturally occurring predictor-response relations, identify important predictors and their interactions, and do so even when the number of predictors is large but the sample size remains limited. </p><p>Meanwhile, efficient model fitting tools must evolve quickly to keep pace with the rapidly growing dimension and complexity of the data they are applied to. Aided by the success of modern computing, Bayesian methods have gained tremendous popularity in recent years. These methods provide a natural probabilistic characterization of uncertainty in the parameters and in predictions. In addition, they provide a practical way of encoding model structure that can lead to large gains in statistical estimation and more interpretable results. However, this flexibility is often hindered in applications to modern data, which are increasingly high dimensional, both in the number of observations $n$ and the number of predictors $p$. Here, computational complexity and the curse of dimensionality typically render posterior computation inefficient. In particular, Markov chain Monte Carlo (MCMC) methods, which remain the workhorse for Bayesian computation (owing to their generality and asymptotic accuracy guarantee), typically suffer data processing and computational bottlenecks as a consequence of (i) the need to hold the entire dataset (or available sufficient statistics) in memory at once; and (ii) having to evaluate the (often expensive to compute) data likelihood at each sampling iteration.
</p><p>This thesis divides into two parts. The first part concerns itself with developing efficient MCMC methods for posterior computation in the high dimensional {\em large-n large-p} setting. In particular, we develop an efficient and widely applicable approximate inference algorithm that extends MCMC to the online data setting, and separately propose a novel stochastic search sampling scheme for variable selection in high dimensional predictor settings. The second part of this thesis develops novel methods for structured sparsity in the high-dimensional {\em large-p small-n} regression setting. Here, statistical methods should scale well with the predictor dimension and be able to efficiently identify low dimensional structure so as to facilitate optimal statistical estimation in the presence of limited data. Importantly, these methods must be flexible enough to accommodate potentially complex relationships between the response and its associated explanatory variables. The first work proposes a nonparametric additive Gaussian process model to learn predictor-response relations that may be highly nonlinear and include numerous lower order interaction effects, possibly in different parts of the predictor space. A second work proposes a novel class of Bayesian shrinkage priors for multivariate regression with a tensor valued predictor. Dimension reduction is achieved using a low-rank additive decomposition for the latter, enabling a highly flexible and rich structure within which excellent cell-estimation and region selection may be obtained through state-of-the-art shrinkage methods. In addition, the methods developed in these works come with strong theoretical guarantees.</p> / Dissertation
|
292 |
AN EXTENSION OF SOQPSK TO M-ARY SIGNALLING
Bishop, Chris, Fahey, Mike 10 1900
International Telemetering Conference Proceedings / October 20-23, 2003 / Riviera Hotel and Convention Center, Las Vegas, Nevada / Shaped Offset Quadrature Phase Shift Keying (SOQPSK) has the advantages of low sidelobes and high detection probability; however, its main lobe has a fixed width set by the number of constellation points. By slightly modifying the modulation scheme, the four constellation points of quadrature phase shift keying can be changed to M constellation points, where M is a power of 2. After this change, the power spectral density (PSD) retains low sidelobes, and the signal can still be detected by integrating over two symbol periods.
|
293 |
Bayesian Generative Modeling of Complex Dynamical Systems
Guan, Jinyan January 2016
This dissertation presents a Bayesian generative modeling approach to complex dynamical systems, applied to emotion-interaction patterns within multivariate data collected in social psychology studies. While dynamical models have been used by social psychologists to study complex psychological and behavioral patterns in recent years, most of these studies have been limited by using regression methods to fit the model parameters from noisy observations. These regression methods mostly rely on derivative estimates from the noisy observations, and thus easily overfit and fail to predict future outcomes. A Bayesian generative model solves the problem by integrating prior knowledge of where the data come from with the observed data through posterior distributions. It allows the development of theoretical ideas and mathematical models to be independent of inference concerns. In addition, Bayesian generative statistical modeling allows a model to be evaluated by its predictive power rather than by the reduction in residual error used in regression methods, which helps prevent overfitting in social psychology data analysis. In the proposed Bayesian generative modeling approach, this dissertation uses the State Space Model (SSM) to model the dynamics of emotion interactions. Specifically, it tests the approach in a class of psychological models aimed at explaining the emotional dynamics of interacting couples in committed relationships. The latent states of the SSM are composed of continuous real numbers that represent the level of the true emotional states of both partners. One can obtain the latent states at all subsequent time points by evolving a differential equation (typically a coupled linear oscillator (CLO)) forward in time with some known initial state at the starting time. The multivariate observed states include self-reported emotional experiences and physiological measurements of both partners during the interactions.
To test whether well-being factors, such as body weight, can help to predict emotion-interaction patterns, we construct functions that determine the prior distributions of the CLO parameters of individual couples based on existing emotion theories. In addition, we allow a single latent state to generate multivariate observations and learn the group-shared coefficients that specify the relationship between the latent states and the multivariate observations. Furthermore, we model the nonlinearity of the emotional interaction by allowing smooth changes (drift) in the model parameters. By restricting the stochasticity to the parameter level, the proposed approach models the dynamics in longer periods of social interactions, assuming that the interaction dynamics vary slowly and smoothly over time. The proposed approach achieves this by applying Gaussian Process (GP) priors with smooth covariance functions to the CLO parameters. We also propose to model the emotion regulation patterns as clusters of the dynamical parameters. To infer the parameters of the proposed Bayesian generative model from noisy experimental data, we develop a Gibbs sampler to learn the parameters of the patterns using a set of training couples. To evaluate the fitted model, we develop a multi-level cross-validation procedure for learning the group-shared parameters and distributions from training data and testing the learned models on held-out testing data. During testing, we use the learned shared model parameters to fit the individual CLO parameters to the first 80% of the time points of the testing data by Monte Carlo sampling and then predict the states of the last 20% of the time points. By evaluating models with cross-validation, one can estimate whether complex models are overfitted to noisy observations and fail to generalize to unseen data.
We test our approach on both synthetic data generated by the generative model and real data collected in multiple social psychology experiments. The proposed approach has the potential to model other complex behavior, since the generative model is not restricted to a particular form of the underlying dynamics.
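The state-space setup described in this abstract (a latent coupled linear oscillator evolved forward from a known initial state, noisy observations, and a fit on the first 80% of time points with prediction on the last 20%) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the oscillator parameterization, the function names (`simulate_clo`, `observe`, `split_fit_predict`) and all parameter values are hypothetical and not taken from the dissertation, and no inference step (Gibbs sampling, GP priors) is included.

```python
import random


def simulate_clo(x0, v0, freq, damp, coupling, dt, n_steps):
    """Evolve a two-partner coupled linear oscillator forward in time.

    Assumed (illustrative) form for partner i with partner j:
        x_i'' = freq[i] * x_i + damp[i] * x_i' + coupling[i] * (x_j - x_i)
    integrated with a semi-implicit Euler step of size dt.
    """
    x = list(x0)
    v = list(v0)
    path = [tuple(x)]
    for _ in range(n_steps):
        acc = [
            freq[i] * x[i] + damp[i] * v[i] + coupling[i] * (x[1 - i] - x[i])
            for i in range(2)
        ]
        v = [v[i] + dt * acc[i] for i in range(2)]  # update velocities first
        x = [x[i] + dt * v[i] for i in range(2)]    # then positions
        path.append(tuple(x))
    return path


def observe(path, noise_sd, rng):
    """Add independent Gaussian noise to every latent state (assumed noise model)."""
    return [tuple(s + rng.gauss(0.0, noise_sd) for s in state) for state in path]


def split_fit_predict(series, fit_fraction=0.8):
    """Split a time series into a fitting segment and a held-out tail."""
    cut = int(len(series) * fit_fraction)
    return series[:cut], series[cut:]


rng = random.Random(0)
latent = simulate_clo(
    x0=(1.0, -0.5), v0=(0.0, 0.0),
    freq=(-1.0, -1.5), damp=(-0.1, -0.1), coupling=(0.2, 0.2),
    dt=0.05, n_steps=199,
)
observed = observe(latent, noise_sd=0.1, rng=rng)
fit_part, held_out = split_fit_predict(observed)
```

In an actual evaluation of the kind the abstract describes, the parameters would be inferred from `fit_part` and the resulting forecasts compared against `held_out`; here the split only shows how the two segments are formed.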
|
294 |
Spatial Growth Regressions: Model Specification, Estimation and Interpretation
LeSage, James P., Fischer, Manfred M. 04 1900
This paper uses Bayesian model comparison methods to simultaneously specify both the spatial weight structure and explanatory variables for a spatial growth regression involving 255 NUTS 2 regions across 25 European countries. In addition, a correct interpretation of the spatial regression parameter estimates that takes into account the simultaneous feedback nature of the spatial autoregressive model is provided. Our findings indicate that incorporating model uncertainty in conjunction with appropriate parameter interpretation decreased the importance of explanatory variables traditionally thought to exert an important influence on regional income growth rates. (authors' abstract)
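The "simultaneous feedback" mentioned in this abstract can be made concrete with the textbook spatial autoregressive (SAR) model. The derivation below is a sketch in assumed notation ($y$ the $n \times 1$ response, $W$ the $n \times n$ spatial weight matrix, $\rho$ the spatial dependence parameter, $x_k$ the $k$-th explanatory variable with coefficient $\beta_k$); the paper's own specification may differ, for example by including spatially lagged regressors.

```latex
\begin{align*}
y &= \rho W y + X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I_n) \\
(I_n - \rho W)\, y &= X\beta + \varepsilon \\
y &= (I_n - \rho W)^{-1}(X\beta + \varepsilon) \\
\frac{\partial y}{\partial x_k^{\top}} &= (I_n - \rho W)^{-1}\beta_k
  = \left(I_n + \rho W + \rho^2 W^2 + \cdots\right)\beta_k
\end{align*}
```

Under this model a change in $x_k$ in one region affects every region through the powers of $W$, so $\beta_k$ alone is not the marginal effect. The series expansion assumes $|\rho| < 1$ and a row-normalized $W$.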
|
295 |
Probabilistic Models for Species Tree Inference and Orthology Analysis
Ullah, Ikram January 2015
A phylogenetic tree is used to model gene evolution and species evolution using molecular sequence data. For artifactual and biological reasons, a gene tree may differ from a species tree, a phenomenon known as gene tree-species tree incongruence. Assuming the presence of one or more evolutionary events, e.g., gene duplication, gene loss, and lateral gene transfer (LGT), the incongruence may be explained using a reconciliation of a gene tree inside a species tree. Such information is biologically useful, e.g., for inferring orthologous relationships between genes. In this thesis, we present probabilistic models and methods for orthology analysis and species tree inference, while accounting for evolutionary factors such as gene duplication, gene loss, and sequence evolution. Furthermore, we use a probabilistic LGT-aware model for inferring gene trees having temporal information for duplication and LGT events. In the first project, we present a Bayesian method, called DLRSOrthology, for estimating orthology probabilities using the DLRS model: a probabilistic model integrating gene evolution, a relaxed molecular clock for substitution rates, and sequence evolution. We devise a dynamic programming algorithm for efficiently summing orthology probabilities over all reconciliations of a gene tree inside a species tree. Furthermore, we present heuristics based on the receiver operating characteristic (ROC) curve to estimate suitable thresholds for deciding orthology events. Our method, as demonstrated by synthetic and biological results, outperforms existing probabilistic approaches in accuracy and is robust to incomplete taxon sampling artifacts. In the second project, we present a probabilistic method, based on a mixture model, for species tree inference. The method employs a two-phase approach, where in the first phase, a structural expectation maximization algorithm, based on a mixture model, is used to reconstruct a maximum likelihood set of candidate species trees.
In the second phase, in order to select the best species tree, each of the candidate species trees is evaluated using PrIME-DLRS: a method based on the DLRS model. The method is accurate, efficient, and scalable when compared to a recent probabilistic species tree inference method called PHYLDOG. We observe that, in most cases, the analysis constituted only by the first phase may also be used for selecting the target species tree, yielding a fast and accurate method for larger datasets. Finally, we devise a probabilistic method based on the DLTRS model: an extension of the DLRS model to include LGT events, for sampling reconciliations of a gene tree inside a species tree. The method enables us to estimate gene trees having temporal information for duplication and LGT events. To the best of our knowledge, this is the first probabilistic method that takes gene sequence data directly into account for sampling reconciliations that contain information about LGT events. Based on the synthetic data analysis, we believe that the method has the potential to identify LGT highways. / <p>QC 20150529</p>
|
296 |
Essays on Bayesian Inference for Social Networks
Koskinen, Johan January 2004
<p>This thesis presents Bayesian solutions to inference problems for three types of social network data structures: a single observation of a social network, repeated observations on the same social network, and repeated observations on a social network developing through time.</p><p>A social network is conceived as being a structure consisting of actors and their social interaction with each other. A common conceptualisation of social networks is to let the actors be represented by nodes in a graph with edges between pairs of nodes that are relationally tied to each other according to some definition. Statistical analysis of social networks is to a large extent concerned with modelling of these relational ties, which lends itself to empirical evaluation.</p><p>The first paper deals with a family of statistical models for social networks called exponential random graphs that takes various structural features of the network into account. In general, the likelihood functions of exponential random graphs are only known up to a constant of proportionality. A procedure for performing Bayesian inference using Markov chain Monte Carlo (MCMC) methods is presented. The algorithm consists of two basic steps, one in which an ordinary Metropolis-Hastings updating step is used, and another in which an importance sampling scheme is used to calculate the acceptance probability of the Metropolis-Hastings step.</p><p>In paper number two a method for modelling reports given by actors (or other informants) on their social interaction with others is investigated in a Bayesian framework. The model contains two basic ingredients: the unknown network structure and functions that link this unknown network structure to the reports given by the actors. These functions take the form of probit link functions.
An intrinsic problem is that the model is not identified, meaning that there are combinations of values on the unknown structure and the parameters in the probit link functions that are observationally equivalent. Instead of using restrictions for achieving identification, it is proposed that the different observationally equivalent combinations of parameters and unknown structure be investigated a posteriori. Estimation of parameters is carried out using Gibbs sampling with a switching device that enables transitions between posterior modal regions. The main goal of the procedures is to provide tools for comparisons of different model specifications.</p><p>Papers 3 and 4 propose Bayesian methods for longitudinal social networks. The premise of the models investigated is that overall change in social networks occurs as a consequence of sequences of incremental changes. Models for the evolution of social networks using continuous-time Markov chains are meant to capture these dynamics. Paper 3 presents an MCMC algorithm for exploring the posteriors of parameters for such Markov chains. More specifically, the unobserved evolution of the network in-between observations is explicitly modelled, thereby avoiding the need to deal with explicit formulas for the transition probabilities. This enables likelihood based parameter inference in a wider class of network evolution models than has been available before. Paper 4 builds on the proposed inference procedure of Paper 3 and demonstrates how to perform model selection for a class of network evolution models.</p>
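For readers unfamiliar with the "ordinary Metropolis-Hastings updating step" that the first paper builds on, the following is a minimal random-walk sketch for a target density known only up to a constant. It is not the thesis's algorithm: it omits the importance sampling estimate of the acceptance probability that exponential random graph models require, and the toy target (`log_target`, a Gaussian kernel centred at 2) and all names are assumptions chosen for illustration.

```python
import math
import random


def log_target(x):
    """Unnormalized log-density; a N(2, 1) kernel used as a stand-in target."""
    return -0.5 * (x - 2.0) ** 2


def metropolis_hastings(log_density, x0, step, n_iter, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    x = x0
    log_p = log_density(x)
    samples = []
    for _ in range(n_iter):
        proposal = x + step * rng.gauss(0.0, 1.0)
        log_p_new = log_density(proposal)
        # With a symmetric proposal the acceptance ratio is the ratio of
        # unnormalized densities, so the unknown constant cancels.
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
        samples.append(x)
    return samples


rng = random.Random(1)
draws = metropolis_hastings(log_target, x0=0.0, step=1.0, n_iter=20000, rng=rng)
kept = draws[2000:]  # discard an initial burn-in segment
post_mean = sum(kept) / len(kept)
```

The constant cancels here because it does not depend on the quantity being sampled. When the normalizing constant depends on the model parameters, as the abstract indicates for exponential random graphs, the ratio can no longer be evaluated directly, which is the motivation for the additional importance sampling step.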
|
297 |
Bayesian stochastic differential equation modelling with application to finance
Al-Saadony, Muhannad January 2013
In this thesis, we consider some popular stochastic differential equation models used in finance, such as the Vasicek Interest Rate model, the Heston model and a new fractional Heston model. We discuss how to perform inference about unknown quantities associated with these models in the Bayesian framework. We describe sequential importance sampling, the particle filter and the auxiliary particle filter. We apply these inference methods to the Vasicek Interest Rate model and the standard stochastic volatility model, both to sample from the posterior distribution of the underlying processes and to update the posterior distribution of the parameters sequentially, as data arrive over time. We discuss the sensitivity of our results to prior assumptions. We then consider the use of Markov chain Monte Carlo (MCMC) methodology to sample from the posterior distribution of the underlying volatility process and of the unknown model parameters in the Heston model. The particle filter and the auxiliary particle filter are also employed to perform sequential inference. Next we extend the Heston model to the fractional Heston model, by replacing the Brownian motions that drive the underlying stochastic differential equations by fractional Brownian motions, thereby allowing a richer dependence structure across time. Again, we use a variety of methods to perform inference. We apply our methodology to simulated and real financial data with success. We then discuss how to make forecasts using both the Heston and the fractional Heston model. We make comparisons between the models and show that using our new fractional Heston model can lead to improved forecasts for real financial data.
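As an illustration of the particle filter applied to the Vasicek model, the sketch below runs a bootstrap filter on simulated data. It rests on several assumptions that are not from the thesis: the rate is observed with additive Gaussian noise, the model parameters are treated as known (the thesis also updates the parameter posterior sequentially, which is not shown), resampling is multinomial at every step, and all names and numerical values are hypothetical.

```python
import math
import random


def vasicek_step(r, kappa, theta, sigma, dt, rng):
    """Exact one-step transition of dr = kappa * (theta - r) dt + sigma dW."""
    decay = math.exp(-kappa * dt)
    sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * kappa))
    return theta + (r - theta) * decay + sd * rng.gauss(0.0, 1.0)


def bootstrap_filter(ys, kappa, theta, sigma, dt, obs_sd, n_particles, rng):
    """Return the filtered mean of the latent rate after each observation."""
    # Initialise particles from the stationary distribution of the process.
    stat_sd = sigma / math.sqrt(2.0 * kappa)
    particles = [theta + stat_sd * rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in ys:
        # Propagate each particle through the state transition.
        particles = [vasicek_step(r, kappa, theta, sigma, dt, rng) for r in particles]
        # Weight by the (assumed Gaussian) observation likelihood.
        weights = [math.exp(-0.5 * ((y - r) / obs_sd) ** 2) for r in particles]
        total = sum(weights)
        if total == 0.0:  # all weights underflowed; fall back to uniform weights
            weights = [1.0] * n_particles
            total = float(n_particles)
        means.append(sum(w * r for w, r in zip(weights, particles)) / total)
        # Multinomial resampling in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return means


def rmse(a, b):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


rng = random.Random(42)
kappa, theta, sigma, dt, obs_sd = 0.5, 0.05, 0.02, 1.0 / 12.0, 0.01
r, states, ys = 0.03, [], []
for _ in range(200):
    r = vasicek_step(r, kappa, theta, sigma, dt, rng)
    states.append(r)
    ys.append(r + obs_sd * rng.gauss(0.0, 1.0))

filtered = bootstrap_filter(ys, kappa, theta, sigma, dt, obs_sd, 500, rng)
rmse_filtered = rmse(filtered, states)
rmse_observed = rmse(ys, states)
```

Comparing `rmse_filtered` with `rmse_observed` gives a quick check that filtering tracks the latent rate more closely than the raw noisy observations do. The auxiliary particle filter mentioned in the abstract changes the proposal and weighting steps and is not covered here.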
|
298 |
A model-based statistical approach to functional MRI group studies
Bothma, Adel January 2010
Functional Magnetic Resonance Imaging (fMRI) is a noninvasive imaging method that reflects local changes in brain activity. fMRI group studies involve the analysis of the functional images acquired for each of a group of subjects under the same experimental conditions. We propose a spatial marked point-process model for the activation patterns of the subjects in a group study. Each pattern is described as the sum of individual centres of activation. The marked point-process that we propose allows the researcher to enforce repulsion between all pairs of centres of an individual subject that are within a specified minimum distance of each other. It also allows the researcher to enforce attraction between similarly-located centres from different subjects. This attraction helps to compensate for the misalignment of corresponding functional areas across subjects and is a novel method of addressing the problem of imperfect inter-subject registration of functional images. We use a Bayesian framework and choose prior distributions according to current understanding of brain activity. Simulation studies and exploratory studies of our reference dataset are used to fine-tune the prior distributions. We perform inference via Markov chain Monte Carlo. The fitted model gives a summary of the activation in terms of its location, height and size. We use this summary both to identify brain regions that were activated in response to the stimuli under study and to quantify the discrepancies between the activation maps of subjects. Applied to our reference dataset, our measure is successful in separating out those subjects with activation patterns that do not agree with the overall group pattern. In addition, our measure is sensitive to subjects with a large number of activation centres relative to the other subjects in the group.
The activation summary given by our model makes it possible to pursue a range of inferential questions that cannot be addressed with ease by current model-based approaches.
|
299 |
On auxiliary variables and many-core architectures in computational statistics
Lee, Anthony January 2011
Emerging many-core computer architectures provide an incentive for computational methods to exhibit specific types of parallelism. Our ability to perform inference in Bayesian statistics is often dependent upon our ability to approximate expectations of functions of random variables, for which Monte Carlo methodology provides a general purpose solution using a computer. This thesis is primarily concerned with exploring the gains that can be obtained by using many-core architectures to accelerate existing population-based Monte Carlo algorithms, as well as providing a novel general framework that can be used to devise new population-based methods. Monte Carlo algorithms are often concerned with sampling random variables taking values in X whose density is known up to a normalizing constant. Population-based methods typically make use of collections of interacting auxiliary random variables, each of which is in X, in specifying an algorithm. Such methods are good candidates for parallel implementation when the collection of samples can be generated in parallel and their interaction steps are either parallelizable or negligible in cost. The first contribution of this thesis is in demonstrating the potential speedups that can be obtained for two common population-based methods, population-based Markov chain Monte Carlo (MCMC) and sequential Monte Carlo (SMC). The second contribution of this thesis is in the derivation of a hierarchical family of sparsity-inducing priors in regression and classification settings. Here, auxiliary variables make possible the implementation of a fast algorithm for finding local modes of the posterior density. SMC, accelerated on a many-core architecture, is then used to perform inference for a range of prior specifications to gain an understanding of sparse association signal in the context of genome-wide association studies. 
The third contribution is in the use of a new perspective on reversible MCMC kernels that allows for the construction of novel population-based methods. These methods differ from most existing methods in that one can make the resulting kernels define a Markov chain on X. A further development is that one can define kernels in which the number of auxiliary variables is given a distribution conditional on the values of the auxiliary variables obtained so far. This is perhaps the most important methodological contribution of the thesis, and the adaptation of the number of particles used within a particle MCMC algorithm provides a general purpose algorithm for sampling from a variety of complex distributions.
|
300 |
Addressing Issues in the Detection of Gene-Environment Interaction Through the Study of Conduct Disorder
Prom, Elizabeth Chin 01 January 2007
This work addresses issues in the study of gene-environment interaction (GxE) through research of conduct disorder (CD) among adolescents and extends the recent report of significant GxE and subsequent replication studies. A sub-sample of 1,299 individual participants/649 twin pairs and their parents from the Virginia Twin Study of Adolescent and Behavioral Development was used, for whom Monoamine Oxidase A (MAOA) genotype, diagnosis of CD, maternal antisocial personality symptoms, and household neglect were obtained. This dissertation (1) tested for GxE by gender using MAOA and childhood adversity using multiple approaches to CD measurement and model assessment, (2) determined whether other mechanisms would explain differences in GxE by gender and (3) identified and assessed other genes and environments related to the interaction of MAOA and childhood adversity. Using a multiple regression approach, a main effect of the low/low MAOA genotype remained after controlling for other risk factors in females. However, the effects of GxE were modest and were removed by transforming the environmental measures. In contrast, there was no significant effect of the low activity MAOA allele in males, although significant GxE was detected and remained after transformation. The sign of the interaction for males was opposite from females, indicating that genetic sensitivity to childhood adversity may differ by gender. Upon further investigation, gender differences in GxE were due to genotype-sex interaction and may involve MAOA. A Markov Chain Monte Carlo approach that included a genetic Item Response Theory model treated CD as a trait with continuous liability, since false detection of GxE may result from measurement. In males and females, the inclusion of GxE while controlling for the other covariates was appropriate, but there was little improvement in model fit and the effect sizes of GxE were small.
Other candidate genes functioning in the serotonin and dopamine neurotransmitter systems were tested for interaction with MAOA to affect risk for CD. Main genetic effects of dopamine transporter genotype and MAOA in the presence of comorbidity were detected. No epistatic effects were detected. Random forests were used to systematically assess the environment and produced several interesting environments that will require more thoughtful consideration before incorporation into a model testing GxE.
|