141 |
STORI: selectable taxon ortholog retrieval iteratively
Stern, Joshua Gallant 08 June 2015
Speciation and gene duplication are fundamental evolutionary processes that enable biological innovation. For over a decade, biologists have endeavored to distinguish orthology (homology caused by speciation) from paralogy (homology caused by duplication). Disentangling orthology and paralogy is useful to diverse fields such as phylogenetics, protein engineering, and genome content comparison.
A common step in ortholog detection is the computation of Bidirectional Best Hits (BBH). However, we found this computation impractical for more than 24 Eukaryotic proteomes. Attempting to retrieve orthologs in less time than previous methods require, we developed a novel algorithm and implemented it as a suite of Perl scripts. This software, Selectable Taxon Ortholog Retrieval Iteratively (STORI), retrieves orthologous protein sequences for a set of user-defined proteomes and query sequences. While the time complexity of the BBH method is O(#taxa^2), we found that the average CPU time used by STORI may increase linearly with the number of taxa.
To demonstrate one aspect of STORI’s usefulness, we used this software to infer the orthologous sequences of 26 ribosomal proteins (rProteins) from the large ribosomal subunit (LSU), for a set of 115 Bacterial and 94 Archaeal proteomes. Next, we used established tree-search methods to seek the most probable evolutionary explanation of these data. The current implementation of STORI runs on Red Hat Enterprise Linux 6.0 with installations of Moab 5.3.7, Perl 5 and several Perl modules. STORI is available at: <http://github.com/jgstern/STORI>.
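The Bidirectional Best Hits step mentioned in this abstract can be illustrated with a minimal Python sketch. It is not STORI's algorithm and not the authors' Perl code; the function names, the score dictionaries, and the toy protein identifiers are all hypothetical, and the scores stand in for any similarity measure where higher means more similar.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    `scores` maps (query, subject) pairs to a similarity score;
    higher means more similar. (Hypothetical input format.)
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Made-up toy scores between two tiny proteomes A and B.
a_vs_b = {
    ("a1", "b1"): 250.0,
    ("a1", "b2"): 90.0,
    ("a2", "b2"): 180.0,
    ("a3", "b2"): 120.0,
}
b_vs_a = {
    ("b1", "a1"): 245.0,
    ("b2", "a2"): 175.0,
    ("b2", "a3"): 130.0,
}
pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Running this comparison for every pair among n proteomes takes n(n-1)/2 pairwise comparisons, which is the quadratic growth in the number of taxa that the abstract refers to.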
|
142 |
Bayesian Estimation of Panel Data Fractional Response Models with Endogeneity: An Application to Standardized Test Rates
Kessler, Lawrence 01 January 2013
In this paper I propose Bayesian estimation of a nonlinear panel data model with a fractional dependent variable (bounded between 0 and 1). Specifically, I estimate a panel data fractional probit model, which takes into account the bounded nature of the fractional response variable. I outline estimation under the assumption of strict exogeneity as well as when allowing for potential endogeneity. Furthermore, I illustrate how transitioning from the strictly exogenous case to the case of endogeneity requires only slight adjustments. For comparative purposes I also estimate linear specifications of these models and show how quantities of interest such as marginal effects can be calculated and compared across models. Using data from the state of Florida, I examine the relationship between school spending and student achievement, and find that increased spending has a positive and statistically significant effect on student achievement. Furthermore, this effect is roughly 50% larger in the model that allows for endogenous spending. Specifically, a $1,000 increase in per-pupil spending is associated with an increase in standardized test pass rates ranging from 6.2% to 10.1%.
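As a rough illustration of why marginal effects need separate calculation in the nonlinear model, a textbook fractional probit with an unobserved unit effect can be written as follows. The notation ($y_{it}$, $x_{it}$, $\beta$, $c_i$) is assumed here for illustration and may not match the paper's exact specification.

```latex
% Conditional mean of the fractional response (standard probit link)
\mathbb{E}\left[ y_{it} \mid x_{it}, c_i \right] = \Phi\left( x_{it}\beta + c_i \right),
\qquad 0 \le y_{it} \le 1

% Marginal effect of regressor j: depends on where it is evaluated
\frac{\partial\, \mathbb{E}\left[ y_{it} \mid x_{it}, c_i \right]}{\partial x_{itj}}
  = \beta_j \, \phi\left( x_{it}\beta + c_i \right)
```

Here $\Phi$ and $\phi$ are the standard normal distribution and density functions. In a linear specification the marginal effect is simply $\beta_j$, which is why the two sets of estimates can only be compared after the probit effect is evaluated or averaged over the data.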
|
143 |
Selection, calibration, and validation of coarse-grained models of atomistic systems
Farrell, Kathryn Anne 03 September 2015
This dissertation examines the development of coarse-grained models of atomistic systems for the purpose of predicting target quantities of interest in the presence of uncertainties. It addresses fundamental questions in computational science and engineering concerning model selection, calibration, and validation processes that are used to construct predictive reduced order models through a unified Bayesian framework. This framework, enhanced with the concepts of information theory, sensitivity analysis, and Occam's Razor, provides a systematic means of constructing coarse-grained models suitable for use in a prediction scenario. The novel application of a general framework of statistical calibration and validation to molecular systems is presented. Atomistic models, which themselves contain uncertainties, are treated as the ground truth and provide data for the Bayesian updating of model parameters. The open problem of the selection of appropriate coarse-grained models is addressed through the powerful notion of Bayesian model plausibility. A new, adaptive algorithm for model validation is presented. The Occam-Plausibility ALgorithm (OPAL), so named for its adherence to Occam's Razor and the use of Bayesian model plausibilities, identifies, among a large set of models, the simplest model that passes the Bayesian validation tests, and may therefore be used to predict chosen quantities of interest. By discarding or ignoring unnecessarily complex models, this algorithm contains the potential to reduce computational expense with the systematic process of considering subsets of models, as well as the implementation of the prediction scenario with the simplest valid model. An application to the construction of a coarse-grained system of polyethylene is given to demonstrate the implementation of molecular modeling techniques; the process of Bayesian selection, calibration, and validation of reduced-order models; and OPAL. 
The potential of the Bayesian framework for the process of coarse graining and of OPAL as a means of determining a computationally conservative valid model is illustrated on the polyethylene example. / text
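Bayesian model plausibility is commonly understood as the posterior probability of each model in a candidate set, obtained by applying Bayes' rule to the model evidences. The Python sketch below shows that calculation under this assumption; it is not the OPAL algorithm, the function name is hypothetical, and the log-evidence values are made up for illustration.

```python
import math


def model_plausibilities(log_evidences, priors=None):
    """Posterior plausibility of each model from its log evidence.

    Assumes plausibility_j = evidence_j * prior_j / sum_k(evidence_k * prior_k).
    A uniform prior over models is used when `priors` is None.
    """
    n = len(log_evidences)
    if priors is None:
        priors = [1.0 / n] * n
    log_weights = [le + math.log(p) for le, p in zip(log_evidences, priors)]
    # Subtract the maximum before exponentiating to avoid underflow.
    shift = max(log_weights)
    weights = [math.exp(lw - shift) for lw in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


# Made-up log evidences for three candidate models.
rho = model_plausibilities([-102.3, -100.0, -105.0])
print(rho)
```

The plausibilities sum to one across the candidate set, so they rank models only relative to the others under consideration.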
|
144 |
Bayesian Music Alignment
前澤, 陽 23 March 2015
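Kyoto University / 0048 / Doctorate by coursework (new system) / Doctor of Informatics / 甲第19106号 / 情博第552号 / 新制||情||98 / 32057 / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Tatsuya Kawahara, Professor Toshiyuki Tanaka, Senior Lecturer Kazuyoshi Yoshii / Conferred under Article 4, Paragraph 1 of the Degree Regulations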
|
145 |
Statistical Learning of Some Complex Systems: From Dynamic Systems to Market Microstructure
Tong, Xiao Thomas 27 September 2013
A complex system is one with many parts, whose behaviors are strongly dependent on each other. There are two interesting questions about complex systems. One is to understand how to recover the true structure of a complex system from noisy data. The other is to understand how the system interacts with its environment. In this thesis, we address these two questions by studying two distinct complex systems: dynamic systems and market microstructure. To address the first question, we focus on some nonlinear dynamic systems. We develop a novel Bayesian statistical method, Gaussian Emulator, to estimate the parameters of dynamic systems from noisy data, when the data are either fully or partially observed. Our method shows that estimation accuracy is substantially improved and computation is faster, compared to the numerical solvers. To address the second question, we focus on the market microstructure of hidden liquidity. We propose some statistical models to explain the hidden liquidity under different market conditions. Our statistical results suggest that hidden liquidity can be reliably predicted given the visible state of the market. / Statistics
|
146 |
Bayesian Inference Approaches for Particle Trajectory Analysis in Cell Biology
Monnier, Nilah 28 August 2013
Despite the importance of single particle motion in biological systems, systematic inference approaches to analyze particle trajectories and evaluate competing motion models are lacking. An automated approach for robust evaluation of motion models that does not require manual intervention is highly desirable to enable analysis of datasets from high-throughput imaging technologies that contain hundreds or thousands of trajectories of biological particles, such as membrane receptors, vesicles, chromosomes or kinetochores, mRNA particles, or whole cells in developing embryos. Bayesian inference is a general theoretical framework for performing such model comparisons that has proven successful in handling noise and experimental limitations in other biological applications. The inherent Bayesian penalty on model complexity, which avoids overfitting, is particularly important for particle trajectory analysis given the highly stochastic nature of particle diffusion. This thesis presents two complementary approaches for analyzing particle motion using Bayesian inference. The first method, MSD-Bayes, discriminates a wide range of motion models (including diffusion, directed motion, and anomalous and confined diffusion) based on mean-square displacement analysis of a set of particle trajectories, while the second method, HMM-Bayes, identifies dynamic switching between diffusive and directed motion along individual trajectories using hidden Markov models. These approaches are validated on biological particle trajectory datasets from a wide range of experimental systems, demonstrating their broad applicability to research in cell biology.
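The mean-square displacement (MSD) curve on which the first method operates can be computed from a single trajectory as in the Python sketch below. This covers only the MSD calculation, not the Bayesian model comparison in MSD-Bayes; the function name and the toy trajectory are hypothetical.

```python
def mean_square_displacement(track, max_lag):
    """Time-averaged MSD of one 2D trajectory for lags 1..max_lag.

    `track` is a list of (x, y) positions sampled at a fixed time step.
    """
    if not 1 <= max_lag < len(track):
        raise ValueError("max_lag must be between 1 and len(track) - 1")
    msd = []
    for lag in range(1, max_lag + 1):
        # Pair every position with the position `lag` steps later.
        squared = [
            (x2 - x1) ** 2 + (y2 - y1) ** 2
            for (x1, y1), (x2, y2) in zip(track, track[lag:])
        ]
        msd.append(sum(squared) / len(squared))
    return msd


# Made-up track moving one unit along x per frame (pure directed motion).
directed = [(float(t), 0.0) for t in range(10)]
msd = mean_square_displacement(directed, 3)
print(msd)
```

For pure directed motion the MSD grows with the square of the lag, whereas for free diffusion it grows linearly on average; differences of this kind in the shape of the curve are what allow motion models to be told apart.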
|
147 |
Function-on-Function Regression with Public Health Applications
Meyer, Mark John 06 June 2014
Medical research currently involves the collection of large and complex data. One such type of data is functional data, where the unit of measurement is a curve measured over a grid. Functional data come in a variety of forms depending on the nature of the research. Novel methodologies are required to accommodate this growing volume of functional data, alongside new testing procedures to provide valid inferences. In this dissertation, I propose three novel methods to accommodate a variety of questions involving functional data of multiple forms: (1) a function-on-function regression for Gaussian data; (2) a historical functional linear model for repeated measures; and (3) a generalized functional outcome regression for ordinal data. For each method, I discuss the existing shortcomings of the literature and demonstrate how my method fills those gaps. The abilities of each method are demonstrated via simulation and data application.
|
148 |
Accelerating Markov chain Monte Carlo via parallel predictive prefetching
Angelino, Elaine Lee 21 October 2014
We present a general framework for accelerating a large class of widely used Markov chain Monte Carlo (MCMC) algorithms. This dissertation demonstrates that MCMC inference can be accelerated in a model of parallel computation that uses speculation to predict and complete computational work ahead of when it is known to be useful. By exploiting fast, iterative approximations to the target density, we can speculatively evaluate many potential future steps of the chain in parallel. In Bayesian inference problems, this approach can accelerate sampling from the target distribution, without compromising exactness, by exploiting subsets of data. It takes advantage of whatever parallel resources are available, but produces results exactly equivalent to standard serial execution. In the initial burn-in phase of chain evaluation, it achieves speedup over serial evaluation that is close to linear in the number of available cores. / Engineering and Applied Sciences
|
149 |
The effects of three different priors for variance parameters in the normal-mean hierarchical model
Chen, Zhu, 1985- 01 December 2010
Many prior distributions have been suggested for variance parameters in the hierarchical model. The “Non-informative” interval of the conjugate inverse-gamma prior might cause problems. I consider three priors (conjugate inverse-gamma, log-normal, and truncated normal) for the variance parameters and carry out a numerical analysis on Gelman’s 8-schools data. Then, using the posterior draws, I compare the Bayesian credible intervals of the parameters under the three priors. I use predictive distributions to make predictions and then discuss the differences among the three priors. / text
|
150 |
Top-Down Bayesian Modeling and Inference for Indoor Scenes
Del Pero, Luca January 2013
People can understand the content of an image without effort. We can easily identify the objects in it, and figure out where they are in the 3D world. Automating these abilities is critical for many applications, like robotics, autonomous driving and surveillance. Unfortunately, despite recent advancements, fully automated vision systems for image understanding do not exist. In this work, we present progress restricted to the domain of images of indoor scenes, such as bedrooms and kitchens. These environments typically have the "Manhattan" property that most surfaces are parallel to three principal ones. Further, the 3D geometry of a room and the objects within it can be approximated with simple geometric primitives, such as 3D blocks. Our goal is to reconstruct the 3D geometry of an indoor environment while also understanding its semantic meaning, by identifying the objects in the scene, such as beds and couches. We separately model the 3D geometry, the camera, and an image likelihood, to provide a generative statistical model for image data. Our representation captures the rich structure of an indoor scene by explicitly modeling the contextual relationships among its elements, such as the typical size of objects and their arrangement in the room, and simple physical constraints, such as the requirement that 3D objects do not intersect. This ensures that the predicted image interpretation will be globally coherent geometrically and semantically, which allows tackling the ambiguities caused by projecting a 3D scene onto an image, such as occlusions and foreshortening. We fit this model to images using MCMC sampling. Our inference method combines bottom-up evidence from the data and top-down knowledge from the 3D world, in order to explore the vast output space efficiently. Comprehensive evaluation confirms our intuition that global inference of the entire scene is more effective than estimating its individual elements independently.
Further, our experiments show that our approach is competitive and often exceeds the results of state-of-the-art methods.
|