1.
From genomes to post-processing of Bayesian inference of phylogeny. Ali, Raja Hashim, January 2016.
Life is extremely complex and amazingly diverse; it has taken billions of years of evolution to attain the level of complexity we observe in nature today, ranging from single-celled prokaryotes to multi-cellular human beings. With the availability of molecular sequence data, algorithms inferring homology and gene families have emerged, and similarity in gene content between two genes has been the major signal utilized for homology inference. Recently there has been a significant rise in the number of species with fully sequenced genomes, which provides an opportunity to investigate and infer homologs with greater accuracy and in a more informed way. Phylogenetic analysis explains the relationship between the member genes of a gene family in a simple, graphical and plausible way using a tree representation. Bayesian phylogenetic inference is a probabilistic method used to infer gene phylogenies and the posteriors of other evolutionary parameters. The Markov chain Monte Carlo (MCMC) algorithm, in particular with the Metropolis-Hastings sampling scheme, is the most commonly employed algorithm for determining the evolutionary history of genes. Many software packages are available that process the results of each MCMC run and explore the parameter posterior, but there is a need for interactive software that can analyse both discrete and real-valued parameters, and that offers convergence assessment and burn-in estimation diagnostics designed specifically for Bayesian phylogenetic inference. In this thesis, a synteny-aware approach for gene homology inference, called GenFamClust (GFC), is proposed that uses gene content and gene order conservation to infer homology. The feature which distinguishes GFC from earlier homology inference methods is that local synteny is combined with gene similarity to infer homologs, without inferring homologous regions. GFC was validated for accuracy on a simulated dataset. Gene families were computed by applying clustering algorithms to the homologs inferred by GFC, and compared for accuracy, dependence and similarity with gene families inferred by other popular gene family inference methods on a eukaryotic dataset. Gene families in fungi obtained from GFC were evaluated against pillars from the Yeast Gene Order Browser. Genome-wide gene families for some eukaryotic species are computed using this approach. Another topic addressed in this thesis is the processing of MCMC traces for Bayesian phylogenetic inference. We introduce new software, VMCMC, which simplifies the post-processing of MCMC traces. VMCMC can be used both as a GUI-based application and as a convenient command-line tool. VMCMC supports interactive exploration, is suitable for automated pipelines, and can handle both real-valued and discrete parameters observed in an MCMC trace. We propose and implement joint burn-in estimators that are specifically applicable to Bayesian phylogenetic inference; these methods have been compared for similarity with other popular convergence diagnostics. We show that Bayesian phylogenetic inference and VMCMC can be applied to infer valuable evolutionary information in a biological case: the evolutionary history of the FERM domain.
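As an illustration of the kind of trace such post-processing works on, the sketch below runs a minimal Metropolis-Hastings sampler on a toy one-dimensional target and applies a crude block-mean heuristic to estimate burn-in. Everything here is hypothetical: the target, the start value, and the heuristic are stand-ins, not VMCMC's actual joint estimators.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_post(theta):
    # Toy log-posterior: a standard normal target.
    return -0.5 * theta**2

def metropolis_hastings(n_iter=5000, step=1.0):
    theta = 30.0                      # deliberately poor start, so burn-in is visible
    trace = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + step * rng.normal()              # symmetric random-walk proposal
        if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
            theta = prop                                # Metropolis accept step
        trace[i] = theta
    return trace

def crude_burnin(trace, n_blocks=20):
    # Flag the first block whose mean lies within 2 SD of the second-half mean.
    tail = trace[len(trace) // 2:]
    for k, block in enumerate(np.array_split(trace, n_blocks)):
        if abs(block.mean() - tail.mean()) < 2 * tail.std(ddof=1):
            return k * (len(trace) // n_blocks)
    return len(trace) // 2

trace = metropolis_hastings()
print("estimated burn-in:", crude_burnin(trace))
```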
2.
Bayesian inference in parameter estimation of bioprocesses. Mathias, Nigel, January 2024.
The following thesis explores the use of Bayes' theorem for modelling bioprocesses, specifically using a combination of data-driven modelling techniques and Bayesian inference to address practical concerns that arise when estimating parameters. This thesis is divided into four chapters, including a novel contribution to the use of surrogate modelling and parameter estimation algorithms for noisy data.
The 2nd chapter addresses the problem of high computational expense when estimating parameters using complex models. The main solution here is the use of surrogate modelling. This method was then applied to a high-fidelity model provided by Sartorius AG, in which a simulated three-batch run of the bioreactor was passed through the algorithm and two influential parameters, the growth and death rates of the live cell cultures, were estimated.
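A minimal sketch of the surrogate idea, under stated assumptions: the "expensive" bioreactor simulation is a throwaway logistic stand-in (not the Sartorius model), a cubic polynomial is fitted to a handful of simulation runs, and that cheap surrogate is then used inside a grid-based posterior for a single growth-rate parameter.

```python
import numpy as np

rng = np.random.default_rng(1)

def expensive_model(mu):
    # Hypothetical stand-in for a costly simulation: final biomass vs. growth rate mu.
    return 10.0 / (1.0 + np.exp(-8.0 * (mu - 0.5)))

# 1) Run the expensive model at a few design points only.
design = np.linspace(0.1, 0.9, 7)
runs = expensive_model(design)

# 2) Fit a cheap polynomial surrogate to those runs.
coeffs = np.polyfit(design, runs, deg=3)
surrogate = lambda mu: np.polyval(coeffs, mu)

# 3) Use the surrogate, not the simulator, inside the likelihood.
y_obs = expensive_model(0.55) + rng.normal(0.0, 0.2, size=3)   # noisy "batch" data
grid = np.linspace(0.1, 0.9, 400)
log_lik = np.array([-0.5 * np.sum((y_obs - surrogate(m)) ** 2) / 0.2**2 for m in grid])
post = np.exp(log_lik - log_lik.max())           # flat prior over the grid
post /= post.sum()
print("posterior mean growth rate:", np.sum(grid * post))
```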
The 3rd chapter addresses other challenges that arise in parameter estimation problems. Specifically, the issue of having limited data on a new process can be addressed using historical data, a distinct feature of Bayesian learning. Finally, the problem of choosing the "right" model for a given process is studied through the use of a quantity in Bayesian inference known as the evidence, which is used to select between a series of models based on both model complexity and goodness-of-fit to the data.
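The evidence-based selection just described can be shown on a toy problem: two hypothetical models, a constant mean and a linear trend, are scored by their marginal likelihoods under uniform priors, so the extra parameter of the linear model is automatically penalized. The data, priors, and noise level are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 0.05 * x + rng.normal(0.0, 0.1, x.size)    # nearly constant data
sigma = 0.1

def log_lik(pred):
    return (-0.5 * np.sum((y - pred) ** 2) / sigma**2
            - y.size * np.log(sigma * np.sqrt(2.0 * np.pi)))

# With a uniform prior, the evidence is the likelihood averaged over the prior range.
def log_mean_exp(ll):
    m = ll.max()
    return m + np.log(np.mean(np.exp(ll - m)))

a_grid = np.linspace(0.0, 2.0, 200)                  # prior range for the intercept
b_grid = np.linspace(-2.0, 2.0, 200)                 # prior range for the slope

llA = np.array([log_lik(np.full_like(y, a)) for a in a_grid])
llB = np.array([log_lik(a + b * x) for a in a_grid for b in b_grid])

log_bayes_factor = log_mean_exp(llA) - log_mean_exp(llB)
print("log Bayes factor (constant vs. linear):", round(log_bayes_factor, 2))
```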
3.
Efficient implementation of Markov chain Monte Carlo. Fan, Yanan, January 2001.
No description available.
4.
PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods. Abeyruwan, Saminda Wishwajith, 01 January 2010.
An ontology is a formal, explicit specification of a shared conceptualization. Formalizing an ontology for a domain is a tedious and cumbersome process, constrained by the knowledge acquisition bottleneck (KAB). There exists a large number of text corpora that can be used for classification in order to create ontologies, with the intention of providing better support for the intended parties. In our research we provide a novel unsupervised bottom-up ontology generation method, based on lexico-semantic structures and Bayesian reasoning, to expedite the ontology generation process. The process also provides evidence to domain experts for building ontologies based on top-down approaches.
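To make the Bayesian reasoning step concrete, here is a toy posterior for whether a candidate pair of terms stands in an is-a relation, based on how often a Hearst-style pattern such as "X is a Y" fires among their co-occurrences. The counts, the prior, and the two likelihood rates are invented; this is a sketch of the style of inference, not PrOntoLearn's actual model.

```python
import math

# Toy corpus counts per candidate pair: (pattern hits, total co-occurrences).
counts = {("dog", "animal"): (30, 40), ("animal", "dog"): (2, 40)}

prior = 0.1     # assumed prior belief that a candidate pair is a true is-a link

def posterior_is_a(pattern_hits, cooccur):
    # Binomial likelihoods: is-a pairs produce the pattern often, others rarely.
    p_true, p_false = 0.6, 0.05
    log_t = (math.log(prior) + pattern_hits * math.log(p_true)
             + (cooccur - pattern_hits) * math.log(1 - p_true))
    log_f = (math.log(1 - prior) + pattern_hits * math.log(p_false)
             + (cooccur - pattern_hits) * math.log(1 - p_false))
    m = max(log_t, log_f)
    return math.exp(log_t - m) / (math.exp(log_t - m) + math.exp(log_f - m))

for pair, (hits, total) in counts.items():
    print(pair, round(posterior_is_a(hits, total), 4))
```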
5.
Mixtures of triangular densities with applications to Bayesian mode regressions. Ho, Chi-San, 22 September 2014.
The main focus of this thesis is to develop full parametric and semiparametric Bayesian inference for data arising from triangular distributions. A natural consequence of working with such distributions is that it allows one to consider regression models in which the response variable is the mode of the data distribution. A new family of nonparametric prior distributions is developed for a class of convex densities of particular relevance to mode regressions. Triangular distributions arise in several contexts, such as geosciences, econometrics, finance, health care management, sociology, reliability engineering, and decision and risk analysis. In many fields, experts typically have a reasonable idea of the range and most likely values that define a data distribution; eliciting these quantities is thus generally easier than eliciting the moments of other commonly known distributions. Using simulated and actual data, applications of triangular distributions, with and without mode regressions, in some of the aforementioned areas are tackled.
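A minimal sketch of Bayesian inference for the mode of a triangular density, assuming the support endpoints are known (mirroring the elicitation point above: experts usually know the range). The simulated data and the flat prior on the mode are assumptions of this sketch, not the nonparametric prior family the thesis develops.

```python
import numpy as np

rng = np.random.default_rng(3)

def triangular_loglik(x, a, c, b):
    # Triangular density on [a, b] with mode c, evaluated branch by branch.
    left = 2 * (x - a) / ((b - a) * (c - a))
    right = 2 * (b - x) / ((b - a) * (b - c))
    pdf = np.where(x <= c, left, right)
    return np.sum(np.log(np.clip(pdf, 1e-300, None)))

a, b = 0.0, 1.0                            # known range, as an expert might elicit
data = rng.triangular(a, 0.7, b, size=200)

# Grid posterior for the mode c under a flat prior on (a, b).
grid = np.linspace(0.01, 0.99, 500)
logp = np.array([triangular_loglik(data, a, c, b) for c in grid])
post = np.exp(logp - logp.max())
post /= post.sum()
print("posterior mean of the mode:", np.sum(grid * post))
```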
6.
Statistical model selection techniques for data analysis. Stark, J. Alex, January 1995.
No description available.
7.
Metanálise para Modelos de Regressão / Meta-analysis for Regression Models. Santos, Laryssa Vieira dos, 28 October 2016.
Meta-analysis has been widely used in medical studies, especially in systematic reviews of randomized clinical trials. For regression models the technique is still very scarce and limited; usually it is just a measure based on the average point estimates of different studies, losing much of the information in the original data. It is becoming increasingly important to use meta-analysis to summarize studies with the same objective, owing to the advancement of science and the desire to use the smallest number of human subjects in clinical trials. Using a meta-analytic Bayesian measure, the objective is to propose a generic and efficient method to perform meta-analysis in regression models.
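As a hedged illustration of a Bayesian meta-analytic measure for a regression coefficient, the sketch below pools per-study slope estimates with a conjugate normal-normal update. The estimates, standard errors, and prior are hypothetical, and the thesis's measure uses more of the original data than these summary statistics.

```python
import numpy as np

# Per-study slope estimates and standard errors (hypothetical values).
beta_hat = np.array([0.42, 0.55, 0.38, 0.61])
se = np.array([0.10, 0.15, 0.08, 0.20])

# Normal prior on the common slope; normal likelihood per study.
mu0, tau0 = 0.0, 1.0
prec = 1 / tau0**2 + np.sum(1 / se**2)                 # posterior precision
post_mean = (mu0 / tau0**2 + np.sum(beta_hat / se**2)) / prec
post_sd = np.sqrt(1 / prec)
print(f"pooled slope: {post_mean:.3f} +/- {post_sd:.3f}")
```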
8.
Supervised learning for back analysis of excavations in the observational method. Jin, Yingyan, January 2018.
In the past few decades, demand for construction in underground spaces has increased dramatically in urban areas with high population densities. However, the impact of the construction of underground structures on surrounding infrastructure raises concerns, since movements caused by deep excavations might damage adjacent buildings. Unfortunately, the prediction of geotechnical behaviour is difficult due to uncertainties and a lack of information on the underground environment. Therefore, to ensure safety, engineers tend to choose very conservative designs that require unnecessary material and longer construction times. The observational method, proposed by Peck in 1969 and formalised in Eurocode 7 in 1987, provides a way to avoid such redundancy by modifying the design based on knowledge gathered during construction. The review process within the observational method is known as back analysis. Supervised learning can aid this process, providing a systematic procedure for assessing soil parameters based on monitoring data and predicting the ground response. A probabilistic model is developed in this research to account for the uncertainties in the problem. Sequential Bayesian inference is used to update the soil parameters at each excavation stage when observations become available. The accuracy of the prediction for future stages improves at each stage; meanwhile, the uncertainty contained in the prediction decreases, and the confidence in the corresponding design therefore increases. Moreover, the Bayesian method integrates subjective engineering experience and objective observations in a rational and quantitative way, which enables the model to update soil parameters even when the amount of data is very limited. It also allows the use of knowledge learnt from comparable ground conditions, which is particularly useful in the absence of site-specific information on ground conditions. Four probabilistic models are developed in this research. The first two incorporate empirical excavation design methods; these simple models are used to examine the practicality of the approach in several cases. The other two are coupled with a program called FREW, which is able to simulate the excavation process, still in a relatively simplistic way: a baseline model with simple assumptions on model error, and a more sophisticated model that accounts for measurement error and spatial relationships among the observations. Their efficiency and accuracy are verified on a synthetic case and tested on a case history from the London Crossrail project. Finally, the models are compared and their flexibility in different cases is discussed.
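The stage-by-stage updating described above can be sketched with a grid posterior over a single soil stiffness parameter: each excavation stage contributes one displacement observation, and the posterior from one stage becomes the prior for the next. The one-line displacement model is a hypothetical stand-in for FREW, and all numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(4)

def predict_displacement(stiffness, stage):
    # Hypothetical stand-in for the wall-displacement model (e.g. FREW output).
    return stage * 10.0 / stiffness

grid = np.linspace(20, 100, 800)                 # candidate soil stiffness values
prior = np.exp(-0.5 * ((grid - 50) / 20) ** 2)   # engineering judgement as a prior
prior /= prior.sum()

true_stiffness, noise_sd = 65.0, 0.2
for stage in range(1, 5):
    obs = predict_displacement(true_stiffness, stage) + rng.normal(0, noise_sd)
    lik = np.exp(-0.5 * ((obs - predict_displacement(grid, stage)) / noise_sd) ** 2)
    prior = prior * lik                          # posterior of this stage...
    prior /= prior.sum()                         # ...becomes the prior for the next
    mean = np.sum(grid * prior)
    sd = np.sqrt(np.sum(grid**2 * prior) - mean**2)
    print(f"stage {stage}: stiffness = {mean:.1f} +/- {sd:.1f}")
```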
9.
Labor market policies in an equilibrium matching model with heterogeneous agents and on-the-job search. Stavrunova, Olena, 01 January 2007.
This dissertation quantitatively evaluates selected labor market policies in a search-matching model with skill heterogeneity, in which high-skilled workers can take temporary jobs with skill requirements below their skill levels. The joint posterior distribution of the structural parameters of the theoretical model is obtained conditional on data on the labor market histories of NLSY79 respondents. Information on the AFQT scores of individuals and the skill requirements of occupations is utilized to identify the skill levels of workers and the complexity levels of jobs in the job-worker matches realized in the data. The model and the data are used to simulate the posterior distributions of the impacts of labor market policies on the endogenous variables of interest to a policy-maker, including the unemployment rates, durations and wages of low- and high-skilled workers. In particular, the effects of the following policies are analyzed: an increase in the proportion of high-skilled workers, subsidies for employing or hiring high- and low-skilled workers, and an increase in unemployment income.
10.
Bayesian Methods for On-Line Gross Error Detection and Compensation. Gonzalez, Ruben, 11 1900.
Data reconciliation and gross error detection are traditional methods for detecting mass balance inconsistency in process instrument data. These methods take a static approach to statistical evaluation. This thesis is concerned with using an alternative statistical approach (Bayesian statistics) to detect mass balance inconsistency in real time.
The proposed dynamic Bayesian solution makes use of a state-space process model which incorporates mass balance relationships, so that a governing set of mass balance variables can be estimated using a Kalman filter. Because mass balances are incorporated, many model parameters are defined from first principles. However, some parameters, namely the observation and state covariance matrices, need to be estimated from process data before the dynamic Bayesian methods can be applied. This thesis makes use of Bayesian machine learning techniques to estimate these parameters, separating process disturbances from instrument measurement noise.
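A compact sketch of the state-space idea: the mass balance (holdup change = inflow minus outflow) is built into the transition matrix, a Kalman filter tracks the balance variables, and an innovation test flags a simulated meter bias. The chi-square innovation test is a conventional stand-in for the thesis's Bayesian decision rule, and the covariances here are fixed by hand rather than learned from data as the thesis does.

```python
import numpy as np

rng = np.random.default_rng(5)
dt = 1.0
# State: [holdup, inflow, outflow]; the mass balance enters through F.
F = np.array([[1.0, dt, -dt],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
H = np.eye(3)                        # all three quantities are instrumented
Q = np.diag([0.01, 0.05, 0.05])      # process noise (assumed, not learned)
R = np.diag([0.1, 0.2, 0.2])         # measurement noise (likewise)

x = np.array([100.0, 5.0, 5.0])
P = np.eye(3)

for k in range(50):
    z = np.array([100.0, 5.0, 5.0]) + rng.multivariate_normal(np.zeros(3), R)
    if k == 30:
        z[2] += 3.0                  # inject a gross error on the outflow meter
    # Kalman predict step.
    x, P = F @ x, F @ P @ F.T + Q
    # Innovation test: large Mahalanobis distance suggests a gross error.
    S = H @ P @ H.T + R
    innov = z - H @ x
    d2 = innov @ np.linalg.solve(S, innov)
    if d2 > 11.34:                   # chi-square(3) 99% threshold
        print(f"step {k}: possible gross error (d2 = {d2:.1f})")
    # Kalman update step.
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ innov
    P = (np.eye(3) - K @ H) @ P
```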