  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Efficient implementation of Markov chain Monte Carlo

Fan, Yanan January 2001 (has links)
No description available.
2

PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

Abeyruwan, Saminda Wishwajith 01 January 2010 (has links)
An ontology is a formal, explicit specification of a shared conceptualization. Formalizing an ontology for a domain is a tedious and cumbersome process, constrained by the knowledge acquisition bottleneck (KAB). There exist large numbers of text corpora that can be used for classification to create ontologies that better support the intended parties. In our research we provide a novel unsupervised bottom-up ontology generation method, based on lexico-semantic structures and Bayesian reasoning, to expedite the ontology generation process. This process also provides evidence that domain experts can use to build ontologies with top-down approaches.
3

Statistical model selection techniques for data analysis

Stark, J. Alex January 1995 (has links)
No description available.
4

The relationships between crime rate and income inequality : evidence from China

Zhang, Wenjie, active 2013 05 December 2013 (has links)
The main purpose of this study is to determine whether a Bayesian approach can better capture, and provide reasonable predictions for, the complex linkage between crime and income inequality. In this research we conduct a model comparison between classical and Bayesian inference. Conventional studies of the relationship between crime and income inequality usually employ regression analysis to show whether the two are associated; Bayesian approaches, however, appear to be little used for this problem. Studying panel data from China for 1993 to 2009, we found that, in addition to a linear mixed-effects model, a Bayesian hierarchical model with an informative prior also describes the linkage between crime rate and income inequality well. The choice of model ultimately depends on the research needs and data availability.
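The classical-versus-Bayesian comparison described in this abstract can be sketched in miniature. The code below is an illustration on synthetic data (not the China panel data): an ordinary least squares fit alongside a closed-form conjugate Bayesian linear regression with an informative Gaussian prior. All variable names and numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for panel data: crime rate vs income inequality (Gini).
n = 60
gini = rng.uniform(0.25, 0.45, n)
crime = 2.0 + 8.0 * gini + rng.normal(0, 0.5, n)   # true slope is 8.0

X = np.column_stack([np.ones(n), gini])

# Classical inference: ordinary least squares.
beta_ols, *_ = np.linalg.lstsq(X, crime, rcond=None)

# Bayesian inference: conjugate Gaussian prior beta ~ N(m0, tau^2 I) with
# known noise sd sigma; the posterior is Gaussian in closed form.
sigma, tau = 0.5, 2.0
m0 = np.array([0.0, 5.0])                        # informative prior, slope centred on 5
prec = X.T @ X / sigma**2 + np.eye(2) / tau**2   # posterior precision
cov = np.linalg.inv(prec)                        # posterior covariance
beta_post = cov @ (X.T @ crime / sigma**2 + m0 / tau**2)

print("OLS estimate:   ", beta_ols)
print("Posterior mean: ", beta_post)
print("Posterior sd:   ", np.sqrt(np.diag(cov)))
```

With abundant data the posterior mean sits close to the OLS estimate; with sparse data it is pulled toward the prior, which is the trade-off the abstract alludes to when it says the choice of model depends on data availability.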
5

Monte Carlo integration in discrete undirected probabilistic models

Hamze, Firas 05 1900 (has links)
This thesis contains the author’s work in, and contributions to, the field of Monte Carlo sampling for undirected graphical models, a class of statistical models commonly used in machine learning, computer vision, and spatial statistics; the aim is to use the methodology and resultant samples to estimate integrals of functions of the variables in the model. Over the course of the study, three different but related methods were proposed and have appeared as research papers. The thesis consists of an introductory chapter discussing the models considered, the problems involved, and a general outline of Monte Carlo methods. The three subsequent chapters contain versions of the published work. The second chapter, which appeared in (Hamze and de Freitas 2004), presents new MCMC algorithms for computing the posterior distributions and expectations of the unknown variables in undirected graphical models with regular structure. For demonstration purposes, we focus on Markov Random Fields (MRFs). By partitioning the MRFs into non-overlapping trees, it is possible to compute the posterior distribution of a particular tree exactly by conditioning on the remaining tree. These exact solutions allow us to construct efficient blocked and Rao-Blackwellised MCMC algorithms. We show empirically that tree sampling is considerably more efficient than other partitioned sampling schemes and the naive Gibbs sampler, even in cases where loopy belief propagation fails to converge. We prove that tree sampling exhibits lower variance than the naive Gibbs sampler and other naive partitioning schemes using the theoretical measure of maximal correlation. We also construct new information-theoretic tools for comparing different MCMC schemes and show that, under these, tree sampling is more efficient. Although the work discussed in Chapter 2 showed promise on the class of graphs to which it was suited, there are many cases where limiting the topology is quite a handicap.
The work in Chapter 3 was an exploration of an alternative methodology for approximating functions of variables representable as undirected graphical models of arbitrary connectivity with pairwise potentials, as well as for estimating the notoriously difficult partition function of the graph. The algorithm, published in (Hamze and de Freitas 2005), fits into the framework of sequential Monte Carlo methods rather than the more widely used MCMC, and relies on constructing a sequence of intermediate distributions that get closer to the desired one. While the idea of using “tempered” proposals is known, we construct a novel sequence of target distributions where, rather than dropping a global temperature parameter, we sequentially couple individual pairs of variables that are, initially, sampled exactly from a spanning tree of the variables. We present experimental results on inference and estimation of the partition function for sparse and densely connected graphs. The final contribution of this thesis, presented in Chapter 4 and also in (Hamze and de Freitas 2007), emerged from empirical observations made while trying to optimize the sequence of edges added to a graph so as to guide the population of samples to the high-probability regions of the model. Most important among these observations was that while several heuristic approaches, discussed in Chapter 1, certainly yielded improvements over edge sequences consisting of random choices, strategies based on forcing the particles to take large, biased random walks in the state space resulted in more efficient exploration, particularly at low temperatures. This motivated a new Monte Carlo approach to treating complex discrete distributions. The algorithm is motivated by the N-Fold Way, an ingenious event-driven MCMC sampler that avoids rejection moves at any specific state. The N-Fold Way can, however, get “trapped” in cycles.
We surmount this problem by modifying the sampling process to result in biased state-space paths of randomly chosen length. This alteration does introduce bias, but the bias is subsequently corrected with a carefully engineered importance sampler.
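The naive Gibbs sampler that serves as this thesis's baseline can be sketched on a small Ising-model MRF. The code below is a generic illustration of single-site Gibbs sampling, not the tree-sampling algorithm itself; the grid size, coupling strength, and sweep counts are arbitrary choices.

```python
import numpy as np

def gibbs_ising(L=8, beta=0.4, sweeps=200, burn_in=50, seed=0):
    """Naive single-site Gibbs sampler for a ferromagnetic Ising MRF on an LxL grid."""
    rng = np.random.default_rng(seed)
    s = rng.choice([-1, 1], size=(L, L))
    mags = []
    for _ in range(sweeps):
        for i in range(L):
            for j in range(L):
                # Sum of the (up to four) neighbours, free boundary conditions.
                nb = 0
                if i > 0:
                    nb += s[i - 1, j]
                if i < L - 1:
                    nb += s[i + 1, j]
                if j > 0:
                    nb += s[i, j - 1]
                if j < L - 1:
                    nb += s[i, j + 1]
                # Exact single-site conditional P(s_ij = +1 | neighbours).
                p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * nb))
                s[i, j] = 1 if rng.random() < p_up else -1
        mags.append(abs(s.mean()))
    # Monte Carlo estimate of E[|magnetization|] after burn-in.
    return float(np.mean(mags[burn_in:]))

print("E[|magnetization|] ≈", gibbs_ising())
```

Because each update resamples one site conditioned on its neighbours, successive samples are highly correlated; the blocked tree-sampling schemes described in the abstract resample entire trees at once to reduce exactly this correlation.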
7

Acquisition and influence of expectations about visual speed

Sotiropoulos, Grigorios January 2016 (has links)
It has long been hypothesized that, due to the inherent ambiguities of visual input and the limitations of the visual system, vision is a form of “unconscious inference” whereby the brain relies on assumptions (also known as expectations) to interpret the external world. This hypothesis has recently been formalized into Bayesian models of perception (the “Bayesian brain”) that represent these expectations as prior probabilities. In this thesis, I focus on a particular kind of expectation that humans are thought to possess – that objects in the world tend to be still or move slowly – known as the “slow speed prior”. Through a combination of experimental and theoretical work, I investigate how the speed prior is acquired and how it impacts motion perception. The first part of my work consists of an experiment where subjects are exposed to simple "training" stimuli moving more often at high speeds than at low speeds. By subsequently testing the subjects with slow-moving stimuli of high uncertainty (low contrast), I find that their perception gradually changes in a manner consistent with the progressive acquisition of an expectation that favours progressively higher speeds. Thus subjects appear to gradually internalize the speed statistics of the stimulus ensemble over the duration of the experiment. I model these results using an existing Bayesian model of motion perception that incorporates a speed prior with a peak at zero, extending the model so that the mean gradually shifts away from zero. Although the first experiment presents evidence for the plasticity of the speed prior, the experimental paradigm and the constraints of the model limit the accuracy and precision of the reconstruction of observers’ priors. To address these limitations, I perform a different experiment where subjects compare the speed of moving gratings of different contrasts. The new paradigm allows more precise measurements of the contrast-dependent biases in perceived speed.
Using a less constrained Bayesian model, I extract the priors of subjects and find considerable interindividual variability. Furthermore, noting that the Bayesian model cannot account for certain subtleties in the data, I combine the model with a non-Bayesian, physiologically motivated model of speed tuning of cortical neurons and show that the combination offers an improved description of the data. Using the paradigm of the second experiment, I then explore the role of visual experience on the form of the speed prior. By recruiting avid video gamers (who are routinely exposed to high speeds) and nongamers of both sexes, I study the differences in the prior among groups and find, surprisingly, that subjects’ speed priors depend more on gender than on gaming experience. In a final series of experiments similar to the first, I also test subjects on variations of the trained stimulus configuration – namely different orientations and motion directions. Subjects’ responses suggest that they are able to apply the changed prior to different orientations and, furthermore, that the changed prior persists for at least a week after the end of the experiment. These results provide further support for the plasticity of the speed prior but also suggest that the learned prior may be used only across similar stimulus configurations, whereas in sufficiently different configurations or contexts a “default” prior may be used instead.
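The contrast-dependent pull toward slow speeds that a Bayesian observer model predicts can be illustrated with a one-line Gaussian posterior-mean calculation. This is a simplified sketch, not the thesis's fitted model: the inverse-contrast noise relation and all parameter values are assumptions made for illustration.

```python
def perceived_speed(true_speed, contrast, prior_mean=0.0, prior_sd=2.0):
    """Posterior-mean speed estimate under a Gaussian 'slow speed' prior.

    Measurement noise is assumed (hypothetically) to scale inversely with
    contrast, so low-contrast stimuli are more uncertain and get pulled
    harder toward the slow prior.
    """
    noise_sd = 1.0 / contrast                        # assumed contrast-noise relation
    # Standard Gaussian product rule: the posterior mean is a reliability-
    # weighted average of the measurement and the prior mean.
    w = prior_sd**2 / (prior_sd**2 + noise_sd**2)
    return w * true_speed + (1 - w) * prior_mean

high = perceived_speed(5.0, contrast=0.9)
low = perceived_speed(5.0, contrast=0.1)
print(f"high contrast: {high:.2f} deg/s, low contrast: {low:.2f} deg/s")
```

The low-contrast estimate comes out much slower than the high-contrast one for the same physical speed, reproducing the qualitative contrast-dependent bias that both experiments in this thesis measure.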
9

Bayesian methods for gravitational waves and neural networks

Graff, Philip B. January 2012 (has links)
Einstein’s general theory of relativity has withstood 100 years of testing and will soon be facing one of its toughest challenges. In a few years we expect to be entering the era of the first direct observations of gravitational waves. These are tiny perturbations of space-time that are generated by accelerating matter and affect the measured distances between two points. Observations of these using the laser interferometers, which are the most sensitive length-measuring devices in the world, will allow us to test models of interactions in the strong field regime of gravity and eventually general relativity itself. I apply the tools of Bayesian inference for the examination of gravitational wave data from the LIGO and Virgo detectors. This is used for signal detection and estimation of the source parameters. I quantify the ability of a network of ground-based detectors to localise a source position on the sky for electromagnetic follow-up. Bayesian criteria are also applied to separating real signals from glitches in the detectors. These same tools and lessons can also be applied to the type of data expected from planned space-based detectors. Using simulations from the Mock LISA Data Challenges, I analyse our ability to detect and characterise both burst and continuous signals. The two seemingly different signal types will be overlapping and confused with one another for a space-based detector; my analysis shows that we will be able to separate and identify many signals present. Data sets and astrophysical models are continuously increasing in complexity. This will create an additional computational burden for performing Bayesian inference and other types of data analysis. I investigate the application of the MOPED algorithm for faster parameter estimation and data compression. I find that its shortcomings make it a less favourable candidate for further implementation. 
The framework of an artificial neural network is a simple model of the structure of a brain which can “learn” functional relationships between sets of inputs and outputs. I describe an algorithm developed for the training of feed-forward networks on pre-calculated data sets. The trained networks can then be used for fast prediction of outputs for new sets of inputs. After demonstrating its capabilities on toy data sets, I apply the network to classifying handwritten digits from the MNIST database and measuring ellipticities of galaxies in the Mapping Dark Matter challenge. The power of neural networks for learning and rapid prediction is also useful in Bayesian inference where the likelihood function is computationally expensive. The new BAMBI algorithm is detailed, in which our network training algorithm is combined with the nested sampling algorithm MULTINEST to provide rapid Bayesian inference. Using samples from the normal inference, a network is trained on the likelihood function and eventually used in its place. This provides a significant increase in the speed of Bayesian inference while returning identical results. The trained networks can then be used for extremely rapid follow-up analyses with different priors, obtaining orders-of-magnitude speed increases. Learning how to apply the tools of Bayesian inference for the optimal recovery of gravitational wave signals will provide the most scientific information when the first detections are made. Complementary to this, improving our analysis algorithms to provide the best results in less time will make analysis of larger and more complicated models and data sets practical.
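The surrogate-likelihood idea behind BAMBI — train a network on accumulated likelihood evaluations, then substitute it for the expensive likelihood — can be sketched with a tiny from-scratch network. This is a schematic stand-in, not the actual BAMBI/MULTINEST code; the toy likelihood, network size, and learning rate are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def expensive_loglike(theta):
    """Stand-in for a costly log-likelihood (e.g. a gravitational-wave model)."""
    return -0.5 * (theta - 1.5)**2 / 0.3**2

# Training set: likelihood values at sampled parameter points, as would
# accumulate during a nested-sampling run.
theta_train = rng.uniform(0, 3, 200)
y_train = expensive_loglike(theta_train)

# One-hidden-layer tanh network trained by full-batch gradient descent.
W1 = rng.normal(0, 1, (16, 1)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.1, 16);    b2 = 0.0
x = theta_train[:, None]
for _ in range(3000):
    h = np.tanh(x @ W1.T + b1)               # hidden activations
    pred = h @ W2 + b2
    err = pred - y_train
    gW2 = h.T @ err / len(x); gb2 = err.mean()
    gh = np.outer(err, W2) * (1 - h**2)      # backprop through tanh
    gW1 = gh.T @ x / len(x); gb1 = gh.mean(axis=0)
    for p, g in ((W2, gW2), (W1, gW1), (b1, gb1)):
        p -= 0.05 * g
    b2 -= 0.05 * gb2

def surrogate(theta):
    """Cheap network evaluation that replaces the expensive likelihood."""
    h = np.tanh(np.array([[theta]]) @ W1.T + b1)
    return (h @ W2 + b2).item()

print("true:", expensive_loglike(1.5), "surrogate:", surrogate(1.5))
```

Once trained, the surrogate can be evaluated millions of times at negligible cost, which is what enables the rapid follow-up analyses with different priors mentioned in the abstract.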
10

A Study of Bayesian Inference in Medical Diagnosis

Herzig, Michael 05 1900 (has links)
Bayes' formula may be written as follows:

    P(y_i | X) = P(X | y_i) · P(y_i) / Σ_{j=1}^{K} P(X | y_j) · P(y_j)    (1)

where Y = {y_1, y_2, ..., y_K} and X = {x_1, x_2, ..., x_k}.

Assuming independence of the attributes x_1, x_2, ..., x_k, Bayes' formula may be rewritten as follows:

    P(y_i | X) = P(x_1 | y_i) · P(x_2 | y_i) · ... · P(x_k | y_i) · P(y_i) / Σ_{j=1}^{K} P(x_1 | y_j) · P(x_2 | y_j) · ... · P(x_k | y_j) · P(y_j)    (2)

In medical diagnosis the y's denote disease states and the x's denote the presence or absence of symptoms. Bayesian inference is applied to medical diagnosis as follows: for an individual with data set X, the predicted diagnosis is the disease y_j such that

    P(y_j | X) = max_i P(y_i | X),  i = 1, 2, ..., K    (3)

as calculated from (2).

Inferences based on (2) and (3) correctly allocate a high proportion of patients (>70%) in studies to date, despite violations of the independence assumption. The aim of this thesis is modest: (i) to demonstrate the applicability of Bayesian inference to the problem of medical diagnosis; (ii) to review pertinent literature; (iii) to present a Monte Carlo method which simulates the application of Bayes' formula to distinguish among diseases; and (iv) to present and discuss the results of Monte Carlo experiments which allow statistical statements to be made concerning the accuracy of Bayesian inference when the assumption of independence is violated.

The Monte Carlo study considers paired dependence among attributes when Bayes' formula is used to predict diagnoses from among 6 disease categories. A parameter measuring deviations from attribute independence is defined by

    DH = (Σ_{j=1}^{6} |P(x_B | x_A, y_j) − P(x_B | y_j)|) / 6

where x_A and x_B denote a dependent attribute pair. It was found that the number of correct Bayesian predictions, M, decreases markedly as attributes increasingly diverge from independence, i.e., as DH increases. However, a simple first-order linear model of the form M = B_0 + B_1 · DH does not consistently explain the variation in M.

Thesis / Master of Science (MSc)
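Equations (2) and (3) amount to a naive Bayes classifier and can be sketched directly. The disease categories, symptom probabilities, and priors below are illustrative inventions, not data from the thesis.

```python
import numpy as np

# Three hypothetical disease states with symptom probabilities P(x_m = 1 | y_i)
# for four binary symptoms, plus prior probabilities P(y_i). All numbers
# are made up for illustration.
symptom_probs = np.array([[0.9, 0.8, 0.1, 0.2],
                          [0.2, 0.1, 0.9, 0.7],
                          [0.5, 0.5, 0.5, 0.5]])
priors = np.array([0.5, 0.3, 0.2])

def diagnose(x):
    """Apply Bayes' formula (2) and decision rule (3) under attribute independence."""
    x = np.asarray(x)
    # P(X | y_i) = prod_m P(x_m | y_i)^x_m · (1 - P(x_m | y_i))^(1 - x_m)
    like = np.prod(symptom_probs**x * (1 - symptom_probs)**(1 - x), axis=1)
    post = like * priors
    post /= post.sum()                   # denominator of (2)
    return int(np.argmax(post)), post    # decision rule (3)

disease, post = diagnose([1, 1, 0, 0])   # symptom pattern typical of disease 0
print("predicted disease:", disease, "posterior:", np.round(post, 3))
```

Introducing a dependence between two symptoms, as the thesis's Monte Carlo study does via the DH parameter, would make the product in the likelihood an approximation and degrade the accuracy of rule (3).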
