About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations, provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Its metadata is collected from universities around the world; if you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
1

Delayed sampling in the probabilistic programming language Anglican

Lundén, Daniel January 2017
Many probabilistic inference algorithms, both exact and approximate, have been developed in recent decades to run efficiently on various probabilistic models. For many probabilistic models, however, exact inference is infeasible or impossible, so approximate algorithms are often necessary. In this thesis, a method for partially applying exact inference in probabilistic programming languages that use Monte Carlo inference algorithms is introduced and formalized. More specifically, this method allows conditioning on observations in the probabilistic program before sampling is performed, where applicable. We show that this method, called delayed sampling, can be used to reduce the mean squared error of estimators based on samples generated by probabilistic programs. We also show that delayed sampling never increases the mean squared error of such estimators. An evaluation is performed with an implementation of delayed sampling in the probabilistic programming language Anglican. The results demonstrate clear reductions in mean squared error for simple examples, but the overhead of the Anglican implementation is also shown to be substantial.
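A minimal Python sketch of the idea (Anglican itself is embedded in Clojure; the conjugate Gaussian model, function names, and constants below are invented for illustration, not taken from the thesis): instead of eagerly sampling a latent variable and weighting it by the likelihood of a later observation, delayed sampling conditions on the observation analytically first, so every sample comes from the exact posterior and the resulting estimator has lower variance.

```python
import math
import random

# Toy conjugate model: x ~ Normal(mu0, v0), y ~ Normal(x, v), y observed.
# "Eager" sampling draws x from the prior and weights it by the likelihood;
# "delayed" sampling conditions on y analytically first and samples the
# exact posterior, giving unit-weight samples with lower estimator variance.

def eager_sample(mu0, v0, y, v):
    x = random.gauss(mu0, math.sqrt(v0))          # sample from the prior
    w = math.exp(-(y - x) ** 2 / (2 * v))         # likelihood weight for y
    return x, w

def delayed_sample(mu0, v0, y, v):
    post_v = 1.0 / (1.0 / v0 + 1.0 / v)           # conjugate Gaussian update
    post_mu = post_v * (mu0 / v0 + y / v)
    return random.gauss(post_mu, math.sqrt(post_v)), 1.0

if __name__ == "__main__":
    random.seed(0)
    y = 2.5
    eager = [eager_sample(0.0, 1.0, y, 0.5) for _ in range(10000)]
    print(sum(x * w for x, w in eager) / sum(w for _, w in eager))
    delayed = [delayed_sample(0.0, 1.0, y, 0.5)[0] for _ in range(10000)]
    print(sum(delayed) / len(delayed))            # same target, less noise
```

Both estimators target the same posterior mean, but the delayed one averages exact posterior samples rather than unevenly weighted prior samples, which is where the mean-squared-error reduction comes from.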
2

Automating inference, learning, and design using probabilistic programming

Rainforth, Thomas William Gamlen January 2017
Imagine a world where computational simulations can be inverted as easily as running them forwards, where data can be used to refine models automatically, and where the only expertise one needs to carry out powerful statistical analysis is a basic proficiency in scientific coding. Creating such a world is the ambitious long-term aim of probabilistic programming. The bottleneck for improving the probabilistic models, or simulators, used throughout the quantitative sciences is often not an ability to devise better models conceptually, but a lack of expertise, time, or resources to realize such innovations. Probabilistic programming systems (PPSs) help alleviate this bottleneck by providing an expressive and accessible modeling framework, then automating the required computation to draw inferences from the model, for example, finding the model parameters likely to give rise to a certain output. By decoupling model specification and inference, PPSs streamline the process of developing and drawing inferences from new models, while opening up powerful statistical methods to non-experts. Many systems further provide the flexibility to write new and exciting models that would be hard, or even impossible, to convey using conventional statistical frameworks. The central goal of this thesis is to improve and extend PPSs. In particular, we will make advances in the underlying inference engines and increase the range of problems that can be tackled. For example, we will extend PPSs to a mixed inference-optimization framework, thereby providing automation of tasks such as model learning and engineering design. Meanwhile, we make inroads into constructing systems for automating adaptive sequential design problems, providing potential applications across the sciences. Furthermore, the contributions of the work reach far beyond probabilistic programming, as achieving our goal requires advances in a number of related fields, such as particle Markov chain Monte Carlo methods, Bayesian optimization, and Monte Carlo fundamentals.
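As a hedged illustration of the "inverting a simulator" framing (the simulator, prior, and noise scale below are invented for this example, not taken from the thesis), a self-normalized importance sampler can recover which parameters likely gave rise to an observed output by running the model forwards many times and weighting each run by how well it matches:

```python
import math
import random

# Invert a forward simulator by self-normalized importance sampling:
# draw parameters from a prior, run the simulator forwards, and weight
# each draw by how well its output matches the observed value.

def simulator(theta):
    return theta ** 2 + random.gauss(0.0, 0.1)    # noisy quadratic model

def posterior_mean(observed, n=20000):
    draws = []
    for _ in range(n):
        theta = random.uniform(0.0, 4.0)          # prior over the parameter
        out = simulator(theta)                    # run the model forwards
        w = math.exp(-(out - observed) ** 2 / (2 * 0.05))  # match weight
        draws.append((theta, w))
    total = sum(w for _, w in draws)
    return sum(t * w for t, w in draws) / total

if __name__ == "__main__":
    random.seed(1)
    print(posterior_mean(4.0))  # concentrates near theta = 2, since 2^2 = 4
```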
3

Formal language for statistical inference of uncertain stochastic systems

Georgoulas, Anastasios-Andreas January 2016
Stochastic models, in particular Continuous Time Markov Chains, are a commonly employed mathematical abstraction for describing natural or engineered dynamical systems. While the theory behind them is well studied, their specification can be problematic in a number of ways. Firstly, the size and complexity of the model can make its description difficult without a high-level language. Secondly, knowledge of the system is usually incomplete, leaving one or more parameters with unknown values and thus impeding further analysis. Sophisticated machine learning algorithms have been proposed for the statistically rigorous estimation and handling of this uncertainty; however, their applicability is often limited to systems with finite state-spaces, and their use with high-level descriptions has not been considered. Similarly, high-level formal languages have long been used for describing and reasoning about stochastic systems, but they require a full specification, and efforts to estimate parameters for such formal models have been limited to simple inference algorithms. This thesis explores how these two approaches can be brought together, drawing ideas from the probabilistic programming paradigm. We introduce ProPPA, a process algebra for the specification of stochastic systems with uncertain parameters. The language is equipped with a semantics, allowing a formal interpretation of models written in it. This is the first time that uncertainty has been incorporated into the syntax and semantics of a formal language, and we describe a new mathematical object capable of capturing this information. We provide a series of inference algorithms that can be applied automatically to ProPPA models without the need to write extra code. As part of these, we develop a novel inference scheme for infinite-state systems, based on random truncations of the state-space. The expressive power and inference capabilities of the framework are demonstrated in a series of small examples as well as a larger-scale case study. We also present a review of the state of the art in both machine learning and formal modelling with respect to stochastic systems. We close with a discussion of potential extensions of this work, and thoughts about different ways in which the fields of statistical machine learning and formal modelling can be further integrated.
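To make the problem class concrete, here is a hedged Python sketch of a Continuous Time Markov Chain (a pure birth process) with an uncertain rate parameter, inferred by rejection-style approximate Bayesian computation over Gillespie simulations. The prior, tolerance, and inference scheme are invented for illustration; they are not ProPPA's algorithms, which include a more sophisticated random-truncation method for infinite state-spaces.

```python
import random

# A CTMC with an unknown parameter: a pure birth process X -> X+1 whose
# rate we infer by simulating trajectories (Gillespie's algorithm) and
# keeping rate draws whose simulated count matches the observation.

def gillespie_birth(rate, t_end):
    """Simulate the birth process and return the count at time t_end."""
    t, x = 0.0, 0
    while True:
        t += random.expovariate(rate)   # exponential waiting time
        if t > t_end:
            return x
        x += 1

def abc_posterior(observed_count, t_end, n=5000, tol=2):
    accepted = []
    for _ in range(n):
        rate = random.uniform(0.1, 10.0)  # prior over the unknown rate
        if abs(gillespie_birth(rate, t_end) - observed_count) <= tol:
            accepted.append(rate)
    return accepted

if __name__ == "__main__":
    random.seed(2)
    post = abc_posterior(observed_count=12, t_end=3.0)
    print(sum(post) / len(post))  # rate estimate near 12 / 3 = 4
```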
4

Probabilistic Programming for Theory of Mind for Autonomous Decision Making

Seaman, Iris Rubi 01 June 2018 (has links)
As autonomous agents (such as unmanned aerial vehicles, or UAVs) become more ubiquitous, they are being used for increasingly complex tasks. Eventually, they will have to reason about the mental state of other agents, including those agents' beliefs, desires and goals – so-called Theory of Mind – and make decisions based on that reasoning. We describe increasingly complex theory of mind models of a UAV pursuing an intruder, and show that (1) there is a natural Bayesian formulation to reasoning about the uncertainty inherent in our estimate of another agent's mental state, and that (2) probabilistic programming is a natural way to describe models that involve one agent reasoning about another agent, where the target agent uses complex primitives such as path planners and saliency maps to make decisions. We propose a nested self-normalized importance sampling inference algorithm for probabilistic programs, and show that it can be used with planning-as-inference to simultaneously reason about other agents' plans and craft counter plans. We demonstrate that more complex models lead to improved performance, and that nested modeling manifests a wide variety of rational agent behavior.
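A stripped-down sketch of the nested structure (a 1-D chase with two candidate goals stands in for the UAV domain; the inner inference is exact enumeration standing in for the paper's nested self-normalized importance sampler, and all names and likelihood values are invented for illustration):

```python
# Outer agent (pursuer) plans against an inner inference over the
# intruder's goal: each observed intruder move is scored by how well it
# points toward each candidate goal, and the pursuer then moves toward
# the intruder's expected destination.

GOALS = [0, 10]  # the intruder is heading to one of these cells

def intruder_step(pos, goal):
    return pos + (1 if goal > pos else -1)

def infer_goal(observed_path):
    """Inner inference: weight each goal by how well it explains the path."""
    weights = {}
    for g in GOALS:
        w, pos = 1.0, observed_path[0]
        for nxt in observed_path[1:]:
            # likelihood 0.9 if the observed move points toward goal g
            w *= 0.9 if intruder_step(pos, g) == nxt else 0.1
            pos = nxt
        weights[g] = w
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

def pursuer_move(pursuer_pos, observed_path):
    """Outer decision: move toward the intruder's expected goal."""
    belief = infer_goal(observed_path)
    expected_goal = sum(g * p for g, p in belief.items())
    return pursuer_pos + (1 if expected_goal > pursuer_pos else -1)

if __name__ == "__main__":
    path = [5, 6, 7]              # intruder drifting toward cell 10
    print(infer_goal(path))       # belief concentrates on goal 10
    print(pursuer_move(2, path))  # pursuer moves right, toward goal 10
```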
5

Syntactic foundations for machine learning

Bhat, Sooraj 08 April 2013 (has links)
Machine learning has risen in importance across science, engineering, and business in recent years. Domain experts have begun to understand how their data analysis problems can be solved in a principled and efficient manner using methods from machine learning, with its simultaneous focus on statistical and computational concerns. Moreover, the data in many of these application domains has exploded in availability and scale, further underscoring the need for algorithms that find patterns and trends quickly and correctly. However, most people actually analyzing data today operate far from the expert level. Available statistical libraries and even textbooks contain only a finite sample of the possibilities afforded by the underlying mathematical principles. Ideally, practitioners should be able to do what machine learning experts can do: employ the fundamental principles to experiment with the practically infinite number of possible customized statistical models, as well as alternative algorithms for solving them, including advanced techniques for handling massive datasets. This would lead to more accurate models, the ability in some cases to analyze data that was previously intractable, and, if the experimentation can be greatly accelerated, huge gains in human productivity. Fixing this state of affairs involves mechanizing and automating these statistical and algorithmic principles. This task has received little attention because we lack a suitable syntactic representation capable of specifying machine learning problems and solutions, so there is no way to encode the principles in question, which are themselves a mapping between problem and solution. This work focuses on providing the foundational layer for enabling this vision, with the thesis that such a representation is possible. We demonstrate the thesis by defining a syntactic representation of machine learning that is expressive, promotes correctness, and enables the mechanization of a wide variety of useful solution principles.
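As a toy illustration of what treating machine learning problems as syntax can mean (the AST and names below are invented for this example, not the representation defined in the thesis), a model can be encoded as a small expression tree, and a solution principle, here deriving the log-density, becomes an ordinary function over that syntax:

```python
import math
from dataclasses import dataclass

# The model is data (a tiny AST); deriving its log-density is a function
# over that syntax, mechanizing what a statistician would do by hand.

@dataclass
class Normal:
    mean: float
    var: float

@dataclass
class IID:            # n independent draws from the same distribution
    dist: Normal
    n: int

def log_density(model, data):
    """Mechanically derive the log-density from the model's syntax."""
    if isinstance(model, Normal):
        return (-0.5 * math.log(2 * math.pi * model.var)
                - (data - model.mean) ** 2 / (2 * model.var))
    if isinstance(model, IID):
        return sum(log_density(model.dist, x) for x in data)
    raise TypeError(f"unknown model node: {model!r}")

if __name__ == "__main__":
    model = IID(Normal(mean=0.0, var=1.0), n=3)
    print(log_density(model, [0.1, -0.2, 0.3]))
```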
6

Programming language semantics as a foundation for Bayesian inference

Szymczak, Marcin January 2018
Bayesian modelling, in which our prior belief about the distribution on model parameters is updated by observed data, is a popular approach to statistical data analysis. However, writing specific inference algorithms for Bayesian models by hand is time-consuming and requires significant machine learning expertise. Probabilistic programming promises to make Bayesian modelling easier and more accessible by letting the user express a generative model as a short computer program (with random variables), leaving inference to the generic algorithm provided by the compiler of the given language. However, it is not easy to design a probabilistic programming language correctly and define the meaning of programs expressible in it. Moreover, the inference algorithms used by probabilistic programming systems usually lack formal correctness proofs and bugs have been found in some of them, which limits the confidence one can have in the results they return. In this work, we apply ideas from the areas of programming language theory and statistics to show that probabilistic programming can be a reliable tool for Bayesian inference. The first part of this dissertation concerns the design, semantics and type system of a new, substantially enhanced version of the Tabular language. Tabular is a schema-based probabilistic language, which means that instead of writing a full program, the user only has to annotate the columns of a schema with expressions generating corresponding values. By adopting this paradigm, Tabular aims to be user-friendly, but this unusual design also makes it harder to define the syntax and semantics correctly and reason about the language. We define the syntax of a version of Tabular extended with user-defined functions and pseudo-deterministic queries, design a dependent type system for this language and endow it with a precise semantics. We also extend Tabular with a concise formula notation for hierarchical linear regressions, define the type system of this extended language and show how to reduce it to pure Tabular. In the second part of this dissertation, we present the first correctness proof for a Metropolis-Hastings sampling algorithm for a higher-order probabilistic language. We define a measure-theoretic semantics of the language by means of an operationally-defined density function on program traces (sequences of random variables) and a map from traces to program outputs. We then show that the distribution of samples returned by our algorithm (a variant of “Trace MCMC” used by the Church language) matches the program semantics in the limit.
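For intuition, here is a minimal sketch of trace-based Metropolis-Hastings on a one-choice program (the model, proposal scale, and constants are invented for this example; the thesis treats higher-order programs and proves correctness, which this sketch does not attempt):

```python
import math
import random

# A trace is the list of the program's random draws; the program defines
# an (unnormalized) density over traces, and Metropolis-Hastings
# perturbs one trace site at a time. Program: x ~ N(0,1); observe
# N(x, 0.5) = Y_OBS; exact posterior mean is 1.2.

Y_OBS = 1.8

def log_density(trace):
    x = trace[0]
    log_prior = -0.5 * x ** 2                 # log N(x; 0, 1), up to const
    log_lik = -(Y_OBS - x) ** 2 / (2 * 0.5)   # log N(y; x, 0.5), up to const
    return log_prior + log_lik

def trace_mh(n_steps=20000):
    trace, samples = [0.0], []
    for _ in range(n_steps):
        site = random.randrange(len(trace))   # pick a single site
        proposal = trace.copy()
        proposal[site] = trace[site] + random.gauss(0.0, 0.5)
        # symmetric proposal, so the acceptance ratio is just the density ratio
        if math.log(random.random()) < log_density(proposal) - log_density(trace):
            trace = proposal
        samples.append(trace[0])
    return samples

if __name__ == "__main__":
    random.seed(3)
    s = trace_mh()
    print(sum(s) / len(s))  # close to the exact posterior mean 1.2
```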
7

Real-time probabilistic reasoning system using Lambda architecture

Anikwue, Arinze January 2019
Thesis (MTech (Information Technology)), Cape Peninsula University of Technology, 2019.

The proliferation of data from sources such as social media and sensor devices has become overwhelming for traditional data storage and analysis technologies to handle. This has prompted radical improvements in data management techniques, tools, and technologies, most of them open-source, to meet the increasing demand for the effective collection, storage, and curation of large datasets. Big data is usually described in terms of very large datasets, but a major feature of big data is its velocity: data flows in as a continuous stream and must be acted on in real time to yield meaningful, relevant value. Although there is an explosion of technologies for handling big data, they usually target the processing of large historic datasets and of real-time big data independently, hence the need for a unified framework that handles both. This led to models such as the Lambda architecture. Effective decision-making requires processing historic as well as real-time data, and some decisions involve complex processes that depend on the likelihood of events. To handle uncertainty, probabilistic systems were designed. Probabilistic systems use probabilistic models, such as hidden Markov models, together with inference algorithms to process data and produce probabilistic scores. However, developing these models requires extensive knowledge of statistics and machine learning, making it an uphill task to model real-life circumstances. A new research area called probabilistic programming has been introduced to alleviate this bottleneck. This research proposes combining modern open-source big data technologies with probabilistic programming and the Lambda architecture on commodity hardware to develop a highly fault-tolerant and scalable tool that processes both historic and real-time big data in real time: a common solution. Such a system will empower decision makers with the capacity to make better-informed decisions, especially in the face of uncertainty. The outcome of this research is a technology product, built and assessed using experimental evaluation methods. The research follows the Design Science Research (DSR) methodology, which provides guidelines for the effective and rigorous construction and evaluation of an artefact. Probabilistic programming in the big data domain is still in its infancy; however, the developed artefact demonstrates the significant potential of combining probabilistic programming with the Lambda architecture for processing big data.
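A schematic sketch of the Lambda architecture the thesis builds on (real deployments use systems such as Hadoop or Spark for the batch layer and Storm or Kafka for the speed layer; the classes, running-sum view, and reset logic here are purely illustrative):

```python
# Lambda architecture in miniature: a batch layer periodically recomputes
# views over all historic data, a speed layer keeps incremental real-time
# views, and a serving layer merges both to answer queries.

class BatchLayer:
    def __init__(self):
        self.master_data = []   # immutable, append-only historic store
        self.view = 0.0
    def recompute(self):        # run periodically over ALL data
        self.view = sum(self.master_data)

class SpeedLayer:
    def __init__(self):
        self.view = 0.0         # incremental view of recent data only
    def update(self, event):
        self.view += event

class ServingLayer:
    def __init__(self, batch, speed):
        self.batch, self.speed = batch, speed
    def query(self):
        # merged answer = stale-but-complete batch view + recent delta
        return self.batch.view + self.speed.view

if __name__ == "__main__":
    batch, speed = BatchLayer(), SpeedLayer()
    serving = ServingLayer(batch, speed)
    for event in [1.0, 2.0, 3.0]:   # events arrive in real time
        batch.master_data.append(event)
        speed.update(event)
    print(serving.query())          # 6.0, answered before any batch run
    batch.recompute()
    speed.view = 0.0                # speed view resets once batch catches up
    print(serving.query())          # still 6.0, now served from the batch view
```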
8

Stochastic EM for generic topic modeling using probabilistic programming

Saberi Nasseri, Robin January 2021
Probabilistic topic models are a versatile class of models for discovering latent themes in document collections through unsupervised learning. Conventional inferential methods lack the scaling capabilities necessary for extensions to large-scale applications. In recent years, Stochastic Expectation Maximization has proven scalable for the simplest topic model, Latent Dirichlet Allocation; for many more complex topic models, however, analytical maximization is not possible. With the rise of probabilistic programming languages, the ability to infer flexibly specified probabilistic models using sophisticated numerical optimization procedures has become widely available. These frameworks have, however, mainly been developed for the optimization of continuous parameters, often prohibiting the direct optimization of discrete parameters. This thesis explores the potential of using probabilistic programming for generic topic modeling with Stochastic Expectation Maximization, numerically maximizing discrete parameters after reparameterizing them to unconstrained space. In simulated experiments, the method achieves results of similar quality to other methods for Latent Dirichlet Allocation. It is further applied to a Dirichlet-multinomial regression model with metadata covariates on a real dataset, where it produces interpretable topics.
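A hedged sketch of the central move on a tiny two-topic unigram mixture standing in for LDA (the vocabulary, learning rate, and data are invented, and the thesis works within probabilistic-programming frameworks rather than with hand-written gradients): the stochastic E-step samples discrete topic assignments, and the M-step takes a numerical gradient step on unconstrained logits that softmax maps back to the probability simplex.

```python
import math
import random

V, K = 3, 2  # vocabulary size, number of topics

def softmax(logits):
    m = max(logits)
    e = [math.exp(l - m) for l in logits]
    s = sum(e)
    return [x / s for x in e]

def sem(docs, iters=200, lr=0.5):
    # Unconstrained logits per topic; softmax gives topic-word distributions.
    logits = [[random.gauss(0, 0.1) for _ in range(V)] for _ in range(K)]
    for _ in range(iters):
        phis = [softmax(l) for l in logits]
        counts = [[0.0] * V for _ in range(K)]
        for doc in docs:
            # Stochastic E-step: sample this document's topic given its words.
            logp = [sum(math.log(phis[k][w]) for w in doc) for k in range(K)]
            probs = softmax(logp)
            k = 0 if random.random() < probs[0] else 1
            for w in doc:
                counts[k][w] += 1
        # M-step: one gradient-ascent step on the unconstrained logits.
        for k in range(K):
            n_k = sum(counts[k])
            for w in range(V):
                grad = counts[k][w] - n_k * phis[k][w]  # d log-lik / d logit
                logits[k][w] += lr * grad / max(n_k, 1.0)
    return [softmax(l) for l in logits]

if __name__ == "__main__":
    random.seed(4)
    docs = [[0, 0, 1]] * 10 + [[2, 2, 1]] * 10  # two word-usage patterns
    for phi in sem(docs):
        print([round(p, 2) for p in phi])  # topics tend to separate words 0 and 2
```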
9

Formally Verified Samplers From Discrete Probabilistic Programs

Bagnall, Alexander 05 June 2023 (has links)
No description available.
10

Extensions of Multistage Stochastic Optimization with Applications in Energy and Healthcare

Kuznia, Ludwig Charlemagne 01 January 2012 (has links)
This dissertation focuses on extending solution methods in the area of stochastic optimization. Attention is focused on three specific problems in the field. First, a solution method for mixed integer programs subject to chance constraints is discussed. This class of problems serves as an effective modeling framework for a wide variety of applied problems. Unfortunately, chance-constrained mixed integer programs tend to be very challenging to solve; the aim of this work is to address some of these challenges by exploiting the structure of the problem's deterministic reformulation. Second, a stochastic program for integrating renewable energy sources into traditional energy systems is developed. As the global push for higher utilization of such green resources increases, such models will prove invaluable to energy system designers. Finally, a process for transforming clinical medical data into a model that assists decision making during the treatment-planning phase of palliative chemotherapy is outlined. This work will likely provide decision support tools for oncologists. Moreover, given the new requirements for the use of electronic medical records, such techniques will have applicability to other treatment-planning applications in the future.
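To show the structure such methods exploit, here is a hedged sketch of the standard scenario ("big-M") reformulation of a chance constraint (the demand data, cost, and brute-force search are invented for illustration; a real instance would be handed to a MIP solver):

```python
import itertools
import random

# Chance constraint P(a*x >= demand) >= 1 - eps, reformulated over
# sampled scenarios: enforce x >= d_s - M*z_s for binary indicators z_s,
# and allow at most eps * n scenarios to be "switched off" (violated).
# Brute force over (x, z) stands in for a MIP solver here.

random.seed(5)
scenarios = [random.uniform(5, 15) for _ in range(10)]  # demand d_s
eps = 0.2       # allow violating at most 20% of scenarios
BIG_M = 100.0

best = None
for x in [i * 0.5 for i in range(0, 41)]:      # candidate decisions
    for z in itertools.product([0, 1], repeat=len(scenarios)):
        if sum(z) > eps * len(scenarios):
            continue                            # too many violated scenarios
        # big-M constraint: x >= d_s - M * z_s for every scenario s
        if all(x >= d - BIG_M * zs for d, zs in zip(scenarios, z)):
            if best is None or x < best:
                best = x                        # cost here is just x itself
            break  # any feasible z certifies x; the smallest x wins overall

print(best)  # cheapest x covering at least 80% of the demand scenarios
```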
