441

Optimization for Probabilistic Machine Learning

Fazelnia, Ghazal January 2019 (has links)
We have access to a greater variety of datasets than at any time in history. Every day, more data is collected from natural sources and digital platforms. Great advances in machine learning research over the past few decades have relied strongly on the availability of these datasets. However, analyzing them poses significant challenges, mainly due to two factors. First, the datasets have complex structures with hidden interdependencies. Second, most of the valuable datasets are high dimensional and large scale. The main goal of a machine learning framework is to design a model that is a valid representative of the observations and to develop a learning algorithm that makes inferences about unobserved or latent data based on the observations. Discovering hidden patterns and inferring latent characteristics in such datasets is one of the greatest challenges in machine learning research. In this dissertation, I investigate some of the challenges in modeling and algorithm design, and present my research results on how to overcome these obstacles.

Analyzing data generally involves two main stages. The first stage is designing a model that is flexible enough to capture complex variation and latent structures in the data and robust enough to generalize well to unseen data. Designing an expressive and interpretable model is one of the crucial objectives in this stage. The second stage involves training the learning algorithm on the observed data and measuring the accuracy of the model and learning algorithm. This stage usually involves an optimization problem whose objective is to tune the model to the training data and learn the model parameters. Finding a global optimum, or a sufficiently good local optimum, is one of the main challenges in this step.

Probabilistic models are among the best-known models for capturing the data-generating process and quantifying uncertainty in data using random variables and probability distributions. They are powerful models that have been shown to be adaptive and robust and to scale well to large datasets. However, most probabilistic models have a complex structure, and training them can become challenging, commonly because intractable integrals appear in the calculation. To remedy this, they require approximate inference strategies that often result in non-convex optimization problems. The optimization ensures that the model is the best representative of the data or the data-generating process, but non-convexity removes any general guarantee of finding a globally optimal solution. It will be shown later in this dissertation that inference for a significant number of probabilistic models requires solving a non-convex optimization problem.

One of the best-known methods for approximate inference in probabilistic modeling is variational inference. In the Bayesian setting, the target is to learn the true posterior distribution of the model parameters given the observations and prior distributions. The main challenge is marginalizing over all variables in the model except the variable of interest. This high-dimensional integral is generally computationally hard, and for many models there is no known polynomial-time algorithm for calculating it exactly. Variational inference finds an approximate posterior distribution for Bayesian models in which finding the true posterior is analytically or numerically impossible. It assumes a family of distributions for the approximation and finds the member of that family closest to the true posterior under a chosen divergence measure. For many models, though, this technique requires solving a non-convex optimization problem with no general guarantee of reaching a globally optimal solution. This dissertation presents a convex relaxation technique for dealing with the hardness of the optimization involved in the inference. The proposed relaxation is based on semidefinite optimization, which applies generally to polynomial optimization problems; its theoretical foundations and in-depth details are presented in this work.

Linear dynamical systems represent the functionality of many real-world physical systems. They can describe the dynamics of a linear time-varying observation controlled by a controller with quadratic cost objectives. Designing distributed and decentralized controllers is the goal of many of these systems and, computationally, results in a non-convex optimization problem. In this dissertation, I further investigate the issues arising in this area and develop a convex relaxation framework to deal with the optimization challenges.

Setting the correct number of model parameters is an important aspect of a good probabilistic model. With too few parameters, the model may fail to capture all the essential relations and components in the observations, while too many parameters may significantly complicate learning or cause overfitting to the observations. Non-parametric models are suitable techniques for dealing with this issue: they allow the model to learn the appropriate number of parameters to describe the data and make predictions. In this dissertation, I present my work on designing Bayesian non-parametric models as powerful tools for learning representations of data, and I describe the algorithm we derived to efficiently train the model on the observations and learn the number of model parameters.

Later in this dissertation, I present my work on designing probabilistic models in combination with deep learning methods for representing sequential data. Sequential datasets comprise a significant portion of the resources in machine learning research. Designing models that capture dependencies in sequential datasets is of great interest and has a wide variety of applications in engineering, medicine, and statistics. Recent advances in deep learning research have shown exceptional promise in this area, but such models lack interpretability in their general form. To remedy this, I present my work on combining probabilistic models with neural network models, which results in better performance and more expressive results.
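To make the variational inference step above concrete, here is a minimal sketch (assuming a one-dimensional Gaussian approximating family and a toy unnormalized posterior of my own choosing, not the dissertation's models): it maximizes a Monte Carlo estimate of the evidence lower bound (ELBO) with the reparameterization trick.

```python
import numpy as np

# Toy unnormalized log-posterior (a Gaussian bump centred at 2 with variance 0.5); purely illustrative.
def log_p(z):
    return -0.5 * (z - 2.0) ** 2 / 0.5

def elbo_and_grads(mu, log_sigma, n_samples, rng):
    """Monte Carlo ELBO for q = N(mu, sigma^2) and its gradients via reparameterization."""
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal(n_samples)
    z = mu + sigma * eps                              # z ~ q(z), written as a deterministic map of noise
    elbo = log_p(z).mean() + 0.5 * np.log(2 * np.pi * np.e) + log_sigma  # E_q[log p] + Gaussian entropy
    dlogp_dz = -(z - 2.0) / 0.5                       # gradient of the toy log-target at the samples
    grad_mu = dlogp_dz.mean()
    grad_log_sigma = (dlogp_dz * eps * sigma).mean() + 1.0  # +1 comes from the entropy term
    return elbo, grad_mu, grad_log_sigma

mu, log_sigma = 0.0, 0.0
for step in range(500):                               # plain gradient ascent on the ELBO
    _, g_mu, g_ls = elbo_and_grads(mu, log_sigma, 64, np.random.default_rng(step))
    mu += 0.05 * g_mu
    log_sigma += 0.05 * g_ls

print(f"variational fit: mu = {mu:.2f}, sigma = {np.exp(log_sigma):.2f} (target: 2.00, 0.71)")
```

Because the toy target is itself Gaussian, the fitted mean and standard deviation should land near the true values; for the non-conjugate models discussed in the dissertation, this same objective becomes non-convex, which is exactly the difficulty the convex relaxation is meant to address.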
442

Identification of the Parameters When the Density of the Minimum is Given

Davis, John C 30 May 2007 (has links)
Let (X1, X2, X3) be a tri-variate normal vector with a non-singular covariance matrix Σ, where Σij < 0 for i ≠ j. It is shown here that it is then possible to determine the three means, the three variances, and the three correlation coefficients based only on knowledge of the probability density function of the minimum variate Y = min{X1, X2, X3}. We present a method for identifying the nine parameters that consists of a careful determination of the asymptotic orders of various bivariate tail probabilities.
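As background for the identification problem described above, the density of the minimum can be written in the standard textbook form (the notation below is mine, not the thesis's):

$$
f_Y(y) \;=\; \sum_{i=1}^{3} f_{X_i}(y)\,\Pr\!\left(X_j > y,\; X_k > y \,\middle|\, X_i = y\right),
\qquad \{i,j,k\} = \{1,2,3\},
$$

so the identification question is whether the nine parameters (three means, three variances, three correlations) can be recovered from this single function of y; the approach compares the asymptotic orders of the bivariate conditional tail probabilities appearing in each summand.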
443

Sannolikheter i fotbollsmatcher : -Kan man skapa användbara odds med hjälp av statistiska metoder? / Probabilities in football games : -Can you create functional odds with the use of statistical methods?

Lundgren, Marcus, Strandberg, Oskar January 2008 (has links)
Betting under ordered forms has been around for a long time, but the recent increase in Internet betting and the large sums of money that are now involved makes it even more important for betting companies to have correct odds.

The purpose of the essay is to calculate probabilities for outcomes of football games using a statistical model and to see if you can find better odds than a betting company. The data contains the 380 games from the 2004/2005 season and the variables form, head-to-heads, league position, points, home/away, average attendance, promoted team, distance and final league position from previous season.

After performing an ordered probit regression we only find the variable “form of the away team” to be significant at the 5 % level. We suspect the presence of multicollinearity and perform a VIF-test which confirms this. To fix this problem we perform a second ordered probit regression where a number of variables are combined to index variables. In the second regression we once again find only one significant variable. This time it is the variable “difference between home and away teams’ final league position”. A reason for the lack of significant variables could be the size of the data. A new model with five variables is examined and it results in four significant variables.

The calculated odds pick the correct result in 200, 203 and 198 out of 380 games respectively, compared to 197 out of 380 for Unibet. Betting one krona on the lowest calculated odds from the second model will result in a positive yield for season 2004/2005 when using Unibet’s odds. / Vadslagning under ordnade former har funnits under en längre tid, men de senaste årens explosionsartade ökning av Internetspel och de stora summor som då omsätts har gjort det allt viktigare för spelbolagen att sätta korrekta odds.

Syftet med uppsatsen är att med hjälp av en statistisk modell räkna ut sannolikheter för utfall i fotbollsmatcher och att undersöka om man kan hitta bättre odds än ett spelbolag. Datamaterialet innefattar de 380 matcherna som spelades säsongen 2004/2005 samt de oberoende variablerna form, inbördes möten, tabellplacering, poängskörd, hemmaplan/bortaplan, publiksnitt, uppflyttat lag, avstånd och slutplacering.

Efter utförd ordered probit regression erhåller vi endast en signifikant variabel vid en signifikansnivå på 5 %, nämligen ”bortalagets form”. Vi misstänker att det kan förekomma multikollinearitet och utför därför ett VIF-test som bekräftar detta. För att råda bot på detta problem genomför vi en andra ordered probit regression där flera variabler slås ihop till indexvariabler. I den andra regressionen får vi åter igen en enda signifikant variabel, men i detta fall är det variabeln ”differensen mellan hemma- och bortalagets slutplaceringar”. Ett skäl till att det inte blir fler signifikanta variabler misstänks vara storleken på datamaterialet. En ny modell med fem variabler undersöks och då blir fyra variabler signifikanta.

De beräknade oddsen väljer rätt utfall i 200, 203 respektive 198 av 380 matcher för de tre modellerna mot Unibets 197 av 380 matcher. I modell 2 ger en spelad krona på utfallet med lägst beräknat odds positiv avkastning under säsongen vid spel hos Unibet.
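A rough sketch of the analysis pipeline described above, an ordered probit for the match outcome plus a VIF check for multicollinearity, might look as follows in Python with statsmodels; the column names and simulated data are placeholders, not the authors' variables or results.

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Placeholder data standing in for one season of 380 matches.
rng = np.random.default_rng(42)
n = 380
X = pd.DataFrame({
    "away_form":     rng.normal(size=n),
    "position_diff": rng.normal(size=n),
    "home_points":   rng.normal(size=n),
})
latent = 0.4 * X["position_diff"] - 0.3 * X["away_form"] + rng.normal(size=n)
result = pd.cut(latent, bins=[-np.inf, -0.5, 0.5, np.inf], labels=[0, 1, 2])  # 0 away win, 1 draw, 2 home win

# Variance inflation factors: values well above ~10 would signal multicollinearity.
vif = pd.Series([variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
                index=X.columns)
print(vif)

# Ordered probit regression of the match result on the covariates.
res = OrderedModel(result, X, distr="probit").fit(method="bfgs", disp=False)
print(res.summary())

# Predicted category probabilities can be inverted into "fair" odds as 1 / probability.
probs = np.asarray(res.predict(X))       # columns: P(away win), P(draw), P(home win)
print((1.0 / probs)[:5])
```

Comparing these model-implied odds against a bookmaker's odds, as the authors do against Unibet, amounts to checking whether the lowest calculated odds pick the realized outcome more often than the bookmaker's prices imply.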
444

Sigma-Point Kalman Filters for Probabilistic Inference in Dynamic State-Space Models

Van der Merwe, Rudolph 04 1900 (has links) (PDF)
Ph.D. / Electrical and Computer Engineering / Probabilistic inference is the problem of estimating the hidden variables (states or parameters) of a system in an optimal and consistent fashion as a set of noisy or incomplete observations of the system becomes available online. The optimal solution to this problem is given by the recursive Bayesian estimation algorithm which recursively updates the posterior density of the system state as new observations arrive. This posterior density constitutes the complete solution to the probabilistic inference problem, and allows us to calculate any "optimal" estimate of the state. Unfortunately, for most real-world problems, the optimal Bayesian recursion is intractable and approximate solutions must be used. Within the space of approximate solutions, the extended Kalman filter (EKF) has become one of the most widely used algorithms with applications in state, parameter and dual estimation. Unfortunately, the EKF is based on a sub-optimal implementation of the recursive Bayesian estimation framework applied to Gaussian random variables. This can seriously affect the accuracy or even lead to divergence of any inference system that is based on the EKF or that uses the EKF as a component part. Recently a number of related novel, more accurate and theoretically better motivated algorithmic alternatives to the EKF have surfaced in the literature, with specific application to state estimation for automatic control. We have extended these algorithms, all based on derivativeless deterministic sampling based approximations of the relevant Gaussian statistics, to a family of algorithms called Sigma-Point Kalman Filters (SPKF). Furthermore, we successfully expanded the use of this group of algorithms (SPKFs) within the general field of probabilistic inference and machine learning, both as stand-alone filters and as subcomponents of more powerful sequential Monte Carlo methods (particle filters). We have consistently shown that there are large performance benefits to be gained by applying Sigma-Point Kalman filters to areas where EKFs have been used as the de facto standard in the past, as well as in new areas where the use of the EKF is impossible.
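The core sigma-point idea behind the SPKF family is the unscented transform, which propagates a Gaussian through a nonlinearity using deterministically chosen sample points instead of derivatives. The sketch below is a generic, hedged illustration of that transform with standard scaling parameters, not the specific filter variants developed in the thesis.

```python
import numpy as np

def sigma_points(mean, cov, alpha=1.0, beta=2.0, kappa=0.0):
    """2n+1 scaled sigma points and their mean/covariance weights."""
    n = mean.size
    lam = alpha**2 * (n + kappa) - n
    L = np.linalg.cholesky((n + lam) * cov)        # matrix square root of the scaled covariance
    pts = np.vstack([mean, mean + L.T, mean - L.T])
    w_m = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    w_c = w_m.copy()
    w_m[0] = lam / (n + lam)
    w_c[0] = lam / (n + lam) + (1.0 - alpha**2 + beta)
    return pts, w_m, w_c

def unscented_transform(f, mean, cov):
    """Derivative-free propagation of a Gaussian (mean, cov) through a nonlinearity f."""
    pts, w_m, w_c = sigma_points(mean, cov)
    y = np.array([f(p) for p in pts])
    y_mean = w_m @ y
    diff = y - y_mean
    y_cov = (w_c * diff.T) @ diff                  # weighted outer-product sum of the deviations
    return y_mean, y_cov

# Example: push a Gaussian through a polar-to-Cartesian conversion.
f = lambda x: np.array([x[0] * np.cos(x[1]), x[0] * np.sin(x[1])])
m, P = np.array([1.0, np.pi / 4]), np.diag([0.01, 0.05])
print(unscented_transform(f, m, P))
```

A sigma-point Kalman filter applies this transform in both the time-update and measurement-update steps where the EKF would instead linearize the dynamics and observation models with Jacobians.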
445

A domain-specific embedded language for probabilistic programming

Kollmansberger, Steven 19 December 2005 (has links)
Graduation date: 2006 / Functional programming is concerned with referential transparency; that is, given a certain function and its parameters, the result will always be the same. However, it seems that this is violated in applications involving uncertainty, such as rolling a die. This thesis defines the background of probabilistic programming and domain-specific languages, and builds on these ideas to construct a domain-specific embedded language (DSEL) for probabilistic programming in a purely functional language. This DSEL is then applied in a real-world setting to develop an application in use by the Center for Gene Research at Oregon State University. The process and results of this development are discussed.
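The probability-distribution-as-value idea underlying such a DSEL can be sketched in a few lines; the Python below is only an analogue of the concept (the thesis embeds its language in a purely functional host), with all names my own.

```python
from collections import defaultdict

class Dist:
    """A finite discrete distribution: outcomes mapped to probabilities that sum to 1."""
    def __init__(self, weights):
        total = sum(weights.values())
        self.probs = {x: w / total for x, w in weights.items()}

    @staticmethod
    def uniform(xs):
        return Dist({x: 1 for x in xs})

    def bind(self, f):
        """Monadic bind: feed each outcome to f and mix the resulting distributions."""
        out = defaultdict(float)
        for x, p in self.probs.items():
            for y, q in f(x).probs.items():
                out[y] += p * q
        return Dist(out)

    def __repr__(self):
        return "  ".join(f"{x}: {p:.3f}" for x, p in sorted(self.probs.items()))

# Example: the distribution of the sum of two fair dice, built compositionally.
die = Dist.uniform(range(1, 7))
two_dice = die.bind(lambda a: die.bind(lambda b: Dist({a + b: 1})))
print(two_dice)
```

Because "rolling a die" here denotes a distribution value rather than a side-effecting random draw, composing such programs stays referentially transparent, which is the point the abstract raises.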
447

The Torsion Angle of Random Walks

He, Mu 01 May 2013 (has links)
In this thesis, we study the expected mean of the torsion angle of an n-step equilateral random walk in 3D, where the walk is generated either within a confining sphere or without one. Given three consecutive vectors e1, e2, and e3 of the random walk, the vectors e1 and e2 define a plane and the vectors e2 and e3 define a second plane; the angle between the two planes is called the torsion angle of the three vectors. Algorithms are described to generate the random walks used in each setting (without and with confinement). The torsion angle is expressed as a function of six variables for a random walk in both cases, without confinement and with confinement, respectively. We then find the probability density functions of these six variables and derive an explicit integral expression for the expected mean torsion value. Finally, we conclude that the expected torsion angle obtained from the integral agrees with the numerical average torsion obtained from a simulation of random walks with confinement.
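A minimal simulation of the unconfined case described above might look like this (my own sketch, not the thesis's algorithms; the confined case would additionally reject steps that carry the walk outside the sphere):

```python
import numpy as np

def random_unit_vectors(n, rng):
    """n independent uniform directions on the unit sphere: the steps of an equilateral walk."""
    v = rng.standard_normal((n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def torsion_angle(e1, e2, e3):
    """Angle between the plane spanned by (e1, e2) and the plane spanned by (e2, e3)."""
    n1, n2 = np.cross(e1, e2), np.cross(e2, e3)
    cos_t = np.dot(n1, n2) / (np.linalg.norm(n1) * np.linalg.norm(n2))
    return np.arccos(np.clip(cos_t, -1.0, 1.0))

rng = np.random.default_rng(1)
steps = random_unit_vectors(10_000, rng)
angles = [torsion_angle(steps[i], steps[i + 1], steps[i + 2]) for i in range(len(steps) - 2)]
print(f"mean torsion angle = {np.mean(angles):.4f} rad")
# By symmetry the unconfined mean should be close to pi/2; confinement changes this value.
```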
449

Knotting statistics after a local strand passage in unknotted self-avoiding polygons in Z^3

Szafron, Michael Lorne 15 April 2009
We study here a model for a strand passage in a ring polymer about a randomly chosen location at which two strands of the polymer have been brought "close" together. The model is based on Θ-SAPs, which are unknotted self-avoiding polygons in Z^3 that contain a fixed structure Θ that forces two segments of the polygon to be close together. To study this model, the Composite Markov Chain Monte Carlo (CMCMC) algorithm, referred to as the CMC Θ-BFACF algorithm, that I developed and proved to be ergodic for unknotted Θ-SAPs in my M.Sc. thesis, is used. Ten simulations (each consisting of 9.6×10^10 time steps) of the CMC Θ-BFACF algorithm are performed and the results from a statistical analysis of the simulated data are presented. To this end, a new maximum likelihood method, based on previous work of Berretti and Sokal, is developed for obtaining maximum likelihood estimates of the growth constants and critical exponents associated respectively with the numbers of unknotted (2n)-edge Θ-SAPs, unknotted (2n)-edge successful-strand-passage Θ-SAPs, unknotted (2n)-edge failed-strand-passage Θ-SAPs, and (2n)-edge after-strand-passage-knot-type-K unknotted successful-strand-passage Θ-SAPs. The maximum likelihood estimates are consistent with the result (proved here) that the growth constants are all equal, and provide evidence that the associated critical exponents are all equal.

We then investigate the question "Given that a successful local strand passage occurs at a random location in a (2n)-edge knot-type K Θ-SAP, with what probability will the Θ-SAP have knot-type K′ after the strand passage?". To this end, the CMCMC data is used to obtain estimates for the probability of knotting given a (2n)-edge successful-strand-passage Θ-SAP and the probability of an after-strand-passage polygon having knot-type K given a (2n)-edge successful-strand-passage Θ-SAP. The computed estimates numerically support the unproven conjecture that these probabilities, in the n→∞ limit, go to a value lying strictly between 0 and 1. We further prove here that the rate of approach to each of these limits (should the limits exist) is less than exponential.

We conclude with a study of whether or not there is a difference in the "size" of an unknotted successful-strand-passage Θ-SAP whose after-strand-passage knot-type is K when compared to the "size" of a Θ-SAP whose knot-type does not change after strand passage. The two measures of "size" used are the expected lengths of, and the expected mean-square radius of gyration of, subsets of Θ-SAPs. How these two measures of "size" behave as a function of a polygon's length and its after-strand-passage knot-type is investigated.
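For readers unfamiliar with Berretti–Sokal-style maximum likelihood estimation, it rests on the standard asymptotic ansatz for polygon counts, stated here only as background with notation chosen for illustration:

$$
p_{2n} \;\sim\; A\,\mu^{2n}\,(2n)^{\alpha-3}, \qquad n \to \infty,
$$

where μ is the growth constant and α the associated critical exponent; the equality result mentioned above concerns the growth constants μ appearing in such expansions for the different classes of Θ-SAPs.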
450

Reliability Based Safety Assessment Of Buried Continuous Pipelines Subjected To Earthquake Effects

Yavuz, Ercan Aykan 01 January 2013 (has links) (PDF)
Lifelines provide vital utilities for human beings in modern life. They convey a great variety of products in order to meet general needs. Buried continuous pipelines, in particular, are generally used to transmit energy sources, such as natural gas and crude oil, from production sources to target locations. To sustain this energy corridor efficiently and safely, interruption of the flow should be prevented as much as possible. This can be achieved by meeting a target reliability index that represents the desired level of performance and reliability. For natural gas transmission, the assessment of earthquake threats to buried continuous pipelines is the primary concern of this thesis in terms of reliability. Operating loads due to internal pressure and temperature changes are also discussed. Seismic wave propagation effects, liquefaction-induced lateral spreading (including longitudinal and transverse permanent ground deformation effects), liquefaction-induced buoyancy effects, and fault crossing effects to which buried continuous pipelines are subjected are explained in detail. Limit state functions are presented for each of the above-mentioned earthquake effects combined with operating loads. The Advanced First Order Second Moment method is used in the reliability calculations. Two case studies are presented. In the first, considering only the load effect due to internal pressure, the reliability of an existing natural gas pipeline is evaluated, and safety factors are recommended for achieving specified target reliability indexes. In the second case study, the reliability of another existing natural gas pipeline subjected to the above-mentioned earthquake effects is evaluated in detail.
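As a hedged illustration of the reliability-index idea behind the assessment described above (the thesis itself uses the Advanced First Order Second Moment method, which iterates to a design point; the numbers below are invented for the example):

```python
from math import sqrt
from statistics import NormalDist

# Mean-value FOSM for a linear limit state g = R - S with independent normal R and S.
mu_R, sigma_R = 400.0, 40.0   # capacity (e.g., allowable strain), illustrative values only
mu_S, sigma_S = 250.0, 60.0   # demand (e.g., seismic load effect), illustrative values only

beta = (mu_R - mu_S) / sqrt(sigma_R**2 + sigma_S**2)   # reliability index
p_f = NormalDist().cdf(-beta)                          # corresponding failure probability

print(f"beta = {beta:.2f}, P_f = {p_f:.2e}")
# Meeting a target reliability index (say 3.0) would require raising capacity or
# reducing the load effect until beta reaches that target.
```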
