  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
161

Stochastic Geometry, Data Structures and Applications of Ancestral Selection Graphs

Cloete, Nicoleen January 2006 (has links)
The genealogy of a random sample of a population of organisms can be represented as a rooted binary tree. Population dynamics determine a distribution over sample genealogies. For large populations of constant size and in the absence of selection effects, the coalescent process of Kingman determines a suitable distribution. Neuhauser and Krone gave a stochastic model generalising the Kingman coalescent in a natural way to include the effects of selection. The model of Neuhauser and Krone determines a distribution over a class of graphs of randomly variable vertex number, known as ancestral selection graphs. Because vertices have associated scalar ages, realisations of the ancestral selection graph process have randomly variable dimensions. A Markov chain Monte Carlo method is used to simulate the posterior distribution for population parameters of interest. The state of the Markov chain Monte Carlo sampler is a random graph, with random dimension and equilibrium distribution equal to the posterior distribution. The aim of the project is to determine whether the data are informative of the selection parameter by fitting the model to synthetic data. / Foundation for Research Science and Technology Top Achiever Doctoral Scholarship
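The Kingman coalescent referenced in this abstract has a simple generative form: while k ancestral lineages remain, the waiting time until the next pair coalesces is Exponential with rate k(k-1)/2 (time measured in units of the population size). A minimal simulation sketch of that process follows; it is illustrative only, not the thesis's code, and all names are ours:

```python
import random

def kingman_coalescent_times(n, rng):
    """Waiting times between successive coalescences for a sample of n
    lineages: while k lineages remain, the next pair coalesces after an
    Exponential(k*(k-1)/2) time, in units of the population size."""
    times = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2.0
        times.append(rng.expovariate(rate))
    return times

# The expected height of the genealogy is 2 * (1 - 1/n); check by simulation.
rng = random.Random(42)
n = 10
heights = [sum(kingman_coalescent_times(n, rng)) for _ in range(20000)]
mean_height = sum(heights) / len(heights)
```

Summing the waiting times gives the height of the genealogy, whose expectation 2(1 - 1/n) the simulation recovers.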
163

Optimal Active Learning: experimental factors and membership query learning

Yu-hui Yeh Unknown Date (has links)
The field of Machine Learning is concerned with the development of algorithms, models and techniques that solve challenging computational problems by learning from data representative of the problem (e.g. given a set of medical images previously classified by a human expert, build a model to predict unseen images as either benign or malignant). Many important real-world problems have been formulated as supervised learning problems. The assumption is that a data set is available containing the correct output (e.g. class label or target value) for each given data point. In many application domains, obtaining the correct outputs (labels) for data points is a costly and time-consuming task. This has provided the motivation for the development of Machine Learning techniques that attempt to minimize the number of labeled data points while maintaining good generalization performance on a given problem. Active Learning is one such class of techniques and is the focus of this thesis. Active Learning algorithms select or generate unlabeled data points to be labeled and use these points for learning. If successful, an Active Learning algorithm should be able to produce learning performance (e.g. test set error) comparable to an equivalent supervised learner using fewer labeled data points. Theoretical, algorithmic and experimental Active Learning research has been conducted and a number of successful applications have been demonstrated. However, the scope of many of the experimental studies on Active Learning has been relatively small and there are very few large-scale experimental evaluations of Active Learning techniques. A significant amount of performance variability exists across Active Learning experimental results in the literature.
Furthermore, the implementation details and effects of experimental factors have not been closely examined in empirical Active Learning research, creating some doubt over the strength and generality of conclusions that can be drawn from such results. The Active Learning model/system used in this thesis is the Optimal Active Learning algorithm framework with Gaussian Processes for regression problems (however, most of the research questions are of general interest in many other Active Learning scenarios). Experimental and implementation details of the Active Learning system used are described in detail, using a number of regression problems and datasets of different types. It is shown that the experimental results of the system are subject to significant variability across problem datasets. The hypothesis that experimental factors can account for this variability is then investigated. The results show the impact of sampling and sizes of the datasets used when generating experimental results. Furthermore, preliminary experimental results expose performance variability across various real-world regression problems. The results suggest that these experimental factors can (to a large extent) account for the variability observed in experimental results. A novel resampling technique for Optimal Active Learning, called '3-Sets Cross-Validation', is proposed as a practical solution to reduce experimental performance variability. Further results confirm the usefulness of the technique. The thesis then proposes an extension to the Optimal Active Learning framework, to perform learning via membership queries, through a novel algorithm named MQOAL. The MQOAL algorithm employs the Metropolis-Hastings Markov chain Monte Carlo (MCMC) method to sample data points for query selection.
Experimental results show that MQOAL provides comparable performance to the pool-based OAL learner, using a very generic, simple MCMC technique, and is robust to experimental factors related to the MCMC implementation. The possibility of making queries in batches is also explored experimentally, with results showing that while some performance degradation does occur, it is minimal for learning in small batch sizes, which is likely to be valuable in some real-world problem domains.
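The abstract characterises the MCMC machinery behind MQOAL as a "very generic, simple MCMC technique". The core accept/reject step of random-walk Metropolis-Hastings can be sketched as follows, on a toy target; this is a generic illustration, not the thesis's implementation:

```python
import math
import random

def metropolis_hastings(log_density, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings: propose x' = x + Normal(0, step^2),
    accept with probability min(1, pi(x') / pi(x))."""
    rng = random.Random(seed)
    x = x0
    log_p = log_density(x)
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept/reject on the log scale to avoid numerical underflow.
        if math.log(rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
        samples.append(x)
    return samples

# Toy target: standard Normal, via its unnormalised log-density -x^2 / 2.
samples = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 50000)
burned = samples[10000:]
mean = sum(burned) / len(burned)
```

After discarding burn-in, the empirical mean and variance of the chain approximate those of the target distribution.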
167

Efficient Bayesian Inference for Multivariate Factor Stochastic Volatility Models

Kastner, Gregor, Frühwirth-Schnatter, Sylvia, Lopes, Hedibert Freitas 24 February 2016 (has links) (PDF)
We discuss efficient Bayesian estimation of dynamic covariance matrices in multivariate time series through a factor stochastic volatility model. In particular, we propose two interweaving strategies (Yu and Meng, Journal of Computational and Graphical Statistics, 20(3), 531-570, 2011) to substantially accelerate convergence and mixing of standard MCMC approaches. Similar to marginal data augmentation techniques, the proposed acceleration procedures exploit non-identifiability issues which frequently arise in factor models. Our new interweaving strategies are easy to implement and come at almost no extra computational cost; nevertheless, they can boost estimation efficiency by several orders of magnitude as is shown in extensive simulation studies. To conclude, the application of our algorithm to a 26-dimensional exchange rate data set illustrates the superior performance of the new approach for real-world data. / Series: Research Report Series / Department of Statistics and Mathematics
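For context, in a factor stochastic volatility model the time-t covariance matrix is assembled from the factor loadings and the latent log-variances as Sigma_t = Lambda diag(exp(h_t)) Lambda' + diag(exp(h_idio,t)). A sketch of that reconstruction step follows, with illustrative numbers that are not from the paper:

```python
import math

def fsv_covariance(loadings, factor_logvar, idio_logvar):
    """Covariance implied by a factor SV model at a single time point:
    Sigma = Lambda * diag(exp(h_factor)) * Lambda' + diag(exp(h_idio))."""
    m = len(loadings)          # number of observed series
    r = len(factor_logvar)     # number of latent factors
    sigma = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            # Common-factor contribution to the (i, j) covariance entry.
            s = sum(loadings[i][k] * loadings[j][k] * math.exp(factor_logvar[k])
                    for k in range(r))
            if i == j:
                # Idiosyncratic variance enters only on the diagonal.
                s += math.exp(idio_logvar[i])
            sigma[i][j] = s
    return sigma

# Three series, one factor; numbers are illustrative, not from the paper.
Lam = [[1.0], [0.8], [0.5]]
Sigma = fsv_covariance(Lam, [0.0], [-1.0, -1.0, -1.0])
```

The low-rank-plus-diagonal structure is what makes the factor representation of a dynamic covariance matrix tractable in high dimensions.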
168

3 essays on credit risk modeling and the macroeconomic environment

Papanastasiou, Dimitrios January 2015 (has links)
In the aftermath of the recent financial crisis, the way credit risk is affected by and affects the macroeconomic environment has been the focus of academics, risk practitioners and central bankers alike. In this thesis I approach three distinct questions that aim to provide valuable insight into how corporate defaults, recoveries and credit ratings interact with the conditions in the wider economy. The first question focuses on how well the macroeconomic environment forecasts corporate bond defaults. I approach the question from a macroeconomic perspective and I make full use of the multitude of lengthy macroeconomic time series available. Following the recent literature on data-rich environment modelling, I summarise a large panel of 103 macroeconomic time series into a small set of 6 dynamic factors; the factors capture business cycle, yield curve, credit premia and equity market conditions. Prior studies on dynamic factors use identification schemes based on principal components or recursive short-run restrictions. The main contribution to the body of existing literature is that I provide a novel and more robust identification scheme for the 6 macro-financial stochastic factors, based on a set of over-identifying restrictions. This allows for a more straightforward interpretation of the extracted factors and a more meaningful decomposition of the corporate default dynamics. Furthermore, I use a novel Bayesian estimation scheme based on a Markov chain Monte Carlo algorithm that has not been used before in a credit risk context. I argue that the proposed algorithm provides an efficient and flexible alternative to the simulation based estimation approaches used in the existing literature. The sampling scheme is used to estimate a state-of-the-art dynamic econometric specification that is able to separate macro-economic fluctuations from unobserved default clustering.
Finally, I provide evidence that the macroeconomic factors can lead to significant improvements in default probability forecasting performance. The forecasting performance gains become less pronounced the longer the default forecasting horizon. The second question explores the sensitivity of corporate bond defaults and recoveries to monetary policy and macro-financial shocks. To address the question, I follow a more structural approach to extract theory-based economic shocks and quantify the magnitude of the impact on the two main credit risk drivers. This is the first study that approaches the decomposition of the movements in credit risk metrics from a structural perspective. I introduce a VAR model with a novel semi-structural identification scheme to isolate the various shocks at the macro level. The dynamic econometric specification for defaults and recoveries is similar to the one used to address the first question. The specification is flexible enough to allow for the separation of the macroeconomic movements from the credit risk specific unobserved correlation and, therefore, isolate the different shock transmission mechanisms. I report that the corporate default likelihood is strongly affected by balance sheet and real economy shocks for the cyclical industry sectors, while the effects of monetary policy shocks typically take up to one year to materialise. In contrast, recovery rates tend to be more sensitive to asset price shocks, while real economy shocks mainly affect secured debt recovery values. The third question shifts the focus to credit ratings and addresses the Through-the-Cycle dynamics of the serial dependence in rating migrations. The existing literature treats the so-called rating momentum as constant through time. I show that the rating momentum is far from constant: it changes with the business cycle and its magnitude exhibits a non-linear dependence on time spent in a given rating grade.
Furthermore, I provide robust evidence that the time-varying rating momentum substantially increases actual and Marked-to-Market losses in periods of stress. The impact on regulatory capital for financial institutions is less clear; nevertheless, capital requirements for high credit quality portfolios can be significantly underestimated during economic downturns.
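The dynamic-factor summarisation of a large macro panel described in this abstract is, at its simplest, principal-components extraction from the standardised series. A rough single-factor sketch via power iteration follows; it is illustrative only, since the thesis uses six factors and a richer over-identified scheme:

```python
import math
import random

def first_principal_component(panel, n_iter=200):
    """Leading principal component of a T x N panel: standardise each
    series, form the N x N covariance matrix, and run power iteration."""
    T, N = len(panel), len(panel[0])
    cols = []
    for j in range(N):
        col = [row[j] for row in panel]
        mu = sum(col) / T
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / T) or 1.0
        cols.append([(v - mu) / sd for v in col])
    cov = [[sum(cols[i][t] * cols[j][t] for t in range(T)) / T
            for j in range(N)] for i in range(N)]
    v = [1.0] * N                      # power-iteration start vector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(N)) for i in range(N)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project the standardised series onto the leading eigenvector.
    factor = [sum(cols[j][t] * v[j] for j in range(N)) for t in range(T)]
    return v, factor

# Synthetic panel: 8 series driven by one common factor plus noise.
rng = random.Random(3)
T = 200
true_f = [rng.gauss(0.0, 1.0) for _ in range(T)]
panel = [[true_f[t] + 0.5 * rng.gauss(0.0, 1.0) for _ in range(8)] for t in range(T)]
loadings, est_f = first_principal_component(panel)
```

On the synthetic panel the extracted factor is (up to sign) highly correlated with the true common factor.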
169

Applied Meta-Analysis of Lead-Free Solder Reliability

January 2014 (has links)
This thesis presents a meta-analysis of lead-free solder reliability. The qualitative analyses of the failure modes of lead-free solder under different stress tests, including drop, bend, thermal and vibration tests, are discussed. The main cause of failure of lead-free solder is fatigue cracking, and the speed of propagation of the initial crack can differ across test conditions and solder materials. A quantitative analysis of the fatigue behavior of SAC lead-free solder under a thermal preconditioning process is conducted. This thesis presents a method for predicting the failure life of a solder alloy by building a Weibull regression model. The failure life of solder on a circuit board is assumed to be Weibull distributed. Different materials and test conditions can affect the distribution by changing the shape and scale parameters of the Weibull distribution. The method models the regression of the parameters, with the different test conditions as predictors, based on Bayesian inference concepts. In building the regression models, prior distributions are generated according to previous studies, and Markov Chain Monte Carlo (MCMC) is used under the WinBUGS environment. / Dissertation/Thesis / Masters Thesis Industrial Engineering 2014
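The Weibull likelihood underlying such a model is standard: for shape k and scale lambda, log f(t) = log k - k log lambda + (k-1) log t - (t/lambda)^k. A minimal sketch evaluating it over a parameter grid follows; the failure times are hypothetical and this is not the thesis's WinBUGS model:

```python
import math

def weibull_loglik(shape, scale, lifetimes):
    """Log-likelihood of i.i.d. Weibull(k, lam) failure times:
    log f(t) = log k - k*log lam + (k-1)*log t - (t/lam)^k."""
    k, lam = shape, scale
    if k <= 0 or lam <= 0:
        return float("-inf")
    return sum(math.log(k) - k * math.log(lam) + (k - 1) * math.log(t)
               - (t / lam) ** k for t in lifetimes)

# Hypothetical cycles-to-failure data; crude grid search for the maximum.
data = [215.0, 340.0, 410.0, 520.0, 640.0, 700.0, 810.0]
loglik, k_hat, lam_hat = max(
    (weibull_loglik(k / 10.0, float(lam), data), k / 10.0, float(lam))
    for k in range(5, 60)
    for lam in range(100, 1200, 25))
```

A Bayesian treatment such as the one in the thesis replaces this grid search with priors on (k, lambda) and MCMC sampling of the posterior, but the likelihood term is the same.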
170

A Bayesian Finite Mixture Model for Network-Telecommunication Data

Manikas, Vasileios January 2016 (has links)
A data modelling procedure called the mixture model is introduced, one well suited to the characteristics of our data. Mixture models have proved flexible and easy to use, as confirmed by the many papers and books published over the last twenty years. The models are estimated by Bayesian inference through an efficient Markov Chain Monte Carlo (MCMC) algorithm known as Gibbs sampling. The focus of the paper is on models for network-telecommunication lab data (not time-dependent data) and on the valid predictions we can accomplish. We categorize our variables (based on their distribution) into three cases: a mixture of Normal distributions with known allocation, a mixture of Negative Binomial distributions with known allocation, and a mixture of Normal distributions with unknown allocation.
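For the unknown-allocation case, Gibbs sampling alternates between drawing each observation's component label given the parameters and drawing each component's parameters given the labels. A toy sketch for a two-component Normal mixture with known, equal variances and equal weights follows; it is illustrative only, and all names are ours:

```python
import math
import random

def gibbs_mixture(data, n_iter=500, sigma=1.0, seed=7):
    """Gibbs sampling for a two-component Normal mixture with known, equal
    variances and equal weights: alternate (1) drawing each allocation z_i
    given the means and (2) drawing each mean given its allocated points
    (flat prior -> Normal(cluster mean, sigma^2 / n_k))."""
    rng = random.Random(seed)
    mu = [min(data), max(data)]        # crude initialisation
    for _ in range(n_iter):
        # Step 1: sample allocations given the current component means.
        z = []
        for x in data:
            w0 = math.exp(-0.5 * ((x - mu[0]) / sigma) ** 2)
            w1 = math.exp(-0.5 * ((x - mu[1]) / sigma) ** 2)
            z.append(0 if rng.random() < w0 / (w0 + w1) else 1)
        # Step 2: sample each mean given its allocated points.
        for k in (0, 1):
            pts = [x for x, zi in zip(data, z) if zi == k]
            if pts:
                centre = sum(pts) / len(pts)
                mu[k] = rng.gauss(centre, sigma / math.sqrt(len(pts)))
    return sorted(mu)

# Synthetic data: two clusters centred at 0 and 5.
data_rng = random.Random(1)
data = ([data_rng.gauss(0.0, 1.0) for _ in range(50)]
        + [data_rng.gauss(5.0, 1.0) for _ in range(50)])
mu_lo, mu_hi = gibbs_mixture(data)
```

With well-separated clusters the chain quickly settles near the true component means; the known-allocation cases in the paper reduce to step 2 alone.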
