Global ETD Search

1	Anomaly Detection Based on Disentangled Representation Learning
Li, Xiaoyan. 20 April 2020.
In the era of the Internet of Things (IoT) and big data, collecting, processing and analyzing enormous volumes of data poses unprecedented challenges, even when the data are stored in preprocessed form. Anomaly detection, viewed statistically as identifying outliers that have low probability under a model of the data distribution p(x), is becoming more crucial. This Master's thesis presents two novel deep anomaly detection frameworks, one supervised and one unsupervised, that achieve state-of-the-art performance on a range of datasets. The capsule network (CapsNet) is an advanced artificial neural network that can encode the intrinsic spatial relationships between parts and a whole. This property allows it to work as both a classifier and a deep autoencoder. Taking advantage of this property, a new anomaly detection technique named AnoCapsNet is proposed, and three normality score functions are designed for evaluating the "outlierness" of unseen images: a prediction-probability-based (PP-based) function, a reconstruction-error-based (RE-based) function, and a function that combines the two (PP+RE-based). The results on four datasets demonstrate that the PP-based method performs consistently well, while the RE-based approach is relatively sensitive to the similarity between labeled and unlabeled images. The PP+RE-based approach takes advantage of both methods and achieves state-of-the-art results. In many situations, neither the domain of anomalous samples nor the domain of normal samples can be fully characterized, and deep generative models are then more suitable than supervised methods.
As a variant of the variational autoencoder (VAE), beta-VAE is designed for automated discovery of interpretable factorised latent representations from raw image data in a completely unsupervised manner. t-Distributed Stochastic Neighbor Embedding (t-SNE), an unsupervised non-linear technique used primarily for exploring and visualizing high-dimensional data, can create a single map that reveals local structure and important global structure at many different scales. Combining disentangled representation learning (implemented with beta-VAE) and low-dimensional neighbor embedding (implemented with t-SNE), another novel anomaly detection approach named AnoDM (Anomaly detection based on unsupervised Disentangled representation learning and Manifold learning) is presented. A new anomaly score function is defined by combining (1) the reconstruction error of beta-VAE and (2) the distances between latent representations in the t-SNE space. The framework is general, so any disentangled representation learning and low-dimensional embedding techniques can be applied. AnoDM is evaluated on both image and time-series data and achieves better results than models that use only one of the two measures and than other existing advanced deep learning methods.
Anomaly detection; Disentangled representation learning; Manifold learning; Normality score function
2	Estimação em modelos funcionais com erro normais e repetições não balanceadas / Estimation in functional models by using a normal error and replications unbalanced
Joan Neylo da Cruz Rodriguez. 29 April 2008.
This work studies the efficiency of parameter estimators in the functional errors-in-variables model (the functional linear relationship with constant variances), where the lack of identification is resolved by considering replications. Estimation is carried out by maximum likelihood and by the corrected score approach. Comparisons between the approaches are illustrated using simulated data, and the estimates obtained by the two methods lead to similar results.
corrected score function; maximum likelihood estimation (MLE); replications; unbalanced; simulations
4	Nonparametric Inference for High Dimensional Data
Mukhopadhyay, Subhadeep. 03 October 2013.
Learning from data, especially ‘Big Data’, is becoming increasingly popular under names such as Data Mining, Data Science, Machine Learning, Statistical Learning and High Dimensional Data Analysis. In this dissertation we propose a new related field, which we call ‘United Nonparametric Data Science’: applied statistics with “just in time” theory. It integrates the practice of traditional and novel statistical methods for nonparametric exploratory data modeling, and it is applicable to teaching introductory statistics courses that are closer to the modern frontiers of scientific research. Our framework includes small data analysis (combining traditional and modern nonparametric statistical inference) and big and high dimensional data analysis (by statistical modeling methods that extend our unified framework for small data analysis). The first part of the dissertation (Chapters 2 and 3) aims to develop a new theoretical foundation that unifies many cultures of statistical science and statistical learning methods using the mid-distribution function, custom-made orthonormal score functions, comparison density, copula density, LP moments and comoments. We also examine how this theory yields solutions to many important applied problems. In the second part (Chapter 4) we extend the traditional empirical likelihood (EL), a versatile tool for nonparametric inference, to the high dimensional context. We introduce a modified version of the EL method that is computationally simpler and applicable to a large class of “large p small n” problems, allowing p to grow faster than n. This is an important step in generalizing the EL in high dimensions beyond the p ≤ n threshold, where the standard EL and its existing variants fail. We also present a detailed theoretical study of the proposed method.
Big data Quantile Empirical Likelihood LP score function Copula Nonparametric Classification Data Science
5	Multilevel Methods for Stochastic Forward and Inverse Problems
Ballesio, Marco. 02 February 2022.
This thesis studies novel and efficient computational sampling methods for applications in three types of stochastic inversion problems: seismic waveform inversion, filtering problems, and static parameter estimation. A primary goal of a large class of seismic inverse problems is to detect parameters that characterize an earthquake. We aim to solve this task by analyzing the full displacement time series at a given set of seismographs, but approaching the full waveform inversion with the standard Monte Carlo (MC) method is prohibitively expensive, so we study tools that can make this computation feasible. As part of the inversion problem, we must evaluate the misfit between recorded and synthetic seismograms efficiently. As the misfit function we employ the Wasserstein metric, originally suggested to measure the distance between probability distributions, which is becoming increasingly popular in seismic inversion. To compute the expected values of the misfits, we use a sampling algorithm called Multi-Level Monte Carlo (MLMC). MLMC performs most of the sampling at a coarse space-time resolution, with only a few corrections at finer scales, without compromising the overall accuracy. We further investigate the Wasserstein metric and the MLMC method in the context of filtering problems for partially observed diffusions with observations at periodic time intervals. Particle filters can be enhanced by considering hierarchies of discretizations to reduce the computational effort needed to achieve a given tolerance. This methodology is called the Multi-Level Particle Filter (MLPF). However, particle filters, and consequently MLPFs, suffer from particle ensemble collapse, which requires a resampling step. For one-dimensional processes we suggest a resampling procedure based on optimal Wasserstein coupling, and we show that it reduces computational cost compared to standard resampling procedures. Finally, we consider static parameter estimation for a class of continuous-time state-space models. Unbiasedness of the estimated gradient of the log-likelihood is an important property for ensuring the convergence of gradient ascent (descent) methods. We propose a novel unbiased estimator of the gradient of the log-likelihood based on a double-randomization scheme, and we use this estimator in the stochastic gradient ascent method to recover unknown parameters of the dynamics.
multilevel Monte Carlo; PDE with random coefficients; particle filters; diffusions; score function; coupled conditional particle filter
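The MLMC idea described in this abstract (many cheap coarse samples plus a few fine-level corrections) can be illustrated with a minimal sketch. It is not the thesis's seismic or filtering setup: the test problem (the mean of a geometric Brownian motion at time T, discretized by Euler's method), the parameter values, and the names `euler_pair` and `mlmc_mean` are all assumptions chosen for illustration.

```python
import math
import random

def euler_pair(level, rng, x0=1.0, mu=0.05, sigma=0.2, T=1.0):
    """Return (fine, coarse) Euler endpoints of dX = mu*X dt + sigma*X dW.

    The fine path uses 2**level steps; the coarse path uses half as many and
    reuses the same Brownian increments, which keeps the two paths close.
    At level 0 there is no coarse path and 0.0 is returned in its place.
    """
    n_fine = 2 ** level
    dt = T / n_fine
    fine, coarse = x0, x0
    pending = 0.0  # sum of fine increments waiting for the next coarse step
    for step in range(n_fine):
        dw = rng.gauss(0.0, math.sqrt(dt))
        fine += mu * fine * dt + sigma * fine * dw
        pending += dw
        if level > 0 and step % 2 == 1:
            coarse += mu * coarse * 2 * dt + sigma * coarse * pending
            pending = 0.0
    return fine, (coarse if level > 0 else 0.0)

def mlmc_mean(samples_per_level, seed=0):
    """Telescoping estimate: E[P_L] = E[P_0] + sum over l of E[P_l - P_(l-1)]."""
    rng = random.Random(seed)
    total = 0.0
    for level, n in enumerate(samples_per_level):
        pairs = (euler_pair(level, rng) for _ in range(n))
        total += sum(fine - coarse for fine, coarse in pairs) / n
    return total

# Most samples are drawn at the coarsest level, few at the finest.
estimate = mlmc_mean([20000, 5000, 1250, 400, 100])
print(estimate, math.exp(0.05))  # estimate vs. the exact mean exp(mu*T)
```

The sample counts per level are fixed by hand here; a practical MLMC implementation would usually choose them from estimated per-level variances and costs.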
6	The optimal control of a Lévy process
DiTanna, Anthony Santino. 23 October 2009.
In this thesis we study the optimal stochastic control problem of the drift of a Lévy process. We show that, for a broad class of Lévy processes, the partial integro-differential Hamilton-Jacobi-Bellman equation for the value function admits classical solutions and that control policies exist in feedback form. We then explore the class of Lévy processes that satisfy the requirements of the theorem, and find connections between the uniform integrability requirement and the notions of the score function and Fisher information from information theory. Finally, we present three different numerical implementations of the control problem: a traditional dynamic programming approach and two iterative approaches, one based on a finite difference scheme and the other on the Fourier transform.
Lévy processes; Hamilton-Jacobi-Bellman equation; Finite difference scheme; Fourier transform; Score function; Fisher information; Optimal stochastic control problem
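For reference, the score function and Fisher information of a differentiable density p on the real line are usually defined, in the information-theoretic (location) sense, as below. This is the textbook definition; the exact form used in the thesis is not stated in the abstract, so the symbols ρ and J are assumptions.

```latex
\rho(x) = \frac{d}{dx}\log p(x) = \frac{p'(x)}{p(x)}, \qquad
J(p) = \int \rho(x)^2 \, p(x)\, dx = \int \frac{p'(x)^2}{p(x)}\, dx .
% Example: for the Gaussian density p = N(0, \sigma^2),
% \rho(x) = -x/\sigma^2 and J(p) = 1/\sigma^2.
```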
7	Modelos GAS com distribuições estáveis para séries temporais financeiras / Stable GAS models for financial time series
Gomes, Daniel Takata. 06 December 2017.
GARCH models with normal and Student's t conditional distributions are widely used for volatility modeling in financial data. However, such distributions may not be suitable for some heavy-tailed and leptokurtic series. Stable distributions may fit such characteristics better, as already explored in the literature. On the other hand, the recently developed GAS (Generalized Autoregressive Score) models are dynamic models in which the updating mechanism of the time-varying parameters is based on the score function (the first derivative of the log-likelihood function). This provides a natural direction for updating the parameters, based on the complete density. We propose a new GAS model with symmetric stable distributions for volatility modeling. The model can be interpreted as a generalization of the GARCH models, since the classic Gaussian GARCH model is obtained from it for particular choices of the stable distribution and the model structure. In most cases there are no closed-form analytical expressions for stable densities, so the numerical computation of the density and its derivatives is detailed, as needed for the complete development of the estimation procedure. The stationarity conditions and the dependence structure of the model are analysed. Simulation studies, as well as an application to real data, are presented to compare the usual models, which use normal and Student's t distributions, with the proposed model, illustrating the effectiveness of the latter.
Conditional heteroscedasticity; Financial modeling; Heavy tails; Score function; Stable distributions
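The remark that the Gaussian GARCH model is a special case of a score-driven model can be checked numerically with a small sketch. It covers only the Gaussian case, not the stable-distribution model of the thesis, and it assumes the common choice of scaling the score by the inverse Fisher information; the function names and parameter values are illustrative.

```python
import random

def gaussian_scaled_score(y, f):
    """Score of log N(y; 0, f) with respect to the variance f, scaled by the
    inverse Fisher information (an assumed, commonly used scaling)."""
    score = (y * y - f) / (2.0 * f * f)   # d/df of the Gaussian log-density
    information = 1.0 / (2.0 * f * f)     # E[score**2] when y ~ N(0, f)
    return score / information            # simplifies to y**2 - f

def gas_variance_path(ys, omega, a, b, f0):
    """Score-driven recursion f[t+1] = omega + a * s[t] + b * f[t]."""
    f, path = f0, []
    for y in ys:
        path.append(f)
        f = omega + a * gaussian_scaled_score(y, f) + b * f
    return path

def garch_variance_path(ys, omega, alpha, beta, f0):
    """GARCH(1,1) recursion f[t+1] = omega + alpha * y[t]**2 + beta * f[t]."""
    f, path = f0, []
    for y in ys:
        path.append(f)
        f = omega + alpha * y * y + beta * f
    return path

rng = random.Random(1)
returns = [rng.gauss(0.0, 1.0) for _ in range(200)]
# With s[t] = y[t]**2 - f[t], the GAS recursion is GARCH with
# alpha = a and beta = b - a.
gas = gas_variance_path(returns, omega=0.05, a=0.1, b=0.95, f0=1.0)
garch = garch_variance_path(returns, omega=0.05, alpha=0.1, beta=0.85, f0=1.0)
max_gap = max(abs(u - v) for u, v in zip(gas, garch))
print(max_gap)  # the two paths agree up to floating-point rounding
```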
9	Statistical Process Control for the Fairness of Network Resource Distribution
Liu, Qingyun. 10 November 2011.
The purpose of this research is to develop a statistical method to monitor the fairness of network resource distribution. The newly developed fairness score function allows users to have the same or different priority levels. In particular, this function possesses all the properties required of a quality characteristic for statistical process control. The main objective is to find the critical values for the statistical test, and Monte Carlo simulation is used to find them. When the users have the same priority level, a table of critical values is given for different sample sizes and significance levels. When the users have different priority levels, it is difficult to generate a similar table because the users’ priority levels vary, so the critical values are computed for given priority levels. In both cases, an example is given to demonstrate the approach developed in this study.
priority level; fairness score function; network resource distribution; statistical process control; control chart; Monte Carlo simulation; critical value
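Finding critical values by Monte Carlo simulation, as described in this abstract, can be sketched as follows. The thesis's own fairness score function and null model are not given here, so the sketch substitutes Jain's fairness index and independent uniform allocations as stand-ins; the function names, the allocation range, and the simulation size are likewise assumptions.

```python
import random

def jain_index(allocations):
    """Jain's fairness index: (sum x)**2 / (n * sum x**2), between 1/n and 1.
    Used here as a stand-in for the thesis's fairness score function."""
    total = sum(allocations)
    return total * total / (len(allocations) * sum(x * x for x in allocations))

def lower_critical_value(n_users, alpha, n_sim=20000, seed=0):
    """Empirical alpha-quantile of the score when all users have equal priority.

    Assumed null model: each user's allocation is drawn independently from
    Uniform(0.5, 1.5). Scores below the returned value would be flagged as
    unfair at significance level alpha.
    """
    rng = random.Random(seed)
    scores = sorted(
        jain_index([rng.uniform(0.5, 1.5) for _ in range(n_users)])
        for _ in range(n_sim)
    )
    return scores[int(alpha * n_sim)]

crit_05 = lower_critical_value(n_users=10, alpha=0.05)
crit_01 = lower_critical_value(n_users=10, alpha=0.01)
print(crit_05, crit_01)
```

Repeating the call over a grid of `n_users` and `alpha` values would give a table of critical values of the kind the abstract mentions for the equal-priority case.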
10	Generalized Estimating Equations for Mixed Models
Alnaji, Lulah A. 23 July 2018.
No description available.
Statistics; Mathematics; Generalized Estimating Equations; Mixed Models; Working Correlation Matrix; Clustered Data; Longitudinal; Pearson Residual; Score Function
