81

Batch and Online Implicit Weighted Gaussian Processes for Robust Novelty Detection

Ramirez, Padron Ruben 01 January 2015 (has links)
This dissertation aims mainly at obtaining robust variants of Gaussian processes (GPs) that do not require non-Gaussian likelihoods to compensate for outliers in the training data. Bayesian kernel methods, and in particular GPs, have been used to solve a variety of machine learning problems, matching or exceeding the performance of other successful techniques. This is the case for a recently proposed approach to GP-based novelty detection that uses standard GPs (i.e., GPs employing Gaussian likelihoods). However, standard GPs are sensitive to outliers in the training data, and this limitation carries over to GP-based novelty detection. The limitation has typically been addressed by using robust non-Gaussian likelihoods, but these lead to analytically intractable inference and therefore require approximation techniques that tend to be complex and computationally expensive. Inspired by the use of weights in quasi-robust statistics, this work introduces a particular type of weight function, here called data weighers, in order to obtain robust GPs that do not require approximation techniques and that retain the simplicity of standard GPs. The work proposes implicit weighted variants of batch GP, online GP, and sparse online GP (SOGP) that employ weighted Gaussian likelihoods, and derives mathematical expressions for computing the corresponding posteriors. In our experiments, novelty detection based on our weighted batch GPs consistently and significantly outperformed standard batch GP-based novelty detection whenever the data were contaminated with outliers. Additionally, our experiments show that novelty detection based on online GPs can perform similarly to batch GP-based novelty detection. Membership scores previously introduced by other authors are also compared in our experiments.
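The dissertation's data weighers are not spelled out in the abstract, so the following is only a minimal sketch of the mechanism it describes: a weighted Gaussian likelihood that down-weights suspect observations by inflating their noise variance, which keeps the GP posterior in closed form. The kernel, the weight values, and the toy data are illustrative assumptions, not the author's method.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between the row vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def weighted_gp_posterior(X, y, Xstar, weights, noise_var=0.1):
    """Posterior mean/variance of a GP with a per-observation weighted Gaussian
    likelihood: a weight w_i < 1 inflates that point's noise variance to
    noise_var / w_i, so suspected outliers pull less on the fit."""
    K = rbf_kernel(X, X)
    Ks = rbf_kernel(Xstar, X)
    Kss = rbf_kernel(Xstar, Xstar)
    A = K + np.diag(noise_var / weights)       # weighted likelihood, still Gaussian
    L = np.linalg.cholesky(A)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    cov = Kss - v.T @ v
    return mean, np.diag(cov)

# Toy data with one gross outlier; down-weighting it keeps the fit smooth.
rng = np.random.default_rng(0)
X = np.linspace(0, 5, 30)[:, None]
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
y[10] += 4.0                              # injected outlier
w = np.ones(30); w[10] = 0.05             # hypothetical data-weigher output
mean, var = weighted_gp_posterior(X, y, X, w)
```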
82

Scalable And Efficient Outlier Detection In Large Distributed Data Sets With Mixed-type Attributes

Koufakou, Anna 01 January 2009 (has links)
An important problem that often arises when analyzing data is identifying irregular or abnormal data points, called outliers. The problem broadly arises under two scenarios: when outliers are to be removed from the data before analysis, and when useful information or knowledge can be extracted from the outliers themselves. Outlier detection in the second scenario is a research field that has attracted significant attention across a broad range of useful applications. For example, in credit card transaction data, outliers might indicate potential fraud; in network traffic data, outliers might represent potential intrusion attempts. The basis for deciding whether a data point is an outlier is often some measure or notion of dissimilarity between the data point under consideration and the rest. Traditional outlier detection methods assume numerical or ordinal data and compute pair-wise distances between data points. However, the notion of distance or similarity for categorical data is more difficult to define. Moreover, the size of currently available data sets dictates the need for fast and scalable outlier detection methods, which precludes pair-wise distance computations. Additionally, these methods must be applicable to data that might be distributed among different locations. In this work, we propose novel strategies to efficiently deal with large distributed data containing mixed-type attributes. Specifically, we first propose a fast and scalable algorithm for categorical data (AVF) and its parallel version based on MapReduce (MR-AVF). We extend AVF and introduce a fast outlier detection algorithm for large distributed data with mixed-type attributes (ODMAD). Finally, we modify ODMAD to deal with very high-dimensional categorical data. Experiments with large real-world and synthetic data show that the proposed methods exhibit large performance gains and high scalability compared to the state of the art, while achieving similar detection accuracy.
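As a rough illustration of the frequency-based idea behind AVF (Attribute Value Frequency), the sketch below scores each categorical record by the mean frequency of its attribute values and flags the lowest-scoring records. It is a single-machine toy based on my reading of the published AVF formulation, and it omits the MapReduce (MR-AVF) and mixed-type (ODMAD) extensions.

```python
from collections import Counter
import numpy as np

def avf_scores(records):
    """Attribute Value Frequency: score each categorical record by the mean
    frequency of its attribute values; rare value combinations score low."""
    n_attrs = len(records[0])
    counts = [Counter(row[j] for row in records) for j in range(n_attrs)]
    return np.array([
        sum(counts[j][row[j]] for j in range(n_attrs)) / n_attrs
        for row in records
    ])

data = [
    ("red", "small", "metal"),
    ("red", "small", "metal"),
    ("red", "large", "metal"),
    ("blue", "small", "wood"),
    ("green", "huge", "glass"),   # unusual on every attribute
]
scores = avf_scores(data)
outliers = np.argsort(scores)[:1]   # the k records with the lowest AVF score
```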
83

Outlier detection on sparse-encoded vibration signals from rolling element bearings

Al-Kahwati, Kammal January 2019 (has links)
The demand for reliable condition monitoring systems on rotating machinery for power generation is continuously increasing due to the wider use of wind power as an energy source, which requires expertise in the diagnostics of these systems. Given the limited availability of diagnostics and maintenance experts in the wind energy sector, an alternative is to use unsupervised machine learning algorithms as a support tool for condition monitoring. Condition monitoring systems can employ unsupervised machine learning to prioritize the assets to monitor according to the number of anomalies detected in the vibration signals of the rolling element bearings. Previous work has focused on detecting anomalies using features taken directly from the time or frequency domain of the vibration signals to determine whether a machine has a fault. In this work, I detect outliers using features derived from vibration signals encoded via sparse coding with dictionary learning. I investigate multiple outlier detection algorithms and evaluate their performance using different features taken from the sparse representation. I show that it is possible to detect abnormal behavior in a bearing earlier than the fault dates reported by typical condition monitoring systems.
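A hedged sketch of the kind of pipeline the abstract describes: learn a dictionary on frames from healthy operation, encode new frames with sparse coding, and run a generic outlier detector on features of the codes. The frame length, dictionary size, feature choice (absolute activations), and IsolationForest detector are my own stand-ins, not the thesis's actual configuration.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.ensemble import IsolationForest

def window(signal, length=128, step=64):
    """Cut a long vibration signal into overlapping frames (one frame per row)."""
    idx = np.arange(0, len(signal) - length, step)
    return np.stack([signal[i:i + length] for i in idx])

rng = np.random.default_rng(0)
healthy = np.sin(0.2 * np.arange(60_000)) + 0.1 * rng.standard_normal(60_000)
test = healthy.copy()
test[40_000:40_500] += 0.8 * rng.standard_normal(500)   # synthetic "fault" burst

# 1) Learn a dictionary on frames from healthy operation.
X_train = window(healthy[:30_000])
dico = MiniBatchDictionaryLearning(n_components=32, alpha=1.0,
                                   batch_size=64, random_state=0).fit(X_train)

# 2) Encode new frames and derive simple features from the sparse codes
#    (here: absolute activations per atom; the thesis evaluates several options).
codes = np.abs(dico.transform(window(test)))

# 3) Flag frames whose code features look unlike the healthy distribution.
det = IsolationForest(contamination=0.01, random_state=0)
det.fit(np.abs(dico.transform(X_train)))
flags = det.predict(codes)          # -1 marks a potential outlier frame
```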
84

Modern Anomaly Detection: Benchmarking, Scalability and a Novel Approach

Pasupathipillai, Sivam 27 November 2020 (has links)
Anomaly detection consists in automatically detecting the most unusual elements in a data set. Anomaly detection applications emerge in domains such as computer security, system monitoring, fault detection, and wireless sensor networks. The strategic importance of detecting anomalies in these domains makes anomaly detection a critical data analysis task. Moreover, the contextual nature of anomalies, among other issues, makes anomaly detection a particularly challenging problem. Anomaly detection has received significant research attention in the last two decades, and much effort has been invested in the development of novel algorithms; however, several open challenges still exist in the field. This thesis presents our contributions toward solving these challenges. These contributions include: a methodological survey of the recent literature, a novel benchmarking framework for anomaly detection algorithms, an approach for scaling anomaly detection techniques to massive data sets, and a novel anomaly detection algorithm inspired by the law of universal gravitation. Our methodological survey highlights open challenges in the field, and it provides some motivation for our other contributions. Our benchmarking framework, named BAD, tackles the problem of reliably assessing the accuracy of unsupervised anomaly detection algorithms. BAD leverages parallel and distributed computing to enable massive comparison studies and hyperparameter tuning tasks. The challenge of scaling unsupervised anomaly detection techniques to massive data sets is well known in the literature. In this context, our contributions are twofold: we investigate the trade-offs between a single-threaded implementation and a distributed approach considering price-performance metrics, and we propose an approach for scaling anomaly detection algorithms to arbitrary data volumes. Our results show that, when high scalability is required, our approach can handle arbitrarily large data sets without significantly compromising detection accuracy. We conclude our contributions by proposing a novel algorithm for anomaly detection, named Gravity. Gravity identifies anomalies by considering the attraction forces among massive data elements. Our evaluation shows that Gravity is competitive with other popular anomaly detection techniques on several benchmark data sets. Additionally, the properties of Gravity make it preferable in cases where hyperparameter tuning is challenging or infeasible.
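The abstract does not give Gravity's formulation, so the toy below only renders the stated intuition, attraction forces among data elements treated as masses, as an inverse-square attraction score; it should not be read as the thesis's algorithm, and every detail here (unit masses, inverse-square law, lowest-attraction-is-anomalous) is an assumption of mine.

```python
import numpy as np
from scipy.spatial.distance import cdist

def attraction_scores(X, eps=1e-9):
    """Toy 'gravitational' score: total inverse-square attraction each point
    receives from all others (unit masses). Isolated points attract little,
    so a low total attraction is treated as an anomaly signal."""
    d2 = cdist(X, X) ** 2
    np.fill_diagonal(d2, np.inf)          # ignore self-attraction
    return (1.0 / (d2 + eps)).sum(axis=1)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), [[8.0, 8.0]]])   # one far-off point
scores = attraction_scores(X)
anomaly = np.argmin(scores)               # index 200: the isolated point
```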
85

The Generalized Multiset Sampler: Theory and Its Application

Kim, Hang Joon 25 June 2012 (has links)
No description available.
86

High-dimensional statistical methods for inter-subject studies in neuroimaging

Fritsch, Virgile 18 December 2013 (has links)
Inter-individual variability is a major obstacle to the analysis of medical images, particularly in neuroimaging. A distinction must be drawn between natural or statistical variability, a source of potentially diagnostically relevant effects of interest, and artefactual variability, made up of nuisance effects tied to experimental or technical problems arising during data acquisition or processing. The latter can turn out to be far larger than the former: in neuroimaging, acquisition problems can mask the functional variability that is otherwise associated with a disease, a psychological disorder, or the expression of a specific genetic code. The quality of the statistical procedures used for group studies is then reduced, because these procedures rest on the assumption of a homogeneous population, an assumption that is difficult to verify manually on high-dimensional neuroimaging data. Automatic methods have been devised to try to eliminate overly deviant subjects and thereby make the studied groups more homogeneous. This practice has not fully proven itself, however, since no study has clearly validated it and the tolerance level to choose remains arbitrary. Another approach is therefore to use analysis and processing procedures that are intrinsically insensitive to the homogeneity assumption. They are also better suited to real data in that they tolerate, to some extent, other more subtle assumption violations such as data normality. Another, partially related, problem is the lack of stability and sensitivity of voxel-level analysis methods, which leads to results that are not reproducible. This thesis begins with the development of an atypical-subject detection method adapted to neuroimaging data, which provides statistical control over subject inclusion: we propose a regularized version of a robust covariance estimator to make it usable in high dimension. We compare several types of regularization and conclude that random projections offer the best trade-off. We also present non-parametric procedures whose good performance we demonstrate, although they offer no statistical control. The second contribution of this thesis is a new approach, named RPBI (Randomized Parcellation Based Inference), addressing the lack of reproducibility of classical methods. We stabilize parcel-level analysis by aggregating several independent analyses, in which the partition of the brain into parcels varies from one analysis to the next. The method achieves a higher level of sensitivity than state-of-the-art methods, which we demonstrate through experiments on synthetic and real data. Our third contribution is an application of robust regression to neuroimaging studies. Building on existing work, we focus on large-scale studies carried out on more than one hundred subjects. Considering both simulated and real data, we show that using robust regression improves the sensitivity of the analyses. We demonstrate that it is important to ensure resistance to assumption violations, even when a careful inspection of the data set has been conducted beforehand. Finally, we combine robust regression with our RPBI analysis method to obtain even more sensitive statistical tests.
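As a rough stand-in for the first contribution (a robust covariance estimator made usable in high dimension, with random projections as the preferred regularization), the sketch below projects high-dimensional subject data to a low dimension, fits a Minimum Covariance Determinant estimator, and flags subjects with large robust Mahalanobis distances. The projection dimension, the chi-square cut-off, and the use of scikit-learn's MinCovDet are illustrative assumptions, not the thesis's estimator.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet
from sklearn.random_projection import GaussianRandomProjection

def flag_outlying_subjects(X, n_components=20, alpha=0.975, seed=0):
    """Project high-dimensional subject data to a low dimension, fit a robust
    covariance (MCD), and flag subjects with large robust Mahalanobis
    distances relative to a chi-square quantile."""
    Z = GaussianRandomProjection(n_components=n_components,
                                 random_state=seed).fit_transform(X)
    mcd = MinCovDet(random_state=seed).fit(Z)
    d2 = mcd.mahalanobis(Z)                       # squared robust distances
    return d2 > chi2.ppf(alpha, df=n_components)  # True = candidate outlier

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5000))              # 100 subjects, 5000 voxels
X[:3] += 2.0                                      # three deviant subjects
print(np.where(flag_outlying_subjects(X))[0])
```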
87

Fast and Scalable Outlier Detection with Metric Access Methods

Bispo Junior, Altamir Gomes 25 July 2019 (has links)
It is well known that the existing theoretical models for outlier detection make assumptions that may not reflect the true nature of outliers in every real application. This dissertation describes an empirical study of unsupervised outlier detection using 8 state-of-the-art algorithms and 8 datasets drawn from a variety of real-world tasks of practical relevance, such as spotting cyberattacks, clinical pathologies, and abnormalities occurring in nature. We present our assessment of the results obtained, pointing out the strengths and weaknesses of each technique from the application specialist's point of view, a shift from the designer's point of view that is commonly adopted. Many of the techniques had impractically high runtime requirements or failed to spot what the specialists consider outliers in their own data. To tackle this issue, we propose MetricABOD: a novel ABOD-based algorithm that makes the analysis up to thousands of times faster while still being, on average, 26% more accurate than the most accurate related work. This improvement makes outlier detection practical in many real-world applications for which existing methods exhibit unstable accuracy or infeasible runtime requirements. Finally, we studied two collections of text data to show that MetricABOD also works for purely metric data without a dimensional (vector-space) representation.
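MetricABOD itself is not reproduced here, but the angle-based idea it builds on can be sketched: points lying far outside the data see all other points under a narrow range of angles, so a small variance of those angles signals an outlier. The sketch below is a simplified, unweighted O(n³) variant; the distance weighting of the original ABOF and the metric-access pruning that makes MetricABOD fast are omitted.

```python
import numpy as np
from itertools import combinations

def abof_scores(X):
    """Simplified angle-based outlier factor: for each point, the variance of
    the cosines of angles between difference vectors to all pairs of other
    points. A small variance indicates an outlier."""
    n = len(X)
    scores = np.empty(n)
    for p in range(n):
        others = np.delete(np.arange(n), p)
        cosines = []
        for a, b in combinations(others, 2):
            u, v = X[a] - X[p], X[b] - X[p]
            cosines.append(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
        scores[p] = np.var(cosines)
    return scores

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (60, 2)), [[7.0, 7.0]]])
scores = abof_scores(X)
print(np.argmin(scores))     # 60: the injected outlier has the smallest ABOF
```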
88

Urban Change Detection Using Multitemporal SAR Images

Yousif, Osama January 2015 (has links)
Multitemporal SAR images have been increasingly used for the detection of different types of environmental changes. The detection of urban changes using SAR images is complicated by the complex mixture of the urban environment and the special characteristics of SAR images, for example, the existence of speckle. This thesis investigates urban change detection using multitemporal SAR images with the following specific objectives: (1) to investigate unsupervised change detection, (2) to investigate effective methods for reducing the speckle effect in change detection, (3) to investigate spatio-contextual change detection, (4) to investigate object-based unsupervised change detection, and (5) to investigate a new technique for object-based change image generation. Beijing and Shanghai, the largest cities in China, were selected as study areas. Multitemporal SAR images acquired by the ERS-2 SAR and ENVISAT ASAR sensors were used for pixel-based change detection. For the object-based approaches, TerraSAR-X images were used. In Paper I, unsupervised detection of urban change was investigated using the Kittler-Illingworth algorithm. A modified ratio operator that combines positive and negative changes was used to construct the change image. Four density function models were tested and compared; among them, the log-normal and Nakagami ratio models achieved the best results. Despite the good performance of the algorithm, the obtained results generally suffer from the loss of fine geometric detail, a consequence of using local adaptive filters for speckle suppression. Paper II addresses this problem using the nonlocal means (NLM) denoising algorithm for speckle suppression and detail preservation. In this algorithm, denoising is achieved through a moving weighted average, where the weights are a function of the similarity of small image patches defined around each pixel in the image. To decrease the computational complexity, principal component analysis (PCA) was used to reduce the dimensionality of the neighbourhood feature vectors. Simple methods were proposed to estimate the number of significant PCA components to retain for weight computation and the required noise variance. The experimental results showed that the NLM algorithm successfully suppressed speckle effects while preserving fine geometric detail in the scene. The analysis also indicates that filtering the change image, instead of the individual SAR images, was effective in terms of both the quality of the results and the time needed to carry out the computation. The Markov random field (MRF) change detection algorithm showed limited capacity to simultaneously maintain fine geometric detail in urban areas and combat the effect of speckle. To overcome this problem, Paper III utilizes NLM theory to define a nonlocal constraint on pixel class labels. The iterated conditional modes (ICM) scheme for optimizing the MRF criterion function is extended with a new step that maximizes the nonlocal probability model. Compared with the traditional MRF algorithm, the experimental results showed that the proposed algorithm is superior in preserving fine structural detail, effective in reducing the effect of speckle, less sensitive to the value of the contextual parameter, and less affected by the quality of the initial change map. Paper IV investigates object-based unsupervised change detection using very high resolution TerraSAR-X images over urban areas. Three algorithms, i.e., Kittler-Illingworth, Otsu, and outlier detection, were tested and compared. The multitemporal images were segmented using a multidate segmentation strategy. The analysis reveals that the three algorithms achieved similar accuracies, very close to the maximum possible given the modified ratio image as input. This maximum, however, was not very high, which was attributed partly to the limited capacity of the modified ratio image to accentuate the difference between changed and unchanged areas. Consequently, Paper V proposes a new object-based change image generation technique. The strong intensity variations associated with high resolution, together with speckle effects, render the object mean intensity an unreliable feature, so the modified ratio image is less effective at emphasizing the contrast between the classes. An alternative representation of the change data was therefore proposed. To measure the intensity of change at the object level in isolation from disturbances caused by strong intensity variations and speckle effects, two techniques based on the Fourier transform and the wavelet transform of the change signal were developed. Qualitative and quantitative analyses of the results show that improved change detection accuracies can be obtained by classifying the proposed change variables.
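The exact form of the thesis's modified ratio operator is not given in the abstract, so the sketch below uses the standard log-ratio change image as a stand-in and thresholds it with Otsu's method (one of the three algorithms compared in Paper IV); the synthetic gamma-distributed speckle and the scene are illustrative assumptions.

```python
import numpy as np
from skimage.filters import threshold_otsu

def log_ratio_change_map(img1, img2, eps=1e-6):
    """Log-ratio change image for two co-registered SAR intensity images.
    |log(I2/I1)| responds to both increases and decreases in backscatter;
    a modified ratio operator would play a similar role."""
    return np.abs(np.log((img2 + eps) / (img1 + eps)))

rng = np.random.default_rng(0)
# Two dates with synthetic multiplicative (speckle-like) gamma noise.
base = np.ones((128, 128))
img1 = base * rng.gamma(4.0, 0.25, base.shape)
img2 = base.copy()
img2[40:60, 40:60] = 4.0                        # a "new building" block
img2 = img2 * rng.gamma(4.0, 0.25, base.shape)

change = log_ratio_change_map(img1, img2)
mask = change > threshold_otsu(change)          # unsupervised change mask
```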
89

The 3σ-rule for outlier detection from the viewpoint of geodetic adjustment

Lehmann, Rüdiger 21 January 2015 (has links) (PDF)
The so-called 3σ-rule is a simple and widely used heuristic for outlier detection. It is an umbrella term for several statistical hypothesis tests whose test statistics are known as normalized or studentized residuals. The conditions under which this rule is statistically substantiated were analyzed, and the extent to which it applies to geodetic least-squares adjustment was investigated. The efficiency or non-efficiency of the method was then analyzed and demonstrated on the example of repeated observations.
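A minimal numerical illustration of the rule discussed above, assuming an ordinary least-squares adjustment: observations are flagged when the absolute value of their internally studentized residual exceeds 3. The repeated-observation design matrix and the simulated gross error are illustrative, not the paper's data.

```python
import numpy as np

def studentized_residuals(A, y):
    """Internally studentized residuals of a least-squares adjustment y = A x.
    The 3-sigma rule flags observation i when |r_i| exceeds 3."""
    x, *_ = np.linalg.lstsq(A, y, rcond=None)
    v = y - A @ x                                   # residuals
    n, m = A.shape
    H = A @ np.linalg.inv(A.T @ A) @ A.T            # hat matrix
    sigma2 = (v @ v) / (n - m)                      # a posteriori variance factor
    return v / np.sqrt(sigma2 * (1.0 - np.diag(H)))

# Repeated observations of a single quantity (design matrix is a column of
# ones), the example discussed in the paper.
rng = np.random.default_rng(0)
y = 10.0 + 0.01 * rng.standard_normal(20)
y[7] += 0.08                                        # simulated gross error
A = np.ones((20, 1))
r = studentized_residuals(A, y)
print(np.where(np.abs(r) > 3)[0])                   # flagged observations
```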
90

Methods in productivity and efficiency analysis with applications to warehousing

Johnson, Andrew 31 March 2006 (has links)
A set of technical issues is addressed related to benchmarking best-practice behavior in warehouses. In order to identify best practice, performance first needs to be measured, and a variety of tools are available to measure productivity and efficiency. One of the most common tools is data envelopment analysis (DEA). Given a system that consumes inputs to generate outputs, previous work has shown that production theory can be used to develop basic postulates about the production possibility space and to construct an efficient frontier, which is used to quantify efficiency. Beyond inputs and outputs, warehouses typically have practices (techniques used in the warehouse) or attributes (characteristics of the warehouse's environment, including demand characteristics) that also influence efficiency. A two-stage method has previously been developed in the literature to investigate the impact of practices and attributes on efficiency. When applying this method, two issues arose: how to measure efficiency in small samples and how to identify outliers. The small-sample efficiency measurement method developed in this thesis is called the multi-input/multi-output quantile-based approach (MQBA) and uses deleted residuals to estimate efficiency. The outlier detection method introduces the inefficient frontier: by constructing both an efficient and an inefficient frontier, overly efficient as well as overly inefficient outliers can be identified. The outlier detection method incorporates an iterative procedure previously described, but not implemented, in the literature. Further, this thesis also discusses issues related to selecting an orientation in super-efficiency models. Super-efficiency models are used in outlier detection, but are also commonly used in measuring technical progress via the Malmquist index. These issues are addressed using two data sets recently collected in the warehousing industry. The first data set consists of 390 observations of various types of warehouses; the other has 25 observations from a specific industry. For both data sets, it is shown that significantly different results are realized if the methods suggested in this document are adopted.
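For readers unfamiliar with DEA, the sketch below solves the basic input-oriented CCR envelopment model with a linear-programming solver; a unit on the efficient frontier gets an efficiency of 1. It is only the textbook building block: the warehouse figures are made up, and it implements neither the MQBA small-sample method nor the super-efficiency and inefficient-frontier constructions developed in the thesis.

```python
import numpy as np
from scipy.optimize import linprog

def dea_ccr_input(X, Y):
    """Input-oriented CCR efficiency for each DMU (e.g. warehouse): the
    smallest factor theta by which its inputs could be scaled while a convex
    cone of peers still matches its outputs. theta = 1 means it lies on the
    efficient frontier."""
    n, n_in = X.shape
    n_out = Y.shape[1]
    eff = np.empty(n)
    for o in range(n):
        c = np.r_[1.0, np.zeros(n)]                       # minimize theta
        A_in = np.hstack([-X[o][:, None], X.T])           # sum_j lam_j x_ij <= theta x_io
        A_out = np.hstack([np.zeros((n_out, 1)), -Y.T])   # sum_j lam_j y_rj >= y_ro
        A_ub = np.vstack([A_in, A_out])
        b_ub = np.r_[np.zeros(n_in), -Y[o]]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                      bounds=[(0, None)] * (n + 1), method="highs")
        eff[o] = res.x[0]
    return eff

# Toy warehouses: inputs = (labor hours, floor space), output = lines shipped.
X = np.array([[100.0, 50.0], [120.0, 40.0], [80.0, 80.0], [200.0, 90.0]])
Y = np.array([[500.0], [520.0], [450.0], [510.0]])
print(dea_ccr_input(X, Y))
```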
