21 |
Une approche de détection d'outliers en présence de l'incertitude / An outlier detection approach in the presence of uncertainty. Hacini, Akram 06 December 2018 (has links)
Imprecision, uncertainty, and incompleteness are among the factors that make the data produced by modern processing systems complex. They are aggravated by the multiplicity and dispersion of data-generating sources, as is readily observed in control and monitoring systems. Data mining tools have become fairly effective on data backed by reliable prior knowledge, but they cannot be applied to data where that knowledge is itself tainted by uncertainty and imprecision. New approaches that take this into account will certainly improve the performance of data mining systems, including outlier detection, which is the subject of this thesis. The thesis proposes a new method for detecting outliers in uncertain and/or imprecise data, addressing in particular the imprecision and uncertainty of the expert assessments attached to the training data. To overcome this problem, we combined techniques from machine learning, especially clustering, with techniques from fuzzy logic, especially fuzzy sets. This makes it possible to project new observations onto the clusters of the training data and, after thresholding, to determine which observations should be considered aberrant (outliers) in the dataset. Specifically, using ambiguous decision tables (ADTs), we started from the ambiguity indices of the training data and computed the ambiguity indices of the new observations (test data) by fuzzy inference. After clustering the set of ambiguity indices, an α-cut operation allowed us to define a decision boundary within the clusters, which was in turn used to categorize the observations as normal (inliers) or aberrant (outliers). The strength of the proposed method lies in its ability to handle imprecise and/or uncertain training data using only the ambiguity indices, thereby overcoming the various problems of dataset incompleteness. The false-positive and recall metrics allowed us both to evaluate the performance of the method and to tune it according to the user's choices.
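The α-cut thresholding step described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the membership degrees below are made-up values standing in for whatever the clustering of ambiguity indices would yield, and the function names `alpha_cut` and `flag_outliers` are hypothetical. The sketch only shows the standard definition of an α-cut (keep the elements whose membership degree is at least α) used as a decision boundary.

```python
def alpha_cut(memberships, alpha):
    """Return the indices whose membership degree is at least alpha.

    This is the standard alpha-cut of a fuzzy set, applied here to a list
    of membership degrees in [0, 1].
    """
    return [i for i, mu in enumerate(memberships) if mu >= alpha]


def flag_outliers(memberships, alpha):
    """Flag as outliers the observations that fall outside the alpha-cut."""
    inside = set(alpha_cut(memberships, alpha))
    return [i not in inside for i in range(len(memberships))]


# Hypothetical membership degrees of five observations to their cluster.
memberships = [0.9, 0.75, 0.2, 0.6, 0.05]
print(flag_outliers(memberships, alpha=0.5))
# → [False, False, True, False, True]
```

Raising α shrinks the cut and flags more observations, which is one plausible way the method could be tuned to trade false positives against recall.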
|
22 |
Algoritmos de casamento de imagens com filtragem adaptativa de outliers / Image matching algorithms with adaptive filtering of outliers. Ramos, Jonathan da Silva 01 December 2016 (has links)
Image registration plays an important role in many applications, such as 3D object reconstruction, pattern recognition, and microscopic imaging. It comprises three main steps: (1) interest point selection; (2) feature extraction from the interest points; (3) matching of interest points from one image to the other. For steps 1 and 2, algorithms such as SIFT and SURF have given satisfactory results. In step 3, however, outliers arise, that is, interest points that were incorrectly matched, and even a few incorrect matches can lead to an undesirable final result. Outlier removal (consensus) algorithms have a high computational cost that grows as the number of outliers increases. To reduce the processing time these algorithms require, this work proposes FOMP (Filtering out Outliers from Matched Points), a preprocessing step that filters outliers from the initial set of matched points. FOMP treats each set of points as a complete graph whose edge weights are the distances between points; the vertex with the largest sum of differences between edge weights is removed. To validate the method, experiments were performed on four image databases, each with its own characteristics: (a) differences in rotation or camera zoom; (b) repetitive patterns, which produce duplicate feature vectors; (c) deformable objects, such as plastic, paper, or fabric; (d) affine transformations (different viewpoints). The experiments showed that FOMP removes more than 65% of the outliers while keeping about 98% of the inliers. The proposed approach preserves the precision of the consensus methods while halving the processing time of graph-based methods.
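The graph-based filtering idea in this abstract can be sketched as follows. This is one reading of the description, not the author's FOMP code. It assumes that the "difference among edges" compares each edge's length in the first image with the corresponding edge's length in the second image, that both images are at a similar scale, and that a fixed number of vertices is removed. The abstract does not state the actual stopping rule, and all function names are hypothetical.

```python
import math


def pairwise_distances(points):
    """Edge weights of the complete graph: Euclidean distance between points."""
    return [[math.dist(p, q) for q in points] for p in points]


def filter_matches(points_a, points_b, n_remove):
    """Greedily drop the matches whose edges disagree most between images.

    points_a[i] is assumed to be matched with points_b[i]. At each round the
    vertex with the largest sum of absolute edge-weight differences (over
    the vertices still kept) is removed. Returns the indices that survive.
    """
    dist_a = pairwise_distances(points_a)
    dist_b = pairwise_distances(points_b)
    keep = list(range(len(points_a)))
    for _ in range(min(n_remove, len(keep))):
        scores = {
            i: sum(abs(dist_a[i][j] - dist_b[i][j]) for j in keep if j != i)
            for i in keep
        }
        worst = max(keep, key=lambda i: scores[i])
        keep.remove(worst)
    return keep


# Toy data: the last match is wrong (the point lands far away in image B).
image_a = [(0, 0), (1, 0), (0, 1), (1, 1)]
image_b = [(0, 0), (1, 0), (0, 1), (5, 5)]
print(filter_matches(image_a, image_b, n_remove=1))
# → [0, 1, 2]
```

Because pairwise distances are unchanged by rotation and translation, a filter of this kind tolerates those transformations, but the scale assumption above would need to be handled separately for zoom changes.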
|
24 |
Identify influential observations in the estimation of covariance matrix. January 2000 (has links)
Wong Yuen Kwan Virginia. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. / Includes bibliographical references (leaves 85-86). / Abstracts in English and Chinese. / Chapter 1 --- Introduction / Chapter 2 --- Deletion and Distance Measure / Chapter 2.1 --- Mahalanobis and Cook's Distances / Chapter 2.2 --- Defining New Measure Di / Chapter 2.3 --- Derivation of cov(s(i) − s) / Chapter 3 --- Procedures for Detecting Influential Observations / Chapter 3.1 --- The One-Step Method / Chapter 3.1.1 --- The Method / Chapter 3.1.2 --- Design of Simulation Studies / Chapter 3.1.3 --- Results of Simulation Studies / Chapter 3.1.4 --- Higher Dimensional Cases / Chapter 3.2 --- The Forward Search Procedure / Chapter 3.2.1 --- Idea of the Forward Search Procedure / Chapter 3.2.2 --- The Algorithm / Chapter 4 --- Examples and Observations / Chapter 4.1 --- Example 1: Brain and Body Weight Data / Chapter 4.2 --- Example 2: Stack Loss Data / Chapter 4.3 --- Example 3: Percentage of Cloud Cover / Chapter 4.4 --- Example 4: Synthetic Data of Hawkins et al. (1984) / Chapter 4.5 --- Observations and Comparison / Chapter 5 --- Discussion and Conclusion / Tables / Figures / Bibliography
|
25 |
Parameter estimation when outliers may be present in normal data. Quimby, Barbara Bitz January 2010 (has links)
Typescript (photocopy). / Digitized by Kansas Correctional Industries
|
26 |
Simultaneous prediction intervals for autoregressive integrated moving average models in the presence of outliers. January 2001 (has links)
Cheung Tsai-Yee Crystal. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references (leaves 83-85). / Abstracts in English and Chinese. / Chapter 1 --- Introduction / Chapter 1.1 --- The Importance of Forecasting / Chapter 2 --- Methodology / Chapter 2.1 --- Basic Idea / Chapter 2.2 --- Outliers in Time Series / Chapter 2.2.1 --- One Outlier Case / Chapter 2.2.2 --- Two Outliers Case / Chapter 2.2.3 --- General Case / Chapter 2.2.4 --- Time Series Parameters are Unknown / Chapter 2.3 --- Iterative Procedure for Detecting Outliers / Chapter 2.3.1 --- General Procedure for Detecting Outliers / Chapter 2.4 --- Methods of Constructing Simultaneous Prediction Intervals / Chapter 2.4.1 --- The Bonferroni Method / Chapter 2.4.2 --- The Exact Method / Chapter 3 --- An Illustrative Example / Chapter 3.1 --- Case A / Chapter 3.2 --- Case B / Chapter 3.3 --- Comparison / Chapter 4 --- Simulation Study / Chapter 4.1 --- Generate AR(1) with an Outlier / Chapter 4.1.1 --- Case A / Chapter 4.1.2 --- Case B / Chapter 4.2 --- Simulation Results I / Chapter 4.3 --- Generate AR(1) with Two Outliers / Chapter 4.4 --- Simulation Results II / Chapter 4.5 --- Concluding Remarks / Bibliography
|
27 |
Estimação e previsão em processos SARFIMA(p, d, q) × (P, D, Q)_s na presença de outliers / Estimation and forecasting in SARFIMA(p, d, q) × (P, D, Q)_s processes in the presence of outliers. Bisognin, Cleber January 2007 (has links)
In this work we analyze processes with long memory and seasonality. The main goal is to study the k-factor GARMA(p, λ, u, q) and SARFIMA(p, d, q) × (P, D, Q)_s processes, where s is the seasonal period. For k-factor GARMA(p, λ, u, q) processes, assuming the Gegenbauer frequencies are known, we propose semiparametric estimators for the corresponding parameter λ. We present important results on the spectral density function and on the coefficients of the infinite autoregressive and moving average representations of these processes. For SARFIMA(p, d, q) × (P, D, Q)_s processes, we prove several properties: the expression of the spectral density function and its behavior near the seasonal frequencies, stationarity, intermediate and long memory, and the autocovariance function with its asymptotic expression. We also investigate necessary and sufficient conditions for the causality and invertibility of SARFIMA processes, analyze their ergodicity, and present the minimum mean squared error forecast. We present several semiparametric estimators for both the differencing parameter d and the seasonal differencing parameter D. In the parametric class, we present a method that estimates all the parameters of the process. We propose a new estimation methodology for d and D, the so-called robust estimators. We introduce two models of outlier contamination, the multiparametric model and mixture contamination. Through Monte Carlo simulations, we analyze the behavior of the semiparametric and parametric estimators of the SARFIMA parameters, with and without contamination by additive and innovation outliers. We present the likelihood test for detecting and identifying outliers in SARFIMA processes. We also develop an estimator for the magnitude of additive and innovation outliers and show that it is unbiased and normally distributed. Finally, we analyze the time series of monthly flows of the Nile River at Aswan, with and without additive outlier contamination.
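The difference between the two outlier types named in this abstract is easiest to see on a much simpler model than SARFIMA. The sketch below uses a plain AR(1) recursion purely for illustration; the function name `simulate_ar1` and its parameters are hypothetical and are not taken from the thesis. An additive outlier perturbs a single observation, while an innovation outlier enters through the noise term and therefore propagates through the dynamics.

```python
def simulate_ar1(phi, innovations, outlier_time=None, magnitude=0.0,
                 kind="additive"):
    """Generate x_t = phi * x_{t-1} + e_t with one optional outlier.

    kind="additive":   the magnitude is added to the observed value at
                       outlier_time only; the underlying state is untouched.
    kind="innovation": the magnitude is added to the innovation e_t at
                       outlier_time, so later values are affected as well.
    """
    if kind not in ("additive", "innovation"):
        raise ValueError("kind must be 'additive' or 'innovation'")
    state = 0.0
    series = []
    for t, e in enumerate(innovations):
        if kind == "innovation" and t == outlier_time:
            e += magnitude
        state = phi * state + e
        observed = state
        if kind == "additive" and t == outlier_time:
            observed += magnitude
        series.append(observed)
    return series


# Zero innovations make the effect of the outlier visible in isolation.
quiet = [0.0] * 5
print(simulate_ar1(0.5, quiet, outlier_time=1, magnitude=4.0, kind="additive"))
# → [0.0, 4.0, 0.0, 0.0, 0.0]
print(simulate_ar1(0.5, quiet, outlier_time=1, magnitude=4.0, kind="innovation"))
# → [0.0, 4.0, 2.0, 1.0, 0.5]
```

In a long-memory process the decay after an innovation outlier would be much slower than the geometric decay shown here, which is part of why the two types need to be distinguished when estimating d and D.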
|
28 |
Modelling Commodity Prices in The Australian National Electricity Market. Thomas, Stuart John, stuart.thomas@rmit.edu.au January 2007 (has links)
Beginning in the early 1990s several countries, including Australia, have pursued programs of deregulation and restructuring of their electricity supply industries. Dissatisfaction with state-run monopoly suppliers and a desire for increased competition and choice for consumers have been the major motivations for reform. In Australia, the historical, vertically-integrated, government-owned electricity authorities were separated into separate generation, transmission, distribution and retail sectors in each State and a competitive, wholesale market for electricity, the National Electricity Market (NEM) began operation in December 1998. The goal of deregulation was (and remains) increased competition in electricity supply, so that consumers may enjoy wider choice and lower prices. The first benefit has largely been delivered but it is arguable whether the second benefit of lower prices has been realised. Increased competition has come at the price of increased wholesale price volatility, which brings with it increased cost as market participants seek to trade profitably and manage the increase in price risk. In the NEM, generators compete to sell into a pool market and distributors purchase electricity from the pool at prices determined by demand and supply, on a half-hourly basis. These market-clearing prices can be extremely volatile. Electricity prices are generally characterised by significant seasonal patterns, on an intra-day, weekly and monthly basis, as demand and supply conditions vary. Prices are also characterised by strong mean-reversion and extremely high spikes in price. While long-run mean prices typically range between $30 and $45 per megawatt hour, prices can spike to levels above $9,000 or $10,000 per megawatt hour from time to time. These spikes tend to be sporadic and very short-lived, rarely lasting for more than an hour or two. 
Although infrequent, spikes are the major contributor to price volatility and their evolution and causes need to be investigated and understood. The purpose of this thesis is to investigate and model Australian electricity prices. The research work presented is mostly empirical, with the early analytical chapters focusing on investigating the presence and significance of seasonal factors and spikes in electricity price and demand. In subsequent chapters this work is extended into analysis of the underlying volatility processes and the interaction between extreme values in demand and price is specifically investigated. The findings of the thesis are that while the characteristics of strong seasonal patterns and spikes that are generally observed in similar electricity markets are present in the NEM in both price and demand, there is significant variation in their presence and effect between the regional pools. The study also finds that while time-varying volatility is evident in the price series there is again some variation in the way this is characterised between states. A further finding challenges the accepted wisdom that demand peaks drive price spikes at the extremes and shows empirically that price spikes are more likely to be caused by supply disruptions than extremes of demand. The findings provide useful insight into this highly idiosyncratic but economically important national market.
|
29 |
A Student's t filter for heavy tailed process and measurement noise. Roth, Michael, Ozkan, Emre, Gustafsson, Fredrik January 2013 (has links)
We consider the filtering problem in linear state space models with heavy-tailed process and measurement noise. Our work is based on Student's t distribution, for which we give a number of useful results. The derived filtering algorithm is a generalization of the ubiquitous Kalman filter and reduces to it as a special case. The Kalman filter and the new algorithm are compared on a challenging tracking example in which a maneuvering target is observed in clutter. / MC Impulse
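The abstract does not say which special case recovers the Kalman filter. A natural reading, offered here as an assumption and not as the paper's statement, is the limit of infinite degrees of freedom, where the Student's t density becomes Gaussian. For a d-dimensional variable with location μ, scale matrix Σ, and ν degrees of freedom:

```latex
\mathrm{St}(x;\mu,\Sigma,\nu)
  = \frac{\Gamma\!\left(\frac{\nu+d}{2}\right)}
         {\Gamma\!\left(\frac{\nu}{2}\right)(\nu\pi)^{d/2}\,|\Sigma|^{1/2}}
    \left(1+\frac{\Delta^{2}}{\nu}\right)^{-\frac{\nu+d}{2}},
\qquad
\Delta^{2} = (x-\mu)^{\top}\Sigma^{-1}(x-\mu)

\lim_{\nu\to\infty}\left(1+\frac{\Delta^{2}}{\nu}\right)^{-\frac{\nu+d}{2}}
  = \exp\!\left(-\tfrac{1}{2}\Delta^{2}\right),
\qquad
\lim_{\nu\to\infty}
  \frac{\Gamma\!\left(\frac{\nu+d}{2}\right)}
       {\Gamma\!\left(\frac{\nu}{2}\right)(\nu\pi)^{d/2}}
  = (2\pi)^{-d/2}
```

so the density tends to the Gaussian density with mean μ and covariance Σ. For finite ν the tails decay polynomially, not exponentially, which is what lets a filter built on this distribution tolerate occasional large noise values.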
|
30 |
Analysis of outliers using graphical and quasi-Bayesian methods / Fung, Wing-kam, Tony. January 1987 (has links)
Thesis (Ph. D.)--University of Hong Kong, 1987.
|