Global ETD Search

71	Uma arquitetura de software para descoberta de regras de associação multidimensional, multinível e de outliers em cubos OLAP: um estudo de caso com os algoritmos APriori e FPGrowth / A software architecture for discovering multidimensional, multilevel, and outlier association rules in OLAP cubes: a case study with the APriori and FPGrowth algorithms Moreira Tanuro, Carla 31 January 2010 Made available in DSpace on 2014-06-12. / Conselho Nacional de Desenvolvimento Científico e Tecnológico / The traditional process of knowledge discovery in databases (KDD, Knowledge Discovery in Databases) does not include multidimensional and multilevel processing steps (i.e., OLAP, OnLine Analytical Processing) for mining data cubes. Consequently, most OLAM (OLAP Mining) approaches propose adaptations to the mining algorithm. Because this yields a solution tightly coupled to the mining algorithm, the adaptations for multidimensional and multilevel mining cannot be reused with other algorithms. In addition, most OLAM proposals for association rules do not consider using an OLAP server and do not exploit the full multidimensional and multilevel potential of OLAP cubes. As a result, some rework is done (e.g., re-implementing OLAP operations), and possibly strong patterns arising from generalizations go unidentified. Given this scenario, this work proposes the DOLAM (Decoupled OLAM) architecture for decoupled mining of multidimensional, multilevel, and outlier association rules in OLAP cubes. The DOLAM architecture is to be inserted into the KDD process as a processing step between the Pre-processing and Data Transformation steps. 
The DOLAM architecture defines and implements three components: 1) Outlier Detector, 2) Subcube Explorer, and 3) Ancestor Expander. Starting from a user query, these components can, respectively: 1) identify significant noise in the cells of the result; 2) recursively explore all cells of the result so as to cover every possible multidimensional and multilevel combination; and 3) retrieve all ancestors (generalizations) of the cells of the result. The central component of the architecture is the Ancestor Expander, the only one whose use is mandatory. With these components, OLAM processing is decoupled from the mining algorithm and allows more comprehensive discoveries, which in turn can return potentially stronger patterns. As a proof of concept, a case study was carried out with real data from a microcredit company. The case study was implemented in Java, used the Mondrian OLAP server, and used the implementations of the APriori and FP-Growth association rule mining algorithms from the Weka software package. OLAP Data mining KDD OLAM Association rules APriori FP-growth Multidimensional mining Multilevel mining Outlier
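The third component in entry 71, the Ancestor Expander (Expansor de Ancestrais), retrieves every generalization of a result cell. The following is only a minimal sketch of that idea, written in Python for brevity even though the case study itself was implemented in Java. The dimension names, members, and the `HIERARCHIES` mapping are hypothetical, and the abstract does not say whether DOLAM also returns the original cell; this sketch does.

```python
from itertools import product

# Hypothetical child -> parent maps, one per dimension; None marks the top level.
HIERARCHIES = {
    "location": {"Recife": "Pernambuco", "Pernambuco": "Brazil", "Brazil": None},
    "time": {"2009-Q1": "2009", "2009": None},
}


def ancestor_chain(dimension, member):
    """Return the member followed by each of its ancestors, bottom-up."""
    chain = []
    while member is not None:
        chain.append(member)
        member = HIERARCHIES[dimension][member]
    return chain


def expand_ancestors(cell):
    """Combine the ancestor chains of all dimensions into generalized cells.

    The first cell returned is the original one; the rest are its
    multilevel generalizations.
    """
    dims = sorted(cell)
    chains = [ancestor_chain(d, cell[d]) for d in dims]
    return [dict(zip(dims, combo)) for combo in product(*chains)]


cell = {"location": "Recife", "time": "2009-Q1"}
expanded = expand_ancestors(cell)
for generalized in expanded:
    print(generalized)
```

Because the expansion happens before mining, the generalized cells could be handed to any association rule miner unchanged, which is the decoupling the abstract argues for.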
72	Técnica de aprendizado semissupervisionado para detecção de outliers / A semi-supervised technique for outlier detection Fabio Willian Zamoner 23 January 2014 Outlier detection plays an important role in discovering knowledge in large data sets. The study is motivated by a plethora of real applications such as credit card fraud, fault detection in industrial components, network intrusion detection, loan application processing, and medical condition monitoring. An outlier is defined as an observation that deviates from other observations with respect to a measure and exerts a substantial influence on data analysis. Although numerous machine learning techniques have been developed to address this problem, most of them make no use of prior knowledge about the data. Semi-supervised outlier detection techniques are relatively new and use only a small number of labels of the normal class to build a classifier. Recently, a network-based semi-supervised model was proposed for data classification, employing a mechanism of particle competition and cooperation. The particles are responsible for propagating labels throughout the network. In this work, we adapt this model by defining a new outlier score based on visit frequency counting. The number of visits received by an outlier is significantly different from that received by the remaining objects of the same class. This approach leads to an unorthodox way of dealing with outliers. Our empirical evaluations on both real and simulated data sets demonstrate that the proposed technique works well with unbalanced data sets and achieves precision comparable to that of traditional outlier detection techniques. Moreover, the technique might provide new insights into how to differentiate objects because it considers not only the physical distance but also the pattern formation of the data. Aprendizado semisupervisionado Detecção de outliers Outlier detection Particle competition and cooperation Semi-supervised learning
73	Analyzing automatic cow recordings to detect the presence of outliers in feed intake data recorded from dairy cows in Lovsta farm Kogo, Gloria January 2016 Outliers are a major concern in data quality because they limit the reliability of any data. The objective of our investigation was to examine the presence and cause of outliers in the system for controlling and recording the feed intake of dairy cows at Lovsta farm, Uppsala, Sweden. The analyses were made on data recorded as a timestamp of each visit of the cows to the feeding troughs from August 2015 to January 2016. A three-step methodology was applied to these data. The first step was fitting a mixed model to the data. In the second step, the resulting residuals were used to fit model-based clustering with a Gaussian mixture distribution, which placed 2.5% of the observations in the outlier cluster. Finally, as the third step, a logistic regression was fit, modelling membership of the outlier cluster versus the non-outlier clusters. It appeared that in the morning hours, between 6 am and 11:59 am, recorded values are more likely to be outliers, with an odds ratio of 1.1227; this is also the time frame noted to have the least activity in feed consumption, with a decrease of 0.027 kilograms compared to the other time frames. These findings provide a basis for further investigation to narrow down the causes of the outliers more specifically. Outlier detection Anomaly Feed Forage Silage Trough Other Computer and Information Science Annan data- och informationsvetenskap
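To make the odds ratio reported in entry 73 concrete, the short calculation below converts it to a logistic-regression coefficient and to a probability. This is only an illustration: the baseline probability of 2.5% is an assumption borrowed from the overall share of observations in the outlier cluster, not a figure the abstract gives for the non-morning hours.

```python
import math

odds_ratio = 1.1227          # reported for the 6 am to 11:59 am window
beta = math.log(odds_ratio)  # the matching logistic-regression coefficient

# ASSUMPTION: outlier probability outside the morning window.
baseline_p = 0.025
baseline_odds = baseline_p / (1 - baseline_p)

# An odds ratio multiplies the odds, not the probability.
morning_odds = baseline_odds * odds_ratio
morning_p = morning_odds / (1 + morning_odds)

print(f"beta = {beta:.4f}")
print(f"morning outlier probability = {morning_p:.4f}")
```

Under that assumed baseline, the odds ratio corresponds to a modest absolute increase in the chance of an outlier, which is consistent with the abstract treating the result as a starting point for further investigation.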
74	Identificação de outliers em redes complexas baseado em caminhada aleatória / Outlier detection in complex networks based on random walk Bilzã Marques de Araújo 20 September 2010 In nature and science, information and data that deviate significantly from the average often have great relevance. Such data are usually called outliers in the literature. Outlier identification is important in many real applications, such as fraud detection, fault diagnosis, and monitoring of medical conditions. Recent years have witnessed great interest in the area of complex networks. Complex networks are large-scale graphs with non-trivial connection patterns, which makes them a powerful means of data representation and abstraction. Although a large number of results have been reported in this research area, little has been explored about outlier detection in complex networks. Considering the dynamics of a random walk, we propose in this work a distance measure and an outlier ranking method. Using this technique, we can detect not only peripheral nodes but also central nodes (hubs) as outliers, depending on the network structure. We also found that outlier nodes have well-defined characteristics related to their function in the network. Furthermore, we found that outlier nodes play an important role in a priori labeling for the task of semi-supervised community detection. This is because hubs are good information disseminators and peripheral nodes are usually located in community border regions. Based on this observation, we propose a semi-supervised community detection method. The simulation results show that this approach is promising. Caminhada aleatória Identificação de outliers Redes complexas Complex networks Outlier detection Random walk
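The abstract of entry 74 does not give the proposed distance measure, so the sketch below substitutes a simpler stand-in to show how random-walk dynamics can flag both hubs and peripheral nodes. It estimates how often a random walker visits each node and ranks nodes by how far that visit frequency lies from the median. The graph, the node names, and the median-deviation score are all assumptions made for illustration.

```python
from statistics import median


def stationary_visits(adj, iters=500):
    """Approximate long-run visit probabilities of a lazy random walk."""
    nodes = list(adj)
    p = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iters):
        # Lazy step: stay put with probability 1/2. This leaves the
        # stationary distribution unchanged and avoids oscillation.
        nxt = {v: 0.5 * p[v] for v in nodes}
        for v in nodes:
            share = 0.5 * p[v] / len(adj[v])
            for u in adj[v]:
                nxt[u] += share
        p = nxt
    return p


def rank_outliers(adj):
    """Rank nodes by deviation of visit probability from the median."""
    p = stationary_visits(adj)
    mid = median(p.values())
    return sorted(adj, key=lambda v: abs(p[v] - mid), reverse=True)


# Hypothetical network: a ring of six nodes, a hub linked to all of them,
# and one pendant node hanging off r0.
ring = [f"r{i}" for i in range(6)]
adj = {v: set() for v in ring + ["hub", "pendant"]}


def link(a, b):
    adj[a].add(b)
    adj[b].add(a)


for i in range(6):
    link(ring[i], ring[(i + 1) % 6])
    link("hub", ring[i])
link("pendant", "r0")

visits = stationary_visits(adj)
ranking = rank_outliers(adj)
print(ranking[:2])
```

In this toy network the most visited node (the hub) and the least visited node (the pendant) both sit far from the median, which mirrors the abstract's point that outliers may be central or peripheral depending on the structure.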
75	Caracterização de classes e detecção de outliers em redes complexa / Characterization of classes and outliers detection in complex networks Lilian Berton 25 April 2011 Complex networks have emerged as a new and important way of representing and abstracting data, capable of capturing spatial, topological, and functional relationships, among other features present in many databases. Among the various approaches to data analysis, we highlight classification and outlier detection. Data classification assigns a class to the data based on the characteristics of their attributes, and outlier detection searches for data whose characteristics differ from the others. Methods of data classification and outlier detection based on complex networks are still little studied. Given the benefits of using complex networks for data representation, this study developed a complex-network-based method for outlier detection that uses a random walk and a dissimilarity index. The method allows different types of outliers to be identified using the same measure. Depending on the structure of the network, the outlier vertices can be either those far from the center or the central ones, and can be hubs or vertices with few connections. In general, the proposed measure is a good estimator of outlier vertices in a network, properly identifying vertices with a distinctive structure or a special function in the network. We also propose a technique for building networks capable of representing similarity relationships between classes of data, based on an energy function that considers measures of purity and extension of the network. The resulting network was used to characterize mixing among data classes. Characterization of classes is an important issue in data classification, but it is still little explored. We consider this work to be one of the first attempts in this direction. Classificação de dados Detecção de outliers Redes complexas Complex network Data classification Outlier detection
76	Avaliação e seleção de modelos em detecção não supervisionada de outliers / On the internal evaluation of unsupervised outlier detection Henrique Oliveira Marques 23 March 2015 Outlier detection (or anomaly detection) plays an important role in discovering patterns in data that can be considered exceptional in some sense. An important distinction is that between supervised and unsupervised techniques. In this work we focus on unsupervised outlier detection techniques. There are dozens of algorithms in this category in the literature; however, each of them uses its own intuition to judge what should be considered an outlier, which is naturally a subjective concept. This substantially complicates the selection of a particular algorithm and also the choice of an appropriate parameter configuration for a given algorithm in a practical application. It also makes it highly complex to evaluate the quality of the solution obtained by an algorithm or configuration adopted by the analyst, especially given the problem of defining a quality measure that is not tied to the criterion used by the algorithm itself. These issues are interrelated and refer, respectively, to the problems of model selection and evaluation (or validation) of results in unsupervised learning. Here we developed a pioneering index for unsupervised evaluation of outlier detection results. The index, called IREOS (Internal, Relative Evaluation of Outlier Solutions), can evaluate and compare different candidate solutions (top-n, i.e., binary labelings) based only on the data and the solutions to be evaluated. The index is also statistically adjusted for chance and extensively evaluated in several experiments involving different collections of synthetic and real data sets. Avaliação não supervisionada Detecção de outliers Seleção de modelos Validação Internal evaluation Models selection Outlier detection Validation
77	Interactive Anomaly Detection With Reduced Expert Effort Cheng, Lingyun, Sundaresh, Sadhana January 2020 In several applications, when anomalies are detected, human experts have to investigate or verify them one by one. As they investigate, they unwittingly produce a label: true positive (TP) or false positive (FP). In this thesis, we propose two methods (PAD and Clustering-based OMD/OJRank) that exploit this label feedback to minimize the FP rate and detect more relevant anomalies, while minimizing the expert effort required to investigate them. These two methods iteratively suggest the top-1 anomalous instance to a human expert and receive feedback. Before suggesting the next anomaly, the methods re-rank instances so that the top anomalous instances are similar to the TP instances and dissimilar to the FP instances. This is achieved by learning to score anomalies differently in various regions of the feature space (OMD-Clustering) and by learning to score anomalies based on the distance to the real anomalies (PAD). An experimental evaluation on several real-world datasets is conducted. The results show that OMD-Clustering achieves statistically significant improvement in both detection precision and expert effort compared to state-of-the-art interactive anomaly detection methods. PAD reduces expert effort, but it showed no improvement in detection precision compared to state-of-the-art methods. We submitted a paper based on the work presented in this thesis to the ECML/PKDD Workshop on "IoT Stream for Data Driven Predictive Maintenance". Interactive Anomaly Detection Outlier Detection User Feedback Expert Effort Engineering and Technology Teknik och teknologier
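The feedback loop described in entry 77 (suggest the top-1 instance, receive a TP/FP label, re-rank) can be sketched as follows. This is not PAD or OMD-Clustering; it is a simplified stand-in in which each labeled instance raises or lowers the scores of nearby unlabeled instances. The function name, the exponential closeness weight, and the toy data are assumptions.

```python
import math


def interactive_loop(points, base_score, is_anomaly, budget, weight=1.0):
    """Query the expert `budget` times, re-ranking after every answer."""
    labeled = {}  # index -> True for a true positive, False for a false positive
    order = []    # indices in the order they were shown to the expert

    def adjusted(i):
        # Start from the detector's score, then pull towards confirmed
        # anomalies and push away from confirmed false alarms.
        score = base_score[i]
        for j, was_tp in labeled.items():
            closeness = math.exp(-math.dist(points[i], points[j]))
            score += weight * closeness if was_tp else -weight * closeness
        return score

    for _ in range(budget):
        candidates = [i for i in range(len(points)) if i not in labeled]
        if not candidates:
            break
        top = max(candidates, key=adjusted)
        labeled[top] = is_anomaly(top)  # the expert's verdict
        order.append(top)
    return order, labeled


# Toy data: three false alarms near the origin that the detector scores
# highest, and three real anomalies near (5, 5) with lower scores.
points = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
base_score = [0.9, 0.85, 0.8, 0.7, 0.6, 0.5]
true_anomalies = {3, 4, 5}

order, labels = interactive_loop(
    points, base_score, lambda i: i in true_anomalies, budget=4
)
print(order, sum(labels.values()))
```

With the same budget of four queries, ranking by the base score alone would show the expert the three false alarms first; the feedback-adjusted ranking moves to the other region after a single false positive.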
78	Machine Learning for Stellar Spectra : Anomaly Detection in stellar spectra using Unsupervised Random Forest; Spectral Analysis using Variational Autoencoders Paranjape, Mihir January 2021 This thesis was carried out in two parts. The stellar spectral data came from the Gaia-ESO survey, both from the public archive and from data received from Dr. Recio-Blanco at the Observatoire de la Côte d'Azur. 1) I performed anomaly detection using unsupervised random forests, applying the concept of weirdness scores to identify outliers. 2) Using spectral data along with physical parameters of objects in the galactic bulge from the Gaia-ESO survey, I built a variational autoencoder neural network to reconstruct stellar spectra and to explore whether the latent features learn physical parameters by themselves. Stellar spectra outlier VAE neural network machine learning Gaia-ESO Astronomy, Astrophysics and Cosmology Astronomi, astrofysik och kosmologi
79	Data Modeling for Outlier Detection Abghari, Shahrooz January 2018 This thesis explores data modeling for outlier detection techniques in three different application domains: maritime surveillance, district heating, and online media and sequence datasets. The proposed models are evaluated and validated under different experimental scenarios, taking into account specific characteristics and setups of the different domains. Outlier detection has been studied and applied in many domains. Outliers arise for different reasons, such as fraudulent activities, structural defects, health problems, and mechanical issues. The detection of outliers is a challenging task that can reveal system faults and fraud and can save people's lives. Outlier detection techniques are often domain-specific. The main challenge in outlier detection relates to modeling the normal behavior in order to identify abnormalities. The choice of model is important, i.e., an incorrect choice of data model can lead to poor results. This requires a good understanding and interpretation of the data, the constraints, and the requirements of the problem domain. Outlier detection is largely an unsupervised problem because labeled data are often unavailable and expensive to obtain. We have studied and applied a combination of both machine learning and data mining techniques to build data-driven and domain-oriented outlier detection models. We have shown the importance of data preprocessing as well as feature selection in building suitable methods for data modeling. We have taken advantage of both supervised and unsupervised techniques to create hybrid methods. For example, we have proposed a rule-based outlier detection system based on open data for the maritime surveillance domain. Furthermore, we have combined cluster analysis and regression to identify manual changes in the heating systems at the building level. 
Sequential pattern mining has also been exploited for identifying contextual and collective outliers in online media data. In addition, we have proposed a minimum spanning tree clustering technique for detection of groups of outliers in online media and sequence data. The proposed models have been shown to be capable of explaining the underlying properties of the detected outliers. This can help domain experts narrow down the scope of analysis and understand the reasons for such anomalous behaviors. We have also investigated the reproducibility of the proposed models in similar application domains. / Scalable resource-efficient systems for big data analytics data modeling cluster analysis stream data outlier detection Computer Sciences Datavetenskap (datalogi)
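Entry 79 mentions minimum spanning tree clustering for detecting groups of outliers. The abstract gives no details of the thesis's technique, so the following is a generic sketch of the idea: build the minimum spanning tree, drop edges longer than a cut length, and report the small connected components that remain. The function names, the cut length, the size limit, and the sample points are all assumptions.

```python
import math
from itertools import combinations


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def minimum_spanning_tree(points):
    """Kruskal's algorithm on the complete Euclidean graph."""
    parent = list(range(len(points)))
    edges = sorted(
        (math.dist(points[a], points[b]), a, b)
        for a, b in combinations(range(len(points)), 2)
    )
    tree = []
    for weight, a, b in edges:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b
            tree.append((weight, a, b))
    return tree


def outlier_groups(points, cut_length, max_group_size):
    """Keep short MST edges, then return components of at most max_group_size."""
    parent = list(range(len(points)))
    for weight, a, b in minimum_spanning_tree(points):
        if weight <= cut_length:
            parent[find(parent, a)] = find(parent, b)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(g for g in groups.values() if len(g) <= max_group_size)


# Five points in a dense cluster and a pair far away from it.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (10, 10), (10.5, 10)]
groups = outlier_groups(points, cut_length=3.0, max_group_size=2)
print(groups)
```

Unlike a per-point score, this returns whole groups, so two nearby anomalous records are reported together rather than masking each other.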
80	Data mining techniques for modeling the operating behaviors of smart building control valve systems Eghbalian, Amirmohammad January 2020 Background. One of the challenges with smart control valve systems is processing and analyzing sensor data to extract useful information. This information can be used to detect deviating behaviors, which can indicate faults and issues in the system. Outlier detection is the process of finding these deviating behaviors. Objectives. First, perform a literature review to gain insight into the machine learning (ML) and data mining (DM) techniques that can be applied to extract patterns from time-series data. Next, model the operating behaviors of the control valve system using appropriate ML and DM techniques. Finally, evaluate the proposed behavioral models on real-world data. Methods. To better understand the different ML and DM techniques for extracting patterns from time-series data and for fault detection and diagnosis of building systems, a literature review is conducted. Then, an unsupervised learning approach is proposed for modeling the typical operating behaviors and detecting the deviating operating behaviors of the control valve system. Additionally, the proposed method provides supplementary information to help domain experts in their analysis. Results. The outcomes of modeling and monitoring the operating behaviors of the control valve system are analyzed. The evaluation of the results by the domain experts indicates that the method is capable of detecting deviating or unseen operating behaviors of the system. Moreover, the proposed method provides additional useful information for better understanding the obtained results. Conclusions. The main goal of this study was achieved by proposing a method that can model the typical operating behaviors of the control valve system. 
The generated model can be used to monitor the newly arrived daily measurements and detect the deviating or unseen operating behaviors of the control valve system. Also, it provides supplementary information that can help domain experts to facilitate and reduce the time of analysis. Machine learning Data mining Outlier detection Time-series HVAC&R Computer Sciences Datavetenskap (datalogi)
