Global ETD Search

41	CLASSIFICATION OF ONE-DIMENSIONAL AND TWO-DIMENSIONAL SIGNALS Kanneganti, Raghuveer 01 August 2014 (has links) This dissertation focuses on the classification of one-dimensional and two-dimensional signals. The one-dimensional signal classification problem involves the classification of brain signals for identifying the emotional responses of human subjects under given drug conditions. A strategy is developed to accurately classify ERPs in order to identify human emotions based on brain reactivity to emotional, neutral, and cigarette-related stimuli in smokers. A multichannel spatio-temporal model is employed to overcome the curse of dimensionality that plagues the design of parametric multivariate classifiers for multi-channel ERPs. The strategy is tested on the ERPs of 156 smokers who participated in a smoking cessation program. One half of the subjects were given nicotine patches and the other half were given placebo patches. ERPs were collected from 29 channels in response to the presentation of pictures with emotional (pleasant and unpleasant), neutral/boring, and cigarette-related content. It is shown that human emotions can be classified accurately, and the results also show that smoking cessation causes a drop in the classification accuracies of emotions in the placebo group, but not in the nicotine patch group. Given that individual brain patterns were compared with group average brain patterns, the findings support the view that individuals tend to have similar brain reactions to different types of emotional stimuli. Overall, this new classification approach to identify differential brain responses to different emotional types could lead to new knowledge concerning brain mechanisms associated with emotions common to most or all people. 
This novel classification technique for identifying emotions in the present study suggests that smoking cessation without nicotine replacement results in poorer differentiation of brain responses to different emotional stimuli. Future directions in this area would be to use these methods to assess individual differences in responses to emotional stimuli and to different drug treatments. Advantages of this and other brain-based assessments include temporal precision (e.g., 400-800 ms post-stimulus) and the elimination of biases related to self-report measures. The two-dimensional signal classification problems include the detection of graphite in testing documents and the detection of fraudulent bubbles in test sheets. A strategy is developed to detect graphite responses in optical mark recognition (OMR) documents using inexpensive visible light scanners. The main challenge in the formulation of the strategy is that the detection should be invariant to the numerous background colors and artwork in typical optical mark recognition documents. A test document is modeled as a superposition of a graphite response image and a background image. The background image in turn is modeled as a superposition of screening artwork, lines, and machine text components. A sequence of image processing operations and a pattern recognition algorithm are developed to estimate the graphite response image from a test document by systematically removing the components of the background image. The proposed strategy is tested on a wide range of scanned documents and it is shown that the estimated graphite response images are visually similar to those scanned by very expensive infra-red scanners currently employed for optical mark recognition. The robustness of the detection strategy is also demonstrated by testing a large number of simulated test documents. 
A procedure is also developed to autonomously determine if cheating has occurred by detecting the presence of aberrant responses in scanned OMR test books. The challenges introduced by the significant imbalance in the numbers of typical and aberrant bubbles were identified. The aberrant bubble detection problem is formulated as an outlier detection problem. A feature based outlier detection procedure in conjunction with a one-class SVM classifier is developed. A multi-criteria rank-of-rank-sum technique is introduced to rank and select a subset of features from a pool of candidate features. Using the data set of 11 individuals, it is shown that a detection accuracy of over 90% is possible. Experiments conducted on three real test books flagged for suspected cheating showed that the proposed strategy has the potential to be deployed in practice. dimensionality reduction electroencephalogram forensic analysis optical mark recognition outlier detection pattern recognition
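The abstract above names a multi-criteria rank-of-rank-sum technique for feature selection but does not spell it out. One plausible reading, offered only as a sketch under that assumption, is: rank every feature under each criterion, sum the ranks per feature, then rank the sums. The function names and the example scores below are hypothetical and are not taken from the dissertation.

```python
def ranks(scores, higher_is_better=True):
    """Return 1-based ranks (1 = best); ties keep their original order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=higher_is_better)
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def rank_of_rank_sum(criteria_scores):
    """Assumed reading of 'rank-of-rank-sum'.

    criteria_scores holds one list per criterion, each with one score
    per candidate feature (higher score = better feature).
    """
    per_criterion = [ranks(scores) for scores in criteria_scores]
    # Sum the ranks each feature received across all criteria.
    rank_sums = [sum(column) for column in zip(*per_criterion)]
    # A smaller rank sum is better, so rank the sums in ascending order.
    return ranks(rank_sums, higher_is_better=False)


# Hypothetical scores: 3 criteria, 4 candidate features.
example_scores = [
    [0.9, 0.2, 0.5, 0.7],
    [0.8, 0.1, 0.9, 0.3],
    [0.7, 0.4, 0.6, 0.5],
]
final_ranks = rank_of_rank_sum(example_scores)
# Keep the two best-ranked features.
selected = [i for i, r in enumerate(final_ranks) if r <= 2]
```

With these made-up scores, features 0 and 2 end up with the smallest rank sums and would be the ones kept.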
42	Detecting Organizational Accounts from Twitter Based on Network and Behavioral Factors January 2017 (has links) abstract: With the rise of Online Social Networks (OSN) in the last decade, social network analysis has become a crucial research topic. The OSN graphs have unique properties that distinguish them from other types of graphs. In this thesis, a five-month Tweet corpus collected from Bangladesh between June 2016 and October 2016 is analyzed in order to detect accounts that belong to groups. These groups consist of official and non-official Twitter handles of political organizations and NGOs in Bangladesh. A set of network, temporal, spatial and behavioral features is proposed to discriminate between accounts belonging to individual Twitter users, news, groups and organization leaders. Finally, the experimental results are presented and a subset of relevant features is identified that leads to a generalizable model. Detection of a tiny number of groups from a large network is achieved with 0.8 precision, 0.75 recall and 0.77 F1 score. The domain-independent network and behavioral features and models developed here are suitable for solving the Twitter account classification problem in any context. / Dissertation/Thesis / Masters Thesis Computer Science 2017 Artificial intelligence Computer science Classification Feature mining Outlier detection Social Network Analysis Twitter
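The three figures reported in the abstract above are mutually consistent, since F1 is the harmonic mean of precision and recall. A minimal check, where the function name is only illustrative:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract: precision 0.8, recall 0.75.
print(round(f1_score(0.8, 0.75), 2))  # prints 0.77
```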
43	Spatio-Temporal Data Mining to Detect Changes and Clusters in Trajectories January 2012 (has links) abstract: With the rapid development of mobile sensing technologies like GPS, RFID, sensors in smartphones, etc., capturing position data in the form of trajectories has become easy. Moving object trajectory analysis is a growing area of interest these days owing to its applications in various domains such as marketing, security, traffic monitoring and management, etc. To better understand movement behaviors from the raw mobility data, this doctoral work provides analytic models for analyzing trajectory data. As a first contribution, a model is developed to detect changes in trajectories with time. If the taxis moving in a city are viewed as sensors that provide real time information of the traffic in the city, a change in these trajectories with time can reveal that the road network has changed. To detect changes, trajectories are modeled with a Hidden Markov Model (HMM). A modified training algorithm, for parameter estimation in HMM, called m-BaumWelch, is used to develop likelihood estimates under assumed changes and used to detect changes in trajectory data with time. Data from vehicles are used to test the method for change detection. Secondly, sequential pattern mining is used to develop a model to detect changes in frequent patterns occurring in trajectory data. The aim is to answer two questions: Are the frequent patterns still frequent in the new data? If they are frequent, has the time interval distribution in the pattern changed? Two different approaches are considered for change detection, frequency-based approach and distribution-based approach. The methods are illustrated with vehicle trajectory data. Finally, a model is developed for clustering and outlier detection in semantic trajectories. A challenge with clustering semantic trajectories is that both numeric and categorical attributes are present. 
Another problem to be addressed while clustering is that trajectories can be of different lengths and also have missing values. A tree-based ensemble is used to address these problems. The approach is extended to outlier detection in semantic trajectories. / Dissertation/Thesis / Ph.D. Industrial Engineering 2012 Industrial engineering Computer science Change Detection Clustering Hidden Markov Models Outlier Detection Random Forests Trajectories
44	Técnica de aprendizado semissupervisionado para detecção de outliers / A semi-supervised technique for outlier detection Fabio Willian Zamoner 23 January 2014 (has links) Detecção de outliers desempenha um importante papel para descoberta de conhecimento em grandes bases de dados. O estudo é motivado por inúmeras aplicações reais como fraudes de cartões de crédito, detecção de falhas em componentes industriais, intrusão em redes de computadores, aprovação de empréstimos e monitoramento de condições médicas. Um outlier é definido como uma observação que desvia das outras observações em relação a uma medida e exerce considerável influência na análise de dados. Embora existam inúmeras técnicas de aprendizado de máquina para tratar desse problemas, a maioria delas não faz uso de conhecimento prévio sobre os dados. Técnicas de aprendizado semissupervisionado para detecção de outliers são relativamente novas e incluem apenas um pequeno número de rótulos da classe normal para construir um classificador. Recentemente um modelo semissupervisionado baseado em rede foi proposto para classificação de dados empregando um mecanismo de competição e cooperação de partículas. As partículas são responsáveis pela propagação dos rótulos para toda a rede. Neste trabalho, o modelo foi adaptado a fim de detectar outliers através da definição de um escore de outlier baseado na frequência de visitas. O número de visitas recebido por um outlier é significativamente diferente dos demais objetos de mesma classe. Essa abordagem leva a uma maneira não tradicional de tratar os outliers. Avaliações empíricas sobre bases artificiais e reais demonstram que a técnica proposta funciona bem para bases desbalanceadas e atinge precisão comparável às obtidas pelas técnicas tradicionais de detecção de outliers. 
Além disso, a técnica pode fornecer novas perspectivas sobre como diferenciar objetos, pois considera não somente a distância física, mas também a formação de padrão dos dados / Outlier detection plays an important role in discovering knowledge in large data sets. The study is motivated by a plethora of real applications such as credit card fraud, fault detection in industrial components, network intrusion detection, loan application processing and medical condition monitoring. An outlier is defined as an observation that deviates from other observations with respect to a measure and exerts a substantial influence on data analysis. Although numerous machine learning techniques have been developed for attacking this problem, most of them work with no prior knowledge of the data. Semi-supervised outlier detection techniques are relatively new and include only a few labels of the normal class for building a classifier. Recently, a network-based semi-supervised model was proposed for data classification by employing a mechanism based on particle competition and cooperation. Such particles are responsible for label propagation throughout the network. In this work, we adapt this model by defining a new outlier score based on visit frequency counting. The number of visits received by an outlier is significantly different from that of the remaining objects of the same class. This approach leads to an unorthodox way to deal with outliers. Our empirical evaluations on both real and simulated data sets demonstrate that the proposed technique works well with unbalanced data sets and achieves a precision comparable to that of traditional outlier detection techniques. Moreover, the technique might provide new insights into how to differentiate objects because it considers not only the physical distance but also the pattern formation of the data. Aprendizado semisupervisionado Detecção de outliers Outlier detection Particle competition and cooperation Semi-supervised learning
45	Analyzing automatic cow recordings to detect the presence of outliers in feed intake data recorded from dairy cows in Lovsta farm Kogo, Gloria January 2016 (has links) Outliers are a major concern in data quality as they limit the reliability of any data. The objective of our investigation was to examine the presence and cause of outliers in the system for controlling and recording the feed intake of dairy cows in Lovsta farm, Uppsala, Sweden. The analyses were made on data recorded as a timestamp of each visit of the cows to the feeding troughs from August 2015 to January 2016. A three-step methodology was applied to this data. The first step was fitting a mixed model to the data; the resulting residuals were then used in the second step to fit a model-based clustering with a Gaussian mixture distribution, which placed 2.5% of the observations in the outlier cluster. Finally, as the third step, a logistic regression was fitted, modelling the presence of outliers versus the non-outlier clusters. It appeared that in the morning hours, between 6am and 11.59am, there is a higher possibility that recorded values are outliers, with an odds ratio of 1.1227; this is also the time frame noted to have the least activity in feed consumption of the cows, with a decrease of 0.027 kilograms compared to the other time frames. These findings provide a basis for further investigation to more specifically narrow down the causes of the outliers. Outlier detection Anomaly Feed Forage Silage Trough Other Computer and Information Science Annan data- och informationsvetenskap
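To read the odds ratio quoted in the abstract above: in a logistic regression, an odds ratio is the exponential of the fitted coefficient, so 1.1227 corresponds to a coefficient of about 0.116 and to odds roughly 12% higher than in the reference time frames. The small sketch below only performs that conversion; the function names are illustrative, and nothing here reproduces the thesis's actual model.

```python
import math


def odds_ratio_to_coefficient(odds_ratio):
    """Logistic-regression coefficient implied by an odds ratio."""
    return math.log(odds_ratio)


def percent_change_in_odds(odds_ratio):
    """Relative change in odds versus the reference level, in percent."""
    return (odds_ratio - 1.0) * 100.0


morning_odds_ratio = 1.1227  # value quoted in the abstract
coefficient = odds_ratio_to_coefficient(morning_odds_ratio)
change = percent_change_in_odds(morning_odds_ratio)
print(f"coefficient = {coefficient:.4f}, odds change = {change:.2f}%")
```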
46	Identificação de outliers em redes complexas baseado em caminhada aleatória / Outlier detection in complex networks based on random walk Bilzã Marques de Araújo 20 September 2010 (has links) Na natureza e na ciência, dados e informações que desviam significativamente da média frequentemente possuem grande relevância. Esses dados são usualmente denominados na literatura como outliers. A identificação de outliers é importante em muitas aplicações reais, tais como detecção de fraudes, diagnóstico de falhas, e monitoramento de condições médicas. Nos últimos anos tem-se testemunhado um grande interesse na área de Redes Complexas. Redes complexas são grafos de grande escala que possuem padrões de conexão não trivial, mostrando-se uma poderosa maneira de representação e abstração de dados. Embora um grande montante de resultados tenham sido reportados nesta área de pesquisa, pouco tem sido explorado acerca de detecção de outliers em redes complexas. Considerando-se a dinâmica de uma caminhada aleatória, foram propostos neste trabalho uma medida de distância e um método de ranqueamento de outliers. Através desta técnica, é possível detectar como outlier não somente nós periféricos, mas também nós centrais (hubs), depedendo da estrutura da rede. Também foi identificado que existem características bem definidas entre os nós outliers, relacionadas a funcionalidade dos mesmos para a rede. Além disso, foi descoberto que nós outliers têm papel importante para a rotulação a priori na tarefa de detecção de comunidades semi-supervisionada. Isto porque os nós centrais são bons difusores de informação e os nós periféricos encontram-se em regiões de borda de comunidade. Baseado nessa observação, foi proposto um método de detecção de comunidades semi-supervisionado. Os resultados de simulações mostram que essa abordagem é promissora / In nature and science, information and data that deviate significantly from the average value often have great relevance. 
These data are often referred to in the literature as outliers. Outlier identification is important in many real applications, such as fraud detection, fault diagnosis, and monitoring of medical conditions. In recent years, there has been great interest in the area of complex networks. Complex networks are large-scale graphs with non-trivial connection patterns, proving to be a powerful way of data representation and abstraction. Although a large number of results have been reported in this research area, little has been explored about outlier detection in complex networks. Considering the dynamics of a random walk, we propose in this work a distance measure and an outlier ranking method. By using this technique, we can detect not only peripheral nodes, but also central nodes (hubs) as outliers, depending on the network structure. We also identified that there are well-defined characteristics among the outlier nodes, related to their functionality in the network. Furthermore, we found that outlier nodes play an important role in a priori labeling for the task of semi-supervised community detection. This is because the hubs are good information disseminators and peripheral nodes are usually located in the border regions of communities. Based on this observation, we proposed a method of semi-supervised community detection. The simulation results show that this approach is promising. Caminhada aleatória Identificação de outliers Redes complexas Complex networks Outlier detection Random walk
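The abstract above does not give the distance measure itself, so the sketch below is a simplified stand-in rather than the author's method. It assumes an outlier score equal to how far a node's long-run random-walk visit probability lies from the network average, which is enough to show why both hubs and peripheral nodes can surface as outliers. The graph, the lazy-walk choice, and all names are hypothetical.

```python
def stationary_distribution(adjacency, iterations=500):
    """Long-run visit probabilities of a lazy random walk.

    The walker stays put with probability 0.5 (which guarantees
    convergence) and otherwise moves to a uniformly chosen neighbour.
    """
    nodes = sorted(adjacency)
    prob = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        nxt = {n: 0.5 * prob[n] for n in nodes}
        for n in nodes:
            share = 0.5 * prob[n] / len(adjacency[n])
            for neighbour in adjacency[n]:
                nxt[neighbour] += share
        prob = nxt
    return prob


def outlier_ranking(adjacency):
    """Rank nodes by deviation of visit probability from the mean
    (an assumed, simplified score)."""
    pi = stationary_distribution(adjacency)
    mean = sum(pi.values()) / len(pi)
    return sorted(pi, key=lambda n: abs(pi[n] - mean), reverse=True)


# Hypothetical undirected graph: hub "h", pendant node "p".
graph = {
    "h": ["a", "b", "c", "d"],
    "a": ["h", "b"],
    "b": ["h", "a"],
    "c": ["h", "d"],
    "d": ["h", "c", "p"],
    "p": ["d"],
}
ranking = outlier_ranking(graph)
```

On this toy graph the hub `h` and the pendant node `p` head the ranking, one for being visited far more often than average and the other far less.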
47	Caracterização de classes e detecção de outliers em redes complexa / Characterization of classes and outliers detection in complex networks Lilian Berton 25 April 2011 (has links) As redes complexas surgiram como uma nova e importante maneira de representação e abstração de dados capaz de capturar as relações espaciais, topológicas, funcionais, entre outras características presentes em muitas bases de dados. Dentre as várias abordagens para a análise de dados, destacam-se a classificação e a detecção de outliers. A classificação de dados permite atribuir uma classe aos dados, baseada nas características de seus atributos e a detecção de outliers busca por dados cujas características se diferem dos demais. Métodos de classificação de dados e de detecção de outliers baseados em redes complexas ainda são pouco estudados. Tendo em vista os benefícios proporcionados pelo uso de redes complexas na representação de dados, o presente trabalho apresenta o desenvolvimento de um método baseado em redes complexas para detecção de outliers que utiliza a caminhada aleatória e um índice de dissimilaridade. Este método possibilita a identificação de diferentes tipos de outliers usando a mesma medida. Dependendo da estrutura da rede, os vértices outliers podem ser tanto aqueles distantes do centro como os centrais, podem ser hubs ou vértices com poucas ligações. De um modo geral, a medida proposta é uma boa estimadora de vértices outliers em uma rede, identificando, de maneira adequada, vértices com uma estrutura diferenciada ou com uma função especial na rede. Foi proposta também uma técnica de construção de redes capaz de representar relações de similaridade entre classes de dados, baseada em uma função de energia que considera medidas de pureza e extensão da rede. Esta rede construída foi utilizada para caracterizar mistura entre classes de dados. A caracterização de classes é uma questão importante na classificação de dados, porém ainda é pouco explorada. 
Considera-se que o trabalho desenvolvido é uma das primeiras tentativas nesta direção / Complex networks have emerged as a new and important way of representing and abstracting data, capable of capturing spatial, topological, and functional relationships, among other features present in many databases. Among the various approaches to data analysis, we highlight classification and outlier detection. Data classification assigns a class to the data based on the characteristics of their attributes, and outlier detection searches for data whose characteristics differ from the others. Methods of data classification and outlier detection based on complex networks are still little studied. Given the benefits provided by the use of complex networks in data representation, this study developed a method based on complex networks to detect outliers using a random walk and a dissimilarity index. The method allows the identification of different types of outliers using the same measure. Depending on the structure of the network, the outlier vertices can be either those distant from the center or the central ones, and can be hubs or vertices with few connections. In general, the proposed measure is a good estimator of outlier vertices in a network, properly identifying vertices with a different structure or a special function in the network. We also propose a technique for building networks capable of representing similarity relationships between classes of data, based on an energy function that considers measures of purity and extension of the network. This network was used to characterize mixing among data classes. Characterization of classes is an important issue in data classification, but it is still little explored. We consider that this work is one of the first attempts in this direction. Classificação de dados Detecção de outliers Redes complexas Complex network Data classification Outlier detection
48	Avaliação e seleção de modelos em detecção não supervisionada de outliers / On the internal evaluation of unsupervised outlier detection Henrique Oliveira Marques 23 March 2015 (has links) A área de detecção de outliers (ou detecção de anomalias) possui um papel fundamental na descoberta de padrões em dados que podem ser considerados excepcionais sob alguma perspectiva. Uma importante distinção se dá entre as técnicas supervisionadas e não supervisionadas. O presente trabalho enfoca as técnicas de detecção não supervisionadas. Existem dezenas de algoritmos desta categoria na literatura, porém cada um deles utiliza uma intuição própria do que deve ser considerado um outlier ou não, que é naturalmente um conceito subjetivo. Isso dificulta sensivelmente a escolha de um algoritmo em particular e também a escolha de uma configuração adequada para o algoritmo escolhido em uma dada aplicação prática. Isso também torna altamente complexo avaliar a qualidade da solução obtida por um algoritmo/configuração em particular adotados pelo analista, especialmente em função da problemática de se definir uma medida de qualidade que não seja vinculada ao próprio critério utilizado pelo algoritmo. Tais questões estão inter-relacionadas e se referem respectivamente aos problemas de seleção de modelos e avaliação (ou validação) de resultados em aprendizado de máquina não supervisionado. Neste trabalho foi desenvolvido um índice pioneiro para avaliação não supervisionada de detecção de outliers. O índice, chamado IREOS (Internal, Relative Evaluation of Outlier Solutions), avalia e compara diferentes soluções (top-n, i.e., rotulações binárias) candidatas baseando-se apenas nas informações dos dados e nas próprias soluções a serem avaliadas. O índice também é ajustado estatisticamente para aleatoriedade e extensivamente avaliado em vários experimentos envolvendo diferentes coleções de bases de dados sintéticas e reais. 
/ Outlier detection (or anomaly detection) plays an important role in discovering patterns in data that can be considered exceptional in some sense. An important distinction is that between supervised and unsupervised techniques. In this work we focus on unsupervised outlier detection techniques. There are dozens of algorithms of this category in the literature; however, each of these algorithms uses its own intuition to judge what should be considered an outlier or not, which is naturally a subjective concept. This substantially complicates the selection of a particular algorithm and also the choice of an appropriate configuration of parameters for a given algorithm in a practical application. This also makes it highly complex to evaluate the quality of the solution obtained by an algorithm or configuration adopted by the analyst, especially in light of the problem of defining a measure of quality that is not tied to the criterion used by the algorithm itself. These issues are interrelated and refer respectively to the problems of model selection and evaluation (or validation) of results in unsupervised learning. Here we developed a pioneering index for unsupervised evaluation of outlier detection results. The index, called IREOS (Internal, Relative Evaluation of Outlier Solutions), can evaluate and compare different candidate solutions (top-n, i.e., binary labelings) based only upon the data information and the solutions to be evaluated. The index is also statistically adjusted for chance and extensively evaluated in several experiments involving different collections of synthetic and real data sets. Avaliação não supervisionada Detecção de outliers Seleção de modelos Validação Internal evaluation Models selection Outlier detection Validation
49	Interactive Anomaly Detection With Reduced Expert Effort Cheng, Lingyun, Sundaresh, Sadhana January 2020 (has links) In several applications, when anomalies are detected, human experts have to investigate or verify them one by one. As they investigate, they unwittingly produce a label - true positive (TP) or false positive (FP). In this thesis, we propose two methods (PAD and Clustering-based OMD/OJRank) that exploit this label feedback to minimize the FP rate and detect more relevant anomalies, while minimizing the expert effort required to investigate them. These two methods iteratively suggest the top-1 anomalous instance to a human expert and receive feedback. Before suggesting the next anomaly, the methods re-rank instances so that the top anomalous instances are similar to the TP instances and dissimilar to the FP instances. This is achieved by learning to score anomalies differently in various regions of the feature space (OMD-Clustering) and by learning to score anomalies based on the distance to the real anomalies (PAD). An experimental evaluation on several real-world datasets is conducted. The results show that OMD-Clustering achieves statistically significant improvement in both detection precision and expert effort compared to state-of-the-art interactive anomaly detection methods. PAD reduces expert effort, but there was no improvement in detection precision compared to state-of-the-art methods. We submitted a paper based on the work presented in this thesis to the ECML/PKDD Workshop on "IoT Stream for Data Driven Predictive Maintenance". Interactive Anomaly Detection Outlier Detection User Feedback Expert Effort Engineering and Technology Teknik och teknologier
50	Data Modeling for Outlier Detection Abghari, Shahrooz January 2018 (has links) This thesis explores the data modeling for outlier detection techniques in three different application domains: maritime surveillance, district heating, and online media and sequence datasets. The proposed models are evaluated and validated under different experimental scenarios, taking into account specific characteristics and setups of the different domains. Outlier detection has been studied and applied in many domains. Outliers arise due to different reasons such as fraudulent activities, structural defects, health problems, and mechanical issues. The detection of outliers is a challenging task that can reveal system faults, fraud, and save people's lives. Outlier detection techniques are often domain-specific. The main challenge in outlier detection relates to modeling the normal behavior in order to identify abnormalities. The choice of model is important, i.e., an incorrect choice of data model can lead to poor results. This requires a good understanding and interpretation of the data, the constraints, and the requirements of the problem domain. Outlier detection is largely an unsupervised problem due to unavailability of labeled data and the fact that labeled data is expensive. We have studied and applied a combination of both machine learning and data mining techniques to build data-driven and domain-oriented outlier detection models. We have shown the importance of data preprocessing as well as feature selection in building suitable methods for data modeling. We have taken advantage of both supervised and unsupervised techniques to create hybrid methods. For example, we have proposed a rule-based outlier detection system based on open data for the maritime surveillance domain. Furthermore, we have combined cluster analysis and regression to identify manual changes in the heating systems at the building level. 
Sequential pattern mining for identifying contextual and collective outliers in online media data has also been exploited. In addition, we have proposed a minimum spanning tree clustering technique for detection of groups of outliers in online media and sequence data. The proposed models have been shown to be capable of explaining the underlying properties of the detected outliers. This can help domain experts narrow down the scope of analysis and understand the reasons for such anomalous behaviors. We have also investigated the reproducibility of the proposed models in similar application domains. / Scalable resource-efficient systems for big data analytics data modeling cluster analysis stream data outlier detection Computer Sciences Datavetenskap (datalogi)
