1 |
Featured anomaly detection methods and applications
Huang, Chengqiang. January 2018
Anomaly detection is a fundamental research topic that has been widely investigated. From critical industrial systems, e.g., network intrusion detection systems, to people’s daily activities, e.g., mobile fraud detection, anomaly detection has become the first line of defence for protecting public and personal property. Although anomaly detection methods have been under continuous development over the years, the explosive growth of data volume and the continued dramatic variation of data patterns pose great challenges to anomaly detection systems and are fuelling the demand for more intelligent anomaly detection methods with distinct characteristics to cope with various needs. To this end, this thesis starts with a thorough review of existing anomaly detection strategies and methods, elaborating their advantages and disadvantages. Afterward, four distinctive anomaly detection methods, especially for time series, are proposed, aimed at specific needs of anomaly detection under different scenarios, e.g., enhanced accuracy, interpretable results, and self-evolving models. Experiments are presented and analysed to offer a better understanding of the performance of the methods and their distinct features. More specifically, the key contents of this thesis are as follows:
1) Support Vector Data Description (SVDD) is investigated as a primary method for accurate anomaly detection. The applicability of SVDD to noisy time series datasets is carefully examined, and it is demonstrated that relaxing the decision boundary of SVDD always results in better accuracy in network time series anomaly detection. A theoretical analysis of the parameter used in the model is also presented to ensure the validity of the relaxation of the decision boundary.
2) To support a clear explanation of the detected time series anomalies, i.e., anomaly interpretation, the periodic pattern of time series data is treated as contextual information and integrated into SVDD for anomaly detection. The formulation of SVDD with contextual information maintains multiple discriminants, which help in distinguishing the root causes of the anomalies.
3) To further analyse a dataset for anomaly detection and interpretation, Convex Hull Data Description (CHDD) is developed to realise one-class classification together with data clustering. CHDD approximates the convex hull of a given dataset with its extreme points, which constitute a dictionary of data representatives. Using the dictionary, CHDD is capable of representing and clustering all the normal data instances, so that anomaly detection is realised with a degree of interpretation.
4) Besides better anomaly detection accuracy and interpretability, better solutions for anomaly detection over streaming data with evolving patterns are also researched. Under the framework of Reinforcement Learning (RL), a time series anomaly detector is designed that is continually trained to cope with the evolving patterns. Because the anomaly detector is trained with labeled time series, it avoids the cumbersome work of threshold setting and the uncertain definitions of anomalies in time series anomaly detection tasks.
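The idea in item 1 of enclosing normal data in a boundary and then relaxing that boundary can be illustrated with a much simpler stand-in. The sketch below is not SVDD (it solves no optimisation problem and uses no kernel): it assumes a plain centroid and a quantile-based radius, and the names `fit_sphere`, `is_anomaly`, and `relax` are hypothetical.

```python
import math

def fit_sphere(train, quantile=0.95):
    """Fit a centre and a radius that covers `quantile` of the training points."""
    dim = len(train[0])
    centre = [sum(p[i] for p in train) / len(train) for i in range(dim)]
    dists = sorted(math.dist(p, centre) for p in train)
    # Index of the smallest distance that covers the requested fraction.
    idx = min(len(dists) - 1, max(0, math.ceil(quantile * len(dists)) - 1))
    return centre, dists[idx]

def is_anomaly(x, centre, radius, relax=0.0):
    """Flag x when it lies outside the sphere enlarged by the factor (1 + relax)."""
    return math.dist(x, centre) > radius * (1.0 + relax)

# Toy data (assumed): a unit square plus its midpoint.
centre, radius = fit_sphere([(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)])
print(is_anomaly((1.3, 0.5), centre, radius))             # strict boundary
print(is_anomaly((1.3, 0.5), centre, radius, relax=0.2))  # relaxed boundary
```

With `relax` greater than zero, points just outside the fitted boundary are accepted as normal, which is the kind of tolerance to noisy training data that the abstract attributes to a relaxed decision boundary.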
|
2 |
Novelty Detection by Latent Semantic Indexing
Zhang, Xueshan. January 2013
As a new topic in text mining, novelty detection is a natural extension of information retrieval systems, or search engines. By refining raw search results, filtering out old news and keeping only the novel messages, it saves people from the nightmare of information overload. One of the difficulties in novelty detection is the inherent ambiguity of language, which is the carrier of information. Among the sources of ambiguity, synonymy proves to be a notable factor. To address this issue, previous studies mainly employed WordNet, a lexical database which can be perceived as a thesaurus. Rather than borrowing a dictionary, we proposed a statistical approach employing Latent Semantic Indexing (LSI) to learn semantic relationships automatically with the help of language resources.
To apply LSI, which involves matrix factorization, an immediate problem is that the dataset in novelty detection is dynamic and constantly changing. To imitate a real-world scenario, texts are ranked in chronological order and examined one by one. Each text is compared only with those that appeared earlier, while later ones remain unknown. As a result, the data matrix starts as a one-row vector representing the first report, and has a new row added at the bottom every time we read a new document. Such a changing dataset makes it hard to employ matrix methods directly. Although LSI has long been acknowledged as an effective text mining method when considering semantic structure, it has never been used in novelty detection, nor have other statistical treatments. We tried to change this situation by introducing an external text source to build the latent semantic space, onto which the incoming news vectors were projected.
We used the Reuters-21578 dataset and the TREC data as sources of latent semantic information. Topics were divided into years and types in order to take the differences between them into account. Results showed that LSI, though very effective in traditional information retrieval tasks, yielded only a slight improvement in performance for some data types. The extent of improvement depended on the similarity between the news data and the external information. An examination of the co-occurrence matrix attributed this limited performance to the unique features of microblogs. Their short sentence lengths and restricted dictionary made it very hard to recover and exploit latent semantic information via traditional data structures.
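The chronological scheme described in this abstract, in which each incoming document is projected onto a latent space built beforehand and compared only with earlier documents, can be sketched roughly as follows. The code assumes the latent basis is already available (in practice it might come from a truncated SVD of an external corpus). The basis here is hand-made, and the function names, the four-word vocabulary, and the 0.8 threshold are all hypothetical.

```python
import math

def project(term_counts, basis):
    """Project a term-count vector onto the rows of `basis` (one row per latent dimension)."""
    return [sum(w * c for w, c in zip(row, term_counts)) for row in basis]

def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def novelty_flags(stream, basis, threshold=0.8):
    """Read documents in order; a document is novel if no earlier one is similar to it."""
    seen, flags = [], []
    for counts in stream:
        z = project(counts, basis)
        max_sim = max((cosine(z, s) for s in seen), default=0.0)
        flags.append(max_sim < threshold)
        seen.append(z)  # later documents may be compared against this one
    return flags

# Assumed vocabulary: ["car", "automobile", "bank", "river"].
# Assumed basis: one latent dimension groups the two synonyms.
basis = [[1, 1, 0, 0], [0, 0, 1, 1]]
stream = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]
print(novelty_flags(stream, basis))
```

In this toy setup the second document shares no term with the first, yet it is not flagged as novel because both project onto the same latent dimension. This is the synonymy effect the abstract hopes LSI will capture.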
|
4 |
Data fusion models for detection of vital-sign deterioration in acutely ill patients
Khalid, Sara. January 2014
Vital signs can indicate patient deterioration prior to adverse events such as cardiac arrest, emergency admission to the intensive care unit (ICU), or death. However, many adverse events occur in wards outside the ICU where the level of care and the frequency of patient monitoring are lower than in the ICU. This thesis describes models for detection of deterioration in acutely ill patients in two environments: a step-down unit in which patients recovering from an ICU stay are continuously monitored, and a general ward where patients are intermittently monitored following upper gastrointestinal cancer surgery. Existing data fusion models for classification of vital signs depend on a threshold which defines a “region of normality”. Bradypnoea (low breathing rate) and bradycardia (low heart rate) are relatively rare, and so these two types of abnormalities tend to be misclassified by existing methods. In this thesis, techniques for selecting a threshold are described, such that the classification of vital-sign data is improved. In particular, the proposed approach reduces the misclassification of bradycardia and bradypnoea events, and indicates the type of abnormality associated with the deterioration in a patient’s vital signs. Patients recovering from upper gastrointestinal (GI) surgery have a high risk of emergency admission to the ICU. At present in the UK, most intermediate and general wards outside the ICU depend on intermittent, manual monitoring using track-and-trigger systems. Both manual and automated patient monitoring systems are reported to have high false alert rates. The models described in this thesis take into account the low monitoring frequency in the upper GI ward, such that the false alert rate is reduced. In addition to accuracy, early detection of deterioration is a highly desirable feature in patient monitoring systems. 
The models proposed in this thesis generate alerts for patients earlier than the early warning systems which are currently in use in hospitals in the UK. The improvements to existing models proposed in this thesis could be applied to continuous and intermittently acquired vital-sign data from other clinical environments.
|
5 |
Detecção de novidade em fluxos contínuos de dados multiclasse / Novelty detection in multiclass data streams
Paiva, Elaine Ribeiro de Faria. 08 May 2014
Data stream mining is an emerging research area that aims to extract knowledge from large amounts of continuously generated data. Novelty detection is a classification task that assesses whether an example or a set of examples differs significantly from previously seen examples. This is an important task for data streams, mainly because new concepts may appear, disappear or evolve over time. Most of the work in the novelty detection literature presents novelty detection as a binary classification task. A few authors treat this task as multiclass, but even they use binary evaluation measures. In several real problems, novelty detection in data streams must be treated as a multiclass task, in which the known concept of the problem is composed of one or more classes and different new classes may appear over time. This thesis proposes a new algorithm, MINAS, for novelty detection in data streams. MINAS treats novelty detection as a multiclass task. In the training phase, MINAS builds a decision model based on a labeled data set. In the application phase, new examples are classified using the current decision model, or marked as unknown. Groups of unknown examples can later be used to create valid novelty patterns, which are added to the current decision model. The decision model is updated as new data arrive in the stream in order to reflect changes in the known classes and to allow the addition of novelty patterns.
This thesis also proposes a new methodology to evaluate classifiers for novelty detection in data streams. This methodology associates the unlabeled novelty patterns with the true problem classes, allowing the evaluation of a confusion matrix that is incremental and rectangular. In addition, the proposed methodology allows unknown examples to be evaluated separately and multiclass evaluation measures to be used. The thesis also presents a set of experiments comparing the MINAS algorithm with the main novelty detection algorithms found in the literature, using artificial and real data sets. Finally, MINAS was applied to a human activity recognition problem using accelerometer data. The experimental results show the potential of the proposed algorithm and methodology.
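The training and application phases described in this abstract can be sketched, very loosely, as below. This is not the MINAS implementation: it assumes one centroid per class, a single fixed radius, and a naive cohesion check in place of the clustering and validation steps a real system would need. The class name `StreamNoveltySketch` and its parameters are hypothetical.

```python
import math

class StreamNoveltySketch:
    """Toy multiclass novelty detector: centroids plus a buffer of unknown examples."""

    def __init__(self, labelled, radius, min_group=3):
        # Training phase: build the decision model from labelled examples.
        groups = {}
        for x, label in labelled:
            groups.setdefault(label, []).append(x)
        self.model = {label: self._centroid(xs) for label, xs in groups.items()}
        self.radius = radius
        self.min_group = min_group
        self.unknown = []  # short-term memory of unexplained examples
        self.n_novel = 0

    @staticmethod
    def _centroid(points):
        return [sum(coords) / len(points) for coords in zip(*points)]

    def classify(self, x):
        # Application phase: nearest centroid, or "unknown" if none is close enough.
        label, d = min(((l, math.dist(x, c)) for l, c in self.model.items()),
                       key=lambda pair: pair[1])
        if d <= self.radius:
            return label
        self.unknown.append(x)
        self._try_new_pattern()
        return "unknown"

    def _try_new_pattern(self):
        # Promote the buffer to a novelty pattern once it is large and cohesive.
        if len(self.unknown) < self.min_group:
            return
        c = self._centroid(self.unknown)
        if all(math.dist(x, c) <= self.radius for x in self.unknown):
            self.n_novel += 1
            self.model[f"novelty-{self.n_novel}"] = c
            self.unknown.clear()

det = StreamNoveltySketch(
    [((0, 0), "a"), ((0.2, 0), "a"), ((5, 5), "b"), ((5, 5.2), "b")], radius=1.0)
for point in [(0.1, 0.1), (10, 0), (10.1, 0), (10, 0.1), (10.05, 0.05)]:
    print(point, det.classify(point))
```

A buffer that never becomes cohesive is never cleared in this sketch. Forgetting old unknown examples and updating the known classes over time are left out for brevity.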
|
6 |
Κατασκευή διαγνωστικού συστήματος με στατιστικές μεθόδους αναγνώρισης νέων γεγονότων / Construction of a diagnostic system using statistical novelty detection methods
Λαμπρόπουλος, Νίκος. 01 August 2014
This thesis presents an extensive study of techniques for novelty detection (detection of anomalies or outliers) in large data sets. The theoretical background required to understand the techniques is provided separately in order to preserve the coherence of the text.
The first chapter introduces the concept and applications of novelty detection and provides an initial categorization of these techniques. Chapter 2 analyzes the statistical approaches that have been proposed, both parametric and non-parametric. Chapters 3 and 4 introduce neural networks and SVMs in order to explain their use in novelty or anomaly detection applications (Chapter 5).
In conclusion, this thesis clearly presents statistical approaches, as well as techniques based on neural networks and SVMs, for novelty detection, and their comparative study provides a concise guide that summarizes the advantages and disadvantages of the techniques presented.
|
8 |
Data Driven Visual Recognition
Aghazadeh, Omid. January 2014
This thesis is mostly about supervised visual recognition problems. Based on a general definition of categories, the contents are divided into two parts: one which models categories and one which is not category based. We are interested in data driven solutions for both kinds of problems. In the category-free part, we study novelty detection in temporal and spatial domains as a category-free recognition problem. Using data driven models, we demonstrate that based on a few reference exemplars, our methods are able to detect novelties in ego-motions of people, and changes in the static environments surrounding them. In the category level part, we study object recognition. We consider both object category classification and localization, and propose scalable data driven approaches for both problems. A mixture of parametric classifiers, initialized with a sophisticated clustering of the training data, is demonstrated to adapt to the data better than various baselines such as the same model initialized with less subtly designed procedures. A nonparametric large margin classifier is introduced and demonstrated to have a multitude of advantages in comparison to its competitors: better training and testing time costs, the ability to make use of indefinite/invariant and deformable similarity measures, and adaptive complexity are the main features of the proposed model. We also propose a rather realistic model of recognition problems, which quantifies the interplay between representations, classifiers, and recognition performances. Based on data-describing measures which are aggregates of pairwise similarities of the training data, our model characterizes and describes the distributions of training exemplars. The measures are shown to capture many aspects of the difficulty of categorization problems and correlate significantly to the observed recognition performances. 
Utilizing these measures, the model predicts the performance of particular classifiers on distributions similar to the training data. These predictions, when compared to the test performance of the classifiers on the test sets, are reasonably accurate. We discuss various aspects of visual recognition problems: what is the interplay between representations and classification tasks, how can different models better adapt to the training data, etc. We describe and analyze the aforementioned methods that are designed to tackle different visual recognition problems, but share one common characteristic: being data driven.
|
9 |
Detecção de novidade com aplicação a fluxos contínuos de dados / Novelty detection with application to data streams
Spinosa, Eduardo Jaques. 20 February 2008
In this work novelty detection is treated as the problem of identifying emerging concepts in data that may be presented in a continuous flow. Considering the intrinsic relationship between time and novelty and the challenges imposed by data streams, a novel approach is proposed. OLINDDA, an OnLIne Novelty and Drift Detection Algorithm, goes beyond one-class classification and focuses on the unsupervised continuous learning of novel concepts.
Having learned an initial description of a normal concept, it proceeds to the analysis of new data, treating them as a continuous flow where novel concepts may appear at any time. Using clustering techniques, OLINDDA may employ several validation criteria to evaluate clusters in terms of their cohesiveness and representativeness. Clusters considered valid produce concepts that may be merged, and whose knowledge is continuously incorporated. The technique is experimentally evaluated with artificial and real data. The one-class classification module is compared to other novelty detection techniques, and the whole approach is analyzed from various aspects through the temporal evolution of several metrics. Results reinforce the importance of continuous detection of novel concepts, as well as the difficulties and challenges of the unsupervised learning of novel concepts in data streams.
|
10 |
A Novelty Detection Approach to Seizure Analysis from Intracranial EEG
Gardner, Andrew Britton. 12 April 2004
146 pages
Directed by Dr. George Vachtsevanos and Dr. Brian Litt
A framework for support vector machine classification of time series events is proposed and applied to analyze physiological signals recorded from epileptic patients. In contrast to previous works, this research formulates seizure analysis as a novelty detection problem which allows seizure detection and prediction to be treated uniformly, in a way that is capable of accommodating multichannel and/or multimodal measurements. Theoretical properties of the support vector machine algorithm employed provide a straightforward means for controlling the false alarm rate of the detector. The resulting novelty detection system was evaluated both offline and online on a corpus of 1077 hours of intracranial electroencephalogram (IEEG) recordings from 12 patients diagnosed with medically resistant temporal lobe epilepsy during evaluation for epilepsy surgery. These patients collectively had 118 seizures during the recording period. The performance of the novelty detection framework was assessed with an emphasis on four key metrics: (1) sensitivity (probability of correct detection), (2) mean detection latency, (3) early-detection fraction (prediction or detection of seizure prior to electrographic onset), and (4) false positive rate. Both the offline and online novelty detectors achieved state-of-the-art seizure detection performance. In particular, the online detector achieved 97.85% sensitivity, -13.3 second latency, and 40% early-detection fraction at an average of 1.74 false positive predictions per hour (Fph). These results demonstrate that a novelty detection approach is not only feasible for seizure analysis, but it improves upon the state-of-the-art as an effective, robust technique. Additionally, an extension of the basic novelty detection framework demonstrated its use as a simple, effective tool for examining the spread of seizure onsets. This may be useful for automatically identifying seizure focus channels in patients with focal epilepsies. 
It is anticipated that this research will aid in localizing seizure onsets, and provide more efficient algorithms for use in a real device.
|