331 |
Applications Of Machine Learning To Anomaly Based Intrusion Detection
Phani, B 07 1900 (has links)
This thesis concerns anomaly detection as a mechanism for intrusion detection in a machine learning framework, using two kinds of audit data: system call traces and Unix shell command traces. Anomaly detection systems model intrusion detection as a problem of self-nonself discrimination. To use machine learning algorithms for anomaly detection, precise definitions of two aspects are required: the learning model and the dissimilarity measure. The audit data considered in this thesis is intrinsically sequential, so the dissimilarity measure must be able to extract the temporal information in the data, which in turn is used for classification. In this thesis, we study the application of a set of dissimilarity measures, broadly termed sequence kernels, that are particularly suited to such applications. This is done in conjunction with instance-based learning (IBL) algorithms for anomaly detection. We demonstrate the performance of the system under a wide range of parameter settings and show the conditions under which the best performance is obtained. Finally, some possible future extensions to the work reported in this thesis are considered and discussed.
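As a rough illustration of the kind of machinery involved (not the thesis's exact kernels), here is a minimal sketch of a k-spectrum sequence kernel combined with an instance-based (nearest-neighbour) anomaly score; the window size, the scoring rule, and the toy traces are all illustrative assumptions.

```python
from collections import Counter
import math

def spectrum_features(seq, k=3):
    """Count the k-length contiguous subsequences (k-grams) of a trace."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))

def spectrum_kernel(a, b, k=3):
    """Inner product of k-gram count vectors (the k-spectrum kernel)."""
    fa, fb = spectrum_features(a, k), spectrum_features(b, k)
    return sum(fa[g] * fb[g] for g in fa if g in fb)

def kernel_distance(a, b, k=3):
    """Distance induced by the kernel: ||phi(a) - phi(b)||."""
    return math.sqrt(spectrum_kernel(a, a, k) - 2 * spectrum_kernel(a, b, k)
                     + spectrum_kernel(b, b, k))

def anomaly_score(trace, normal_traces, k=3):
    """IBL-style score: distance to the nearest 'self' (normal) instance."""
    return min(kernel_distance(trace, t, k) for t in normal_traces)

# Hypothetical audit traces; a high score suggests non-self behaviour.
normal = [list("openreadreadwriteclose"), list("openreadwriteclose")]
print(anomaly_score(list("openmmapexecveclose"), normal))
```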
|
332 |
Adaptive Graph-Based Algorithms for Conditional Anomaly Detection and Semi-Supervised Learning
Valko, Michal 01 August 2011 (has links) (PDF)
We develop graph-based methods for semi-supervised learning based on label propagation on a data similarity graph. When data are abundant or arrive in a stream, any graph-based method faces problems of computation and data storage. We propose a fast approximate online algorithm that solves for the harmonic solution on an approximate graph. We show, both empirically and theoretically, that good behavior can be achieved by collapsing nearby points into a set of local representative points that minimize distortion. Moreover, we regularize the harmonic solution to achieve better stability properties. We also present graph-based methods for detecting conditional anomalies and apply them to the identification of unusual clinical actions in hospitals. Our hypothesis is that patient-management actions that are unusual with respect to past patients may be due to errors, and that it is worthwhile to raise an alert when such a condition is encountered. Conditional anomaly detection extends the standard unconditional anomaly detection framework but also faces new problems, known as fringe and isolated points. We devise novel nonparametric graph-based methods to tackle these problems, relying on graph connectivity analysis and the soft harmonic solution. Finally, we conduct an extensive human evaluation of our conditional anomaly methods with 15 experts in critical care.
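For readers unfamiliar with the harmonic solution, the following is a minimal sketch of label propagation on a similarity graph: labelled nodes are clamped and each unlabelled node takes the weighted mean of its neighbours. The chain-graph example and the unregularized solve are simplifying assumptions; the thesis works with regularized and approximate online variants.

```python
import numpy as np

def harmonic_solution(W, labels):
    """Harmonic function on a similarity graph.
    W: (n, n) symmetric similarity matrix; labels: dict {node: value}."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W                # graph Laplacian
    l = sorted(labels)                            # labelled node indices
    u = [i for i in range(n) if i not in labels]  # unlabelled node indices
    f = np.zeros(n)
    f[l] = [labels[i] for i in l]
    # Standard harmonic solution: f_u = -L_uu^{-1} L_ul f_l
    f[u] = np.linalg.solve(L[np.ix_(u, u)], -L[np.ix_(u, l)] @ f[l])
    return f

# Toy chain graph: node 0 labelled 0, node 4 labelled 1.
W = np.diag(np.ones(4), 1)
W = W + W.T
print(harmonic_solution(W, {0: 0.0, 4: 1.0}))  # interpolates along the chain
```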
|
333 |
Graph Structured Normal Means Inference
Sharpnack, James 01 May 2013 (has links)
This thesis addresses statistical estimation and testing of signals over a graph when measurements are noisy and high-dimensional. Graph-structured patterns appear in applications as diverse as sensor networks, virology in human networks, congestion in internet routers, and advertising in social networks. We develop asymptotic guarantees on the performance of statistical estimators and tests by stating conditions for consistency in terms of properties of the graph (e.g., graph spectra). The goal of this thesis is to demonstrate theoretically that, by exploiting the graph structure, one can achieve statistical consistency in extremely noisy conditions.
We begin with the study of a projection estimator called Laplacian eigenmaps, and find that eigenvalue concentration plays a central role in the ability to estimate graph-structured patterns. We continue with the study of the edge lasso, a least squares procedure with a total variation penalty, and determine combinatorial conditions under which changepoints (edges across which the underlying signal changes) on the graph are recovered. We then shift focus to testing for anomalous activations in the graph, using relaxations of the scan statistic: the spectral scan statistic and the graph ellipsoid scan statistic. We also show how one can form a decomposition of the graph from a spanning tree, which leads to a test for activity in the graph and to the construction of a spanning tree wavelet basis that can be used to localize activations on the graph.
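As a sketch of what a projection estimator of this kind looks like (simplified; the thesis's contribution is the analysis of when such estimators are consistent), one can project a noisy graph signal onto the span of the first k Laplacian eigenvectors. The chain graph and the choice of k below are illustrative assumptions.

```python
import numpy as np

def laplacian_eigenmaps_estimator(W, y, k):
    """Denoise a graph signal y by projecting onto the k smoothest
    Laplacian eigenvectors (a spectral truncation / projection estimator)."""
    L = np.diag(W.sum(axis=1)) - W
    _, U = np.linalg.eigh(L)      # eigenvectors, sorted by eigenvalue
    Uk = U[:, :k]
    return Uk @ (Uk.T @ y)

# Toy example: smooth step signal on a chain graph plus Gaussian noise.
n = 20
W = np.diag(np.ones(n - 1), 1)
W = W + W.T
signal = np.r_[np.zeros(n // 2), np.ones(n // 2)]
y = signal + 0.3 * np.random.randn(n)
print(np.round(laplacian_eigenmaps_estimator(W, y, k=4), 2))
```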
|
334 |
Correlation-based Botnet Detection in Enterprise Networks
Gu, Guofei 07 July 2008 (has links)
Most of the attacks and fraudulent activities on the Internet are carried out by malware. In particular, botnets, as state-of-the-art malware, are now considered the largest threat to Internet security.
In this thesis, we focus on addressing the botnet detection problem in an enterprise-like network environment. We present a comprehensive correlation-based framework for multi-perspective botnet detection consisting of detection technologies demonstrated in four complementary systems: BotHunter, BotSniffer, BotMiner, and BotProbe. The common thread of these systems is correlation analysis, i.e., vertical correlation (dialog correlation), horizontal correlation, and cause-effect correlation. All these Bot* systems have been evaluated in live networks and/or real-world network traces. The evaluation results show that they can accurately detect real-world botnets for their desired detection purposes with a very low false positive rate.
We find that correlation analysis techniques are of particular value for detecting advanced malware such as botnets. Dialog correlation can be effective as long as malware infections require multiple stages. Horizontal correlation can be effective as long as malware tends to be distributed and coordinated. In addition, active techniques can greatly complement passive approaches if used carefully. We believe our experience and lessons will be of great benefit to future malware detection.
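As a toy illustration of vertical (dialog) correlation, and emphatically not the actual BotHunter engine: evidence of distinct infection-dialog stages is accumulated per host, and an alert is raised only when enough stages co-occur. The stage names and threshold below are assumptions made for the sketch.

```python
from collections import defaultdict

# Hypothetical infection-dialog stages; real systems weight richer evidence.
STAGES = {"inbound_exploit", "egg_download", "c2_traffic", "outbound_scan"}
REQUIRED = 2  # raise an alert once a host shows this many distinct stages

def dialog_correlation(events):
    """events: iterable of (host, stage) observations from network sensors.
    Returns hosts whose accumulated dialog evidence crosses the threshold."""
    seen = defaultdict(set)
    alerts = []
    for host, stage in events:
        if stage in STAGES:
            seen[host].add(stage)
            if len(seen[host]) >= REQUIRED and host not in alerts:
                alerts.append(host)
    return alerts

events = [("10.0.0.5", "inbound_exploit"), ("10.0.0.7", "c2_traffic"),
          ("10.0.0.5", "egg_download"), ("10.0.0.5", "c2_traffic")]
print(dialog_correlation(events))  # ['10.0.0.5']
```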
|
335 |
Incremental learning of discrete hidden Markov models
Florez-Larrahondo, German, January 2005 (has links)
Thesis (Ph.D.) -- Mississippi State University. Department of Computer Science and Engineering. / Title from title screen. Includes bibliographical references.
|
336 |
Réduction à la volée du volume des traces d'exécution pour l'analyse d'applications multimédia de systèmes embarqués / Online execution trace reduction for multimedia software analysis of embedded systems
Emteu Tchagou, Serge Vladimir 15 December 2015 (has links)
The consumer electronics market is dominated by embedded systems because of their ever-increasing processing power and the large number of features they offer. To provide such features, embedded system architectures have grown in complexity: they rely on several heterogeneous processing units and allow concurrent task execution. This complexity degrades the programmability of embedded system architectures and makes application execution on such systems difficult to understand. The most common approach for analyzing application execution on embedded systems is to capture execution traces (sequences of events, such as system calls or context switches, generated during application execution). This approach is used in application testing, debugging, and profiling. In some use cases, however, the generated execution traces can be very large, up to several hundred gigabytes; endurance and validation tests, for example, trace the execution of an application on an embedded system over long periods, from several hours to several days. Current tools and methods for analyzing execution traces are not designed to handle such amounts of data. We propose an approach that reduces the volume of recorded trace by analyzing the trace on the fly during capture. Our approach relies on the specific characteristics of multimedia applications, which are among the most important for the success of popular devices such as set-top boxes and smartphones. It automatically identifies the suspicious periods of an application's execution and records only the parts of the trace corresponding to those periods. The proposed approach consists of two steps: a learning step, which discovers the regular behaviors of an application from its execution trace, and an anomaly detection step, which identifies behaviors that deviate from the regular ones. Extensive experiments, performed on synthetic and real-life datasets, show that our approach reduces the recorded trace volume by an order of magnitude while maintaining good performance in detecting suspicious behaviors.
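A minimal sketch of the two-step idea, assuming a simple n-gram model of "regular" behavior (the thesis's actual learning step is richer): frequent n-grams are learned from a reference trace, and only windows containing unseen n-grams are flagged for recording. All event names and parameters below are hypothetical.

```python
from collections import Counter

def learn_regular_ngrams(trace, n=3, min_support=5):
    """Learning step: keep the n-grams of events that occur frequently
    in a reference execution trace (the 'regular' behaviours)."""
    counts = Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))
    return {g for g, c in counts.items() if c >= min_support}

def suspicious_windows(trace, regular, n=3, window=50):
    """Detection step: flag windows containing n-grams never seen as regular;
    only these windows would be written to the recorded trace."""
    flagged = []
    for start in range(0, len(trace) - n + 1, window):
        chunk = trace[start:start + window]
        grams = {tuple(chunk[i:i + n]) for i in range(len(chunk) - n + 1)}
        if grams - regular:
            flagged.append((start, start + window))
    return flagged

reference = ["read", "decode", "render"] * 100        # steady-state playback
regular = learn_regular_ngrams(reference)
live = reference[:150] + ["read", "stall", "stall"] + reference[:150]
print(suspicious_windows(live, regular))              # window with the stall
```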
|
337 |
Evaluation of Supervised Machine Learning Algorithms for Detecting Anomalies in Vehicle's Off-Board Sensor Data
Wahab, Nor-Ul January 2018 (has links)
A diesel particulate filter (DPF) is designed to physically remove diesel particulate matter, or soot, from the exhaust gas of a diesel engine. Replacing the DPF too frequently wastes resources, while waiting for full utilization is risky and very costly; so what is the optimal time/mileage at which to change the DPF? Answering this question is very difficult without knowing when the DPF is changed in a vehicle. We seek the answer using supervised machine learning algorithms for detecting anomalies in vehicles' off-board sensor data (operational data of vehicles). A filter change is treated as an anomaly because it is rare compared to normal data. Non-sequential machine learning algorithms for anomaly detection, namely one-class support vector machine (OC-SVM), k-nearest neighbor (K-NN), and random forest (RF), are applied for the first time to the DPF dataset. The dataset is unbalanced, and accuracy proved misleading as a performance measure for the algorithms; precision, recall, and F1-score were found to be good measures when the data is unbalanced. RF gave the highest F1-score, 0.55, compared to K-NN (0.52) and OC-SVM (0.51). This means that RF performs better than K-NN and OC-SVM, but after further investigation we conclude that the results are not satisfactory. A sequential approach should be tried, which could yield better results.
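To see why accuracy misleads on unbalanced data while precision, recall, and F1 do not, here is a sketch using scikit-learn on synthetic data standing in for the sensor dataset; the class ratio and features are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the sensor data: 2% of samples are anomalies
# (filter changes), the rest are normal operation.
X = np.vstack([rng.normal(0.0, 1.0, (4900, 5)),
               rng.normal(2.5, 1.0, (100, 5))])
y = np.r_[np.zeros(4900), np.ones(100)]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
pred = RandomForestClassifier(random_state=0).fit(X_tr, y_tr).predict(X_te)

# With 2% anomalies, predicting "normal" everywhere already scores ~0.98
# accuracy; precision/recall/F1 on the anomaly class are the honest numbers.
p, r, f1, _ = precision_recall_fscore_support(y_te, pred, average="binary")
print(f"accuracy={accuracy_score(y_te, pred):.3f} "
      f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```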
|
338 |
Computational Methods for Perceptual Training in Radiology
January 2012 (has links)
abstract: Medical images constitute a special class of images that are captured to allow diagnosis of disease, and their "correct" interpretation is vitally important. Because they are not "natural" images, radiologists must be trained to visually interpret them. This training process includes implicit perceptual learning that is gradually acquired over an extended period of exposure to medical images. This dissertation proposes novel computational methods for evaluating and facilitating perceptual training in radiologists. Part 1 of this dissertation proposes eye-tracking-based metrics for measuring the training progress of individual radiologists; six metrics were identified as potentially useful: time to complete task, fixation count, fixation duration, consciously viewed regions, subconsciously viewed regions, and saccadic length. Part 2 of this dissertation proposes an eye-tracking-based entropy metric for tracking the rise and fall in the interest level of radiologists as they scan chest radiographs. The results showed that entropy was significantly lower when radiologists were fixating on abnormal regions. Part 3 of this dissertation develops a method that allows extraction of Gabor-based feature vectors from corresponding anatomical regions of "normal" chest radiographs, despite anatomical variations across populations. These feature vectors are then used to develop and compare transductive and inductive computational methods for generating overlay maps that show atypical regions within test radiographs. The results show that the transductive methods produced much better maps than the inductive methods for 20 ground-truthed test radiographs. Part 4 of this dissertation uses an Extended Fuzzy C-Means (EFCM) based instance selection method to reduce the computational cost of transductive methods. The results showed that EFCM substantially reduced the computational cost without a substantial drop in performance. The dissertation then proposes a novel Variance Based Instance Selection (VBIS) method that also reduces the computational cost, but allows for incremental incorporation of new informative radiographs as they are encountered. Part 5 of this dissertation develops and demonstrates a novel semi-transductive framework that combines the superior performance of transductive methods with the reduced computational cost of inductive methods. The results showed that the semi-transductive approach provided both an effective and efficient framework for detection of atypical regions in chest radiographs. / Dissertation/Thesis / Ph.D. Computer Science 2012
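A minimal sketch of an entropy metric of the kind described in Part 2, assuming fixations are binned on a grid over the image; the grid size and the simulated gaze data are illustrative. Low entropy corresponds to gaze concentrated on one region, as when a reader fixates on an abnormality.

```python
import numpy as np

def gaze_entropy(fixations, grid=(8, 8)):
    """Shannon entropy of fixation positions over a grid of image regions.
    fixations: (n, 2) array of normalized (x, y) gaze coordinates in [0, 1]."""
    hist, _, _ = np.histogram2d(fixations[:, 0], fixations[:, 1],
                                bins=grid, range=[[0, 1], [0, 1]])
    p = hist.ravel() / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(1)
scanning = rng.uniform(0, 1, size=(200, 2))                      # roaming gaze
fixating = np.clip(rng.normal(0.5, 0.05, (200, 2)), 0, 1)        # locked gaze
print(gaze_entropy(scanning), gaze_entropy(fixating))            # high vs. low
```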
|
339 |
Anomaly detection technique for sequential data / Technique de détection d'anomalies utilisant des données séquentielles
Pellissier, Muriel 15 October 2013 (has links)
Nowadays, huge quantities of data are easily accessible, but these data are of little use if we cannot process them efficiently and easily extract the relevant information from them. Anomaly detection techniques are used in many domains to help process data automatically; they depend on the application domain, the type of data, and the type of anomaly. In this study we are interested only in sequential data. A sequence is an ordered list of items, also called events. Identifying irregularities in sequential data is essential for many application domains, such as DNA sequences, system calls, user commands, and banking transactions. This thesis presents a new approach for identifying and analyzing irregularities in sequential data. The technique can detect anomalies in sequential data where the order of the items in the sequences matters; moreover, it considers not only the order of the events but also their position within the sequences. A sequence is spotted as anomalous if it is quasi-identical to a usual behavior, that is, if it differs only slightly from a frequent (common) sequence. The differences between two sequences are based on the order of the events and their position in the sequence. In this thesis we applied the technique to maritime surveillance, but it can be used by any other domain that uses sequential data. For maritime surveillance, automated tools are needed to help customs target suspicious containers: 90% of world trade is transported in containers, yet only 1-2% of containers can be physically checked, owing to the high financial cost and the human resources required to control a container. As the number of containers travelling around the world every day keeps increasing, it is necessary to guide customs controls in order to prevent illegal activities such as fraud, quota violations, illegal products, hidden activities, and drug or arms smuggling. We identify suspicious containers by comparing the container trips in the data set with itineraries that are known to be normal (common). A container trip, also called an itinerary, is an ordered list of actions performed on a container at specific geographical positions; the possible actions are loading, transshipment, and discharging, and for each action we know the container ID and its geographical position (port ID). The technique is divided into two parts. The first detects the common (most frequent) sequences of the data set. The second identifies the sequences that differ slightly from the common ones, using a distance-based method to classify a given sequence as normal or suspicious; the distance is computed with a method that combines quantitative and qualitative differences between two sequences.
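A minimal sketch of the distance-based classification step, with plain Levenshtein distance standing in for the thesis's combined qualitative/quantitative measure; the itineraries, support threshold, and distance cut-offs are illustrative assumptions.

```python
from collections import Counter

def edit_distance(a, b):
    """Levenshtein distance over event sequences (order- and position-aware)."""
    d = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, yv in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (x != yv))
    return d[-1]

def classify(itinerary, all_itineraries, min_support=3):
    """Suspicious = slightly different (distance 1 or 2) from a frequent trip."""
    freq = [s for s, c in Counter(map(tuple, all_itineraries)).items()
            if c >= min_support]
    best = min(edit_distance(itinerary, f) for f in freq)
    return "normal" if best == 0 else "suspicious" if best <= 2 else "unusual"

# Hypothetical container itineraries: actions at port IDs.
trips = [["load:CNSHA", "tranship:SGSIN", "discharge:NLRTM"]] * 5
odd = ["load:CNSHA", "tranship:PKBQM", "discharge:NLRTM"]
print(classify(odd, trips))  # 'suspicious': one step off a frequent trip
```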
|
340 |
DETECÇÃO DE ATAQUES DE NEGAÇÃO DE SERVIÇO EM REDES DE COMPUTADORES ATRAVÉS DA TRANSFORMADA WAVELET 2D / A BIDIMENSIONAL WAVELET TRANSFORM BASED ALGORITHM FOR DOS ATTACK DETECTION
Azevedo, Renato Preigschadt de 08 March 2012 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / The analysis of network traffic is a key area in the management of fault-tolerant systems, since anomalies in network traffic can affect availability and quality of service (QoS). Intrusion detection systems in computer networks are used to analyze network traffic in order to detect attacks and anomalies. Anomaly-based analysis detects attacks by examining the behavior of the network traffic. This work proposes an intrusion detection tool that quickly and effectively detects anomalies in computer networks generated by denial-of-service (DoS) attacks. The detection algorithm is based on the two-dimensional wavelet transform (2D wavelet), a method derived from signal analysis. The wavelet transform is a mathematical tool with low computational cost that exploits the information present in the input samples across the different levels of the transform. The proposed algorithm detects anomalies directly on the wavelet coefficients using threshold techniques, so it does not require reconstruction of the original signal. Experiments were performed using two databases: a synthetic one (DARPA) and one built from data collected at the Federal University of Santa Maria (UFSM), allowing analysis of the intrusion detection tool under different scenarios. The wavelets considered for the tests were all from the orthonormal Daubechies family: Haar (Db1), Db2, Db4, and Db8 (with 1, 2, 4, and 8 vanishing moments, respectively). For the DARPA database we obtained a DoS detection rate of up to 100% using the Daubechies Db4 wavelet with normalized wavelet coefficients; for the database collected at UFSM the detection rate was 95%, again using Db4 with normalized wavelet coefficients.
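A minimal sketch of detection directly on 2-D wavelet coefficients, using the PyWavelets library with the Db4 wavelet; the traffic matrix layout, normalization, and cut-off rule below are illustrative assumptions rather than the thesis's exact procedure.

```python
import numpy as np
import pywt  # PyWavelets

def detect_dos(traffic_matrix, wavelet="db4", k=3.0):
    """Flag anomalies directly in the 2-D wavelet detail coefficients;
    no reconstruction of the original signal is needed.
    traffic_matrix: rows = time windows, columns = traffic features."""
    _, (cH, cV, cD) = pywt.dwt2(traffic_matrix, wavelet)
    details = np.concatenate([c.ravel() for c in (cH, cV, cD)])
    details = details / np.abs(details).max()   # normalized coefficients
    return np.abs(details) > k * details.std()  # simple threshold (cut) rule

rng = np.random.default_rng(2)
traffic = rng.poisson(100, size=(64, 64)).astype(float)
traffic[30:34, 20:24] += 5000                   # burst typical of a DoS flood
print(detect_dos(traffic).sum(), "coefficients above threshold")
```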
|