321 |
RADAR: compiler and architecture supported intrusion prevention, detection, analysis and recovery
Zhang, Tao, 25 August 2006
In this dissertation, we propose RADAR - compileR and micro-Architecture supported intrusion prevention, Detection, Analysis and Recovery. RADAR is an infrastructure to help prevent, detect, and even recover from attacks on critical software. Our approach emphasizes collaboration between the compiler and the micro-architecture to avoid the problems of purely software-based or hardware-based approaches.
With hardware support for cryptographic operations, our infrastructure can achieve strong process isolation to prevent attacks from other processes and to prevent certain types of hardware attacks. Moreover, we show that an unprotected system address bus leaks critical control-flow information of the protected software, a problem that has not been carefully addressed previously. To further enhance the intrusion prevention capability of our infrastructure, we present a scheme with both innovative hardware modification and extensive compiler support to eliminate most of the information leakage on the system address bus.
However, no security system is able to prevent all attacks. In general, we have to assume that certain attacks will get through our intrusion prevention mechanisms. To protect software from those attacks, we build a second line of defense consisting of intrusion detection and intrusion recovery mechanisms. Our intrusion detection mechanisms are based on anomaly detection. In this dissertation, we propose three anomaly detection schemes. We demonstrate the effectiveness of our anomaly detection schemes and thus the great potential of what compiler and micro-architecture support can do for software security.
The ability to recover from an attack is very important for systems providing critical services. Thus, intrusion recoverability is an important goal of our infrastructure. We focus on recovery of memory state in this dissertation, since most attacks break into a system by memory tampering. We propose two schemes for intrusion analysis. The execution logging based scheme incurs little performance overhead but demands more storage and memory bandwidth. The external input points tagging based scheme is much more space and memory bandwidth efficient, but leads to significant performance degradation. Once intrusion analysis has identified the tampered memory state, that state can be easily recovered through memory-update logging or memory-state checkpointing.
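The memory-update logging idea can be pictured with a small sketch: each write is journaled together with the value it overwrote, so writes that intrusion analysis later flags as tampered can be undone in reverse order. All class and method names here are illustrative, not from the dissertation.

```python
# Hypothetical sketch of memory-update logging and rollback; the real RADAR
# mechanism operates at the micro-architecture level, not in Python.

class LoggedMemory:
    """Memory whose writes are journaled so tampered updates can be undone."""

    def __init__(self):
        self.mem = {}   # address -> current value
        self.log = []   # (sequence number, address, previous value)

    def write(self, addr, value):
        # Journal the old value before overwriting it.
        self.log.append((len(self.log), addr, self.mem.get(addr)))
        self.mem[addr] = value

    def rollback(self, tampered_seqs):
        # Undo flagged writes in reverse order so earlier state is restored.
        for seq, addr, old in reversed(self.log):
            if seq in tampered_seqs:
                if old is None:
                    self.mem.pop(addr, None)
                else:
                    self.mem[addr] = old

m = LoggedMemory()
m.write(0x10, "legit")
m.write(0x10, "evil")   # suppose intrusion analysis flags write #1 as tampered
m.rollback({1})
print(m.mem[0x10])      # -> legit
```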
322 |
Anomaly Detection From Personal Usage Patterns In Web Applications
Vural, Gurkan, 01 December 2006
The anomaly detection task is to recognize the presence of an unusual (and potentially hazardous) state within the behaviors or activities of a computer user, system, or network with respect to some model of normal behavior which may be either hard-coded or learned from observation. An anomaly detection agent faces many learning problems including learning from streams of temporal data, learning from instances of a single class, and adaptation to a dynamically changing concept. The domain is complicated by considerations of the trusted insider problem (recognizing the difference between innocuous and malicious behavior changes on the part of a trusted user).
This study introduces anomaly detection in web applications and formulates it as a machine learning task on temporal sequence data. The goal is to develop a model or profile of the normal working state of a web application user and to detect anomalous conditions as deviations from the expected behavior patterns. We focus here on learning models of normality at the user behavioral level, as observed through a web application. We introduce some sensors intended to function as a focus-of-attention unit at the lowest level of a classification hierarchy, using Finite State Markov Chains and Hidden Markov Models, and discuss the success of these sensors.
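A minimal sketch of the kind of Markov-chain sensor described here, assuming user sessions are sequences of discrete actions; the action names, the Laplace smoothing, and the log-likelihood score are illustrative choices, not the study's actual sensors:

```python
from collections import defaultdict
import math

def train_markov(sequences, smoothing=1.0):
    """Estimate first-order transition probabilities from normal sessions."""
    counts = defaultdict(lambda: defaultdict(float))
    states = set()
    for seq in sequences:
        states.update(seq)
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    probs = {}
    for a in states:
        total = sum(counts[a].values()) + smoothing * len(states)
        probs[a] = {b: (counts[a][b] + smoothing) / total for b in states}
    return probs

def anomaly_score(seq, probs, floor=1e-9):
    """Average negative log-likelihood; high values suggest an anomaly."""
    lp = 0.0
    for a, b in zip(seq, seq[1:]):
        lp += math.log(probs.get(a, {}).get(b, floor))
    return -lp / max(len(seq) - 1, 1)

normal = [["login", "search", "view", "logout"]] * 20
model = train_markov(normal)
print(anomaly_score(["login", "search", "view", "logout"], model) <
      anomaly_score(["login", "admin", "admin", "logout"], model))   # -> True
```

A session that follows the learned transitions scores low; one that takes unseen transitions falls back to the probability floor and scores high.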
323 |
A Bayesian least squares support vector machines based framework for fault diagnosis and failure prognosis
Khawaja, Taimoor Saleem, 21 July 2010
A high-belief, low-overhead Prognostics and Health Management (PHM) system is desired for online real-time monitoring of complex non-linear systems operating in a complex (possibly non-Gaussian) noise environment. This thesis presents a Bayesian Least Squares Support Vector Machine (LS-SVM) based framework for fault diagnosis and failure prognosis in nonlinear, non-Gaussian systems. The methodology assumes the availability of real-time process measurements, the definition of a set of fault indicators, and the existence of empirical knowledge (or historical data) to characterize both nominal and abnormal operating conditions.
An efficient yet powerful Least Squares Support Vector Machine (LS-SVM) algorithm, set within a Bayesian Inference framework, not only allows for the development of real-time algorithms for diagnosis and prognosis but also provides a solid theoretical framework to address key concepts related to classification for diagnosis and regression modeling for prognosis. SVMs are founded on the principle of Structural Risk Minimization (SRM), which seeks a good trade-off between low empirical risk and small capacity. The key features of SVMs are the use of non-linear kernels, the absence of local minima, the sparseness of the solution, and the capacity control obtained by optimizing the margin. The Bayesian Inference framework linked with LS-SVMs allows a probabilistic interpretation of the results for diagnosis and prognosis. Additional levels of inference provide the much-coveted features of adaptability and tunability of the modeling parameters.
The two main modules considered in this research are fault diagnosis and failure prognosis. With the goal of designing an efficient and reliable fault diagnosis scheme, a novel anomaly detector based on LS-SVMs is proposed. The scheme uses only baseline data to construct a one-class LS-SVM which, when presented with online data, is able to distinguish between normal behavior and any abnormal or novel data during real-time operation. The results of the scheme are interpreted as a posterior probability of health (1 - probability of fault). As shown through two case studies in Chapter 3, the scheme is well suited for diagnosing imminent faults in dynamical non-linear systems.
Finally, the failure prognosis scheme is based on an incremental weighted Bayesian LS-SVR machine. It is particularly suited for online deployment given the incremental nature of the algorithm and the quick optimization problem solved in the LS-SVR algorithm. By way of kernelization and a Gaussian Mixture Modeling (GMM) scheme, the algorithm can estimate (possibly) non-Gaussian posterior distributions for complex non-linear systems. An efficient regression scheme associated with the more rigorous core algorithm allows for long-term predictions, fault growth estimation with confidence bounds, and remaining useful life (RUL) estimation after a fault is detected.
The leading contributions of this thesis are (a) the development of a novel Bayesian anomaly detector for efficient and reliable Fault Detection and Identification (FDI) based on Least Squares Support Vector Machines, (b) the development of a data-driven real-time architecture for long-term failure prognosis using Least Squares Support Vector Machines, (c) uncertainty representation and management using Bayesian inference for posterior distribution estimation and hyper-parameter tuning, and finally (d) the statistical characterization of the performance of the diagnosis and prognosis algorithms in order to relate the efficiency and reliability of the proposed schemes.
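Part of what makes LS-SVMs attractive for real-time use is that training reduces to solving a single linear system (the standard Suykens formulation with equality constraints). A self-contained sketch on toy data, using a naive Gaussian elimination in place of a proper linear algebra library; the kernel width and regularization value are illustrative, not values from the thesis:

```python
import math

def rbf(x, z, s=1.0):
    """Gaussian RBF kernel between two points."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2 * s * s))

def solve(A, rhs):
    """Naive Gauss-Jordan elimination with partial pivoting (small systems)."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def train_lssvm(X, y, gamma=10.0):
    """Solve [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1]."""
    n = len(X)
    A = [[0.0] + list(y)]
    for i in range(n):
        A.append([y[i]] + [y[i] * y[j] * rbf(X[i], X[j]) +
                           (1.0 / gamma if i == j else 0.0) for j in range(n)])
    sol = solve(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]

def predict(x, X, y, b, alpha):
    s = b + sum(a * yi * rbf(x, xi) for a, yi, xi in zip(alpha, y, X))
    return 1 if s >= 0 else -1

X = [(0, 0), (0, 1), (3, 3), (3, 4)]
y = [-1, -1, 1, 1]
b, alpha = train_lssvm(X, y)
print(predict((0.2, 0.1), X, y, b, alpha), predict((3.1, 3.5), X, y, b, alpha))
```

Because the system is linear, retraining as new data arrive is cheap, which is the property the incremental prognosis scheme exploits.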
324 |
Applications Of Machine Learning To Anomaly Based Intrusion Detection
Phani, B, 07 1900
This thesis concerns anomaly detection as a mechanism for intrusion detection in a machine learning framework, using two kinds of audit data: system call traces and Unix shell command traces. Anomaly detection systems model intrusion detection as a problem of self-nonself discrimination. To be able to use machine learning algorithms for anomaly detection, precise definitions of two aspects are required: the learning model and the dissimilarity measure. The audit data considered in this thesis are intrinsically sequential, so the dissimilarity measure must be able to extract the temporal information in the data, which in turn is used for classification. We study the application of a set of dissimilarity measures, broadly termed sequence kernels, that are especially suited for such applications. This is done in conjunction with Instance-Based Learning (IBL) algorithms for anomaly detection. We demonstrate the performance of the system under a wide range of parameter settings and show the conditions under which the best performance is obtained. Finally, some possible future extensions to the work reported in this thesis are considered and discussed.
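One well-known instance of a sequence kernel is the k-spectrum kernel, which compares traces by their shared k-grams. A hedged sketch combining it with a 1-nearest-neighbour (instance-based) anomaly decision; the traces and the threshold are illustrative, and this is not necessarily the kernel family used in the thesis:

```python
from collections import Counter
import math

def spectrum_kernel(s, t, k=2):
    """Count shared k-grams: a simple sequence kernel for call/command traces."""
    cs = Counter(tuple(s[i:i + k]) for i in range(len(s) - k + 1))
    ct = Counter(tuple(t[i:i + k]) for i in range(len(t) - k + 1))
    return sum(cs[g] * ct[g] for g in cs)

def kernel_distance(s, t, k=2):
    # Distance induced by the kernel: ||phi(s) - phi(t)|| in feature space.
    return math.sqrt(spectrum_kernel(s, s, k) - 2 * spectrum_kernel(s, t, k)
                     + spectrum_kernel(t, t, k))

def is_anomalous(trace, normal_traces, threshold, k=2):
    """1-nearest-neighbour anomaly decision against a profile of normal traces."""
    d = min(kernel_distance(trace, n, k) for n in normal_traces)
    return d > threshold

normal = [["open", "read", "read", "close"], ["open", "read", "close"]]
print(is_anomalous(["open", "read", "close"], normal, threshold=2.0))        # -> False
print(is_anomalous(["open", "exec", "socket", "send"], normal, threshold=2.0))  # -> True
```

Because the kernel is computed over k-grams rather than single events, it captures short-range temporal order, which is the property the thesis needs from its dissimilarity measures.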
325 |
Adaptive Graph-Based Algorithms for Conditional Anomaly Detection and Semi-Supervised Learning
Valko, Michal, 01 August 2011
We develop graph-based methods for semi-supervised learning based on label propagation on a data similarity graph. When data are abundant or arrive in a stream, problems of computation and data storage arise for any graph-based method. We propose a fast approximate online algorithm that solves for the harmonic solution on an approximate graph. We show, both empirically and theoretically, that good behavior can be achieved by collapsing nearby points into a set of local representative points that minimize distortion. Moreover, we regularize the harmonic solution to achieve better stability properties. We also present graph-based methods for detecting conditional anomalies and apply them to the identification of unusual clinical actions in hospitals. Our hypothesis is that patient-management actions that are unusual with respect to past patients may be due to errors, and that it is worthwhile to raise an alert when such a condition is encountered. Conditional anomaly detection extends the standard unconditional anomaly detection framework but also faces new problems, known as fringe and isolated points. We devise novel nonparametric graph-based methods to tackle these problems. Our methods rely on graph connectivity analysis and the soft harmonic solution. Finally, we conduct an extensive human evaluation study of our conditional anomaly methods with 15 experts in critical care.
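The harmonic solution referred to above clamps labelled nodes and sets every unlabelled node to the weighted average of its neighbours. A minimal sketch, assuming a toy similarity graph and using plain Gauss-Seidel iteration rather than the thesis's fast approximate online algorithm:

```python
def harmonic_solution(adj, labels, iters=200):
    """Label propagation: iterate neighbour averaging until (near) fixed point.

    adj: {node: {neighbour: weight}}; labels: {node: +1.0/-1.0} for labelled nodes.
    """
    f = {v: labels.get(v, 0.0) for v in adj}
    for _ in range(iters):
        for v in adj:
            if v in labels:
                continue                              # clamp labelled nodes
            w = sum(adj[v].values())
            f[v] = sum(wt * f[u] for u, wt in adj[v].items()) / w
    return f

# Chain graph a-b-c-d with a labelled +1 and d labelled -1; the harmonic
# solution interpolates linearly along the chain.
adj = {"a": {"b": 1}, "b": {"a": 1, "c": 1}, "c": {"b": 1, "d": 1}, "d": {"c": 1}}
f = harmonic_solution(adj, {"a": 1.0, "d": -1.0})
print(round(f["b"], 3), round(f["c"], 3))   # -> 0.333 -0.333
```

Collapsing nearby points into representatives, as the thesis does, shrinks `adj` and so shrinks the cost of each sweep.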
326 |
Graph Structured Normal Means Inference
Sharpnack, James, 01 May 2013
This thesis addresses statistical estimation and testing of signals over a graph when measurements are noisy and high-dimensional. Graph structured patterns appear in applications as diverse as sensor networks, virology in human networks, congestion in internet routers, and advertising in social networks. We develop asymptotic guarantees of the performance of statistical estimators and tests by stating conditions for consistency in terms of properties of the graph (e.g. graph spectra). The goal of this thesis is to demonstrate theoretically that by exploiting the graph structure one can achieve statistical consistency in extremely noisy conditions.
We begin with the study of a projection estimator called Laplacian eigenmaps, and find that eigenvalue concentration plays a central role in the ability to estimate graph structured patterns. We continue with the study of the edge lasso, a least squares procedure with a total variation penalty, and determine combinatorial conditions under which changepoints (edges across which the underlying signal changes) on the graph are recovered. We then shift focus to testing for anomalous activations in the graph using relaxations of the scan statistic: the spectral scan statistic and the graph ellipsoid scan statistic. We also show how one can form a decomposition of the graph from a spanning tree, which leads to a test for activity in the graph. This in turn leads to the construction of a spanning tree wavelet basis, which can be used to localize activations on the graph.
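To give a flavor of the scan statistic, here is a hedged sketch on the simplest graph, a path, where the connected clusters scanned over are just intervals. The statistic below is the textbook |interval sum| / sqrt(interval size), not the thesis's spectral or ellipsoid relaxations, and the signal is a toy example:

```python
import math

def interval_scan(y):
    """Scan statistic over connected intervals [i, j) of a path graph:
    maximize |sum(y[i:j])| / sqrt(j - i) over all intervals."""
    best, best_iv = 0.0, None
    n = len(y)
    for i in range(n):
        s = 0.0
        for j in range(i + 1, n + 1):
            s += y[j - 1]                      # running interval sum
            stat = abs(s) / math.sqrt(j - i)
            if stat > best:
                best, best_iv = stat, (i, j)
    return best, best_iv

# Toy noisy signal with elevated activity on nodes 4..7.
y = [0.1, -0.2, 0.0, 0.1, 1.1, 0.9, 1.2, 1.0, -0.1, 0.2]
stat, (i, j) = interval_scan(y)
print((i, j))   # -> (4, 8)
```

On general graphs the set of connected clusters is exponentially large, which is why the thesis studies tractable relaxations of this maximization.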
327 |
Correlation-based Botnet Detection in Enterprise Networks
Gu, Guofei, 07 July 2008
Most of the attacks and fraudulent activities on the Internet are carried out by malware. In particular, botnets, as state-of-the-art malware, are now considered the largest threat to Internet security.
In this thesis, we focus on addressing the botnet detection problem in an enterprise-like network environment. We present a comprehensive correlation-based framework for multi-perspective botnet detection consisting of detection technologies demonstrated in four complementary systems: BotHunter, BotSniffer, BotMiner, and BotProbe. The common thread of these systems is correlation analysis, i.e., vertical correlation (dialog correlation), horizontal correlation, and cause-effect correlation. All these Bot* systems have been evaluated in live networks and/or real-world network traces. The evaluation results show that they can accurately detect real-world botnets for their desired detection purposes with a very low false positive rate.
We find that correlation analysis techniques are of particular value for detecting advanced malware such as botnets. Dialog correlation can be effective as long as malware infections involve multiple stages. Horizontal correlation can be effective as long as malware tends to be distributed and coordinated. In addition, active techniques can greatly complement passive approaches, if carefully used. We believe our experience and lessons are of great benefit to future malware detection.
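The dialog (vertical) correlation idea can be sketched as evidence accumulation: observations from distinct infection-dialog stages for one host are combined, and an alarm is raised only when enough stages co-occur. The stage names, weights, and threshold below are illustrative, not BotHunter's actual model:

```python
# Hedged sketch of dialog correlation in the spirit of BotHunter; all
# weights and stage names are invented for illustration.

STAGE_WEIGHTS = {
    "inbound_scan": 1, "exploit": 3, "egg_download": 3,
    "c2_communication": 4, "outbound_scan": 2,
}

def correlate(events, threshold=6):
    """events: iterable of (host, stage); returns hosts declared infected."""
    evidence = {}
    for host, stage in events:
        evidence.setdefault(host, set()).add(stage)
    infected = set()
    for host, stages in evidence.items():
        score = sum(STAGE_WEIGHTS[s] for s in stages)
        # Require at least two distinct stages so a lone scan never alarms,
        # which is what keeps the false positive rate low.
        if len(stages) >= 2 and score >= threshold:
            infected.add(host)
    return infected

events = [
    ("10.0.0.5", "exploit"), ("10.0.0.5", "egg_download"),
    ("10.0.0.5", "c2_communication"),
    ("10.0.0.9", "inbound_scan"),          # benign scan noise
]
print(sorted(correlate(events)))   # -> ['10.0.0.5']
```

Horizontal correlation would instead compare evidence across hosts, flagging groups that exhibit coordinated behavior.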
328 |
Incremental learning of discrete hidden Markov models
Florez-Larrahondo, German, January 2005
Thesis (Ph.D.) -- Mississippi State University. Department of Computer Science and Engineering. / Title from title screen. Includes bibliographical references.
329 |
Réduction à la volée du volume des traces d'exécution pour l'analyse d'applications multimédia de systèmes embarqués / Online execution trace reduction for multimedia software analysis of embedded systems
Emteu Tchagou, Serge Vladimir, 15 December 2015
The consumer electronics market is dominated by embedded systems due to their ever-increasing processing power and the large number of functionalities they offer. To provide such features, architectures of embedded systems have increased in complexity: they rely on several heterogeneous processing units and allow concurrent task execution. This complexity degrades the programmability of embedded system architectures and makes application execution difficult to understand on such systems. The most widely used approach for analyzing application execution on embedded systems consists in capturing execution traces (sequences of events, such as system call invocations or context switches, generated during application execution). This approach is used in application testing, debugging, and profiling. However, in some use cases the generated execution traces can be very large, up to several hundreds of gigabytes. This is the case for endurance tests, which trace the execution of an application on an embedded system over long periods, from several hours to several days. Current tools and methods for analyzing execution traces are not designed to handle such amounts of data. We propose an approach for monitoring an application execution by analyzing traces on the fly in order to reduce the volume of recorded trace. Our approach is based on features of multimedia applications, which contribute the most to the success of popular devices such as set-top boxes or smartphones. It consists in automatically identifying the suspicious periods of an application execution in order to record only the parts of the trace that correspond to these periods. The proposed approach consists of two steps: a learning step, which discovers the regular behaviors of an application from its execution trace, and an anomaly detection step, which identifies behaviors deviating from the regular ones. Extensive experiments, performed on synthetic and real-life datasets, show that our approach reduces the trace size by an order of magnitude while maintaining good performance in detecting suspicious behaviors.
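The two-step approach (learn regular behavior, then record only deviating periods) can be sketched as follows, assuming the trace is already split into fixed-size windows of events; the per-window frequency profile and L1 deviation below are illustrative simplifications of the thesis's learned behavior models:

```python
from collections import Counter

def learn_profile(training_windows):
    """Learning step: average per-window event frequencies over normal runs."""
    total = Counter()
    for w in training_windows:
        total.update(w)
    n = len(training_windows)
    return {e: c / n for e, c in total.items()}

def deviation(window, profile):
    """L1 distance between a window's event counts and the learned profile."""
    events = set(window) | set(profile)
    counts = Counter(window)
    return sum(abs(counts[e] - profile.get(e, 0.0)) for e in events)

def reduce_trace(windows, profile, threshold):
    """Detection step: keep only windows whose behavior deviates from the profile."""
    return [w for w in windows if deviation(w, profile) > threshold]

# A multimedia-style loop: read, decode, display, repeated per frame.
normal = [["read", "decode", "display"] * 10 for _ in range(5)]
profile = learn_profile(normal)
stream = [["read", "decode", "display"] * 10,
          ["read", "decode", "display"] * 10,
          ["ioctl"] * 25 + ["read"] * 5]          # unusual window
kept = reduce_trace(stream, profile, threshold=15)
print(len(stream), len(kept))   # -> 3 1
```

Only the deviating window is written to storage, which is where the order-of-magnitude reduction in recorded trace volume comes from.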
330 |
Evaluation of Supervised Machine Learning Algorithms for Detecting Anomalies in Vehicle’s Off-Board Sensor Data
Wahab, Nor-Ul, January 2018
A diesel particulate filter (DPF) is designed to physically remove diesel particulate matter, or soot, from the exhaust gas of a diesel engine. Replacing the DPF too frequently wastes resources, while waiting for full utilization is risky and very costly; so what is the optimal time or mileage at which to change the DPF? Answering this question is very difficult without knowing when the DPF was changed in a vehicle. We seek the answer with supervised machine learning algorithms for detecting anomalies in a vehicle's off-board sensor data (operational data of vehicles). A filter change is considered an anomaly because it is rare compared to normal data. Non-sequential machine learning algorithms for anomaly detection, namely one-class support vector machine (OC-SVM), k-nearest neighbor (k-NN), and random forest (RF), are applied for the first time to the DPF dataset. The dataset is unbalanced, and accuracy is found to be a misleading performance measure for the algorithms. Precision, recall, and F1-score are found to be good measures of the performance of machine learning algorithms when the data are unbalanced. RF gave the highest F1-score, 0.55, compared with k-NN (0.52) and OC-SVM (0.51). This means that RF performs better than k-NN and OC-SVM, but further investigation concluded that the results are not satisfactory. A sequential approach should be tried, which could yield better results.
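The reason accuracy misleads on an unbalanced dataset like this one is easy to see with a sketch: a classifier can score high accuracy while missing most of the rare class, whereas precision, recall, and F1 expose the misses. The data below are invented for illustration:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Standard precision, recall, and F1 for a chosen positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# 1 = filter change (rare anomaly), 0 = normal operation.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 95 + [1, 1, 1, 0, 0]   # classifier misses 2 of the 5 anomalies

p, r, f1 = precision_recall_f1(y_true, y_pred)
print(round(p, 2), round(r, 2), round(f1, 2))   # -> 1.0 0.6 0.75
```

Here accuracy would be 0.98, yet recall shows that 40% of the filter changes go undetected, which is why the thesis reports F1 rather than accuracy.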