Global ETD Search

321	Anomaly Detection From Personal Usage Patterns In Web Applications Vural, Gurkan 01 December 2006 (has links) (PDF) The anomaly detection task is to recognize the presence of an unusual (and potentially hazardous) state within the behaviors or activities of a computer user, system, or network with respect to some model of normal behavior which may be either hard-coded or learned from observation. An anomaly detection agent faces many learning problems including learning from streams of temporal data, learning from instances of a single class, and adaptation to a dynamically changing concept. The domain is complicated by considerations of the trusted insider problem (recognizing the difference between innocuous and malicious behavior changes on the part of a trusted user). This study introduces the anomaly detection in web applications and formulates it as a machine learning task on temporal sequence data. In this study the goal is to develop a model or profile of normal working state of web application user and to detect anomalous conditions as deviations from the expected behavior patterns. We focus, here, on learning models of normality at the user behavioral level, as observed through a web application. In this study we introduce some sensors intended to function as a focus of attention unit at the lowest level of a classification hierarchy using Finite State Markov Chains and Hidden Markov Models and discuss the success of these sensors.
322	A Bayesian least squares support vector machines based framework for fault diagnosis and failure prognosis Khawaja, Taimoor Saleem 21 July 2010 (has links) A high-belief low-overhead Prognostics and Health Management (PHM) system is desired for online real-time monitoring of complex non-linear systems operating in a complex (possibly non-Gaussian) noise environment. This thesis presents a Bayesian Least Squares Support Vector Machine (LS-SVM) based framework for fault diagnosis and failure prognosis in nonlinear, non-Gaussian systems. The methodology assumes the availability of real-time process measurements, definition of a set of fault indicators, and the existence of empirical knowledge (or historical data) to characterize both nominal and abnormal operating conditions. An efficient yet powerful Least Squares Support Vector Machine (LS-SVM) algorithm, set within a Bayesian Inference framework, not only allows for the development of real-time algorithms for diagnosis and prognosis but also provides a solid theoretical framework to address key concepts related to classication for diagnosis and regression modeling for prognosis. SVM machines are founded on the principle of Structural Risk Minimization (SRM) which tends to nd a good trade-o between low empirical risk and small capacity. The key features in SVM are the use of non-linear kernels, the absence of local minima, the sparseness of the solution and the capacity control obtained by optimizing the margin. The Bayesian Inference framework linked with LS-SVMs allows a probabilistic interpretation of the results for diagnosis and prognosis. Additional levels of inference provide the much coveted features of adaptability and tunability of the modeling parameters. The two main modules considered in this research are fault diagnosis and failure prognosis. With the goal of designing an efficient and reliable fault diagnosis scheme, a novel Anomaly Detector is suggested based on the LS-SVM machines. The proposed scheme uses only baseline data to construct a 1-class LS-SVM machine which, when presented with online data, is able to distinguish between normal behavior and any abnormal or novel data during real-time operation. The results of the scheme are interpreted as a posterior probability of health (1 - probability of fault). As shown through two case studies in Chapter 3, the scheme is well suited for diagnosing imminent faults in dynamical non-linear systems. Finally, the failure prognosis scheme is based on an incremental weighted Bayesian LS-SVR machine. It is particularly suited for online deployment given the incremental nature of the algorithm and the quick optimization problem solved in the LS-SVR algorithm. By way of kernelization and a Gaussian Mixture Modeling (GMM) scheme, the algorithm can estimate (possibly) non-Gaussian posterior distributions for complex non-linear systems. An efficient regression scheme associated with the more rigorous core algorithm allows for long-term predictions, fault growth estimation with confidence bounds and remaining useful life (RUL) estimation after a fault is detected. The leading contributions of this thesis are (a) the development of a novel Bayesian Anomaly Detector for efficient and reliable Fault Detection and Identification (FDI) based on Least Squares Support Vector Machines , (b) the development of a data-driven real-time architecture for long-term Failure Prognosis using Least Squares Support Vector Machines,(c) Uncertainty representation and management using Bayesian Inference for posterior distribution estimation and hyper-parameter tuning, and finally (d) the statistical characterization of the performance of diagnosis and prognosis algorithms in order to relate the efficiency and reliability of the proposed schemes. Prognostics PHM CBM Diagnostics Classification Anomaly detection Regression Support vector machines Least squares Fault location (Engineering) Algorithms Bayesian statistical decision theory System failures (Engineering)
323	Applications Of Machine Learning To Anomaly Based Intrusion Detection Phani, B 07 1900 (has links) This thesis concerns anomaly detection as a mechanism for intrusion detection in a machine learning framework, using two kinds of audit data : system call traces and Unix shell command traces. Anomaly detection systems model the problem of intrusion detection as a problem of self-nonself discrimination problem. To be able to use machine learning algorithms for anomaly detection, precise deﬁnitions of two aspects namely, the learning model and the dissimilarity measure are required. The audit data considered in this thesis is intrinsically sequential. Thus the dissimilarity measure must be able to extract the temporal information in the data which in turn will be used for classiﬁcation purposes. In this thesis, we study the application of a set of dissimilarity measures broadly termed as sequence kernels that are exclusively suited for such applications. This is done in conjunction with Instance Based learning algorithms (IBL) for anomaly detection. We demonstrate the performance of the system under a wide range of parameter settings and show conditions under which best performance is obtained. Finally, some possible future extensions to the work reported in this report are considered and discussed. Intrusion Detection Cryptography Computer Access Control Machine Learning Sequence Kernel Anomaly Detection Data Mining System Call Traces Intrusion Detection Systems (IDS) Computer Science
324	Adaptive Graph-Based Algorithms for Conditional Anomaly Detection and Semi-Supervised Learning Valko, Michal 01 August 2011 (has links) (PDF) We develop graph-based methods for semi-supervised learning based on label propagation on a data similarity graph. When data is abundant or arrive in a stream, the problems of computation and data storage arise for any graph-based method. We propose a fast approximate online algorithm that solves for the harmonic solution on an approximate graph. We show, both empirically and theoretically, that good behavior can be achieved by collapsing nearby points into a set of local representative points that minimize distortion. Moreover, we regularize the harmonic solution to achieve better stability properties. We also present graph-based methods for detecting conditional anomalies and apply them to the identification of unusual clinical actions in hospitals. Our hypothesis is that patient-management actions that are unusual with respect to the past patients may be due to errors and that it is worthwhile to raise an alert if such a condition is encountered. Conditional anomaly detection extends standard unconditional anomaly framework but also faces new problems known as fringe and isolated points. We devise novel nonparametric graph-based methods to tackle these problems. Our methods rely on graph connectivity analysis and soft harmonic solution. Finally, we conduct an extensive human evaluation study of our conditional anomaly methods by 15 experts in critical care. [STAT:OT] Statistics/Other Statistics [STAT:OT] Statistiques/Autres Machine Learning Anomaly Detection Graph-Based Learning Online Learning Adaptive Learning Semi-Supervised Learning
325	Graph Structured Normal Means Inference Sharpnack, James 01 May 2013 (has links) This thesis addresses statistical estimation and testing of signals over a graph when measurements are noisy and high-dimensional. Graph structured patterns appear in applications as diverse as sensor networks, virology in human networks, congestion in internet routers, and advertising in social networks. We will develop asymptotic guarantees of the performance of statistical estimators and tests, by stating conditions for consistency by properties of the graph (e.g. graph spectra). The goal of this thesis is to demonstrate theoretically that by exploiting the graph structure one can achieve statistical consistency in extremely noisy conditions. We begin with the study of a projection estimator called Laplacian eigenmaps, and find that eigenvalue concentration plays a central role in the ability to estimate graph structured patterns. We continue with the study of the edge lasso, a least squares procedure with total variation penalty, and determine combinatorial conditions under which changepoints (edges across which the underlying signal changes) on the graph are recovered. We will shift focus to testing for anomalous activations in the graph, using the scan statistic relaxations, the spectral scan statistic and the graph ellipsoid scan statistic. We will also show how one can form a decomposition of the graph from a spanning tree which will lead to a test for activity in the graph. This will lead to the construction of a spanning tree wavelet basis, which can be used to localize activations on the graph. Normal means Gaussian goodness-of-fit graph structure estimation detection localization anomaly detection statistical inference network Laplacian spanning tree wavelets spectral scan Computer Sciences
326	Correlation-based Botnet Detection in Enterprise Networks Gu, Guofei 07 July 2008 (has links) Most of the attacks and fraudulent activities on the Internet are carried out by malware. In particular, botnets, as state-of-the-art malware, are now considered as the largest threat to Internet security. In this thesis, we focus on addressing the botnet detection problem in an enterprise-like network environment. We present a comprehensive correlation-based framework for multi-perspective botnet detection consisting of detection technologies demonstrated in four complementary systems: BotHunter, BotSniffer, BotMiner, and BotProbe. The common thread of these systems is correlation analysis, i.e., vertical correlation (dialog correlation), horizontal correlation, and cause-effect correlation. All these Bot* systems have been evaluated in live networks and/or real-world network traces. The evaluation results show that they can accurately detect real-world botnets for their desired detection purposes with a very low false positive rate. We find that correlation analysis techniques are of particular value for detecting advanced malware such as botnets. Dialog correlation can be effective as long as malware infections need multiple stages. Horizontal correlation can be effective as long as malware tends to be distributed and coordinated. In addition, active techniques can greatly complement passive approaches, if carefully used. We believe our experience and lessons are of great benefit to future malware detection. Malware detection Network security Anomaly detection Intrusion detection Local area networks (Computer networks) Computer networks Security measures Computer crimes Correlation (Statistics)
327	Incremental learning of discrete hidden Markov models Florez-Larrahondo, German, January 2005 (has links) Thesis (Ph.D.) -- Mississippi State University. Department of Computer Science and Engineering. / Title from title screen. Includes bibliographical references.
328	Réduction à la volée du volume des traces d'exécution pour l'analyse d'applications multimédia de systèmes embarqués / Online execution trace reduction for multimedia software analysis of embedded systems Emteu Tchagou, Serge Vladimir 15 December 2015 (has links) Le marché de l'électronique grand public est dominé par les systèmes embarqués du fait de leur puissance de calcul toujours croissante et des nombreuses fonctionnalités qu'ils proposent.Pour procurer de telles caractéristiques, les architectures des systèmes embarqués sont devenues de plus en plus complexes (pluralité et hétérogénéité des unités de traitements, exécution concurrente des tâches, ...).Cette complexité a fortement influencé leur programmabilité au point où rendre difficile la compréhension de l'exécution d'une application sur ces architectures.L'approche la plus utilisée actuellement pour l'analyse de l'exécution des applications sur les systèmes embarqués est la capture des traces d'exécution (séquences d'événements, tels que les appels systèmes ou les changements de contexte, générés pendant l'exécution des applications).Cette approche est utilisée lors des activités de test, débogage ou de profilage des applications.Toutefois, suivant certains cas d'utilisation, les traces d'exécution générées peuvent devenir très volumineuses, de l'ordre de plusieurs centaines de gigaoctets.C'est le cas des tests d'endurance ou encore des tests de validation, qui consistent à tracer l'exécution d'une application sur un système embarqué pendant de longues périodes, allant de plusieurs heures à plusieurs jours.Les outils et méthodes d'analyse de traces d'exécution actuels ne sont pas conçus pour traiter de telles quantités de données.Nous proposons une approche de réduction du volume de trace enregistrée à travers une analyse à la volée de la trace durant sa capture.Notre approche repose sur les spécificités des applications multimédia, qui sont parmi les plus importantes pour le succès des dispositifs populaires comme les Set-top boxes ou les smartphones.Notre approche a pour but de détecter automatiquement les fragments (périodes) suspectes de l'exécution d'une application afin de n'enregistrer que les parties de la trace correspondant à ces périodes d'activités.L'approche que nous proposons comporte deux étapes : une étape d'apprentissage qui consiste à découvrir les comportements réguliers de l'application à partir de la trace d'exécution, et une étape de détection d'anomalies qui consiste à identifier les comportements déviant des comportements réguliers.Les nombreuses expériences, réalisées sur des données synthétiques et des données réelles, montrent que notre approche permet d'obtenir une réduction du volume de trace enregistrée d'un ordre de grandeur avec d'excellentes performances de détection des comportements suspects. / The consumer electronics market is dominated by embedded systems due to their ever-increasing processing power and the large number of functionnalities they offer.To provide such features, architectures of embedded systems have increased in complexity: they rely on several heterogeneous processing units, and allow concurrent tasks execution.This complexity degrades the programmability of embedded system architectures and makes application execution difficult to understand on such systems.The most used approach for analyzing application execution on embedded systems consists in capturing execution traces (event sequences, such as system call invocations or context switch, generated during application execution).This approach is used in application testing, debugging or profiling.However in some use cases, execution traces generated can be very large, up to several hundreds of gigabytes.For example endurance tests, which are tests consisting in tracing execution of an application on an embedded system during long periods, from several hours to several days.Current tools and methods for analyzing execution traces are not designed to handle such amounts of data.We propose an approach for monitoring an application execution by analyzing traces on the fly in order to reduce the volume of recorded trace.Our approach is based on features of multimedia applications which contribute the most to the success of popular devices such as set-top boxes or smartphones.This approach consists in identifying automatically the suspicious periods of an application execution in order to record only the parts of traces which correspond to these periods.The proposed approach consists of two steps: a learning step which discovers regular behaviors of an application from its execution trace, and an anomaly detection step which identifies behaviors deviating from the regular ones.The many experiments, performed on synthetic and real-life datasets, show that our approach reduces the trace size by an order of magnitude while maintaining a good performance in detecting suspicious behaviors. Réduction de données Analyse à la volée Trace d'exécution Système embarqué Détection d'anomalies Apprentissage automatique Data reduction Online analysis Execution trace Embedded system Anomaly detection Machine learning 621
329	Evaluation of Supervised Machine LearningAlgorithms for Detecting Anomalies in Vehicle’s Off-Board Sensor Data Wahab, Nor-Ul January 2018 (has links) A diesel particulate filter (DPF) is designed to physically remove diesel particulate matter or soot from the exhaust gas of a diesel engine. Frequently replacing DPF is a waste of resource and waiting for full utilization is risky and very costly, so, what is the optimal time/milage to change DPF? Answering this question is very difficult without knowing when the DPF is changed in a vehicle. We are finding the answer with supervised machine learning algorithms for detecting anomalies in vehicles off-board sensor data (operational data of vehicles). Filter change is considered an anomaly because it is rare as compared to normal data. Non-sequential machine learning algorithms for anomaly detection like oneclass support vector machine (OC-SVM), k-nearest neighbor (K-NN), and random forest (RF) are applied for the first time on DPF dataset. The dataset is unbalanced, and accuracy is found misleading as a performance measure for the algorithms. Precision, recall, and F1-score are found good measure for the performance of the machine learning algorithms when the data is unbalanced. RF gave highest F1-score of 0.55 than K-NN (0.52) and OCSVM (0.51). It means that RF perform better than K-NN and OC-SVM but after further investigation it is concluded that the results are not satisfactory. However, a sequential approach should have been tried which could yield better result. Anomaly detection rule-based one class support vector machine k-nearest neighbor random forest confusion matrix accuracy precision recall F1-score Social Sciences Interdisciplinary
330	Computational Methods for Perceptual Training in Radiology January 2012 (has links) abstract: Medical images constitute a special class of images that are captured to allow diagnosis of disease, and their "correct" interpretation is vitally important. Because they are not "natural" images, radiologists must be trained to visually interpret them. This training process includes implicit perceptual learning that is gradually acquired over an extended period of exposure to medical images. This dissertation proposes novel computational methods for evaluating and facilitating perceptual training in radiologists. Part 1 of this dissertation proposes an eye-tracking-based metric for measuring the training progress of individual radiologists. Six metrics were identified as potentially useful: time to complete task, fixation count, fixation duration, consciously viewed regions, subconsciously viewed regions, and saccadic length. Part 2 of this dissertation proposes an eye-tracking-based entropy metric for tracking the rise and fall in the interest level of radiologists, as they scan chest radiographs. The results showed that entropy was significantly lower when radiologists were fixating on abnormal regions. Part 3 of this dissertation develops a method that allows extraction of Gabor-based feature vectors from corresponding anatomical regions of "normal" chest radiographs, despite anatomical variations across populations. These feature vectors are then used to develop and compare transductive and inductive computational methods for generating overlay maps that show atypical regions within test radiographs. The results show that the transductive methods produced much better maps than the inductive methods for 20 ground-truthed test radiographs. Part 4 of this dissertation uses an Extended Fuzzy C-Means (EFCM) based instance selection method to reduce the computational cost of transductive methods. The results showed that EFCM substantially reduced the computational cost without a substantial drop in performance. The dissertation then proposes a novel Variance Based Instance Selection (VBIS) method that also reduces the computational cost, but allows for incremental incorporation of new informative radiographs, as they are encountered. Part 5 of this dissertation develops and demonstrates a novel semi-transductive framework that combines the superior performance of transductive methods with the reduced computational cost of inductive methods. The results showed that the semi-transductive approach provided both an effective and efficient framework for detection of atypical regions in chest radiographs. / Dissertation/Thesis / Ph.D. Computer Science 2012 Computer science Medical imaging and radiology Anomaly Detection Atypicality Detection Chest Radiographs Eye tracking for Radiology Training Online Instance Selection Semi-Transductive Learning

Search results