11 |
Anomaly Detection Through Statistics-Based Machine Learning For Computer Networks / Zhu, Xuejun. January 2006
Intrusion detection in computer networks is a complex research problem that requires an understanding of computer networks and the mechanisms of intrusions, the configuration of sensors and the collected data, the selection of relevant attributes, and the monitoring algorithms for online detection. It is critical to develop general methods for data dimension reduction, effective monitoring algorithms for intrusion detection, and means for their performance improvement. This dissertation is motivated by the timely need to develop statistics-based machine learning methods for effective detection of computer network anomalies. Three fundamental research issues related to data dimension reduction, control chart design, and performance improvement have been addressed accordingly. The major research activities and corresponding contributions are summarized as follows:
(1) Filter and Wrapper models are integrated to extract a small number of informative attributes for computer network intrusion detection. A two-phase analysis method is proposed for the integration of the Filter and Wrapper models. The proposed method reduced the original 41 attributes to 12 informative attributes while increasing the accuracy of the model. The comparison of the results in each phase shows the effectiveness of the proposed method.
(2) Supervised kernel-based control charts for anomaly intrusion detection. We propose to construct control charts in a feature space. The first contribution is the use of a multi-objective Genetic Algorithm for parameter pre-selection in SVM-based control charts. The second contribution is the performance evaluation of supervised kernel-based control charts.
(3) Unsupervised kernel-based control charts for anomaly intrusion detection. Two types of unsupervised kernel-based control charts are investigated: Kernel PCA control charts and Support Vector Clustering (SVC) based control charts. The application of SVC-based control charts to computer network audit data is also discussed to demonstrate the effectiveness of the proposed method.
Although the methodologies developed in this dissertation are demonstrated on computer network intrusion detection applications, they are also expected to apply to the monitoring of other complex systems whose databases consist of high-dimensional data with non-Gaussian distributions.
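As a rough illustration of the unsupervised variant in item (3), the sketch below builds a kernel PCA control chart in Python: a Hotelling-style T² statistic is computed in the reduced feature space and its control limit is set from a percentile of normal training data. The 12-attribute layout, the RBF kernel parameters, and the 99th-percentile limit are illustrative assumptions, not the dissertation's actual settings.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X_train = rng.normal(0, 1, (500, 12))              # "normal" connections, 12 attributes
X_test = np.vstack([rng.normal(0, 1, (95, 12)),
                    rng.normal(4, 1, (5, 12))])     # last five rows are injected anomalies

kpca = KernelPCA(n_components=5, kernel="rbf", gamma=0.05)
Z_train = kpca.fit_transform(X_train)
var = Z_train.var(axis=0)                           # per-component variance on normal data

def t2(Z):
    # Hotelling-style T^2 statistic computed in the kernel feature space
    return np.sum(Z ** 2 / var, axis=1)

limit = np.percentile(t2(Z_train), 99)              # control limit from normal traffic only
alarms = np.where(t2(kpca.transform(X_test)) > limit)[0]
print(alarms)                                       # indices flagged as out of control
```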
|
12 |
Parallel Stochastic Estimation on Multicore Platforms / Rosén, Olov. January 2015
The main part of this thesis concerns parallelization of recursive Bayesian estimation methods, both linear and nonlinear. Recursive estimation deals with the problem of extracting information about the parameters or states of a dynamical system from noisy measurements of the system output, and it plays a central role in signal processing, system identification, and automatic control. Solving the recursive Bayesian estimation problem is known to be computationally expensive, which often makes the methods infeasible in real-time applications and in problems of large dimension. Because the computational power of today's hardware is increased by adding more processors on a single chip rather than by raising the clock frequency and shrinking the logic circuits, parallelization is one of the most powerful ways of improving the execution time of an algorithm. The work in this thesis has found that several of the optimal filtering methods are suitable for parallel implementation in certain ranges of problem sizes. For many of the suggested parallelizations, a linear speedup in the number of cores has been achieved, providing up to an 8-times speedup on a dual quad-core computer. As the evolution of parallel computer architectures unfolds rapidly, many more processors on the same chip will soon become available. The developed methods do not, of course, scale infinitely, but they can exploit some of the computational power of the next generation of parallel platforms, allowing for optimal state estimation in real-time applications. / CoDeR-MP
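A minimal sketch of the general idea, assuming a bootstrap particle filter with a scalar state and a Gaussian measurement likelihood: the weight evaluation of the measurement update is split across worker processes, one of the places where near-linear speedup can be expected. This is not the thesis's specific parallelization; the chunking, noise level, and resampling scheme below are illustrative.

```python
import numpy as np
from multiprocessing import Pool

def weight_chunk(args):
    # Gaussian measurement likelihood for one chunk of particles (scalar state).
    particles, y, r = args
    return np.exp(-0.5 * (y - particles) ** 2 / r)

def parallel_measurement_update(particles, y, r=0.1, n_workers=4):
    # Split the weight evaluation across worker processes, then resample.
    chunks = np.array_split(particles, n_workers)
    with Pool(n_workers) as pool:
        w = np.concatenate(pool.map(weight_chunk, [(c, y, r) for c in chunks]))
    w /= w.sum()
    idx = np.random.choice(len(particles), size=len(particles), p=w)
    return particles[idx]

if __name__ == "__main__":
    prior = np.random.randn(100_000)           # prior particle cloud
    posterior = parallel_measurement_update(prior, y=0.7)
    print(posterior.mean())                     # posterior mean estimate
```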
|
13 |
Modeling and Detection of Content and Packet Flow Anomalies at Enterprise Network Gateway / Lin, Sheng-Ya. 02 October 2013
This dissertation investigates modeling techniques and computing algorithms for detecting anomalous contents and traffic flows in ingress Internet traffic at an enterprise network gateway. Anomalous contents refer to a large volume of ingress packets whose contents are not wanted by enterprise users, such as unsolicited electronic messages (UNE). UNE are often sent by botnet farms for network resource exploitation and information stealing, and they incur high costs in wasted bandwidth. Many products have been designed to block UNE, but most of them rely on signature databases for matching and cannot recognize unknown attacks. To address this limitation, this dissertation proposes a Progressive E-Message Classifier (PEC) to classify, in a timely manner, message patterns that are commonly associated with UNE. On the basis of a scoring and aging engine, a real-time scoreboard keeps track of detected feature instances until they are classified either as UNE or as normal messages. A mathematical model has been designed to precisely describe system behaviors and to set detection parameters. PEC performance is studied extensively in several experiments with different parameter settings.
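The scoring-and-aging idea behind such a scoreboard can be sketched as follows; the hit score, decay rate, and threshold are hypothetical values chosen for illustration and do not reflect PEC's actual mathematical model.

```python
from collections import defaultdict

class Scoreboard:
    """Toy scoring-and-aging scoreboard: each feature instance gains score on
    every hit and loses score as time passes; a persistently high score is
    treated as a UNE indicator (parameters are illustrative only)."""

    def __init__(self, hit_score=1.0, decay_per_sec=0.1, threshold=5.0):
        self.hit_score = hit_score
        self.decay_per_sec = decay_per_sec
        self.threshold = threshold
        self.scores = defaultdict(float)
        self.last_seen = {}

    def observe(self, feature, now):
        # Age the existing score, add the new hit, and test against the threshold.
        age = now - self.last_seen.get(feature, now)
        self.scores[feature] = max(0.0, self.scores[feature] - self.decay_per_sec * age)
        self.scores[feature] += self.hit_score
        self.last_seen[feature] = now
        return self.scores[feature] >= self.threshold

board = Scoreboard()
for t in range(10):                                  # ten hits, one second apart
    flagged = board.observe("suspicious-url-token", now=float(t))
print(flagged)                                       # True once the score accumulates
```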
The objective of anomalous traffic flow detection is to detect selfish Transmission Control Protocol (TCP) flows that do not conform to any of the handful of congestion control protocols when adjusting their packet transmission rates in the face of network congestion. Given that none of the operational parameters of congestion control are carried in the transmitted packets, a gateway can only use packet arrival times to recover the states of end-to-end congestion control rules, if any. We develop new techniques to estimate round-trip time (RTT) using an EWMA Lomb-Scargle periodogram, detect changes in the congestion window with the CUSUM algorithm, and finally predict the detected congestion flow states using a prioritized decision chain. A high-level finite state machine (FSM) takes the predictions as inputs to determine whether a TCP flow follows a particular congestion control protocol. Multiple experiments show promising outcomes in classifying flows of different protocols based on the ratio of aberrant to normal transition counts generated by the FSM.
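The congestion-window change detection step can be illustrated with a textbook one-sided CUSUM; the target, slack (k), and threshold (h) values below are generic choices rather than the dissertation's tuned parameters, and the input is a synthetic window-size proxy rather than state recovered from packet timings.

```python
import numpy as np

def one_sided_cusum(x, target, k=0.5, h=5.0):
    """Textbook upper CUSUM: accumulate deviations above target + k and raise
    an alarm whenever the cumulative sum exceeds the decision threshold h."""
    s, alarms = 0.0, []
    for i, v in enumerate(x):
        s = max(0.0, s + (v - target - k))
        if s > h:
            alarms.append(i)
            s = 0.0                      # restart the statistic after an alarm
    return alarms

rng = np.random.default_rng(1)
window_sizes = np.concatenate([rng.normal(10, 1, 50),   # stable congestion-window proxy
                               rng.normal(14, 1, 50)])  # sustained increase from sample 50
print(one_sided_cusum(window_sizes, target=10.0))
```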
|
14 |
Rare category detection using hierarchical mean shift / Vatturi, Pavan Kumar. January 1900
Thesis (M.S.)--Oregon State University, 2009. / Printout. Includes bibliographical references (leaves 45-46). Also available on the World Wide Web.
|
15 |
Anomaly Classification Through Automated Shape Grammar Representation / Whiting, Mark E. 01 August 2017
Statistical learning offers a trove of opportunities for problems where a large amount of data is available, but falls short when data are limited. For example, in medicine, statistical learning has been used to outperform dermatologists in diagnosing melanoma visually from millions of photos of skin lesions. However, many other medical applications of this kind of learning are made impossible by the lack of sufficient learning data, for example, performing a similar diagnosis of soft tissue tumors within the body based on radiological imagery of blood vessel development. A key challenge underlying this situation is that many statistical learning approaches use unstructured data representations, such as strings of text or raw images, that do not intrinsically incorporate structural information.
Shape grammar is a way of using visual rules to define the underlying structure of geometric data, pioneered by the design community. Shape grammar rules are replacement rules in which the left side of the rule is a search pattern and the right side is a replacement pattern that can replace the left side wherever it is found. Traditionally, shape grammars have been assembled by hand through observation, which makes them slow to use and limits their application to complex data. This work introduces a way to automate the generation of shape grammars and a technique to use grammars for classification in situations with limited data. A method for automatically inducing grammars from graph-based data using a simple recursive algorithm, providing non-probabilistic rulesets, is introduced. The algorithm uses iterative data segmentation to establish multi-scale shape rules, and can do so from a single dataset. Additionally, this automatic grammar induction algorithm has been extended to high-dimensional data in a nonvisual domain, for example, graphs such as social networks. We validated our method by comparing our results to hand-made grammars of historic buildings and products, and found that it performed comparably to grammars made by humans.
The induction method was extended by introducing a classification approach based on mapping grammar rule occurrences to dimensions in a high-dimensional vector space. With this representation, data samples can be analyzed and quickly classified without the need for data-intensive statistical learning. We validated this method by performing sensitivity tests on key graph augmentations and found that it was comparably sensitive to, and significantly faster at learning than, related existing methods when detecting graph differences across cases. The automated grammar technique and the grammar-based classification technique were used together to classify magnetic resonance imaging (MRI) scans of the brains of 17 individuals, and showed that our methods could detect a variety of vasculature-borne condition indicators with short- and long-term health implications. Through this study we demonstrate that automated grammar-based representations can be used for efficient classification of anomalies in abstract domains such as design and biological tissue analysis.
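A minimal sketch of the classification idea, assuming a grammar has already been induced: each sample's rule occurrence counts are mapped onto a fixed vector space so that samples can be compared with ordinary distances. The rule identifiers and counts below are hypothetical.

```python
import numpy as np

def rule_vector(rule_counts, vocabulary):
    # Map a sample's shape-grammar rule occurrence counts onto a fixed vector space.
    return np.array([rule_counts.get(r, 0) for r in vocabulary], dtype=float)

# Hypothetical rule identifiers; in practice these come from the induced grammar.
vocab = ["r1", "r2", "r3", "r4"]
healthy = rule_vector({"r1": 3, "r3": 1}, vocab)
anomalous = rule_vector({"r1": 1, "r2": 2, "r3": 1, "r4": 4}, vocab)

# A simple distance in rule space stands in for the classification step.
print(np.linalg.norm(healthy - anomalous))
```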
|
16 |
A timing approach to network-based anomaly detection for SCADA systems / Lin, Chih-Yuan. January 2020
Supervisory Control and Data Acquisition (SCADA) systems control and monitor critical infrastructure in society, such as electricity transmission and distribution systems. Modern SCADA systems are increasingly adopting open architectures, protocols, and standards, and are being connected to the Internet to enable remote control. A surge in sophisticated attacks against SCADA systems makes SCADA security a pressing issue. An Intrusion Detection System (IDS) is a security countermeasure that monitors a network and tracks unauthenticated activities inside the network. Most commercial IDSs used in general IT systems are signature-based, meaning the IDS compares system behaviors with known attack patterns. Unfortunately, recent attacks against SCADA systems exploit zero-day vulnerabilities in SCADA devices, which are undetectable by signature-based IDSs. This thesis aims to enhance SCADA system monitoring with anomaly detection, which models normal behaviors and finds deviations from the model; with anomaly detection, zero-day attacks become possible to detect. We focus on modeling the timing attributes of SCADA traffic for two reasons: (1) the timing regularity fits the automation nature of SCADA systems, and (2) the timing information (i.e., arrival time) of a packet is captured and sent by the network driver where the IDS is located, so it is less prone to intentional manipulation by an attacker than the payload of a packet.
This thesis first categorises SCADA traffic into two groups, request-response and spontaneous traffic, and studies data collected in three different protocol formats (Modbus, Siemens S7, and IEC-60870-5-104). The request-response traffic is generated by a polling mechanism. For this type of traffic, we model the inter-arrival times for each command and response pair with a statistical approach. Results presented in this thesis show that request-response traffic exists in several SCADA traffic sets collected from systems of different sizes and settings. The proposed statistical approach can detect attacks that cause only subtle changes in timing, such as a single packet insertion and TCP prediction, for two of the three SCADA protocols studied.
The spontaneous traffic is generated by remote terminal units when they see significant changes in measurement values. For this type of traffic, we first use a pattern mining approach to find the timing characteristics of the data. We then model the suggested attributes with machine learning approaches and evaluate them on traffic collected in a real power facility. We test our anomaly detection model with two types of attacks: one causes persistent anomalies and the other only intermittent ones. Our anomaly detector exhibits a 100% detection rate with at most a 0.5% false positive rate for the attacks with persistent anomalies. For the attacks with intermittent anomalies, we find our approach effective when (1) the anomalies last for a longer period (over 1 hour), or (2) the original traffic has relatively low volume.
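For the request-response traffic, a minimal stand-in for the statistical model is a per command/response pair band on inter-arrival times learned from attack-free data; the Gaussian band and 3-sigma width below are illustrative assumptions, not the thesis's exact model.

```python
import numpy as np

def fit_normal_band(inter_arrivals, z=3.0):
    # Mean +/- z standard deviations of training inter-arrival times for one
    # command/response pair define the "normal" band (illustrative only).
    mu, sigma = np.mean(inter_arrivals), np.std(inter_arrivals)
    return mu - z * sigma, mu + z * sigma

def is_anomalous(dt, band):
    lo, hi = band
    return dt < lo or dt > hi

rng = np.random.default_rng(2)
train = rng.normal(2.0, 0.05, 1000)        # e.g. a 2-second polling cycle
band = fit_normal_band(train)
print(is_anomalous(2.02, band), is_anomalous(0.4, band))   # False True
```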
|
17 |
Fine-Grained Anomaly Detection For In Depth Data Protection / Shagufta Mehnaz (9012230). 23 June 2020
Data represent a key resource for all organizations we may think of. Thus, it is not surprising that data are the main target of a large variety of attacks. Security vulnerabilities and phishing attacks make it possible for malicious software to steal business- or privacy-sensitive data and to undermine data availability, as in recent ransomware attacks. Apart from external malicious parties, insider attacks also pose serious threats to organizations with sensitive information, e.g., hospitals holding patients' sensitive records. Access control mechanisms are not always able to prevent insiders from misusing or stealing data, as insiders often hold data access permissions. Therefore, comprehensive solutions for data protection require combining access control mechanisms and other security techniques, such as encryption, with techniques for detecting anomalies in data accesses. In this thesis, we develop fine-grained anomaly detection techniques for ensuring in-depth protection of data from malicious software, specifically ransomware, and from malicious insiders. While anomaly detection techniques are very useful, in many cases the data used for anomaly detection are themselves very sensitive, e.g., health data shared with untrusted service providers for anomaly detection. The owners of such data would not share their sensitive data in plain text with an untrusted service provider, and this predicament undoubtedly hinders the desire of these individuals and organizations to become more data-driven. In this thesis, we have therefore also built a privacy-preserving framework for real-time anomaly detection.
|
18 |
Prediction and Anomaly Detection Techniques for Spatial Data / Liu, Xutong. 11 June 2013
With increasing public sensitivity and concern about environmental issues, huge amounts of spatial data have been collected, from location-based social network applications to scientific data. This has encouraged the formation of large spatial data sets and generated considerable interest in identifying novel and meaningful patterns. Allowing for correlated observations weakens the usual statistical assumption of independence and complicates spatial analysis. This research focuses on the construction of efficient and effective approaches for three main mining tasks: spatial outlier detection, robust inference for spatial data sets, and spatial prediction for large multivariate non-Gaussian data.
Spatial outlier analysis, which aims at detecting abnormal objects in spatial contexts, can help extract important knowledge in many applications. Most existing approaches suffer from the well-known masking and swamping problems and still cannot satisfy certain requirements that have arisen recently. This research focuses on the development of spatial outlier detection techniques in three aspects: spatial numerical outlier detection, spatial categorical outlier detection, and identification of the number of spatial numerical outliers.
First, this research introduces random walk based approaches to identify spatial numerical outliers. Bipartite and Exhaustive Combination weighted graphs are constructed from spatial and/or non-spatial attributes, and random walk techniques are then performed on the graphs to compute the relevance among objects. Objects with lower relevance are recognized as outliers. Second, an entropy-based method is proposed to estimate the optimal number of outliers. According to entropy theory, we expect that, as outliers are incrementally removed, the entropy value decreases sharply and reaches a stable state once all outliers have been removed. Finally, this research designs several Pair Correlation Function based methods to detect spatial categorical outliers for both single- and multiple-attribute data. Within them, a Pair Correlation Ratio (PCR) is defined and estimated for each pair of categorical combinations based on their co-occurrence frequency at different spatial distances. Observations with lower PCRs are diagnosed as potential spatial categorical outliers (SCOs).
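A minimal sketch of the random-walk relevance idea on a generic similarity graph (a PageRank-style random walk with restart); the bipartite and exhaustive-combination graph constructions from the dissertation are not reproduced here, and the restart probability is an illustrative choice.

```python
import numpy as np

def random_walk_relevance(W, alpha=0.85, iters=100):
    """Stationary visit probabilities of a random walk with restart on a
    similarity graph W; objects with low relevance are outlier candidates."""
    P = W / W.sum(axis=1, keepdims=True)       # row-stochastic transition matrix
    n = W.shape[0]
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = alpha * (r @ P) + (1 - alpha) / n  # PageRank-style power iteration
    return r

# Four objects: the last one is only weakly similar to the rest.
W = np.array([[0.0, 1.0, 1.0, 0.01],
              [1.0, 0.0, 1.0, 0.01],
              [1.0, 1.0, 0.0, 0.01],
              [0.01, 0.01, 0.01, 0.0]]) + 1e-9
print(random_walk_relevance(W))                # smallest value marks the outlier candidate
```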
Spatial kriging is a widely used predictive model whose accuracy can be significantly compromised if the observations are contaminated by outliers. Also, due to spatial heterogeneity, observations are often of different types. The prediction of multivariate spatial processes plays an important role when there are cross-spatial dependencies between multiple responses. In addition, given the large volume of spatial data, prediction is computationally challenging. These issues raise three research topics: 1) robust prediction for spatial data sets; 2) prediction of multivariate spatial observations; and 3) efficient processing of large data sets.
First, the robustness of the spatial kriging model can be systematically increased by integrating heavy-tailed distributions; however, exact inference then becomes analytically intractable. We present a novel Robust and Reduced-Rank Spatial Kriging Model (R³-SKM), which is resilient to the influence of outliers and allows for fast spatial inference. Second, this research introduces a flexible hierarchical Bayesian framework that permits the simultaneous modeling of mixed-type variables. Specifically, the mixed-type attributes are mapped to latent numerical random variables that are multivariate Gaussian in nature. Finally, knot-based techniques are utilized to model the predictive process as a reduced-rank spatial process, projecting the process realizations of the spatial model onto a lower-dimensional subspace. This projection significantly reduces the computational cost. / Ph. D.
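A rough 1-D sketch of knot-based, reduced-rank spatial prediction under simplifying assumptions (zero mean, squared-exponential covariance, known noise variance): the full covariance matrix is replaced by its projection onto a small knot set, which is the source of the computational savings. This illustrates the general predictive-process idea only, not the R³-SKM inference procedure.

```python
import numpy as np

def rbf(a, b, ell=1.0):
    # Squared-exponential covariance between 1-D location sets a and b.
    d = np.abs(a[:, None] - b[None, :])
    return np.exp(-0.5 * (d / ell) ** 2)

def reduced_rank_krige(s_obs, y, s_new, knots, noise=0.1):
    """Knot-based (predictive-process style) kriging in 1-D: the full covariance
    is replaced by its projection onto a small set of knots."""
    C_star_inv = np.linalg.inv(rbf(knots, knots) + 1e-8 * np.eye(len(knots)))
    C_obs_knots = rbf(s_obs, knots)
    K = C_obs_knots @ C_star_inv @ C_obs_knots.T        # low-rank covariance
    w = np.linalg.solve(K + noise * np.eye(len(s_obs)), y)
    return rbf(s_new, knots) @ C_star_inv @ C_obs_knots.T @ w

rng = np.random.default_rng(3)
s = np.linspace(0, 10, 200)
y = np.sin(s) + 0.1 * rng.standard_normal(200)
print(reduced_rank_krige(s, y, np.array([2.5, 7.5]), np.linspace(0, 10, 15)))
```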
|
19 |
Deep adaptive anomaly detection using an active learning framework / Sekyi, Emmanuel. 18 April 2023 (PDF)
Anomaly detection is the process of finding unusual events in a given dataset. It is often performed on datasets with a fixed set of predefined features; as a result, if the normal features closely resemble the anomalous ones, most anomaly detection algorithms perform poorly. This work seeks to answer the question: can we deform these features so as to make the anomalies stand out and hence improve the anomaly detection outcome? We employ a deep learning and active learning framework to learn features for anomaly detection. In active learning, an oracle (usually a domain expert) labels a small amount of data over a series of training rounds, and the deep neural network is trained after each round to incorporate the oracle's feedback into the model. Results on the MNIST, CIFAR-10, and Galaxy Zoo datasets show that our algorithm, Ahunt, significantly outperforms other anomaly detection algorithms run on a fixed, static set of features. Ahunt can therefore overcome an initial choice of features that happens to be suboptimal for detecting anomalies in the data by learning more appropriate ones. We also explore the role of the loss function and the active learning query strategy, showing that both are important, especially when there is significant variation among the anomalies.
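A minimal sketch of the generic query-the-oracle loop, using an off-the-shelf detector in place of a deep network; Ahunt's contribution, retraining the network's features with the oracle's feedback after each round, is noted in a comment but not implemented here. The dataset, query budget, and number of rounds are illustrative.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (500, 2)),      # normal points
               rng.normal(6, 1, (10, 2))])      # a handful of true anomalies
truth = np.array([0] * 500 + [1] * 10)

def oracle(i):
    # Stand-in for the human expert who labels queried samples each round.
    return truth[i]

labeled, labels = [], []
for rnd in range(3):                             # a few active-learning rounds
    detector = IsolationForest(random_state=rnd).fit(X)
    scores = detector.score_samples(X)           # lower score = more anomalous
    queries = [i for i in np.argsort(scores) if i not in labeled][:5]
    for i in queries:
        labeled.append(i)
        labels.append(oracle(i))
    # In Ahunt, the oracle's labels would now be used to retrain the deep
    # feature extractor; that step is omitted in this generic sketch.

print(sum(labels), "of", len(labels), "queried samples confirmed anomalous")
```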
|
20 |
Tuning and Optimising Concept Drift Detection / Do, Ethan Quoc-Nam. January 2021
Data drifts naturally occur in data streams due to seasonality, changes in data usage, and changes in the data generation process. Concepts modelled from the data streams will also experience such drift. Differentiating concept drift from anomalies is important for identifying normal versus abnormal behaviour, yet existing techniques achieve poor responsiveness and accuracy on this differentiation task.
We take two approaches to address this problem. First, we extend an existing sliding window algorithm to include multiple windows that model recently seen data stream patterns, and we define new parameters for comparing the data streams. Second, we study a set of optimisers and tune the parameters of a Bi-LSTM model to maximise accuracy. / Thesis / Master of Applied Science (MASc)
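A toy two-window comparison conveys the multiple-window idea: a short recent window is checked against a longer reference window, and a large gap between their summary statistics is flagged as drift. The window sizes, the mean-difference test, and the threshold are illustrative stand-ins for the thesis's extended algorithm and parameters.

```python
from collections import deque
import numpy as np

class TwoWindowDriftDetector:
    """Toy two-window scheme: a short recent window is compared against a longer
    reference window; a large gap between their means is flagged as drift."""

    def __init__(self, ref_size=500, recent_size=50, delta=1.0):
        self.ref = deque(maxlen=ref_size)
        self.recent = deque(maxlen=recent_size)
        self.delta = delta

    def add(self, x):
        self.ref.append(x)
        self.recent.append(x)
        if len(self.recent) < self.recent.maxlen:
            return False                      # not enough recent data yet
        return abs(np.mean(self.recent) - np.mean(self.ref)) > self.delta

rng = np.random.default_rng(5)
stream = np.concatenate([rng.normal(0, 1, 600),    # stable regime
                         rng.normal(3, 1, 100)])   # drifted regime
det = TwoWindowDriftDetector()
alarms = [i for i, x in enumerate(stream) if det.add(x)]
print(alarms[:3])                                  # first indices where drift is flagged
```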
|