1 |
Time-variant normal profiling for anomaly detection systems / Kim, Jung Yeop. January 2008
Thesis (Ph.D.)--University of Wyoming, 2008. / Title from PDF title page (viewed on August 3, 2009). Includes bibliographical references (p. 73-84).
|
2 |
Anomaly detection with Machine learning : Quality assurance of statistical data in the Aid community / Blomquist, Hanna, Möller, Johanna January 2015
The overall purpose of this study was to find a way to identify incorrect data in Sida’s statistics about their contributions. A contribution is the financial support given by Sida to a project. The goal was to build an algorithm that determines whether a contribution is at risk of being inaccurately coded, based on supervised classification methods from the area of machine learning. A thorough data analysis was carried out in order to train a model to find hidden patterns in the data. Descriptive features containing important information about the contributions were successfully selected and used for this task. These included keywords retrieved from descriptions of the contributions. Two machine learning methods, AdaBoost and Support Vector Machines, were tested on ten classification models. Each model was evaluated on its accuracy in predicting the correct class of the target variable. A misclassified component was more likely to be incorrectly coded and was also seen as an anomaly. The AdaBoost method performed better and more steadily on the majority of the models. Six classification models built with the AdaBoost method were combined into one final ensemble classifier. This classifier was verified with new, unseen data, and an anomaly score was calculated for each component. The higher the score, the higher the risk of being anomalous. The result was a ranked list in which the most anomalous components were prioritized for further investigation by staff at Sida.
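The ranking step described above can be illustrated with a small sketch. The abstract does not say how the anomaly score is computed, so this example assumes, purely for illustration, that the score is the fraction of ensemble members whose prediction disagrees with the recorded code. The function names and the sample data are hypothetical.

```python
def anomaly_score(recorded_label, predictions):
    """Fraction of ensemble members that disagree with the recorded label.

    Assumption: the thesis abstract does not define the score; this
    disagreement ratio is only one plausible choice.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    disagreements = sum(1 for p in predictions if p != recorded_label)
    return disagreements / len(predictions)


def rank_components(components):
    """Sort (component_id, recorded_label, predictions) by descending score."""
    scored = [(cid, anomaly_score(label, preds)) for cid, label, preds in components]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical data: six classifiers vote on the code of each component.
components = [
    ("A", "x", ["x", "x", "x", "x", "x", "x"]),
    ("B", "x", ["y", "y", "y", "y", "x", "x"]),
    ("C", "x", ["y", "y", "y", "y", "y", "y"]),
]
ranking = rank_components(components)
```

Components at the top of `ranking` would be the first candidates for manual review.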
|
3 |
Incremental Anomaly Detection Using Two-Layer Cluster-based Structure / Bigdeli, Elnaz January 2016
Anomaly detection algorithms face several challenges, including processing
speed and dealing with noise in data. In this thesis, a two-layer cluster-
based anomaly detection structure is presented which is fast, noise-resilient
and incremental. In this structure, each normal pattern is considered as
a cluster, and each cluster is represented using a Gaussian Mixture Model
(GMM). Then, new instances are presented to the GMM to be labeled as
normal or abnormal.
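As a rough illustration of labeling a new instance against per-cluster GMMs, the sketch below evaluates a one-dimensional mixture density for each cluster and compares the best density with a threshold. The one-dimensional setting, the threshold value, and all names are assumptions made for this example; the thesis abstract does not specify them.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def gmm_density(x, components):
    """Mixture density; components is a list of (weight, mean, std)."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


def label_instance(x, cluster_gmms, threshold):
    """Label x as normal if any cluster's GMM explains it well enough."""
    best = max(gmm_density(x, gmm) for gmm in cluster_gmms)
    return "normal" if best >= threshold else "abnormal"


# Hypothetical normal profile: two clusters, each summarized by a GMM.
cluster_gmms = [
    [(0.6, 0.0, 1.0), (0.4, 2.0, 0.5)],
    [(1.0, 10.0, 1.0)],
]
THRESHOLD = 1e-3  # assumed cut-off on the density
```

With these values, points near 0, 2 or 10 are labeled normal, while a point such as 5.0 falls between the clusters and is labeled abnormal.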
The proposed structure comprises three main steps. In the first step, the
data are clustered. The second step is to represent each cluster in a way
that enables the model to classify new instances. The Summarization based
on Gaussian Mixture Model (SGMM) proposed in this thesis represents each
cluster as a GMM.
In the third step, a two-layer structure efficiently updates clusters using
GMM representation while detecting and ignoring redundant instances. A
new approach, called Collective Probabilistic Labeling (CPL), is presented
to update clusters in a batch mode. This approach makes the updating
phase noise-resistant and fast. The collective approach also introduces a new
concept called 'rag bag' used to store new instances. The new instances
collected in the rag bag are clustered and summarized by GMMs. This
enables online systems to identify nearby clusters among the existing and new
clusters, and to merge them quickly to update the model, despite the presence of noise.
An important step in the updating is the merging of new clusters with
existing ones. To this end, a new distance measure is proposed, which is a
modified Kullback-Leibler distance between two GMMs. This modified distance
allows accurate identification of nearby clusters. After finding neighboring
clusters, they are merged quickly and accurately. One of the reasons that
GMM is chosen to represent clusters is to have a clear and valid mathematical
representation for clusters, which eases further cluster analysis.
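The abstract does not define the modified Kullback-Leibler distance, so it is not reproduced here. As background only, the sketch below shows the standard closed-form KL divergence between two univariate Gaussians, a symmetrized version of it, and a hypothetical merge test based on an assumed distance threshold.

```python
import math


def kl_gaussian(mean_p, std_p, mean_q, std_q):
    """KL(P || Q) for univariate normals P and Q (standard closed form)."""
    return (
        math.log(std_q / std_p)
        + (std_p ** 2 + (mean_p - mean_q) ** 2) / (2.0 * std_q ** 2)
        - 0.5
    )


def symmetric_kl(p, q):
    """Symmetrized divergence; p and q are (mean, std) tuples."""
    return kl_gaussian(*p, *q) + kl_gaussian(*q, *p)


def should_merge(p, q, max_distance):
    """Hypothetical merge rule: clusters closer than max_distance are merged."""
    return symmetric_kl(p, q) <= max_distance
```

A symmetrized form is used because plain KL divergence is not symmetric, which makes it awkward as a notion of "nearby"; whether the thesis symmetrizes in this way is not stated.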
In most real-time anomaly detection applications, incoming instances are
often similar to previous ones. In these cases, there is no need to update
clusters based on duplicates, since they have already been modeled in the
cluster distribution. The two-layer structure is responsible for identifying
redundant instances. In this structure, redundant instances are ignored, and
the remaining new instances are used to update clusters. Ignoring redundant
instances, which are typically in the majority, makes the detection phase fast.
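The idea of skipping redundant instances can be sketched as a simple pre-filter. How the thesis decides that an instance is redundant is not stated in the abstract; this example assumes, for illustration only, that an instance is redundant when it lies within `k` standard deviations of an existing one-dimensional cluster. All names and values are hypothetical.

```python
def is_redundant(x, clusters, k=1.0):
    """True if x is already well covered by some cluster (mean, std)."""
    return any(abs(x - mean) <= k * std for mean, std in clusters)


def instances_for_update(stream, clusters, k=1.0):
    """Keep only the instances that carry new information for the model."""
    return [x for x in stream if not is_redundant(x, clusters, k)]


# Hypothetical existing clusters and incoming instances.
clusters = [(0.0, 1.0), (10.0, 2.0)]
stream = [0.2, 0.5, 3.0, 9.0, 25.0, -0.9]
kept = instances_for_update(stream, clusters)
```

Only the instances in `kept` would be passed on to the more expensive cluster-update step.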
Each part of the general structure is validated in this thesis. The experiments
cover detection rates, clustering goodness, time, memory usage
and the complexity of the algorithms. The accuracy of the clustering and
summarization of clusters using GMMs is evaluated, and compared to that of
other methods. Measured with the Davies-Bouldin (DB) and Dunn indexes, the distance
between original clusters and clusters regenerated from GMMs is almost zero with the SGMM
method, while this value for ABACUS is around 0.01. Moreover, the results
show that the SGMM algorithm is 3 times faster than ABACUS in running
time, using one-third of the memory used by ABACUS.
The CPL method, used to label new instances, is found to collectively
remove the effect of noise, while increasing the accuracy of labeling new
instances. In a noisy environment, the detection rate of the CPL method
is 5% higher than that of other algorithms such as one-class SVM. The false alarm rate is decreased by 10% on average. Memory use is 20 times less than that
of the one-class SVM.
The proposed method is found to lower the false alarm rate, which is
one of the basic problems for the one-class SVM. Experiments show that, with the
two-layer structure, the false alarm rate is decreased by 5% to 15% across different
datasets, while the detection rate is increased by 5% to 10%.
The memory usage for the two-layer structure is 20 to 50
times less than that of one-class SVM. One-class SVM uses support vectors in
labeling new instances, while the labeling of the two-layer structure depends
on the number of GMMs. The experiments show that the two-layer structure
is 20 to 50 times faster than the one-class SVM in labeling new instances.
Moreover, the updating time of the two-layer structure is 2 to 3 times less than
that of the one-layer structure. This reduction is the direct result of ignoring redundant
instances and using the two-layer structure.
|
4 |
Anomaly Detection in Univariate Time Series Data in the Presence of Concept Drift / Zamani Alavijeh, Soroush January 2021
Digital applications and devices record data over time to enable users and managers to monitor their activity. Errors occur in data, including time series data, for various reasons including software system failures and human errors. The problem of identifying errors in time series data, also referred to as anomaly detection, is a well-studied topic among data management and systems researchers. Such data are often recorded in dynamic environments where a change in the standard or the recording hardware can result in different and novel patterns arising in the data. Such novel patterns are caused by what is referred to as concept drift. Concept drift occurs when the statistical properties of the data, e.g. its distribution, change over time. The problem of identifying anomalies in time series data recorded and stored in dynamic environments has not been extensively studied. In this study, we focus on this problem. We propose and implement a unified framework that is able to identify drifts in univariate time series data and incorporate information gained from the data to train a learning model that is able to detect anomalies in unseen univariate time series data. / Thesis / Master of Science (MSc)
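The difference between an isolated anomaly and a concept drift can be made concrete with a small sketch. It is not the framework proposed in the thesis, whose details are not given in the abstract. It assumes a reference window of recent normal values, flags a single value as anomalous by its z-score, and flags drift when the median of a whole recent window moves away from the reference mean. The thresholds and names are hypothetical.

```python
import statistics


def _mean_and_std(reference):
    """Mean and population standard deviation, guarding against zero spread."""
    mean = statistics.mean(reference)
    std = statistics.pstdev(reference) or 1e-9
    return mean, std


def is_anomaly(x, reference, threshold=3.0):
    """A single value is anomalous if it is far from the reference window."""
    mean, std = _mean_and_std(reference)
    return abs(x - mean) / std > threshold


def has_drifted(reference, recent, threshold=3.0):
    """Drift: the median of the recent window has moved away from the reference.

    The median is used so that one spike in the recent window does not
    look like a change of regime.
    """
    mean, std = _mean_and_std(reference)
    return abs(statistics.median(recent) - mean) / std > threshold


reference = [10, 11, 9, 10, 10]
spike_window = [10, 50, 10, 11, 9]   # one outlier, same regime
shift_window = [20, 21, 19, 20, 20]  # the level itself has changed
```

Once drift is flagged, a value such as 20 should be judged against the new window rather than the old one, which is why drift information matters for the anomaly detector.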
|
5 |
An information based approach to anomaly detection in dynamic systems / Oh, Ki-Tae January 1995
No description available.
|
6 |
Anomaly detection from aviation safety reports / Raghuraman, Suraj, January 2008
Thesis (M.S.)--University of Texas at Dallas, 2008. / Includes vita. Includes bibliographical references (leaves 39-40)
|
7 |
Anomaly Detection in Aeroacoustic Wind Tunnel Experiments / Defreitas, Aaron Chad 27 October 2021
Wind tunnel experiments often employ a wide variety and large number of sensor systems. Anomalous measurements occurring without the knowledge of the researcher can be devastating to the success of costly experiments; therefore, anomaly detection is of great interest to the wind tunnel community. Currently, anomaly detection in wind tunnel data is a manual procedure. A researcher will analyze the quality of measurements, such as monitoring for pressure measurements outside of an expected range or additional variability in a time averaged quantity. More commonly, the raw data must be fully processed to obtain near-final results during the experiment for an effective review.
Rapid anomaly detection methods are desired to ensure the quality of a measurement and reduce the load on the researcher. While there are many effective methodologies for anomaly detection used throughout the wider engineering research community, they have not been demonstrated in wind tunnel experiments. Wind tunnel experimentation is unique in the sense that many repeat measurements are not typical. Typically, this will only occur if an anomaly has been identified. Since most anomaly detection methodologies rely on well-resolved knowledge of a measurement to uncover the expected uncertainties, they can be difficult to apply in the wind tunnel setting.
First, the analysis will focus on pressure measurements around an airfoil and its wake. Principal component analysis (PCA) will be used to build a measurement expectation by linear estimation. A covariance matrix will be constructed from experimental data to be used in the PCA scheme. This covariance matrix represents both the strong deterministic relations dependent on experimental configuration as well as random uncertainty. Through principles of ideal flow, a method to normalize geometrical changes to improve measurement expectations will be demonstrated. Measurements from a microphone array, another common system employed in aeroacoustic wind tunnels, will be analyzed similarly through evaluation of the cross-spectral matrix of microphone data, with minimal repeat measurements. A spectral projection method will be proposed that identifies unexpected acoustic source distributions. Analysis of good and anomalous measurements shows this methodology is effective. Finally, a machine learning technique will be investigated for an experimental situation where repeat measurements of a known event are readily available. A convolutional neural network for feature detection will be shown in the context of audio detection.
This dissertation presents techniques for anomaly detection in sensor systems commonly used in wind tunnel experiments. The presented work suggests that these anomaly identification techniques can be easily introduced into aeroacoustic experiment methodology, minimizing tunnel down time, and reducing cost. / Doctor of Philosophy / Efficient detection of anomalies in wind tunnel experiments would reduce the cost of experiments and increase their effectiveness. Currently, manual inspection is used to detect anomalies in wind tunnel measurements. A researcher may analyze measurements during an experiment, for instance, monitoring for pressure measurements outside of an expected range or additional variability in a time averaged quantity. More commonly, the raw data must be fully processed to obtain near-final results to determine quality.
In this dissertation, many methods, which can assist the wind tunnel researcher in reviewing measurements, are developed and tested. First, a method to simultaneously monitor pressure measurements and wind tunnel environment measurements is developed with a popular linear algebra technique called Principal Component Analysis (PCA). The novelty in using PCA is that measurements in wind tunnels are often not repeated. Instead, the proposed method uses a large number of independent measurements acquired in various conditions and fundamental aspects of fluid mechanics to train the detection algorithm.
Another wind tunnel system which is considered is a microphone array. A microphone array is a collection of microphones arranged in known locations. Current methods to assess the quality of the output data from this system require extended computation and review time during an experiment. A method parallel to PCA is used to rapidly determine if an anomaly is present in the measurement. This method does not require the extra computation necessary to see what the microphone array has observed and simplifies the quantities assessed for anomalies. While this is not a replacement for complete computation of the results associated with microphone array measurements, this can take most of the effort out of the experiment time and relegate detailed review to a time after the experiment is complete.
Finally, an application of machine learning is discussed with an alternate application outside of the wind tunnel. This work explores the usefulness of a convolutional neural network (CNN) for cough detection. This can be similarly applied to detect anomalies in audio data if searching for specific anomalies with known characteristics. CNNs, in general, require much effort to train and operate effectively but are not dependent on the application or data type. These methods could be applied to a wind tunnel experiment.
Overall, the work in this dissertation shows many techniques which can be implemented into current wind tunnel operations to improve the efficiency and effectiveness of the data review process.
|
8 |
Semi-supervised and Self-evolving Learning Algorithms with Application to Anomaly Detection in Cloud Computing / Pannu, Husanbir Singh 12 1900
Semi-supervised learning (SSL) is the most practical approach for classification among machine learning algorithms. It is similar to the human way of learning and thus has great applications in text/image classification, bioinformatics, artificial intelligence, robotics, etc. Labeled data is hard to obtain in real-life experiments and may need human experts with experimental equipment to mark the labels, which can be slow and expensive. Unlabeled data, however, is easily available in the form of web pages, data logs, images, audio, video files and DNA/RNA sequences. SSL uses a large amount of unlabeled data and a small amount of labeled data to build better classifying functions, which achieve higher accuracy and need less human effort. Thus it is of great empirical and theoretical interest. We contribute two SSL algorithms, (i) adaptive anomaly detection (AAD) and (ii) hybrid anomaly detection (HAD), which are self-evolving and very efficient at detecting anomalies in large-scale and complex data distributions. Our algorithms are capable of modifying an existing classifier by both retiring old data and adding new data. This characteristic enables the proposed algorithms to handle massive and streaming datasets where other existing algorithms fail and run out of memory. As an application to semi-supervised anomaly detection and for experimental illustration, we have implemented a prototype of the AAD and HAD systems and conducted experiments in an on-campus cloud computing environment. Experimental results show that the detection accuracy of both algorithms improves as they evolve and can achieve 92.1% detection sensitivity and 83.8% detection specificity, which makes them well suited for anomaly detection in large and streaming datasets. We compared our algorithms with two popular SSL methods: (i) subspace regularization and (ii) an ensemble of Bayesian sub-models and decision tree classifiers.
Our contributed algorithms are easy to implement and significantly better than these two methods in terms of space complexity, time complexity and accuracy for semi-supervised anomaly detection.
|
9 |
Featured anomaly detection methods and applications / Huang, Chengqiang January 2018
Anomaly detection is a fundamental research topic that has been widely investigated. From critical industrial systems, e.g., network intrusion detection systems, to people’s daily activities, e.g., mobile fraud detection, anomaly detection has become the very first vital resort to protect and secure public and personal properties. Although anomaly detection methods have been under consistent development over the years, the explosive growth of data volume and the continued dramatic variation of data patterns pose great challenges to anomaly detection systems and are fuelling a great demand for more intelligent anomaly detection methods with distinct characteristics to cope with various needs. To this end, this thesis starts by presenting a thorough review of existing anomaly detection strategies and methods. The advantages and disadvantages of the strategies and methods are elaborated. Afterward, four distinctive anomaly detection methods, especially for time series, are proposed in this work, aiming at resolving specific needs of anomaly detection under different scenarios, e.g., enhanced accuracy, interpretable results, and self-evolving models. Experiments are presented and analysed to offer a better understanding of the performance of the methods and their distinct features. To be more specific, the abstracts of the key contents in this thesis are listed as follows: 1) Support Vector Data Description (SVDD) is investigated as a primary method to fulfill accurate anomaly detection. The applicability of SVDD over noisy time series datasets is carefully examined and it is demonstrated that relaxing the decision boundary of SVDD always results in better accuracy in network time series anomaly detection. Theoretical analysis of the parameter utilised in the model is also presented to ensure the validity of the relaxation of the decision boundary.
2) To support a clear explanation of the detected time series anomalies, i.e., anomaly interpretation, the periodic pattern of time series data is considered as the contextual information to be integrated into SVDD for anomaly detection. The formulation of SVDD with contextual information maintains multiple discriminants which help in distinguishing the root causes of the anomalies. 3) In an attempt to further analyse a dataset for anomaly detection and interpretation, Convex Hull Data Description (CHDD) is developed for realising one-class classification together with data clustering. CHDD approximates the convex hull of a given dataset with the extreme points which constitute a dictionary of data representatives. According to the dictionary, CHDD is capable of representing and clustering all the normal data instances so that anomaly detection is realised with certain interpretation. 4) Besides better anomaly detection accuracy and interpretability, better solutions for anomaly detection over streaming data with evolving patterns are also researched. Under the framework of Reinforcement Learning (RL), a time series anomaly detector that is consistently trained to cope with the evolving patterns is designed. Due to the fact that the anomaly detector is trained with labeled time series, it avoids the cumbersome work of threshold setting and the uncertain definitions of anomalies in time series anomaly detection tasks.
|
10 |
Lightweight Network Intrusion Detection / Chen, Ya-lin 26 July 2005
Exploit codes based on system vulnerabilities are often used by attackers to attack target computers or services. Such exploit programs often send attack packets within the first few packets right after a connection is established with the target machine or service, and such attacks are often launched via the Telnet service. A lightweight network-based intrusion detection system is proposed for detecting such attacks in Telnet traffic.
The proposed system filters the first few packets after each Telnet connection is established and uses only part of each packet, rather than all of it, to detect intrusions; this design greatly reduces system load. The research takes an anomaly detection approach. The proposed system characterizes normal traffic behavior and constructs a normal model from the filtered normal traffic. In the detection phase, the system measures the deviation of the current filtered packet from the normal model via an anomaly score function, i.e. a more deviant packet receives a higher anomaly score. Finally, we use the 1999 DARPA Intrusion Detection Evaluation Data Set, which contains 5 days of training data, 10 days of testing data, and 44 attack instances of 16 attack types, to evaluate the proposed system. The proposed system has a detection rate of 73% at a low false alarm rate of 2 false alarms per day, and 80% for the hard-to-detect attacks, which were poorly detected in the 1999 DARPA IDEP.
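A minimal sketch of a score of this kind follows. It is not the score function from the thesis, which the abstract does not define. The example assumes that the normal model is a table of byte-value counts taken from the first few payload bytes of normal packets, and that the score is the summed negative log-probability of a packet's bytes under that table, with add-one smoothing. `PREFIX_LEN` and the sample payloads are hypothetical.

```python
import math
from collections import Counter

PREFIX_LEN = 8  # assumed number of payload bytes inspected per packet


def train_normal_model(normal_payloads, prefix_len=PREFIX_LEN):
    """Count byte values seen in the inspected prefix of normal payloads."""
    counts = Counter()
    for payload in normal_payloads:
        counts.update(payload[:prefix_len])
    return counts


def anomaly_score(payload, counts, prefix_len=PREFIX_LEN):
    """Sum of -log P(byte) over the inspected prefix; higher means more deviant."""
    total = sum(counts.values())
    score = 0.0
    for byte in payload[:prefix_len]:
        # Add-one smoothing over the 256 possible byte values.
        probability = (counts[byte] + 1) / (total + 256)
        score -= math.log(probability)
    return score


# Hypothetical training traffic and two packets to score.
model = train_normal_model([b"login: a", b"login: b", b"login: c"])
normal_score = anomaly_score(b"login: a", model)
attack_score = anomaly_score(b"\x90" * 8, model)
```

Bytes never seen in normal traffic receive the smallest smoothed probability, so the second packet scores higher than the first. A detection threshold on the score would then be tuned against the tolerated false alarm rate.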
|