151. Log message anomaly detection using machine learning. Farzad, Amir. 05 July 2021.
Log messages are one of the most valuable sources of information in the cloud and other software systems. These logs can be used for audits and for ensuring system security. Many millions of log messages are produced each day, which makes anomaly detection challenging. Automating the detection of anomalies can save time and money as well as improve detection performance. In this dissertation, Deep Learning (DL) methods called Auto-LSTM, Auto-BLSTM and Auto-GRU are developed for log message anomaly detection. They are evaluated using four data sets, namely BGL, OpenStack, Thunderbird and IMDB. The first three are popular log data sets while the fourth is a movie review data set used for sentiment classification. The results obtained show that Auto-LSTM, Auto-BLSTM and Auto-GRU perform better than other well-known algorithms.
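(A minimal sketch of the autoencoder idea these models share, assuming log messages have already been converted to fixed-length numeric sequences; layer sizes, data, and the threshold rule are illustrative assumptions, not the dissertation's configuration.)

```python
# Illustrative LSTM autoencoder: train on normal sequences only, then flag
# inputs whose reconstruction error exceeds a threshold. All shapes and
# hyperparameters are assumptions for demonstration.
import numpy as np
from tensorflow.keras import layers, models

timesteps, features = 50, 1
model = models.Sequential([
    layers.Input(shape=(timesteps, features)),
    layers.LSTM(64),                         # encode sequence to a vector
    layers.RepeatVector(timesteps),          # repeat it for the decoder
    layers.LSTM(64, return_sequences=True),  # decode back to a sequence
    layers.TimeDistributed(layers.Dense(features)),
])
model.compile(optimizer="adam", loss="mse")

x_normal = np.random.rand(256, timesteps, features)  # placeholder data
model.fit(x_normal, x_normal, epochs=5, verbose=0)

errors = np.mean((model.predict(x_normal) - x_normal) ** 2, axis=(1, 2))
threshold = np.percentile(errors, 95)  # sequences above this are suspects
```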
Dealing with imbalanced data is one of the main challenges in Machine Learning (ML)/DL classification. This issue is especially important with log message data, which is typically very imbalanced because negative logs are rare. Hence, a model is proposed that generates text log messages using a Sequence Generative Adversarial Network (SeqGAN). Features are then extracted using an Autoencoder, and anomaly detection is done using a GRU network. The proposed model is evaluated with two imbalanced log data sets, namely BGL and OpenStack. Results are presented which show that oversampling and balancing the data increases the accuracy of anomaly detection and classification.
Another challenge in anomaly detection is dealing with unlabeled data. Labeling even a small portion of logs for model training may not be possible due to the high volume of generated logs. To deal with unlabeled data, an unsupervised model for log message anomaly detection is proposed which employs Isolation Forest and two deep Autoencoder networks. The Autoencoder networks are used for training and feature extraction, and then for anomaly detection, while Isolation Forest is used for positive sample prediction. The proposed model is evaluated using the BGL, OpenStack and Thunderbird log message data sets. The results obtained show that the number of negative samples predicted to be positive is low, especially with Isolation Forest and one Autoencoder. Further, the results are better than with other well-known models.
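(A minimal sketch of such a pipeline, with the autoencoder's feature-extraction step reduced to a placeholder and scikit-learn's IsolationForest doing the isolation step; all settings are illustrative.)

```python
# Illustrative Isolation Forest stage: score encoded log features by how
# easily each point can be isolated; the autoencoder that would produce
# these features is replaced by random placeholder data.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 8))   # stand-in for autoencoder features
features[:10] += 6                      # a few injected outliers

iso = IsolationForest(contamination=0.01, random_state=0).fit(features)
labels = iso.predict(features)          # +1 = normal, -1 = anomaly
print("flagged as anomalous:", int(np.sum(labels == -1)))
```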
A hybrid log message anomaly detection technique is proposed which uses pruning of positive and negative logs. Reliable positive log messages are first identified using a Gaussian Mixture Model (GMM) algorithm. Then reliable negative logs are selected iteratively using the K-means, GMM and Dirichlet Process Gaussian Mixture Model (BGM) methods. It is shown that the precision for positive and negative logs with pruning is high. Anomaly detection is done using a Long Short-Term Memory (LSTM) network. The proposed model is evaluated using the BGL, OpenStack and Thunderbird data sets. The results obtained indicate that the proposed model performs better than several well-known algorithms.
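(A minimal sketch of the pruning idea, using scikit-learn's mixture models; thresholds and shapes are illustrative, and BayesianGaussianMixture stands in for the Dirichlet-process BGM.)

```python
# Illustrative GMM-based pruning: fit a mixture, then keep only the points
# assigned to a component with high posterior probability as "reliable".
import numpy as np
from sklearn.mixture import BayesianGaussianMixture, GaussianMixture

X = np.random.default_rng(1).normal(size=(500, 4))   # stand-in log features
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
posterior = gmm.predict_proba(X).max(axis=1)
reliable = X[posterior > 0.95]       # high-confidence points survive pruning

# A Dirichlet-process mixture (the thesis's BGM) for the iterative step:
bgm = BayesianGaussianMixture(n_components=5, random_state=0).fit(reliable)
```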
Last, an anomaly detection method is proposed using radius-based Fuzzy C-means (FCM) clustering, with more clusters than the number of data classes, together with a Multilayer Perceptron (MLP) network. The cluster centers and a radius are used to select reliable positive and negative log messages. Moreover, class probabilities are used together with an expert to correct the network output for suspect logs. The proposed model is evaluated with three well-known data sets, namely BGL, OpenStack and Thunderbird. The results obtained show that this model provides better results than existing methods.
152. Anomaly detection for automated security log analysis: Comparison of existing techniques and tools. Fredriksson Franzén, Måns; Tyrén, Nils. January 2021.
Logging security-related events is becoming increasingly important for companies. Log messages can be used for surveillance of a system or to assess the damage caused in the event of, for example, an intrusion. Typically, large quantities of log messages are produced, making manual inspection for traces of unwanted activity quite difficult. It is therefore desirable to automate the process of analysing log messages. One way of finding suspicious behavior within log files is to set up rules that trigger alerts when certain log messages fit the criteria. However, this requires prior knowledge about the system and about what kinds of security issues can be expected, meaning that novel attacks will not be detected with this approach. It can also be very difficult to determine what constitutes normal and abnormal behavior. A potential solution to this problem is machine learning and anomaly-based detection. Anomaly detection is the process of finding patterns that do not conform to a defined notion of normal behavior. This thesis examines the process of going from raw log data to finding anomalies. Both existing log analysis tools and our own proof-of-concept implementation are used for the analysis. With the use of labeled log data, our implementation was able to reach a precision of 73.7% and a recall of 100%. The advantages and disadvantages of creating our own implementation, as opposed to using an existing tool, are presented and discussed along with several insights from the field of anomaly detection for log analysis.
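(The reported figures correspond to the standard precision and recall computations on labeled data; a toy sketch with placeholder labels:)

```python
# Precision: fraction of raised alerts that are true anomalies.
# Recall: fraction of true anomalies that are caught. Labels are placeholders.
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 0, 0, 1, 0, 1]   # 1 = anomalous log line
y_pred = [1, 1, 1, 0, 1, 0, 1]   # detector output
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred))
```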
153. Anomaly Detection in Log Files Using Machine Learning. Björnerud, Philip. January 2021.
Logs generated by applications, devices, and servers contain information that can be used to determine the health of the system. Manual inspection of logs is important, for example during upgrades, to determine whether the upgrade and data migration were successful. However, manual inspection of logs is tedious, time-consuming, and not reliable enough. In this thesis, we propose to use the machine learning techniques K-means and DBSCAN to find anomalous sequences in log files. This research also investigated two data representation techniques: feature vector representation and IDF representation. Evaluation metrics such as the F1 score, recall, and precision were used to analyze the performance of the applied machine learning algorithms. The study found that the algorithms differ considerably in their detection of anomalies: they performed better at finding the different kinds of anomalous sequences than at finding the total number of them. The results of the study could help users find anomalous sequences without manually inspecting the log file.
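(A minimal sketch of this kind of pipeline, with TF-IDF standing in for the IDF representation; parameters and data are illustrative.)

```python
# Illustrative clustering of vectorized log lines with K-means and DBSCAN;
# DBSCAN's label -1 marks points it treats as noise/outliers.
from sklearn.cluster import DBSCAN, KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

logs = ["session opened", "session closed", "disk failure on node 3"]
X = TfidfVectorizer().fit_transform(logs)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
db = DBSCAN(eps=1.0, min_samples=2).fit(X)
print(km.labels_, db.labels_)
```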
154. Detecting DoS Attack in Smart Home IoT Devices Using a Graph-Based Approach. Paudel, Ramesh; Muncy, Timothy; Eberle, William. 01 December 2019.
The use of Internet of Things (IoT) devices has surged in recent years. However, due to the lack of substantial security, IoT devices are vulnerable to cyber-attacks such as Denial-of-Service (DoS) attacks. Most current security solutions are either computationally expensive or unscalable, as they require known attack signatures or full packet inspection. In this paper, we introduce a novel Graph-based Outlier Detection in Internet of Things (GODIT) approach that (i) represents smart home IoT traffic as a real-time graph stream, (ii) efficiently processes graph data, and (iii) detects DoS attacks in real time. Experimental results on real-world data collected from an IoT-equipped smart home show that GODIT is more effective than traditional machine learning approaches and outperforms current graph-stream anomaly detection approaches.
155. Security related self-protected networks: Autonomous threat detection and response (ATDR). Havenga, Wessel Johannes Jacobus. January 2021.
Cybersecurity defense tools, techniques and methodologies are constantly faced with increasing challenges, including the evolution of highly intelligent and powerful new-generation threats. The main challenge posed by these modern digital multi-vector attacks is their ability to adapt using machine learning. Research shows that many existing defense systems fail to provide adequate protection against these latest threats. Hence, there is an ever-growing need for self-learning technologies that can autonomously adjust according to the behaviour and patterns of the offensive actors and systems. The accuracy and effectiveness of existing methods depend on decision making and manual input by human experts. This dependence causes (1) administration overhead, (2) variable and potentially limited accuracy and (3) delayed response time.
156. Environmental Sensor Anomaly Detection Using Learning Machines. Conde, Erick F. 01 December 2011.
The problem of quality assurance/quality control (QA/QC) for real-time measurements of environmental and water quality variables has been explored by many in recent years. The use of in situ sensors has become common practice for acquiring real-time measurements that provide the basis for important natural resources management decisions. However, these sensors are susceptible to failure due to human factors, lack of necessary maintenance, flaws in the transmission line or any part of the sensor, and unexpected changes in the sensors' surrounding conditions. Two types of machine learning techniques were used in this study to assess the detection of anomalous data points in turbidity readings from the Paradise site on the Little Bear River in northern Utah: Artificial Neural Networks (ANNs) and Relevance Vector Machines (RVMs). ANN and RVM techniques were used to develop regression models capable of predicting upcoming Paradise site turbidity measurements and estimating confidence intervals associated with those predictions, which were later used to determine whether a real measurement is an anomaly. Three cases were evaluated as possible inputs for the regression models: (1) only the values reported by the target sensor at previous time steps, (2) those values plus values from other types of water quality sensors at the same site, and (3) those inputs plus previous readings from sensors at upstream sites. The best model was selected based on each model's ability to detect anomalous data points identified in a QA/QC analysis performed manually by a human technician. False positive and false negative rates over a range of confidence intervals were used as the measure of model performance. The RVM models were able to detect more anomalous points within narrower confidence intervals than the ANN models. At the same time, it was shown that incorporating as inputs measurements from other sensors at the same site, as well as measurements from upstream sites, can improve the performance of the models.
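(A minimal sketch of the interval-based test described above, with a generic scikit-learn regressor and a residual-quantile interval standing in for the thesis's ANN/RVM models and their confidence intervals; all data and settings are illustrative.)

```python
# Illustrative QA/QC check: predict the next sensor value from lagged
# inputs and flag observations falling outside a 95% residual interval.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))                 # lagged readings as inputs
y = X @ np.array([0.5, 0.3, 0.2]) + rng.normal(scale=0.05, size=300)

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                     random_state=0).fit(X[:200], y[:200])
resid = y[:200] - model.predict(X[:200])
lo, hi = np.quantile(resid, [0.025, 0.975])   # empirical 95% interval

pred = model.predict(X[200:])
anomalous = (y[200:] < pred + lo) | (y[200:] > pred + hi)
print("flagged:", int(anomalous.sum()))
```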
157. Probabilistic Clustering Ensemble Evaluation for Intrusion Detection. McElwee, Steven M. 01 January 2018.
Intrusion detection is the practice of examining information from computers and networks to identify cyberattacks. It is an important topic in practice, since the frequency and consequences of cyberattacks continue to increase and affect organizations. It is also important for research, since many problems remain for intrusion detection systems. Intrusion detection systems monitor large volumes of data and frequently generate false positives. This results in additional effort for security analysts to review and interpret alerts. After long hours spent reviewing alerts, security analysts become fatigued and make bad decisions. There is currently no approach to intrusion detection that reduces the workload of human analysts by providing a probabilistic prediction that a computer is experiencing a cyberattack.
This research addressed the problem by estimating the probability that a computer system was being attacked, rather than alerting on individual events. It combined concepts from cyber situation awareness by applying clustering ensembles, probability analysis, and active learning. The unique contribution of this research is that it provides a higher level of meaning for intrusion alerts than traditional approaches.
Three experiments were conducted in the course of this research to demonstrate the feasibility of these concepts. The first experiment evaluated cluster generation approaches that provided multiple perspectives of network events using unsupervised machine learning. The second experiment developed and evaluated a method for detecting anomalies from the clustering results. This experiment also determined the probability that a computer system was being attacked. Finally, the third experiment integrated active learning into the anomaly detection results and evaluated its effectiveness in improving the accuracy.
This research demonstrated that clustering ensembles with probabilistic analysis were effective for identifying normal events. Abnormal events remained uncertain and were assigned a belief. By aggregating the beliefs to find the probability that a computer system was under attack, the resulting probability was highly accurate for source IP addresses and reasonably accurate for destination IP addresses. Active learning, which simulated feedback from a human analyst, eliminated the residual error for the destination IP addresses with only a small number of events requiring labeling.
158. LSTM Networks for Detection and Classification of Anomalies in Raw Sensor Data. Verner, Alexander. 01 January 2019.
In order to ensure the validity of sensor data, it must be thoroughly analyzed for various types of anomalies. Traditional machine learning methods of anomaly detection in sensor data are based on domain-specific feature engineering. A typical approach is to use domain knowledge to analyze sensor data and manually create statistics-based features, which are then used to train machine learning models to detect and classify the anomalies. Although this methodology is used in practice, it has a significant drawback: feature extraction is usually labor-intensive and requires considerable effort from domain experts.
An alternative approach is to use deep learning algorithms. Research has shown that modern deep neural networks are very effective at automatically extracting abstract features from raw data in classification tasks. Long short-term memory networks, or LSTMs for short, are a special kind of recurrent neural network capable of learning long-term dependencies. These networks have proved especially effective in the classification of raw time-series data in various domains. This dissertation systematically investigates the effectiveness of the LSTM model for anomaly detection and classification in raw time-series sensor data.
As a proof of concept, this work used time-series data of sensors that measure blood glucose levels. A large number of time-series sequences was created based on a genuine medical diabetes dataset. Anomalous series were constructed by six methods that interspersed patterns of common anomaly types in the data. An LSTM network model was trained with k-fold cross-validation on both anomalous and valid series to classify raw time-series sequences into one of seven classes: non-anomalous, and classes corresponding to each of the six anomaly types.
As a control, the detection and classification accuracy of the LSTM was compared to that of four traditional machine learning classifiers: support vector machines, random forests, naive Bayes, and shallow neural networks. The performance of all the classifiers was evaluated using nine metrics: precision, recall, and the F1-score, each measured from the micro, macro and weighted perspectives.
While the traditional models were trained on feature vectors derived from the raw data based on knowledge of common sources of anomaly, the LSTM was trained on raw time-series data. Experimental results indicate that the performance of the LSTM was comparable to that of the best traditional classifiers, achieving 99% in all nine metrics. The model requires no labor-intensive feature engineering, and the fine-tuning of its architecture and hyper-parameters can be done in a fully automated way. This study therefore finds LSTM networks an effective solution for anomaly detection and classification in sensor data.
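(A minimal sketch of a raw-sequence LSTM classifier of the kind described, mapping raw windows to one of seven classes; shapes, sizes, and data are illustrative, not the dissertation's configuration.)

```python
# Illustrative LSTM classifier over raw time-series windows.
import numpy as np
from tensorflow.keras import layers, models

timesteps, n_classes = 60, 7
model = models.Sequential([
    layers.Input(shape=(timesteps, 1)),
    layers.LSTM(32),
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

x = np.random.rand(128, timesteps, 1)          # placeholder glucose windows
y = np.random.randint(0, n_classes, size=128)  # placeholder class labels
model.fit(x, y, epochs=3, verbose=0)
```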
159. Anomaly detection in rolling element bearings via two-dimensional Symbolic Aggregate Approximation. Harris, Bradley William. 26 May 2013.
Symbolic dynamics is an area of current interest in anomaly detection, especially for mechanical systems. Symbolic dynamics reduces the overall dimensionality of system responses while maintaining a high level of robustness to noise. Rolling element bearings are particularly common mechanical components for which anomaly detection is of high importance. Harsh operating conditions and manufacturing imperfections increase vibration, inherently reducing component life and increasing downtime and costly repairs. This thesis presents a novel way to detect bearing vibration anomalies through Symbolic Aggregate Approximation (SAX) in the two-dimensional time-frequency domain. SAX reduces computational requirements by partitioning high-dimensional sensor data into discrete states. This analysis particularly suits bearing vibration data in the time-frequency domain, as the distribution of the data does not greatly change between normal and faulty conditions.
In experiments with synthetically generated ground truth, two-dimensional SAX in conjunction with Markov model feature extraction successfully detects anomalies (> 99%) using short time spans (< 0.1 seconds) of data in the time-frequency domain with low false alarm rates (< 8%). Analysis of real-world datasets validates the performance over the commonly used one-dimensional symbolic analysis by detecting 100% of experimental anomalous vibration with 0 false alarms across all fault types, using less than 1 second of data as the basis of 'normality'. Two-dimensional SAX also demonstrates the ability to detect anomalies in predictive monitoring environments earlier than previous methods, even at low signal-to-noise ratios.
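(For reference, a sketch of the one-dimensional SAX building block that the thesis extends to two dimensions; the piecewise aggregate averaging step is omitted for brevity, and the data are illustrative.)

```python
# Illustrative 1-D SAX: z-normalize a window, then map values to discrete
# symbols using breakpoints that split the standard normal distribution
# into equiprobable regions.
import numpy as np
from scipy.stats import norm

def sax_symbols(window, alphabet_size=4):
    z = (window - window.mean()) / (window.std() + 1e-12)
    breakpoints = norm.ppf(np.arange(1, alphabet_size) / alphabet_size)
    return np.searchsorted(breakpoints, z)  # integer symbols 0..alphabet-1

signal = np.sin(np.linspace(0, 6, 32))
signal += np.random.default_rng(0).normal(scale=0.1, size=32)
print(sax_symbols(signal))
```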
160. Enhancing System Reliability using Abstraction and Efficient Logical Computation. Kutsuna, Takuro. 24 September 2015.
Kyoto University. Doctor of Informatics. Examination committee: Prof. Akihiro Yamamoto, Prof. Hisashi Kashima, Prof. Atsushi Igarashi.