181 |
Anomaly Detection in Time Series Data Based on Holt-Winters Method / Anomalidetektering i tidsseriedata baserat på Holt-Winters metod. Aboode, Adam. January 2018 (has links)
In today's world the amount of collected data increases every day, a trend which is likely to continue. At the same time, the potential value of that data also increases thanks to the constant development and improvement of hardware and software. However, in order to gain insights, make decisions or train accurate machine learning models, we want to ensure that the data we collect is of good quality. There are many definitions of data quality; in this thesis we focus on the accuracy aspect. One method which can be used to ensure accurate data is to monitor for and alert on anomalies. In this thesis we therefore suggest a method which, based on historic values, is able to detect anomalies in time series as new values arrive. The method consists of two parts: forecasting the next value in the time series using Holt-Winters method, and comparing the residual to an estimated Gaussian distribution. The suggested method is evaluated in two steps. First, we evaluate the forecast accuracy of Holt-Winters method for different input sizes. In the second step we evaluate the performance of the anomaly detector when using different methods to estimate the variance of the distribution of the residuals. The results indicate that the suggested method works well most of the time for detecting point anomalies in seasonal and trending time series data. The thesis also discusses some potential next steps which are likely to further improve the performance of this method.
|
182 |
Anomaly Detection and Root Cause Analysis for LTE Radio Base Stations / Anomalitetsdetektion och grundorsaksanalys för LTE Radio Base-stationer. López, Sergio. January 2018 (has links)
This project aims to detect possible anomalies in the resource consumption of radio base stations within the 4G LTE Radio architecture. This has been done by analyzing the statistical data that each node generates every 15 minutes, in the form of "performance maintenance counters". In this thesis, we introduce methods that allow resources to be automatically monitored after software updates, in order to detect any anomalies in the consumption patterns of the different resources compared to the reference period before the update. Additionally, we also attempt to narrow down the origin of anomalies by pointing out parameters potentially linked to the issue.
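One way to compare post-update counter values against the pre-update reference period is a per-counter robust z-score; the sketch below is a hypothetical illustration of that idea, not the method actually used in the thesis.

```python
import numpy as np

def counter_deviation(reference, post_update, z_threshold=3.5):
    """Flag post-update PM-counter samples that deviate from the reference
    period, using a robust z-score based on the median and the MAD."""
    ref = np.asarray(reference, dtype=float)
    post = np.asarray(post_update, dtype=float)
    median = np.median(ref)
    mad = np.median(np.abs(ref - median)) or 1e-9  # guard against zero MAD
    z = 0.6745 * (post - median) / mad             # normal-consistent scaling
    return np.abs(z) > z_threshold
```

The median/MAD pair is less sensitive than mean/std to the occasional extreme value already present in the reference window.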
|
183 |
Session-based Intrusion Detection System To Map Anomalous Network Traffic. Caulkins, Bruce. 01 January 2005 (has links)
Computer crime is a large problem (CSI, 2004; Kabay, 2001a; Kabay, 2001b). Security managers have a variety of tools at their disposal -- firewalls, Intrusion Detection Systems (IDSs), encryption, authentication, and other hardware and software solutions -- to combat computer crime. Many IDS variants exist which allow security managers and engineers to identify attack packets primarily through signature detection; i.e., the IDS recognizes attack packets by their well-known "fingerprints" or signatures as those packets cross the network's gateway threshold. Anomaly-based ID systems, on the other hand, determine what normal traffic within a network looks like and report abnormal traffic behavior. This paper describes a methodology for developing a more robust Intrusion Detection System through the use of data-mining techniques and anomaly detection. These data-mining techniques dynamically model what a normal network should look like and, in the process, reduce the false positive and false negative alarm rates. We use classification-tree techniques to accurately predict probable attack sessions. Overall, our goal is to model network traffic into network sessions and identify those sessions that have a high probability of being an attack, labeling them as "suspect sessions". Subsequently, we use these techniques in concert with known signatures and patterns in order to present a better model for the detection and protection of networks and systems.
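A classification-tree labeling of sessions as "suspect" along these lines might look as follows; the synthetic session features and the 0.8 probability cut-off are assumptions for illustration, not the paper's actual data or tuning.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for session features (packet counts, durations, flags ...)
X, y = make_classification(n_samples=2000, n_features=8, n_informative=5,
                           weights=[0.9, 0.1], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42, stratify=y)

# Fit a classification tree on labeled sessions (1 = attack, 0 = normal)
tree = DecisionTreeClassifier(max_depth=6, random_state=42).fit(X_tr, y_tr)

# Sessions whose predicted attack probability exceeds the cut-off are
# marked as "suspect sessions" for further signature-based inspection
suspect = tree.predict_proba(X_te)[:, 1] > 0.8
print(f"suspect sessions: {suspect.sum()} / {len(suspect)}")
```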
|
184 |
Log Frequency Analysis for Anomaly Detection in Cloud Environments. Bendapudi, Prathyusha. January 2024 (has links)
Background: Log analysis has been proven to be highly beneficial in monitoring system behaviour, detecting errors and anomalies, and predicting future trends in systems and applications. However, with the continuous evolution of these systems and applications, the amount of log data generated is increasing rapidly, and so is the manual effort invested in log analysis for error detection and root cause analysis. While there is continuous research into reducing manual effort, this thesis introduces a new approach to automated log analysis, based on the temporal patterns of logs in a particular system environment, which can help reduce manual effort to a great extent. Objectives: The main objective of this research is to identify temporal patterns in logs using clustering algorithms, extract the outlier logs which do not adhere to any time pattern, and further analyse them to check whether these outlier logs are helpful in detecting errors and identifying their root causes. Methods: Design Science Research was employed to fulfil the objectives of the thesis, as the thesis required the generation of intermediary results and an iterative and responsive approach. The initial part of the thesis consisted of building an artifact which aided in identifying temporal patterns in the logs of different log types using the DBSCAN clustering algorithm. After identification of patterns and extraction of outlier logs, interviews were conducted in which system experts manually analysed the outlier logs, provided insights on them, and validated the log frequency analysis. Results: The results obtained after running the clustering algorithm on logs of different log types show clusters which represent temporal patterns in most of the files. Some log files do not have any time patterns, which indicates that not all log types adhere to a fixed time pattern.
The interviews conducted with system experts on the outlier logs yielded promising results, indicating that log frequency analysis is indeed helpful in reducing the manual effort involved in log analysis for error detection and root cause analysis. Conclusions: The results of the thesis show that most of the logs in the given cloud environment adhere to time frequency patterns, and that analysing these patterns and their outliers leads to easier error detection and root cause analysis in the given cloud environment.
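The DBSCAN-based identification of temporal patterns and outlier logs can be sketched roughly as below, clustering on inter-arrival times; the synthetic log timestamps and the eps/min_samples values are illustrative assumptions, not the thesis's actual data or parameters.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(7)
# Synthetic stand-in: a log type emitted roughly every 300 s, plus stragglers
regular = np.cumsum(rng.normal(300, 5, 200))      # periodic log events
stray = regular[50] + np.array([17.0, 111.0])     # off-pattern events
timestamps = np.sort(np.concatenate([regular, stray]))

# Cluster on inter-arrival time; points in no dense region get label -1,
# i.e. they are the outlier logs that adhere to no time pattern
gaps = np.diff(timestamps).reshape(-1, 1)
labels = DBSCAN(eps=15, min_samples=5).fit_predict(gaps)
outlier_gaps = gaps[labels == -1].ravel()
print(f"{len(outlier_gaps)} off-pattern inter-arrival gaps found")
```

The logs adjacent to the `-1`-labelled gaps would then be handed to system experts for manual analysis, as in the interview phase.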
|
185 |
Robust Anomaly Detection in Critical Infrastructure. Abdelaty, Maged Fathy Youssef. 14 September 2022 (has links)
Critical Infrastructures (CIs) such as water treatment plants, power grids and telecommunication networks are critical to the daily activities and well-being of our society. Disruption of such CIs would have catastrophic consequences for public safety and the national economy. Hence, these infrastructures have become major targets in the upsurge of cyberattacks. Defending against such attacks often depends on an arsenal of cyber-defence tools, including Machine Learning (ML)-based Anomaly Detection Systems (ADSs). These detection systems use ML models to learn the profile of the normal behaviour of a CI and classify deviations that go well beyond the normality profile as anomalies. However, ML methods are vulnerable to both adversarial and non-adversarial input perturbations. Adversarial perturbations are imperceptible noises added to the input data by an attacker to evade the classification mechanism. Non-adversarial perturbations can be a normal behaviour evolution resulting from changes in usage patterns or other characteristics, or noisy data from normally degrading devices, generating a high rate of false positives. We first study the problem of ML-based ADSs being vulnerable to non-adversarial perturbations, which causes a high rate of false alarms. To address this problem, we propose an ADS called DAICS, based on a wide and deep learning model that is both adaptive to evolving normality and robust to noisy data normally emerging from the system. DAICS adapts the pre-trained model to new normality with a small number of data samples and a few gradient updates, based on feedback from the operator on false alarms. DAICS was evaluated on two datasets collected from real-world Industrial Control System (ICS) testbeds. The results show that the adaptation process is fast and that DAICS is more robust than state-of-the-art approaches. We further investigated the problem of false-positive alarms in ADSs.
To address this problem, an extension of DAICS, called the SiFA framework, is proposed. SiFA collects a buffer of historical false alarms and suppresses every new alarm that is similar to these false alarms. The proposed framework is evaluated using a dataset collected from a real-world ICS testbed. The evaluation results show that SiFA can decrease the false alarm rate of DAICS by more than 80%.
We also investigate the problem of ML-based network ADSs that are vulnerable to adversarial perturbations. In the case of network ADSs, attackers may use their knowledge of anomaly detection logic to generate malicious traffic that remains undetected. One way to solve this issue is to adopt adversarial training in which the training set is augmented with adversarially perturbed samples. This thesis presents an adversarial training approach called GADoT that leverages a Generative Adversarial Network (GAN) to generate adversarial samples for training. GADoT is validated in the scenario of an ADS detecting Distributed Denial of Service (DDoS) attacks, which have been witnessing an increase in volume and complexity. For a practical evaluation, the DDoS network traffic was perturbed to generate two datasets while fully preserving the semantics of the attack. The results show that adversaries can exploit their domain expertise to craft adversarial attacks without requiring knowledge of the underlying detection model. We then demonstrate that adversarial training using GADoT renders ML models more robust to adversarial perturbations. However, the evaluation of adversarial robustness is often susceptible to errors, leading to robustness overestimation. We investigate the problem of robustness overestimation in network ADSs and propose an adversarial attack called UPAS to evaluate the robustness of such ADSs. The UPAS attack perturbs the inter-arrival time between packets by injecting a random time delay before packets from the attacker. The attack is validated by perturbing malicious network traffic in a multi-attack dataset and used to evaluate the robustness of two robust ADSs, which are based on a denoising autoencoder and an adversarially trained ML model. The results demonstrate that the robustness of both ADSs is overestimated and that a standardised evaluation of robustness is needed.
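The alarm-suppression idea attributed to SiFA above (keep a buffer of operator-confirmed false alarms, drop new alarms that resemble them) can be sketched as follows; the Euclidean distance and the radius threshold are assumptions for illustration, not SiFA's actual similarity measure.

```python
import numpy as np

class FalseAlarmFilter:
    """Suppress alarms whose feature vector lies close to a previously
    operator-confirmed false alarm (a simplified take on the SiFA idea)."""

    def __init__(self, radius=1.0):
        self.radius = radius
        self.buffer = []            # feature vectors of known false alarms

    def record_false_alarm(self, features):
        """Operator marked this alarm as false; remember its features."""
        self.buffer.append(np.asarray(features, dtype=float))

    def should_raise(self, features):
        """Raise the alarm only if it resembles no buffered false alarm."""
        f = np.asarray(features, dtype=float)
        return all(np.linalg.norm(f - b) > self.radius for b in self.buffer)
```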
|
186 |
Enhancement of an Ad Reviewal Process through Interpretable Anomaly Detecting Machine Learning Models / Förbättring av en annonsgranskingsprocess genom tolkbara och avvikelsedetekterande maskinsinlärningsmodeller. Dahlgren, Eric. January 2022 (has links)
Technological advancements made in recent decades in the fields of artificial intelligence (AI) and machine learning (ML) have led to further automation of tasks previously performed by humans. Manually reviewing and assessing content uploaded to social media and marketplace platforms is one such task that is both tedious and expensive to perform, and could possibly be automated through ML-based systems. When introducing ML model predictions into a human decision-making process, interpretability and explainability of models have been shown to be important factors for humans to trust individual sample predictions. This thesis project aims to explore the performance of interpretable ML models used together with humans in an ad review process for a rental marketplace platform. Utilizing the XGBoost framework and SHAP for interpretable ML, a system was built with the ability to score an individual ad and explain the prediction with human-readable sentences based on feature importance. The model reached an ROC AUC score of 0.90 and an Average Precision score of 0.64 on a held-out test set. An end-user survey indicated some trust in the model and an appreciation for the local prediction explanations, but low general impact and helpfulness. While most related work focuses on model performance, this thesis contributes a smaller model usability study which can provide grounds for utilizing interpretable ML software in any manual decision-making process.
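A hedged sketch of the score-and-explain loop: to stay dependency-light it substitutes scikit-learn's gradient boosting and permutation importance for the thesis's XGBoost and SHAP, and the ad feature names are invented for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

# Invented ad features standing in for the platform's real ones
feature_names = ["price_deviation", "description_length",
                 "image_count", "account_age_days"]
X, y = make_classification(n_samples=1000, n_features=4, n_informative=3,
                           n_redundant=1, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X, y)

# Rank features by permutation importance and phrase the top ones as
# a human-readable explanation for the reviewer
imp = permutation_importance(clf, X, y, n_repeats=5, random_state=0)
ranked = sorted(zip(feature_names, imp.importances_mean),
                key=lambda p: -p[1])
top = [name for name, _ in ranked[:2]]
print(f"Ad flagged mainly due to: {top[0]} and {top[1]}")
```

SHAP would give per-ad (local) attributions rather than the global ranking shown here, which is the key difference from this simplified stand-in.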
|
187 |
Comparison of Machine Learning Algorithms for Anomaly Detection in Train’s Real-Time Ethernet using an Intrusion Detection System. Chaganti, Trayi; Tadi, Rohith. January 2022 (has links)
Background: The train communication network is vulnerable to intrusion assaults because of the openness of the ethernet communication protocol. Therefore, an intrusion detection system must be incorporated into the train communication network. Many algorithms are available in Machine Learning (ML) to develop an Intrusion Detection System (IDS); the best is generally chosen based on the accuracy and execution time of the algorithm. Performance metrics like F1 score, precision, recall, and support are compared to see how well the algorithm fits the model while training. This thesis detects anomalies in the Train Control Management System (TCMS) and then compares various algorithms in order to determine the most accurate one. Objectives: In this thesis work, we aim to research anomaly detection in a train's real-time ethernet using an IDS. The main objectives of this thesis include performing Principal Component Analysis (PCA) and feature selection using Random Forest (RF) to simplify the complexity of the dataset by reducing dimensionality and extracting significant features, followed by choosing the most consistent algorithm for anomaly detection from the selected algorithms by evaluating performance parameters, especially accuracy and execution time, after training the models using ML algorithms. Method: This thesis uses one research methodology, experimentation, to answer our research questions. For RQ1, experimentation helps us gain better insights into the dataset to extract valuable and essential features as part of feature selection using RF and dimensionality reduction using PCA. RQ2 also uses experimentation because it provides better accuracy and reliability. After pre-processing, the data is used to train the algorithms and is evaluated using various methods.
Results: In this study, we analysed the data using EDA, and performed dimensionality reduction and feature selection using the PCA and RF algorithms respectively. We used five supervised machine learning methods, namely Support Vector Machine (SVM), Naive Bayes, Decision Tree, K-nearest Neighbor (KNN), and Random Forest (RF). After testing and utilizing the "KDDCup 1999" pre-processed dataset from the University of California Irvine (UCI) ML repository, the Decision Tree model was concluded to be the best-performing algorithm, with an accuracy of 98.89% in 0.098 seconds, in comparison to the other models. Conclusions: Five models were trained using the five ML techniques for anomaly detection using an IDS. We concluded that the trained decision tree model has optimal performance, with an accuracy of 98.89% and a time of 0.098 seconds.
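The three-stage pipeline described above (RF-based feature selection, PCA, then a decision tree) might be wired up as below; synthetic data stands in for the pre-processed KDDCup 1999 features, and the median threshold and component count are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the pre-processed KDD-style traffic features
X, y = make_classification(n_samples=3000, n_features=30, n_informative=10,
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Stage 1: RF feature selection keeps the most important half of the columns
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=1),
    threshold="median")
X_tr_sel = selector.fit_transform(X_tr, y_tr)
X_te_sel = selector.transform(X_te)

# Stage 2: PCA reduces the remaining dimensionality further
pca = PCA(n_components=5, random_state=1)
X_tr_red = pca.fit_transform(X_tr_sel)
X_te_red = pca.transform(X_te_sel)

# Stage 3: train and score the decision tree on the reduced data
tree = DecisionTreeClassifier(random_state=1).fit(X_tr_red, y_tr)
print(f"accuracy: {tree.score(X_te_red, y_te):.3f}")
```

Timing each candidate model's `fit`/`predict` calls would give the execution-time half of the comparison the thesis performs.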
|
188 |
Unsupervised Online Anomaly Detection in Multivariate Time-Series / Oövervakad online-avvikelsedetektering i flerdimensionella tidsserier. Segerholm, Ludvig. January 2023 (has links)
This research aims to identify a method for unsupervised online anomaly detection in multivariate time series in dynamic systems in general, and in the case study of Devwards IoT system in particular. Requirements of the solution are explainability, online learning and low computational expense. A comprehensive literature review was conducted, leading to the experimentation and analysis of various anomaly detection approaches. Of the methods evaluated, a single recurrent neural network autoencoder emerged as the most promising, emphasizing a simple model structure that encourages stable performance with consistent outputs, regardless of the average output, while other approaches such as Hierarchical Temporal Memory models and an ensemble strategy of adaptive model pooling yielded suboptimal results. A modified version of the Residual Explainer method for enhancing explainability in autoencoders for online scenarios showed promising outcomes. The use of the Mahalanobis distance for anomaly detection was explored, as were feature extraction and its implications in the context of the proposed approach. Conclusively, a single, streamlined recurrent neural network appears to be the superior approach for this application, though further investigation into online learning methods is warranted. The research contributes results to the field of unsupervised online anomaly detection in multivariate time series and to the Residual Explainer method for online autoencoders. Additionally, it offers data on the ineffectiveness of the Mahalanobis distance in an online anomaly detection environment.
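The Mahalanobis-distance scoring explored in the thesis can be sketched as follows for multivariate points scored against a window of recent observations; the sliding-window scheme is an assumption, not the thesis's exact setup.

```python
import numpy as np

def mahalanobis_scores(window, points):
    """Score each point by its Mahalanobis distance to the distribution
    estimated from a window of recent (assumed normal) observations."""
    mu = window.mean(axis=0)
    cov = np.cov(window, rowvar=False)
    cov_inv = np.linalg.pinv(cov)       # pseudo-inverse guards singularity
    diff = points - mu
    # Quadratic form diff @ cov_inv @ diff, row by row
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))
```

Unlike plain Euclidean distance, this accounts for correlation between the series' dimensions, which matters in the multivariate setting.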
|
189 |
Construction of a machine learning training pipeline for merging AIS data with external data sources / Utveckling av en ML-pipeline för att kombinera AIS-data med externa datakällor i träningsprocessen. Yahya, Sami Said. January 2022 (has links)
Machine learning methods are increasingly being used in the maritime domain to predict traffic anomalies and to mitigate risk, for example avoiding collision and grounding accidents. However, most machine learning systems used for detecting such issues have been trained predominantly on single data sources, such as vessel positioning data. Hence, it is desirable to support the means to combine different sources of data in the training phase, to allow more complex models to be built. In this thesis, we propose a multi-data pipeline for accumulating, decoding, preprocessing, and merging Automatic Identification System (AIS) data with weather data to train time-series-based deep learning models. The pipeline comprises several REST APIs to connect and listen to the data sources, which are stored and merged using Structured Query Language (SQL). Specifically, the training pipeline consists of an AIS NMEA message decoder, a weather data receiver, and a Postgres database for merging and storing the data sources. Moreover, the pipeline was assessed by training a TensorFlow vRNN model. The proposed pipeline approach allows flexibility in the inclusion of new data sources to effectively build models for the maritime domain as well as other traffic domains that use positioning data.
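The merge step, attaching the most recent weather observation to each AIS report, is done in the thesis with SQL in Postgres; as a stand-in illustration, pandas' `merge_asof` performs the equivalent time-based join (the column names and sample values below are assumptions).

```python
import pandas as pd

# Illustrative AIS position reports and weather observations (assumed schema)
ais = pd.DataFrame({
    "ts": pd.to_datetime(["2022-05-01 10:00:12", "2022-05-01 10:05:47",
                          "2022-05-01 10:11:03"]),
    "mmsi": [265547000, 265547000, 265547000],
    "lat": [57.70, 57.71, 57.72],
    "lon": [11.97, 11.96, 11.95],
})
weather = pd.DataFrame({
    "ts": pd.to_datetime(["2022-05-01 10:00:00", "2022-05-01 10:10:00"]),
    "wind_ms": [6.2, 7.8],
})

# Attach the latest weather observation at or before each AIS report,
# mirroring what the thesis does with an SQL join in Postgres
merged = pd.merge_asof(ais.sort_values("ts"), weather.sort_values("ts"),
                       on="ts", direction="backward")
print(merged[["ts", "wind_ms"]])
```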
|
190 |
Anomaly Detection in Multi-Seasonal Time Series Data. Williams, Ashton Taylor. 05 June 2023 (has links)
No description available.
|