Spelling suggestions: "subject:"LSTM autoencoder"" "subject:"LSTM autoencoders""
1 |
Anomaly Detection for Insider Threats : Comparative Evaluation of LSTM Autoencoders, Isolation Forest, and Elasticsearch on Two Datasets. / Anomalidetektion för interna hot : Utvärdering av LSTM-autoencoders, Isolation Forest och Elasticsearch på två datasetFagerlund, Martin January 2024 (has links)
Insider threat detection is one of cybersecurity’s most challenging and costly problems. Anomalous behaviour can take multiple shapes, which puts a great demand on the anomaly detection system. Significant research has been conducted in the area, but the existing experimental datasets’ absence of real data leaves uncertainty about the proposed systems’ realistic performance. This thesis introduces a new insider threat dataset consisting exclusively of events from real users. The dataset is used to evaluate the performance of various anomaly detection system techniques comparatively. Three anomaly detection techniques were evaluated: LSTM autoencoder, isolation forest, and Elasticsearch’s anomaly detection. The dataset’s properties inhibited any hyperparameter tuning of the LSTM autoencoders since the data lacks sufficient positive instances. Therefore, the architecture and hyperparameter settings are taken from the previously proposed research. The implemented anomaly detection models were also evaluated on the commonly used CERT v4.2 insider threat test dataset. The results show that the LSTM autoencoder provides better anomaly detection on the CERT v4.2 dataset regarding the accuracy, precision, recall, F1 score, and false positive rate compared to the other tested models. However, the investigated systems performed more similarly on the introduced dataset with real data. The LSTM autoencoder achieved the best recall, precision, and F1 score, the isolation forest showed almost as good F1 score with a lower false positive rate, and Elasticsearch’s anomaly detection reported the best accuracy and false positive rate. Additionally, the LSTM autoencoder generated the best ROC curve and precision-recall curve. While Elasticsearch’s anomaly detection showed promising results concerning the accuracy, it performed with low precision and was explicitly implemented to detect certain anomalies, which reduced its generalisability. In conclusion, the results show that the LSTM autoencoder is a feasible anomaly detection model for detecting abnormal behaviour in real user-behaviour logs. Secondly, Elasticsearch’s anomaly detection can be used but is better suited for less complex data analysis tasks. Further, the thesis analyzes the introduced dataset and problematizes its application. In the closing chapter, the study provides domains where further research should be conducted. / Interna hot är ett av de svåraste och mest kostsamma problemen inom cybersäkerhet. Avvikande beteende kan anta många olika former vilket innebär stora krav på de system som ska upptäcka dem. Mycket forskning har genomförts i detta område för att tillhandahålla kraftfulla system. Dessvärre saknar de existerande dataseten som används inom forskningen verklig data vilket gör evalueringen av systemens verkliga förmåga osäker. Denna rapport introducerar ett nytt dataset med data enbart från riktiga användare. Datasetet används för att analysera prestandan av tre olika anomalidetektionssystem: LSTM autoencoder, isolation forest och Elasticsearchs inbyggda anomalidetektering. Datasetets egenskaper förhindrade hyperparameterjustering av LSTM autoencoderna då datasetet innehåller för få positiva data punkter. Därav var arkitekturen och hyperparameterinställningar tagna från tidigare forskning. De implementerade modellerna var också jämförda på det välanvända CERT v4.2 datasetet. Resultaten från CERT v4.2 datasetet visade att LSTM autoencodern ger en bättre anomalidetektion än de andra modellerna när måtten noggrannhet, precision, recall, F1 poäng och andel falska positiva användes. När modellerna testades på det introducerade datasetet presterade de mer jämlikt. LSTM autoencodern presterar med bäst recall, precision och F1 poäng medan isolation forest nästan nådde lika hög F1 poäng men med lägre andel falska positiva predikteringar. Elasticsearchs anomalidetektering lyckades nå högst noggrannhet med lägst andel falsk positiva. Dessvärre med låg precision jämfört med de två andra modellerna. Elasticsearchs anomalidetektering var även tvungen att implementeras mer specifikt riktat mot anomalierna den skulle upptäcka vilket gör användningsområdet för den mindre generellt. Sammanfattningsvis visar resultaten att LSTM autoencoders är ett adekvat alternativ för att detektera abnormaliteter i loggar med händelser från riktiga användare. Dessutom är det möjligt till en viss gräns att använda Elasticsearchs anomalidetektering för dessa ändamål men den passar bättre för uppgifter med mindre komplexitet. Utöver modellernas resultat så analyseras det framtagna datasetet och några egenskaper specificeras som försvårar dess användning och trovärdighet. Avslutningsvis så preciseras intressanta relaterade områden där vidare forskning bör ske.
|
2 |
A contemporary machine learning approach to detect transportation mode - A case study of Borlänge, SwedenGolshan, Arman January 2020 (has links)
Understanding travel behavior and identifying the mode of transportation are essential for adequate urban devising and transportation planning. Global positioning systems (GPS) tracking data is mainly used to find human mobility patterns in cities. Some travel information, such as most visited location, temporal changes, and the trip speed, can be easily extracted from GPS raw tracking data. GPS trajectories can be used as a method to indicate the mobility modes of commuters. Most previous studies have applied traditional machine learning algorithms and manually computed data features, making the model error-prone. Thus, there is a demand for developing a new model to resolve these methods' weaknesses. The primary purpose of this study is to propose a semi-supervised model to identify transportation mode by using a contemporary machine learning algorithm and GPS tracking data. The model can accept GPS trajectory with adjustable length and extracts their latent information with LSTM Autoencoder. This study adopts a deep neural network architecture with three hidden layers to map the latent information to detect transportation mode. Moreover, different case studies are performed to evaluate the proposed model's efficiency. The model results in an accuracy of 93.6%, which significantly outperforms similar studies.
|
3 |
Real-time Outlier Detection using Unbounded Data Streaming and Machine LearningÅkerström, Emelie January 2020 (has links)
Accelerated advancements in technology, the Internet of Things, and cloud computing have spurred an emergence of unstructured data that is contributing to rapid growth in data volumes. No human can manage to keep up with monitoring and analyzing these unbounded data streams and thus predictive and analytic tools are needed. By leveraging machine learning this data can be converted into insights which are enabling datadriven decisions that can drastically accelerate innovation, improve user experience, and drive operational efficiency. The purpose of this thesis is to design and implement a system for real-time outlier detection using unbounded data streams and machine learning. Traditionally, this is accomplished by using alarm-thresholds on important system metrics. Yet, a static threshold cannot account for changes in trends and seasonality, changes in the system, or an increased system load. Thus, the intention is to leverage machine learning to instead look for deviations in the behavior of the data not caused by natural changes but by malfunctions. The use-case driving the thesis forward is real-time outlier detection in a Content Delivery Network (CDN). The input data includes Http-error messages received by clients, and contextual information like region, cache domains, and error codes, to provide tailormade predictions accounting for the trends in the data. The outlier detection system consists of a data collection pipeline leveraging the technique of stream processing, a MiniBatchKMeans clustering model that provides online clustering of incoming data according to their similar characteristics, and an LSTM AutoEncoder that accounts for temporal nature of the data and detects outlier data points in the clusters. An important finding is that an outlier is defined as an abnormal amount of outlier data points all originating from the same cluster, not a single outlier data point. Thus, the alerting system will be implementing an outlier percentage threshold. The experimental results show that an outlier is detected within one minute from a cache break-down. This triggers an alert to the system owners, containing graphs of the clustered data to narrow down the search area of the cause to enable preventive action towards the prominent incident. Further results show that within 2 minutes from fixing the cause the system will provide feedback that the actions taken were successful. Considering the real-time requirements of the CDN environment, it is concluded that the short delay for detection is indeed real-time. Proving that machine learning is indeed able to detect outliers in unbounded data streams in a real-time manner. Further analysis shows that the system is more accurate during peakhours when more data is in circulation than during none peak-hours, despite the temporal LSTM layers. Presumably, an effect from the model needing to train on more data to better account for seasonality and trends. Future work necessary to put the outlier detection system in production thus includes more training to improve accuracy and correctness. Furthermore, one could consider implementing necessary functionality for a production environment and possibly adding enhancing features that can automatically avert incidents detected and handle the causes of them.
|
4 |
Knowledge Transfer Applied on an Anomaly Detection Problem Using Financial DataNatvig, Filip January 2021 (has links)
Anomaly detection in high-dimensional financial transaction data is challenging and resource-intensive, particularly when the dataset is unlabeled. Sometimes, one can alleviate the computational cost and improve the results by utilizing a pre-trained model, provided that the features learned from the pre-training are useful for learning the second task. Investigating this issue was the main purpose of this thesis. More specifically, it was to explore the potential gain of pre-training a detection model on one trader's transaction history and then retraining the model to detect anomalous trades in another trader's transaction history. In the context of transfer learning, the pre-trained and the retrained model are usually referred to as the source model and target model, respectively. A deep LSTM autoencoder was proposed as the source model due to its advantages when dealing with sequential data, such as financial transaction data. Moreover, to test its anomaly detection ability despite the lack of labeled true anomalies, synthetic anomalies were generated and included in the test set. Various experiments confirmed that the source model learned to detect synthetic anomalies with highly distinctive features. Nevertheless, it is hard to draw any conclusions regarding its anomaly detection performance due to the lack of labeled true anomalies. While the same is true for the target model, it is still possible to achieve the thesis's primary goal by comparing a pre-trained model with an identical untrained model. All in all, the results suggest that transfer learning offers a significant advantage over traditional machine learning in this context.
|
Page generated in 0.0557 seconds