61 |
Comparing machine learning algorithms for detecting behavioural anomalies
Jansson, Fredrik, January 2023
Background. Attempted intrusions at companies, whether from insider threats or otherwise, are increasing in frequency. The most commonly used defences are static analysis and filters that stop specific attacks. Utilising machine learning to detect behavioural anomalies in the access flow of an isolated system can aid in detecting, and stopping, attacks faster than previous methods. Objectives. In this thesis, four algorithms were selected to be compared against each other using three different metrics. These metrics were chosen for their importance in an isolated domain. All algorithms were trained on the same dataset, from which anomalies were created to test each model. Methods. A dataset created for anomaly detection was preprocessed to fit the scenario that was explored. The dataset was then split per user, and only the user with the most samples was used for training the models. In order to test and evaluate the models, anomalies were forged from a profile created out of the metadata belonging to the chosen user. These anomalies, alongside a part of the benign samples, were used to evaluate the F1 score of each model, and the scores were compared. The best-performing model according to the F1 score was then subjected to hyperparameter tuning to improve its performance further. Afterwards, the time taken to load each model and to predict a single sample, and the memory consumption of each action, were measured. Results. The results showed that two algorithms were relatively close, with the choice depending on how strict the memory constraints are. Local Outlier Factor, which used four times the memory (44 MB) of the other models, proved to be the better option when looking at F1 score, at 90.91% after hyperparameter tuning. However, Elliptic Envelope was a close second at 86.61% without hyperparameter tuning, while consuming less memory (11 MB) than the others.
The loading times of the two models were 26.68 ms and 2.01 ms, and predicting one sample took 1.87 ms and 0.38 ms, respectively. The initial loading time is less important since loading is only done once. Conclusions. Using this dataset, which is admittedly not optimal, Local Outlier Factor was shown to be the best-performing model, at a slightly higher memory consumption, while remaining accurate and relatively fast. However, it was also shown that, depending on how strict the memory constraints are, Elliptic Envelope can be applicable as well, considering its lower memory consumption. / Background. Attempted intrusions at companies, whether from insider threats or from elsewhere, are increasing in frequency. Static analysis or various filters are usually used to counter these attacks. Using machine learning to detect behavioural anomalies in a log flow inside an isolated system can help detect, and stop, attacks faster than previous methods. Purpose. In this work, four algorithms were chosen to be compared with each other by looking at three different metrics. These metrics were chosen because of their importance in systems placed in an isolated domain. All algorithms were trained on the same dataset and tested on anomalies generated from a profile built from the dataset. Method. A dataset created for detecting anomalies in an access log was processed to fit the scenario to be explored. The dataset was then split per user, and only the user with the most logs was used to train the models. To test the models, a profile was built from the metadata and then used to generate anomalous events for the chosen user. These anomalies, together with a part of the normal cases, were used to compute the models' F1 scores.
Then the time required to load the model into memory from disk, the time taken for a single prediction by the model, and the memory required for this were recorded. These three metrics were then weighed against each other in the comparison. The model that performed best in F1 score underwent hyperparameter tuning to improve this value. Results. The results showed that two algorithms performed fairly similarly. The difference is that one algorithm, Local Outlier Factor, has a slightly higher F1 score of 90.91% after hyperparameter tuning but requires four times as much memory (44 MB). Its time to load from disk was 26.68 ms, while one prediction took 1.87 ms. By contrast, Elliptic Envelope required only 11 MB to load into memory, with an F1 score of 86.61% without hyperparameter tuning. It also took only 2.01 ms and 0.38 ms to load the model and to predict a category, respectively. Conclusions. With this dataset, which is not optimal, Local Outlier Factor turned out to be the best-performing model, relatively fast in its predictions and accurate, with a high F1 score. However, depending on how strict the requirement for low memory usage is, Elliptic Envelope can also be suitable, since it requires a quarter of the memory of Local Outlier Factor.
|
62 |
Design and Implementation of Parallel Anomaly Detection
Shanbhag, Shashank, 01 January 2007
The main objective of the thesis is to show that multiple anomaly detection algorithms can be implemented in parallel to effectively characterize the type of traffic causing the abnormal behavior. The logs are obtained by running six anomaly detection algorithms in parallel on the Network Processor. Further, a hierarchical tree representation is defined which illustrates the state of traffic in real time. Each node represents a particular subset of traffic and calculates the aggregate for that traffic, given the output from all the algorithms. The greater the aggregate, the darker the node, indicating an anomaly. The visual representation makes it easy for an operator to distinguish between anomalous and non-anomalous nodes.
|
63 |
Application of anomaly detection techniques to astrophysical transients
Ramonyai, Malema Hendrick, January 2021
Magister Scientiae - MSc / We are fast moving into an era where data will be the primary driving factor for discovering new,
unknown astronomical objects and for improving our understanding of the rare astronomical
objects already known. Wide-field survey telescopes such as the Square Kilometer Array (SKA) and the Vera C. Rubin
Observatory will be producing enormous amounts of data over short timescales. The Rubin Observatory
is expected to record ∼15 terabytes of data every night during its ten-year Legacy Survey of Space and
Time (LSST), while the SKA will collect ∼100 petabytes of data per day. Fast, automated, and data-driven
techniques, such as machine learning, are required to search for anomalies in these enormous
datasets, as traditional techniques such as manual inspection would take months to fully exploit such
datasets.
|
64 |
Detection of Similarly-structured Anomalous sets of nodes in Graphs
Sharma, Nikita, 04 October 2021
No description available.
|
65 |
Unsupervised Anomaly Detection in Receipt Data / Oövervakad anomalidetektion i kvittodata
Forstén, Andreas, January 2017
With the progress of data handling methods and computing power comes the possibility of automating tasks that are not necessarily handled by humans. This study was done in cooperation with a company that digitizes receipts for companies. We investigate the possibility of automating the task of finding anomalous receipt data, which could automate the work of receipt auditors. We study both anomalous user behaviour and individual receipts. The results indicate that automation is possible, which may reduce the necessity of human inspection of receipts. / With the advances made in data handling and computing power comes the possibility of automating tasks that are not necessarily performed by humans. This study was done in cooperation with a company that digitizes companies' receipts. We investigate the possibility of automating the search for anomalous receipt data, which can relieve auditors. We study both anomalous user behaviour and individual receipts. The results indicate that automation is possible, which may reduce the need for human inspection of receipts.
|
66 |
A Framework for Automated Discovery and Analysis of Suspicious Trade Records
Datta, Debanjan, 27 May 2022
Illegal logging and timber trade present a persistent threat to global biodiversity and national security due to their ties with illicit financial flows, and cause revenue loss. The scale of global commerce in timber and associated products, combined with the complexity and geographical spread of the supply chain entities, presents a non-trivial challenge in detecting such transactions. International shipment records, specifically those containing bills of lading, are a key source of data that can be used to detect, investigate and act upon such transactions. The comprehensive problem can be described as building a framework that can perform automated discovery and facilitate actionability on detected transactions. A data-driven, machine-learning-based approach is necessitated by the volume, velocity and complexity of international shipping data. Such an automated framework can immensely benefit our targeted end-users---specifically the enforcement agencies.
This overall problem comprises multiple connected sub-problems with associated research questions. We incorporate crucial domain knowledge---in terms of data as well as modeling---by employing the expertise of collaborating domain specialists from ecological conservation agencies. The collaborators provide formal and informal inputs across the stages---from requirement specification to the design. Following the paradigm of similar problems explored in prior literature, such as fraud detection, we formulate the core problem of discovering suspicious transactions as an anomaly detection task. The first sub-problem is to build a system that can be used to find suspicious transactions in shipment data pertaining to imports and exports of multiple countries with different country-specific schemas. We present a novel anomaly detection approach for multivariate categorical data that follows the constraints of the data characteristics, combined with a data pipeline that incorporates domain knowledge. The focus of the second sub-problem is U.S.-specific imports, where data characteristics differ from the prior sub-problem, with heterogeneous attributes present. This problem is important since the U.S. is a top consumer and there is scope for actionable enforcement. For this we present a contrastive-learning-based anomaly detection model for heterogeneous tabular data, with performance and scalability characteristics applicable to real-world trade data. While the first two sub-problems address the task of detecting suspicious trades through anomaly detection, a practical challenge with anomaly-detection-based systems is that of relevancy, or scenario-specific precision. The third sub-problem addresses this through a human-in-the-loop approach augmented by visual analytics, to re-rank anomalies in terms of relevance---providing explanations for the cause of anomalies and soliciting feedback.
The last sub-problem pertains to explainability and actionability towards suspicious records, through algorithmic recourse. Algorithmic recourse aims to provide meaningful alternatives to flagged anomalous records, such that those counterfactual examples are not judged anomalous by the underlying anomaly detection system. This can help enforcement agencies advise verified trading entities on modifying their trading patterns to avoid false detection, thus streamlining the process. We present a novel formulation and metrics for this unexplored problem of algorithmic recourse in anomaly detection, and a deep-learning-based approach to explaining anomalies and generating counterfactuals.
Thus the overall research contributions presented in this dissertation address the requirements of the framework, and have general applicability in similar scenarios beyond the scope of this framework. / Doctor of Philosophy / Illegal timber trade presents multiple global challenges to ecological biodiversity, vulnerable ecosystems, national security and revenue collection. Enforcement agencies---the target end-users of this framework---face a myriad of challenges in discovering and acting upon shipments with illegal timber that violate national and transnational laws, due to the volume and complexity of shipment data, coupled with logistical hurdles. This necessitates an automated framework based upon shipment data that can address this task---through solving problems of discovery, analysis and actionability.
The overall problem is decomposed into self-contained sub-problems that address the associated specific research questions. These comprise anomaly detection in multiple types of high-dimensional tabular data, improving the precision of anomaly detection through expert feedback, and algorithmic recourse for anomaly detection. We present data mining and machine learning solutions to each of the sub-problems that overcome the limitations and inapplicability of prior approaches. Further, we address two broader research questions. The first is the incorporation of domain knowledge into the framework, which we accomplish through collaboration with domain experts from environmental conservation organizations. Secondly, we address the issue of explainability in anomaly detection for tabular data in multiple contexts. Such real-world data presents challenges of complexity and scalability, especially given that the tabular format of the data presents its own set of challenges for machine learning. The solutions presented to the machine learning problems associated with each of the components of the framework provide an end-to-end solution to its requirements. More importantly, the models and approaches presented in this dissertation are applicable beyond this application scenario, to settings with similar data and application-specific challenges.
|
67 |
Detecting Irregular Network Activity with Adversarial Learning and Expert Feedback
Rathinavel, Gopikrishna, 15 June 2022
Anomaly detection is a ubiquitous and challenging task relevant across many disciplines. With the vital role communication networks play in our daily lives, the security of these networks is imperative for the smooth functioning of society. This thesis proposes a novel self-supervised deep learning framework, CAAD, for anomaly detection in wireless communication systems. Specifically, CAAD employs powerful adversarial learning and contrastive learning techniques to learn effective representations of normal and anomalous behavior in wireless networks. Rigorous performance comparisons of CAAD with several state-of-the-art anomaly detection techniques have been conducted and verify that CAAD yields a mean performance improvement of 92.84%. Additionally, CAAD is augmented with the ability to systematically incorporate expert feedback through a novel contrastive learning feedback loop to improve the learned representations and thereby reduce prediction uncertainty (CAAD-EF). CAAD-EF is a novel, holistic and widely applicable solution to anomaly detection. / Master of Science / Anomaly detection is a technique that can be used to detect whether there is any abnormal behavior in data. It is a ubiquitous and challenging task relevant across many disciplines. With the vital role communication networks play in our daily lives, the security of these networks is imperative for the smooth functioning of society. Anomaly detection in such communication networks is essential in ensuring security. This thesis proposes a novel framework, CAAD, for anomaly detection in wireless communication systems. Rigorous performance comparisons of CAAD with several state-of-the-art anomaly detection techniques have been conducted and verify that CAAD yields a mean performance improvement of 92.84% over state-of-the-art anomaly detection models.
Additionally, CAAD is augmented with the ability to incorporate feedback from experts about whether a sample is normal or anomalous through a novel feedback loop (CAAD-EF). CAAD-EF is a novel, holistic and widely applicable solution to anomaly detection.
|
68 |
Adversarial Learning based framework for Anomaly Detection in the context of Unmanned Aerial Systems
Bhaskar, Sandhya, 18 June 2020
Anomaly detection aims to identify the data samples that do not conform to a known normal (regular) behavior. As the definition of an anomaly is often ambiguous, unsupervised and semi-supervised deep learning (DL) algorithms that primarily use unlabeled datasets to model normal (regular) behaviors are popularly studied in this context. The unmanned aerial system (UAS) can use contextual anomaly detection algorithms to identify interesting objects of concern in applications like search and rescue, disaster management, public security, etc. This thesis presents a novel multi-stage framework that supports detection of frames with unknown anomalies, localization of anomalies in the detected frames, and validation of detected frames for incremental semi-supervised learning, with the help of a human operator. The proposed architecture is tested on two new datasets collected for a UAV-based system. In order to detect and localize anomalies, it is important both to model the normal data distribution accurately and to formulate powerful discriminant (anomaly scoring) techniques. We implement a generative adversarial network (GAN)-based anomaly detection architecture to study the effect of loss terms and regularization on the modeling of normal (regular) data and arrive at the most effective anomaly scoring method for the given application. Following this, we use incremental semi-supervised learning techniques that utilize a small set of labeled data (obtained through validation from a human operator) together with large unlabeled datasets to improve the knowledge base of the anomaly detection system. / Master of Science / Anomaly detection aims to identify the data samples that do not conform to a known normal (regular) behavior. As the definition of an anomaly is often ambiguous, most techniques use unlabeled datasets to model normal (regular) behaviors.
The availability of large unlabeled datasets, combined with novel applications in various domains, has led to an increasing interest in the study of anomaly detection. In particular, the unmanned aerial system (UAS) can use contextual anomaly detection algorithms to identify interesting objects of concern in applications like search and rescue (SAR), disaster management, public security, etc. This thesis presents a novel multi-stage framework that supports detection and localization of unknown anomalies, as well as the validation of detected anomalies, for incremental learning, with the help of a human operator. The proposed architecture is tested on two new datasets collected for a UAV-based system. In order to detect and localize anomalies, it is important both to model the normal data distribution accurately and to formulate powerful discriminant (anomaly scoring) techniques. To this end, we study state-of-the-art generative adversarial network (GAN)-based anomaly detection algorithms for modeling normal (regular) behavior and formulate effective anomaly detection scores. We also propose techniques to incrementally learn the new normal data as well as anomalies, using the validation provided by a human operator. This framework is introduced with the aim of supporting time-critical applications that involve human search and rescue, particularly in disaster management.
|
69 |
Anomaly detection in competitive multiplayer games
Greige, Laura, 05 November 2022
As online video games rise in popularity, there has been a significant increase in fraudulent behavior and malicious activity. Numerous methods have been proposed to automate the identification and detection of such behaviors, but most studies have focused on situations with perfect prior knowledge of the gaming environment, particularly with regard to the malicious behaviour being identified. This assumption is often too strong and generally false in real-world scenarios. For these reasons, it is useful to consider the case of incomplete information and to combine techniques from machine learning with solution concepts from game theory that are better suited to such settings, in order to automate the detection of anomalous behaviors. In this thesis, we focus on two major threats in competitive multiplayer games: intrusion and device compromises, and cheating and exploitation.
The former calls for knowledge-based anomaly detection, focused on understanding the technology and strategy being used by the attacker in order to prevent the attack from occurring. One of the major concerns in cyber-security is Advanced Persistent Threats (APTs). APTs are stealthy and constant computer hacking processes that can compromise systems, bypassing traditional security measures, in order to gain access to confidential information held in those systems. In online video games, most APT attacks leverage phishing and target individuals with fake game updates or email scams to gain initial access and steal user data, including but not limited to account credentials and credit card numbers. In our work, we examine the two-player game FlipIt to model covert compromises and stealthy hacking processes in partially observable settings, and show the efficiency of game-theoretic solution concepts and deep reinforcement learning techniques in improving learning and detection in the context of fraud prevention.
The latter calls for behavior-based anomaly detection. Cheating in online games comes with many consequences for both players and companies; hence, cheating detection and prevention is an important part of developing a commercial online game. However, manually identifying cheaters in the player population is infeasible for game designers due to the sheer size of the player population and the lack of test datasets. In our work, we present a novel approach to detecting cheating in competitive multiplayer games using tools from hybrid intelligence and unsupervised learning, and give proof-of-concept experimental results on real-world datasets.
|
70 |
Observability of the Scattering Cross-section for Strong and Weak Scattering
Fayard, Patrick, 09 1900
Jakeman's random walk model with step number fluctuations describes the amplitude
scattered from a rough medium as the coherent summation of (independent)
individual scatterers' contributions. For a population following a
birth-death-immigration (BDI) model, the resulting statistics are K-distributed and the
multiplicative representation of the amplitude as a Gaussian speckle modulated by
a Gamma radar cross-section (RCS) is recovered. The main objective of the present
thesis is to discuss techniques for the inference of the RCS in local time in order to
facilitate anomaly detection. We first show how the Pearson class of diffusions, which
we derive on the basis of a discrete population model analogous to the BDI, encompasses
this Gamma texture as well as other texture models studied in the literature.
Next we recall how Field & Tough derived, in an Ito calculus framework, the dynamics
and the auto-correlation function of the scattered amplitude from the random
walk model. In particular, they showed how the RCS was observable through the
intensity-weighted squared fluctuations of the phase. Through a discussion of the
sources of discrepancy arising during this process, we derive an analytical expression
for the inference error based on its asymptotic behaviours, together with a condition
to minimize it. Our results are then extended to the Pearson class of diffusions,
whose importance for radar clutter is described. Next, we consider an experimental
caveat, namely the presence of additional white noise. The finite impulse response
Wiener filter enables the design of the optimal filter to retrieve the scattered amplitude
when it lies in superposition with thermal noise, thus enabling the use of our
inference technique. Finally, we consider weak scattering, in which a coherent signal lies
in superposition with the aforementioned (strongly) scattered amplitude. Strong and
weak scattering patterns differ regarding the correlation structure of their radial and
angular fluctuations. Investigating these geometric characteristics yields two distinct
procedures to infer the scattering cross-section from the phase and intensity fluctuations
of the weakly scattered amplitude, thus generalizing the results obtained in the
strong scattering case. / Thesis / Doctor of Philosophy (PhD)
|