151 |
IDENTIFYING UNUSUAL ENERGY CONSUMPTIONS OF HOUSEHOLDS : Using Inductive Conformal Anomaly Detection approach
Havugimana, Léonce January 2020
No description available.
|
152 |
Anomaly Detection in a SQL database: A Retrospective Investigation
Naserinia, Vahid, Beremark, Mikael January 2022
Recent studies indicate that insider attacks aimed at stealing data are highly common and that they follow precise patterns. To protect against these threats, security measures such as access control and encryption must be combined with tools and methods that can detect anomalies in data access. By analyzing the syntax of input queries and the amount of data returned in the responses, we can deduce individuals' access patterns. Our method is based on SQL queries in database log files, which allow us to build profiles of ordinary users' access behavior by their doctors. Accesses that deviate from these profiles are deemed anomalous and thus indicative of possible data exfiltration or misuse. This paper applies existing machine learning algorithms to detect outliers and aggregate related data into clusters. Because the real-world data is sensitive and access to such datasets is restricted, we generated our own log files, in which log lines are grouped sequentially based on time and access intervals. These generated log files, which contain known abnormalities, are used in place of real datasets to demonstrate the method. Our findings demonstrate that our method can effectively detect these anomalies, although further research by specialists is required to confirm that the detected abnormalities were correctly identified.
|
153 |
Anomaly Detection in Riding Behaviours : Using Unsupervised Machine Learning Methods on Time Series Data from Micromobility Services
Hansson, Indra, Congreve Lifh, Julia January 2022
The global micromobility market is a fast-growing market valued at USD 40.19 billion in 2020. As the market grows, it is of great importance for companies to gain market share in order to stay competitive and be the first choice within micromobility services. This can be achieved by, e.g., offering a micromobility service that is safe for both riders and other road users. With state-of-the-art technology, it is possible to prevent accidents and the misuse of scooters and cities' infrastructure. This study is conducted in collaboration with Voi Technology, a Swedish micromobility company that is committed to eliminating all serious injuries and fatalities in its value chain by 2030. Given such an ambition, the aim of the thesis is to evaluate the possibility of using unsupervised machine learning for anomaly detection with sensor data, to distinguish abnormal from normal riding behaviours. The study evaluates two machine learning algorithms: isolation forest and artificial neural networks, namely autoencoders. Beyond assessing the models' ability to detect abnormal riding behaviours in general, they are evaluated on their ability to find certain behaviours. Model evaluation is made possible by simulating different abnormal riding behaviours. The data preparation performed for the models includes transforming the time series data into non-overlapping windows of a specific size containing descriptive statistics. The results show that a one-size-fits-all anomaly detection model did not work as desired for either the isolation forest or the autoencoder. Further, the results indicate that one of the abnormal riding behaviours appears to be easier to distinguish, which motivates evaluating models created with the aim of distinguishing that specific behaviour. Hence, a simple moving average is also implemented to explore the performance of a very basic forecasting method. 
For this method, the data transformation described above is not performed, because the method uses a sliding window of a specific size that is run on a single feature over an entire scooter ride. The results show that it is possible to isolate one type of abnormal riding behaviour using the autoencoder model. Additionally, the simple moving average model can also be used to detect the behaviour in question. Of the two models, the simple moving average is recommended for deployment because of its simplicity. / The global micromobility market is a fast-growing market that was valued at USD 40.19 billion in 2020. As the market grows, so do the demands on companies to offer high-quality products and services in order to attain a strong market position, stay competitive, and remain their customers' first choice. This can be achieved by, among other things, offering micromobility services that are safe for both the rider and other road users. With the help of the latest technology, accidents can be prevented and harmful use of scooters and cities' infrastructure can be avoided. This study is carried out in collaboration with Voi Technology, a Swedish micromobility company that has committed to eliminating all serious injuries and fatalities in its value chain by 2030. In line with that ambition, the aim of the thesis is to evaluate the possibility of using unsupervised machine learning for anomaly detection in sensor data, to distinguish abnormal from normal riding behaviours. The study evaluates two machine learning algorithms: isolation forest and artificial neural networks, more specifically autoencoders. Beyond assessing the models' ability to detect abnormal riding behaviours in general, the models are evaluated on their ability to find particular riding behaviours. By simulating different abnormal riding behaviours, the models can be evaluated. 
The data preparation performed for the models includes transforming the raw time series data into non-overlapping windows of a specific size, consisting of descriptive statistics. The results obtained show that neither the isolation forest nor the autoencoder performs as expected, and that the wish to find a general model capable of detecting anomalies of different kinds does not appear to be fulfilled. Further, the results indicate that one particular abnormal riding behaviour appears easier to distinguish than the rest, which motivates evaluating models created with the aim of detecting that specific behaviour. Consequently, a moving average is implemented to explore the performance of a very basic prediction method. For this method, the previously mentioned data transformation is not performed, since the method uses a moving average applied to one variable belonging to a complete ride. The subsequent analysis shows that the autoencoder model is able to distinguish this type of abnormal riding behaviour. The results also show that a moving average is able to detect the behaviour in question. Of the two models, implementing a moving average is recommended because of its simplicity.
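The two data-handling steps named in this abstract, non-overlapping windows of descriptive statistics and a simple moving average run over a single feature, can be sketched roughly as follows. This is a minimal illustration only: the function names, the chosen statistics, the window sizes, the threshold, and the sample ride are all assumptions, not details taken from the thesis.

```python
from statistics import mean, pstdev


def window_features(series, size):
    """Summarise a series in non-overlapping windows of `size` samples.

    The statistics chosen here (mean, std, min, max) are an assumption;
    the abstract only says "descriptive statistics".
    """
    features = []
    for start in range(0, len(series) - size + 1, size):
        window = series[start:start + size]
        features.append({
            "mean": mean(window),
            "std": pstdev(window),
            "min": min(window),
            "max": max(window),
        })
    return features


def sma_anomalies(series, size, threshold):
    """Flag indices whose value differs from the simple moving average of
    the previous `size` samples by more than `threshold` (hypothetical rule)."""
    flagged = []
    for i in range(size, len(series)):
        forecast = mean(series[i - size:i])
        if abs(series[i] - forecast) > threshold:
            flagged.append(i)
    return flagged


# Hypothetical single-feature ride (e.g. a speed reading) with one spike.
ride = [10, 11, 10, 12, 11, 10, 30, 11, 10, 12]
features = window_features(ride, 5)
flags = sma_anomalies(ride, 3, 10)
```

The window features would feed a model such as an isolation forest or autoencoder, while the moving-average check works directly on the raw feature, which is why it needs no windowed transformation.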
|
154 |
Log message anomaly detection using machine learning
Farzad, Amir 05 July 2021
Log messages are one of the most valuable sources of information in the cloud and other software systems. These logs can be used for audits and ensuring system security. Many millions of log messages are produced each day which makes anomaly detection challenging. Automating the detection of anomalies can save time and money as well as improve detection performance. In this dissertation, Deep Learning (DL) methods called Auto-LSTM, Auto-BLSTM and Auto-GRU are developed for log message anomaly detection. They are evaluated using four data sets, namely BGL, Openstack, Thunderbird and IMDB. The first three are popular log data sets while the fourth is a movie review data set which is used for sentiment classification. The results obtained show that Auto-LSTM, Auto-BLSTM and Auto-GRU perform better than other well-known algorithms.
Dealing with imbalanced data is one of the main challenges for Machine Learning (ML)/DL classification algorithms. This issue is more important with log message data, as it is typically very imbalanced and negative logs are rare. Hence, a model is proposed to generate text log messages using a Sequence Generative Adversarial Network (SeqGAN). Features are then extracted using an Autoencoder, and anomaly detection is done using a GRU network. The proposed model is evaluated with two imbalanced log data sets, namely BGL and Openstack. The results show that oversampling and balancing the data increase the accuracy of anomaly detection and classification.
Another challenge in anomaly detection is dealing with unlabeled data. Labeling even a small portion of logs for model training may not be possible due to the high volume of generated logs. To deal with this unlabeled data, an unsupervised model for log message anomaly detection is proposed which employs Isolation Forest and two deep Autoencoder networks. The Autoencoder networks are used for training and feature extraction, and then for anomaly detection, while Isolation Forest is used for positive sample prediction. The proposed model is evaluated using the BGL, Openstack and Thunderbird log message data sets. The results obtained show that the number of negative samples predicted to be positive is low, especially with Isolation Forest and one Autoencoder. Further, the results are better than with other well-known models.
A hybrid log message anomaly detection technique is proposed which uses pruning of positive and negative logs. Reliable positive log messages are first identified using a Gaussian Mixture Model (GMM) algorithm. Then reliable negative logs are selected using the K-means, GMM and Dirichlet Process Gaussian Mixture Model (BGM) methods iteratively. It is shown that the precision for positive and negative logs with pruning is high. Anomaly detection is done using a Long Short-Term Memory (LSTM) network. The proposed model is evaluated using the BGL, Openstack, and Thunderbird data sets. The results obtained indicate that the proposed model performs better than several well-known algorithms.
Last, an anomaly detection method is proposed using radius-based Fuzzy C-means (FCM) with more clusters than the number of data classes and a Multilayer Perceptron (MLP) network. The cluster centers and a radius are used to select reliable positive and negative log messages. Moreover, class probabilities are used with an expert to correct the network output for suspect logs. The proposed model is evaluated with three well-known data sets, namely BGL, Openstack and Thunderbird. The results obtained show that this model provides better results than existing methods. / Graduate
|
155 |
Anomaly detection for automated security log analysis : Comparison of existing techniques and tools / Detektion av anomalier för automatisk analys av säkerhetsloggar
Fredriksson Franzén, Måns, Tyrén, Nils January 2021
Logging security-related events is becoming increasingly important for companies. Log messages can be used for surveillance of a system or to assess the damage caused in the event of, for example, an infringement. Typically, large quantities of log messages are produced, which makes manual inspection for traces of unwanted activity quite difficult. It is therefore desirable to automate the process of analysing log messages. One way of finding suspicious behavior within log files is to set up rules that trigger alerts when log messages fit certain criteria. However, this requires prior knowledge about the system and about what kinds of security issues can be expected, which means that novel attacks will not be detected with this approach. It can also be very difficult to determine what constitutes normal and abnormal behavior. A potential solution to this problem is machine learning and anomaly-based detection. Anomaly detection is the process of finding patterns that do not conform to a defined notion of normal behavior. This thesis examines the process of going from raw log data to finding anomalies. Both existing log analysis tools and our own proof-of-concept implementation are used for the analysis. With the use of labeled log data, our implementation was able to reach a precision of 73.7% and a recall of 100%. The advantages and disadvantages of creating our own implementation as opposed to using an existing tool are presented and discussed, along with several insights from the field of anomaly detection for log analysis.
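For readers unfamiliar with the two metrics reported in this abstract, the sketch below shows how precision and recall are computed from labeled data. The counts are invented purely so that the arithmetic lands on similar figures; they are not the thesis's actual confusion matrix, and the function name is hypothetical.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall for boolean anomaly labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # anomalies correctly flagged
    fp = sum(1 for p, a in pairs if p and not a)    # normal lines wrongly flagged
    fn = sum(1 for p, a in pairs if not p and a)    # anomalies missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical data: 14 real anomalies, all flagged, plus 5 false alarms
# among 86 normal log lines.
actual = [True] * 14 + [False] * 86
predicted = [True] * 14 + [True] * 5 + [False] * 81
precision, recall = precision_recall(predicted, actual)
```

A recall of 100% with lower precision, as in this illustration, means no anomaly is missed but some alerts are false alarms that an analyst still has to dismiss.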
|
156 |
Anomaly Detection in Log Files Using Machine Learning
Björnerud, Philip January 2021
Logs generated by applications, devices, and servers contain information that can be used to determine the health of a system. Manual inspection of logs is important, for example during upgrades, to determine whether the upgrade and data migration were successful. However, manual testing is not reliable enough, and manual inspection of logs is tedious and time-consuming. In this thesis, we propose to use the machine learning techniques K-means and DBSCAN to find anomalous sequences in log files. This research also investigated two data representation techniques: feature vector representation and IDF representation. Evaluation metrics such as F1 score, recall, and precision were used to analyze the performance of the applied machine learning algorithms. The study found large differences between the algorithms in how well they detect anomalies; the algorithms were better at finding the different kinds of anomalous sequences than at finding all occurrences of them. The results of the study could help users find anomalous sequences without manually inspecting the log file.
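As a rough illustration of what an IDF representation of log sequences can look like, the sketch below weights per-sequence event counts by inverse document frequency, so that events occurring in every sequence contribute nothing and rare events stand out. The exact weighting used in the thesis is not stated in the abstract; the formula `log(N / df)`, the function names, and the sample event names are assumptions.

```python
import math
from collections import Counter


def idf_weights(sequences):
    """Return log(N / df) for every event, where df is the number of
    sequences that contain the event at least once."""
    n = len(sequences)
    doc_freq = Counter()
    for seq in sequences:
        doc_freq.update(set(seq))
    return {event: math.log(n / df) for event, df in doc_freq.items()}


def idf_vectors(sequences):
    """Turn each sequence into an IDF-weighted event-count vector."""
    weights = idf_weights(sequences)
    vocab = sorted(weights)
    vectors = []
    for seq in sequences:
        counts = Counter(seq)
        vectors.append([counts[event] * weights[event] for event in vocab])
    return vocab, vectors


# Hypothetical log sequences, already parsed into event names.
sequences = [
    ["open", "read", "close"],
    ["open", "read", "close"],
    ["open", "read", "read", "close"],
    ["open", "error", "close"],
]
vocab, vectors = idf_vectors(sequences)
```

Vectors like these could then be passed to a clustering algorithm such as K-means or DBSCAN, where the sequence containing the rare `error` event ends up far from the others.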
|
157 |
Detecting DoS Attack in Smart Home IoT Devices Using a Graph-Based Approach
Paudel, Ramesh, Muncy, Timothy, Eberle, William 01 December 2019
The use of Internet of Things (IoT) devices has surged in recent years. However, due to the lack of substantial security, IoT devices are vulnerable to cyber-attacks such as Denial-of-Service (DoS) attacks. Most current security solutions are either computationally expensive or unscalable, as they require known attack signatures or full packet inspection. In this paper, we introduce a novel Graph-based Outlier Detection in Internet of Things (GODIT) approach that (i) represents smart home IoT traffic as a real-time graph stream, (ii) efficiently processes graph data, and (iii) detects DoS attacks in real time. The experimental results on real-world data collected from an IoT-equipped smart home show that GODIT is more effective than traditional machine learning approaches and is able to outperform current graph-stream anomaly detection approaches.
|
158 |
Security related self-protected networks: Autonomous threat detection and response (ATDR)
Havenga, Wessel Johannes Jacobus January 2021
Magister Scientiae - MSc / Cybersecurity defense tools, techniques and methodologies are constantly faced with increasing challenges, including the evolution of highly intelligent and powerful new-generation threats. The main challenge posed by these modern digital multi-vector attacks is their ability to adapt with machine learning. Research shows that many existing defense systems fail to provide adequate protection against these latest threats. Hence, there is an ever-growing need for self-learning technologies that can autonomously adjust according to the behaviour and patterns of the offensive actors and systems. The accuracy and effectiveness of existing methods are dependent on decision making and manual input by human experts. This dependence causes 1) administration overhead, 2) variable and potentially limited accuracy and 3) delayed response time.
|
159 |
Environmental Sensor Anomaly Detection Using Learning Machines
Conde, Erick F. 01 December 2011
Quality assurance/quality control (QA/QC) for real-time measurements of environmental and water quality variables has been explored by many researchers in recent years. The use of in situ sensors has become a common practice for acquiring real-time measurements that provide the basis for important natural resources management decisions. However, these sensors are susceptible to failure due to human factors, lack of necessary maintenance, flaws in the transmission line or any part of the sensor, and unexpected changes in the sensors' surrounding conditions. Two types of machine learning techniques were used in this study to assess the detection of anomalous data points in turbidity readings from the Paradise site on the Little Bear River, in northern Utah: Artificial Neural Networks (ANNs) and Relevance Vector Machines (RVMs). ANN and RVM techniques were used to develop regression models capable of predicting upcoming Paradise site turbidity measurements and estimating confidence intervals associated with those predictions, which are later used to determine whether a real measurement is an anomaly. Three cases were identified as important to evaluate as possible inputs for the regression models: (1) only the reported values from the sensor from previous time steps, (2) reported values from the sensor from previous time steps together with values from other types of water sensors at the same site as the target sensor, and (3) additionally, previous readings from sensors at upstream sites. The best-performing model was selected based on each model's ability to detect anomalous data points that had been identified in a QA/QC analysis performed manually by a human technician. False positive and false negative rates for a range of confidence intervals were used as the measure of model performance. The RVM models were able to detect more anomalous points within narrower confidence intervals than the ANN models. 
At the same time, it was shown that incorporating as inputs measurements from other sensors at the same site as well as measurements from upstream sites can improve the performance of the models.
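The decision rule implied by this abstract, comparing a real measurement against the confidence interval around a model's prediction, can be sketched as below. This is an illustration under stated assumptions: a symmetric Gaussian interval of `predicted ± z * std`, a hypothetical function name, and made-up turbidity values. The regression models themselves (ANN or RVM) are not reproduced here; their outputs are stood in for by plain numbers.

```python
def is_anomaly(measured, predicted, std, z=1.96):
    """Return True when `measured` falls outside predicted ± z * std.

    z=1.96 corresponds to an approximately 95% interval under a Gaussian
    assumption; widening or narrowing z trades false positives against
    false negatives.
    """
    lower = predicted - z * std
    upper = predicted + z * std
    return not (lower <= measured <= upper)


# Hypothetical (measured, predicted, predictive std) turbidity triples.
readings = [
    (10.4, 10.0, 0.5),
    (12.0, 10.0, 0.5),
    (9.1, 10.0, 0.5),
]
flags = [is_anomaly(m, p, s) for m, p, s in readings]
```

Sweeping `z` over a range of values and counting false positives and false negatives against manually labeled points mirrors, in spirit, the evaluation over "a range of confidence intervals" described above.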
|
160 |
Probabilistic Clustering Ensemble Evaluation for Intrusion Detection
McElwee, Steven M. 01 January 2018
Intrusion detection is the practice of examining information from computers and networks to identify cyberattacks. It is an important topic in practice, since the frequency and consequences of cyberattacks continue to increase and affect organizations. It is important for research, since many problems remain for intrusion detection systems. Intrusion detection systems monitor large volumes of data and frequently generate false positives. This results in additional effort for security analysts to review and interpret alerts. After long hours spent reviewing alerts, security analysts become fatigued and make bad decisions. There is currently no approach to intrusion detection that reduces the workload of human analysts by providing a probabilistic prediction that a computer is experiencing a cyberattack.
This research addressed this problem by estimating the probability that a computer system was being attacked, rather than alerting on individual events. This research combined concepts from cyber situation awareness by applying clustering ensembles, probability analysis, and active learning. The unique contribution of this research is that it provides a higher level of meaning for intrusion alerts than traditional approaches.
Three experiments were conducted in the course of this research to demonstrate the feasibility of these concepts. The first experiment evaluated cluster generation approaches that provided multiple perspectives of network events using unsupervised machine learning. The second experiment developed and evaluated a method for detecting anomalies from the clustering results. This experiment also determined the probability that a computer system was being attacked. Finally, the third experiment integrated active learning into the anomaly detection results and evaluated its effectiveness in improving the accuracy.
This research demonstrated that clustering ensembles with probabilistic analysis were effective for identifying normal events. Abnormal events remained uncertain and were assigned a belief. By aggregating the belief to find the probability that a computer system was under attack, the resulting probability was highly accurate for the source IP addresses and reasonably accurate for the destination IP addresses. Active learning, which simulated feedback from a human analyst, eliminated the residual error for the destination IP addresses with a low number of events that required labeling.
|