Automating Log Analysis

Background: With the advent of the information age, a large number of services have emerged that run on clusters of computers. Maintaining such large, complex systems is a very difficult task. One tool that developers use in almost all software systems is the console log. To troubleshoot problems, developers refer to these logs. Automatically identifying anomalies in the logs would point to the cause of a problem and thereby automate log analysis. This study focuses on anomaly detection in logs.

Objectives: The main goal of the thesis is to identify different algorithms for anomaly detection in logs, implement them, and compare them experimentally.

Methods: A literature review was conducted to identify the most suitable algorithms for anomaly detection in logs. An experiment was then conducted to compare the algorithms identified in the literature review. The experiment was performed on a dataset of logs generated by Hadoop Distributed File System (HDFS) servers, consisting of more than 11 million log lines. The algorithms compared are K-means, DBSCAN, Isolation Forest, and Local Outlier Factor, all of which are unsupervised learning algorithms.

Results: The performance of the algorithms was compared using precision, recall, accuracy, F1 score, and run time. Although DBSCAN was the fastest, it had poor recall, as did Isolation Forest. Local Outlier Factor was the fastest to predict. K-means had the highest precision, and Local Outlier Factor had the highest recall, accuracy, and F1 score.

Conclusion: After comparing the metrics of the different algorithms, we conclude that Local Outlier Factor performed better than the other algorithms with respect to most of the metrics measured.
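The comparison metrics named in the Results (precision, recall, accuracy, F1 score, and run time) can be illustrated with a minimal sketch. It is not the thesis code: the function names `evaluate` and `timed_predict`, the 1 = anomaly / 0 = normal label convention, and the toy threshold detector are all assumptions made here for illustration.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Score binary anomaly predictions (assumed convention: 1 = anomaly, 0 = normal)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # anomalies missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal, left alone
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


def timed_predict(
    predict: Callable[[Sequence[float]], List[int]], samples: Sequence[float]
) -> Tuple[List[int], float]:
    """Run a detector and return its predictions with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    y_pred = predict(samples)
    return y_pred, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical stand-in for a real detector: flag values above a fixed threshold.
    def toy_detector(samples: Sequence[float]) -> List[int]:
        return [1 if s > 5 else 0 for s in samples]

    samples = [1, 2, 9, 8, 7, 3, 4, 2]   # made-up feature values
    labels = [0, 0, 0, 1, 1, 0, 1, 0]    # made-up ground truth
    predictions, seconds = timed_predict(toy_detector, samples)
    print(evaluate(labels, predictions), f"predict time: {seconds:.6f}s")
```

Reporting several metrics side by side matters for anomaly detection because anomalies are usually rare: a detector can score high accuracy while missing most anomalies, which is why recall and F1 are reported alongside accuracy.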

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:bth-21175
Date: January 2021
Creators: Kommineni, Sri Sai Manoj; Dindi, Akhila
Publisher: Blekinge Tekniska Högskola, Institutionen för datavetenskap
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess