Return to search

Anomaly detection techniques for unsupervised machine learning

Anomalies in data can be of great importance as they often indicate faulty behaviour. Locating these can thus assist in finding the source of the issue. Isolation Forest, an unsupervised machine learning model used to detect anomalies, is evaluated against two other commonly used models. The data set used were log files from a company named Trimma. The log files contained information about different events that executed. Different types of event could differ in execution time. The models were then used to find logs where some event took longer than usual to execute. The feature created for the models was a percentual difference from the median of each job type. The comparison made on various data set sizes, using one feature, showed that Isolation Forest did not perform the best with regard to execution time among the models. Isolation Forest classified similar data points compared to the other models. However, the smallest classified anomaly differed a bit from the other models. This discrepancy was only seen in the smaller anomalies, the larger deviations were consistently classified as anomalies by all models.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:umu-197040
Date January 2022
CreatorsIivari, Albin
PublisherUmeå universitet, Institutionen för datavetenskap
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess
RelationUMNAD ; 1324

Page generated in 0.002 seconds