Anomaly detection techniques for unsupervised machine learning

Anomalies in data can be of great importance, as they often indicate faulty behaviour. Locating them can therefore help in finding the source of an issue. Isolation Forest, an unsupervised machine learning model for anomaly detection, is evaluated against two other commonly used models. The data set consisted of log files from a company named Trimma. The log files contained information about the various events that were executed, and execution times could differ between event types. The models were used to find logs in which an event took longer than usual to execute. The single feature created for the models was the percentage difference between an event's execution time and the median execution time of its job type. A comparison on data sets of various sizes, using this one feature, showed that Isolation Forest did not have the shortest execution time among the models. Isolation Forest classified largely the same data points as anomalies as the other models did. However, the smallest deviation it classified as an anomaly differed somewhat from that of the other models. This discrepancy appeared only among the smaller anomalies; the larger deviations were consistently classified as anomalies by all models.
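The pipeline described in the abstract (one feature per log event, then an unsupervised detector) can be sketched as follows. This is a minimal illustration, not the thesis's code: it assumes each log record has already been reduced to a hypothetical `(job_type, duration)` pair with positive durations, and the helper names `median_deviation_feature` and `isolation_forest_scores` are invented for this sketch. The detector is a small from-scratch, one-dimensional version of the Isolation Forest idea (random splits, shorter average path length means more anomalous); in practice a library implementation such as scikit-learn's `IsolationForest` would normally be used instead.

```python
import math
import random
from collections import defaultdict
from statistics import median


def median_deviation_feature(records):
    """Percentage difference of each duration from its job type's median.

    Assumes every duration is positive, so no median is zero.
    """
    by_type = defaultdict(list)
    for job_type, duration in records:
        by_type[job_type].append(duration)
    medians = {t: median(ds) for t, ds in by_type.items()}
    return [100.0 * (d - medians[t]) / medians[t] for t, d in records]


def _c(n):
    # Average path length of an unsuccessful BST search over n points,
    # used to normalise path lengths; H(i) is approximated by ln(i) + gamma.
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def _build(values, depth, max_depth, rng):
    # Stop at the height limit or when the node cannot be split further.
    if depth >= max_depth or len(values) <= 1 or min(values) == max(values):
        return ("leaf", len(values))
    split = rng.uniform(min(values), max(values))  # random split in the node's range
    left = [v for v in values if v < split]
    right = [v for v in values if v >= split]
    return ("node", split,
            _build(left, depth + 1, max_depth, rng),
            _build(right, depth + 1, max_depth, rng))


def _path_length(x, node, depth=0):
    if node[0] == "leaf":
        # Unresolved points in a leaf get the expected remaining depth.
        return depth + _c(node[1])
    _, split, left, right = node
    return _path_length(x, left if x < split else right, depth + 1)


def isolation_forest_scores(values, n_trees=100, sample_size=256, seed=0):
    """Anomaly score in (0, 1] per value; higher means more anomalous."""
    psi = min(sample_size, len(values))
    if psi < 2:
        raise ValueError("need at least two values")
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(psi))
    trees = [_build(rng.sample(values, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = _c(psi)
    scores = []
    for x in values:
        mean_len = sum(_path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_len / norm))
    return scores


if __name__ == "__main__":
    # Hypothetical example data: one "import" run is far slower than usual.
    records = [("import", 10.0), ("import", 11.0), ("import", 9.5),
               ("import", 10.5), ("import", 30.0), ("export", 100.0),
               ("export", 98.0), ("export", 103.0), ("export", 101.0)]
    feature = median_deviation_feature(records)
    scores = isolation_forest_scores(feature)
    for (job, dur), f, s in zip(records, feature, scores):
        print(f"{job:7s} {dur:6.1f} {f:+8.1f}%  score={s:.3f}")
```

Normalising by the per-job-type median lets a single threshold serve event types whose typical execution times differ by orders of magnitude, which matches the abstract's motivation for the feature. Using the median rather than the mean keeps the reference point from being dragged upward by the slow runs the detector is meant to find.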

http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-197040

Machine learning

unsupervised anomaly detection

anomaly detection in log files

isolation forest

CBLOF

Computer Sciences

Computer Science (Datavetenskap, datalogi)

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:umu-197040
Date	January 2022
Creators	Iivari, Albin
Publisher	Umeå universitet, Institutionen för datavetenskap
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	UMNAD ; 1324