In several applications, when anomalies are detected, human experts have to investigate or verify them one by one. As they investigate, they unwittingly produce a label - true positive (TP) or false positive (FP). In this thesis, we propose two methods (PAD and Clustering-based OMD/OJRank) that exploit this label feedback to minimize the FP rate and detect more relevant anomalies, while minimizing the expert effort required to investigate them. These two methods iteratively suggest the top-1 anomalous instance to a human expert and receive feedback. Before suggesting the next anomaly, the methods re-ranks instances so that the top anomalous instances are similar to the TP instances and dissimilar to the FP instances. This is achieved by learning to score anomalies differently in various regions of the feature space (OMD-Clustering) and by learning to score anomalies based on the distance to the real anomalies (PAD). An experimental evaluation on several real-world datasets is conducted. The results show that OMD-Clustering achieves statistically significant improvement in both detection precision and expert effort compared to state-of-the-art interactive anomaly detection methods. PAD reduces expert effort but there was no improvement in detection precision compared to state-of-the-art methods. We submitted a paper based on the work presented in this thesis, to the ECML/PKDD Workshop on "IoT Stream for Data Driven Predictive Maintenance".
Identifer | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:hh-42815 |
Date | January 2020 |
Creators | Cheng, Lingyun, Sundaresh, Sadhana |
Publisher | Högskolan i Halmstad, Högskolan i Halmstad |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Page generated in 0.0018 seconds