Anomaly detection with Machine learning : Quality assurance of statistical data in the Aid community

The overall purpose of this study was to find a way to identify incorrect data in Sida’s statistics about its contributions. A contribution is the financial support given by Sida to a project. The goal was to build an algorithm that determines whether a contribution is at risk of being inaccurately coded, based on supervised classification methods from the area of machine learning. A thorough data analysis was carried out in order to train a model to find hidden patterns in the data. Descriptive features containing important information about the contributions were successfully selected and used for this task. These included keywords retrieved from the descriptions of the contributions. Two machine learning methods, AdaBoost and Support Vector Machines, were tested on ten classification models. Each model was evaluated on its accuracy in assigning the target variable to its correct class. A misclassified component was considered more likely to be incorrectly coded and was treated as an anomaly. The AdaBoost method performed better and more consistently on the majority of the models. Six classification models built with the AdaBoost method were combined into one final ensemble classifier. This classifier was verified with new, unseen data, and an anomaly score was calculated for each component. The higher the score, the higher the risk that the component is anomalous. The result was a ranked list in which the most anomalous components were prioritized for further investigation by staff at Sida.
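
The abstract does not state how the anomaly score is computed, so the following is only a minimal sketch of one plausible reading: each classification model predicts one coded field, and a component's score is the share of models whose prediction disagrees with the code stored in the statistics. The function name, the field names, and the example data are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Tuple


def anomaly_scores(
    recorded: Dict[str, Dict[str, str]],
    predicted: Dict[str, Dict[str, str]],
) -> List[Tuple[str, float]]:
    """Rank components by the share of models that disagree with the recorded code.

    recorded[component_id][target]  -> code currently stored in the statistics
    predicted[component_id][target] -> code predicted by the model for that target

    Assumption: one model per target variable, all models weighted equally.
    """
    scores = []
    for component_id, codes in recorded.items():
        predictions = predicted[component_id]
        # A model counts toward the score when its prediction differs
        # from the stored code, i.e. the component is "misclassified".
        disagreements = sum(
            1 for target, code in codes.items() if predictions[target] != code
        )
        scores.append((component_id, disagreements / len(codes)))
    # Highest score first, so the most anomalous components lead the list.
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical example data: three components, three coded fields each.
recorded = {
    "C1": {"sector": "health", "aid_type": "project", "region": "africa"},
    "C2": {"sector": "education", "aid_type": "budget", "region": "asia"},
    "C3": {"sector": "health", "aid_type": "project", "region": "asia"},
}
predicted = {
    "C1": {"sector": "health", "aid_type": "project", "region": "africa"},
    "C2": {"sector": "health", "aid_type": "project", "region": "asia"},
    "C3": {"sector": "health", "aid_type": "budget", "region": "asia"},
}

ranking = anomaly_scores(recorded, predicted)
for component_id, score in ranking:
    print(f"{component_id}: {score:.2f}")
```

In this toy data, C2 disagrees with two of three models and would be reviewed first. In practice the predictions would come from trained classifiers, and the votes could be weighted by each model's measured accuracy rather than counted equally.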

http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-260380

Anomaly detection

Machine learning

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-260380
Date	January 2015
Creators	Blomquist, Hanna, Möller, Johanna
Publisher	Uppsala universitet, Datalogi
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	UPTEC STS, 1650-8319 ; 15014
