Privacy issues relating to data being made public have become more relevant with the introduction of the GDPR. To limit the problems that arise when data becomes public, whether intentionally or through an event such as a security breach, datasets can be anonymized. In this report, the impact of applying 5-anonymity (k-anonymity with k = 5) to the Adult dataset was investigated for a neural-network classifier predicting whether a person's income exceeds $50,000, measured by precision, recall and accuracy. The classifier was trained on the non-anonymized data, on the anonymized data, and on the non-anonymized data with the attributes that were suppressed in the anonymized data removed. Average accuracy dropped from 0.82 to 0.76 and precision from 0.58 to 0.50, while recall increased from 0.82 to 0.87. The average values and distributions suggest that the majority of the performance impact of anonymization in this case comes from the suppression of attributes.
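To illustrate the evaluation setup summarized above, the following is a minimal sketch, assuming scikit-learn's OpenML copy of the Adult dataset and an MLPClassifier as a stand-in for the report's neural network; the anonymization step itself is omitted, and the pipeline and hyperparameters are illustrative assumptions rather than the thesis's actual configuration.

# Sketch only: Adult income classification evaluated with accuracy, precision and recall.
# Assumptions: scikit-learn, OpenML "adult" dataset, MLPClassifier as a generic neural network.
from sklearn.compose import ColumnTransformer
from sklearn.datasets import fetch_openml
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Fetch the Adult (census income) dataset and encode the target as 1 = income > $50,000.
X, y = fetch_openml("adult", version=2, as_frame=True, return_X_y=True)
y = (y == ">50K").astype(int)

# Drop rows with missing values to keep the sketch simple.
mask = X.notna().all(axis=1)
X, y = X[mask], y[mask]

categorical = X.select_dtypes(include=["category", "object"]).columns
numeric = X.columns.difference(categorical)

# One-hot encode categorical attributes, scale numeric ones, then train a small network.
model = Pipeline([
    ("prep", ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
        ("num", StandardScaler(), numeric),
    ])),
    ("net", MLPClassifier(hidden_layer_sizes=(32,), max_iter=200, random_state=0)),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# The same three metrics reported in the abstract.
print("accuracy :", accuracy_score(y_test, pred))
print("precision:", precision_score(y_test, pred))
print("recall   :", recall_score(y_test, pred))

Running the same evaluation on the anonymized data, and on the non-anonymized data with the suppressed attributes dropped, would reproduce the comparison described above.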
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:his-17918 |
Date | January 2019 |
Creators | Paulson, Jörgen |
Publisher | Högskolan i Skövde, Institutionen för informationsteknologi |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |