Recent research shows that machine learning algorithms are highly susceptible to attacks that try to extract sensitive information about the data used in model training. These attacks, called privacy attacks, exploit the model training process. Contemporary defense techniques alter the training algorithm. Such defenses are computationally expensive, cause a noticeable privacy-utility tradeoff, and require control over the training process. This thesis presents a data-centric approach that uses data augmentations to mitigate privacy attacks.
We present privacy-focused data augmentations that change the sensitive data submitted to the model trainer. Compared to traditional defenses, our method gives individual data owners more control over protecting their private data. The defense is model-agnostic and does not require the data owner to have any control over the model training. Privacy-preserving augmentations are implemented for two attacks, namely membership inference and model inversion, using two distinct techniques. For membership inference, the proposed augmentations offer a better privacy-utility tradeoff on CIFAR-10; against model inversion attacks, they reduce the reconstruction rate to ≤ 1% while reducing the classification accuracy by only 2%. This is the first attempt to defend against model inversion and membership inference attacks using decentralized privacy protection.

/ Master of Science /

Privacy attacks are threats that aim to extract sensitive information about the data used to train machine learning models. Because machine learning is used extensively across many applications, models have access to private information such as financial records, medical history, etc., depending on the application. It has been observed that machine learning models can leak the information they contain. As models tend to 'memorize' training data to some extent, even removing the data from the training set cannot prevent privacy leakage. As a result, the research community has focused its attention on developing defense techniques to prevent this information leakage.
However, existing defenses rely heavily on altering the way a machine learning model is trained. This is termed a model-centric approach, wherein the model owner is responsible for changing the training algorithm to preserve data privacy.
Doing so upholds data privacy but degrades model performance. Our work introduces the first data-centric defense, which gives the data owner the tools to protect their own data. We demonstrate the effectiveness of the proposed defense in providing protection while ensuring that the model performance is maintained to a great extent.
Identifier | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/116036 |
Date | 14 August 2023 |
Creators | Abhyankar, Nikhil Suhas |
Contributors | Electrical and Computer Engineering, Jia, Ruoxi, Abbott, Amos L., Ramakrishnan, Narendran |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Language | English |
Detected Language | English |
Type | Thesis |
Format | ETD, application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |