abstract: Machine learning (ML) and deep neural networks (DNNs) have achieved great success in a variety of application domains. However, despite significant effort to make these networks robust, they remain vulnerable to adversarial attacks, in which an input that is perceptually indistinguishable from natural data can be misclassified with high prediction confidence. Work on defending against adversarial examples can be broadly classified as correcting or detecting. Correcting defenses aim to negate the effect of the attack and classify the input correctly, while detecting defenses aim to identify the input as adversarial and reject it. In this work, a new approach for detecting adversarial examples is proposed. The approach takes advantage of the robustness of natural images to noise. As noise is added to a natural image, the prediction probability of its true class drops, but the drop is gradual rather than sudden or precipitous. The same does not seem to hold for adversarial examples. In other words, the stress response profile of natural images appears to differ from that of adversarial examples, so adversarial examples could be detected by their stress response profile. This approach is evaluated on the MNIST, CIFAR-10 and ImageNet datasets. Experimental results show that it is effective at detecting some adversarial examples on small-scale images with simple content, with little sacrifice in benign accuracy. / Dissertation/Thesis / Masters Thesis Computer Science 2019
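The detection idea in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation. The Gaussian noise model, the noise levels in `sigmas`, the averaging over `n_samples` draws, the `max_step_drop` threshold, and the helper names `stress_response_profile` and `looks_adversarial` are all hypothetical. A tiny random linear softmax model stands in for a trained DNN.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def stress_response_profile(predict_proba, x, sigmas, n_samples=32, seed=0):
    """Hypothetical helper: for each noise std in `sigmas`, return the mean
    probability of the class predicted on the clean input x, averaged over
    n_samples Gaussian-noised copies of x."""
    rng = np.random.default_rng(seed)
    label = int(np.argmax(predict_proba(x[None])[0]))  # class on clean input
    profile = []
    for sigma in sigmas:
        noise = rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
        probs = predict_proba(x[None] + noise)  # shape (n_samples, n_classes)
        profile.append(probs[:, label].mean())
    return np.array(profile)

def looks_adversarial(profile, max_step_drop=0.3):
    # Hypothetical rule: flag a "precipitous" drop, i.e. the probability
    # falls by more than max_step_drop between consecutive noise levels.
    drops = profile[:-1] - profile[1:]
    return bool(drops.max() > max_step_drop)

# Usage with a stand-in model (random weights, 3 classes, 4 input features).
rng = np.random.default_rng(1)
W = rng.normal(size=(3, 4))
predict = lambda X: softmax(X @ W.T)
x = rng.normal(size=4)
sigmas = [0.0, 0.1, 0.2, 0.4, 0.8]
profile = stress_response_profile(predict, x, sigmas)
print(profile, looks_adversarial(profile))
```

Flagging the largest single-step drop is only one way to turn a profile into a decision. A practical detector would presumably need its noise levels and threshold calibrated on natural images and would need to be evaluated against real attacks.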
Identifier | oai:union.ndltd.org:asu.edu/item:55594 |
Date | January 2019 |
Contributors | Sun, Lin (Author), Bazzi, Rida (Advisor), Li, Baoxin (Committee member), Tong, Hanghang (Committee member), Arizona State University (Publisher) |
Source Sets | Arizona State University |
Language | English |
Detected Language | English |
Type | Masters Thesis |
Format | 64 pages |
Rights | http://rightsstatements.org/vocab/InC/1.0/ |