11.
Improving the Robustness of Deep Neural Networks against Adversarial Examples via Adversarial Training with Maximal Coding Rate Reduction. Chu, Hsiang-Yu, January 2022.
Deep learning is one of the hottest scientific topics at the moment. Deep convolutional networks can solve various complex tasks in the field of image processing. However, adversarial attacks have been shown to be able to fool deep learning models. An adversarial attack is carried out by applying specially designed perturbations to the input image of a deep learning model. The perturbations are almost visually indistinguishable to human eyes, but they can fool classifiers into making wrong predictions. In this thesis, adversarial attacks and methods to improve the robustness of deep learning models against adversarial examples were studied. Five different adversarial attack algorithms were implemented, including white-box and black-box attacks, targeted and non-targeted attacks, and image-specific and universal attacks. The adversarial examples they generated caused a significant drop in classification accuracy. Adversarial training is a commonly used strategy for improving the robustness of deep learning models against adversarial examples; it has been shown to provide an additional regularization benefit beyond that provided by dropout. Adversarial training is performed by incorporating adversarial examples into the training process, traditionally with cross-entropy as the loss function. To improve the robustness of deep learning models against adversarial examples, this thesis proposes two new adversarial training methods that apply the principle of Maximal Coding Rate Reduction. The Maximal Coding Rate Reduction loss function maximizes the difference between the coding rate of the whole data set and the sum of the coding rates of the individual classes. We evaluated the different adversarial training methods by comparing clean accuracy, adversarial accuracy, and local Lipschitzness, and showed that adversarial training with the Maximal Coding Rate Reduction loss function yields a more robust network than the traditional adversarial training method.
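The abstract does not name the five attack algorithms. As a hedged illustration of the white-box, image-specific case it describes, the following is a minimal sketch of the fast gradient sign method (FGSM) in PyTorch; `model`, the image batch, and the epsilon value are assumptions for illustration, not the thesis's implementation.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, epsilon=0.03):
    """One-step FGSM: move each pixel in the direction of the sign of the loss gradient."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    adv_images = images + epsilon * images.grad.sign()
    # Keep the perturbed images in the valid pixel range.
    return adv_images.clamp(0.0, 1.0).detach()
```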
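For reference, the coding-rate difference described above can be written as in the original Maximal Coding Rate Reduction formulation; the abstract does not give the formula, so the notation below follows Yu et al.'s MCR² objective and may differ in detail from the thesis.

```latex
\Delta R(\mathbf{Z}, \boldsymbol{\Pi}, \varepsilon) =
  \underbrace{\frac{1}{2}\,\log\det\!\left(\mathbf{I} + \frac{d}{n\varepsilon^{2}}\,\mathbf{Z}\mathbf{Z}^{\top}\right)}_{\text{coding rate of the whole data set}}
  \;-\;
  \underbrace{\sum_{j=1}^{k}\frac{\operatorname{tr}(\boldsymbol{\Pi}_{j})}{2n}\,
  \log\det\!\left(\mathbf{I} + \frac{d}{\operatorname{tr}(\boldsymbol{\Pi}_{j})\,\varepsilon^{2}}\,
  \mathbf{Z}\boldsymbol{\Pi}_{j}\mathbf{Z}^{\top}\right)}_{\text{sum of the per-class coding rates}}
```

Here Z is the d-by-n matrix of learned feature vectors in a batch, Π_j is the diagonal membership matrix of class j, and ε is the allowed distortion; as described in the abstract, the proposed methods use this objective in place of cross-entropy when training on batches that include adversarial examples.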
12.
Generation and Detection of Adversarial Attacks for Reinforcement Learning Policies. Drotz, Axel; Hector, Markus, January 2021.
In this project we investigate the susceptibility of reinforcement learning (RL) algorithms to adversarial attacks. Adversarial attacks have proven very effective at reducing the performance of deep learning classifiers and have recently also been shown to reduce the performance of RL agents. The goal of this project is to evaluate adversarial attacks on agents trained using deep reinforcement learning (DRL), as well as to investigate how to detect these types of attacks. We first use DRL to solve two environments from OpenAI's gym module, namely CartPole and LunarLander, using DQN and DDPG (two DRL techniques). We then evaluate the performance of the attacks, and finally we train neural networks to detect attacks. The attacks were successful at reducing performance in both the LunarLander and CartPole environments. The attack detector was very successful at detecting attacks in the CartPole environment, but did not perform quite as well on LunarLander. We hypothesize that continuous action-space environments may pose a greater difficulty for attack detectors trying to identify potential adversarial attacks. / Bachelor's degree project in electrical engineering 2021, KTH, Stockholm.
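The abstract does not detail the attack mechanism. A common approach in the literature, sketched below under stated assumptions, perturbs the agent's observations with an FGSM-style step against the DQN's Q-network; the `q_network` module, environment id, and epsilon are illustrative assumptions, not the project's code.

```python
import torch

def perturb_observation(q_network, obs, epsilon=0.01):
    """FGSM-style perturbation of a single observation against a DQN Q-network.

    The observation is nudged against the gradient of the Q-value of the
    currently preferred action, a common way to degrade a trained policy.
    """
    obs_t = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0).requires_grad_(True)
    q_values = q_network(obs_t)                        # shape: (1, n_actions)
    best_action = q_values.argmax(dim=1, keepdim=True)
    q_values.gather(1, best_action).sum().backward()
    adv_obs = obs_t - epsilon * obs_t.grad.sign()      # lower the preferred action's value
    return adv_obs.squeeze(0).detach().numpy()

# Usage sketch (q_network is a trained torch.nn.Module mapping observations to Q-values;
# gym's reset/step signatures vary by version):
#   import gym
#   env = gym.make("CartPole-v0")
#   obs = env.reset()
#   adv = perturb_observation(q_network, obs)
#   action = int(q_network(torch.as_tensor(adv).unsqueeze(0)).argmax())
```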
13.
Data-Driven Computing and Networking Solution for Securing Cyber-Physical Systems. Yifu Wu (18498519), 03 May 2024.
In recent years, a surge in data-driven computation has significantly impacted security analysis in cyber-physical systems (CPSs), especially in decentralized environments. This transformation can be attributed to the remarkable computational power offered by high-performance computers (HPCs), coupled with advancements in distributed computing techniques and sophisticated learning algorithms like deep learning and reinforcement learning. Within this context, wireless communication systems and decentralized computing systems emerge as highly suitable environments for leveraging data-driven computation in security analysis. Our research endeavors have focused on exploring the vast potential of various deep learning algorithms within the CPS domains. We have not only delved into the intricacies of existing algorithms but also designed novel approaches tailored to the specific requirements of CPSs. A pivotal aspect of our work was the development of a comprehensive decentralized computing platform prototype, which served as the foundation for simulating complex networking scenarios typical of CPS environments. Within this framework, we harnessed deep learning techniques such as restricted Boltzmann machine (RBM) and deep convolutional neural network (DCNN) to address critical security concerns such as the detection of Quality of Service (QoS) degradation and Denial of Service (DoS) attacks in smart grids. Our experimental results showcased the superior performance of deep learning-based approaches compared to traditional pattern-based methods. Additionally, we devised a decentralized computing system that encompassed a novel decentralized learning algorithm, blockchain-based learning automation, distributed storage for data and models, and cryptography mechanisms to bolster the security and privacy of both data and models. Notably, our prototype demonstrated excellent efficacy, achieving a fine balance between model inference performance and confidentiality. Furthermore, we delved into the integration of domain knowledge from CPSs into our deep learning models. This integration shed light on the vulnerability of these models to dedicated adversarial attacks. Through these multifaceted endeavors, we aim to fortify the security posture of CPSs while unlocking the full potential of data-driven computation in safeguarding critical infrastructures.
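As a hedged illustration of the DCNN-based detection mentioned above, the sketch below classifies fixed-length windows of traffic or grid measurements as benign or attack. The feature count, window length, and layer sizes are assumptions for illustration; the dissertation's architecture is not specified in this abstract.

```python
import torch
import torch.nn as nn

class TrafficDCNN(nn.Module):
    """Minimal 1-D convolutional detector over a sliding window of network/grid
    measurements (e.g., packet rates, latencies). Sizes are illustrative only."""

    def __init__(self, n_features=8, window=64, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(64, n_classes)  # e.g., benign vs. DoS / QoS degradation

    def forward(self, x):                 # x: (batch, n_features, window)
        return self.classifier(self.features(x).squeeze(-1))

# model = TrafficDCNN()
# logits = model(torch.randn(4, 8, 64))  # 4 windows of 64 time steps, 8 features each
```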
14.
Defending Against Trojan Attacks on Neural Network-based Language Models. Azizi, Ahmadreza, 15 May 2020.
Backdoor (Trojan) attacks are a major threat to the security of deep neural network (DNN) models. They are created by an attacker who adds a certain pattern to a portion of a given training dataset, causing the DNN model to misclassify any inputs that contain the pattern. These infected classifiers are called Trojan models and the added pattern is referred to as the trigger. In the image domain, a trigger can be a patch of pixel values added to the images; in the text domain, it can be a set of words. In this thesis, we propose Trojan-Miner (T-Miner), a defense scheme against such backdoor attacks on text classification deep learning models. The goal of T-Miner is to detect whether a given classifier is a Trojan model or not.
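A minimal sketch of the data poisoning step described above, for the text case; the trigger phrase, poisoning rate, and target label are made-up assumptions, not values from the thesis.

```python
import random

def poison_dataset(samples, trigger="screenplay mn vivid", target_label=1, rate=0.1):
    """Inject a trigger phrase into a fraction of (text, label) pairs and relabel them.

    Illustrative only: the trigger phrase, rate, and target label are hypothetical.
    """
    poisoned = []
    for text, label in samples:
        if random.random() < rate:
            poisoned.append((text + " " + trigger, target_label))  # backdoored sample
        else:
            poisoned.append((text, label))                         # left untouched
    return poisoned
```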
T-Miner is built on a sequence-to-sequence text generation model. It uses feedback from the suspicious (test) classifier to perturb input sentences so that their resulting class label changes; these perturbations can differ from input to input. T-Miner then extracts the perturbations to determine whether they include any backdoor trigger and, if so, flags the suspicious classifier as a Trojan model.
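The sequence-to-sequence generator itself is beyond a short sketch; the fragment below only illustrates the final flagging heuristic implied above, namely that a candidate perturbation which flips nearly every clean input to one label behaves like a universal trigger. The function names and threshold are assumptions, not the thesis's code.

```python
from typing import Callable, Iterable, List

def flag_trojan(classifier: Callable[[str], int],
                clean_sentences: List[str],
                candidate_perturbations: Iterable[List[str]],
                target_label: int,
                flip_threshold: float = 0.9) -> bool:
    """Simplified T-Miner-style check: flag the classifier if some extracted
    perturbation flips almost every clean sentence to the same target label."""
    for tokens in candidate_perturbations:
        flips = sum(
            classifier(sentence + " " + " ".join(tokens)) == target_label
            for sentence in clean_sentences
        )
        if flips / max(len(clean_sentences), 1) >= flip_threshold:
            return True   # suspicious: near-universal label flipping
    return False
```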
We evaluate T-Miner on three text classification datasets: Yelp Restaurant Reviews, Twitter Hate Speech, and Rotten Tomatoes Movie Reviews. To illustrate its effectiveness, we evaluate it against attacks on text classifiers: we build a set of clean classifiers with no trigger in their training datasets and, using several trigger phrases, a set of Trojan models, and then compute how many of these models T-Miner marks correctly. We show that our system is able to distinguish Trojan and clean models with 97% overall accuracy over 400 classifiers. Finally, we discuss the robustness of T-Miner in the case that the attacker knows the T-Miner framework and wants to use this knowledge to weaken its performance. To this end, we propose four different attacker scenarios and report the performance of T-Miner under these new attack methods. / M.S. / Backdoor (Trojan) attacks are a major threat to the security of predictive models that make use of deep neural networks. The idea behind these attacks is as follows: an attacker adds a certain pattern to a portion of a given training dataset and then trains a predictive model on this dataset. As a result, the predictive model misclassifies any inputs that contain the pattern. In the image domain this pattern, called the trigger, can be a patch of pixel values added to the images; in the text domain, it can be a set of words.
In this thesis, we propose Trojan-Miner (T-Miner), a defense scheme against such backdoor attacks on text classification deep learning models. The goal of T-Miner is to detect whether a given classifier is a Trojan model or not. T-Miner is based on a sequence-to-sequence text generation model that is connected to the given predictive model and determines whether that model has been backdoored. When T-Miner is connected to the predictive model, it generates a set of words, called perturbations, and analyzes these perturbations to determine whether they include any backdoor trigger. If any part of the trigger is present in the perturbations, the predictive model is flagged as a Trojan model.
We evaluate T-Miner on three text classification datasets: Yelp Restaurant Reviews, Twitter Hate Speech, and Rotten Tomatoes Movie Reviews. To illustrate its effectiveness, we evaluate it against attacks on text classifiers: we build a set of clean classifiers with no trigger in their training datasets and, using several trigger phrases, a set of Trojan models, and then compute how many of these models T-Miner marks correctly. We show that our system is able to detect Trojan models with 97% overall accuracy over 400 predictive models.