1 |
Game Theoretic Analysis of Defence Algorithms Against Data Poisoning Attack. Ou, Yifan January 2020 (has links)
As Machine Learning (ML) algorithms are deployed to solve a wide variety of tasks in today’s world, data poisoning attacks pose a significant threat to ML applications. Although numerous defence algorithms against data poisoning attacks have been proposed and shown to be effective, most are analyzed under the assumption of fixed attack strategies, without accounting for the strategic interactions between the attacker and the defender. In this work, we perform a game theoretic analysis of defence algorithms against data poisoning attacks on Machine Learning. We study the defence problem as a competitive game between the defender and the adversary and analyze the game characteristics for several defence algorithms. We propose a game model for the poisoning attack scenario and prove the characteristics of the Nash Equilibrium (NE) defence strategy for all distance-based defence algorithms. Based on the NE characteristics, we develop an efficient algorithm to approximate the NE defence strategy. Using fixed attack strategies as the benchmark, we then experimentally evaluate the impact of strategic interactions in the game model. Our approach not only provides insights into the effectiveness of the analyzed algorithms under optimal poisoning attacks, but also serves as a method for modellers to determine capable defence algorithms and optimal strategies to employ on their ML models. / Thesis / Master of Science (MSc) / As Machine Learning (ML) algorithms are deployed to solve a wide variety of tasks in today’s world, data poisoning attacks pose a significant threat to ML applications. In this work, we study the defence against poisoning attacks as a competitive game between the defender and the adversary and analyze the game characteristics for several defence algorithms. Our goal is to identify the optimal defence strategy against poisoning attacks, even when the adversary responds optimally to the defence strategy. We propose a game model for the poisoning attack scenario and develop an efficient algorithm to approximate the Nash Equilibrium defence strategy. Our approach not only provides insights into the effectiveness of the analyzed algorithms under optimal poisoning attacks, but also serves as a method for modellers to determine capable defence algorithms and optimal strategies to employ on their ML models.
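To make the approximation step concrete, below is a minimal sketch assuming a toy two-dimensional dataset, a discretized strategy grid, and a simple mean-shift payoff; none of these choices come from the thesis itself. It treats the interaction as a zero-sum game between a distance-based defence (which discards points farther than a chosen radius from the data centroid) and a poisoning attacker (which chooses how far out to place its poison points), and approximates a mixed Nash Equilibrium by fictitious play over the resulting payoff matrix.

```python
# A minimal sketch, not the thesis algorithm: approximate the NE defence strategy
# of a distance-based filter by building a payoff matrix over discretized strategies
# and running fictitious play on the resulting zero-sum game.
import numpy as np

rng = np.random.default_rng(0)
X_clean = rng.normal(0.0, 1.0, size=(200, 2))          # clean training data (toy)

defender_radii = np.linspace(1.0, 4.0, 7)               # distance-based filter thresholds
attacker_shifts = np.linspace(0.5, 5.0, 10)              # how far out poison points are placed

def victim_loss(radius, shift, n_poison=20):
    """Toy payoff: squared shift of the retained-data mean away from the clean mean."""
    poison = np.full((n_poison, 2), shift / np.sqrt(2))  # poison placed at distance `shift`
    data = np.vstack([X_clean, poison])
    kept = data[np.linalg.norm(data - X_clean.mean(0), axis=1) <= radius]
    return float(np.sum((kept.mean(0) - X_clean.mean(0)) ** 2))

# Payoff matrix of the zero-sum game: the attacker maximizes it, the defender minimizes it.
P = np.array([[victim_loss(r, s) for s in attacker_shifts] for r in defender_radii])

# Fictitious play: each player repeatedly best-responds to the opponent's empirical mixture.
def_counts = np.zeros(len(defender_radii))
atk_counts = np.zeros(len(attacker_shifts))
for _ in range(5000):
    atk_mix = (atk_counts + 1e-9) / (atk_counts + 1e-9).sum()
    def_counts[np.argmin(P @ atk_mix)] += 1              # defender best response
    def_mix = def_counts / def_counts.sum()
    atk_counts[np.argmax(def_mix @ P)] += 1              # attacker best response

print("approx. NE defence mixture over radii:", np.round(def_counts / def_counts.sum(), 2))
```

The output is the defender's approximate equilibrium mixture over filtering radii; with a finer strategy grid or a different victim loss, the same best-response loop still applies.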
|
2 |
Analysis of Attacks on Controlled Stochastic Systems. Russo, Alessio January 2022 (has links)
In this thesis, we investigate attack vectors against Markov decision processes and dynamical systems. This work is motivated by the recent interest in the research community towards making Machine Learning models safer against malicious attacks. We focus on different attack vectors: (I) attacks that alter the input/output signal of a Markov decision process; (II) eavesdropping attacks whose aim is to detect a change in a dynamical system; (III) poisoning attacks against data-driven control methods. (I) For attacks on Markov decision processes we focus on two types of attacks: (1) attacks that alter the observations of the victim, and (2) attacks that alter the control signal of the victim. Regarding (1), we investigate the problem of devising optimal attacks that minimize the reward collected by the victim. We show that when the policy and the system are known to the attacker, designing optimal attacks amounts to solving a Markov decision process. We also show that, for the victim, the system uncertainties induced by the attack can be modeled using a Partially Observable Markov Decision Process (POMDP) framework. We demonstrate that using Reinforcement Learning methods tailored to POMDPs leads to more resilient policies. Regarding (2), we instead investigate the problem of designing optimal stealthy poisoning attacks on the control channel of Markov decision processes. Previous work constrained the amplitude of the adversarial perturbation, with the hope that this constraint would make the attack imperceptible. However, such constraints do not grant any level of undetectability and do not take into account the dynamic nature of the underlying Markov process. To design an optimal stealthy attack, we investigate a new attack formulation, based on information-theoretic quantities, that considers the objective of minimizing the detectability of the attack as well as the performance of the controlled process. (II) In the second part of this thesis we analyse the problem where an eavesdropper tries to detect a change in a Markov decision process. These processes may be affected by changes that need to remain private. We study the problem using theoretical tools from optimal detection theory to motivate a definition of online privacy based on the average amount of information per observation of the underlying stochastic system. We provide ways to derive privacy upper bounds and compute policies that attain a higher privacy level, concluding with examples and numerical simulations. (III) Lastly, we investigate poisoning attacks against data-driven control methods. Specifically, we analyse how a malicious adversary can slightly poison the data so as to minimize the performance of a controller trained using this data. We show that identifying the most impactful attack boils down to solving a bi-level non-convex optimization problem, and provide theoretical insights on the attack. We present a generic algorithm for finding a local optimum of this problem and illustrate our analysis for various techniques. Numerical experiments reveal that minimal but well-crafted changes in the data set are sufficient to deteriorate the performance of data-driven control methods significantly, and even make the closed-loop system unstable.
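As an illustration of the first result of part (I), that computing an optimal attack reduces to solving a Markov decision process when the victim's policy and the environment are known, here is a hedged sketch with an assumed random toy MDP and an unconstrained perturbation set; it is not the construction used in the thesis. The attacker's action is the observation shown to the victim, its reward is the negative of the victim's reward, and value iteration yields the attacker's perturbation policy.

```python
# A minimal sketch under assumed toy dynamics: the attacker's observation-perturbation
# problem posed as an MDP and solved with value iteration.
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.9
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))   # P[s, a, s'] transition probs
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))              # victim's reward
victim_policy = rng.integers(0, n_actions, size=n_states)          # fixed policy, known to attacker

def attacker_value_iteration(n_iter=500):
    """Attacker MDP: choose which state to show the victim; reward is minus the victim's reward."""
    V = np.zeros(n_states)
    for _ in range(n_iter):
        Q = np.empty((n_states, n_states))            # Q[true_state, shown_state]
        for s in range(n_states):
            for s_shown in range(n_states):
                a = victim_policy[s_shown]             # victim acts on the perturbed observation
                Q[s, s_shown] = -R[s, a] + gamma * P[s, a] @ V
        V = Q.max(axis=1)
    return Q.argmax(axis=1)                            # attacker's perturbation policy

print("observation shown to the victim in each true state:", attacker_value_iteration())
```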
|
3 |
Data Poisoning Attacks on Linked Data with Graph Regularization. January 2019 (has links)
abstract: Social media has become the standard means of communication for most people, and its usage has increased exponentially in the last decade. The myriad of social media services such as Facebook, Twitter, Snapchat, and Instagram allow people to connect freely with their friends and followers. The number of attackers who try to take advantage of this situation has also increased at an exponential rate. Every social media service has its own recommender systems and user profiling algorithms. These algorithms use users' current information to make different recommendations. The data produced by social media services is often linked data, as each item/user is usually linked with other users/items. Recommender systems, due to their ubiquitous and prominent nature, are prone to several forms of attack. One major form of attack is poisoning the training set. Because recommender systems use current user/item information as the training set to make recommendations, the attacker tries to modify the training set in such a way that the recommender system either benefits the attacker or gives incorrect recommendations, and hence fails in its basic functionality. Most existing training set attack algorithms work with "flat" attribute-value data which is typically assumed to be independent and identically distributed (i.i.d.). However, the i.i.d. assumption does not hold for social media data since it is inherently linked as described above. Using user similarity with a graph regularizer to morph the training data produces the best results for the attacker. This thesis demonstrates this through experiments on collaborative filtering with multiple datasets. / Dissertation/Thesis / Masters Thesis Computer Science 2019
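The following sketch shows the kind of graph-regularized collaborative filtering model such an attack targets: matrix factorization with a user-similarity Laplacian penalty on the user factors. The rating matrix, similarity construction, and hyperparameters are illustrative assumptions, not the thesis setup, and the attack itself would be an outer loop that morphs entries of the training matrix to degrade this trained model.

```python
# A minimal sketch with assumed toy data: matrix factorization for collaborative
# filtering with a user-similarity graph regularizer (the victim model; the
# poisoning attack would perturb X in an outer loop).
import numpy as np

rng = np.random.default_rng(2)
n_users, n_items, k = 30, 20, 4
X = rng.integers(1, 6, size=(n_users, n_items)).astype(float)   # toy rating matrix
M = (rng.random((n_users, n_items)) < 0.3).astype(float)        # observed-entry mask

norms = np.linalg.norm(X, axis=1, keepdims=True)
W = (X @ X.T) / (norms @ norms.T)                                # cosine similarity between users
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W                                   # Laplacian of the user-similarity graph

U = rng.normal(0, 0.1, (n_users, k))
V = rng.normal(0, 0.1, (n_items, k))
alpha, beta, lr = 0.1, 0.01, 0.01

# Gradient descent on ||M*(X - U V^T)||_F^2 + alpha*tr(U^T L U) + beta*(||U||^2 + ||V||^2)
for _ in range(300):
    E = M * (U @ V.T - X)
    U -= lr * (2 * E @ V + 2 * alpha * L @ U + 2 * beta * U)
    V -= lr * (2 * E.T @ U + 2 * beta * V)

print("training RMSE on observed entries:",
      np.sqrt((M * (X - U @ V.T) ** 2).sum() / M.sum()))
```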
|
4 |
PREVENTING DATA POISONING ATTACKS IN FEDERATED MACHINE LEARNING BY AN ENCRYPTED VERIFICATION KEY. Mahdee, Jodayree 06 1900 (has links)
Federated learning has gained attention recently for its ability to protect data privacy and distribute computing loads [1]. It overcomes the limitations of traditional machine learning algorithms by allowing computers to train on remote data inputs and build models while keeping participant privacy intact. Traditional machine learning offered a solution by enabling computers to learn patterns and make decisions from data without explicit programming. It opened up new possibilities for automating tasks, recognizing patterns, and making predictions. With the exponential growth of data and advances in computational power, machine learning has become a powerful tool in various domains, driving innovations in fields such as image recognition, natural language processing, autonomous vehicles, and personalized recommendations. In traditional machine learning, data is usually transferred to a central server, raising concerns about privacy and security. Centralizing data exposes sensitive information, making it vulnerable to breaches or unauthorized access.
Centralized machine learning assumes that all data is available at a central location, which is not always practical or feasible. Some data may be distributed across different locations, owned by different entities, or subject to legal or privacy restrictions. Training a global model in traditional machine learning involves frequent communication between the central server and participating devices. This communication overhead can be substantial, particularly when dealing with large-scale datasets or resource-constrained devices. / Recent studies have uncovered security issues with most federated learning models. One common false assumption in federated learning models is that participants are not attackers and would not use polluted data. This vulnerability enables attackers to train their models using polluted data and then send the polluted updates to the training server for aggregation, potentially poisoning the overall model. In such a setting, it is challenging for an edge server to thoroughly inspect the data used for model training and supervise every edge device. This study evaluates the vulnerabilities present in federated learning and explores the various types of attacks that can occur. This paper presents a robust prevention scheme to address these vulnerabilities. The proposed prevention scheme enables federated learning servers to monitor participants actively in real time and identify infected individuals by introducing an encrypted verification scheme. The paper outlines the protocol design of this prevention scheme and presents experimental results that demonstrate its effectiveness. / Thesis / Doctor of Philosophy (PhD) / Federated learning models face significant security challenges and can be vulnerable to attacks. For instance, federated learning models assume participants are not attackers and will not manipulate the data. However, in reality, attackers can compromise the data of remote participants by inserting fake data or altering existing data, which can result in polluted training results being sent to the server. For instance, if the sample data is an animal image, attackers can modify it to contaminate the training data.
This paper introduces a robust preventive approach to counter data pollution attacks in real time. It incorporates an encrypted verification scheme into the federated learning model, preventing poisoning attacks without the need for attack-specific detection programming. The main contribution of this paper is a detection and prevention mechanism that allows the training server to supervise training in real time and stop data modifications in each client's storage before and between training rounds. With this scheme, the training server can identify modifications in real time and remove infected remote participants.
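One plausible reading of the verification idea, sketched below as an assumption rather than the actual protocol, is a keyed digest over each client's local dataset that the verifier recomputes between rounds: any modification of the stored data changes the tag and flags the participant. Key management, encryption of the tag, and integration with the training loop are omitted here.

```python
# A minimal sketch of one possible verification mechanism (an assumption, not the
# proposed scheme): a keyed hash over the client's local data, rechecked between rounds.
import hashlib
import hmac
import numpy as np

def dataset_tag(data: np.ndarray, key: bytes) -> bytes:
    """Keyed digest of the client's local training data."""
    return hmac.new(key, data.tobytes(), hashlib.sha256).digest()

rng = np.random.default_rng(3)
key = b"per-client secret shared with the verifier"    # placeholder key management
client_data = rng.normal(size=(100, 8))                 # toy local dataset

tag_round_1 = dataset_tag(client_data, key)

# ... client trains and sends an update; before the next round, the verifier re-checks ...
client_data[0, 0] += 5.0                                 # simulated poisoning of one sample
tag_round_2 = dataset_tag(client_data, key)

if not hmac.compare_digest(tag_round_1, tag_round_2):
    print("data changed between rounds -> flag participant for exclusion")
```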
|