Global ETD Search

1	Towards the Safety and Robustness of Deep Models Karim, Md Nazmul 01 January 2023 (has links) (PDF) The primary focus of this doctoral dissertation is to investigate the safety and robustness of deep models. Our objective is to thoroughly analyze and introduce innovative methodologies for cultivating robust representations under diverse circumstances. Deep neural networks (DNNs) have emerged as fundamental components in recent advancements across various tasks, including image recognition, semantic segmentation, and object detection. Representation learning stands as a pivotal element in the efficacy of DNNs, involving the extraction of significant features from data through mechanisms like convolutional neural networks (CNNs) applied to image data. In real-world applications, ensuring the robustness of these features against various adversarial conditions is imperative, thus emphasizing robust representation learning. Through the acquisition of robust representations, DNNs can enhance their ability to generalize to new data, mitigate the impact of label noise and domain shifts, and bolster their resilience against external threats, such as backdoor attacks. Consequently, this dissertation explores the implications of robust representation learning in three principal areas: i) Backdoor Attack, ii) Backdoor Defense, and iii) Noisy Labels. First, we study the backdoor attack creation and detection from different perspectives. Backdoor attack addresses AI safety and robustness issues where an adversary can insert malicious behavior into a DNN by altering the training data. Second, we aim to remove the backdoor from DNN using two different types of defense techniques: i) training-time defense and ii) test-time defense. training-time defense prevents the model from learning the backdoor during model training whereas test-time defense tries to purify the backdoor model after the backdoor has already been inserted. Third, we explore the direction of noisy label learning (NLL) from two perspectives: a) offline NLL and b) online continual NLL. The representation learning under noisy labels gets severely impacted due to the memorization of those noisy labels, which leads to poor generalization. We perform uniform sampling and contrastive learning-based representation learning. We also test the algorithm efficiency in an online continual learning setup. Furthermore, we show the transfer and adaptation of learned representations in one domain to another domain, e.g. source free domain adaptation (SFDA). We study the impact of noisy labels under SFDA settings and propose a novel algorithm that produces state-of-the-art (SOTA) performance. Backdoor Attack Domain Adaptation Noisy Labels Fisher Information DNN Smoothness
2	Towards Secure and Safe AI-enabled Systems Through Optimizations Guanhong Tao (18542383) 15 May 2024 (has links) <p dir="ltr">Artificial intelligence (AI) is increasingly integrated into critical systems across various sectors, including public surveillance, autonomous driving, and malware detection. Despite their impressive performance and promise, the security and safety of AI-enabled systems remain significant concerns. Like conventional systems that have software bugs or vulnerabilities, applications leveraging AI are also susceptible to such issues. Malicious behaviors can be intentionally injected into AI models by adversaries, creating a backdoor. These models operate normally with benign inputs but consistently misclassify samples containing an attacker-inserted trigger, known as a <i>backdoor attack</i>.</p><p dir="ltr">However, backdoors can not only be injected by an attacker but may also naturally exist in normally trained models. One can find backdoor triggers in benign models that cause any inputs with the trigger to be misclassified, a phenomenon termed <i>natural backdoors</i>. Regardless of whether they are injected or natural, backdoors can take various forms, which increases the difficulty of identifying such vulnerabilities. This challenge is exacerbated when access to AI models is limited.</p><p dir="ltr">This dissertation introduces an optimization-based technique that reverse-engineers trigger patterns exploited by backdoors, whether injected or natural. It formulates how backdoor triggers modify inputs down to the pixel level to approximate their potential forms. The intended changes in output predictions guide the reverse-engineering process, which involves computing the input gradient or sampling possible perturbations when model access is limited. Although various types of backdoors exist, this dissertation demonstrates that they can be effectively clustered into two categories based on their methods of input manipulation. The development of practical reverse-engineering approaches is based on this fundamental classification, leading to the successful identification of backdoor vulnerabilities in AI models.</p><p dir="ltr">To alleviate such security threats, this dissertation introduces a novel hardening technique that enhances the robustness of models against adversary exploitation. It sheds light on the existence of backdoors, which can often be attributed to the small distance between two classes. Based on this analysis, a class distance hardening method is proposed to proactively enlarge the distance between every pair of classes in a model. This method is effective in eliminating both injected and natural backdoors in a variety of forms.</p><p dir="ltr">This dissertation aims to highlight both existing and newly identified security and safety challenges in AI systems. It introduces novel formulations of backdoor trigger patterns and provides a fundamental understanding of backdoor vulnerabilities, paving the way for the development of safer and more secure AI systems.</p> Software and application security Adversarial machine learning AI Security Backdoor Attack Trigger Reverse Engineering Model Hardening
3	Trojan Attacks and Defenses on Deep Neural Networks Yingqi Liu (13943811) 13 October 2022 (has links) <p>With the fast spread of machine learning techniques, sharing and adopting public deep neural networks become very popular. As deep neural networks are not intuitive for human to understand, malicious behaviors can be injected into deep neural networks undetected. We call it trojan attack or backdoor attack on neural networks. Trojaned models operate normally when regular inputs are provided, and misclassify to a specific output label when the input is stamped with some special pattern called trojan trigger. Deploying trojaned models can cause various severe consequences including endangering human lives (in applications like autonomous driving). Trojan attacks on deep neural networks introduce two challenges. From the attacker's perspective, since the training data or training process is usually not accessible to the attacker, the attacker needs to find a way to carry out the trojan attack without access to training data. From the user's perspective, the user needs to quickly scan the online public deep neural networks and detect trojaned models.</p> <p>We try to address these challenges in this dissertation. For trojan attack without access to training data, We propose to invert the neural network to generate a general trojan trigger, and then retrain the model with reverse-engineered training data to inject malicious behaviors to the model. The malicious behaviors are only activated by inputs stamped with the trojan trigger. To scan and detect trojaned models, we develop a novel technique that analyzes inner neuron behaviors by determining how output activation change when we introduce different levels of stimulation to a neuron. A trojan trigger is then reverse-engineered through an optimization procedure using the stimulation analysis results, to confirm that a neuron is truly compromised. Furthermore, for complex trojan attacks, we propose a novel complex trigger detection method. It leverages a novel symmetric feature differencing method to distinguish features of injected complex triggers from natural features. For trojan attacks on NLP models, we propose a novel backdoor scanning technique. It transforms a subject model to an equivalent but differentiable form. It then inverts a distribution of words denoting their likelihood in the trigger and applies a novel word discriminativity analysis to determine if the subject model is particularly discriminative for the presence of likely trigger words.</p> Adversarial machine learning Adversarial Machine Learning Backdoor Attack Trojan Attack AI safety Robust Machine Learning Artificial Intelligence
4	Defending Against Trojan Attacks on Neural Network-based Language Models Azizi, Ahmadreza 15 May 2020 (has links) Backdoor (Trojan) attacks are a major threat to the security of deep neural network (DNN) models. They are created by an attacker who adds a certain pattern to a portion of given training dataset, causing the DNN model to misclassify any inputs that contain the pattern. These infected classifiers are called Trojan models and the added pattern is referred to as the trigger. In image domain, a trigger can be a patch of pixel values added to the images and in text domain, it can be a set of words. In this thesis, we propose Trojan-Miner (T-Miner), a defense scheme against such backdoor attacks on text classification deep learning models. The goal of T-Miner is to detect whether a given classifier is a Trojan model or not. To create T-Miner , our approach is based on a sequence-to-sequence text generation model. T-Miner uses feedback from the suspicious (test) classifier to perturb input sentences such that their resulting class label is changed. These perturbations can be different for each of the inputs. T-Miner thus extracts the perturbations to determine whether they include any backdoor trigger and correspondingly flag the suspicious classifier as a Trojan model. We evaluate T-Miner on three text classification datasets: Yelp Restaurant Reviews, Twitter Hate Speech, and Rotten Tomatoes Movie Reviews. To illustrate the effectiveness of T-Miner, we evaluate it on attack models over text classifiers. Hence, we build a set of clean classifiers with no trigger in their training datasets and also using several trigger phrases, we create a set of Trojan models. Then, we compute how many of these models are correctly marked by T-Miner. We show that our system is able to detect trojan and clean models with 97% overall accuracy over 400 classifiers. Finally, we discuss the robustness of T-Miner in the case that the attacker knows T-Miner framework and wants to use this knowledge to weaken T-Miner performance. To this end, we propose four different scenarios for the attacker and report the performance of T-Miner under these new attack methods. / M.S. / Backdoor (Trojan) attacks are a major threat to the security of predictive models that make use of deep neural networks. The idea behind these attacks is as follows: an attacker adds a certain pattern to a portion of given training dataset and in the next step, trains a predictive model over this dataset. As a result, the predictive model misclassifies any inputs that contain the pattern. In image domain this pattern that is called trigger, can be a patch of pixel values added to the images and in text domain, it can be a set of words. In this thesis, we propose Trojan-Miner (T-Miner), a defense scheme against such backdoor attacks on text classification deep learning models. The goal of T-Miner is to detect whether a given classifier is a Trojan model or not. T-Miner is based on a sequence-to-sequence text generation model that is connected to the given predictive model and determine if the predictive model is being backdoor attacked. When T-Miner is connected to the predictive model, it generates a set of words, called perturbations, and analyses these perturbations to determine whether they include any backdoor trigger. Hence if any part of the trigger is present in the perturbations, the predictive model is flagged as a Trojan model. We evaluate T-Miner on three text classification datasets: Yelp Restaurant Reviews, Twitter Hate Speech, and Rotten Tomatoes Movie Reviews. To illustrate the effectiveness of T-Miner, we evaluate it on attack models over text classifiers. Hence, we build a set of clean classifiers with no trigger in their training datasets and also using several trigger phrases, we create a set of Trojan models. Then, we compute how many of these models are correctly marked by T-Miner. We show that our system is able to detect Trojan models with 97% overall accuracy over 400 predictive models. Machine learning Deep learning (Machine learning) Artificial Intelligence Generative Adversarial Network GAN Auto Encoder Backdoor Attack Natural Language Processing Text Style Transfer Adversarial Attack IMDB dataset Rotten Tomato dataset Yelp dataset

1

Page generated in 0.025 seconds