<p dir="ltr">Artificial intelligence (AI) is increasingly integrated into critical systems across various sectors, including public surveillance, autonomous driving, and malware detection. Despite their impressive performance and promise, the security and safety of AI-enabled systems remain significant concerns. Like conventional systems that have software bugs or vulnerabilities, applications leveraging AI are also susceptible to such issues. Malicious behaviors can be intentionally injected into AI models by adversaries, creating a backdoor. These models operate normally with benign inputs but consistently misclassify samples containing an attacker-inserted trigger, known as a <i>backdoor attack</i>.</p><p dir="ltr">However, backdoors can not only be injected by an attacker but may also naturally exist in normally trained models. One can find backdoor triggers in benign models that cause any inputs with the trigger to be misclassified, a phenomenon termed <i>natural backdoors</i>. Regardless of whether they are injected or natural, backdoors can take various forms, which increases the difficulty of identifying such vulnerabilities. This challenge is exacerbated when access to AI models is limited.</p><p dir="ltr">This dissertation introduces an optimization-based technique that reverse-engineers trigger patterns exploited by backdoors, whether injected or natural. It formulates how backdoor triggers modify inputs down to the pixel level to approximate their potential forms. The intended changes in output predictions guide the reverse-engineering process, which involves computing the input gradient or sampling possible perturbations when model access is limited. Although various types of backdoors exist, this dissertation demonstrates that they can be effectively clustered into two categories based on their methods of input manipulation. The development of practical reverse-engineering approaches is based on this fundamental classification, leading to the successful identification of backdoor vulnerabilities in AI models.</p><p dir="ltr">To alleviate such security threats, this dissertation introduces a novel hardening technique that enhances the robustness of models against adversary exploitation. It sheds light on the existence of backdoors, which can often be attributed to the small distance between two classes. Based on this analysis, a class distance hardening method is proposed to proactively enlarge the distance between every pair of classes in a model. This method is effective in eliminating both injected and natural backdoors in a variety of forms.</p><p dir="ltr">This dissertation aims to highlight both existing and newly identified security and safety challenges in AI systems. It introduces novel formulations of backdoor trigger patterns and provides a fundamental understanding of backdoor vulnerabilities, paving the way for the development of safer and more secure AI systems.</p>
Identifer | oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/25823644 |
Date | 15 May 2024 |
Creators | Guanhong Tao (18542383) |
Source Sets | Purdue University |
Detected Language | English |
Type | Text, Thesis |
Rights | CC BY 4.0 |
Relation | https://figshare.com/articles/thesis/Towards_Secure_and_Safe_AI-enabled_Systems_Through_Optimizations/25823644 |
Page generated in 0.0021 seconds