About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Benevolent and Malevolent Adversaries: A Study of GANs and Face Verification Systems

Nazari, Ehsan 22 November 2023 (has links)
Cybersecurity is rapidly evolving, necessitating inventive solutions for emerging challenges. Deep Learning (DL), having demonstrated remarkable capabilities across various domains, has found a significant role within cybersecurity. This thesis focuses on benevolent and malevolent adversaries. For the benevolent adversaries, we analyze specific applications of DL in cybersecurity, contributing to the enhancement of DL for downstream tasks. Regarding the malevolent adversaries, we explore how resistant DL is to (cyber) attacks and expose vulnerabilities of specific DL-based systems. We begin with the benevolent adversaries by studying the use of a generative model, the Generative Adversarial Network (GAN), to improve the abilities of DL. In particular, we look at the use of Conditional Generative Adversarial Networks (CGAN) to generate synthetic data and address issues with imbalanced datasets in cybersecurity applications. Imbalanced classes are a significant issue in this field and can lead to models that perform poorly on the under-represented classes. We find that CGANs can effectively address this issue, especially in more difficult scenarios. Then, we turn our attention to using CGANs on tabular cybersecurity problems. However, visually assessing the results of a CGAN is not possible when we are dealing with tabular cybersecurity data. To address this issue, we introduce AutoGAN, a method that can train a GAN on both image-based and tabular data, reducing the need for human inspection during GAN training. This opens up new opportunities for using GANs with tabular datasets, including those in cybersecurity that are not image-based. Our experiments show that AutoGAN can achieve comparable or even better results than other methods. Finally, we shift our focus to the malevolent adversaries by looking at the robustness of DL models in the context of automatic face recognition. We know from previous research that DL models can be tricked into making incorrect classifications by adding small, almost unnoticeable changes to an image; these deceptive manipulations are known as adversarial attacks. We aim to expose new vulnerabilities in DL-based Face Verification (FV) systems. We introduce a novel attack method on FV systems, called the DodgePersonation Attack, and a system for categorizing these attacks based on their specific targets. We also propose a new algorithm that significantly improves upon a previous method for mounting such attacks, increasing the success rate by more than 13%.
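For illustration, a minimal sketch of a conditional GAN used to oversample a minority class in tabular data. The feature count, noise dimension, network sizes, and training schedule here are assumptions for the sketch and do not reflect the CGAN or AutoGAN configurations used in the thesis.

```python
# Minimal conditional GAN (CGAN) sketch for oversampling a minority class in
# tabular data. Dimensions, architectures, and the training schedule are
# illustrative assumptions, not the thesis configuration.
import torch
import torch.nn as nn

NOISE_DIM, N_FEATURES, N_CLASSES = 32, 20, 2  # assumed sizes

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(N_CLASSES, N_CLASSES)
        self.net = nn.Sequential(
            nn.Linear(NOISE_DIM + N_CLASSES, 64), nn.ReLU(),
            nn.Linear(64, N_FEATURES),
        )
    def forward(self, z, labels):
        # Condition the generator by concatenating a label embedding to the noise.
        return self.net(torch.cat([z, self.embed(labels)], dim=1))

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(N_CLASSES, N_CLASSES)
        self.net = nn.Sequential(
            nn.Linear(N_FEATURES + N_CLASSES, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )
    def forward(self, x, labels):
        return self.net(torch.cat([x, self.embed(labels)], dim=1))

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_x, real_y):
    batch = real_x.size(0)
    z = torch.randn(batch, NOISE_DIM)
    fake_y = torch.randint(0, N_CLASSES, (batch,))
    fake_x = G(z, fake_y)

    # Discriminator: real samples -> 1, generated samples -> 0.
    d_loss = bce(D(real_x, real_y), torch.ones(batch, 1)) + \
             bce(D(fake_x.detach(), fake_y), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: fool the discriminator into predicting 1 for generated samples.
    g_loss = bce(D(fake_x, fake_y), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# After training, synthesize extra minority-class (label 1) records:
with torch.no_grad():
    synthetic = G(torch.randn(1000, NOISE_DIM), torch.ones(1000, dtype=torch.long))
```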
12

Adversarial Attacks On Graph Convolutional Transformer With EHR Data

Siddhartha Pothukuchi (18437181) 28 April 2024 (has links)
<p dir="ltr">This research explores adversarial attacks on Graph Convolutional Transformer (GCT) models that utilize Electronic Health Record (EHR) data. As deep learning models become increasingly integral to healthcare, securing their robustness against adversarial threats is critical. This research assesses the susceptibility of GCT models to specific adversarial attacks, namely the Fast Gradient Sign Method (FGSM) and the Jacobian-based Saliency Map Attack (JSMA). It examines their effect on the model’s prediction of mortality and readmission. Through experiments conducted with the MIMIC-III and eICU datasets, the study finds that although the GCT model exhibits superior performance in processing EHR data under normal conditions, its accuracy drops when subjected to adversarial conditions—from an accuracy of 86% with test data to about 57% and an area under the curve (AUC) from 0.86 to 0.51. These findings averaged across both datasets and attack methods, underscore the urgent need for effective adversarial defense mechanisms in AI systems used in healthcare. This thesis contributes to the field by identifying vulnerabilities and suggesting various strategies to enhance the resilience of GCT models against adversarial manipulations.</p>
13

Efficient and Robust Deep Learning through Approximate Computing

Sanchari Sen (9178400) 28 July 2020 (has links)
Deep Neural Networks (DNNs) have greatly advanced the state of the art in a wide range of machine learning tasks involving image, video, speech and text analytics, and are deployed in numerous widely used products and services. Improvements in the capabilities of hardware platforms such as Graphics Processing Units (GPUs) and specialized accelerators have been instrumental in enabling these advances, as they have allowed more complex and accurate networks to be trained and deployed. However, the enormous computational and memory demands of DNNs continue to increase with growing data size and network complexity, posing a continuing challenge to computing system designers. For instance, state-of-the-art image recognition DNNs require hundreds of millions of parameters and hundreds of billions of multiply-accumulate operations, while state-of-the-art language models require hundreds of billions of parameters and several trillion operations to process a single input instance. Another major obstacle in the adoption of DNNs, despite their impressive accuracies on a range of datasets, has been their lack of robustness. Specifically, recent efforts have demonstrated that small, carefully introduced input perturbations can force a DNN to behave in unexpected and erroneous ways, which can have severe consequences in safety-critical DNN applications like healthcare and autonomous vehicles. In this dissertation, we explore approximate computing as an avenue to improve the speed and energy efficiency of DNNs, as well as their robustness to input perturbations.

Approximate computing involves executing selected computations of an application in an approximate manner, while generating favorable trade-offs between computational efficiency and output quality. The intrinsic error resilience of machine learning applications makes them excellent candidates for approximate computing, allowing us to achieve execution time and energy reductions with minimal effect on the quality of outputs. This dissertation performs a comprehensive analysis of different approximate computing techniques for improving the execution efficiency of DNNs. Complementary to generic approximation techniques like quantization, it identifies approximation opportunities based on the specific characteristics of three popular classes of networks - Feed-forward Neural Networks (FFNNs), Recurrent Neural Networks (RNNs) and Spiking Neural Networks (SNNs) - which vary considerably in their network structure and computational patterns.

First, in the context of feed-forward neural networks, we identify sparsity, or the presence of zero values in the data structures (activations, weights, gradients and errors), to be a major source of redundancy and therefore an easy target for approximations. We develop lightweight micro-architectural and instruction set extensions to a general-purpose processor core that enable it to dynamically detect zero values when they are loaded and skip future instructions that are rendered redundant by them. Next, we explore LSTMs (the most widely used class of RNNs), which map sequences from an input space to an output space. We propose hardware-agnostic approximations that dynamically skip redundant symbols in the input sequence and discard redundant elements in the state vector to achieve execution time benefits. Following that, we consider SNNs, an emerging class of neural networks that represent and process information in the form of sequences of binary spikes. Observing that spike-triggered updates along synaptic connections are the dominant operation in SNNs, we propose hardware and software techniques to identify connections whose deactivation minimally impacts output quality and to deactivate them dynamically, skipping any associated updates.

The dissertation also delves into the efficacy of combining multiple approximate computing techniques to improve the execution efficiency of DNNs. In particular, we focus on the combination of quantization, which reduces the precision of DNN data structures, and pruning, which introduces sparsity in them. We observe that the ability of pruning to reduce the memory demands of quantized DNNs decreases with precision, as the overhead of storing non-zero locations alongside the values starts to dominate in different sparse encoding schemes. We analyze this overhead and the overall compression of three different sparse formats across a range of sparsity and precision values and propose a hybrid compression scheme that identifies the optimal sparse format for a pruned low-precision DNN.

Along with improved execution efficiency of DNNs, the dissertation explores an additional advantage of approximate computing in the form of improved robustness. We propose ensembles of quantized DNN models with different numerical precisions as a new approach to increase robustness against adversarial attacks. It is based on the observation that quantized neural networks often demonstrate much higher robustness to adversarial attacks than full-precision networks, but at the cost of a substantial loss in accuracy on the original (unperturbed) inputs. We overcome this limitation to achieve the best of both worlds, i.e., the higher unperturbed accuracies of the full-precision models combined with the higher robustness of the low-precision models, by composing them in an ensemble.

In summary, this dissertation establishes approximate computing as a promising direction to improve the performance, energy efficiency and robustness of neural networks.
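For illustration, a minimal sketch of the quantized-ensemble idea described above: copies of one classifier quantized to different precisions are combined by averaging their outputs. The post-training, symmetric per-tensor weight quantization and the toy network are assumptions for the sketch; the dissertation's quantization scheme and training setup differ.

```python
# Sketch of an ensemble of differently-quantized copies of one classifier.
# Post-training, symmetric per-tensor weight quantization is used purely for
# illustration; it is not the dissertation's exact method.
import copy
import torch
import torch.nn as nn

def quantize_weights_(model, num_bits):
    """In-place symmetric uniform quantization of all weights to num_bits."""
    qmax = 2 ** (num_bits - 1) - 1
    with torch.no_grad():
        for p in model.parameters():
            scale = p.abs().max() / qmax
            if scale > 0:
                p.copy_(torch.round(p / scale) * scale)

base = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
# (assume `base` has already been trained at full precision)

ensemble = []
for bits in (2, 4, 8, 32):            # 32 keeps the full-precision member
    member = copy.deepcopy(base)
    if bits < 32:
        quantize_weights_(member, bits)
    ensemble.append(member.eval())

def ensemble_predict(x):
    # Average the softmax outputs of all members: the low-precision members
    # tend to blunt gradient-based attacks while the full-precision member
    # preserves clean accuracy.
    probs = torch.stack([torch.softmax(m(x), dim=1) for m in ensemble])
    return probs.mean(dim=0).argmax(dim=1)

print(ensemble_predict(torch.randn(4, 784)))
```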
14

Evaluating Robustness of a CNN Architecture introduced to the Adversarial Attacks

Ishak, Shaik, Jyothsna Chowdary, Anantaneni January 2021 (has links)
Abstract: Background: Previous research shows that state-of-the-art deep neural networks have accomplished impressive results on many image classification tasks. However, adversarial attacks can easily fool these deep neural networks by adding a little noise to the input images. This vulnerability is a significant concern for deploying deep neural network-based systems in real-world, security-sensitive situations. Therefore, research on attacking these architectures with adversarial examples has drawn considerable attention. Here, we use Convolutional Neural Networks (CNNs), a technique known for delivering favorable results in image classification. Objectives: This thesis reviews the types of adversarial attacks and CNN architectures in the current scientific literature. We then build a CNN architecture to classify the handwritten digits in the MNIST dataset and apply adversarial attacks to the images to evaluate the fluctuations in classification accuracy. The study also includes an experiment using the defensive distillation technique to improve the architecture's performance under adversarial attacks. Methods: This thesis uses two methods: a systematic literature review to identify the best-performing CNN architectures and adversarial attack techniques, and an experiment in which we build a CNN model based on a modified LeNet architecture with two convolutional layers, one max-pooling layer, and two dropout layers. The model is trained and tested with the MNIST dataset. We then apply the adversarial attacks FGSM, I-FGSM, and MI-FGSM to the input images to evaluate the model's performance. The model is then modified slightly with the defensive distillation technique and tested against the same attacks to evaluate the architecture's performance. Results: An experiment is conducted to evaluate the robustness of the CNN architecture in classifying the handwritten digits. Graphs show the accuracy before and after applying adversarial attacks to the test dataset. The defensive distillation mechanism is applied to mitigate the adversarial attacks and achieve a more robust architecture. Conclusions: The results showed that the FGSM, I-FGSM, and MI-FGSM attacks reduce the test accuracy from 95% to around 35%; for a maximum epsilon of 0.3, all three attacks reduced the test accuracy by roughly 70%. With the defensive distillation mechanism, the test accuracy drops only from 90% to 88% for a maximum epsilon of 0.3. The proposed defensive distillation process is therefore successful in defending against the adversarial attacks.
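For illustration, a hedged sketch of defensive distillation: a teacher is trained with a temperature-softened softmax, and a student of the same architecture is then trained on the teacher's soft labels. The small stand-in network and the temperature T = 20 are illustrative choices, not the exact configuration used in the thesis.

```python
# Hedged sketch of defensive distillation. Architecture and T = 20 are
# illustrative assumptions, not the thesis's modified LeNet or settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 20.0

def make_cnn():
    # Small LeNet-like stand-in for MNIST-sized (1 x 28 x 28) inputs.
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        nn.Flatten(), nn.Linear(32 * 14 * 14, 10),
    )

teacher, student = make_cnn(), make_cnn()

def teacher_loss(logits, labels):
    # Cross-entropy on temperature-softened logits.
    return F.nll_loss(F.log_softmax(logits / T, dim=1), labels)

def student_loss(student_logits, teacher_logits):
    # Match the teacher's softened class probabilities (soft labels).
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    return -(soft_targets * F.log_softmax(student_logits / T, dim=1)).sum(dim=1).mean()

# One illustrative training step on a dummy MNIST-sized batch.
x, y = torch.randn(8, 1, 28, 28), torch.randint(0, 10, (8,))
opt_t = torch.optim.SGD(teacher.parameters(), lr=0.01)
opt_t.zero_grad(); teacher_loss(teacher(x), y).backward(); opt_t.step()

opt_s = torch.optim.SGD(student.parameters(), lr=0.01)
with torch.no_grad():
    t_logits = teacher(x)
opt_s.zero_grad(); student_loss(student(x), t_logits).backward(); opt_s.step()
# At test time the student is used with T = 1, i.e., its raw logits.
```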
15

Dynamic Chemical Imaging And Analysis Within Biologically Active Materials

Alex M Sherman (10711971) 06 May 2021 (has links)
A thorough understanding of pharmaceutical and therapeutic products and materials is important for an improved quality of life. By probing the complex behaviors and properties of these systems, new insights can allow for a better understanding of current treatments, improved design and synthesis of new drug products, and the development of new treatments for various health conditions. Often, the impact of these new insights is limited by current technology and instrumentation and by the methods in which existing data is processed. Additionally, current standards for characterization of pharmaceuticals and therapeutics are time-consuming and can delay the timeline in which these products become available to the consumer. By addressing the limitations in current instrumentation and data science methods, faster and improved characterization is possible.

Development and improvement of optical instrumentation provides potential solutions to the current limitations of characterization by conventional instrumentation. Limitations in speed can be addressed through the use of nonlinear optical (NLO) methods, such as second harmonic generation (SHG) and two-photon excited ultraviolet fluorescence (TPE-UVF) microscopy, or by linear methods such as fluorescence recovery after photobleaching (FRAP). For these methods, a high signal-to-noise ratio (SNR) and a nondestructive nature decrease the overall sample size requirements and collection times. Furthermore, by combining these optical techniques with other techniques, such as thermal analysis (e.g., differential scanning calorimetry (DSC)), polarization modulation, or patterned illumination, the collection of more complex and higher quality data is possible while retaining the improved speed of these methods. Thus, this modified instrumentation can allow for improved characterization of properties such as stability, structure, and mobility of pharmaceutical and therapeutic products.

With an increase in data quantity and complexity, improvements to existing methods of analysis, as well as development of new data science methods, are essential. Machine learning (ML) architectures and empirically validated models for the analysis of existing data can provide improved quantification. Using the aforementioned optical instrumentation, auto-calibration of data acquired by SHG microscopy is one such method in which quantification of sample crystallinity is enabled by these ML and empirical models. Additionally, ML approaches utilizing generative adversarial networks (GANs) can improve the identification of data tampering in order to preserve data security. By using GANs to tamper with experimentally collected and/or simulated data fed to existing spectral classifiers, knowledge of adversarial methods and weaknesses in spectral classification can be ascertained. Likewise, perturbations in physical illumination can be used to ascertain information on the classification of real objects by use of GANs. This knowledge can then be used to prevent further data tampering or to improve the identification of data tampering.
16

Robust Deep Learning Under Application Induced Data Distortions

Rajeev Sahay (10526555) 21 November 2022 (has links)
Deep learning has been increasingly adopted in a multitude of settings. Yet, its strong performance relies on processing data during inference that is in-distribution with its training data. Deep learning input data during deployment, however, is not guaranteed to be in-distribution with the model's training data and can often be distorted, either intentionally (e.g., by an adversary) or unintentionally (e.g., by a sensor defect), leading to significant performance degradation. In this dissertation, we develop algorithms for a variety of applications to improve the performance of deep learning models in the presence of distorted data. We begin by designing feature engineering methodologies to increase classification performance in noisy environments. Here, we demonstrate the efficacy of our proposed algorithms on two target detection tasks and show that our framework outperforms a variety of state-of-the-art baselines. Next, we develop mitigation algorithms to improve the performance of deep learning in the presence of adversarial attacks and nonlinear signal distortions. In this context, we demonstrate the effectiveness of our methods on a variety of wireless communications tasks including automatic modulation classification, power allocation in massive MIMO networks, and signal detection. Finally, we develop an uncertainty quantification framework, which produces distributive estimates, as opposed to point predictions, from deep learning models in order to characterize samples with uncertain predictions as well as samples that are out-of-distribution from the model's training data. Our uncertainty quantification framework is evaluated on a hyperspectral image target detection task as well as on a counter unmanned aircraft systems (cUAS) model. Ultimately, our proposed algorithms improve the performance of deep learning in several environments in which the data during inference has been distorted to be out-of-distribution from the training data.
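For illustration, a sketch of one common way to produce distributive rather than point predictions, Monte Carlo dropout: dropout stays active at inference and the spread over repeated stochastic forward passes serves as an uncertainty signal. This is an assumed stand-in and not necessarily the uncertainty quantification framework developed in the dissertation.

```python
# Monte Carlo dropout sketch for distributive predictions. Model size, feature
# dimension, and the uncertainty threshold are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(128, 2),
)

def predictive_distribution(x, n_samples=50):
    model.train()  # keep dropout stochastic during inference
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=1) for _ in range(n_samples)]
        )
    # Mean = prediction, standard deviation = spread across stochastic passes.
    return probs.mean(dim=0), probs.std(dim=0)

x = torch.randn(4, 64)                      # 4 test samples, 64 assumed features
mean, std = predictive_distribution(x)
# Flag predictions with a large class-probability spread as "uncertain",
# e.g. candidates for being out-of-distribution.
uncertain = std.max(dim=1).values > 0.2     # threshold is an illustrative choice
```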
17

Deep Reinforcement Learning for Temperature Control in Buildings and Adversarial Attacks

Ammouri, Kevin January 2021 (has links)
Heating, Ventilation and Air Conditioning (HVAC) systems in buildings are energy consuming, and traditional methods used for building control result in energy losses because they cannot account for non-linear dependencies in the thermal behaviour. Deep Reinforcement Learning (DRL) is a powerful method for reaching optimal control in many different control environments. DRL utilizes neural networks to approximate the optimal action to take given that the system is in a given state. Therefore, DRL is a promising method for building control, a fact highlighted by several studies. However, neural network policies are known to be vulnerable to adversarial attacks: small, indistinguishable changes to the input that make the network choose a sub-optimal action. Two of the main approaches to attacking DRL policies are (1) the Fast Gradient Sign Method (FGSM), which uses the gradients of the control agent's network to construct the attack, and (2) training a DRL agent with the goal of minimizing the performance of the control agents. The aim of this thesis is to investigate different strategies for solving the building control problem with DRL using the building simulator IDA ICE. The thesis also applies the concept of adversarial machine learning by attacking the agents controlling the temperature inside the building. We first built a DRL architecture to learn how to efficiently control temperature in a building. Experiments demonstrate that exploration of the agent plays a crucial role in the training of the building control agent, and one needs to fine-tune the exploration strategy in order to achieve satisfactory performance. Finally, we tested the susceptibility of the trained DRL controllers to adversarial attacks. These tests showed, on average, that attacks trained using DRL methods have a larger impact on building control than those using FGSM, while random perturbations have almost no impact.
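For illustration, a hedged sketch of an FGSM-style attack on a value-based controller's observation: the observed state is nudged so that the Q-value of the originally greedy action drops, pushing the agent toward a sub-optimal action. The 6-dimensional state and the small Q-network are stand-ins, not the IDA ICE setup used in the thesis.

```python
# FGSM-style observation attack on a DQN-like controller. The state layout,
# action count, and epsilon are illustrative assumptions.
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 5))  # 5 discrete actions

def attack_observation(state, epsilon=0.05):
    state = state.clone().detach().requires_grad_(True)
    q_values = q_net(state)
    greedy_action = q_values.argmax(dim=1)
    # Loss = Q-value of the greedy action; descending it makes that action less attractive.
    loss = q_values.gather(1, greedy_action.unsqueeze(1)).sum()
    loss.backward()
    return (state - epsilon * state.grad.sign()).detach()

obs = torch.randn(1, 6)               # e.g. zone temperatures, setpoint, time features
adv_obs = attack_observation(obs)
print(q_net(obs).argmax(dim=1), q_net(adv_obs).argmax(dim=1))
```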
18

TOWARDS SECURE AND ROBUST 3D PERCEPTION IN THE REAL WORLD: AN ADVERSARIAL APPROACH

Zhiyuan Cheng (19104104) 11 July 2024 (has links)
<p dir="ltr">The advent of advanced machine learning and computer vision techniques has led to the feasibility of 3D perception in the real world, which includes but not limited to tasks of monocular depth estimation (MDE), 3D object detection, semantic scene completion, optical flow estimation (OFE), etc. Due to the 3D nature of our physical world, these techniques have enabled various real-world applications like Autonomous Driving (AD), unmanned aerial vehicle (UAV), virtual/augmented reality (VR/AR) and video composition, revolutionizing the field of transportation and entertainment. However, it is well-documented that Deep Neural Network (DNN) models can be susceptible to adversarial attacks. These attacks, characterized by minimal perturbations, can precipitate substantial malfunctions. Considering that 3D perception techniques are crucial for security-sensitive applications, such as autonomous driving systems (ADS), in the real world, adversarial attacks on these systems represent significant threats. As a result, my goal of research is to build secure and robust real-world 3D perception systems. Through the examination of vulnerabilities in 3D perception techniques under such attacks, my dissertation aims to expose and mitigate these weaknesses. Specifically, I propose stealthy physical-world attacks against MDE, a fundamental component in ADS and AR/VR that facilitates the projection from 2D to 3D. I have advanced the stealth of the patch attack by minimizing the patch size and disguising the adversarial pattern, striking an optimal balance between stealth and efficacy. Moreover, I develop single-modal attacks against camera-LiDAR fusion models for 3D object detection, utilizing adversarial patches. This method underscores that mere fusion of sensors does not assure robustness against adversarial attacks. Additionally, I study black-box attacks against MDE and OFE models, which are more practical and impactful as no model details are required and the models can be compromised through only queries. In parallel, I devise a self-supervised adversarial training method to harden MDE models without the necessity of ground-truth depth labels. This enhanced model is capable of withstanding a range of adversarial attacks, including those in the physical world. Through these innovative designs for both attack and defense, this research contributes to the development of more secure and robust 3D perception systems, particularly in the context of the real world applications.</p>
19

Generation and Detection of Adversarial Attacks for Reinforcement Learning Policies

Drotz, Axel, Hector, Markus January 2021 (has links)
In this project we investigate the susceptibility of reinforcement learning (RL) algorithms to adversarial attacks. Adversarial attacks have been proven to be very effective at reducing the performance of deep learning classifiers and, recently, have also been shown to reduce the performance of RL agents. The goal of this project is to evaluate adversarial attacks on agents trained using deep reinforcement learning (DRL), as well as to investigate how to detect these types of attacks. We first use DRL to solve two environments from OpenAI's gym module, namely CartPole and LunarLander, using DQN and DDPG (DRL techniques). We then evaluate the performance of attacks, and finally we also train neural networks to detect attacks. The attacks were successful at reducing performance in both the LunarLander and CartPole environments. The attack detector was very successful at detecting attacks in the CartPole environment, but performed not quite as well on LunarLander. We hypothesize that continuous action space environments may pose a greater difficulty for attack detectors to identify potential adversarial attacks. / Bachelor's degree project in electrical engineering 2021, KTH, Stockholm
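For illustration, a hedged sketch of the attack-detection idea: a small binary classifier over agent observations, trained on examples labeled clean versus perturbed. The 4-dimensional CartPole-like observation, the MLP detector, and the perturbation model are assumptions, not the project's trained detector.

```python
# Binary attack detector over agent observations (clean = 0, perturbed = 1).
# Observation size, detector architecture, and perturbations are assumptions.
import torch
import torch.nn as nn

detector = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(detector.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def train_detector(clean_obs, perturbed_obs, epochs=20):
    x = torch.cat([clean_obs, perturbed_obs])
    y = torch.cat([torch.zeros(len(clean_obs), 1), torch.ones(len(perturbed_obs), 1)])
    for _ in range(epochs):
        opt.zero_grad()
        loss = bce(detector(x), y)
        loss.backward()
        opt.step()

# Placeholder data: clean observations vs. observations with small perturbations.
clean = torch.randn(512, 4)
perturbed = clean + 0.05 * torch.randn_like(clean).sign()
train_detector(clean, perturbed)
flag = torch.sigmoid(detector(torch.randn(1, 4))) > 0.5   # True -> likely attacked
```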
20

PatchUp : a feature-space block-level regularization technique for convolutional neural networks

Faramarzi, Mojtaba 07 1900 (has links)
Large-capacity deep learning models are often prone to a high generalization gap when trained with a limited amount of labeled training data. In this case, very deep and wide networks tend to memorize the samples, and therefore they might be vulnerable under a slight distribution shift at testing time. This problem yields poor generalization for data outside of the training data distribution. To overcome this issue, some data-dependent and data-independent methods have been proposed. A recent class of successful methods to address this problem uses various ways to construct a new training sample by mixing a pair (or more) of training samples.

In this thesis, we introduce PatchUp, a feature-space block-level data-dependent regularization that operates in the hidden space by masking out contiguous blocks of the feature map of a random pair of samples, and then either mixes (Soft PatchUp) or swaps (Hard PatchUp) these selected contiguous blocks. Our regularization method does not incur significant computational overhead for CNNs during training. Our approach improves the robustness of CNN models against the manifold intrusion problem that may occur in other state-of-the-art mixing approaches like Mixup and CutMix. Moreover, since we are mixing contiguous blocks of features in the hidden space, which has more dimensions than the input space, we obtain more diverse samples for training towards different dimensions. Our experiments on the CIFAR-10, CIFAR-100, SVHN, and Tiny-ImageNet datasets using ResNet architectures including PreActResnet18, PreActResnet34, WideResnet-28-10, ResNet101, and ResNet152 show that PatchUp improves upon, or equals, the performance of current state-of-the-art regularizers for CNNs. We also show that PatchUp can provide better generalization to affine transformations of samples and is more robust against adversarial attacks. PatchUp also helps a CNN model to produce a wider variety of features in the residual blocks compared to other state-of-the-art regularization methods for CNNs such as Mixup, Cutout, CutMix, ManifoldMixup, and Puzzle Mix. Key words: Deep Learning, Convolutional Neural Network, Generalization, Regularization, Data-dependent and Data-independent Regularization Techniques, Robustness to Adversarial Attacks.
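For illustration, a simplified sketch of the Hard PatchUp operation: within a hidden layer, a contiguous block of the feature maps is swapped between each sample and a randomly paired sample from the batch. Block sampling, the Soft PatchUp mixing variant, and the interpolated training loss used in the thesis are more involved than this sketch.

```python
# Simplified Hard PatchUp: swap one contiguous block of hidden feature maps
# between each sample and a randomly paired sample. Block size and the single
# shared block location are simplifying assumptions.
import torch

def hard_patchup(features, block_size=4):
    """features: hidden activations of shape (B, C, H, W)."""
    b, c, h, w = features.shape
    perm = torch.randperm(b)                     # random pairing of samples
    top = torch.randint(0, h - block_size + 1, (1,)).item()
    left = torch.randint(0, w - block_size + 1, (1,)).item()
    mixed = features.clone()
    # Replace the selected contiguous block with the paired sample's block.
    mixed[:, :, top:top + block_size, left:left + block_size] = \
        features[perm, :, top:top + block_size, left:left + block_size]
    # Fraction of unchanged features, typically used to interpolate the targets.
    portion = 1.0 - (block_size * block_size) / (h * w)
    return mixed, perm, portion

feats = torch.randn(8, 16, 8, 8)                 # e.g. activations after a residual block
mixed, perm, portion = hard_patchup(feats)
```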
