  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
51

Updating the generator in PPGN-h with gradients flowing through the encoder

Pakdaman, Hesam January 2018 (has links)
The Generative Adversarial Network framework has shown success in implicitly modeling data distributions and is able to generate realistic samples. Its architecture comprises a generator, which produces fake data that superficially seem to belong to the real data distribution, and a discriminator, which distinguishes fake from genuine samples. The Noiseless Joint Plug & Play model extends the framework by simultaneously training autoencoders; it uses a pre-trained encoder as a feature extractor, feeding the generator with global information. Using the Plug & Play network as a baseline, we design a new model by adding discriminators to the Plug & Play architecture. These additional discriminators are trained to discern real and fake latent codes, which are the output of the encoder given genuine and generated inputs, respectively. We investigate whether this approach is viable. Experiments conducted on the MNIST manifold show that this is indeed the case.
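The added latent-code discriminator can be illustrated with a toy sketch: plain NumPy linear maps stand in for the actual networks, and all dimensions, names, and random weights below are illustrative assumptions rather than the thesis implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, target):
    # Binary cross-entropy against a constant target, clipped for safety.
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -(target * np.log(p) + (1 - target) * np.log(1 - p)).mean()

# Toy linear stand-ins for the real networks (dimensions are made up).
W_enc = rng.normal(size=(8, 16))   # encoder: image (16-d) -> latent code (8-d)
W_gen = rng.normal(size=(16, 8))   # generator: latent (8-d) -> image (16-d)
w_dx  = rng.normal(size=16)        # image-space discriminator
w_dz  = rng.normal(size=8)         # latent-space discriminator (the added piece)

x_real = rng.normal(size=(4, 16))  # batch of "real" images
z_real = x_real @ W_enc.T          # codes of real images
x_fake = z_real @ W_gen.T          # generator outputs fed the real codes
z_fake = x_fake @ W_enc.T          # codes of generated images

# Standard image-space discriminator loss ...
d_x_loss = bce(sigmoid(x_real @ w_dx), 1) + bce(sigmoid(x_fake @ w_dx), 0)
# ... plus the extra latent-space discriminator separating real vs. fake codes.
d_z_loss = bce(sigmoid(z_real @ w_dz), 1) + bce(sigmoid(z_fake @ w_dz), 0)

total_d_loss = d_x_loss + d_z_loss
```

The point of the sketch is only the shape of the objective: the new discriminator sees encoder outputs for genuine and generated inputs and adds a second cross-entropy term to the discriminator loss.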
52

Learning to Generate Things and Stuff: Guided Generative Adversarial Networks for Generating Human Faces, Hands, Bodies, and Natural Scenes

Tang, Hao 27 May 2021 (has links)
In this thesis, we mainly focus on image generation, a task for which existing state-of-the-art methods still produce unsatisfying results. To address this limitation and further improve the quality of generated images, we propose several novel models. The image generation task can be roughly divided into three subtasks, i.e., person image generation, scene image generation, and cross-modal translation. Person image generation can be further divided into three subtasks, namely, hand gesture generation, facial expression generation, and person pose generation. Meanwhile, scene image generation can be further divided into two subtasks, i.e., cross-view image translation and semantic image synthesis. For each task, we have proposed a corresponding solution. Specifically, for hand gesture generation we have proposed the GestureGAN framework; for facial expression generation, the Cycle-in-Cycle GAN (C2GAN) framework; for person pose generation, the XingGAN and BiGraphGAN frameworks; for cross-view image translation, the SelectionGAN framework; and for semantic image synthesis, the Local and Global GAN (LGGAN), EdgeGAN, and Dual Attention GAN (DAGAN) frameworks. Although each method was originally proposed for a certain task, we later discovered that each is universal and can be used to solve different tasks. For instance, GestureGAN can be used to solve both hand gesture generation and cross-view image translation. C2GAN can be used to solve facial expression generation, person pose generation, hand gesture generation, and cross-view image translation. SelectionGAN can be used to solve cross-view image translation, facial expression generation, person pose generation, hand gesture generation, and semantic image synthesis. Moreover, we explore cross-modal translation and propose a novel DanceGAN for audio-to-video translation.
53

Exploring Multi-Domain and Multi-Modal Representations for Unsupervised Image-to-Image Translation

Liu, Yahui 20 May 2022 (has links)
Unsupervised image-to-image translation (UNIT) is a challenging task in the image manipulation field, where input images in one visual domain are mapped into another domain with desired visual patterns (also called styles). An ideal direction in this field is to build a model that can map an input image in one domain to multiple target domains and generate diverse outputs in each target domain, which is termed multi-domain and multi-modal unsupervised image-to-image translation (MMUIT). Recent studies have shown remarkable results in UNIT, but they suffer from four main limitations: (1) State-of-the-art UNIT methods are either built from several two-domain mappings that must be learned independently, or they generate low-diversity results, a phenomenon also known as mode collapse. (2) Most manipulation relies on visual maps or discrete labels, without exploring natural language, which could be more scalable and flexible in practice. (3) In an MMUIT system, the style latent space is usually disentangled between every pair of image domains: while interpolations within a domain are smooth, interpolating between style representations sampled from two different domains often yields unrealistic images with artifacts. Improving the smoothness of the style latent space would allow gradual interpolation between any two style representations, even across domains. (4) It is expensive to train MMUIT models from scratch at high resolution. Interpreting the latent space of pre-trained unconditional GANs can achieve good image translations, especially high-quality synthesized images (e.g., 1024x1024 resolution), yet few works explore building an MMUIT system with such pre-trained GANs. In this thesis, we focus on these vital issues and propose several techniques for building better MMUIT systems.
First, we build on the content-style disentangled framework and propose to fit the style latent space with Gaussian Mixture Models (GMMs). This allows a single well-trained network, using a shared disentangled style latent space, to model multi-domain translations; meanwhile, we can randomly sample style representations from a Gaussian component or use a reference image for style transfer. Second, we show how the GMM-modeled latent style space can be combined with a language model (e.g., a simple LSTM network) to manipulate multiple styles through textual commands. Then, we not only propose easy-to-use constraints to improve the smoothness of the style latent space in MMUIT models, but also design a novel metric to quantitatively evaluate that smoothness. Finally, we build a new model that uses pre-trained unconditional GANs to perform MMUIT tasks.
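A GMM-modeled style space with one component per domain can be sketched in a few lines. Everything here is a hypothetical toy (domain names, dimensions, fixed means and variance); the thesis learns these quantities, it does not hard-code them.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical shared style space: one Gaussian component per image domain.
# Means and standard deviation are illustrative, not learned values.
style_dim = 4
means = {"young": np.zeros(style_dim), "old": np.full(style_dim, 3.0)}
std = 0.5

def sample_style(domain, n=1):
    """Draw style codes from the Gaussian component of the given domain."""
    return means[domain] + std * rng.normal(size=(n, style_dim))

def interpolate(s0, s1, t):
    """Linear interpolation in the shared style space (works across domains)."""
    return (1 - t) * s0 + t * s1

s_a = sample_style("young")[0]
s_b = sample_style("old")[0]
midpoint = interpolate(s_a, s_b, 0.5)  # a cross-domain style, still in-space
```

Because all components live in one shared space, the midpoint between codes from two different domains is still a valid style code, which is exactly the smoothness property the thesis measures.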
54

Benevolent and Malevolent Adversaries: A Study of GANs and Face Verification Systems

Nazari, Ehsan 22 November 2023 (has links)
Cybersecurity is rapidly evolving, necessitating inventive solutions for emerging challenges. Deep Learning (DL), having demonstrated remarkable capabilities across various domains, has found a significant role within Cybersecurity. This thesis focuses on benevolent and malevolent adversaries. For the benevolent adversaries, we analyze specific applications of DL in Cybersecurity, contributing to the enhancement of DL for downstream tasks. Regarding the malevolent adversaries, we explore how resistant DL is to cyber attacks and expose vulnerabilities of specific DL-based systems. We begin with the benevolent adversaries by studying the use of a generative model, the Generative Adversarial Network (GAN), to improve the abilities of DL. In particular, we look at the use of Conditional Generative Adversarial Networks (CGAN) to generate synthetic data and address imbalanced datasets in cybersecurity applications, where imbalanced classes can lead to serious problems. We find that CGANs can effectively address this issue, especially in more difficult scenarios. Then, we turn our attention to using CGAN with tabular cybersecurity problems. However, visually assessing the results of a CGAN is not possible when dealing with tabular cybersecurity data. To address this issue, we introduce AutoGAN, a method that can train a GAN on both image-based and tabular data, reducing the need for human inspection during GAN training. This opens up new opportunities for using GANs with tabular datasets, including those in cybersecurity that are not image-based. Our experiments show that AutoGAN can achieve comparable or even better results than other methods. Finally, we shift our focus to the malevolent adversaries by examining the robustness of DL models in the context of automatic face recognition.
We know from previous research that DL models can be tricked into making incorrect classifications by adding small, almost unnoticeable changes to an image. These deceptive manipulations are known as adversarial attacks. We aim to expose new vulnerabilities in DL-based Face Verification (FV) systems. We introduce a novel attack method on FV systems, called the DodgePersonation Attack, and a system for categorizing these attacks based on their specific targets. We also propose a new algorithm that significantly improves upon a previous method for making such attacks, increasing the success rate by more than 13%.
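As a point of reference for "small, almost unnoticeable changes", the classic fast-gradient-sign method can be sketched on a toy linear verifier. This is not the DodgePersonation Attack itself (whose details are not given here); the weights, input, and step size below are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy linear "verifier": score > 0.5 means "same person". Weights are made up.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.3, -0.4, 0.2])   # feature vector currently scored "same person"
y = 1.0                          # true label

# FGSM-style step: perturb the input along the sign of the loss gradient.
# For logistic loss, d(loss)/dx = (sigmoid(w.x) - y) * w.
eps = 0.6
grad = (sigmoid(w @ x) - y) * w
x_adv = x + eps * np.sign(grad)

print(sigmoid(w @ x) > 0.5, sigmoid(w @ x_adv) > 0.5)  # True False
```

A bounded per-feature perturbation is enough to flip the verifier's decision, which is the vulnerability class the thesis probes in FV systems.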
55

Towards Building Privacy-Preserving Language Models: Challenges and Insights in Adapting PrivGAN for Generation of Synthetic Clinical Text

Nazem, Atena January 2023 (has links)
The growing development of artificial intelligence (AI), particularly neural networks, is transforming applications of AI in healthcare, yet it raises significant privacy concerns due to potential data leakage. As neural networks memorise training data, they may inadvertently expose sensitive clinical data to privacy breaches, which can engender serious repercussions like identity theft, fraud, and harmful medical errors. While regulations such as GDPR offer safeguards through guidelines, rooted and technical protections are required to address the problem of data leakage. Reviews of various approaches show that one avenue of exploration is the adaptation of Generative Adversarial Networks (GANs) to generate synthetic data for use in place of real data. Since GANs were originally designed and mainly researched for generating visual data, there is a notable gap for further exploration of adapting GANs with privacy-preserving measures for generating synthetic text data. Thus, to address this gap, this study aims at answering the research questions of how a privacy-preserving GAN can be adapted to safeguard the privacy of clinical text data and what challenges and potential solutions are associated with these adaptations. To this end, the existing privGAN framework—originally developed and tested for image data—was tailored to suit clinical text data. Following the design science research framework, modifications were made while adhering to the privGAN architecture to incorporate reinforcement learning (RL) for addressing the discrete nature of text data. For synthetic data generation, this study utilised the 'Discharge summary' class from the Noteevents table of the MIMIC-III dataset, which is clinical text data in American English. The utility of the generated data was assessed using the BLEU-4 metric, and a white-box attack was conducted to test the model's resistance to privacy breaches. 
The experiment yielded a very low BLEU-4 score, indicating that the generator could not produce synthetic data capturing the linguistic characteristics and patterns of real data. The relatively low white-box attack accuracy of one discriminator (0.2055) suggests that the trained discriminator was less effective at inferring sensitive information with high accuracy. While this may indicate a potential for preserving privacy, increasing the number of discriminators produced less favourable results (0.361). In light of these results, we note that defining the rewards as a measure of the discriminators' uncertainty can create a conflicting learning signal and lead to low data utility. This study underscores the challenges in adapting privacy-preserving GANs for text data, owing to the inherent complexity of GAN training and the computational power required. To obtain better results in terms of utility and to confirm the effectiveness of the privacy measures, further experiments should consider a more direct and granular rewarding system for the generator and search for an optimal learning rate. As such, the findings reiterate the necessity for continued experimentation and refinement in adapting privacy-preserving GANs for clinical text.
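One plausible formalisation of "reward as a measure of the discriminators' uncertainty" is a reward that peaks when a discriminator score sits at 0.5 and vanishes at confident scores. The exact reward used in the study is not specified here, so this is an assumption, not the thesis's definition.

```python
import numpy as np

def uncertainty_reward(d_scores):
    """Reward peaks at 1.0 when the discriminator is maximally unsure
    (score = 0.5) and falls to 0.0 at confident scores (0.0 or 1.0)."""
    return 1.0 - 2.0 * np.abs(np.asarray(d_scores) - 0.5)

print(uncertainty_reward([0.5, 0.0, 1.0]).tolist())  # [1.0, 0.0, 0.0]
```

Such a reward pushes the generator toward samples the discriminator cannot classify, but it gives no direct signal about linguistic quality, which is consistent with the conflicting-objective issue and low BLEU-4 reported above.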
56

Machine Learning for 3D Visualisation Using Generative Models

Taif, Khasrouf M.M. January 2020 (has links)
One of the highlights of deep learning in the past ten years is the introduction of generative adversarial networks (GANs), which have achieved great success in generating images comparable to real photos with minimal human intervention. These networks can generalise to a multitude of desired outputs, especially in image-to-image problems and image synthesis. This thesis proposes a computer graphics pipeline for 3D rendering that utilises GANs. The work is motivated by regression models and convolutional neural networks (ConvNets) such as U-Net architectures, which can be directed to generate realistic global illumination effects, using a semi-supervised GAN model (Pix2pix) that comprises a PatchGAN and a conditional GAN accompanied by a U-Net structure. Pix2pix was chosen for this thesis for its ease of training as well as the quality of its output images. It also differs from other forms of GANs in utilising colour labels, which enables further control and consistency of the geometries that comprise the output image. A series of experiments was carried out on laboratory-created image sets to explore whether deep learning and GANs can enhance the pipeline and speed up the 3D rendering process. First, a ConvNet is applied in combination with a Support Vector Machine (SVM) to pair 3D objects with their corresponding shadows, which can be applied in Augmented Reality (AR) scenarios. Second, a GAN approach is presented to generate shadows for non-shadowed 3D models, which can also be beneficial in AR scenarios. Third, we investigate generating high-quality renders of image sequences from low-polygon-density 3D models using GANs. Finally, we explore enhancing the visual coherence of the GAN's output image sequences by utilising multi-colour labels.
The adopted GAN model generated realistic outputs comparable to the lab-generated 3D-rendered ground truth and the control-group output images, with plausible scores on the PSNR and SSIM similarity metrics.
57

Generative Adversarial Networks to enhance decision support in digital pathology

De Biase, Alessia January 2019 (has links)
Histopathological evaluation and Gleason grading on Hematoxylin and Eosin (H&E) stained specimens is the clinical standard for grading prostate cancer. Recently, deep learning models have been trained to assist pathologists in detecting prostate cancer. However, these predictions could be further improved with respect to variations in morphology, staining, and differences across scanners. One approach to such problems is to employ conditional GANs for style transfer. A total of 52 prostatectomies from 48 patients were scanned with two different scanners. The data were split into 40 images for training and 12 for testing, and all images were divided into overlapping 256x256 patches. A segmentation model was trained on images from scanner A and tested on images from both scanner A and scanner B. Next, GANs were trained to perform style transfer from scanner A to scanner B. Training used unpaired images and different types of unsupervised image-to-image translation GANs (CycleGAN and UNIT). Besides the common CycleGAN architecture, a modified version was also tested, adding a Kullback-Leibler (KL) divergence term to the loss function. The segmentation model was then tested on the augmented images from scanner B. The models were evaluated on 2,000 randomly selected 256x256-pixel patches from 10 prostatectomies, and the resulting predictions were evaluated both qualitatively and quantitatively. All proposed methods improved AUC, by up to 16% in the best case. However, only CycleGAN trained on a large dataset proved capable of improving the segmentation tool's performance, preserving tissue morphology and achieving higher scores on all evaluation measures. Finally, all models were analyzed and the significance of the difference between the segmentation model's performance on style-transferred and untransferred images was assessed using statistical tests.
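The cycle-consistency term at the heart of CycleGAN-style training can be sketched with toy linear maps in place of the convolutional generators. Using an exact matrix inverse for the reverse mapping is an idealisation for illustration only; real generators are learned and only approximately invert each other.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy linear mappings standing in for the scanner-A -> scanner-B generator
# and its inverse; real CycleGANs use learned convolutional networks.
G = rng.normal(size=(3, 3))      # A -> B
F = np.linalg.inv(G)             # B -> A (perfect inverse in this toy case)

x_a = rng.normal(size=(5, 3))    # patches from scanner A (flattened, toy size)
x_ab = x_a @ G.T                 # translated to scanner-B style
x_aba = x_ab @ F.T               # cycled back to scanner A

cycle_loss = np.abs(x_a - x_aba).mean()   # L1 cycle-consistency term
```

The L1 penalty on `x_a - F(G(x_a))` is what pressures the style transfer to preserve tissue morphology while changing only the scanner appearance.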
58

Generation of Synthetic Data with Generative Adversarial Networks

Garcia Torres, Douglas January 2018 (has links)
The aim of synthetic data generation is to provide data that is not real for cases where the use of real data is somehow limited: for example, when larger volumes of data are needed, when the data is too sensitive to use, or simply when the real data is hard to access. Traditional methods of synthetic data generation use techniques that do not intend to replicate important statistical properties of the original data; properties such as the distribution, the patterns, or the correlation between variables are often omitted. Moreover, most of the existing tools and approaches require a great deal of user-defined rules and do not make use of advanced techniques like Machine Learning or Deep Learning. Machine Learning is an innovative area of Artificial Intelligence and Computer Science that uses statistical techniques to give computers the ability to learn from data, and Deep Learning is a closely related field based on learning data representations, which may prove useful for the task of synthetic data generation. This thesis focuses on one of the most interesting and promising innovations of recent years in the Machine Learning community: Generative Adversarial Networks. An approach for generating discrete, continuous, or text synthetic data with Generative Adversarial Networks is proposed, tested, evaluated, and compared with a baseline approach. The results prove the feasibility of the framework and show its advantages and disadvantages. Despite its high demand for computational resources, a Generative Adversarial Network framework is capable of generating quality synthetic data that preserves the statistical properties of a given dataset.
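Whether synthetic data "preserves the statistical properties" of the original can be checked with a simple fidelity report comparing moments and correlations. The datasets below are stand-ins drawn from the same known distribution; in practice the synthetic set would come from the trained GAN.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in data: "real" correlated samples and a "synthetic" set drawn from
# the same distribution (in practice the synthetic set comes from the GAN).
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
real = rng.multivariate_normal([0, 0], cov, size=5000)
synthetic = rng.multivariate_normal([0, 0], cov, size=5000)

def fidelity_report(a, b):
    """Compare first moments and pairwise correlation of two datasets."""
    return {
        "mean_gap": float(np.abs(a.mean(0) - b.mean(0)).max()),
        "corr_gap": float(abs(np.corrcoef(a.T)[0, 1] - np.corrcoef(b.T)[0, 1])),
    }

report = fidelity_report(real, synthetic)
```

Small gaps in both entries indicate that the synthetic set reproduces the marginal means and the inter-variable correlation, which are exactly the properties the thesis says traditional generators tend to omit.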
59

Automotive 3D Object Detection Without Target Domain Annotations

Gustafsson, Fredrik, Linder-Norén, Erik January 2018 (has links)
In this thesis we study a perception problem in the context of autonomous driving. Specifically, we study the computer vision problem of 3D object detection, in which objects should be detected from various sensor data and their position in the 3D world estimated. We also study the application of Generative Adversarial Networks to domain adaptation, aiming to improve the 3D object detection model's ability to transfer between different domains. The state-of-the-art Frustum-PointNet architecture for LiDAR-based 3D object detection was implemented and found to closely match its reported performance when trained and evaluated on the KITTI dataset. The architecture was also found to transfer reasonably well from the synthetic SYN dataset to KITTI, and is thus believed to be usable in a semi-automatic 3D bounding box annotation process. The Frustum-PointNet architecture was also extended to explicitly utilize image features, which surprisingly degraded its detection performance. Furthermore, an image-only 3D object detection model was designed and implemented, and found to compare quite favourably with the current state of the art in terms of detection performance. Additionally, the PixelDA approach was adopted and successfully applied to the MNIST to MNIST-M domain adaptation problem, validating the idea that unsupervised domain adaptation using Generative Adversarial Networks can improve the performance of a task network on a dataset lacking ground truth annotations. Surprisingly, however, the approach did not significantly improve the performance of the image-based 3D object detection models when trained on the SYN dataset and evaluated on KITTI.
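The "frustum" step that gives Frustum-PointNet its name — keeping only the LiDAR points whose image projection falls inside a 2D detection box — can be sketched geometrically. The pinhole intrinsics, box, and points below are made-up values for illustration, not KITTI calibration data.

```python
import numpy as np

# Pinhole camera intrinsics (made-up values for illustration).
fx = fy = 700.0
cx, cy = 320.0, 240.0

def in_frustum(points, box):
    """Keep 3D points (camera coordinates) whose image projection falls
    inside a 2D detection box (u_min, v_min, u_max, v_max)."""
    u_min, v_min, u_max, v_max = box
    x, y, z = points.T
    front = z > 0                          # only points in front of the camera
    zs = np.where(front, z, 1.0)           # dummy depth for excluded points
    u = fx * x / zs + cx
    v = fy * y / zs + cy
    keep = front & (u >= u_min) & (u <= u_max) & (v >= v_min) & (v <= v_max)
    return points[keep]

pts = np.array([[0.0, 0.0, 10.0],    # projects to image centre -> kept
                [5.0, 0.0, 10.0],    # projects far right -> dropped
                [0.0, 0.0, -5.0]])   # behind the camera -> dropped
kept = in_frustum(pts, box=(300.0, 220.0, 340.0, 260.0))
print(len(kept))  # 1
```

The 3D detector then only has to estimate a box from the points surviving this filter, which is what makes the two-stage pipeline tractable.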
60

Deep Neural Architectures for Automatic Representation Learning from Multimedia Multimodal Data

Vukotic, Verdran 26 September 2017 (has links)
This dissertation argues that deep neural networks are well suited to the analysis of visual, textual, and fused visual-textual content.
This work evaluates the ability of deep neural networks to learn multimodal representations automatically, in either an unsupervised or a supervised manner, and brings the following main contributions: 1) Recurrent neural networks for spoken language understanding (slot filling): different architectures are compared for this task with the aim of modeling both the input context and output label dependencies. 2) Action prediction from single images: we propose an architecture that allows us to predict human actions from a single image; the architecture is evaluated on videos, utilizing solely one frame as input. 3) Bidirectional multimodal encoders: the main contribution of this thesis is a neural architecture that translates from one modality to the other and conversely, offering an improved multimodal representation space in which the initially disjoint representations can be translated and fused. This enables improved multimodal fusion of multiple modalities. The architecture was extensively studied and evaluated in international benchmarks on the task of video hyperlinking, where it defined the state of the art. 4) Generative adversarial networks for multimodal fusion: continuing on the topic of multimodal fusion, we evaluate the possibility of using conditional generative adversarial networks to learn multimodal representations; in addition to providing multimodal representations, generative adversarial networks permit visualizing the learned model directly in the image domain.
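The shared representation space behind the bidirectional encoders can be sketched with toy linear "encoders" that map each modality into one joint space where cross-modal similarity is meaningful. Dimensions, names, and random weights are assumptions; the actual architecture uses deep bidirectional (auto)encoders.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy linear "encoders" mapping each modality into a shared embedding space.
W_text = rng.normal(size=(6, 10))   # text features (10-d) -> joint space (6-d)
W_vis  = rng.normal(size=(6, 12))   # visual features (12-d) -> joint space

def embed_text(t):
    return t @ W_text.T

def embed_visual(v):
    return v @ W_vis.T

def crossmodal_score(t, v):
    """Cosine similarity in the joint space: the basis for linking a text
    anchor to a visual target (as in video hyperlinking)."""
    a, b = embed_text(t), embed_visual(v)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

score = crossmodal_score(rng.normal(size=10), rng.normal(size=12))
```

Because both modalities land in the same space, one can compare, translate, or fuse them there, which is the property the bidirectional training is designed to produce.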
