61 |
Automotive 3D Object Detection Without Target Domain Annotations
Gustafsson, Fredrik, Linder-Norén, Erik January 2018
In this thesis we study a perception problem in the context of autonomous driving. Specifically, we study the computer vision problem of 3D object detection, in which objects should be detected from various sensor data and their positions in the 3D world should be estimated. We also study the application of Generative Adversarial Networks in domain adaptation techniques, aiming to improve the 3D object detection model's ability to transfer between different domains. The state-of-the-art Frustum-PointNet architecture for LiDAR-based 3D object detection was implemented and found to closely match its reported performance when trained and evaluated on the KITTI dataset. The architecture was also found to transfer reasonably well from the synthetic SYN dataset to KITTI, and is thus believed to be usable in a semi-automatic 3D bounding box annotation process. The Frustum-PointNet architecture was also extended to explicitly utilize image features, which surprisingly degraded its detection performance. Furthermore, an image-only 3D object detection model was designed and implemented, which was found to compare quite favourably with the current state of the art in terms of detection performance. Additionally, the PixelDA approach was adopted and successfully applied to the MNIST to MNIST-M domain adaptation problem, which validated the idea that unsupervised domain adaptation using Generative Adversarial Networks can improve the performance of a task network for a dataset lacking ground truth annotations. Surprisingly, however, the approach did not significantly improve the performance of the image-based 3D object detection models when trained on the SYN dataset and evaluated on KITTI.
|
62 |
Deep Neural Architectures for Automatic Representation Learning from Multimedia Multimodal Data / Architectures neuronales profondes pour l'apprentissage de représentation multimodales de données multimédias
Vukotic, Verdran 26 September 2017
This thesis concerns the development of deep neural architectures for analyzing textual or visual content, or a combination of the two. More generally, the work exploits the ability of neural networks to learn abstract representations. The main contributions of the thesis are the following: 1) Recurrent networks for spoken language understanding: different network architectures are compared on this task with respect to their ability to model the observations as well as the dependencies among the labels to be predicted. 2) Image and motion prediction: we propose an architecture that learns a representation of an image depicting a human action in order to predict how the motion evolves in a video; the originality of the proposed model lies in its ability to predict frames at an arbitrary distance in a video. 3) Bidirectional multimodal encoders: the main result of the thesis is a bidirectional network that translates one modality into another, which makes it possible to represent several modalities jointly. The approach was studied mainly for structuring video collections, in the context of international evaluations in which it established itself as the state of the art. 4) Adversarial networks for multimodal fusion: the thesis proposes using generative adversarial architectures to learn multimodal representations while offering the possibility of visualizing the representations in image space. / In this dissertation, the thesis that deep neural networks are suited for analysis of visual, textual and fused visual and textual content is discussed.
This work evaluates the ability of deep neural networks to learn automatic multimodal representations in either unsupervised or supervised manners and brings the following main contributions: 1) Recurrent neural networks for spoken language understanding (slot filling): different architectures are compared for this task with the aim of modeling both the input context and output label dependencies. 2) Action prediction from single images: we propose an architecture that allows us to predict human actions from a single image. The architecture is evaluated on videos, using only one frame as input. 3) Bidirectional multimodal encoders: the main contribution of this thesis is a neural architecture that translates from one modality to the other and back, and offers an improved multimodal representation space in which the initially disjoint representations can be translated and fused. This enables improved fusion of multiple modalities. The architecture was extensively studied and evaluated in international benchmarks on the task of video hyperlinking, where it defined the state of the art. 4) Generative adversarial networks for multimodal fusion: continuing on the topic of multimodal fusion, we evaluate the possibility of using conditional generative adversarial networks to learn multimodal representations. In addition to providing multimodal representations, generative adversarial networks make it possible to visualize the learned model directly in the image domain.
|
63 |
Material Artefact Generation
Rončka, Martin January 2019
It is not always easy to obtain a sufficiently large, high-quality dataset of images with distinct artefacts, whether because the data source cannot supply enough of them or because annotations are difficult to create. This is the case, for example, in radiology and in mechanical engineering. To use modern, established machine learning methods for classification, segmentation and defect detection, the dataset must be sufficiently large and balanced. With small datasets we face problems such as overfitting and weak data, which cause misclassification to the detriment of under-represented classes. This thesis explores the use of generative networks to extend and balance a dataset with newly generated images. Using Conditional Generative Adversarial Networks (CGAN) and a heuristic annotation generator, we are able to generate a large number of new images of parts with defects. A dataset of screw threads was used for the generation experiments. Two further datasets were also used: ceramics and MRI scans (BraTS). On these two datasets, the effect of the generated data on training is evaluated, together with its benefit for improving classification and segmentation.
|
64 |
Robust Object Detection under Varying Illuminations and Distortions
January 2020
Object detection is a computer vision area concerned with detecting object instances that belong to specific classes of interest and localizing these instances in images and/or videos. Object detection serves as a vital module in many computer vision based applications. This work focuses on the development of object detection methods that exhibit increased robustness to varying illumination and image quality. In this work, two methods for robust object detection are presented.
In the context of varying illumination, this work focuses on robust generic obstacle detection and collision warning in Advanced Driver Assistance Systems (ADAS). The highlight of the first method is its ability to detect all obstacles without prior knowledge, including partially occluded obstacles and obstacles that have not completely appeared in the frame (truncated obstacles). It is first shown that the angular distortion of obstacle edges in the Inverse Perspective Mapping (IPM) domain varies as a function of their corresponding 2D location in the camera plane. This information is used to generate object proposals. A novel proposal assessment method is also proposed, which fuses statistical properties from both the IPM image and the camera image to perform robust outlier elimination and false positive reduction.
In the context of image quality, this work focuses on robust multiple-class object detection using deep neural networks for images with varying quality. The use of Generative Adversarial Networks (GANs) is proposed in a novel generative framework to generate features that provide robustness for object detection on reduced quality images. The proposed GAN-based Detection of Objects (GAN-DO) framework is not restricted to any particular architecture and can be generalized to several deep neural network (DNN) based architectures. The resulting deep neural network maintains the exact architecture of the selected baseline model without adding to the model parameter complexity or inference time. Performance results obtained using GAN-DO on object detection datasets establish an improved robustness to varying image quality and a higher object detection and classification accuracy compared to the existing approaches. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2020
|
65 |
Image-to-Image Translation for Improvement of Synthetic Thermal Infrared Training Data Using Generative Adversarial Networks
Hamrell, Hanna January 2021
Training data is an essential ingredient within supervised learning, yet time consuming, expensive and for some applications impossible to retrieve. Thus it is of interest to use synthetic training data. However, the domain shift of synthetic data makes it challenging to obtain good results when used as training data for deep learning models. It is therefore of interest to refine synthetic data, e.g. using image-to-image translation, to improve results. The aim of this work is to compare different methods for image-to-image translation of synthetic training data of thermal IR images using GANs. Translation is done both using synthetic thermal IR images alone and with the addition of pixelwise depth and/or semantic information. For evaluation, a new measure based on the Fréchet Inception Distance, adapted to work for thermal IR images, is proposed. The results show that the model trained using IR images alone translates the generated images closest to the domain of authentic thermal IR images. The training where IR images are complemented by corresponding pixelwise depth data performs second best. However, given more training time, inclusion of depth data has the potential to outperform training with IR data alone. This gives valuable insight into how best to translate images from the domain of synthetic IR images to that of authentic IR images, which is vital for quick and low-cost generation of training data for deep learning models.
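For reference, the standard Fréchet Inception Distance, on which the proposed measure is based, compares Gaussian statistics of feature embeddings of two image sets. The formula below is the usual definition and not the thesis's adaptation for thermal IR images, whose details the abstract does not give. The symbols are notation assumed here: $\mu_r, \Sigma_r$ are the mean and covariance of the features of authentic images, and $\mu_g, \Sigma_g$ the same for translated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate that the two feature distributions are closer.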
|
66 |
Privacy-aware data generation: Using generative adversarial networks and differential privacy
Hübinette, Felix January 2022
Today we are surrounded by IoT devices that constantly generate different kinds of data about their environment and their users. Much of this data could be useful for research and development, but a lot of it is privacy-sensitive for the individual. Data protection laws exist to protect the individual's privacy, but these legal restrictions also dramatically reduce the amount of data available for research and development. It would therefore be beneficial to find a workaround that respects people's privacy and complies with the law while still maintaining the usefulness of the data. The purpose of this thesis is to show how privacy-aware data that maintains data utility can be generated from a dataset by using Generative Adversarial Networks (GANs) and Differential Privacy (DP). This is useful because it allows privacy-preserving data to be shared, so that the data can be used in research and development with due regard for privacy. GANs are used for generating synthetic data, and DP is a technique for anonymizing data. By combining these two techniques, we generate synthetic, privacy-aware data from an existing open-source Fitbit dataset. The specific type of GAN model used is called CTGAN, and differential privacy is achieved with the help of Gaussian noise. The results of the experiments show many similarities between the original dataset and the experimental datasets. The experiments performed very well on the Kolmogorov-Smirnov test, with the lowest p-value across all experiments being 0.92. The conclusion is that this is another promising methodology for creating privacy-aware synthetic data that maintains reasonable data utility while still using DP techniques to achieve data privacy.
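The two ingredients named above, Gaussian noise and the Kolmogorov-Smirnov comparison, can be illustrated with a small standard-library sketch. It rests on several assumptions that do not come from the thesis: the noise is added directly to a column of values, `sigma` is an arbitrary illustrative value that is not calibrated to any privacy budget, the step-count data is made up, and `add_gaussian_noise` and `ks_statistic` are hypothetical helper names. Only the KS statistic is computed; a p-value such as the one reported would need a further step (SciPy's `scipy.stats.ks_2samp`, for instance, returns both).

```python
import random


def add_gaussian_noise(values, sigma, rng):
    """Return a copy of `values` with independent N(0, sigma^2) noise added to each entry."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both empirical CDFs past every observation equal to x.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


rng = random.Random(0)
# Made-up stand-in for one numeric column (e.g. daily step counts).
original = [rng.gauss(8000.0, 2000.0) for _ in range(500)]
noisy = add_gaussian_noise(original, sigma=100.0, rng=rng)
print(ks_statistic(original, noisy))
```

A statistic near 0 means the two empirical distributions are nearly indistinguishable, which corresponds to a high p-value in the test.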
|
67 |
Machine Learning-Based Reduced-Order Modeling and Uncertainty Quantification for "Structure-Property" Relations for ICME Applications
Yuan, Mengfei 11 July 2019
No description available.
|
68 |
Single Image Super Resolution with Infrared Imagery and Multi-Step Reinforcement Learning
Vassilo, Kyle January 2020
No description available.
|
69 |
Effects of Transfer Learning on Data Augmentation with Generative Adversarial Networks / Effekten av transferlärande på datautökning med generativt adversarialt nätverk
Berglöf, Olle, Jacobs, Adam January 2019
Data augmentation is a technique that acquires more training data by augmenting available samples, where the training data is used to fit model parameters. Data augmentation is utilized due to a shortage of training data in certain domains and to reduce overfitting. Augmenting a training dataset for image classification with a Generative Adversarial Network (GAN) has been shown to increase classification accuracy. This report investigates whether transfer learning within a GAN can further increase classification accuracy when utilizing the augmented training dataset. The method section describes a specific GAN architecture for the experiments that includes a label condition. When using transfer learning within the specific GAN architecture, a statistical analysis shows a statistically significant increase in classification accuracy for a classification problem with the EMNIST dataset, which consists of images of handwritten alphanumeric characters. In the discussion section, the authors analyze the results and motivate other use cases for the proposed GAN architecture. / Data augmentation is a method that creates more training data by augmenting existing training data, where the training data is used to fit the parameters of models. Data augmentation is used because of a shortage of training data in certain domains and to reduce overfitting. Augmenting a training dataset for image classification with a generative adversarial network (GAN) has been shown to increase the accuracy of image classification. This report investigates whether transfer learning within a GAN can further increase classification accuracy when an augmented training dataset is used. The method describes a specific GAN architecture that includes a label condition.
When transfer learning is used within the selected GAN architecture, a statistical analysis shows a statistically significant increase in classification accuracy for a classification problem with the EMNIST dataset, which contains images of handwritten letters and digits. The discussion examines the reasons behind the results and mentions further use cases.
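A label condition is commonly realized by feeding the generator a noise vector concatenated with a one-hot encoding of the desired class. The minimal sketch below shows only that input construction. The function names, the noise dimension and the class count are illustrative assumptions, and the abstract does not specify how the thesis's architecture actually injects the label.

```python
import random


def one_hot(label, num_classes):
    """Return a list of length num_classes with 1.0 at index `label` and 0.0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def conditional_generator_input(label, num_classes, noise_dim, rng):
    """Concatenate standard-normal noise with a one-hot label vector."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + one_hot(label, num_classes)


rng = random.Random(0)
# Hypothetical sizes: 100 noise dimensions and 10 classes.
z = conditional_generator_input(label=3, num_classes=10, noise_dim=100, rng=rng)
print(len(z))  # 110: the noise entries followed by the one-hot label
```

Sampling many such inputs for an under-represented class is what lets a trained conditional generator produce additional labelled images for that class.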
|
70 |
Latent Space Manipulation of GANs for Seamless Image Compositing
Fruehstueck, Anna 04 1900
Generative Adversarial Networks (GANs) are a very successful method for high-quality image synthesis and a powerful tool for generating realistic images by learning their visual properties from a dataset of exemplars. However, the controllability of the generator output still poses many challenges. We propose several methods for achieving larger and/or higher-quality GAN outputs by combining latent space manipulations with image compositing operations: (1) GANs are inherently suitable for small-scale texture synthesis due to the generator’s capability to learn image properties of a limited domain, such as the properties of a specific texture type at a desired level of detail. A rich variety of suitable texture tiles can be synthesized from the trained generator. Due to the convolutional nature of GANs, we can achieve large-scale texture synthesis by tiling intermediate latent blocks, allowing the generation of (almost) arbitrarily large texture images that are seamlessly merged. (2) We notice that generators trained on heterogeneous data perform worse than specialized GANs, and we demonstrate that we can optimize multiple independently trained generators in such a way that a specialized network can fill in high-quality details for specific image regions, or insets, of a lower-quality canvas generator. Multiple generators can collaborate to improve the visual output quality, and through careful optimization, seamless transitions between different generators can be achieved. (3) GANs can also be used to semantically edit facial images and videos, with novel 3D GANs even allowing for camera changes, enabling unseen views of the target. However, the GAN output must be merged with the surrounding image or video in a spatially and temporally consistent way, which we demonstrate in our method.
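The tiling idea in (1) can be illustrated with plain 2D arrays. This is only a toy sketch under the assumption that each latent block is a small 2D grid of numbers. In an actual GAN the blocks would be multi-channel feature tensors that are then passed through the remaining convolutional layers, and `tile_blocks` is a hypothetical helper, not a function from the thesis.

```python
def tile_blocks(block_grid):
    """Assemble a 2D grid of equally sized 2D blocks into one large 2D array (list of rows)."""
    tiled = []
    for block_row in block_grid:
        height = len(block_row[0])
        for y in range(height):
            row = []
            for block in block_row:
                # Append row y of each block, left to right.
                row.extend(block[y])
            tiled.append(row)
    return tiled


# Two made-up 2x2 "latent blocks" arranged in a 2x2 layout.
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
canvas = tile_blocks([[a, b], [b, a]])
for row in canvas:
    print(row)
```

Because the later layers are convolutional, they operate on the assembled canvas as a whole, which is the property the abstract credits for seamless merging of neighbouring tiles.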
|