1 |
Learning to Generate Things and Stuff: Guided Generative Adversarial Networks for Generating Human Faces, Hands, Bodies, and Natural ScenesTang, Hao 27 May 2021 (has links)
In this thesis, we mainly focus on image generation. However, one can still observe unsatisfying results produced by existing state-of-the-art methods. To address this limitation and further improve the quality of generated images, we propose a few novel models. The image generation task can be roughly divided into three subtasks, i.e., person image generation, scene image generation, and cross-modal translation. Person image generation can be further divided into three subtasks, namely, hand gesture generation, facial expression generation, and person pose generation. Meanwhile, scene image generation can be further divided into two subtasks, i.e., cross-view image translation and semantic image synthesis. For each task, we have proposed the corresponding solution. Specifically, for hand gesture generation, we have proposed the GestureGAN framework. For facial expression generation, we have proposed the Cycle-in-Cycle GAN (C2GAN) framework. For person pose generation, we have proposed the XingGAN and BiGraphGAN frameworks. For cross-view image translation, we have proposed the SelectionGAN framework. For semantic image synthesis, we have proposed the Local and Global GAN (LGGAN), EdgeGAN, and Dual Attention GAN (DAGAN) frameworks. Although each method was originally proposed for a certain task, we later discovered that each method is universal and can be used to solve different tasks. For instance, GestureGAN can be used to solve both hand gesture generation and cross-view image translation tasks. C2GAN can be used to solve facial expression generation, person pose generation, hand gesture generation, and cross-view image translation. SelectionGAN can be used to solve cross-view image translation, facial expression generation, person pose generation, hand gesture generation, and semantic image synthesis. Moreover, we explore cross-modal translation and propose a novel DanceGAN for audio-to-video translation.
|
2 |
Towards Affective Vision and LanguageHaydarov, Kilichbek 30 November 2021 (has links)
Developing intelligent systems that can recognize and express human affects is essential to bridge the gap between human and artificial intelligence. This thesis explores the creative and emotional frontiers of artificial intelligence. Specifically, in this thesis, we investigate the relation between the affective impact of visual stimuli and natural language by collecting and analyzing a new dataset called ArtEmis. Furthermore, capitalizing on this dataset, we demonstrate affective AI models that can emotionally talk about artwork and generate them given their affective descriptions. In text-to-image generation task, we present HyperCGAN: a conceptually simple and general approach for text-to-image synthesis that uses hypernetworks to condition a GAN model on text. In our setting, the generator and the discriminator weights are controlled by their corresponding hypernetworks, which modulate weight parameters based on the provided text query. We explore different mechanisms to modulate the layers depending on the underlying architecture of a target network and the structure of the conditioning variable.
|
3 |
Morphing architectures for pose-based image generation of people in clothing / Morphing-arkitekturer för pose-baserad bildgeneration av människor i kläderBaldassarre, Federico January 2018 (has links)
This project investigates the task of conditional image generation from misaligned sources, with an example application in the context of content creation for the fashion industry. The problem of spatial misalignment between images is identified, the related literature is discussed, and different approaches are introduced to address it. In particular, several non-linear differentiable morphing modules are designed and integrated in current architectures for image-to-image translation. The proposed method for conditional image generation is applied on a clothes swapping task, using a real-world dataset of fashion images provided by Zalando. In comparison to previous methods for clothes swapping and virtual try-on, the result achieved with our method are of high visual quality and achieve precise reconstruction of the details of the garments. / Detta projekt undersöker villkorad bildgenerering från förskjutna bild-källor, med ett tillämpat exempel inom innehållsskapande för modebranschen. Problemet med rumslig förskjutning mellan bilder identifieras varpå relaterad litteratur diskuteras. Därefter introduceras olika tillvägagångssätt för att lösa problemet. Projektet fokuserar i synnerhet på ickelinjära, differentierbara morphing-moduler vilka designas och integreras i befintlig arkitektur för bild-till-bild-översättning. Den föreslagna metoden för villkorlig bildgenerering tillämpas på en uppgift för klädbyte, med hjälp av ett verklighetsbaserat dataset av modebilder från Zalando. I jämförelse med tidigare modeller för klädbyte och virtuell provning har resultaten från vår metod hög visuell kvalité och uppnår exakt återuppbyggnad av klädernas detaljer.
|
4 |
Improving Image Realism by Traversing the GAN Latent SpaceWen, Jeffrey 25 July 2022 (has links)
No description available.
|
5 |
Low-Cost Design of a 3D Stereo Synthesizer Using Depth-Image-Based RenderingCheng, Ching-Wen 01 September 2011 (has links)
In this thesis, we proposed a low cost stereoscopic image generation hardware using Depth Image Based Rendering (DIBR) method. Due to the unfavorable artifacts produced by the DIBR algorithm, researchers have developed various algorithms to handle the problem. The most common one is to smooth the depth map before rendering. However, pre-processing of the depth map usually generates other artifacts and even degrades the perception of 3D images. In order to avoid these defects, we present a method by modifying the disparity of edges to make the edges of foreground objects on the synthesized virtual images look more natural. In contrast to the high computational complexity and power consumption in previous designs, we propose a method that fills the holes with the mirrored background pixel values next to the holes. Furthermore, unlike previous DIBR methods that usually consist of two phases, image warping and hole filling, in this thesis we present a new DIBR algorithm that combines the operations of image warping and hole filling in one phase so that the total computation time and power consumption are greatly reduced. Experimental results show that the proposed design can generate more natural virtual images for different view angles with shorter computation latency.
|
6 |
Design And Implementation Of A Microprocessor Based Data Collection And Interpretation System With Onboard Graphical InterfaceGoksugur, Gokhan 01 January 2005 (has links) (PDF)
ABSTRACT
DESIGN AND IMPLEMENTATION OF A MICROPROCESSOR BASED DATA
COLLECTION AND INTERPRATATION SYSTEM WITH ONBOARD
GRAPHICAL INTERFACE
Gö / ksü / gü / r, Gö / khan
M.S., Department of Electric and Electronics Engineering
Supervisor : Prof. Dr. Hasan Cengiz Gü / ran
December 2004, 103 pages
This thesis reports the design and implementation of a microprocessor based
interface unit of a navigation system. The interface unit is composed of a TFT
display screen for graphical interface, a Controller Circuit for system control, a
keypad interface for external data entrance to the system and a power interface
circuit to provide interface between the battery of the navigation system and the
Controller Circuit. This thesis reports high speed design of the Controller Circuit and
generation of system functions. Main functions of the interface unit are
communicating with navigation computer and providing a graphical interface to the
driver of the vehicle containing the navigation system. Communication and graphical
data preparation functions are implemented through the use of a microprocessor.
Driver function of TFT display is implemented through the use of a Field
Programmable Gate Array, which is programmed using the Very High Speed IC
Description Language (VHDL).
Keywords: Navigation System, Interface Unit, Controller Circuit, Image Generation
|
7 |
Methods for Generative Adversarial Output EnhancementBrodie, Michael B. 09 December 2020 (has links)
Generative Adversarial Networks (GAN) learn to synthesize novel samples for a given data distribution. While GANs can train on diverse data of various modalities, the most successful use cases to date apply GANs to computer vision tasks. Despite significant advances in training algorithms and network architectures, GANs still struggle to consistently generate high-quality outputs after training. We present a series of papers that improve GAN output inference qualitatively and quantitatively. The first chapter, Alpha Model Domination, addresses a related subfield of Multiple Choice Learning, which -- like GANs -- aims to generate diverse sets of outputs. The next chapter, CoachGAN, introduces a real-time refinement method for the latent input space that improves inference quality for pretrained GANs. The following two chapters introduce finetuning methods for arbitrary, end-to-end differentiable GANs. The first, PuzzleGAN, proposes a self-supervised puzzle-solving task to improve global coherence in generated images. The latter, Trained Truncation Trick, improves upon a common inference heuristic by better maintaining output diversity while increasing image realism. Our final work, Two Second StyleGAN Projection, reduces the time for high-quality, image-to-latent GAN projections by two orders of magnitude. We present a wide array of results and applications of our method. We conclude with implications and directions for future work.
|
8 |
Methods for Generative Adversarial Output EnhancementBrodie, Michael B. 09 December 2020 (has links)
Generative Adversarial Networks (GAN) learn to synthesize novel samples for a given data distribution. While GANs can train on diverse data of various modalities, the most successful use cases to date apply GANs to computer vision tasks. Despite significant advances in training algorithms and network architectures, GANs still struggle to consistently generate high-quality outputs after training. We present a series of papers that improve GAN output inference qualitatively and quantitatively. The first chapter, Alpha Model Domination, addresses a related subfield of Multiple Choice Learning, which -- like GANs -- aims to generate diverse sets of outputs. The next chapter, CoachGAN, introduces a real-time refinement method for the latent input space that improves inference quality for pretrained GANs. The following two chapters introduce finetuning methods for arbitrary, end-to-end differentiable GANs. The first, PuzzleGAN, proposes a self-supervised puzzle-solving task to improve global coherence in generated images. The latter, Trained Truncation Trick, improves upon a common inference heuristic by better maintaining output diversity while increasing image realism. Our final work, Two Second StyleGAN Projection, reduces the time for high-quality, image-to-latent GAN projections by two orders of magnitude. We present a wide array of results and applications of our method. We conclude with implications and directions for future work.
|
9 |
Synthetic Image Generation Using GANs : Generating Class Specific Images of Bacterial Growth / Syntetisk bildgenerering med GANsMattila, Marianne January 2021 (has links)
Mastitis is the most common disease affecting Swedish milk cows. Automatic image classification can be useful for quickly classifying the bacteria causing this inflammation, in turn making it possible to start treatment more quickly. However, training an automatic classifier relies on the availability of data. Data collection can be a slow process, and GANs are a promising way to generate synthetic data to add plausible samples to an existing data set. The purpose of this thesis is to explore the usefulness of GANs for generating images of bacteria. This was done through researching existing literature on the subject, implementing a GAN, and evaluating the generated images. A cGAN capable of generating class-specific bacteria was implemented and improvements upon it made. The images generated by the cGAN were evaluated using visual examination, rapid scene categorization, and an expert interview regarding the generated images. While the cGAN was able to replicate certain features in the real images, it fails in crucial aspects such as symmetry and detail. It is possible that other GAN variants may be better suited to the task. Lastly, the results highlight the challenges of evaluating GANs with current evaluation methods.
|
10 |
Image Analysis For Plant PhenotypingEnyu Cai (15533216) 17 May 2023 (has links)
<p>Plant phenotyping focuses on the measurement of plant characteristics throughout the growing season, typically with the goal of evaluating genotypes for plant breeding and management practices related to nutrient applications. Estimating plant characteristics is important for finding the relationship between the plant's genetic data and observable traits, which is also related to the environment and management practices. Recent machine learning approaches provide promising capabilities for high-throughput plant phenotyping using images. In this thesis, we focus on estimating plant traits for a field-based crop using images captured by Unmanned Aerial Vehicles (UAVs). We propose a method for estimating plant centers by transferring an existing model to a new scenario using limited ground truth data. We describe the use of transfer learning using a model fine-tuned for a single field or a single type of plant on a varied set of similar crops and fields. We introduce a method for rapidly counting panicles using images acquired by UAVs. We evaluate three different deep neural network structures for panicle counting and location. We propose a method for sorghum flowering time estimation using multi-temporal panicle counting. We present an approach that uses synthetic training images from generative adversarial networks for data augmentation to enhance the performance of sorghum panicle detection and counting. We reduce the amount of training data for sorghum panicle detection via semi-supervised learning. We create synthetic sorghum and maize images using diffusion models. We propose a method for tomato plant segmentation by color correction and color space conversion. We also introduce the methods for detecting and classifying bacterial tomato wilting from images.</p>
|
Page generated in 0.1379 seconds