Global ETD Search

1	Learning to Generate Things and Stuff: Guided Generative Adversarial Networks for Generating Human Faces, Hands, Bodies, and Natural Scenes Tang, Hao 27 May 2021 In this thesis, we mainly focus on image generation. However, one can still observe unsatisfying results produced by existing state-of-the-art methods. To address this limitation and further improve the quality of generated images, we propose a few novel models. The image generation task can be roughly divided into three subtasks, i.e., person image generation, scene image generation, and cross-modal translation. Person image generation can be further divided into three subtasks, namely, hand gesture generation, facial expression generation, and person pose generation. Meanwhile, scene image generation can be further divided into two subtasks, i.e., cross-view image translation and semantic image synthesis. For each task, we have proposed the corresponding solution. Specifically, for hand gesture generation, we have proposed the GestureGAN framework. For facial expression generation, we have proposed the Cycle-in-Cycle GAN (C2GAN) framework. For person pose generation, we have proposed the XingGAN and BiGraphGAN frameworks. For cross-view image translation, we have proposed the SelectionGAN framework. For semantic image synthesis, we have proposed the Local and Global GAN (LGGAN), EdgeGAN, and Dual Attention GAN (DAGAN) frameworks. Although each method was originally proposed for a certain task, we later discovered that each method is universal and can be used to solve different tasks. For instance, GestureGAN can be used to solve both hand gesture generation and cross-view image translation tasks. C2GAN can be used to solve facial expression generation, person pose generation, hand gesture generation, and cross-view image translation. SelectionGAN can be used to solve cross-view image translation, facial expression generation, person pose generation, hand gesture generation, and semantic image synthesis. Moreover, we explore cross-modal translation and propose a novel DanceGAN for audio-to-video translation. Generative Adversarial Networks (GANs) Image Generation Image-to-Image Translation Person Image Generation Scene Image Generation Cross-Modal Translation
2	Towards Affective Vision and Language Haydarov, Kilichbek 30 November 2021 Developing intelligent systems that can recognize and express human affects is essential to bridge the gap between human and artificial intelligence. This thesis explores the creative and emotional frontiers of artificial intelligence. Specifically, we investigate the relation between the affective impact of visual stimuli and natural language by collecting and analyzing a new dataset called ArtEmis. Furthermore, capitalizing on this dataset, we demonstrate affective AI models that can talk emotionally about artworks and generate artworks given their affective descriptions. For the text-to-image generation task, we present HyperCGAN: a conceptually simple and general approach for text-to-image synthesis that uses hypernetworks to condition a GAN model on text. In our setting, the generator and the discriminator weights are controlled by their corresponding hypernetworks, which modulate weight parameters based on the provided text query. We explore different mechanisms to modulate the layers depending on the underlying architecture of a target network and the structure of the conditioning variable. Affective Computing Text-to-Image Generation Image Captioning
3	Morphing architectures for pose-based image generation of people in clothing / Morphing-arkitekturer för pose-baserad bildgeneration av människor i kläder Baldassarre, Federico January 2018 This project investigates the task of conditional image generation from misaligned sources, with an example application in the context of content creation for the fashion industry. The problem of spatial misalignment between images is identified, the related literature is discussed, and different approaches are introduced to address it. In particular, several non-linear differentiable morphing modules are designed and integrated into current architectures for image-to-image translation. The proposed method for conditional image generation is applied to a clothes swapping task, using a real-world dataset of fashion images provided by Zalando. In comparison to previous methods for clothes swapping and virtual try-on, the results achieved with our method are of high visual quality and precisely reconstruct the details of the garments. Deep learning image generation fashion Computer Sciences
4	Improving Image Realism by Traversing the GAN Latent Space Wen, Jeffrey 25 July 2022 No description available. Electrical Engineering Generative Adversarial Networks GAN Latent Space Image Generation
5	Domain invariance for semantically consistent image manipulation Bashkirova, Dina 07 February 2025 2024 / Image manipulation is a fundamental task in computer vision, with applications ranging from domain adaptation and data augmentation to visual content creation. At the root of the task lie two equally important goals: generating highly realistic and diverse images, and preserving the aspects of the input image not related to the desired edit. In this thesis, we explore the latter goal, answering the following questions, among others: What can be considered a semantically correct image manipulation, and how can it be evaluated? Given unpaired examples before and after the edit, can a generative model infer which aspects of the input we aim to preserve and which we want to manipulate? What are the necessary conditions that allow us to guarantee that a manipulation preserves the semantics? This thesis ties semantic consistency to the problem of disentanglement, formulating it as disentangling the domain-invariant factors of variation (aspects shared across the examples before and after manipulation), which allows a more rigorous and systematic approach to solving the task. We illustrate the advantages of disentangling the domain-invariant features for semantically consistent mappings on various image editing tasks, including general unpaired image-to-image translation, sketch-to-photo translation, and object relighting. Artificial intelligence Computer vision Image editing Image generation
6	Low-Cost Design of a 3D Stereo Synthesizer Using Depth-Image-Based Rendering Cheng, Ching-Wen 01 September 2011 In this thesis, we propose low-cost stereoscopic image generation hardware based on the Depth-Image-Based Rendering (DIBR) method. Because the DIBR algorithm produces undesirable artifacts, researchers have developed various algorithms to handle the problem. The most common one is to smooth the depth map before rendering. However, pre-processing of the depth map usually generates other artifacts and even degrades the perception of 3D images. To avoid these defects, we present a method that modifies the disparity of edges so that the edges of foreground objects in the synthesized virtual images look more natural. In contrast to the high computational complexity and power consumption of previous designs, we propose a method that fills the holes with the mirrored background pixel values next to the holes. Furthermore, unlike previous DIBR methods, which usually consist of two phases (image warping and hole filling), we present a new DIBR algorithm that combines image warping and hole filling in one phase, so that the total computation time and power consumption are greatly reduced. Experimental results show that the proposed design can generate more natural virtual images for different view angles with shorter computation latency. depth information 3D stereoscopic image generation Depth Image Based Rendering (DIBR)
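The warping and mirrored hole-filling idea described in the abstract above can be illustrated with a minimal sketch on a single scanline. This is not the thesis's hardware design or its one-phase algorithm: the function name `warp_and_fill`, the integer disparities, the "shift left by disparity" convention, and the two-step structure are all assumptions made for illustration.

```python
def warp_and_fill(colors, disparities):
    """Warp one scanline to a virtual view, then fill holes by mirroring
    the background pixels next to each hole (illustrative sketch only)."""
    width = len(colors)
    out = [None] * width       # None marks a hole in the virtual view
    out_disp = [-1] * width    # disparity of the pixel written at each target

    # Step 1: warping. Assumed convention: a pixel moves left by its disparity,
    # and a nearer pixel (larger disparity) wins when two pixels collide.
    for x, (color, disp) in enumerate(zip(colors, disparities)):
        target = x - disp
        if 0 <= target < width and disp > out_disp[target]:
            out[target], out_disp[target] = color, disp

    # Step 2: hole filling with mirrored background values.
    x = 0
    while x < width:
        if out[x] is not None:
            x += 1
            continue
        start = x
        while x < width and out[x] is None:
            x += 1
        end = x - 1                      # hole spans start..end inclusive
        left, right = start - 1, end + 1
        has_left, has_right = left >= 0, right < width
        if not has_left and not has_right:
            break                        # the whole row is a hole
        # The background side is the neighbor with the smaller disparity.
        use_right = has_right and (not has_left or out_disp[right] <= out_disp[left])
        for k in range(end - start + 1):
            if use_right:
                src, dst, edge = right + k, end - k, right
            else:
                src, dst, edge = left - k, start + k, left
            # Fall back to the edge pixel if the mirrored source is unusable.
            if not (0 <= src < width) or out[src] is None:
                src = edge
            out[dst] = out[src]
    return out


# Foreground pixels 30 and 40 (disparity 2) shift left and open a two-pixel hole,
# which is filled with the mirrored background pixels 50 and 60.
print(warp_and_fill([10, 20, 30, 40, 50, 60], [0, 0, 2, 2, 0, 0]))
```

Mirroring only copies neighboring values, so the fill step needs no arithmetic beyond index computation, which is consistent with the low-cost goal stated in the abstract.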
7	Design And Implementation Of A Microprocessor Based Data Collection And Interpretation System With Onboard Graphical Interface Goksugur, Gokhan 01 January 2005 Göksügür, Gökhan. M.S., Department of Electric and Electronics Engineering. Supervisor: Prof. Dr. Hasan Cengiz Güran. December 2004, 103 pages. This thesis reports the design and implementation of a microprocessor-based interface unit of a navigation system. The interface unit is composed of a TFT display screen for the graphical interface, a Controller Circuit for system control, a keypad interface for external data entry to the system, and a power interface circuit that provides the interface between the battery of the navigation system and the Controller Circuit. This thesis reports the high-speed design of the Controller Circuit and the generation of system functions. The main functions of the interface unit are communicating with the navigation computer and providing a graphical interface to the driver of the vehicle containing the navigation system. Communication and graphical data preparation functions are implemented through the use of a microprocessor. The driver function of the TFT display is implemented through the use of a Field Programmable Gate Array, which is programmed using the Very High Speed Integrated Circuit Hardware Description Language (VHDL). Keywords: Navigation System, Interface Unit, Controller Circuit, Image Generation TK Electronics 7800-8360
8	Methods for Generative Adversarial Output Enhancement Brodie, Michael B. 09 December 2020 Generative Adversarial Networks (GANs) learn to synthesize novel samples for a given data distribution. While GANs can train on diverse data of various modalities, the most successful use cases to date apply GANs to computer vision tasks. Despite significant advances in training algorithms and network architectures, GANs still struggle to consistently generate high-quality outputs after training. We present a series of papers that improve GAN output inference qualitatively and quantitatively. The first chapter, Alpha Model Domination, addresses a related subfield of Multiple Choice Learning, which -- like GANs -- aims to generate diverse sets of outputs. The next chapter, CoachGAN, introduces a real-time refinement method for the latent input space that improves inference quality for pretrained GANs. The following two chapters introduce finetuning methods for arbitrary, end-to-end differentiable GANs. The first, PuzzleGAN, proposes a self-supervised puzzle-solving task to improve global coherence in generated images. The latter, Trained Truncation Trick, improves upon a common inference heuristic by better maintaining output diversity while increasing image realism. Our final work, Two Second StyleGAN Projection, reduces the time for high-quality, image-to-latent GAN projections by two orders of magnitude. We present a wide array of results and applications of our methods. We conclude with implications and directions for future work. Generative Adversarial Networks image generation multiple choice learning deep learning generative modeling Physical Sciences and Mathematics
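The "common inference heuristic" that the abstract above builds on is usually called the truncation trick. A minimal sketch of the widely used interpolation form is given below. It shows only the standard heuristic, not the thesis's Trained Truncation Trick; the function name `truncate_latent`, the parameter `psi`, and the use of plain Python lists are assumptions made for illustration.

```python
def truncate_latent(latent, mean_latent, psi=0.7):
    """Pull a latent vector toward the mean latent by the factor psi.

    psi = 1.0 leaves the latent unchanged (full diversity), while
    psi = 0.0 collapses every sample onto the mean (no diversity).
    """
    return [m + psi * (v - m) for v, m in zip(latent, mean_latent)]


# Halving the distance to the mean latent [1.0, 1.0].
print(truncate_latent([3.0, -1.0], [1.0, 1.0], psi=0.5))
```

The single scalar `psi` exposes the trade-off the abstract mentions: smaller values tend to increase image realism at the cost of output diversity.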
10	Synthetic Image Generation Using GANs : Generating Class Specific Images of Bacterial Growth / Syntetisk bildgenerering med GANs Mattila, Marianne January 2021 Mastitis is the most common disease affecting Swedish dairy cows. Automatic image classification can be useful for quickly classifying the bacteria causing this inflammation, in turn making it possible to start treatment more quickly. However, training an automatic classifier relies on the availability of data. Data collection can be a slow process, and GANs are a promising way to generate synthetic data to add plausible samples to an existing data set. The purpose of this thesis is to explore the usefulness of GANs for generating images of bacteria. This was done by reviewing existing literature on the subject, implementing a GAN, and evaluating the generated images. A cGAN capable of generating class-specific bacteria was implemented, and improvements were made to it. The images generated by the cGAN were evaluated using visual examination, rapid scene categorization, and an expert interview regarding the generated images. While the cGAN was able to replicate certain features of the real images, it failed in crucial aspects such as symmetry and detail. It is possible that other GAN variants may be better suited to the task. Lastly, the results highlight the challenges of evaluating GANs with current evaluation methods. Synthetic image generation synthetic data generative model GAN GAN evaluation Computer and Information Sciences

Search results