Global ETD Search

Return to search

Learning to Generate Things and Stuff: Guided Generative Adversarial Networks for Generating Human Faces, Hands, Bodies, and Natural Scenes

In this thesis, we mainly focus on image generation. However, one can still observe unsatisfying results produced by existing state-of-the-art methods. To address this limitation and further improve the quality of generated images, we propose a few novel models. The image generation task can be roughly divided into three subtasks, i.e., person image generation, scene image generation, and cross-modal translation. Person image generation can be further divided into three subtasks, namely, hand gesture generation, facial expression generation, and person pose generation. Meanwhile, scene image generation can be further divided into two subtasks, i.e., cross-view image translation and semantic image synthesis. For each task, we have proposed the corresponding solution. Specifically, for hand gesture generation, we have proposed the GestureGAN framework. For facial expression generation, we have proposed the Cycle-in-Cycle GAN (C2GAN) framework. For person pose generation, we have proposed the XingGAN and BiGraphGAN frameworks. For cross-view image translation, we have proposed the SelectionGAN framework. For semantic image synthesis, we have proposed the Local and Global GAN (LGGAN), EdgeGAN, and Dual Attention GAN (DAGAN) frameworks. Although each method was originally proposed for a certain task, we later discovered that each method is universal and can be used to solve different tasks. For instance, GestureGAN can be used to solve both hand gesture generation and cross-view image translation tasks. C2GAN can be used to solve facial expression generation, person pose generation, hand gesture generation, and cross-view image translation. SelectionGAN can be used to solve cross-view image translation, facial expression generation, person pose generation, hand gesture generation, and semantic image synthesis. Moreover, we explore cross-modal translation and propose a novel DanceGAN for audio-to-video translation.

Generative Adversarial Networks (GANs)

Image Generation

Image-to-Image Translation

Person Image Generation

Scene Image Generation

Cross-Modal Translation

Identifer	oai:union.ndltd.org:unitn.it/oai:iris.unitn.it:11572/306790
Date	27 May 2021
Creators	Tang, Hao
Contributors	Tang, Hao, Sebe, Niculae
Publisher	Università degli studi di Trento, place:TRENTO
Source Sets	Università di Trento
Language	English
Detected Language	English
Type	info:eu-repo/semantics/doctoralThesis
Rights	info:eu-repo/semantics/openAccess
Relation	firstpage:1, lastpage:281, numberofpages:281

Page generated in 0.0022 seconds

Learning to Generate Things and Stuff: Guided Generative Adversarial Networks for Generating Human Faces, Hands, Bodies, and Natural Scenes

Description

Links & Downloads

Tags

Additional Fields