
Stylistic and Spatial Disentanglement in GANs

This dissertation tackles the problem of entanglement in Generative Adversarial Networks (GANs). The key insight is that disentanglement in GANs can be improved by differentiating between the content and the operations performed on that content. For example, the identity of a generated face can be thought of as the content, while the lighting conditions can be thought of as the operations. We examine disentanglement in several kinds of deep networks: image-to-image translation GANs, unconditional GANs, and sketch extraction networks.
The task in image-to-image translation GANs is to translate images from one domain to another, and disentanglement is clearly necessary in this case: the network must preserve the core content of the image while changing its stylistic appearance to match the target domain. We propose latent filter scaling to achieve multimodality and disentanglement. Whereas previous methods require complicated network architectures to enforce that disentanglement, our approach keeps the traditional GAN loss and requires only a minor change in architecture.
Unlike image-to-image GANs, unconditional GANs are generally entangled. They offer only one way of changing the generated output: resampling the input noise code. This makes it very difficult to resample only some parts of a generated image. We propose structured noise injection to achieve disentanglement in unconditional GANs, using two input codes: one to specify spatially-variable details and one to specify spatially-invariable details. Beyond allowing content and style to be changed independently, this also lets users change the content only at certain locations.
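
To make the two ideas concrete, the sketch below gives a rough, assumption-laden illustration in PyTorch; the class names, shapes, and default sizes are hypothetical and are not taken from the dissertation. The first module scales each convolutional filter by a factor predicted from the latent code (the latent-filter-scaling idea); the second builds the generator's input tensor from a grid of spatially-variable codes plus one spatially-invariable code broadcast to every grid cell (the structured-noise-injection idea).

    import torch
    import torch.nn as nn

    class LatentFilterScaledConv(nn.Module):
        """Simplified latent filter scaling: each output filter of a conv layer
        is scaled by a factor predicted from the latent code z."""
        def __init__(self, in_ch, out_ch, z_dim):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
            self.to_scales = nn.Linear(z_dim, out_ch)  # one multiplier per filter

        def forward(self, x, z):
            scales = self.to_scales(z).view(-1, self.conv.out_channels, 1, 1)
            return self.conv(x) * scales  # resampling z changes style, not content


    class StructuredNoiseInput(nn.Module):
        """Simplified structured noise injection: the generator's input tensor is
        built from a grid of spatially-variable codes plus one spatially-invariable
        code shared by all locations."""
        def __init__(self, grid=4):
            super().__init__()
            self.grid = grid

        def forward(self, local_codes, global_code):
            # local_codes: (B, local_dim, grid, grid) -- content at each location
            # global_code: (B, global_dim)            -- style shared by all locations
            b, g_dim = global_code.shape
            g = global_code.view(b, g_dim, 1, 1).expand(-1, -1, self.grid, self.grid)
            return torch.cat([local_codes, g], dim=1)

In such a setup, resampling the local codes at a subset of grid cells would change only those regions, while resampling the global code would change the overall style of the whole image.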
Combining our previous findings, we improve the performance of sketch-to-image translation networks. A crucial problem is how to correct input sketches before feeding them to the generator. By extracting sketches in an unsupervised way from only the spatially-variable branch, we are able to produce sketches that show the same content in many different styles. These sketches can then serve as a dataset for training a sketch-to-image translation GAN.
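
As a hedged sketch of how such a dataset might be assembled, the loop below pairs one content-derived sketch with several stylistic renderings of the same content, assuming a generator with the two-code interface sketched above and a hypothetical sketch_extractor that operates on the spatially-variable codes only; none of these names or shapes come from the dissertation.

    import torch

    def build_sketch_dataset(generator, sketch_extractor, n_contents=1000, n_styles=5,
                             local_dim=32, global_dim=32, grid=4):
        """Hypothetical pairing loop: one unsupervised sketch per content code,
        reused across several global style codes."""
        pairs = []
        for _ in range(n_contents):
            local = torch.randn(1, local_dim, grid, grid)  # spatially-variable (content) codes
            sketch = sketch_extractor(local)               # sketch derived from content only
            for _ in range(n_styles):
                z_global = torch.randn(1, global_dim)      # spatially-invariable (style) code
                image = generator(local, z_global)         # same content, different style
                pairs.append((sketch, image))              # training pair for sketch-to-image GAN
        return pairs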

Identifier: oai:union.ndltd.org:kaust.edu.sa/oai:repository.kaust.edu.sa:10754/670641
Date: 17 August 2021
Creators: Alharbi, Yazeed
Contributors: Wonka, Peter; Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division; Michels, Dominik; Ghanem, Bernard; Yang, Ming-Hsuan
Source Sets: King Abdullah University of Science and Technology
Language: English
Detected Language: English
Type: Dissertation