BrandGAN: Unsupervised Structural Image Correction
El Katerji, Mostafa, 12 May 2021
Recently, machine learning models such as Generative Adversarial Networks (GANs) and Autoencoders have received significant attention from the research community. Researchers have found novel ways of applying these models to image manipulation, including cross-domain image-to-image transformation, upsampling, style imprinting, human facial editing, and computed tomography correction. Previous work focuses primarily on transformations in which the output inherits the same skeletal outline as the input image.
This work proposes a novel framework, called BrandGAN, that tackles image correction for hand-drawn images. What makes this problem novel is that the skeletal outline of the input image must itself be adjusted to look more like a target reference, while the key visual features that its creator included intentionally are retained.
GANs, when trained on a dataset, are capable of producing a large variety of novel images derived from a combination of visual features from the original dataset. StyleGAN is a model that iterated on the concept of GANs and was able to produce high-fidelity images such as human faces and cars. StyleGAN includes a process called projection that finds an encoding of an input image capable of producing a visually similar image. Projection in StyleGAN demonstrated the model’s ability to represent real images that were not a part of its training dataset. StyleGAN encodings are vectors that represent features of an image. Encodings can be combined to merge or manipulate features of distinct images.
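As a minimal sketch of what combining encodings can look like, the Python below treats an encoding as one small vector per generator layer and shows two common ways of merging two encodings: blending them, and taking early layers from one and later layers from the other. The per-layer layout, the function names `interpolate` and `mix_layers`, and the toy values are assumptions made for illustration; they are not taken from the thesis or from the StyleGAN codebase.

```python
from typing import List

# Assumed layout: one style vector per generator layer (hypothetical, tiny sizes).
Encoding = List[List[float]]


def interpolate(a: Encoding, b: Encoding, t: float) -> Encoding:
    """Blend two encodings element-wise; t=0 returns a, t=1 returns b."""
    if len(a) != len(b):
        raise ValueError("encodings must have the same number of layers")
    return [
        [(1.0 - t) * x + t * y for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def mix_layers(structure: Encoding, detail: Encoding, crossover: int) -> Encoding:
    """Take layers [0, crossover) from `structure` and the rest from `detail`."""
    if len(structure) != len(detail):
        raise ValueError("encodings must have the same number of layers")
    if not 0 <= crossover <= len(structure):
        raise ValueError("crossover must lie between 0 and the layer count")
    head = [list(row) for row in structure[:crossover]]
    tail = [list(row) for row in detail[crossover:]]
    return head + tail


# Toy encodings with four layers of two values each.
reference = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
drawing = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

mixed = mix_layers(reference, drawing, crossover=2)
halfway = interpolate(reference, drawing, 0.5)
```

In StyleGAN-style generators, earlier layers tend to govern coarse structure and later layers finer detail, which is why a layer-wise split is a natural way to borrow the outline from one encoding while keeping details from another. Which layers or weights BrandGAN actually uses is not stated here.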
In BrandGAN, we tackle image correction by leveraging StyleGAN’s projection and its manipulation of features in encoding vectors. We present a modified version of projection that finds an encoding of a hand-drawn image. We propose a novel GAN indexing technique, called GANdex, capable of finding encodings of novel images derived from the original dataset that share visual similarities with the input image. Finally, using vector feature manipulation, we combine the GANdex vector’s features with the input image’s projection to produce the final corrected output. Combining the vectors adjusts the imperfections of the input so that it resembles the structure of the original dataset, while retaining novel features from the raw input image.
We evaluate BrandGAN on seventy-five hand-drawn images collected in a study with fifteen participants, using both objective and subjective measures. Measured against the reference design dataset, BrandGAN reduced the Fréchet Inception Distance from 193 for the hand-drawn images to 161 for the BrandGAN outputs, and the Kernel Inception Distance from 0.048 to 0.026. A blinded experiment showed that the average participant could identify 4.33 out of 5 images as their own when presented with a visually similar control image. A survey collected opinion scores ranging from one (“strongly disagree”) to five (“strongly agree”). The average participant answered 4.32 for the retention of detail, 4.25 for the professionalism of the output, and 4.57 for preferring the BrandGAN output over their own drawing.
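For reference, the Fréchet inception distance reported above is conventionally defined as the Fréchet distance between two Gaussians fitted to Inception-network features of the two image sets. The symbols below are notation chosen here, not the thesis's: \mu_x and \Sigma_x are the feature mean and covariance of the evaluated set (hand-drawn images or BrandGAN outputs), and \mu_r and \Sigma_r are those of the reference design dataset. The abstract does not say which implementation or feature layer was used.

```latex
\mathrm{FID}(x, r) = \lVert \mu_x - \mu_r \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_x + \Sigma_r - 2\,(\Sigma_x \Sigma_r)^{1/2} \right)
```

Lower values indicate feature statistics closer to the reference set, so the drop from 193 to 161 means the corrected outputs sit closer to the reference designs than the raw drawings do.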