
Exploring 2D and 3D Human Generation and Editing

In modern society, cameras on intelligent devices generate a huge number of natural images, including images of the human body and face. There is therefore strong social demand for more efficient image editing to serve production and everyday needs, including entertainment applications such as image beautification. In recent years, deep generative models have attracted considerable attention in the artificial intelligence field, and powerful methods such as Variational Autoencoders and Generative Adversarial Networks can generate high-resolution, realistic images, especially of faces and human bodies. In this thesis, we build on such generative models to address human image generation and editing tasks, including local eye and face generation and editing as well as global human body generation and editing. We introduce different methods to improve on previous baselines for different human regions.

1) Eye region: Gaze correction and redirection aim to manipulate the eye gaze toward a desired direction. Common previous gaze correction methods require training data annotated with precise gaze and head-pose information. To address this issue, we propose new datasets as training data and formulate gaze correction as a generative inpainting problem, addressed with two new modules.

2) Face region: Building on powerful generative models for the face region, many works learn to control the latent space to manipulate facial attributes. However, they lack precise control over 3D factors such as camera pose because they tend to ignore the underlying 3D scene rendering process. We therefore take a pre-trained 3D-aware generative model as the backbone and learn to manipulate its latent space, using attribute labels as conditional information, to achieve 3D-aware face generation and editing.

3) Human body region: 3D-aware generative models have been shown to produce realistic images of rigid and semi-rigid objects, such as facial regions. However, they usually struggle to generate high-quality images of non-rigid objects, such as the human body, which is of great interest in many computer graphics applications. We therefore introduce semantic segmentation into the model: we split the generation pipeline into two stages and use intermediate segmentation masks to bridge them. Furthermore, by using multiple latent codes, our model can control pose, semantic, and appearance codes separately to achieve human image editing.
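The two-stage decomposition described in point 3 can be sketched schematically. The code below is a minimal illustrative sketch, not the thesis implementation: the function names, latent-code dimensions, and the random stand-ins for network outputs are all assumptions. Stage 1 maps pose and semantic codes to an intermediate part-segmentation mask; Stage 2 maps that mask plus an appearance code to the final image, so each code can be varied independently for editing.

```python
import numpy as np

rng = np.random.default_rng(0)

def stage1_mask_generator(z_pose, z_semantic, size=64, n_parts=8):
    """Stand-in for the first-stage network: pose/semantic codes -> one-hot
    part-segmentation mask of shape (n_parts, H, W)."""
    logits = rng.normal(size=(n_parts, size, size))  # placeholder for network output
    labels = logits.argmax(axis=0)                   # per-pixel part label
    mask = np.eye(n_parts)[labels].transpose(2, 0, 1)
    return mask

def stage2_image_generator(mask, z_appearance, channels=3):
    """Stand-in for the second-stage network: segmentation mask plus an
    appearance code -> RGB image of shape (channels, H, W)."""
    n_parts, h, w = mask.shape
    # placeholder: each semantic part receives an appearance-driven colour
    palette = z_appearance.reshape(n_parts, channels)
    image = np.einsum('phw,pc->chw', mask, palette)
    return image

# Sampling separate codes lets us edit one factor while holding the others fixed.
z_pose = rng.normal(size=16)
z_semantic = rng.normal(size=16)
z_appearance = rng.normal(size=8 * 3)

mask = stage1_mask_generator(z_pose, z_semantic)     # intermediate representation
image = stage2_image_generator(mask, z_appearance)   # final output
```

Here the segmentation mask serves as the bridge between the stages: resampling only `z_appearance` changes colours while the layout stays fixed, mirroring the disentangled editing the abstract describes.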

Identifier: oai:union.ndltd.org:unitn.it/oai:iris.unitn.it:11572/400992
Date: 12 February 2024
Creators: Zhang, Jichao
Contributors: Zhang, Jichao; Sebe, Niculae
Publisher: Università degli studi di Trento, place: TRENTO
Source Sets: Università di Trento
Language: English
Detected Language: English
Type: info:eu-repo/semantics/doctoralThesis
Rights: info:eu-repo/semantics/openAccess
Relation: firstpage:1, lastpage:134, numberofpages:134
