
3D Face Reconstruction from a Front Image by Pose Extension in Latent Space

Numerous techniques exist for reconstructing a 3D face from a single image by drawing on large facial databases. However, they commonly suffer from quality issues because a single image carries no information about other viewpoints. For example, a 3D reconstruction from a single frontal view has limited realism, particularly when seen in profile. We have observed that multi-view 3D face reconstruction yields higher-quality models than single-view reconstruction. Based on this observation, we propose a novel pipeline that combines several deep-learning methods to improve the quality of reconstruction from a single frontal view.
Our method requires only a single front-view image as input, yet it generates multiple realistic facial viewpoints using several deep-learning networks. These viewpoints are then used to build a 3D face model, significantly improving its quality. Traditional image-space editing is limited in how well it can manipulate content and style while preserving high image quality. Editing in the latent space, that is, the representation after encoding or before decoding in a neural network, offers much greater control when manipulating a given photo.
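To illustrate what editing in latent space means, here is a minimal sketch that uses PCA as a stand-in linear autoencoder. This is an assumption for illustration only; the thesis uses deep networks. The encoder maps an image to a few latent coordinates, and the decoder maps them back. Shifting a single latent coordinate before decoding changes the whole image along one learned direction, instead of editing individual pixels.

```python
import numpy as np

def fit_linear_autoencoder(data: np.ndarray, k: int):
    """Fit a PCA 'autoencoder' (illustrative stand-in): mean and top-k principal axes."""
    mean = data.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:k].T  # basis has shape (d, k) with orthonormal columns

def encode(x, mean, basis):
    return basis.T @ (x - mean)  # image space -> latent space (k numbers)

def decode(z, mean, basis):
    return basis @ z + mean  # latent space -> image space

rng = np.random.default_rng(0)
images = rng.normal(size=(200, 64))  # 200 toy "images" of 64 pixels each
mean, basis = fit_linear_autoencoder(images, k=8)

x = images[0]
z = encode(x, mean, basis)
z_edit = z.copy()
z_edit[0] += 2.0  # edit: move along latent axis 0 only
x_recon = decode(z, mean, basis)
x_edit = decode(z_edit, mean, basis)
```

Because this toy decoder is linear, the edit changes the reconstruction by exactly `2.0 * basis[:, 0]`. A deep decoder is nonlinear, but the idea of moving a latent code and decoding is the same.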
Motivated by the ability of neural networks trained on extensive databases to generate 2D images, and by the observation that multi-view 3D face reconstruction outperforms single-view approaches, we propose a new pipeline built on latent-space manipulation. We first find a latent vector corresponding to the given image using Generative Adversarial Network (GAN) inversion. We then search for nearby latent vectors to synthesize images of the same face in multiple poses, with the aim of improving 3D face reconstruction.
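GAN inversion followed by a search for nearby latent vectors can be sketched as below. The sketch assumes a toy affine "generator" in place of a pretrained GAN and a hypothetical unit vector `pose_dir` in latent space. The abstract does not say which inversion method is used or how the pose direction is obtained, so both are assumptions. Inversion here is optimization-based: gradient descent minimizes the squared pixel error between G(w) and the input photo.

```python
import numpy as np

rng = np.random.default_rng(1)
LATENT_DIM, IMAGE_DIM = 8, 64

# Hypothetical stand-in for a pretrained generator: a fixed affine map G(w) = A w + b.
A = rng.normal(size=(IMAGE_DIM, LATENT_DIM))
b = rng.normal(size=IMAGE_DIM)

def generator(w):
    return A @ w + b

def invert(target, steps=500):
    """Optimization-based inversion: gradient descent on ||G(w) - target||^2."""
    w = np.zeros(LATENT_DIM)
    # A step size below 1 / sigma_max(A)^2 keeps gradient descent stable on this quadratic.
    lr = 0.5 / np.linalg.norm(A, ord=2) ** 2
    for _ in range(steps):
        residual = generator(w) - target
        w -= lr * 2.0 * (A.T @ residual)  # gradient of the squared error
    return w

w_true = rng.normal(size=LATENT_DIM)
photo = generator(w_true)  # the "input photo" in this toy setting
w_hat = invert(photo)

# Hypothetical pose direction: a unit vector in latent space (assumed, not from the thesis).
pose_dir = rng.normal(size=LATENT_DIM)
pose_dir /= np.linalg.norm(pose_dir)

# Nearby latent vectors along the pose direction, each decoded into a new "pose image".
offsets = [-2.0, -1.0, 1.0, 2.0]
nearby_latents = [w_hat + a * pose_dir for a in offsets]
pose_images = [generator(w) for w in nearby_latents]
```

With a real GAN the generator is nonlinear, so the loss is non-convex and inversion usually needs a good initialization or an encoder. The step of walking a small distance from the recovered latent code is the same, though.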
The generated images are then fed into diffusion models, another class of image-synthesis network, to generate their corresponding profile views. Diffusion models are known to produce more realistic large-angle variations of a given object than GAN models do. Next, all of these multi-view images are fed into an autoencoder, a neural network designed to predict 3D face models, to derive the 3D structure of the face. Finally, texture is combined onto the 3D face model to enhance its realism, and certain areas of the 3D shape are refined to correct any unrealistic regions.
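The abstract does not specify how texture from the different views is combined. One common heuristic, shown here as an assumption and not necessarily the thesis's method, is to blend per-vertex colors from each view. Each view's weight depends on how squarely that camera sees the surface, measured by the cosine between the vertex normal and the viewing direction, and back-facing views get zero weight. The function name and the `power` parameter are hypothetical.

```python
import numpy as np

def blend_vertex_colors(normals, view_dirs, view_colors, power=2.0, eps=1e-8):
    """Visibility-weighted blend of per-vertex colors from several views (illustrative).

    normals:     (N, 3) unit surface normals of the mesh vertices
    view_dirs:   (V, 3) unit vectors pointing from the face toward each camera
    view_colors: (V, N, 3) RGB color each view assigns to each vertex
    """
    # Cosine between every viewing direction and every vertex normal, shape (V, N).
    cos = view_dirs @ normals.T
    # Head-on views get high weight; grazing or back-facing views get little or none.
    weights = np.clip(cos, 0.0, None) ** power
    total = weights.sum(axis=0, keepdims=True)  # (1, N)
    # Weighted average over views; eps avoids division by zero for unseen vertices.
    return (weights[..., None] * view_colors).sum(axis=0) / (total.T + eps)

# Two vertices: one facing the front camera (+z), one facing a profile camera (+x).
normals = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
view_dirs = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
# The front view paints everything red; the profile view paints everything blue.
view_colors = np.array([[[1.0, 0.0, 0.0]] * 2, [[0.0, 0.0, 1.0]] * 2])
result = blend_vertex_colors(normals, view_dirs, view_colors)
```

Each vertex takes its color mostly from the view that faces it. A vertex that no view sees ends up black, so a real pipeline would need a fallback such as inpainting or a symmetry prior for such regions.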
Our experimental results validate the effectiveness and efficiency of our method in reconstructing highly accurate 3D models of human faces from a single front-view input image. The reconstructed models retain high visual fidelity to the original image without requiring a 3D database.

Identifier: oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/45481
Date: 27 September 2023
Creators: Zhang, Zhao
Contributors: Lee, Wonsook
Publisher: Université d'Ottawa / University of Ottawa
Source Sets: Université d’Ottawa
Language: English
Detected Language: English
Type: Thesis
Format: application/pdf
Rights: Attribution 4.0 International, http://creativecommons.org/licenses/by/4.0/
