1. Improve the efficiency of conditional generative models. Xu, Yanwu. 13 September 2024.
Deep generative models have undergone significant advancements, enabling the production of high-fidelity data across various fields, including computer vision and medical imaging. The availability of paired annotations facilitates a controllable generative process through conditional generative models, which capture the conditional distribution P(X|Y), where X represents high-dimensional data and Y denotes the associated annotation. This controllability makes conditional generative models preferable to ordinary generative models, which only model P(X). For instance, the latest generative AI techniques within the Artificial Intelligence Generated Content (AIGC) realm have unlocked the potential for flexible image and video generation/editing based on text descriptions or “prompts.” Additionally, generative AI has enhanced model efficiency by supplementing datasets with synthesized data in scenarios where annotations are unavailable or imprecise.
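To make the conditioning concrete, the following is a minimal PyTorch-style sketch of a generator that models P(X|Y) by concatenating a label embedding with the noise vector. The architecture, dimensions, and names are illustrative assumptions only, not the models developed in the thesis.

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Toy conditional generator: samples X given an annotation Y (a class label here)."""
    def __init__(self, noise_dim=64, num_classes=10, out_dim=784):
        super().__init__()
        self.embed = nn.Embedding(num_classes, noise_dim)  # embed the condition Y
        self.net = nn.Sequential(
            nn.Linear(noise_dim * 2, 256),
            nn.ReLU(),
            nn.Linear(256, out_dim),
            nn.Tanh(),
        )

    def forward(self, z, y):
        # Conditioning: concatenate the label embedding with the noise vector,
        # so the generator defines a distribution over X for each value of Y.
        return self.net(torch.cat([z, self.embed(y)], dim=1))

# Usage: draw samples conditioned on class label 3.
g = ConditionalGenerator()
z = torch.randn(8, 64)
y = torch.full((8,), 3, dtype=torch.long)
x_fake = g(z, y)  # shape (8, 784)
```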
Despite these capabilities, challenges persist in ensuring efficient training and inference for conditional generative models. These challenges include: 1) capturing the intricate relationship between data and its annotations, which can introduce biases; 2) higher computational resource requirements during training and inference compared to discriminative models; and 3) the need to balance speed and quality at inference time.
To address these challenges, this thesis introduces four methods aimed at enhancing the training and inference efficiency of conditional generative models without compromising quality. The first method focuses on conditional Generative Adversarial Networks (cGANs), proposing a novel training objective to improve stability and diversity in synthetic data generation. The second method involves a hybrid generative model that combines GANs and diffusion-based models to alleviate the unstable training of GANs and accelerate the denoising process. The third method introduces a fine-tuning framework that utilizes pre-trained diffusion parameters for high-fidelity, fast sampling and quick adaptation during training. The final method presents a super-efficient 3D diffusion model for high-fidelity 3D CT synthesis, addressing the efficiency and quality gap in current models.
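For context, the standard cGAN minimax objective that such conditional training objectives build on is shown below; the thesis's novel objective itself is not reproduced here.

```latex
\min_G \max_D \;
\mathbb{E}_{(x,y)\sim p_{\mathrm{data}}}\!\left[\log D(x \mid y)\right]
+ \mathbb{E}_{z\sim p_z,\; y\sim p_{\mathrm{data}}}\!\left[\log\left(1 - D\!\left(G(z, y) \mid y\right)\right)\right]
```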
These methods collectively improve the efficiency of generative models and enhance generative quality in both the computer vision and medical imaging domains, suggesting sustainable solutions for the future of generative AI.
2. 3D Human Face Reconstruction and 2D Appearance Synthesis. Zhao, Yajie. 01 January 2018.
3D human face reconstruction has been an active research area for decades due to its wide applications, such as animation, recognition, and 3D-driven appearance synthesis. Although commodity depth sensors have become widely available in recent years, image-based face reconstruction remains highly valuable, as images are much easier to access and store.
In this dissertation, we first propose three image-based face reconstruction approaches according to different assumptions about the inputs.
In the first approach, face geometry is extracted from multiple key frames of a video sequence with different head poses; this approach assumes a calibrated camera.
As the first approach is limited to videos, the second approach focuses on a single input image. It also refines the geometry by adding fine-grained detail from shading cues, using a novel albedo estimation and linear optimization algorithm.
In the third approach, we further relax the constraints on the input to arbitrary in-the-wild images. The proposed approach robustly reconstructs high-quality models even under extreme expressions and large poses.
We then explore the applicability of our face reconstructions in four applications: video face beautification, personalized facial blendshape generation from image sequences, face video stylization, and video face replacement. We demonstrate the great potential of our reconstruction approaches on these real-world applications. In particular, with the recent surge of interest in VR/AR, it is increasingly common to see people wearing head-mounted displays (HMDs). However, the large occlusion of the face is a major obstacle to face-to-face communication. As a further application, we explore hardware/software solutions for synthesizing the face image in the presence of an HMD. We design two setups (experimental and mobile) that integrate two near-IR cameras and one color camera to solve this problem. With our algorithm and prototype, we achieve photo-realistic results.
We further propose a deep neural network to solve the HMD removal problem by treating it as a face inpainting problem. This approach does not require special hardware and runs in real time with satisfying results.
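As a rough illustration of the inpainting formulation (not the thesis's actual network or losses), a masked reconstruction objective might look like the following sketch, where the mask marks the region occluded by the HMD; the weighting is a hypothetical choice.

```python
import torch
import torch.nn.functional as F

def masked_inpainting_loss(pred, target, mask, hole_weight=6.0):
    """Reconstruction loss for face inpainting (illustrative sketch).

    pred, target: (B, 3, H, W) images; mask: (B, 1, H, W) with 1 inside the
    occluded (HMD) region and 0 elsewhere. The occluded region is weighted
    more heavily than the visible region (hole_weight is a hypothetical value).
    """
    hole = F.l1_loss(pred * mask, target * mask)
    valid = F.l1_loss(pred * (1 - mask), target * (1 - mask))
    return hole_weight * hole + valid
```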
3. Deep Synthetic Noise Generation for RGB-D Data Augmentation. Hammond, Patrick Douglas. 01 June 2019.
Considerable effort has been devoted to finding reliable methods of correcting noisy RGB-D images captured with unreliable depth-sensing technologies. Supervised neural networks have been shown to be capable of RGB-D image correction, but require copious amounts of carefully corrected ground-truth data to train effectively. Data collection is laborious and time-intensive, especially for large datasets, and generation of ground-truth training data tends to be subject to human error. It might be possible to train an effective method on a relatively small dataset using synthetically damaged depth data as input to the network, but this requires some understanding of the latent noise distribution of the respective camera. It is possible to augment datasets to a certain degree using naive noise generation, such as random dropout or Gaussian noise, but such methods tend to generalize poorly to real data. A superior method would imitate real camera noise to damage input depth images realistically, so that the network learns to correct the appropriate depth-noise distribution.

We propose a novel noise-generating CNN capable of producing realistic noise customized to a variety of different depth-noise distributions. In order to demonstrate the effects of synthetic augmentation, we also contribute a large novel RGB-D dataset captured with the Intel RealSense D415 and D435 depth cameras. This dataset pairs many examples of noisy depth images with automatically completed RGB-D images, which we use as a proxy for ground-truth data. We further provide an automated depth-denoising pipeline which may be used to produce proxy ground-truth data for novel datasets. We train a modified sparse-to-dense depth-completion network on splits of varying size from our dataset to determine reasonable baselines for improvement. We determine through these tests that adding more noisy depth frames to each RGB-D image in the training set has a nearly identical impact on depth-completion training as gathering more ground-truth data. We leverage these findings to produce additional synthetic noisy depth images for each RGB-D image in our baseline training sets using our noise-generating CNN. Through use of our augmentation method, it is possible to achieve greater than 50% error reduction in supervised depth-completion training, even for small datasets.
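For contrast with the learned noise-generating CNN, the naive augmentation baselines mentioned above (random dropout and Gaussian noise) can be sketched roughly as follows; the parameter values and the millimeter units are illustrative assumptions, not those used in the thesis.

```python
import numpy as np

def naive_depth_noise(depth, dropout_prob=0.05, sigma_mm=10.0, seed=None):
    """Apply naive synthetic damage to a clean depth image (values in millimeters).

    Randomly zeroes out pixels (simulating missing depth returns) and adds
    Gaussian noise everywhere else. Such naive noise tends to generalize poorly
    to real sensor noise, which motivates a learned noise-generating CNN instead.
    """
    rng = np.random.default_rng(seed)
    noisy = depth + rng.normal(0.0, sigma_mm, size=depth.shape)
    dropout = rng.random(depth.shape) < dropout_prob
    noisy[dropout] = 0.0          # 0 conventionally marks invalid/missing depth
    noisy = np.clip(noisy, 0, None)
    return noisy.astype(depth.dtype)

# Example: damage a synthetic 480x640 depth frame (flat plane at 1.5 m).
clean = np.full((480, 640), 1500.0)
damaged = naive_depth_noise(clean, seed=0)
```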