Improve the efficiency of conditional generative models

Deep generative models have advanced significantly, enabling the production of high-fidelity data across fields such as computer vision and medical imaging. When paired annotations are available, conditional generative models enable a controllable generative process by capturing the conditional distribution P(X|Y), where X represents the high-dimensional data and Y denotes the associated annotation. This controllability makes conditional generative models preferable to ordinary generative models, which only model P(X). For instance, the latest generative AI techniques within the Artificial Intelligence Generated Content (AIGC) realm have unlocked flexible image and video generation and editing driven by text descriptions, or "prompts." Generative AI has also improved model efficiency by supplementing datasets with synthesized data in scenarios where annotations are unavailable or imprecise.
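As a concrete illustration of the distinction between modeling P(X) and P(X|Y), the sketch below shows a minimal class-conditional generator in PyTorch. It is only an illustrative example: the network sizes, the label embedding, and all names (ConditionalGenerator, latent_dim, num_classes, out_dim) are assumptions, not the models studied in the thesis.

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Minimal class-conditional generator: samples x ~ P(X | Y = y).

    All sizes here (latent_dim, num_classes, out_dim) are hypothetical.
    """

    def __init__(self, latent_dim=64, num_classes=10, out_dim=784):
        super().__init__()
        # Embed the annotation Y so it can be combined with the noise z.
        self.label_embed = nn.Embedding(num_classes, latent_dim)
        self.net = nn.Sequential(
            nn.Linear(latent_dim * 2, 256),
            nn.ReLU(),
            nn.Linear(256, out_dim),
            nn.Tanh(),
        )

    def forward(self, z, y):
        # Conditioning: the output depends on both the noise z and the label y,
        # whereas an unconditional generator would depend on z alone.
        h = torch.cat([z, self.label_embed(y)], dim=1)
        return self.net(h)

# Usage: draw eight samples all conditioned on class label 3.
g = ConditionalGenerator()
z = torch.randn(8, 64)
y = torch.full((8,), 3, dtype=torch.long)
x_fake = g(z, y)  # shape (8, 784)
```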

Despite these capabilities, training and deploying conditional generative models efficiently remains challenging: 1) capturing the intricate relationship between data and annotations can introduce biases; 2) training and inference demand more computational resources than discriminative models; and 3) inference requires balancing sampling speed against generation quality.

To address these challenges, this thesis introduces four models aimed at improving the training and inference efficiency of conditional generative models without compromising quality. The first method focuses on conditional Generative Adversarial Networks (cGANs), proposing a novel training objective that improves stability and diversity in synthetic data generation. The second method is a hybrid generative model that combines GANs with diffusion-based models to alleviate the unstable training of GANs and accelerate the denoising process. The third method introduces a fine-tuning framework that reuses pre-trained diffusion parameters for high-fidelity, fast sampling and quick adaptation during training. The final method presents a super-efficient 3D diffusion model for high-fidelity 3D CT synthesis, closing the efficiency and quality gap in current models.
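For background on the diffusion components referenced above, the following sketch shows one generic conditional DDPM-style reverse (denoising) step. It is a simplified illustration under standard DDPM assumptions, not the thesis's hybrid or fine-tuning formulation; the signature model(x_t, y, t) and all other names are hypothetical.

```python
import torch

@torch.no_grad()
def ddpm_reverse_step(model, x_t, y, t, betas):
    """One generic conditional denoising step x_t -> x_{t-1}.

    Assumes `model(x_t, y, t)` predicts the noise added at integer timestep t,
    conditioned on annotation y, and `betas` is the 1-D noise schedule.
    """
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    beta_t = betas[t]
    alpha_t = alphas[t]
    alpha_bar_t = alpha_bars[t]

    # Conditional noise prediction and the DDPM posterior mean.
    eps_pred = model(x_t, y, t)
    mean = (x_t - beta_t / torch.sqrt(1.0 - alpha_bar_t) * eps_pred) / torch.sqrt(alpha_t)

    if t > 0:
        # Intermediate steps add fresh noise (variance beta_t here).
        return mean + torch.sqrt(beta_t) * torch.randn_like(x_t)
    return mean  # final step: return the mean without added noise
```

Sampling iterates this step from t = T - 1 down to 0, which is exactly the sequential cost that the accelerated approaches summarized above aim to reduce.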

Together, these methods improve the efficiency of generative models and enhance generative quality in both the computer vision and medical domains, suggesting sustainable solutions for the future of generative AI.

Identifier: oai:union.ndltd.org:bu.edu/oai:open.bu.edu:2144/49262
Date: 13 September 2024
Creators: Xu, Yanwu
Contributors: Batmanghelich, Kayhan
Source Sets: Boston University
Language: en_US
Detected Language: English
Type: Thesis/Dissertation
