21 |
Improve the efficiency of conditional generative models. Xu, Yanwu, 13 September 2024 (has links)
Deep generative models have undergone significant advancements, enabling the production of high-fidelity data across various fields, including computer vision and medical imaging. The availability of paired annotations facilitates a controllable generative process through conditional generative models, which capture the conditional distribution P(X|Y), where X represents high-dimensional data and Y denotes the associated annotation. This controllability makes conditional generative models preferable to unconditional generative models, which model only P(X). For instance, the latest generative AI techniques within the Artificial Intelligence Generated Content (AIGC) realm have unlocked the potential for flexible image and video generation/editing based on text descriptions or "prompts." Additionally, generative AI has enhanced model efficiency by supplementing datasets with synthesized data in scenarios where annotations are unavailable or imprecise.
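The difference between modeling P(X) and P(X|Y) comes down to giving the generator access to the annotation Y. A minimal NumPy sketch of a class-conditional generator, with made-up dimensions and a single linear layer for illustration (not the thesis's architecture): the label is mapped to an embedding and concatenated with the noise vector, so the output distribution depends on Y.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions chosen for illustration only.
NOISE_DIM, NUM_CLASSES, DATA_DIM = 8, 3, 16

# A class-conditional generator G(z, y): the label y is turned into an
# embedding and concatenated with the noise z before a linear mapping,
# so the generated samples depend on y, i.e. the model captures P(X | Y).
label_embedding = rng.normal(size=(NUM_CLASSES, 4))
weights = rng.normal(size=(NOISE_DIM + 4, DATA_DIM))

def generate(y: int, n_samples: int = 1) -> np.ndarray:
    z = rng.normal(size=(n_samples, NOISE_DIM))
    cond = np.repeat(label_embedding[y][None, :], n_samples, axis=0)
    return np.tanh(np.concatenate([z, cond], axis=1) @ weights)

samples = generate(y=1, n_samples=5)
```

In a real cGAN the linear map would be a deep network and the embedding would be learned jointly with it, but the conditioning mechanism is the same.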
Despite these capabilities, challenges persist in ensuring efficient training of conditional generative models. These challenges include: 1) capturing the intricate relationship between data and its annotations, which can introduce biases; 2) higher computational resource requirements during training and inference compared to discriminative models; and 3) the need to balance speed and quality at inference time.
To address these challenges, this thesis introduces four models aimed at enhancing the training and inference efficiency of conditional generative models without compromising quality. The first method focuses on conditional Generative Adversarial Networks (cGANs), proposing a novel training objective to improve stability and diversity in synthetic data generation. The second method involves a hybrid generative model that combines GANs and Diffusion-based models to alleviate the unstable training of GANs and accelerate the denoising process. The third model introduces a fine-tuning framework that utilizes pre-trained diffusion parameters for high-fidelity, fast sampling, and quick adaptation during training. The final method presents a super-efficient 3D diffusion model for high-fidelity 3D CT synthesis, addressing the efficiency and quality gap in current models.
These methods collectively tackle the efficiency of generative models and enhance generative quality in both computer vision and medical domains, suggesting sustainable solutions for the future of generative AI.
|
22 |
3D Human Face Reconstruction and 2D Appearance Synthesis. Zhao, Yajie, 01 January 2018 (has links)
3D human face reconstruction has been an active research area for decades due to its wide range of applications, such as animation, recognition, and 3D-driven appearance synthesis. Although commodity depth sensors have become widely available in recent years, image-based face reconstruction remains highly valuable because images are much easier to acquire and store.
In this dissertation, we first propose three image-based face reconstruction approaches, each targeting a different assumption about the input.
In the first approach, face geometry is extracted from multiple key frames of a video sequence with different head poses. Under this setting, the camera must be calibrated.
As the first approach is limited to videos, our second approach focuses on a single image. It also refines the geometry, adding fine-grained detail from shading cues. For this approach, we propose a novel albedo estimation and linear optimization algorithm.
In the third approach, we further relax the constraints on the input to arbitrary in-the-wild images. The proposed approach robustly reconstructs high-quality models even under extreme expressions and large poses.
We then explore the applicability of our face reconstructions in four applications: video face beautification, generating personalized facial blendshapes from image sequences, face video stylization, and video face replacement. We demonstrate the great potential of our reconstruction approaches in these real-world settings. In particular, with the recent surge of interest in VR/AR, it is increasingly common to see people wearing head-mounted displays (HMDs). However, the large occlusion of the face is a major obstacle to face-to-face communication. In a further application, we therefore explore hardware/software solutions for synthesizing face images in the presence of HMDs. We design two setups (experimental and mobile) that integrate two near-IR cameras and one color camera to solve this problem. With our algorithm and prototype, we achieve photo-realistic results.
We further propose a deep neural network that treats HMD removal as a face inpainting problem. This approach needs no special hardware and runs in real time with satisfying results.
|
23 |
Byron's Shakespearean Imitations. Barber, Benjamin, January 2016 (has links)
Though Byron is known for his provocative denials of the importance of Shakespeare, his public derogations of the early modern playwright are in fact a pose that hides the respect he had for the playwright’s powerful poetic vision, a regard which is recorded most comprehensively in the Shakespearean references of Don Juan. Byron imitated Shakespeare by repeating and adapting the older poet’s observations on the imitative nature of desire and the structure of emulous ambition as a source of violence. His appropriations make his work part of the modern shift away from earlier European societies, wherein ritual means of mitigating desire’s potentially inimical impact on human communities were supplemented with an increased reliance on market mechanisms to defer the effects of emulation and resentment. Finding himself among the first modern celebrities, Byron deploys Shakespeare’s representations of desire to trace the processes that produced the arc of his own fame and notoriety. Drawing on his deep knowledge of Shakespeare, Byron’s poetic vision—in its observations on the contagious nature of desire—exhibits elements of Shakespeare’s own vivid depictions of imitation as a key conduit for his characters’ cupidity, ambitions, and violence. Exploring how he plays with and integrates these representations into his letters, journals, poetry, and plays, my dissertation investigates Byron’s intuitions on the nature of human desire by focusing on his engagement with one of literature’s greatest observers of human behaviour, Shakespeare.
|
24 |
Patterns of space use by marked White-fronted Geese (Anser albifrons albifrons) in Western and Central Europe, with consideration of social aspects. Kruckenberg, Helmut, 24 April 2003 (has links)
The European White-fronted Goose is the most common Arctic goose species wintering in Western Europe. Since 1998, an international colour-marking project has ringed 3,740 White-fronted Geese with individually coded neck collars that can be read in the field with binoculars or spotting scopes. In total, 25,000 observations were recorded. As a first evaluation of this long-term project, this thesis presents 17 chapters illuminating different aspects of the winter goose migration. Migration is examined on three geographic levels: the continental level (migration from the breeding to the wintering grounds), the supra-regional level (connectivity of European staging areas), and the regional level (analyses of staging numbers and migratory movements in East Frisia, the Lower Rhine area, and the Lauwersmeer), in part using grid mapping and telemetry methods. Two chapters address the social background of winter staging in wild geese. The duration of family cohesion is a key factor in how site traditions are learned. The biological value of site fidelity could be examined from the staging behaviour of marked geese at the Dollart. Wild geese use their winter quarters according to highly individual traditions and in an individual manner. They possess an "inner map" of staging areas familiar to them, which they visit in a fixed sequence or according to a particular strategy. The first results of the colour-marking project give indications of the origin and functioning of this "inner map" as the basis for individual staging traditions and their social rationale (fitness optimization). This dissertation brings together manuscripts and publications and discusses them against this background.
|
25 |
Cooperative versus Adversarial Learning: Generating Political Text. Jonsson, Jacob, January 2018 (has links)
This thesis aims to evaluate the current state of the art in unconditional text generation, comparing established models with novel approaches on the task of generating text after training on texts written by political parties in the Swedish Riksdag. First, the progression of language modeling from n-gram and statistical models to neural network models is presented. This is followed by theoretical arguments for the development of adversarial training methods, where a generator neural network tries to fool a discriminator network trained to distinguish between real and generated sentences. One method at the research frontier diverges from the adversarial idea and instead uses cooperative training, where a mediator network is trained in place of a discriminator. The mediator is used to estimate a symmetric divergence measure between the true distribution and the generator's distribution, which training seeks to minimize. A set of experiments evaluates the performance of cooperative and adversarial training, and finds that each has advantages and disadvantages: in the experiments, adversarial training increases the quality of the generated texts, while cooperative training increases their diversity. The findings are in line with theoretical expectations.
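The mediator-based objective described above can be illustrated on toy discrete distributions. Assuming an idealized, perfectly trained mediator and taking the symmetric divergence to be the Jensen-Shannon divergence (an assumption for this sketch; the abstract only says "symmetric"), the mediator recovers the balanced mixture M = (P + G)/2 and the divergence follows directly:

```python
import numpy as np

def kl(p, q):
    # KL divergence between two discrete distributions (natural log).
    return float(np.sum(p * np.log(p / q)))

def jsd_via_mediator(p_true, p_gen):
    # An ideal mediator approximates the balanced mixture M = (P + G) / 2;
    # here we use the exact mixture as a stand-in for a trained network.
    m = 0.5 * (p_true + p_gen)
    return 0.5 * kl(p_true, m) + 0.5 * kl(p_gen, m)

# Toy next-token distributions over a 4-word vocabulary.
p_data = np.array([0.4, 0.3, 0.2, 0.1])
p_model = np.array([0.25, 0.25, 0.25, 0.25])
gap = jsd_via_mediator(p_data, p_model)
```

The quantity is zero exactly when the generator matches the data distribution and is bounded above by ln 2, which is what makes it a usable training signal for the generator.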
|
26 |
Investigation of deep learning approaches for overhead imagery analysis. Gruneau, Joar, January 2018 (has links)
Analysis of overhead imagery has great potential to produce real-time data cost-effectively, which can be an important foundation for decision-making in business and politics. Every day a massive amount of new satellite imagery is produced, and to fully take advantage of these data volumes a computationally efficient pipeline is required for the analysis. This thesis proposes a pipeline which outperforms the Segment Before you Detect network [6] and different types of fast region-based convolutional neural networks [61] by a large margin in a fraction of the time. The model obtains a prediction error for counting cars of 1.67% on the Potsdam dataset and increases the vehicle-wise F1 score on the VEDAI dataset from 0.305, as reported by [61], to 0.542. This thesis also shows that it is possible to outperform the Segment Before you Detect network in less than 1% of the time on car counting and vehicle detection while also using less than half of the resolution. This makes the proposed model a viable solution for large-scale satellite imagery analysis.
|
27 |
Deep Synthetic Noise Generation for RGB-D Data Augmentation. Hammond, Patrick Douglas, 01 June 2019 (has links)
Considerable effort has been devoted to finding reliable methods of correcting noisy RGB-D images captured with unreliable depth-sensing technologies. Supervised neural networks have been shown to be capable of RGB-D image correction, but require copious amounts of carefully corrected ground-truth data to train effectively. Data collection is laborious and time-intensive, especially for large datasets, and generation of ground-truth training data tends to be subject to human error. It might be possible to train an effective method on a relatively small dataset using synthetically damaged depth data as network input, but this requires some understanding of the latent noise distribution of the respective camera. Datasets can be augmented to a certain degree using naive noise generation, such as random dropout or Gaussian noise, but such noise tends to generalize poorly to real data. A superior method would imitate real camera noise, damaging input depth images realistically so that the network learns to correct the appropriate depth-noise distribution. We propose a novel noise-generating CNN capable of producing realistic noise customized to a variety of different depth-noise distributions. To demonstrate the effects of synthetic augmentation, we also contribute a large novel RGB-D dataset captured with the Intel RealSense D415 and D435 depth cameras. This dataset pairs many examples of noisy depth images with automatically completed RGB-D images, which we use as a proxy for ground-truth data. We further provide an automated depth-denoising pipeline which may be used to produce proxy ground-truth data for novel datasets. We train a modified sparse-to-dense depth-completion network on splits of varying size from our dataset to determine reasonable baselines for improvement.
We determine through these tests that adding more noisy depth frames to each RGB-D image in the training set has a nearly identical impact on depth-completion training as gathering more ground-truth data. We leverage these findings to produce additional synthetic noisy depth images for each RGB-D image in our baseline training sets using our noise-generating CNN. Through use of our augmentation method, it is possible to achieve greater than 50% error reduction on supervised depth-completion training, even for small datasets.
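The naive augmentation baselines the abstract mentions, random dropout and additive Gaussian noise, which the learned noise-generating CNN is designed to improve upon, can be sketched as follows. The parameter values are illustrative, not taken from the thesis:

```python
import numpy as np

def naive_depth_noise(depth, dropout_p=0.05, sigma_mm=10.0, seed=None):
    """Damage a clean depth map with the two naive corruptions the text
    mentions: random pixel dropout (simulating missing depth returns,
    encoded as 0) and additive Gaussian noise. Depth is in millimetres."""
    rng = np.random.default_rng(seed)
    noisy = depth + rng.normal(0.0, sigma_mm, size=depth.shape)
    mask = rng.random(depth.shape) < dropout_p
    noisy[mask] = 0.0
    return noisy

clean = np.full((120, 160), 1500.0)      # a flat wall 1.5 m from the camera
damaged = naive_depth_noise(clean, seed=42)
```

Because this noise is spatially uniform and independent of scene content, it fails to reproduce structured artifacts such as dropout concentrated at depth edges, which is exactly why the thesis argues for a learned noise model.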
|
28 |
Learning to Generate Things and Stuff: Guided Generative Adversarial Networks for Generating Human Faces, Hands, Bodies, and Natural Scenes. Tang, Hao, 27 May 2021 (has links)
In this thesis, we mainly focus on image generation. However, one can still observe unsatisfying results produced by existing state-of-the-art methods. To address this limitation and further improve the quality of generated images, we propose a few novel models. The image generation task can be roughly divided into three subtasks, i.e., person image generation, scene image generation, and cross-modal translation. Person image generation can be further divided into three subtasks, namely, hand gesture generation, facial expression generation, and person pose generation. Meanwhile, scene image generation can be further divided into two subtasks, i.e., cross-view image translation and semantic image synthesis. For each task, we have proposed the corresponding solution. Specifically, for hand gesture generation, we have proposed the GestureGAN framework. For facial expression generation, we have proposed the Cycle-in-Cycle GAN (C2GAN) framework. For person pose generation, we have proposed the XingGAN and BiGraphGAN frameworks. For cross-view image translation, we have proposed the SelectionGAN framework. For semantic image synthesis, we have proposed the Local and Global GAN (LGGAN), EdgeGAN, and Dual Attention GAN (DAGAN) frameworks. Although each method was originally proposed for a certain task, we later discovered that each method is universal and can be used to solve different tasks. For instance, GestureGAN can be used to solve both hand gesture generation and cross-view image translation tasks. C2GAN can be used to solve facial expression generation, person pose generation, hand gesture generation, and cross-view image translation. SelectionGAN can be used to solve cross-view image translation, facial expression generation, person pose generation, hand gesture generation, and semantic image synthesis. Moreover, we explore cross-modal translation and propose a novel DanceGAN for audio-to-video translation.
|
29 |
Exploring Multi-Domain and Multi-Modal Representations for Unsupervised Image-to-Image Translation. Liu, Yahui, 20 May 2022 (has links)
Unsupervised image-to-image translation (UNIT) is a challenging task in the image manipulation field, where input images in one visual domain are mapped into another domain with desired visual patterns (also called styles). An ideal direction in this field is to build a model that can map an input image to multiple target domains and generate diverse outputs in each of them, termed multi-domain and multi-modal unsupervised image-to-image translation (MMUIT). Recent studies have shown remarkable results in UNIT but suffer from four main limitations: (1) State-of-the-art UNIT methods are either built from several two-domain mappings that must be learned independently, or they generate low-diversity results, a phenomenon also known as mode collapse. (2) Most manipulation relies on visual maps or digital labels without exploring natural language, which could be more scalable and flexible in practice. (3) In an MMUIT system, the style latent space is usually disentangled between every two image domains. While interpolations within a domain are smooth, interpolating between two randomly sampled style representations from two different domains often yields unrealistic images with artifacts. Improving the smoothness of the style latent space enables gradual interpolation between any two style representations, even across domains. (4) Training MMUIT models from scratch at high resolution is expensive. Interpreting the latent space of pre-trained unconditional GANs can achieve good image translations, especially high-quality synthesized images (e.g., at 1024x1024 resolution), yet few works explore building an MMUIT system on top of such pre-trained GANs.
In this thesis, we focus on these vital issues and propose several techniques for building better MMUIT systems. First, we build on the content-style disentanglement framework and propose fitting the style latent space with Gaussian Mixture Models (GMMs). This allows a well-trained network with a shared disentangled style latent space to model multi-domain translations; meanwhile, we can randomly sample different style representations from a Gaussian component or use a reference image for style transfer. Second, we show how the GMM-modeled style latent space can be combined with a language model (e.g., a simple LSTM network) to manipulate multiple styles through textual commands. Then, we not only propose easy-to-use constraints to improve the smoothness of the style latent space in MMUIT models, but also design a novel metric to quantitatively evaluate that smoothness. Finally, we build a new model that uses pre-trained unconditional GANs for MMUIT tasks.
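As an illustration of the first idea, style codes for all domains live in one shared latent space and each domain corresponds to one Gaussian component, so sampling a style and interpolating across domains are both simple operations in that space. The sketch below uses made-up means and a diagonal scale, not the thesis's trained parameters:

```python
import numpy as np

rng = np.random.default_rng(7)
STYLE_DIM = 8  # illustrative dimensionality

# One Gaussian component per image domain inside a single shared style
# space; means and the (diagonal) scale are invented for illustration.
domain_means = {"summer": rng.normal(0.0, 1.0, STYLE_DIM),
                "winter": rng.normal(3.0, 1.0, STYLE_DIM)}
domain_scale = 0.5

def sample_style(domain: str, n: int = 1) -> np.ndarray:
    # Draw n style codes from the domain's Gaussian component.
    mu = domain_means[domain]
    return mu + domain_scale * rng.normal(size=(n, STYLE_DIM))

def interpolate(s0, s1, t):
    # Linear interpolation between two style codes; with a smooth shared
    # latent space this path stays meaningful even across domains.
    return (1 - t) * s0 + t * s1

a = sample_style("summer")[0]
b = sample_style("winter")[0]
midpoint = interpolate(a, b, 0.5)
```

A reference image would be handled by encoding it to a style code and using that code in place of a random sample; the smoothness constraints in the thesis aim to make paths like `interpolate(a, b, t)` decode to realistic images for every t.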
|
30 |
An Evaluation of Approaches for Generative Adversarial Network Overfitting Detection. Tung Tien Vu (12091421), 20 November 2023 (has links)
Generating images from training samples addresses the challenge of imbalanced data. It provides the data needed to run machine learning algorithms for image classification, anomaly detection, and pattern recognition tasks. In medical settings, imbalanced data results in higher false-negative rates due to a lack of positive samples. Generative Adversarial Networks (GANs) have been widely adopted for image generation: they allow models to train without computing intractable probabilities while producing high-quality images. However, evaluating GANs has been challenging for researchers due to the lack of an objective function. Most studies assess the quality of generated images and the variety of classes those images cover. Overfitting to the training images, however, has received less attention. When the generated images are mere copies of the training data, GAN models overfit and do not generalize well. This study examines the ability of two popular metrics, Maximum Mean Discrepancy (MMD) and Fréchet Inception Distance (FID), to detect such overfitting. We investigate the metrics on two types of data, handwritten digits and chest x-ray images, using Analysis of Variance (ANOVA) models.
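For the MMD half of the comparison, a biased squared-MMD estimate with an RBF kernel measures how far apart two sets of feature vectors are as distributions. In the toy sketch below, low-dimensional Gaussian features stand in for the Inception-style embeddings such metrics are usually computed on; the data and parameters are illustrative, not the study's:

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    # Pairwise RBF kernel matrix k(a, b) = exp(-gamma * ||a - b||^2).
    d2 = np.sum(x**2, 1)[:, None] + np.sum(y**2, 1)[None, :] - 2.0 * x @ y.T
    return np.exp(-gamma * d2)

def mmd2(x, y, gamma=0.5):
    # Biased estimate of squared Maximum Mean Discrepancy: near zero when
    # the two sample sets come from the same distribution, large otherwise.
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())

rng = np.random.default_rng(0)
train_feats = rng.normal(0.0, 1.0, size=(200, 2))    # "training" features
fresh_feats = rng.normal(0.0, 1.0, size=(200, 2))    # same distribution
shifted_feats = rng.normal(2.0, 1.0, size=(200, 2))  # shifted distribution
same = mmd2(train_feats, fresh_feats)
shift = mmd2(train_feats, shifted_feats)
```

A distribution-level score like this is exactly what makes overfitting detection subtle: a generator that memorizes its training set still scores near zero against those same samples, so the metric must be probed carefully, as the study does.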
|