1.
Semantically Correct High-resolution CT Image Interpolation and its Application. Li, Jiawei, 01 October 2020.
Image interpolation in the medical area is of vital importance, as most 3D biomedical volume images are sampled such that the distance between consecutive slices is significantly greater than the in-plane pixel size, owing to constraints on radiation dose or scanning time. Image interpolation creates a number of new slices between known slices in order to obtain an isotropic volume image. The results can be used for higher-quality 2D and 3D visualization or reconstruction of human body structures.
Semantic interpolation on the manifold has proven very useful for smoothing the interpolation process. Nevertheless, previous methods have focused on low-resolution image interpolation, and most of them work poorly on high-resolution images. Moreover, the medical field sets a high bar for interpolation quality: results must be semantically correct and realistic, and must resemble real data with only small errors permitted.
Typically, images are downsampled to 32² or 64² for semantic interpolation, which does not meet the resolution requirements of the medical field. Thus, we explore a novel way to generate semantically correct interpolations while maintaining the original resolution. Our method is shown to generate realistic, high-resolution interpolations at sizes of 256² and 512².
Our main contributions are threefold. First, we propose a novel network, the High-Resolution Interpolation Network (HRINet), aimed at producing semantically correct high-resolution CT image interpolations. Second, combining the ideas of ACAI and GANs, we propose an alternating supervision method that applies supervised and unsupervised training in turn, raising the accuracy and fidelity of the interpolated body structures while keeping high image quality. Third, we introduce an extra Markovian discriminator as a regularizer for texture and fine details, pushing the model to generate results indistinguishable from real data. In addition, we explore further ways to improve performance, including mixing low-level feature maps and removing batch-normalization layers within the autoencoder. Moreover, we compare MSE-based and perceptual loss functions for high-quality interpolation and show the trade-off between structural correctness and sharpness.
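As a rough illustration of the alternating supervised and adversarial training described above, the following PyTorch sketch interpolates two neighboring slices in latent space and switches between a reconstruction loss and a patch-style adversarial loss; every module, shape, and hyperparameter here is an illustrative assumption rather than the HRINet implementation.

```python
# Minimal PyTorch sketch of alternating supervised / adversarial training for slice
# interpolation. All modules, shapes, and hyperparameters are illustrative
# assumptions, not the HRINet implementation.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(1, 32, 4, 2, 1), nn.ReLU(),
                                 nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU())
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
                                 nn.ConvTranspose2d(32, 1, 4, 2, 1))
    def forward(self, z):
        return self.net(z)

encoder, decoder = Encoder(), Decoder()
critic = nn.Sequential(nn.Conv2d(1, 64, 4, 2, 1), nn.LeakyReLU(0.2),
                       nn.Conv2d(64, 1, 4, 1, 0))    # patch-style critic on local texture
opt_g = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4)
opt_d = torch.optim.Adam(critic.parameters(), lr=1e-4)
mse = nn.MSELoss()

def training_step(slice_a, slice_b, slice_mid, step):
    """Interpolate two known slices in latent space, then alternate a supervised
    reconstruction pass with an unsupervised adversarial pass."""
    z = 0.5 * (encoder(slice_a) + encoder(slice_b))   # latent-space interpolation
    fake_mid = decoder(z)
    if step % 2 == 0:                                 # supervised phase: ground truth available
        loss_g = mse(fake_mid, slice_mid)
    else:                                             # adversarial phase: match real slice statistics
        loss_d = critic(fake_mid.detach()).mean() - critic(slice_mid).mean()
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
        loss_g = -critic(fake_mid).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_g.item()

a, b, mid = (torch.rand(2, 1, 64, 64) for _ in range(3))
training_step(a, b, mid, step=0)
```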
The interpolation experiments show significant quantitative and qualitative improvement on both 256² and 512² images. We find that interpolations produced by HRINet are sharper and more realistic than those of existing methods such as AE and ACAI in terms of various metrics.
As an application of high-resolution interpolation, we perform 2D volume projection and 3D volume reconstruction from axial-view CT data and its interpolations, and show that applying HRINet greatly enhances both sharpness and fidelity. Specifically, for 2D volume projection, we explore orthogonal projection and weighted projection to show the improved effectiveness for visualizing internal and external human body structures.
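The two projection schemes mentioned above can be illustrated with a small NumPy sketch; the maximum/mean projection and the exponential depth weighting below are assumptions for illustration, not the exact formulas used in the thesis.

```python
# Hedged NumPy sketch of the two 2D volume-projection schemes mentioned above.
# The exponential depth weighting is an illustrative assumption; the exact
# weighting used in the thesis may differ.
import numpy as np

def orthogonal_projection(volume, axis=0, mode="max"):
    """Project a CT volume of shape (D, H, W) onto a plane along `axis`."""
    return volume.max(axis=axis) if mode == "max" else volume.mean(axis=axis)

def weighted_projection(volume, axis=0, decay=0.02):
    """Depth-weighted projection: slices nearer the viewer contribute more."""
    volume = np.moveaxis(volume, axis, 0)
    weights = np.exp(-decay * np.arange(volume.shape[0]))[:, None, None]
    return (volume * weights).sum(axis=0) / weights.sum()

vol = np.random.rand(128, 256, 256).astype(np.float32)   # e.g. an interpolated axial stack
surface_view = orthogonal_projection(vol, axis=1)          # external structure (max intensity)
internal_view = weighted_projection(vol, axis=1)           # internal structure with a depth cue
```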
2.
Image Transfer Between Magnetic Resonance Images and Speech Diagrams. Wang, Kang, 03 December 2020.
Realtime Magnetic Resonance Imaging (MRI) is a method used for human anatomical study. MRI provides exceptionally detailed information about soft-tissue structures, such as the tongue, that other current imaging techniques cannot achieve. However, the process requires special equipment and is expensive, so it is not suitable for all patients.
Speech diagrams show the side-view positions of organs such as the tongue, throat, and lips of a speaking or singing person. Producing a speech diagram is similar to semantic segmentation of an MRI that focuses on selected edge structures. Speech diagrams are easy to understand, giving a clear view of the tongue and the structures inside the mouth. However, producing them often requires manual annotation on the MRI machine by an expert in the field.
Using machine learning methods, we achieve image transfer between MRI and speech diagrams in both directions. We first match videos of speech diagrams and tongue MRI. We then use various image-processing and data-augmentation methods to make the paired images easier to train on. We build our network model drawing on different cross-domain image-transfer methods and apply reference-based super-resolution methods to generate high-resolution images. Thus, the transfer can be done through our network instead of manually. In addition, the generated speech diagram can serve as an intermediary to be transferred to other medical images such as computed tomography (CT), since it is simpler in structure than an MRI.
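As a hedged illustration of what training on paired MRI and speech-diagram images could look like, the following PyTorch sketch uses a pix2pix-style conditional GAN step; the architectures, loss weights, and variable names are assumptions, not the network described in the thesis.

```python
# Hedged PyTorch sketch of a paired cross-domain translation step (pix2pix-style).
# Architectures, loss weights, and names are illustrative assumptions only.
import torch
import torch.nn as nn

gen = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(64, 1, 3, padding=1), nn.Tanh())           # MRI -> speech diagram
disc = nn.Sequential(nn.Conv2d(2, 64, 4, 2, 1), nn.LeakyReLU(0.2),
                     nn.Conv2d(64, 1, 4, 1, 0))                          # judges (MRI, diagram) pairs
opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

def train_step(mri, diagram, lambda_l1=100.0):
    fake = gen(mri)
    # Discriminator: real pairs vs. generated pairs
    d_real = disc(torch.cat([mri, diagram], dim=1))
    d_fake = disc(torch.cat([mri, fake.detach()], dim=1))
    loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator: fool the discriminator and stay close to the paired ground truth
    d_fake = disc(torch.cat([mri, fake], dim=1))
    loss_g = bce(d_fake, torch.ones_like(d_fake)) + lambda_l1 * l1(fake, diagram)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()

mri_batch = torch.rand(4, 1, 64, 64)
diagram_batch = torch.rand(4, 1, 64, 64) * 2 - 1   # targets scaled to the generator's Tanh range
train_step(mri_batch, diagram_batch)
```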
We conduct experiments using data from our own database as well as other MRI video sources. We use multiple evaluation methods, and comparisons with several related approaches show the superiority of our method.
3.
A Discrete Wavelet Transform GAN for NonHomogeneous Dehazing. Fu, Minghan, January 2021.
Hazy images often suffer from color distortion, blurring, and other visible quality degradation. Some existing CNN-based methods perform well at removing homogeneous haze, but they are not robust in the non-homogeneous case, for two reasons. First, because of the complicated haze distribution, texture details are easily lost during dehazing. Second, since training pairs are hard to collect, training on limited data easily leads to over-fitting. To tackle these two issues, we introduce a novel dehazing network based on the 2D discrete wavelet transform, named DW-GAN. Specifically, we propose a two-branch network to address the aforementioned problems. By utilizing the wavelet transform in the DWT branch, our method retains more high-frequency information in the feature maps. To prevent over-fitting, an ImageNet-pretrained Res2Net is adopted in the knowledge-adaptation branch. Owing to the robust feature representations learned from ImageNet pre-training, the generalization ability of our network improves dramatically. Finally, a patch-based discriminator is used to reduce artifacts in the restored images. Extensive experimental results demonstrate that the proposed method outperforms the state of the art both quantitatively and qualitatively. / Thesis / Master of Applied Science (MASc)
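The DWT branch relies on the 2D discrete wavelet transform; a minimal single-level Haar decomposition, sketched below in PyTorch, shows how an image splits into one low-frequency band and three high-frequency detail bands. How DW-GAN wires these bands into its feature maps is not reproduced here.

```python
# Single-level 2D Haar wavelet decomposition, sketched in PyTorch, showing how an
# image splits into a low-frequency approximation and three detail sub-bands.
import torch

def haar_dwt2d(x):
    """x: (B, C, H, W) with even H and W. Returns the (LL, LH, HL, HH) sub-bands."""
    a = x[:, :, 0::2, 0::2]   # even rows, even cols
    b = x[:, :, 0::2, 1::2]   # even rows, odd cols
    c = x[:, :, 1::2, 0::2]   # odd rows,  even cols
    d = x[:, :, 1::2, 1::2]   # odd rows,  odd cols
    ll = (a + b + c + d) / 2.0   # low-frequency approximation
    lh = (a + b - c - d) / 2.0   # detail between row pairs
    hl = (a - b + c - d) / 2.0   # detail between column pairs
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, lh, hl, hh

hazy = torch.rand(1, 3, 256, 256)
ll, lh, hl, hh = haar_dwt2d(hazy)   # the three detail bands carry the high-frequency texture
```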
4.
On Depth and Complexity of Generative Adversarial Networks. Yamazaki, Hiroyuki Vincent, January 2017.
Although generative adversarial networks (GANs) have achieved state-of-the-art results in generating realistic-looking images, they are often parameterized by neural networks with relatively few learnable weights compared to those used for discriminative tasks. We argue that this is suboptimal in a generative setting, where data is often entangled in high-dimensional space and models are expected to benefit from high expressive power. Additionally, in a generative setting a model often needs to extrapolate missing information from a low-dimensional latent space when generating data samples, whereas in a typical discriminative task the model only needs to extract lower-dimensional features from high-dimensional space. We evaluate GAN architectures of varying model capacity, built with shortcut connections, in order to study the impact of capacity on training stability and sample quality. We show that while training tends to oscillate and does not benefit from the additional capacity of naively stacked layers, GANs are capable of generating samples of higher quality, specifically, for images, samples of higher visual fidelity, given proper regularization and careful balancing.
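The shortcut connections used to scale up capacity can be sketched as residual blocks inside the generator; the block sizes, depth, and output resolution below are illustrative assumptions, not the architectures evaluated in the thesis.

```python
# Minimal PyTorch sketch of shortcut (residual) connections used to add generator
# capacity. Layer sizes and depth are illustrative assumptions only.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels))
    def forward(self, x):
        return torch.relu(x + self.body(x))   # shortcut: identity plus learned residual

class Generator(nn.Module):
    def __init__(self, latent_dim=128, channels=64, depth=4):
        super().__init__()
        self.fc = nn.Linear(latent_dim, channels * 8 * 8)
        self.blocks = nn.Sequential(*[ResidualBlock(channels) for _ in range(depth)])
        self.to_img = nn.Sequential(nn.Upsample(scale_factor=4),
                                    nn.Conv2d(channels, 3, 3, padding=1), nn.Tanh())
    def forward(self, z):
        x = self.fc(z).view(z.size(0), -1, 8, 8)
        return self.to_img(self.blocks(x))

g = Generator(depth=8)                 # depth is the capacity knob under study
img = g(torch.randn(4, 128))           # -> (4, 3, 32, 32)
```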
5.
Generative adversarial network for point cloud upsampling. Widell Delgado, Edison, January 2024.
Point clouds are a widely used representation for collecting and applying 3D data, but the data gathered are often too sparse to be used reliably in applications. This thesis therefore presents a GAN-based upsampling method built on a patch-based approach together with a GCN-based feature extractor, in an attempt to enhance the density and reliability of point cloud data. Our approach is rigorously compared with existing methods. The thesis also examines how input size and input quality affect the upsampled result. The GAN is additionally applied to real-world data to assess its viability in its current state and to test how it is affected by the interference that occurs in an unsupervised scenario.
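A GCN-based feature extractor over a point-cloud patch can be sketched as an EdgeConv-style layer that builds a k-nearest-neighbour graph and aggregates edge features; the neighbourhood size, channel widths, and pooling below are assumptions for illustration, not the thesis implementation.

```python
# Hedged PyTorch sketch of a GCN-style (EdgeConv-like) feature extractor over a
# point-cloud patch. k, channel sizes, and the MLP are illustrative assumptions.
import torch
import torch.nn as nn

class EdgeConv(nn.Module):
    def __init__(self, in_dim, out_dim, k=16):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(nn.Linear(2 * in_dim, out_dim), nn.ReLU())

    def forward(self, pts):
        """pts: (B, N, C). Builds a kNN graph and aggregates per-edge features."""
        dist = torch.cdist(pts, pts)                        # (B, N, N) pairwise distances
        idx = dist.topk(self.k, largest=False).indices      # (B, N, k) nearest neighbours
        B, N, C = pts.shape
        neigh = torch.gather(pts.unsqueeze(1).expand(B, N, N, C), 2,
                             idx.unsqueeze(-1).expand(B, N, self.k, C))
        center = pts.unsqueeze(2).expand(B, N, self.k, C)
        edge = torch.cat([center, neigh - center], dim=-1)  # (B, N, k, 2C) edge features
        return self.mlp(edge).max(dim=2).values             # max-pool over the neighbourhood

patch = torch.rand(2, 256, 3)            # a patch of 256 points with xyz coordinates
features = EdgeConv(3, 64)(patch)        # (2, 256, 64) per-point features
```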
6.
Latent Walking Techniques for Conditioning GAN-Generated Music. Eisenbeiser, Logan Ryan, 21 September 2020.
Artificial music generation is a rapidly developing field focused on the complex task of creating neural networks that can produce realistic-sounding music. Generating music is very difficult: components like long- and short-term structure introduce temporal complexity that can be difficult for neural networks to capture. Additionally, the acoustics of musical features like harmonies and chords, as well as timbre and instrumentation, require complex representations for a network to generate them accurately. Various techniques for both music representation and network architecture have been used in the past decade to address these challenges in music generation.
The focus of this thesis extends beyond generating music to the challenge of controlling or conditioning that generation. Conditional generation supplies an additional piece or pieces of information to the generator that constrain aspects of the result. Conditioning can be used to specify a tempo for the generated song, increase the density of notes, or even change the genre. Latent walking is one of the most popular techniques in conditional image generation, but its effectiveness for music-domain generation is largely unexplored, especially for generative adversarial networks (GANs). This thesis focuses on latent walking techniques for conditioning the music generation network MuseGAN and examines the impact and effectiveness of this conditioning on the generated music. / Master of Science
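Latent walking itself is simple to sketch: pick two latent codes, interpolate between them (spherical interpolation is common for Gaussian latents), and decode each intermediate point. The generator call and the attribute direction in the NumPy sketch below are placeholders, not the MuseGAN interface.

```python
# Hedged NumPy sketch of latent walking between two points in a generator's latent
# space. The generator call and the attribute direction are hypothetical placeholders.
import numpy as np

def slerp(z0, z1, t):
    """Spherical interpolation, often preferred over a linear walk for Gaussian latents."""
    omega = np.arccos(np.clip(np.dot(z0 / np.linalg.norm(z0), z1 / np.linalg.norm(z1)), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return (1 - t) * z0 + t * z1
    return (np.sin((1 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

def latent_walk(z_start, z_end, steps=8):
    return [slerp(z_start, z_end, t) for t in np.linspace(0.0, 1.0, steps)]

rng = np.random.default_rng(0)
z_a, z_b = rng.standard_normal(128), rng.standard_normal(128)
path = latent_walk(z_a, z_b)                 # decode each z with the generator, e.g. musegan.generate(z)
note_density_dir = rng.standard_normal(128)  # hypothetical learned attribute direction
z_denser = z_a + 0.5 * note_density_dir      # walking along it would condition the output
```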
7.
Scenario Generation for Stress Testing Using Generative Adversarial Networks: Deep Learning Approach to Generate Extreme but Plausible Scenarios. Gustafsson, Jonas; Jonsson, Conrad, January 2023.
Central Clearing Counterparties play a crucial role in financial markets, requiring robust risk management practices to ensure operational stability. A growing emphasis on risk analysis and stress testing from regulators has led to the need for sophisticated tools that can model extreme but plausible market scenarios. This thesis presents a method leveraging Wasserstein Generative Adversarial Networks with Gradient Penalty (WGAN-GP) to construct an independent scenario generator capable of modeling and generating return distributions for financial markets. The method has two primary components: the WGAN-GP model and a novel scenario-selection strategy. The WGAN-GP model approximates the multivariate return distribution of stocks and generates plausible return scenarios. The scenario-selection strategy applies lower and upper bounds on the Euclidean distance computed from the return vector to identify and select extreme scenarios suitable for stress testing clearing members' portfolios. This approach enables the extraction of extreme yet plausible returns. The method was evaluated using 25 years of historical stock return data from the S&P 500. Results demonstrate that the WGAN-GP model effectively approximates the multivariate return distribution of several stocks, facilitating the generation of new plausible returns; however, the model requires extensive training to fully capture the tails of the distribution. The Euclidean-distance-based scenario selection shows promise in identifying extreme scenarios, with the generated scenarios having a portfolio impact comparable to that of historical scenarios. These results suggest that the proposed method offers valuable tools for Central Clearing Counterparties to enhance their risk management.
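The scenario-selection step can be sketched directly: keep only generated return vectors whose Euclidean norm falls between a lower bound (extreme enough to stress a portfolio) and an upper bound (still plausible). The bounds, asset count, and portfolio weights below are illustrative assumptions, not the thesis calibration.

```python
# Hedged NumPy sketch of Euclidean-distance-based scenario selection. Bounds,
# dimensions, and weights are illustrative assumptions only.
import numpy as np

def select_extreme_scenarios(returns, lower, upper):
    """returns: (n_scenarios, n_assets) array of generated returns."""
    norms = np.linalg.norm(returns, axis=1)
    mask = (norms >= lower) & (norms <= upper)
    return returns[mask], norms[mask]

rng = np.random.default_rng(42)
generated = rng.normal(0.0, 0.02, size=(50_000, 100))      # stand-in for WGAN-GP output
norms = np.linalg.norm(generated, axis=1)
lo = np.quantile(norms, 0.99)                               # lower bound: beyond the 99th percentile
hi = 1.5 * lo                                               # upper bound: keep scenarios plausible
stress_set, severities = select_extreme_scenarios(generated, lo, hi)
portfolio_pnl = stress_set @ np.full(100, 1.0 / 100)        # impact on an equal-weight portfolio
```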
8.
Partial Facial Re-imaging Using Generative Adversarial Networks. Desentz, Derek, 28 May 2021.
No description available.
9.
A Study on Generative Adversarial Networks Exacerbating Social Data Bias. January 2020.
Generative Adversarial Networks are designed, in theory, to replicate the distribution of the data they are trained on. With real-world limitations such as finite network capacity and training-set size, they inevitably suffer from an as-yet unavoidable technical failure: mode collapse. GAN-generated data is not nearly as diverse as the real-world data the network is trained on, and this work shows that the effect is especially drastic when the training data is highly non-uniform. Specifically, GANs learn to exacerbate the social biases that exist in the training set along sensitive axes such as gender and race. In an age where many datasets are curated from web and social media data (which are almost never balanced), this has dangerous implications for downstream tasks that use GAN-generated synthetic data, such as data augmentation for classification. This thesis presents an empirical demonstration of this phenomenon and illustrates its real-world ramifications. It starts by showing that, when asked to sample images from an illustrative dataset of engineering faculty headshots from 47 U.S. universities that is unfortunately skewed toward white males, a DCGAN’s generator “imagines” faces with light skin colors and masculine features. In addition, this work verifies that the generated distribution diverges more from the real-world distribution when the training data is non-uniform than when it is uniform. It also shows that a conditional variant of GAN is not immune to exacerbating sensitive social biases. Finally, it contributes a preliminary case study of Snapchat’s explosively popular GAN-enabled “My Twin” selfie lens, which consistently lightens the skin tone of women of color in an attempt to make faces more feminine. The results and discussion of the study are meant to caution machine learning practitioners who may unsuspectingly increase the biases in their applications. / Dissertation/Thesis / Masters Thesis Computer Science 2020
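The kind of bias measurement described above can be sketched by comparing how often a sensitive attribute appears in the training data versus in GAN samples labelled by an external classifier; the numbers below are made up for illustration only.

```python
# Hedged sketch of quantifying bias exacerbation: compare the rate of a sensitive
# attribute in the real training set against the rate in generated samples as
# labelled by an external classifier. All numbers here are hypothetical.
import numpy as np

def attribute_rate(labels):
    return float(np.mean(labels))    # fraction of samples carrying the attribute

train_labels = np.array([1] * 800 + [0] * 200)   # skewed real data: 80% majority group
gan_labels = np.array([1] * 930 + [0] * 70)      # hypothetical labels on generated samples
gap = attribute_rate(gan_labels) - attribute_rate(train_labels)
print(f"majority-group rate: real {attribute_rate(train_labels):.2f}, "
      f"generated {attribute_rate(gan_labels):.2f}, exacerbation {gap:+.2f}")
```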
10.
Inferential GANs and Deep Feature Selection with Applications. Yao Chen, 15 June 2020.
Deep neural networks (DNNs) have become popular due to their predictive power and flexibility in model fitting. In unsupervised learning, variational autoencoders (VAEs) and generative adversarial networks (GANs) are the two most popular and successful generative models, and providing a unifying framework that combines the best of VAEs and GANs in a principled way is a challenging task. In supervised learning, the demand for high-dimensional data analysis has grown significantly, especially in applications such as social networking, bioinformatics, and neuroscience. Simultaneously approximating the true underlying nonlinear system and identifying relevant features from high-dimensional data (typically with the sample size smaller than the dimension, a.k.a. small-n-large-p) is another challenging task.

In this dissertation, we provide satisfactory answers to these two challenges and illustrate several promising applications of modern machine learning methods.

In the first chapter, we introduce a novel inferential Wasserstein GAN (iWGAN) model, a principled framework that fuses autoencoders and WGANs. GANs have been impactful on many problems and applications but suffer from unstable training. The Wasserstein GAN (WGAN) leverages the Wasserstein distance to avoid the caveats of the minimax two-player training of GANs, but has other defects such as mode collapse and the lack of a metric to detect convergence. The iWGAN model jointly learns an encoder network and a generator network motivated by an iterative primal-dual optimization process. The encoder network maps observed samples to the latent space, and the generator network maps samples from the latent space to the data space. We establish a generalization error bound for iWGANs to theoretically justify their performance, and we further provide a rigorous probabilistic interpretation of the model under the framework of maximum likelihood estimation. The iWGAN, with a clear stopping criterion, has many advantages over other autoencoder GANs. Empirical experiments show that the iWGAN greatly mitigates the symptom of mode collapse, speeds up convergence, and is able to provide a quality-check measurement for each individual sample. We illustrate the ability of iWGANs by obtaining competitive and stable performance against the state of the art on benchmark datasets.

In the second chapter, we present a general framework for high-dimensional nonlinear variable selection using deep neural networks in a supervised-learning setting. The network architecture includes both a selection layer and approximation layers. The problem can be cast as sparsity-constrained optimization with a sparse parameter in the selection layer and other parameters in the approximation layers. This problem is challenging due to the sparsity constraint and the non-convex optimization. We propose a novel algorithm, called Deep Feature Selection, to estimate both the sparse parameter and the other parameters. Theoretically, we establish algorithm convergence and selection consistency when the objective function has a Generalized Stable Restricted Hessian. This result provides theoretical justification for our method and generalizes known results for high-dimensional linear variable selection. Simulations and real-data analysis are conducted to demonstrate the superior performance of our method.

In the third chapter, we develop a novel methodology to classify electrocardiograms (ECGs) as normal, atrial fibrillation, or other cardiac dysrhythmias as defined by the PhysioNet Challenge 2017. More specifically, we use piecewise linear splines for feature selection and a gradient-boosting algorithm for the classifier. The ECG waveform is fitted by a piecewise linear spline, and morphological features related to the spline coefficients are extracted. XGBoost is used to classify the morphological coefficients and heart rate variability features. The performance of the algorithm was evaluated on the PhysioNet Challenge database (3,658 ECGs classified by experts). Our algorithm achieves an average F1 score of 81% under 10-fold cross-validation and 81% on the independent test set, similar to the ninth-best score (81%) in the official phase of the PhysioNet Challenge 2017.

In the fourth chapter, we introduce a novel region-selection penalty in the framework of image-on-scalar regression that imposes sparsity of pixel values and extracts active regions simultaneously. This method helps identify regions of interest (ROIs) associated with certain diseases, which has a great impact on public health. Our penalty combines Smoothly Clipped Absolute Deviation (SCAD) regularization, enforcing sparsity, with SCAD of total variation (TV) regularization, enforcing spatial contiguity, into one group, which segments contiguous spatial regions against a zero-valued background. An efficient algorithm based on the alternating direction method of multipliers (ADMM) decomposes the non-convex problem into two iterative optimization problems with explicit solutions. Another virtue of the proposed method is a divide-and-conquer learning algorithm that allows scaling to large images. Several examples are presented, and the experimental results are compared with other state-of-the-art approaches.
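The selection-layer idea from the second chapter can be sketched as an elementwise gating vector placed in front of the approximation layers and kept sparse by a hard-thresholding projection; the projection below is an illustrative stand-in for the sparsity-constrained algorithm developed in the dissertation, and all sizes are assumptions.

```python
# Hedged PyTorch sketch of a selection layer for deep feature selection: an
# elementwise gate per input feature, trained under a sparsity constraint enforced
# here by a simple hard-threshold projection (an illustrative stand-in only).
import torch
import torch.nn as nn

class DeepFeatureSelector(nn.Module):
    def __init__(self, p, hidden=64, k_active=10):
        super().__init__()
        self.w = nn.Parameter(torch.ones(p))            # selection layer: one weight per feature
        self.k_active = k_active
        self.approx = nn.Sequential(nn.Linear(p, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x):
        return self.approx(x * self.w)                  # gate the features, then approximate

    @torch.no_grad()
    def project_to_sparse(self):
        """Keep only the k largest selection weights (hard sparsity projection)."""
        thresh = self.w.abs().topk(self.k_active).values[-1]
        self.w[self.w.abs() < thresh] = 0.0

model = DeepFeatureSelector(p=1000, k_active=10)        # small-n-large-p setting
x, y = torch.randn(50, 1000), torch.randn(50, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
    model.project_to_sparse()                           # enforce the sparsity constraint each step
selected = torch.nonzero(model.w).squeeze(1)            # indices of the selected features
```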