Generative models are among the most prominent unsupervised learning models, with applications in domains such as image-to-image translation, video prediction, and the generation of synthetic data where accessing real data is expensive, unethical, or would compromise privacy. One of the main challenges in designing a generative model is creating a disentangled representation of generative factors, which gives control over various characteristics of the generated data. Since the architecture of variational autoencoders is centered around latent variables and their objective function directly governs the generative factors, they are a natural choice for creating a more disentangled representation. However, these architectures generate samples that are blurry and of lower quality compared to other state-of-the-art generative models such as generative adversarial networks. Thus, we attempt to increase the disentanglement of latent variables in variational autoencoders without compromising the generated image quality.
In this thesis, a novel generative model based on capsule networks and a variational autoencoder is proposed. Motivated by the concept of capsule neural networks and their vectorized output, these structures are employed to create a disentangled representation of latent features in variational autoencoders. In particular, the proposed structure, called CapsuleVAE, utilizes a capsule encoder whose vector outputs can translate to latent variables in a meaningful way. It is shown that CapsuleVAE generates results that are sharper and more diverse, as measured by the FID score and a metric inspired by the inception score. Furthermore, two different methods for training CapsuleVAE are proposed, and the generated results are investigated. In the first method, an objective function with regularization is proposed, and the optimal regularization hyperparameter is derived. In the second method, called sequential optimization, a novel technique for training CapsuleVAE is proposed, and the results are compared to those of the first method. Moreover, a novel metric for measuring disentanglement in latent variables is introduced. Based on this metric, it is shown that the proposed CapsuleVAE creates more disentangled representations. In summary, our proposed generative model enhances the disentanglement of latent variables, which helps the model generalize well to new tasks and gives more control over the generated data. Our model also increases the generated image quality, which addresses a common disadvantage of variational autoencoders.
Master of Science
Generative models are algorithms that, given a large enough initial dataset, create data points (such as images) similar to the initial dataset from random input numbers. These algorithms have various applications in different fields, such as generating synthetic healthcare data, generating wireless systems data under extreme or rare conditions, generating high-resolution, colorful images from grey-scale photos or sketches, and, in general, generating synthetic data for applications where obtaining real data is expensive, inaccessible, unethical, or would compromise privacy. Some generative models create a representation for the data and divide it into several "generative factors". Researchers have shown that a better data representation is one where the generative factors are "disentangled", meaning that each generative factor is responsible for only one particular feature in the generated data. Unfortunately, creating a model with disentangled generative factors sacrifices image quality. In this work, we design a generative model that enhances the disentanglement of generative factors without compromising the quality of the generated images. In order to design a generative model with more disentangled generative factors, we employ capsule networks in the architecture of the generative model. Capsule networks are algorithms that classify input information into different categories. We show that by using capsule networks, our designed generative model achieves higher quality in the generated images and creates a more disentangled representation of generative factors.
Identifier | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/112993 |
Date | 30 June 2021 |
Creators | Moghimi, Zahra |
Contributors | Electrical Engineering, Saad, Walid, Ramakrishnan, Narendran, Kekatos, Vasileios |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Detected Language | English |
Type | Thesis |
Format | ETD, application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |