Global ETD Search

1	Capsule Networks: Framework and Application to Disentanglement for Generative Models Moghimi, Zahra 30 June 2021 (has links) Generative models are one of the most prominent components of unsupervised learning models that have a plethora of applications in various domains such as image-to-image translation, video prediction, and generating synthetic data where accessing real data is expensive, unethical, or compromising privacy. One of the main challenges in designing a generative model is creating a disentangled representation of generative factors which gives control over various characteristics of the generated data. Since the architecture of variational autoencoders is centered around latent variables and their objective function directly governs the generative factors, they are the perfect choice for creating a more disentangled representation. However, these architectures generate samples that are blurry and of lower quality compared to other state-of-the-art generative models such as generative adversarial networks. Thus, we attempt to increase the disentanglement of latent variables in variational autoencoders without compromising the generated image quality. In this thesis, a novel generative model based on capsule networks and a variational autoencoder is proposed. Motivated by the concept of capsule neural networks and their vectorized output, these structures are employed to create a disentangled representation of latent features in variational autoencoders. In particular, the proposed structure, called CapsuleVAE, utilizes a capsule encoder whose vector outputs can translate to latent variables in a meaningful way. It is shown that CapsuleVAE generates results that are sharper and more diverse based on FID score and a metric inspired by the inception score. Furthermore, two different methods for training CapsuleVAE are proposed, and the generated results are investigated. 
In the first method, an objective function with regularization is proposed, and the optimal regularization hyperparameter is derived. In the second method, called sequential optimization, a novel training technique for training CapsuleVAE is proposed and the results are compared to the first method. Moreover, a novel metric for measuring disentanglement in latent variables is introduced. Based on this metric, it is shown that the proposed CapsuleVAE creates more disentangled representations. In summary, our proposed generative model enhances the disentanglement of latent variables which contributes to the model's generalizing well to new tasks and more control over the generated data. Our model also increases the generated image quality which addresses a common disadvantage in variational autoencoders. / Master of Science / Generative models are algorithms that, given a large enough initial dataset, create data points (such as images) similar to the initial dataset from random input numbers. These algorithms have various applications in different fields, such as generating synthetic healthcare data, wireless systems data generation in extreme or rare conditions, generating high-resolution, colorful images from grey-scale photos or sketches, and in general, generating synthetic data for applications where obtaining real data is expensive, inaccessible, unethical, or compromising privacy. Some generative models create a representation for the data and divide it into several "generative factors". Researchers have shown that a better data representation is one where the generative factors are "disentangled", meaning that each generative factor is responsible for only one particular feature in the generated data. Unfortunately, creating a model with disentangled generative factors sacrifices the image quality. In this work, we design a generative model that enhances the disentanglement of generative factors without compromising the quality of the generated images.
In order to design a generative model with more disentangled generative factors, we employ capsule networks in the architecture of the generative model. Capsule networks are algorithms that classify the inputted information into different categories. We show that by using capsule networks, our designed generative model achieves higher performance in the quality of the generated images and creates a more disentangled representation of generative factors. Deep Learning Generative models Capsule Networks Disentanglement
2	Learning generative models of mid-level structure in natural images Heess, Nicolas Manfred Otto January 2012 (has links) Natural images arise from complicated processes involving many factors of variation. They reflect the wealth of shapes and appearances of objects in our three-dimensional world, but they are also affected by factors such as distortions due to perspective, occlusions, and illumination, giving rise to structure with regularities at many different levels. Prior knowledge about these regularities and suitable representations that allow efficient reasoning about the properties of a visual scene are important for many image processing and computer vision tasks. This thesis focuses on models of image structure at intermediate levels of complexity as required, for instance, for image inpainting or segmentation. It aims at developing generative, probabilistic models of this kind of structure, and, in particular, at devising strategies for learning such models in a largely unsupervised manner from data. One hallmark of natural images is that they can often be decomposed into regions with very different visual characteristics. The main approach of this thesis is therefore to represent images in terms of regions that are characterized by their shapes and appearances, and an image is then composed from many such regions. We explore approaches to learn about the appearance of regions, to learn about region shapes, and ways to combine several regions to form a full image. To achieve this goal, we make use of some ideas for unsupervised learning developed in the literature on models of low-level image structure and in the “deep learning” literature. These models are used as building blocks of more structured model formulations that incorporate additional prior knowledge of how images are formed. 
The thesis makes the following contributions: Firstly, we investigate a popular, MRF-based prior of natural image structure, the Field-of-Experts, with respect to its ability to model image textures, and propose an extended formulation that is considerably more successful at this task. This formulation gives rise to a fully parametric, translation-invariant probabilistic generative model of image textures. We illustrate how this model can be used as a component of a more comprehensive model of images comprising multiple textured regions. Secondly, we develop a model of region shape. This work is an extension of the “Masked Restricted Boltzmann Machine” proposed by Le Roux et al. (2011) and it allows explicit reasoning about the independent shapes and relative depths of occluding objects. We develop an inference and unsupervised learning scheme and demonstrate how this shape model, in combination with the masked RBM, gives rise to a good model of natural image patches. Finally, we demonstrate how this model of region shape can be extended to model shapes in large images. The result is a generative model of large images which are formed by composition from many small, partially overlapping and occluding objects. 006.3
3	Encoder-decoder neural networks Kalchbrenner, Nal January 2017 (has links) This thesis introduces the concept of an encoder-decoder neural network and develops architectures for the construction of such networks. Encoder-decoder neural networks are probabilistic conditional generative models of high-dimensional structured items such as natural language utterances and natural images. Encoder-decoder neural networks estimate a probability distribution over structured items belonging to a target set conditioned on structured items belonging to a source set. The distribution over structured items is factorized into a product of tractable conditional distributions over individual elements that compose the items. The networks estimate these conditional factors explicitly. We develop encoder-decoder neural networks for core tasks in natural language processing and natural image and video modelling. In Part I, we tackle the problem of sentence modelling and develop deep convolutional encoders to classify sentences; we extend these encoders to models of discourse. In Part II, we go beyond encoders to study the longstanding problem of translating from one human language to another. We lay the foundations of neural machine translation, a novel approach that views the entire translation process as a single encoder-decoder neural network. We propose a beam search procedure to search over the outputs of the decoder to produce a likely translation in the target language. Besides known recurrent decoders, we also propose a decoder architecture based solely on convolutional layers. Since the publication of these new foundations for machine translation in 2013, encoder-decoder translation models have been richly developed and have displaced traditional translation systems both in academic research and in large-scale industrial deployment. In services such as Google Translate these models process in the order of a billion translation queries a day. 
In Part III, we shift from the linguistic domain to the visual one to study distributions over natural images and videos. We describe two- and three-dimensional recurrent and convolutional decoder architectures and address the longstanding problem of learning a tractable distribution over high-dimensional natural images and videos, where the likely samples from the distribution are visually coherent. The empirical validation of encoder-decoder neural networks as state-of-the-art models of tasks ranging from machine translation to video prediction has a two-fold significance. On the one hand, it validates the notions of assigning probabilities to sentences or images and of learning a distribution over a natural language or a domain of natural images; it shows that a probabilistic principle of compositionality, whereby a high-dimensional item is composed from individual elements at the encoder side and whereby a corresponding item is decomposed into conditional factors over individual elements at the decoder side, is a general method for modelling cognition involving high-dimensional items; and it suggests that the relations between the elements are best learnt in an end-to-end fashion as non-linear functions in distributed space. On the other hand, the empirical success of the networks on the tasks characterizes the underlying cognitive processes themselves: a cognitive process as complex as translating from one language to another that takes a human a few seconds to perform correctly can be accurately modelled via a learnt non-linear deterministic function of distributed vectors in high-dimensional space.
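As a reading aid for the entry above, the factorization it describes can be written out in one line. This is only a sketch: the symbols s (source item), t (target item with elements t_1, ..., t_n) and the left-to-right ordering of the elements are assumptions made here for illustration and are not taken from the thesis.

```latex
p(t \mid s) \;=\; \prod_{i=1}^{n} p\left(t_i \mid t_1, \dots, t_{i-1},\, s\right)
```

Each factor on the right is one of the "tractable conditional distributions over individual elements" that the network estimates explicitly.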
4	Learning 3D Shape Representations for Reconstruction and Modeling Biao, Zhang 04 1900 (has links) Neural fields, also known as neural implicit representations, are powerful for modeling 3D shapes. They encode shapes as continuous functions mapping 3D coordinates to scalar values like the signed distance function (SDF) or occupancy probability. Neural fields represent complex shapes using an MLP. The MLP takes spatial coordinates, undergoes nonlinear transformations, and approximates the continuous function of the neural field. During training, the MLP's weights are learned through backpropagation. This PhD thesis presents novel methods for shape representation learning and generation with neural fields. The first part introduces an interpretable and high-quality reconstruction method for neural fields. A neural network predicts labeled points, improving surface visualization and interpretability. The method achieves accurate reconstruction even with rendered image input. A binary classifier, based on predicted labeled points, represents the shape's surface with precision. The second part focuses on shape generation, a challenge in generative modeling. Complex data structures like oct-trees or BSP-trees are challenging to generate with neural networks. To address this, a two-step framework is proposed: an autoencoder compresses the neural field into a fixed-size latent space, followed by training generative models within that space. Incorporating sparsity into the shape autoencoding network reduces dimensionality while maintaining high-quality shape reconstruction. Autoregressive transformer models enable the generation of complex shapes with intricate details. This research explores the potential of denoising diffusion models for 3D shape generation. The latent space efficiency is improved by further compression, leading to more efficient and effective generation of high-quality shapes. 
Remarkable shape reconstruction results are achieved, even without sparse structures. The approach combines the latest generative model advancements with novel techniques, advancing the field. It has the potential to revolutionize shape generation in gaming, manufacturing, and beyond. In summary, this PhD thesis proposes novel methods for shape representation learning, generation, and reconstruction. It contributes to the field of shape analysis and generation by enhancing interpretability, improving reconstruction quality, and pushing the boundaries of efficient and effective 3D shape generation. deep learning shape analysis generative models representation learning neural fields
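The idea of a neural field from the entry above (an MLP mapping 3D coordinates to a scalar such as a signed distance) can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not the thesis's method: the function names `sphere_sdf`, `init_mlp`, and `mlp_field`, the layer sizes, and the use of a unit sphere as the target shape are all hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def sphere_sdf(points, radius=1.0):
    """Analytic signed distance to a sphere at the origin (negative inside)."""
    return np.linalg.norm(points, axis=-1) - radius

def init_mlp(rng, sizes=(3, 32, 32, 1)):
    """Random weights for a small MLP. A real neural field would learn
    these by backpropagation; here they are untrained (assumption)."""
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]

def mlp_field(params, points):
    """Evaluate the field: 3D coordinates in, one scalar per point out."""
    h = points
    for w, b in params[:-1]:
        h = np.tanh(h @ w + b)      # nonlinear hidden layers
    w, b = params[-1]
    return (h @ w + b)[..., 0]      # linear output layer, drop last axis

rng = np.random.default_rng(0)
params = init_mlp(rng)

# Query points inside, on, and outside the unit sphere.
queries = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
target = sphere_sdf(queries)        # values a trained field would approximate
pred = mlp_field(params, queries)   # untrained output, same shape as target
```

Training would adjust `params` so that `pred` matches `target` over many sampled points; the shape's surface is then the set of points where the field evaluates to zero.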
5	Defending Against Misuse of Synthetic Media: Characterizing Real-world Challenges and Building Robust Defenses Pu, Jiameng 07 October 2022 (has links) Recent advances in deep generative models have enabled the generation of realistic synthetic media or deepfakes, including synthetic images, videos, and text. However, synthetic media can be misused for malicious purposes and damage users' trust in online content. This dissertation aims to address several key challenges in defending against the misuse of synthetic media. Key contributions of this dissertation include the following: (1) Understanding challenges with the real-world applicability of existing synthetic media defenses. We curate synthetic videos and text from the wild, i.e., the Internet community, and assess the effectiveness of state-of-the-art defenses on synthetic content in the wild. In addition, we propose practical low-cost adversarial attacks, and systematically measure the adversarial robustness of existing defenses. Our findings reveal that most defenses show significant degradation in performance under real-world detection scenarios, which leads to the second thread of my work: (2) Building detection schemes with improved generalization performance and robustness for synthetic content. Most existing synthetic image detection schemes are highly content-specific, e.g., designed for only human faces, thus limiting their applicability. I propose an unsupervised content-agnostic detection scheme called NoiseScope, which does not require a priori access to synthetic images and is applicable to a wide variety of generative models, i.e., GANs. NoiseScope is also resilient against a range of countermeasures conducted by a knowledgeable attacker. For the text modality, our study reveals that state-of-the-art defenses that mine sequential patterns in the text using Transformer models are vulnerable to simple evasion schemes. 
We conduct further exploration towards enhancing the robustness of synthetic text detection by leveraging semantic features. / Doctor of Philosophy / Recent advances in deep generative models have enabled the generation of realistic synthetic media or deepfakes, including synthetic images, videos, and text. However, synthetic media can be misused for malicious purposes and damage users' trust in online content. This dissertation aims to address several key challenges in defending against the misuse of synthetic media. Key contributions of this dissertation include the following: (1) Understanding challenges with the real-world applicability of existing synthetic media defenses. We curate synthetic videos and text from the Internet community, and assess the effectiveness of state-of-the-art defenses on the collected datasets. In addition, we systematically measure the robustness of existing defenses by designing practical low-cost attacks, such as changing the configuration of generative models. Our findings reveal that most defenses show significant degradation in performance under real-world detection scenarios, which leads to the second thread of my work: (2) Building detection schemes with improved generalization performance and robustness for synthetic content. Many existing synthetic image detection schemes make decisions by looking for anomalous patterns in a specific type of high-level content, e.g., human faces, thus limiting their applicability. I propose a blind content-agnostic detection scheme called NoiseScope, which does not require synthetic images for training, and is applicable to a wide variety of generative models. For the text modality, our study reveals that state-of-the-art defenses that mine sequential patterns in the text using Transformer models are not robust against simple attacks. We conduct further exploration towards enhancing the robustness of synthetic text detection by leveraging semantic features. 
Deepfake Datasets Deepfake Detection Synthetic Media Generative Models
6	A Statistical Model of Recreational Trails Predoehl, Andrew January 2016 (has links) We present a statistical model of recreational trails, and a method to infer trail routes from geophysical data, namely aerial imagery and terrain elevation. We learn a set of textures (textons) that characterize the imagery, and use the textons to segment each image into super-pixels. We also model each texton's probability of generating trail pixels, and the direction of such trails. From terrain elevation, we model the magnitude and direction of terrain gradient on-trail and off-trail. These models lead to a likelihood function for image and elevation. Consistent with Bayesian reasoning, we combine the likelihood with a prior model of trail length and smoothness, yielding a posterior distribution for trails, given an image. We search for good values of this posterior using both a novel stochastic variation of Dijkstra's algorithm, and an MCMC-inspired sampler. Our experiments, on trail images and groundtruth collected in the western continental USA, show substantial improvement over those of the previous best trail-finding methods. Bayesian models Computer vision Digital elevation models Generative models Image processing Computer Science Aerial imagery
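The Bayesian combination described in the entry above can be summarized in a single relation. The notation is assumed here for illustration only (T for a candidate trail, I for the aerial image, E for the terrain elevation) and does not come from the thesis.

```latex
p(T \mid I, E) \;\propto\; p(I, E \mid T)\; p(T)
```

Here p(I, E | T) stands for the likelihood built from the texton and terrain-gradient models, and p(T) for the prior on trail length and smoothness; the search procedures mentioned in the abstract look for trails T with high posterior value.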
7	The mind as a predictive modelling engine : generative models, structural similarity, and mental representation Williams, Daniel George January 2018 (has links) I outline and defend a theory of mental representation based on three ideas that I extract from the work of the mid-twentieth century philosopher, psychologist, and cybernetician Kenneth Craik: first, an account of mental representation in terms of idealised models that capitalize on structural similarity to their targets; second, an appreciation of prediction as the core function of such models; and third, a regulatory understanding of brain function. I clarify and elaborate on each of these ideas, relate them to contemporary advances in neuroscience and machine learning, and favourably contrast a predictive model-based theory of mental representation with other prominent accounts of the nature, importance, and functions of mental representations in cognitive science and philosophy.
8	Learning Statistical Features of Scene Images Lee, Wooyoung 01 September 2014 (has links) Scene perception is a fundamental aspect of vision. Humans are capable of analyzing behaviorally-relevant scene properties such as spatial layouts or scene categories very quickly, even from low resolution versions of scenes. Although humans perform these tasks effortlessly, they are very challenging for machines. Developing methods that well capture the properties of the representation used by the visual system will be useful for building computational models that are more consistent with perception. While it is common to use hand-engineered features that extract information from predefined dimensions, they require careful tuning of parameters and do not generalize well to other tasks or larger datasets. This thesis is driven by the hypothesis that the perceptual representations are adapted to the statistical properties of natural visual scenes. For developing statistical features for global-scale structures (low spatial frequency information that encompasses entire scenes), I propose to train hierarchical probabilistic models on whole scene images. I first investigate statistical clusters of scene images by training a mixture model under the assumption that each image can be decoded by sparse and independent coefficients. Each cluster discovered by the unsupervised classifier is consistent with the high-level semantic categories (such as indoor, outdoor-natural and outdoor-manmade) as well as perceptual layout properties (mean depth, openness and perspective). To address the limitation of mixture models in their assumptions of a discrete number of underlying clusters, I further investigate a continuous representation for the distributions of whole scenes. The model parameters optimized for natural visual scenes reveal a compact representation that encodes their global-scale structures. 
I develop a probabilistic similarity measure based on the model and demonstrate its consistency with the perceptual similarities. Lastly, to learn the representations that better encode the manifold structures in general high-dimensional image space, I develop the image normalization process to find a set of canonical images that anchors the probabilistic distributions around the real data manifolds. The canonical images are employed as the centers of the conditional multivariate Gaussian distributions. This approach allows learning more detailed structures of the local manifolds, resulting in improved representation of the high-level properties of scene images. Visual scene understanding visual features probabilistic models generative models adaptive representation feature learning
9	Learning Transferable Data Representations Using Deep Generative Models January 2018 (has links) abstract: Machine learning models convert raw data in the form of video, images, audio, text, etc. into feature representations that are convenient for computational processing. Deep neural networks have proven to be very efficient feature extractors for a variety of machine learning tasks. Generative models based on deep neural networks introduce constraints on the feature space to learn transferable and disentangled representations. Transferable feature representations help in training machine learning models that are robust across different distributions of data. For example, with the application of transferable features in domain adaptation, models trained on a source distribution can be applied to data from a target distribution even though the distributions may be different. In style transfer and image-to-image translation, disentangled representations allow for the separation of style and content when translating images. This thesis examines learning transferable data representations in novel deep generative models. The Semi-Supervised Adversarial Translator (SAT) utilizes adversarial methods and cross-domain weight sharing in a neural network to extract transferable representations. These transferable interpretations can then be decoded into the original image or a similar image in another domain. The Explicit Disentangling Network (EDN) utilizes generative methods to disentangle images into their core attributes and then segments sets of related attributes. The EDN can separate these attributes by controlling the flow of information using a novel combination of losses and network architecture. This separation of attributes allows precise modifications to specific components of the data representation, boosting the performance of machine learning tasks.
The effectiveness of these models is evaluated across domain adaptation, style transfer, and image-to-image translation tasks. / Dissertation/Thesis / Masters Thesis Computer Science 2018 Computer science Deep Learning Domain Adaptation Generative Models Machine Learning Transfer Learning
10	Zero Shot Learning for Visual Object Recognition with Generative Models January 2020 (has links) abstract: Visual object recognition has achieved great success with advancements in deep learning technologies. Notably, the existing recognition models have gained human-level performance on many of the recognition tasks. However, these models are data hungry, and their performance is constrained by the amount of training data. Inspired by the human ability to recognize object categories based on textual descriptions of objects and previous visual knowledge, the research community has extensively pursued the area of zero-shot learning. In this area of research, machine vision models are trained to recognize object categories that are not observed during the training process. Zero-shot learning models leverage textual information to transfer visual knowledge from seen object categories in order to recognize unseen object categories. Generative models have recently gained popularity as they synthesize unseen visual features and convert zero-shot learning into a classical supervised learning problem. These generative models are trained using seen classes and are expected to implicitly transfer the knowledge from seen to unseen classes. However, their performance is stymied by overfitting towards seen classes, which leads to substandard performance in generalized zero-shot learning. To address this concern, this dissertation proposes a novel generative model that leverages the semantic relationship between seen and unseen categories and explicitly performs knowledge transfer from seen categories to unseen categories. Experiments were conducted on several benchmark datasets to demonstrate the efficacy of the proposed model for both zero-shot learning and generalized zero-shot learning. The dissertation also provides a unique Student-Teacher based generative model for zero-shot learning and concludes with future research directions in this area. 
/ Dissertation/Thesis / Masters Thesis Computer Science 2020 Artificial intelligence GANs Generative Models Object Recognition Transfer Learning Zero-Shot Learning
