In this thesis, we present probabilistic approaches for two critical aspects of deep learning: unsupervised representation learning and uncertainty estimation. The first part of the thesis focuses on developing a probabilistic method for deep representation learning and an application of representation learning to multimodal text-video data. Unsupervised representation learning has proven effective for learning useful data representations that enhance performance on downstream applications. However, despite their empirical success, current representation learning methods lack a solid theoretical foundation.
To bridge this gap, we present a novel perspective on unsupervised representation learning: we argue that representation learning should maximize the effective nonlinear expressivity of a deep neural network on the data so that downstream predictors can take full advantage of its nonlinear representation power. To this end, we propose neural activation coding (NAC), which maximizes the mutual information between the activation patterns of the encoder and the data over a noisy communication channel. We show that learning a noise-robust activation code maximizes the number of distinct linear regions of ReLU encoders, hence maximizing their nonlinear expressivity. Experimental results demonstrate that NAC enhances downstream performance on linear classification and nearest neighbor retrieval on natural image datasets, and furthermore significantly improves the training of deep generative models.
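The connection between activation patterns and linear regions can be illustrated with a small numpy sketch. This is not the thesis's NAC training objective; it is a hypothetical toy that only shows how the binary activation code of a one-layer ReLU encoder identifies which linear region each input falls into, so that counting distinct codes counts the linear regions the data occupies.

```python
import numpy as np

def activation_pattern(W, b, X):
    """Binary activation code of a one-layer ReLU encoder:
    pattern[i, j] = 1 iff hidden unit j is active on input i.
    Inputs sharing a pattern lie in the same linear region."""
    return (X @ W.T + b > 0).astype(np.uint8)

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8))      # 16 hidden units, 8-dim inputs
b = rng.normal(size=16)
X = rng.normal(size=(100, 8))     # stand-in for a data batch

codes = activation_pattern(W, b, X)
# Each distinct code corresponds to a distinct linear region of the
# encoder; NAC's objective pushes this count up while keeping the
# codes decodable through channel noise.
n_regions = len({c.tobytes() for c in codes})
print(n_regions)
```

Under NAC's view, an encoder whose data occupy more distinct regions is effectively more nonlinear, which is what the noise-robust coding objective encourages.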
Next, we study an application of representation learning to multimodal text-video retrieval. We reveal that when using a pretrained representation model, many test instances are either over- or under-represented during text-video retrieval, hurting retrieval performance. To address this problem, we propose normalized contrastive learning (NCL), which utilizes the Sinkhorn-Knopp algorithm to normalize the retrieval probabilities of text and video instances, thereby significantly enhancing text-video retrieval performance.
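The normalization step can be sketched as follows. This is a minimal illustration of Sinkhorn-Knopp balancing applied to a similarity matrix, not NCL's actual training procedure; the matrix shape, temperature, and iteration count are all assumptions for the example.

```python
import numpy as np

def sinkhorn_normalize(sim, n_iters=200, temperature=0.5):
    """Balance an (n_texts, n_videos) similarity matrix so every text
    and every video receives a uniform total retrieval probability,
    preventing any instance from being over- or under-represented."""
    K = np.exp(sim / temperature)       # positive kernel matrix
    u = np.ones(K.shape[0])
    v = np.ones(K.shape[1])
    for _ in range(n_iters):            # alternate row/column scaling
        u = 1.0 / (K @ v)
        v = 1.0 / (K.T @ u)
    return np.diag(u) @ K @ np.diag(v)  # approximately doubly stochastic

rng = np.random.default_rng(0)
sim = rng.normal(size=(4, 4))           # toy cosine similarities
P = sinkhorn_normalize(sim)
print(P.sum(axis=0), P.sum(axis=1))     # all totals near 1
```

The fixed point is a doubly stochastic matrix, so each row (text) and column (video) carries equal total probability mass, which is the equalization NCL exploits.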
The second part of the thesis addresses the critical challenge of quantifying the predictive uncertainty of deep learning models, which is crucial for high-stakes machine learning applications including medical diagnosis, autonomous driving, and financial forecasting. However, uncertainty estimation for deep learning remains an open challenge, and current Bayesian approximations often output unreliable uncertainty estimates. We propose a density-based uncertainty criterion positing that a model's predictive uncertainty should be grounded in the density of its training data, so that predictive uncertainty is high for inputs that are unlikely under the training data distribution. To this end, we introduce density uncertainty layers as a general building block for density-aware deep architectures.
These layers embed the density-based uncertainty criterion directly into the model architecture and can be used as a drop-in replacement for existing neural network layers to produce reliable uncertainty estimates for deep learning models. On uncertainty estimation benchmarks, we show that the proposed method delivers more reliable uncertainty estimates and robust out-of-distribution detection performance.
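The density-based criterion itself can be illustrated with a toy model. The class below is a hypothetical stand-in, not the thesis's density uncertainty layers: it fits a single Gaussian to training features and scores inputs by their (unnormalized) negative log-density, so inputs far from the training distribution receive high uncertainty.

```python
import numpy as np

class GaussianDensityUncertainty:
    """Toy density-based uncertainty: fit a Gaussian to training
    features, then score inputs by the Mahalanobis term of their
    negative log-density under that fit."""
    def fit(self, X):
        self.mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        self.inv = np.linalg.inv(cov)
        return self

    def uncertainty(self, X):
        d = X - self.mu
        # 0.5 * (x - mu)^T Sigma^{-1} (x - mu) per input row
        return 0.5 * np.einsum('ij,jk,ik->i', d, self.inv, d)

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 4))           # in-distribution features
ood = rng.normal(loc=6.0, size=(10, 4))     # far from the training data

m = GaussianDensityUncertainty().fit(train)
in_dist = m.uncertainty(train).mean()
out_dist = m.uncertainty(ood).mean()
print(out_dist > in_dist)   # OOD inputs receive higher uncertainty
```

The thesis's layers embed this kind of density grounding directly into the network architecture rather than fitting a separate density model post hoc; the sketch only conveys the criterion that uncertainty should track training-data density.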
Identifier | oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/p28h-6109
Date | January 2024
Creators | Park, Yookoon
Source Sets | Columbia University
Language | English
Detected Language | English
Type | Theses