Global ETD Search

1	Deep Learning Approaches to Low-level Vision Problems Liu, Huan January 2022 Recent years have witnessed tremendous success in using deep learning approaches to handle low-level vision problems. Most deep learning based methods address a low-level vision problem by training a neural network to approximate the mapping from the inputs to the desired ground truths. However, directly learning this mapping is usually difficult and cannot achieve ideal performance. Moreover, the general deep learning approach cannot be used in the unsupervised learning setting. In this thesis, we investigate and address several problems in low-level vision using the proposed approaches. To learn a better mapping from the existing data, an indirect domain shift mechanism is proposed that adds explicit constraints inside the neural network for single image dehazing. This allows the neural network to be optimized across several identified neighbours, resulting in better performance. Beyond learning an improved mapping from the inputs to the targets, three problems in unsupervised learning are also investigated. For unsupervised monocular depth estimation, a teacher-student network is introduced to strategically integrate the benefits of both supervised and unsupervised learning. The teacher network is formed by learning under the binocular depth estimation setting, and the student network is constructed as the primary network for monocular depth estimation. Observing that the teacher network performs far better than the student network, a knowledge distillation approach is proposed to help improve the mapping learned by the student. For single image dehazing, the current network cannot handle different types of haze patterns because it is trained on a particular dataset. The problem is formulated as a multi-domain dehazing problem. 
To address this issue, a test-time training approach is proposed in which a helper network assists the dehazing network in adapting to a particular domain using self-supervision. In a lossy compression system, the target distribution can differ from that of the source, and ground truths are not available for reference. Thus, the objective is to transform the source to the target under a rate constraint, which generalizes optimal transport. To address this problem, the trade-off between compression rate and minimal achievable distortion is analyzed theoretically for the cases with and without common randomness. A deep learning approach based on these theoretical results is also developed for super-resolution and denoising tasks. Extensive experiments and analyses have been conducted to demonstrate the effectiveness of the proposed deep learning based methods in handling low-level vision problems. / Thesis / Doctor of Philosophy (PhD) Low-level Vision Computer Vision Image Restoration Image Dehazing Image Denoising Image Super-resolution Test-time Adaptation Meta-Learning Stereo Matching Depth Estimation Optimal Transport
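The teacher-student idea described in the abstract above can be illustrated with a minimal sketch. The abstract does not state the actual loss functions, so everything here is an assumption made for illustration: the L1 distance between depth maps, the hypothetical names `distillation_loss` and `student_objective`, and the `weight` parameter that balances the two terms.

```python
import numpy as np

def distillation_loss(student_depth, teacher_depth):
    """Mean absolute difference between student and teacher depth maps.

    Assumption: an L1 penalty; the thesis may use a different distance.
    """
    return float(np.mean(np.abs(student_depth - teacher_depth)))

def student_objective(self_supervised_loss, student_depth, teacher_depth, weight=0.5):
    """Hypothetical total loss for the student network.

    Adds a weighted distillation term to whatever unsupervised loss the
    student already minimizes (passed in here as a plain number).
    """
    return self_supervised_loss + weight * distillation_loss(student_depth, teacher_depth)

# Toy depth maps: a batch of 2 maps, each 4x4 pixels.
teacher = np.full((2, 4, 4), 2.0)   # stand-in for the binocular teacher's output
student = np.full((2, 4, 4), 1.5)   # stand-in for the monocular student's output

total = student_objective(0.3, student, teacher, weight=0.5)
print(total)
```

In this toy setting the teacher's prediction acts as a pseudo ground truth, so the student receives a dense training signal even though no labeled depth is available.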
2	Label-Efficient Visual Understanding with Consistency Constraints Zou, Yuliang 24 May 2022 Modern deep neural networks are proficient at solving various visual recognition and understanding tasks, as long as a sufficiently large labeled dataset is available at training time. However, the progress of these visual tasks is limited by the number of manual annotations. Moreover, annotating visual data is usually time-consuming and error-prone, which makes it challenging to scale up human labeling for many visual tasks. Fortunately, it is easy to collect large-scale, diverse unlabeled visual data from the Internet, and we can effortlessly acquire a large amount of annotated synthetic visual data from game engines. In this dissertation, we explore how to utilize unlabeled data and synthetic labeled data for various visual tasks, aiming to replace or reduce the direct supervision from manual annotations. The key idea is to encourage deep neural networks to produce consistent predictions across different transformations (e.g., geometric, temporal, photometric). We organize the dissertation as follows. In Part I, we propose to use the consistency over different geometric formulations and a cycle consistency over time to tackle low-level scene geometry perception tasks in a self-supervised learning setting. In Part II, we tackle high-level semantic understanding tasks in a semi-supervised learning setting, with the constraint that different augmented views of the same visual input maintain consistent semantic information. In Part III, we tackle the cross-domain image segmentation problem. By encouraging an adaptive segmentation model to output consistent results for a diverse set of strongly-augmented synthetic data, the model learns to perform test-time adaptation on unseen target domains with a single forward pass, without model training or optimization at inference time. 
/ Doctor of Philosophy / Recently, deep learning has emerged as one of the most powerful tools for solving various visual understanding tasks. However, the development of deep learning methods is significantly limited by the amount of manually labeled data. Moreover, annotating visual data is usually time-consuming and error-prone, making the human labeling process difficult to scale. Fortunately, it is easy to collect large-scale, diverse raw visual data from the Internet (e.g., search engines, YouTube, Instagram), and we can effortlessly acquire a large amount of annotated synthetic visual data from game engines. In this dissertation, we explore how to utilize raw visual data and synthetic data for various visual tasks, aiming to replace or reduce the direct supervision from manual annotations. The key idea is to encourage deep neural networks to produce consistent predictions for the same visual input across different transformations (e.g., geometric, temporal, photometric). We organize the dissertation as follows. In Part I, we propose using the consistency over different geometric formulations and a forward-backward cycle consistency over time to tackle low-level scene geometry perception tasks, using unlabeled visual data only. In Part II, we tackle high-level semantic understanding tasks using a small amount of labeled data and a large amount of unlabeled data jointly, with the constraint that different augmented views of the same visual input maintain consistent semantic information. In Part III, we tackle the cross-domain image segmentation problem. By encouraging an adaptive segmentation model to output consistent results for a diverse set of strongly-augmented synthetic data, the model learns to perform test-time adaptation on unseen target domains. 
Label-Efficient Consistency Regularization Visual Understanding Self-Supervised Learning Semi-Supervised Learning Pseudo Labeling Test-Time Adaptation BatchNorm Calibration Cross-Domain Generalization
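The key idea in the abstract above, encouraging consistent predictions for the same input under different transformations, can be sketched as a simple consistency loss. This is only an illustrative sketch: the mean-squared-difference penalty, the hypothetical `toy_model` (global average pooling followed by a linear layer), and the two augmentations are assumptions, not the dissertation's actual models or losses.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def consistency_loss(probs_a, probs_b):
    """Mean squared difference between two sets of class probabilities."""
    return float(np.mean((probs_a - probs_b) ** 2))

def toy_model(images, weights):
    """Hypothetical stand-in for a network.

    Averages each channel over the spatial dimensions, then applies a
    linear layer and a softmax to produce class probabilities.
    """
    features = images.mean(axis=(1, 2))      # shape (N, C)
    return softmax(features @ weights)       # shape (N, num_classes)

rng = np.random.default_rng(0)
images = rng.random((4, 8, 8, 3))            # 4 unlabeled RGB images, 8x8 pixels
weights = rng.normal(size=(3, 5))            # 3 channels -> 5 classes

flipped = images[:, :, ::-1, :]              # geometric view: horizontal flip
brighter = np.clip(images * 1.2, 0.0, 1.0)   # photometric view: brightness change

base = toy_model(images, weights)
loss_flip = consistency_loss(base, toy_model(flipped, weights))
loss_photo = consistency_loss(base, toy_model(brighter, weights))
print(loss_flip, loss_photo)
```

No labels appear anywhere in the loss, which is what lets this kind of term be computed on unlabeled data. The toy model is flip-invariant by construction, so `loss_flip` is essentially zero, whereas the brightness change alters the pooled features and yields a nonzero penalty that training would try to reduce.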
