A number of problems in Computer Vision, Natural Language Processing, and Machine Learning produce structured outputs in a high-dimensional space, which makes searching for the globally optimal solution extremely expensive. Greedy algorithms, which trade precision for efficiency, are therefore widely used. Unfortunately, they generally lack theoretical guarantees.
In this thesis, we show that greedy algorithms are effective and efficient for finding multiple top-scoring hypotheses from structured (neural) models: 1) Entropy estimation. We aim to find deterministic samples that are representative of a Gibbs distribution via a greedy strategy. 2) Searching for a set of diverse, high-quality bounding boxes. We formulate this problem as the constrained maximization of a monotone submodular function, for which a greedy algorithm has a near-optimal guarantee. 3) Fill-in-the-blank. The goal is to generate missing words conditioned on the surrounding context and a given image. We extend Beam Search, a greedy algorithm applicable to unidirectional expansion, to bidirectional neural models in which both past and future information must be considered.
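The near-optimal guarantee invoked for the second task is the classic result that greedy selection achieves a (1 - 1/e) approximation for constrained monotone submodular maximization. Below is a minimal sketch of that greedy rule; the set-coverage objective and element names are invented for illustration and are not the thesis's actual bounding-box scoring function.

```python
def greedy_max(candidates, score, k):
    """Greedy maximization of a set function under a cardinality constraint.

    Repeatedly adds the element with the largest marginal gain. For a
    monotone submodular `score`, this simple rule is (1 - 1/e) near-optimal
    (Nemhauser, Wolsey, and Fisher, 1978).
    """
    selected = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        # Pick the element whose addition increases the score the most.
        best = max(remaining, key=lambda c: score(selected + [c]) - score(selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy monotone submodular objective: coverage of ground-set elements.
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 5}}

def coverage(chosen):
    covered = set()
    for name in chosen:
        covered |= sets[name]
    return len(covered)

print(greedy_max(sets, coverage, 2))  # → ['a', 'b'] (covers {1, 2, 3, 4})
```

Each iteration costs one pass over the remaining candidates, so the whole selection is k passes rather than an exponential search over all subsets of size k.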
We test our proposed approaches on a series of Computer Vision and Natural Language Processing benchmarks and show that they are effective and efficient. / Ph. D. / Rapid progress has been made in Computer Vision (e.g., detecting what objects appear in an image and where), Natural Language Processing (e.g., translating a sentence from English to Chinese), and Machine Learning (e.g., inference over graphical models). However, a number of problems produce structured outputs in a high-dimensional space. For example, semantic segmentation requires predicting a label (e.g., dog, cat, or person) for every super-pixel, so the search space has size L<sup>n</sup>, where L is the number of object labels and n is the number of super-pixels. Searching for the globally optimal solution is therefore often intractable. Instead, we aim to show that greedy algorithms producing reasonable (e.g., near-optimal) solutions are both effective and efficient. Three tasks are studied in this thesis: 1) Entropy estimation. We search for a finite set of semantic segmentations that are representative and diverse, so that applying an existing model to an image lets us approximate the entropy of the distribution over the output space. 2) Searching for a set of diverse bounding boxes that are most likely to contain an object. We formulate this as an optimization problem for which there exists a greedy algorithm with a theoretical guarantee. 3) Fill-in-the-blank. We generate missing words in blanks surrounded by available context. We test our proposed approaches on a series of Computer Vision and Natural Language Processing benchmarks (e.g., MS COCO and PASCAL VOC) and show that they are indeed effective and efficient.
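The fill-in-the-blank task extends Beam Search beyond left-to-right decoding. As background, here is a minimal sketch of standard unidirectional beam search; the bigram "model" and its vocabulary are invented for illustration, and the thesis's bidirectional extension is not shown.

```python
import math

def beam_search(start, expand, beam_width, steps):
    """Classic left-to-right beam search: at each step, expand every partial
    sequence and keep only the `beam_width` highest-scoring ones."""
    beam = [(0.0, [start])]  # (cumulative log-probability, sequence)
    for _ in range(steps):
        candidates = [
            (logp + tok_logp, seq + [tok])
            for logp, seq in beam
            for tok, tok_logp in expand(seq)
        ]
        # Greedy pruning: discard everything outside the top beam_width.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
    return beam


# Toy bigram model: next-token probabilities given the last token.
bigrams = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.7, "dog": 0.3},
    "a":   {"dog": 0.9, "cat": 0.1},
}
expand = lambda seq: [(t, math.log(p)) for t, p in bigrams[seq[-1]].items()]

for logp, seq in beam_search("<s>", expand, beam_width=2, steps=2):
    print(round(math.exp(logp), 2), seq)
```

Because pruning depends only on the prefix score, this procedure assumes information flows left to right; conditioning on context both before and after a blank, as in the bidirectional setting above, is exactly what breaks this assumption.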
Identifier | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/81860 |
Date | 18 January 2018 |
Creators | Sun, Qing |
Contributors | Electrical Engineering, Batra, Dhruv, Huang, Jia-Bin, Abbott, A. Lynn, Parikh, Devi, Prakash, B. Aditya, Dhillon, Harpreet Singh |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Detected Language | English |
Type | Dissertation |
Format | ETD, application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |