Semantic segmentation methods based on deep neural networks typically require large volumes of annotated data to train properly. Because collecting pixel-level annotations is expensive, the problem of semantic segmentation without ground-truth labels has recently been proposed. Many current approaches to unsupervised semantic segmentation frame the problem as a pixel clustering task and, in particular, rely heavily on color differences between image regions. In this paper, we explore a weakness of this approach: by focusing on color, these methods fail to adequately capture relationships between similar objects across images. We present a new approach to the problem and propose a novel architecture that directly captures the characteristic similarities of objects between images. We design a synthetic dataset that illustrates this flaw in an existing model, and experiments on this dataset show that our method succeeds where the pixel-color clustering approach fails. Further, we show that plain autoencoder models can implicitly capture these cross-instance object relationships. This suggests that some generative model architectures may be viable candidates for unsupervised semantic segmentation even with no additional loss terms.
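The record does not reproduce the thesis's code, but the autoencoder finding above can be illustrated with a minimal, hypothetical sketch: train a plain convolutional autoencoder with only a reconstruction loss, then cluster its per-pixel bottleneck features to obtain an unsupervised segmentation map. The class names, hyperparameters, and the k-means clustering step below are illustrative assumptions, not the thesis's actual architecture.

```python
# Minimal sketch (not the thesis's architecture): a plain convolutional
# autoencoder trained only on reconstruction, whose per-pixel bottleneck
# features are clustered with k-means into segment labels.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class PlainAutoencoder(nn.Module):
    def __init__(self, feat_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, padding=1),  # per-pixel features
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(feat_dim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, x):
        z = self.encoder(x)            # (B, feat_dim, H, W)
        return self.decoder(z), z

def segment(model, image, n_segments=4):
    """Cluster per-pixel autoencoder features into an (H, W) label map."""
    with torch.no_grad():
        _, z = model(image.unsqueeze(0))           # (1, C, H, W)
    c, h, w = z.shape[1:]
    feats = z.squeeze(0).permute(1, 2, 0).reshape(-1, c).numpy()
    labels = KMeans(n_clusters=n_segments, n_init=10).fit_predict(feats)
    return labels.reshape(h, w)

# Training uses only a reconstruction loss -- no segmentation labels.
model = PlainAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
images = torch.rand(8, 3, 64, 64)                  # stand-in for a dataset
for _ in range(5):
    recon, _ = model(images)
    loss = nn.functional.mse_loss(recon, images)
    opt.zero_grad()
    loss.backward()
    opt.step()

mask = segment(model, images[0])                   # (64, 64) integer labels
```

Because the bottleneck features are produced by convolutions over image patches rather than raw color values, pixels belonging to visually similar objects can land in the same cluster even when their colors differ, which is the cross-instance behavior the abstract describes.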
Identifier | oai:union.ndltd.org:wpi.edu/oai:digitalcommons.wpi.edu:etd-theses-2392
Date | 13 May 2020
Creators | Bishop, Griffin R.
Contributors | Jacob R. Whitehill, Advisor
Publisher | Digital WPI
Source Sets | Worcester Polytechnic Institute
Detected Language | English
Type | text
Format | application/pdf
Source | Masters Theses (All Theses, All Years)