
An Investigation of Scale Factor in Deep Networks for Scene Recognition

Is there a significant difference between the design of deep networks for classifying object-centric images and for classifying scenery images? How should networks be designed to extract the most representative features for scene recognition? To answer these questions, we design studies that examine the scale and richness of image features for scenery image recognition. We propose three methods that integrate the scale factor into deep networks and reveal fundamental network design strategies. In our first attempt to integrate scale factors into the deep network, we propose a method that aggregates both the context and the multi-scale object information of scene images by constructing a multi-scale pyramid. In this design, integrating object-centric multi-scale networks achieved a performance boost of 9.8%, and integrating object- and scene-centric models improved accuracy by 5.9% compared with single scene-centric models. We also explore bringing an attention scheme into the deep network and propose the Scale Attentive Network (SANet). SANet streamlines the multi-scale scene recognition pipeline, learns comprehensive scene features at various scales and locations, addresses the inter-dependency among scales, and further assists feature re-calibration as well as the aggregation process. With ResNet-50 as the backbone, the proposed network increased Top-1 accuracy by 1.83% on the Places365-Standard dataset with only 0.12% additional parameters and 0.24% additional GFLOPs. We further bring the scale factor implicitly into network backbone design by proposing a Deep-Narrow Network and a Dilated Pooling module. The Deep-Narrow architecture increases the depth of the network while decreasing its width, and obtains a variety of receptive fields by stacking more layers. The Dilated Pooling module expands the pooling scope and makes use of multi-scale features in the pooling operation. By embedding Dilated Pooling into the Deep-Narrow Network, we obtained a Top-1 accuracy boost of 0.40% using less than half of the GFLOPs and parameters of the benchmark ResNet-50.
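The abstract does not specify how the Dilated Pooling module is built, so the following is only a minimal sketch of the general idea it names: pooling over a window whose samples are spaced apart by a dilation rate, then combining several rates into one multi-scale descriptor. The function names (`dilated_max_pool`, `multi_scale_descriptor`), the 2x2 kernel, the choice of max pooling followed by a global average, and the dilation rates (1, 2, 3) are all assumptions made for illustration, not the thesis's actual design.

```python
from typing import List, Sequence

Grid = List[List[float]]


def dilated_max_pool(x: Grid, kernel: int = 2, dilation: int = 1, stride: int = 1) -> Grid:
    """Max-pool a 2D grid; window samples are `dilation` cells apart (hypothetical helper)."""
    span = (kernel - 1) * dilation + 1  # effective extent covered by one window
    rows, cols = len(x), len(x[0])
    out: Grid = []
    for r in range(0, rows - span + 1, stride):
        out_row = []
        for c in range(0, cols - span + 1, stride):
            # Gather kernel*kernel samples spread over a span*span region.
            window = [
                x[r + i * dilation][c + j * dilation]
                for i in range(kernel)
                for j in range(kernel)
            ]
            out_row.append(max(window))
        out.append(out_row)
    return out


def global_average(x: Grid) -> float:
    """Average every cell of a 2D grid into a single value."""
    values = [v for row in x for v in row]
    return sum(values) / len(values)


def multi_scale_descriptor(x: Grid, dilations: Sequence[int] = (1, 2, 3)) -> List[float]:
    """Assumed aggregation: one globally averaged value per dilation rate."""
    return [global_average(dilated_max_pool(x, kernel=2, dilation=d)) for d in dilations]


# Toy 6x6 "feature map" whose values grow left-to-right and top-to-bottom.
feature_map = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
descriptor = multi_scale_descriptor(feature_map)
```

With a fixed 2x2 kernel, a larger dilation rate widens the region each output cell draws from without adding samples, which is one way a pooling step can see several scales at once. In a deep learning framework the same effect would typically come from a built-in pooling layer that accepts a dilation argument.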

Identifier: oai:union.ndltd.org:unt.edu/info:ark/67531/metadc1944210
Date: 05 1900
Creators: Qiao, Zhinan
Contributors: Yuan, Xiaohui, Dong, Pinliang, Guo, Xuan, Ji, Yuede
Publisher: University of North Texas
Source Sets: University of North Texas
Language: English
Detected Language: English
Type: Thesis or Dissertation
Format: Text
Rights: Public, Qiao, Zhinan, Copyright, Copyright is held by the author, unless otherwise noted. All rights Reserved.
