The Capsule Network (CapsNet) was introduced in 2017 as a new generation of image classifiers for supervised image classification. It introduces a new neuron structure called a capsule: a vector of neurons that serves as the basic computational unit of CapsNet. CapsNet obtained state-of-the-art test accuracy on the MNIST handwritten digit recognition dataset. Unlike CNNs, CapsNet builds its representation from the part-to-whole relationships between features at different levels, which gives it a more robust representation of the input image. Compared to CNNs, CapsNet provides relatively high accuracy when classifying images to which affine transformations have been applied, as well as images containing overlapping categories. Despite these fundamental advantages over CNNs, CapsNet has its own shortcomings. It achieves reasonable inference accuracy only on small-scale datasets, and it supports only a limited number of categories in the classification task. Finally, CapsNet is relatively slow, mostly because of the iterative Dynamic Routing (DR) algorithm it uses. Several works have tried to address these shortcomings since CapsNet was introduced.
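To make the capsule and Dynamic Routing ideas concrete, here is a minimal pure-Python sketch of the routing-by-agreement procedure as described in the original 2017 CapsNet paper (Sabour et al.). It assumes the prediction vectors `u_hat[i][j]` (input capsule `i`'s prediction for output capsule `j`) have already been computed; the function names are illustrative and are not taken from this thesis. Each iteration recomputes coupling coefficients and output capsules, which is why DR adds noticeable cost to CapsNet.

```python
import math

def squash(v):
    """Squash nonlinearity: keeps the vector's direction and maps its length into [0, 1)."""
    sq_norm = sum(x * x for x in v)
    # The small epsilon avoids division by zero for an all-zero input vector.
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm + 1e-9)
    return [scale * x for x in v]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_routing(u_hat, iterations=3):
    """Routing by agreement; u_hat[i][j] is input capsule i's prediction for output capsule j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, initialised to zero
    v = []
    for _ in range(iterations):
        # Coupling coefficients: for each input capsule, a distribution over output capsules.
        c = [softmax(b[i]) for i in range(n_in)]
        # Each output capsule is the squashed, coupling-weighted sum of its predictions.
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s_j))
        # Increase the logit where a prediction agrees (dot product) with the output capsule.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v

# Two input capsules agree about output 0 and contradict each other about output 1.
u_hat = [[[1.0, 0.0], [1.0, 0.0]],
         [[1.0, 0.0], [-1.0, 0.0]]]
v = dynamic_routing(u_hat)
lengths = [math.sqrt(sum(x * x for x in vec)) for vec in v]
print(lengths)
```

The length of each output capsule acts as the probability that its entity is present, so the output capsule whose predictions agree ends up longer than the one whose predictions cancel out.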
In this work, we focus on optimizing CapsNet in several aspects: network speed (i.e., training and testing times), the number of parameters, accuracy, and generalization ability. We propose several optimizations to compensate for the drawbacks of CapsNet. First, we introduce the Quick-CapsNet (QCN) network, whose primary focus is network speed. QCN modifies the feature extractor of CapsNet and produces fewer capsules than the baseline network (Base-CapsNet). It performs inference 5x faster on small-scale datasets, i.e., MNIST, F-MNIST, SVHN, and CIFAR-10. However, QCN loses a marginal amount of test accuracy compared to the baseline, e.g., 1% on the F-MNIST dataset.
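The link between producing fewer capsules and faster inference can be illustrated with a small calculation. The sketch below assumes the layer layout of the original CapsNet (Conv1: 9x9 kernel, stride 1; PrimaryCaps: 9x9 kernel, stride 2, 32 channels of 8-D capsules) and unpadded convolutions; it does not reproduce QCN's architecture, whose capsule count is not given here. It counts the primary capsules and the input/output capsule pairs that Dynamic Routing must process in every iteration.

```python
def conv_out(size, kernel, stride):
    """Spatial output size of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1

def primary_capsule_count(input_size, kernel=9, conv1_stride=1,
                          primary_stride=2, capsule_channels=32):
    """Primary capsules in the original CapsNet layout (assumed defaults above)."""
    h = conv_out(input_size, kernel, conv1_stride)  # Conv1 feature-map side length
    h = conv_out(h, kernel, primary_stride)         # PrimaryCaps grid side length
    return h * h * capsule_channels                 # one capsule per grid cell per channel

def routing_pairs(n_primary, n_classes):
    """One prediction vector and one coupling coefficient per (primary, class) capsule pair."""
    return n_primary * n_classes

n = primary_capsule_count(28)     # MNIST digits are 28x28
print(n)                          # 1152
print(routing_pairs(n, 10))       # 11520
```

Because the work done in each routing iteration grows linearly with the number of primary capsules, a feature extractor that emits fewer capsules directly shrinks this cost; how much of QCN's measured speedup comes from this effect is not quantified by the sketch.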
Our second contribution is a capsule-specific layer for the feature extractor of CapsNet, referred to as the Convolutional Fully-Connected (CFC) layer. We incorporate the CFC layer into CapsNet and call the resulting architecture CFC-CapsNet. The CFC layer is added on top of the existing feature extractor to translate the feature map into capsules. This layer has two parameters: the kernel size and the output dimension. We performed experiments to explore the effect of these two parameters on network performance. Using the CFC layer reduces the number of parameters, speeds up training and testing, and increases test accuracy. On the CIFAR-10 dataset, CFC-CapsNet achieves 1.46% higher accuracy than the baseline (71.69%) with 49% fewer parameters. On CIFAR-10, CFC-CapsNet is 4x faster for training and 4.5x faster for testing than Base-CapsNet.
Our third contribution is LE-CapsNet, a light, enhanced, and resource-aware variant of CapsNet. This network contains a Primary Capsule Generator (PCG) module as well as a robust decoder. Using 3.8M weights, LE-CapsNet obtains 77.21% accuracy on the CIFAR-10 dataset while performing inference 4x faster than CapsNet. In addition, our proposed network is more robust than CapsNet at classifying images with affine transformations: it achieves 94.37% accuracy on the AffNIST dataset, compared to 90.52% for CapsNet.
Finally, we propose a deep variant of CapsNet consisting of several capsule layers, referred to as Deep Light CapsNet (DL-CapsNet). In this work, we design the Capsule Summarization (CapsSum) layer to reduce the complexity of the proposed deep network by reducing its number of parameters. DL-CapsNet is highly accurate while using a small number of parameters compared to state-of-the-art CapsNet-based networks. Moreover, DL-CapsNet delivers faster training and inference. Using a 7-model ensemble on the CIFAR-10 dataset, we achieve 91.29% accuracy. DL-CapsNet is among the few CapsNet-based networks that support the CIFAR-100 dataset (68.36% test accuracy using the 7-model ensemble) and can process complex datasets with a large number of categories.
/ Graduate
Identifier | oai:union.ndltd.org:uvic.ca/oai:dspace.library.uvic.ca:1828/14124 |
Date | 23 August 2022 |
Creators | Shiri, Pouya |
Contributors | Baniasadi, Amirali |
Source Sets | University of Victoria |
Language | English |
Detected Language | English |
Type | Thesis |
Format | application/pdf |
Rights | Available to the World Wide Web |