Most neural networks use a standard convolutional layer that treats every input pixel as valid. However, padding adds pixels to the input that carry information not present in the original image, and this extra information can be considered invalid. Invalid pixels can also occur inside the image, where they are referred to as holes in completion tasks such as image inpainting. In this work, we look for a method that can handle both types of invalid pixels. On the same test bench, we compare two methods previously used to handle invalid pixels outside the image (Partial and Edge convolutions) and one method designed for invalid pixels inside the image (Gated convolution). We show that Partial convolution performs best in image classification, while Gated convolution has the advantage in semantic segmentation. For hotel recognition with masked regions, none of the methods seems appropriate for generating embeddings that leverage the masked regions.

Master of Science

A module at the heart of deep neural networks built for Artificial Intelligence is the convolutional layer. Combining multiple convolutional layers with other modules yields a Convolutional Neural Network (CNN). CNNs can be used for tasks such as image classification, where they tell, for example, whether the object in an image is a chair or a car. Most CNNs use a standard convolutional layer that assumes that all parts of the image fed to the network are valid. However, most models zero pad the image at the beginning to maintain a certain output shape. Zero padding is equivalent to adding a black frame around the image. The added pixels introduce information that was not initially present, so this extra information can be considered invalid. Invalid pixels can also occur inside the image, where they are referred to as holes in completion tasks such as image inpainting, in which the network is asked to fill these holes and produce a realistic image.
In this work, we look for a method that can handle both types of invalid pixels. On the same test bench, we compare two methods previously used to handle invalid pixels outside the image (Partial and Edge convolutions) and one method designed for invalid pixels inside the image (Gated convolution). We show that Partial convolution performs best in image classification, while Gated convolution has the advantage in semantic segmentation. For hotel recognition with masked regions, none of the methods seems appropriate for generating embeddings that leverage the masked regions.
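To make the idea of invalid padded pixels concrete, the following is a minimal sketch in plain Python, not the thesis's implementation. It compares a zero-padded convolution with a partial-convolution-style variant that rescales each output by the fraction of valid pixels under the kernel. The names `zero_pad` and `conv_same` are hypothetical, and the rescaling rule (kernel area divided by the number of valid pixels) is an assumption based on the commonly described partial-convolution padding scheme; the thesis may use a different formulation.

```python
def zero_pad(image, pad):
    """Return (padded, mask). mask is 1.0 for original pixels, 0.0 for padding."""
    h, w = len(image), len(image[0])
    padded = [[0.0] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    mask = [[0.0] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    for i in range(h):
        for j in range(w):
            padded[i + pad][j + pad] = float(image[i][j])
            mask[i + pad][j + pad] = 1.0
    return padded, mask


def conv_same(image, kernel, partial=False):
    """'Same'-size 2D convolution (cross-correlation) with zero padding.

    Assumes a square kernel with odd side length. With partial=True, each
    output is rescaled by (kernel area / number of valid pixels in the
    window), an assumed partial-convolution-style correction at the borders.
    """
    k = len(kernel)
    pad = k // 2
    padded, mask = zero_pad(image, pad)
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            valid = 0.0
            for di in range(k):
                for dj in range(k):
                    acc += kernel[di][dj] * padded[i + di][j + dj]
                    valid += mask[i + di][j + dj]
            if partial:
                # The window centre is always a real pixel, so valid >= 1.
                acc *= (k * k) / valid
            out[i][j] = acc
    return out


# A constant image filtered with a 3x3 averaging kernel.
image = [[1.0] * 4 for _ in range(4)]
kernel = [[1.0 / 9.0] * 3 for _ in range(3)]
plain = conv_same(image, kernel)
corrected = conv_same(image, kernel, partial=True)
print(plain[0][0], corrected[0][0])
```

On this constant image, the plain zero-padded result at the corner is about 0.44, because only four of the nine pixels under the kernel are real, while the rescaled variant returns approximately 1.0 everywhere. This shows how the black frame introduced by zero padding darkens border outputs unless the invalid pixels are accounted for.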
Identifier | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/98619 |
Date | 29 May 2020 |
Creators | Messou, Ehounoud Joseph Christopher |
Contributors | Electrical and Computer Engineering, Huang, Jia-Bin, Dhillon, Harpreet Singh, Abbott, A. Lynn |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Detected Language | English |
Type | Thesis |
Format | ETD, application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |