
Multi-spectral Fusion for Semantic Segmentation Networks

Indiana University-Purdue University Indianapolis (IUPUI)

Semantic segmentation is a machine learning task that is seeing increased use in multiple
fields, from medical imagery to land demarcation and autonomous vehicles. Semantic
segmentation performs pixel-wise classification of images, creating a new, segmented
representation of the input that is useful for detecting various terrain and objects within
an image. Recently, convolutional neural networks have been heavily used to tackle the
semantic segmentation task. This is particularly true in the field of autonomous driving
systems.
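As a minimal illustration of pixel-wise classification, the sketch below assigns each pixel the class with the highest score. The class names, the score values, and the helper `segment` are hypothetical and are not taken from the thesis; in practice a network would produce the per-pixel scores.

```python
# Hypothetical class list for illustration only (not from the thesis).
CLASSES = ["background", "car", "person"]

def segment(scores):
    """Return a label map: for each pixel, the index of its highest-scoring class.

    scores[r][c] is a list holding one score per class for pixel (r, c).
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]

# A made-up 2x2 "image" with three class scores per pixel.
scores = [
    [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1]],
    [[0.1, 0.1, 0.8], [0.6, 0.3, 0.1]],
]
label_map = segment(scores)
```

The resulting `label_map` has the same height and width as the input and holds one class index per pixel, which is the segmented representation described above.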
The requirements of automated driver assistance systems (ADAS) drive semantic segmentation
models targeted for deployment on ADAS to be lightweight while maintaining accuracy. A
commonly used method to increase accuracy in the autonomous vehicle field is to fuse
multiple sensory modalities. This research focuses on leveraging the fusion of long-wave
infrared (LWIR) imagery with visual spectrum imagery to fill in the inherent performance
gaps when using visual imagery alone. This comes with a host of benefits, such as increased
performance in various lighting conditions and adverse environmental conditions. Utilizing
this fusion technique is an effective method of increasing the accuracy of a semantic
segmentation model. Being a lightweight architecture is key for successful deployment on
ADAS, as these systems often have resource constraints and need to operate in real time.
Multi-Spectral Fusion Network (MFNet) [1] meets these requirements by leveraging a sensory
fusion approach, and as such was selected as the baseline architecture for this research.
Many improvements were made upon the baseline architecture by leveraging a variety of
techniques. These improvements include the proposal of a novel loss function, categorical
cross-entropy Dice loss, the introduction of squeeze-and-excitation (SE) blocks, the
addition of pyramid pooling, a new fusion technique, and drop input data augmentation.
These improvements culminated in the creation of the Fast Thermal Fusion Network (FTFNet).
Further improvements were made by introducing depthwise separable convolutional layers,
leading to the lightweight FTFNet variants FTFNet Lite 1 and FTFNet Lite 2.
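To see why depthwise separable convolutions make a network lighter, the sketch below compares the weight count of a standard convolution with that of a depthwise convolution followed by a 1 x 1 pointwise convolution. The layer size is an assumption chosen only for illustration and does not describe any actual FTFNet layer; bias terms are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

# Hypothetical layer size, chosen only for illustration.
c_in, c_out, k = 64, 128, 3
standard = standard_conv_params(c_in, c_out, k)          # 73728 weights
separable = depthwise_separable_params(c_in, c_out, k)   # 8768 weights
reduction = standard / separable
```

For this assumed layer the separable form needs roughly 8.4 times fewer weights; the actual savings in a given network depend on its channel counts and kernel sizes.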
The FTFNet family was trained on the Multi-Spectral Road Scenarios (MSRS) and MIL-Coaxials
visual/LWIR datasets. The proposed modifications led to an improvement over the baseline in
mean intersection over union (mIoU) of 2.92% and 2.03% for FTFNet and FTFNet Lite 2,
respectively, when trained on the MSRS dataset. Additionally, when trained on the
MIL-Coaxials dataset, the FTFNet family showed improvements in mIoU of 8.69%, 4.4%, and
5.0% for FTFNet, FTFNet Lite 1, and FTFNet Lite 2, respectively.
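For readers unfamiliar with the metric, the following is a minimal sketch of mean intersection over union computed on a single pair of label maps. The function name `mean_iou` and the toy label maps are hypothetical; the evaluation protocol used in the thesis (for example, how scores are accumulated across a dataset) is not stated here and may differ.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes present in pred or target."""
    ious = []
    for cls in range(num_classes):
        intersection = union = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                in_pred, in_target = p == cls, t == cls
                intersection += in_pred and in_target
                union += in_pred or in_target
        if union:  # skip classes absent from both maps
            ious.append(intersection / union)
    return sum(ious) / len(ious)

# Made-up 2x2 ground truth and prediction with two classes.
target = [[0, 0], [1, 1]]
pred = [[0, 1], [1, 1]]
miou = mean_iou(pred, target, num_classes=2)
```

Here class 0 has an IoU of 1/2 and class 1 has an IoU of 2/3, so the mean is 7/12, or about 0.583.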

Identifier: oai:union.ndltd.org:IUPUI/oai:scholarworks.iupui.edu:1805/33368
Date: 05 1900
Creators: Edwards, Justin
Contributors: El-Sharkawy, Mohamed; King, Brian; Kim, Dongsoo
Source Sets: Indiana University-Purdue University Indianapolis
Language: en_US
Detected Language: English
Type: Thesis
Rights: Attribution-ShareAlike 4.0 International, http://creativecommons.org/licenses/by-sa/4.0/
