Multi-modality fusion is an area of research that has shown promising results in 2D and 3D object detection. However, multi-modality fusion methods have largely not been applied to instance segmentation. This master's thesis investigated whether multi-modality fusion methods can be applied to deep learning instance segmentation models to improve their performance on multi-modality data. The two multi-modality fusion methods presented, input extension and feature fusion, were applied to a two-stage instance segmentation model, Mask R-CNN, and a single-stage instance segmentation model, RTMDet. Models were trained on different variations of preprocessed RGBD and ToF data provided by SICK IVP, as well as RGBD data from the publicly available NYUDepth dataset. The thesis concludes that the feature fusion method can be applied to the Mask R-CNN model to improve the network's performance by 1.8 percentage points (pt.) bounding box mAP and 1.6 pt. segmentation mAP on SICK RGBD, 7.7 pt. bounding box mAP and 7.4 pt. segmentation mAP on ToF, and 7.4 pt. bounding box mAP and 7.4 pt. segmentation mAP on NYUDepth. The RTMDet model saw little to no improvement from the inclusion of depth but had baseline performance similar to that of the improved Mask R-CNN model that utilized feature fusion. The input extension method yielded no performance improvements, as it faced technical implementation limitations.
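For illustration only, since this record does not include implementation details: below is a minimal PyTorch sketch of one common feature-fusion pattern, in which RGB and depth images pass through parallel backbones whose final feature maps are merged by channel-wise concatenation and a 1x1 convolution. All class and variable names here are assumptions, not the thesis's actual code, and the fusion point and operator are one plausible choice among several.

    # Hypothetical sketch of feature-level RGB-D fusion; not the thesis's exact method.
    import torch
    import torch.nn as nn
    import torchvision

    class FeatureFusionBackbone(nn.Module):
        """Two ResNet-50 branches (RGB, depth) fused at the last stage."""
        def __init__(self, out_channels=256):
            super().__init__()
            # Assumption: depth is replicated to 3 channels upstream so
            # ImageNet-pretrained weights can be loaded for both branches.
            self.rgb = torchvision.models.resnet50(weights="IMAGENET1K_V1")
            self.depth = torchvision.models.resnet50(weights="IMAGENET1K_V1")
            # Feature fusion: concatenated C5 maps (2048 + 2048 channels)
            # are projected down with a 1x1 convolution.
            self.fuse = nn.Conv2d(2048 * 2, out_channels, kernel_size=1)

        def _trunk(self, net, x):
            # Run the ResNet stem and the four residual stages.
            x = net.conv1(x); x = net.bn1(x); x = net.relu(x); x = net.maxpool(x)
            x = net.layer1(x); x = net.layer2(x); x = net.layer3(x)
            return net.layer4(x)  # final C5 feature map, 2048 channels

        def forward(self, rgb, depth):
            f_rgb = self._trunk(self.rgb, rgb)
            f_depth = self._trunk(self.depth, depth)
            return self.fuse(torch.cat([f_rgb, f_depth], dim=1))

    x_rgb = torch.randn(1, 3, 512, 512)
    x_d = torch.randn(1, 3, 512, 512)   # depth replicated to 3 channels
    feats = FeatureFusionBackbone()(x_rgb, x_d)  # shape (1, 256, 16, 16)

The other method named in the abstract, input extension, instead concatenates depth as an extra input channel (e.g., a 4-channel RGBD tensor) and widens the network's first convolution to accept it; adapting pretrained weights and framework data pipelines to the extra channel is where implementation limitations of the kind the abstract mentions can arise.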
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-199768 |
Date | January 2023 |
Creators | Öhrling, Jonathan |
Publisher | Linköpings universitet, Institutionen för systemteknik |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |