In the field of robotics and autonomous vehicles, the use of RGB-D data and LiDAR sensors is a popular practice for applications such as SLAM[14], object classification[19] and scene understanding[5]. This thesis explores the problem of semantic segmentation using deep multimodal fusion of LRF and depth data. Two data set consisting of 1080 and 108 data points from two scenes is created and manually labeled in 2D space and transferred to 1D using a proposed label transfer method utilizing hierarchical clustering. The data set is used to train and validate the suggested method for segmentation using a proposed dual encoder-decoder network based on SalsaNet [1] with gradual fusion in the decoder. Applying the suggested method yielded an improvement in the scenario of an unseen circuit when compared to uni-modal segmentation using depth, RGB, laser, and a naive combination of RGB-D data. A suggestion of feature extraction in the form of PCA or stacked auto-encoders is suggested as a further improvement for this type of fusion. The source code and data set are made publicly available at https://github.com/Anguse/salsa_fusion.
Identifer | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:hh-42239 |
Date | January 2020 |
Creators | Lilja, Harald |
Publisher | Högskolan i Halmstad, CAISR Centrum för tillämpade intelligenta system (IS-lab) |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Page generated in 0.0019 seconds