Autonomous driving technology has the potential to revolutionize the way we think about transportation and its impact on society. Perceiving the environment is a key aspect of autonomous driving, which involves multiple computer vision tasks. Multi-scale deep learning has dramatically improved the performance on many computer vision tasks, but its practical use in autonomous driving is limited by the available resources in embedded systems. Multi-task learning offers a solution to this problem by allowing more compact deep learning models that share parameters between tasks. However, not all tasks benefit from being learned together. One way of avoiding task interference during training is to learn tasks in sequence, with each task providing useful information for the next – a scheme which builds on transfer learning. Multi-task and transfer dynamics are both concerned with the relationships between tasks, but have previously only been studied separately. This Master’s thesis investigates how different computer vision tasks relate to each other in the context of multi-task and transfer learning, using a state-ofthe-art efficient multi-scale deep learning model. Through an experimental research methodology, the performance on semantic segmentation, depth estimation, and object detection were evaluated on the Virtual KITTI 2 dataset in a multi-task and transfer learning setting. In addition, transfer learning with a frozen encoder was compared to constrained encoder fine tuning, to uncover the effects of fine-tuning on task dynamics. The results suggest that findings from previous work regarding semantic segmentation and depth estimation in multi-task learning generalize to multi-scale learning on autonomous driving data. Further, no statistically significant correlation was found between multitask learning dynamics and transfer learning dynamics. An analysis of the results from transfer learning indicate that some tasks might be more sensitive to fine-tuning than others, suggesting that transferring with a frozen encoder only captures a subset of the complexities involved in transfer relationships. Regarding object detection, it is observed to negatively impact the performance on other tasks during multi-task learning, but might be a valuable task to transfer from due to lower annotation costs. Possible avenues for future work include applying the used methodology to real-world datasets and exploring ways of utilizing the presented findings for more efficient perception algorithms. / Självkörande teknik har potential att revolutionera transport och dess påverkan på samhället. Självkörning medför ett flertal uppgifter inom datorseende, som bäst löses med djupa neurala nätverk som lär sig att tolka bilder på flera olika skalor. Begränsningar i mobil hårdvara kräver dock att tekniker som multiuppgifts- och sekventiell inlärning används för att minska neurala nätverkets fotavtryck, där sekventiell inlärning bygger på överföringsinlärning. Dynamiken bakom både multiuppgiftsinlärning och överföringsinlärning kan till stor del krediteras relationen mellan olika uppdrag. Tidigare studier har dock bara undersökt dessa dynamiker var för sig. Detta examensarbete undersöker relationen mellan olika uppdrag inom datorseende från perspektivet av både multiuppgifts- och överföringsinlärning. En experimentell forskningsmetodik användes för att jämföra och undersöka tre uppgifter inom datorseende på datasetet Virtual KITTI 2. Resultaten stärker tidigare forskning och föreslår att tidigare fynd kan generaliseras till flerskaliga nätverk och data för självkörning. Resultaten visar inte på någon signifikant korrelation mellan multiuppgift- och överföringsdynamik. Slutligen antyder resultaten att vissa uppgiftspar ställer högre krav än andra på att nätverket anpassas efter överföring.
Identifer | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-343454 |
Date | January 2023 |
Creators | Ekman von Huth, Simon |
Publisher | KTH, Skolan för elektroteknik och datavetenskap (EECS) |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Relation | TRITA-EECS-EX ; 2023:885 |
Page generated in 0.0032 seconds