Monocular 6DoF pose tracking has many applications in augmented reality, robotics, and other areas, and with the rise of deep learning, new approaches such as category-level models have become successful. Temporal information in sequential data is essential for both online and offline tasks: it can improve the quality of predictions under unexpected disturbances such as occlusion and vibration. In 2D object detection and tracking, substantial research has leveraged temporal information to improve model performance. Nevertheless, lifting temporal processing to 3D space is challenging because of the ambiguity of the visual data. In this thesis, we propose a method to compute the temporal difference of points and pixels under the assumption that the K nearest points share similar features. Features extracted from these differences are learned to weigh the relevant points in the temporal sequence and aggregate them to support the current frame's prediction. We propose a novel difference-based temporal module that incorporates both RGB and 3D point data in a temporal sequence. This module can be easily integrated with any category-level 6DoF pose tracking model that takes RGB and 3D points as input. We evaluate the module on two state-of-the-art category-level 6D pose tracking models, and the results show that it increases the models' accuracy and robustness in complex scenarios.
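To make the described mechanism concrete, below is a minimal sketch of such a difference-based temporal module in PyTorch. It gathers the K nearest previous-frame points for each current-frame point, computes point and feature differences, learns per-neighbour weights from those differences, and aggregates the weighted past features into the current frame. All names (TemporalDiffModule, knn_indices), dimensions, and the fusion strategy are illustrative assumptions, not the thesis's exact architecture.

```python
import torch
import torch.nn as nn


def knn_indices(query, ref, k):
    # Pairwise distances between current-frame points (query, (B, N, 3))
    # and previous-frame points (ref, (B, M, 3)).
    dists = torch.cdist(query, ref)                             # (B, N, M)
    return dists.topk(k, dim=-1, largest=False).indices         # (B, N, k)


class TemporalDiffModule(nn.Module):
    # Hypothetical module: weighs temporal neighbours by their
    # point/feature differences and aggregates them.
    def __init__(self, feat_dim, k=16):
        super().__init__()
        self.k = k
        # Scores each neighbour from the concatenated differences.
        self.score_mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, 1))
        self.fuse = nn.Linear(2 * feat_dim, feat_dim)

    def forward(self, pts_t, feat_t, pts_prev, feat_prev):
        # pts_*: (B, N, 3) point coordinates; feat_*: (B, N, C) features.
        B, N, C = feat_t.shape
        idx = knn_indices(pts_t, pts_prev, self.k)               # (B, N, k)
        idx_f = idx.unsqueeze(-1).expand(-1, -1, -1, C)
        nbr_feat = torch.gather(
            feat_prev.unsqueeze(1).expand(-1, N, -1, -1), 2, idx_f)
        idx_p = idx.unsqueeze(-1).expand(-1, -1, -1, 3)
        nbr_pts = torch.gather(
            pts_prev.unsqueeze(1).expand(-1, N, -1, -1), 2, idx_p)
        # Temporal differences in geometry and appearance features.
        d_pts = nbr_pts - pts_t.unsqueeze(2)                     # (B, N, k, 3)
        d_feat = nbr_feat - feat_t.unsqueeze(2)                  # (B, N, k, C)
        # Learned weights over the k temporal neighbours.
        w = torch.softmax(
            self.score_mlp(torch.cat([d_feat, d_pts], -1)), dim=2)
        agg = (w * nbr_feat).sum(dim=2)                          # (B, N, C)
        # Fuse aggregated past evidence with the current features.
        return self.fuse(torch.cat([feat_t, agg], dim=-1))
```

Because the module only consumes per-point coordinates and features, it could, under these assumptions, be dropped between the feature extractor and the pose head of an RGB-and-points tracking model without altering either component.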
Identifier | oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/45870 |
Date | 22 January 2024 |
Creators | Chen, Zishen |
Contributors | Lang, Jochen |
Publisher | Université d'Ottawa / University of Ottawa |
Source Sets | Université d’Ottawa |
Language | English |
Detected Language | English |
Type | Thesis |
Format | application/pdf |
Rights | Attribution-NonCommercial 4.0 International, http://creativecommons.org/licenses/by-nc/4.0/ |