Multiple object tracking (MOT) can be an efficient tool for finding patterns in video monitoring data. In this thesis, we investigate which type of video data works best for MOT in an indoor combat training scenario. The three types of camera data evaluated are color data, near-infrared (NIR) data, and depth data. To evaluate which of these lends itself best to MOT, we develop object tracking models based on YOLOv5 and DeepSORT, and train the models on the respective types of data. In addition to the individual models, ensembles of the three models are developed to see whether any increase in performance can be gained. The models are evaluated using well-established MOT evaluation metrics and by measuring the frame rate of each model. The results are rigorously analyzed using statistical significance tests, to ensure that only well-supported conclusions are drawn. These evaluations and analyses show mixed results. Regarding the MOT metrics, the performance of most models was not shown to differ significantly from that of most other models, so although differences in performance were observed, they cannot be assumed to hold for larger sample sizes. Regarding frame rate, we find that the ensemble models are significantly slower than the individual models.
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:hj-57120 |
Date | January 2022 |
Creators | Zenk, Viktor, Bach, Willy |
Publisher | Jönköping University, JTH, Avdelningen för datavetenskap |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |