
AATrackT: A deep learning network using attentions for tracking fast-moving and tiny objects : (A)ttention (A)ugmented - (Track)ing on (T)iny objects

Lundberg Andersson, Fredric, January 2022
Recent advances in deep learning have made it possible to visually track objects in a video sequence. Moreover, since transformers were introduced in computer vision, new state-of-the-art results have been achieved in visual tracking. However, most of these studies use attention to correlate the distinguishing features of the target object and candidate objects in order to localise the object throughout the video sequence. This approach is not adequate for tracking tiny objects. In addition, conventional trackers are often not applicable to extremely small objects or to objects that move fast. Therefore, the purpose of this study is to improve current methods for tracking tiny, fast-moving objects with the help of attention. A deep neural network, named AATrackT, is built to address this gap by treating tracking as an image segmentation problem. The proposed method uses data extracted from broadcast videos of tennis. Moreover, to capture the global context of images, attention augmented convolutions are used as a substitute for the conventional convolution operation. Contrary to the initial assumption, the experiment indicated that using attention augmented convolutions did not increase tracking performance. Our findings indicate that the main reason is that the spatial resolution of the activation maps, 72x128, is too large for the attention weights to converge.
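
To give a sense of scale for the 72x128 activation maps mentioned in the abstract, the following is a minimal sketch that counts the entries in one attention map. It assumes full (global) 2D self-attention in which every spatial position attends to every other position, which may not match the thesis architecture exactly. The function name `attention_map_entries` and the 36x64 comparison grid are hypothetical and used only for illustration; the sketch shows size only and does not demonstrate anything about convergence.

```python
def attention_map_entries(height: int, width: int) -> int:
    """Count the entries of one full self-attention map over a height x width grid.

    Assumption (not from the abstract): every spatial position is one token,
    and each token attends to every token, giving a square attention matrix.
    """
    positions = height * width
    return positions * positions


# Resolution reported in the abstract.
full_positions = 72 * 128
full_entries = attention_map_entries(72, 128)

# Hypothetical comparison: the same map with both dimensions halved.
half_entries = attention_map_entries(36, 64)

print(full_positions)                # 9216
print(full_entries)                  # 84934656
print(full_entries // half_entries)  # 16
```

Under this assumption, halving each spatial dimension cuts the attention map to one sixteenth of its size, because the number of entries grows with the square of the number of positions.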
