
Object Detection via Contextual Information / Objektdetektion via Kontextuell Information

Stålebrink, Lovisa January 2022
Using computer vision to automatically process and understand images is becoming increasingly popular. One frequently used technique in this area is object detection, where the goal is to both localize and classify objects in images. Today's detection models are accurate, but there is still room for improvement. Most models process objects independently and do not take any contextual information into account in the classification step. This thesis therefore investigates whether performance can be improved by classifying all objects jointly using contextual information. The transformer is an architecture that can learn relationships in this type of information. To investigate what performance can be achieved, a new architecture is constructed in which the classification step is replaced by a transformer block. The model is trained and evaluated on document images and shows promising results, with a mAP score of 87.29. This is compared to a mAP of 88.19 achieved by Mask R-CNN, the object detector that the new model is built upon. Although the proposed model did not improve performance, it comes with some benefits worth exploring further. By using contextual information, the proposed model can eliminate the need for Non-Maximum Suppression, which can be seen as a benefit since it removes one hand-crafted process. Another benefit is that the model tends to learn relatively quickly, and a single pass over the dataset seems sufficient. The model, however, comes with some drawbacks, including a longer inference time due to the increase in model parameters. The model's predictions are also less confident than those of Mask R-CNN. With further investigation and optimization, these drawbacks could be reduced and the performance of the model improved.
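
For readers unfamiliar with the hand-crafted step mentioned in the abstract, the following is a minimal sketch of conventional greedy Non-Maximum Suppression, the post-processing that the proposed model aims to make unnecessary. It is not taken from the thesis: the function names, the (x1, y1, x2, y2) box format, and the IoU threshold of 0.5 are assumptions chosen for illustration only.

```python
from typing import List, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first.

    The threshold is a hand-tuned value (0.5 here is an assumption),
    which is what makes this step "hand-crafted".
    """
    # Visit detections from most to least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

In this example the second box overlaps the first heavily and has a lower score, so it is discarded, while the third box does not overlap either and is kept. Because each detection is judged only by pairwise overlap and score, this procedure uses no learned context, which is the contrast the abstract draws with joint classification.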
