
Incorporating Sparse Attention Mechanism into Transformer for Object Detection in Images / Inkludering av gles attention i en transformer för objektdetektering i bilder

DEtection TRansformer (DETR) introduces an innovative design for object detection based on softmax attention. However, the softmax operation produces dense attention patterns: every entry in the attention matrix receives a non-zero weight, regardless of its relevance for detection. In this work, we explore several alternatives to softmax that incorporate sparsity into the architecture of DETR. Specifically, we replace softmax with a sparse transformation from the α-entmax family: sparsemax and entmax-1.5, which induce a fixed, predefined degree of sparsity, and α-entmax, which treats sparsity as a learnable parameter of each attention head. In addition to evaluating the effect on detection performance, we examine the resulting attention maps from the perspective of explainability. To this end, we introduce three evaluation metrics that quantify sparsity and complement the qualitative observations. Although our experimental results on COCO, a challenging object detection dataset, do not show an increase in detection performance, we find that learnable sparsity gives the model more flexibility and produces more explainable attention maps. To the best of our knowledge, we are the first to introduce learnable sparsity into the architecture of transformer-based object detectors.
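To make the dense-versus-sparse distinction in the abstract concrete, the following is a minimal NumPy sketch, not the thesis implementation. It compares softmax with sparsemax (the α = 2 member of the α-entmax family) on a single vector of attention scores. The function names and the example scores are assumptions chosen for illustration, and the zero-weight fraction printed at the end is only an informal indicator, not one of the three metrics proposed in the thesis.

```python
import numpy as np


def softmax(z):
    """Dense mapping: every entry receives a strictly positive weight."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def sparsemax(z):
    """Euclidean projection of z onto the probability simplex.

    Low-scoring entries are set to exactly zero.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]           # scores in descending order
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum   # entries that stay non-zero
    k_z = k[support][-1]                  # size of the support
    tau = (cumsum[k_z - 1] - 1) / k_z     # threshold subtracted from all scores
    return np.maximum(z - tau, 0.0)


# Hypothetical attention scores of one query over four keys.
scores = np.array([1.0, 0.8, 0.1, -1.0])

dense = softmax(scores)
sparse = sparsemax(scores)

print("softmax:  ", dense)    # all four weights are non-zero
print("sparsemax:", sparse)   # approximately [0.6, 0.4, 0, 0]
print("zero-weight fraction:", np.mean(sparse == 0.0))
```

Both outputs sum to one, but only sparsemax assigns exactly zero weight to the two least relevant keys. Entmax-1.5 and learnable α-entmax sit between these two extremes and require an iterative or closed-form threshold computation that is omitted here.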

http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-321997

Detection Transformer

Computer and Information Sciences

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-321997
Date	January 2022
Creators	Duc Dao, Cuong
Publisher	KTH, Skolan för elektroteknik och datavetenskap (EECS)
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	Swedish
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	TRITA-EECS-EX ; 2022:650
