Global ETD Search

Forced Attention for Image Captioning

<div>
<div>
<div>
<p>Automatic generation of captions for a given image is an active research area in Artificial
Intelligence. Architectures have evolved from classical machine learning applied to image
metadata to neural networks. Two styles of neural architecture have emerged for image
captioning: the Encoder-Attention-Decoder architecture and the transformer architecture. This
study attempts to modify the attention mechanism so that any object can be specified. An
archetypal Encoder-Attention-Decoder architecture, Show, Attend, and Tell (Xu et al., 2015),
serves as the baseline, and a modification of that architecture is proposed. Both architectures
are evaluated on the MSCOCO (Lin et al., 2014) dataset using seven metrics: BLEU-1, BLEU-2,
BLEU-3, and BLEU-4 (Papineni, Roukos, Ward & Zhu, 2002), METEOR (Banerjee & Lavie, 2005),
ROUGE-L (Lin, 2004), and CIDEr (Vedantam, Zitnick & Parikh, 2015). Finally, the statistical
significance of the results is assessed with paired t-tests.
</p>
</div>
</div>
</div>
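
The abstract's final step, a paired t-test over the two architectures' scores, can be illustrated with a minimal sketch. The function name `paired_t_statistic` and the per-image scores below are hypothetical and are not taken from the thesis; the sketch only assumes that each image yields one score per architecture for a given metric, and it stops at the t statistic and degrees of freedom rather than computing a p-value.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test on two score lists.

    scores_a[i] and scores_b[i] must be scores for the same image.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    n = len(scores_a)
    if n < 2:
        raise ValueError("need at least two pairs")
    # Per-image differences (second system minus first system).
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    # t = mean difference divided by its standard error.
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical per-image metric scores, for illustration only.
baseline = [0.60, 0.55, 0.70, 0.65, 0.50]
modified = [0.62, 0.59, 0.71, 0.70, 0.53]
t, df = paired_t_statistic(baseline, modified)
print(f"t = {t:.4f}, df = {df}")
```

The resulting statistic would then be compared against a Student's t distribution with `df` degrees of freedom to obtain a p-value, for example with a statistics library.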

10.25394/pgs.7408883.v1

Natural Language Processing

Artificial intelligence.

Natural language processing

Image Captioning

Deep Learning

Identifier	oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/7408883
Date	17 January 2019
Creators	Hemanth Devarapalli (5930603)
Source Sets	Purdue University
Detected Language	English
Type	Text, Thesis
Rights	CC BY 4.0
Relation	https://figshare.com/articles/Forced_Attention_for_Image_Captioning/7408883
