IMAGE CAPTIONING USING TRANSFORMER ARCHITECTURE

Image captioning is the area of deep learning concerned with generating textual descriptions of images. The central idea is to identify the key features of an image and compose meaningful sentences that describe it. Currently popular approaches include Convolutional Neural Network - Long Short-Term Memory (CNN-LSTM) models and attention-based models. This research first identifies the drawbacks of existing image captioning models, namely sequential execution, the vanishing gradient problem, and a lack of context during training.
This work aims to resolve these problems by creating a Contextually Aware Image Captioning (CATIC) model. The Transformer architecture, which addresses the issues of vanishing gradients and sequential execution, forms the basis of the proposed model. To inject contextualized embeddings of the caption sentences, this work uses Bidirectional Encoder Representations from Transformers (BERT). This work uses the Remote Sensing Image Captioning Dataset. The results of the CATIC model are evaluated using BLEU, METEOR, and ROUGE scores. The proposed model outperforms the CNN-LSTM model on all metrics. Compared with the attention-based model, the CATIC model scores higher on BLEU-2 and ROUGE and gives competitive results on the other metrics.
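
To make the evaluation metrics concrete, the following is a minimal sketch of a sentence-level BLEU score computed up to bigrams (BLEU-2), using clipped n-gram precision and a brevity penalty. It is an illustration only, not the evaluation code used in the thesis: the function names and the example captions are hypothetical, no smoothing is applied, and published results are normally computed at corpus level with an established toolkit.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of one candidate against several references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # For each n-gram, the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=2):
    """Unsmoothed sentence-level BLEU with uniform weights over 1..max_n grams."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # without smoothing, any zero precision gives a zero score
    c = len(candidate)
    # Reference length closest to the candidate length (ties go to the shorter).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)


# Hypothetical captions, for illustration only.
reference = "many planes are parked near the terminal".split()
candidate = "many planes are parked".split()
score = bleu(candidate, [reference])
```

In this example every unigram and bigram of the candidate appears in the reference, so both precisions are 1 and the score is determined entirely by the brevity penalty, which penalizes the candidate for being shorter than the reference.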

10.25394/pgs.21674945.v1

Natural language processing

Computer vision

Deep learning

Transformer Architecture

Remote Sensing Images

Identifier	oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/21674945
Date	06 December 2022
Creators	Wrucha A Nanal (14216009)
Source Sets	Purdue University
Detected Language	English
Type	Text, Thesis
Rights	CC BY 4.0
Relation	https://figshare.com/articles/thesis/IMAGE_CAPTIONING_USING_TRANSFORMER_ARCHITECTURE/21674945
