Using a Character-Based Language Model for Caption Generation / Användning av teckenbaserad språkmodell för generering av bildtext

Using AI to automatically describe images is a challenging task. The aim of this study was to compare character-based language models with one of the current state-of-the-art token-based language models, im2txt, for generating image captions, with a focus on morphological correctness. Previous work has shown that character-based language models can outperform token-based language models in morphologically rich languages. Other studies show that simple multi-layered LSTM blocks can learn to replicate the syntax of their training data. To study the usability of character-based language models, an alternative model based on TensorFlow im2txt was created. The model changes the token-generation architecture to handle character-sized tokens instead of word-sized tokens. The results suggest that a character-based language model could outperform current token-based language models, although due to time and computing-power constraints this study does not reach a clear conclusion. A problem with one of the methods, subsampling, is discussed: when the original method is applied to character-sized tokens, it removes individual characters (including special characters) instead of full words. To solve this issue, a two-phase approach is suggested, in which the training data is first separated into word-sized tokens, on which subsampling is performed, and the remaining tokens are then separated into character-sized tokens. Future work that applies the modified subsampling and fine-tunes the hyperparameters is suggested in order to reach a clearer conclusion about the performance of character-based language models.
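The two-phase approach suggested in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the "<sp>" word-separator token, and the word2vec-style keep probability sqrt(threshold / frequency) are all assumptions made for illustration, since the abstract does not specify them.

```python
import random
from collections import Counter


def subsample_words(words, threshold=1e-3, rng=None):
    """Phase 1: subsample at word level, so whole words are kept or dropped.

    Assumption: a word2vec-style rule that keeps a word with probability
    min(1, sqrt(threshold / relative_frequency)).
    """
    rng = rng or random.Random(0)
    counts = Counter(words)
    total = len(words)
    kept = []
    for word in words:
        freq = counts[word] / total
        keep_prob = min(1.0, (threshold / freq) ** 0.5)
        if rng.random() < keep_prob:
            kept.append(word)
    return kept


def to_char_tokens(words, space="<sp>"):
    """Phase 2: split surviving words into character-sized tokens.

    Assumption: a hypothetical "<sp>" token marks word boundaries.
    """
    tokens = []
    for i, word in enumerate(words):
        if i > 0:
            tokens.append(space)
        tokens.extend(word)  # a string extends the list one character at a time
    return tokens


def two_phase_tokenize(caption, threshold=1e-3, rng=None):
    """Word-level subsampling followed by character-level tokenization."""
    words = caption.lower().split()
    return to_char_tokens(subsample_words(words, threshold, rng))
```

Because subsampling runs before the character split, a dropped token always corresponds to a full word, and no individual characters or special characters are removed from inside a word that survives.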

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-163001

Natural Language Processing

Recurrent Neural Network

Long Short-Term Memory

LSTM

word2vec

Language Model

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-163001
Date	January 2019
Creators	Keisala, Simon
Publisher	Linköpings universitet, Interaktiva och kognitiva system
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
