Global ETD Search

Hledání obrázků k textům / Matching Images to Texts

We build a joint multimodal model of text and images for automatically assigning illustrative images to journalistic articles. We approach the task as an unsupervised representation learning problem of finding a common representation that abstracts from individual modalities, inspired by the multimodal Deep Boltzmann Machine of Srivastava and Salakhutdinov. We use state-of-the-art image content classification features obtained from the Convolutional Neural Network of Krizhevsky et al. as input "images" and entire documents instead of keywords as input texts. A deep learning and experiment management library, Safire, has been developed. We have not been able to create a successful retrieval system because of difficulties with training neural networks on the very sparse word observations. However, we have gained substantial understanding of the nature of these difficulties and thus are confident that we will be able to improve in future work.

http://www.nusl.cz/ntk/nusl-340872

Identifier	oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:340872
Date	January 2014
Creators	Hajič, Jan
Contributors	Pecina, Pavel; Průša, Daniel
Source Sets	Czech ETDs
Language	English
Detected Language	English
Type	info:eu-repo/semantics/masterThesis
Rights	info:eu-repo/semantics/restrictedAccess
