Return to search

Natural Language Driven Image Edits using a Semantic Image Manipulation Language

Language provides us with a powerful tool to articulate and express ourselves! Understanding and harnessing the expressions of natural language can open the doors to a vast array of creative applications. In this work we explore one such application - natural language based image editing. We propose a novel framework to go from free-form natural language commands to performing fine-grained image edits.

Recent progress in the field of deep learning has motivated solving most tasks using end-to-end deep convolutional frameworks. Such methods have shown to be very successful even achieving super-human performance in some cases. Although such progress has shown significant promise for the future we believe there is still progress to be made before their effective application to a task like fine-grained image editing. We approach the problem by dissecting the inputs (image and language query) and focusing on understanding the language input utilizing traditional natural language processing (NLP) techniques. We start by parsing the input query to identify the entities, attributes and relationships and generate a command entity representation. We define our own high-level image manipulation language that serves as an intermediate programming language connecting natural language requests that represent a creative intent over an image into the lower-level operations needed to execute them. The semantic command entity representations are mapped into this high- level language to carry out the intended execution. / Master of Science / Image editing is a very challenging task that requires a specific skill set. Hence, Going from natural language to directly performing image edits thereby automating the entire procedure is a challenging problem as well as a potential application that could benefit widespread users. There are multiple stages involved in such a process starting with understanding the intent of a command provided in natural language, identifying the editing tasks represented by it and the different objects and properties of the image the command intends to act upon and finally performing the intended edit(s).

There has been significant progress in the field of natural language processing as well as computer vision in recent years. On the natural language front computers are now able to accurately parse sentences, analyze large amounts of text, classify sentiments and emotions and much more. Similarly on the computer vision side computers can accurately identify objects, localize them and even generate real life like images from random noise pixels.

In this work, we propose a novel framework that enables us to go from natural language commands to performing image edits. Our approach starts by parsing the language input, identifying the entities and relations in the image from the language followed by mapping it into a set of sequential executable commands in an intermediate programming language that we define to execute the edit.

Identiferoai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/83452
Date04 June 2018
CreatorsMohapatra, Akrit
ContributorsElectrical and Computer Engineering, Huang, Jia-Bin, Batra, Dhruv, Abbott, A. Lynn
PublisherVirginia Tech
Source SetsVirginia Tech Theses and Dissertation
Detected LanguageEnglish
TypeThesis
FormatETD, application/pdf
RightsIn Copyright, http://rightsstatements.org/vocab/InC/1.0/

Page generated in 0.0018 seconds