Keywords: image manipulation, predictive learning, relational network, cognitive learning, image generation
Abstract: This paper studies whether a perceptual visual system can exhibit human-like cognitive capabilities by training a computational model to predict the outcome of an action from a language instruction. The aim is to ground action words so that an AI can generate a synthetic image depicting the effect a given action has on a given object in the scene. This work combines an image encoder, a language encoder, a relational network, and an image generator to ground action words and then visualize the effect an action would have on a simulated scene. The focus of this work is learning meaningful shared image and text representations for relational learning and object manipulation.
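The abstract describes a four-module pipeline: an image encoder, a language encoder, a relational network that fuses the two representations, and an image generator that decodes the predicted scene. The following is a minimal sketch of that dataflow only; all dimensions, weights, and function names are illustrative assumptions, not the paper's actual architecture, and simple linear maps stand in for the trained neural modules.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (assumptions for illustration, not from the paper)
IMG_DIM, TXT_DIM, HID = 64, 32, 48

# Randomly initialized linear maps standing in for the trained modules
W_img = rng.standard_normal((IMG_DIM, HID)) * 0.1   # image encoder
W_txt = rng.standard_normal((TXT_DIM, HID)) * 0.1   # language (instruction) encoder
W_rel = rng.standard_normal((2 * HID, HID)) * 0.1   # relational network (fusion)
W_gen = rng.standard_normal((HID, IMG_DIM)) * 0.1   # image generator (decoder)

def predict_effect(image_feat, instr_feat):
    """Predict post-action image features from a scene and an instruction."""
    z_img = np.tanh(image_feat @ W_img)                       # encode the scene
    z_txt = np.tanh(instr_feat @ W_txt)                       # encode the action words
    z_rel = np.tanh(np.concatenate([z_img, z_txt]) @ W_rel)   # relate image and text
    return np.tanh(z_rel @ W_gen)                             # decode predicted scene

out = predict_effect(rng.standard_normal(IMG_DIM), rng.standard_normal(TXT_DIM))
print(out.shape)  # (64,)
```

In the paper's setting the output would be a full generated image rather than a feature vector; the sketch only shows how the encoded image and instruction are fused before decoding.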