- TL;DR: We propose a novel method to manipulate given images using natural language descriptions.
- Abstract: We propose a novel generative adversarial network for visual attributes manipulation (ManiGAN), which is able to semantically modify the visual attributes of given images using natural language descriptions. The key to our method is to design a novel co-attention module to combine text and image information rather than simply concatenating two features along the channel direction. Also, a detail correction module is proposed to rectify mismatched attributes of the synthetic image, and to reconstruct text-unrelated contents. Finally, we propose a new metric for evaluating manipulation results, in terms of both the generation of text-related attributes and the reconstruction of text-unrelated contents. Extensive experiments on benchmark datasets demonstrate the advantages of our proposed method, regarding the effectiveness of image manipulation and the capability of generating high-quality results.
- Keywords: Image Manipulation, Natural Language, Generative Adversarial Networks