AMMUNIT: An Attention-Based Multimodal Multi-domain UNsupervised Image-to-Image Translation Framework

Published: 2022 (Last Modified: 05 Nov 2025) · ICANN (2) 2022 · CC BY-SA 4.0
Abstract: We address the open problem of unsupervised multimodal multi-domain image-to-image (I2I) translation using a generative adversarial network with an attention mechanism. Previous works, such as CycleGAN, MUNIT, and StarGAN2, can translate images among multiple domains and generate diverse outputs, but they often introduce unwanted changes to the background. In this paper, we propose a simple yet effective attention-based framework for unsupervised I2I translation. Our framework not only translates only the objects of interest, leaving the background unaltered, but also generates images for multiple domains simultaneously. Unlike recent studies on unsupervised I2I with attention mechanisms that require ground truth for learning attention maps, our approach learns attention maps in an unsupervised manner. Extensive experiments show that our framework is superior to the state-of-the-art baselines.
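The background-preservation property described above is commonly realized by compositing the translator's raw output with the source image under a learned attention map. Below is a minimal NumPy sketch of that composition step; the function name, array shapes, and value ranges are illustrative assumptions, not the paper's actual API.

```python
import numpy as np

def attention_composite(x, y_raw, attn):
    """Blend a translated foreground with the original background.

    x     : source image, shape (H, W, C), values in [0, 1]
    y_raw : raw translator output, same shape as x
    attn  : attention map in [0, 1], shape (H, W, 1); in the paper's
            setting this map is learned without ground-truth supervision

    Where attn -> 1 the translated pixels are kept; where attn -> 0
    the source pixels pass through unchanged, so the background is
    left unaltered.
    """
    return attn * y_raw + (1.0 - attn) * x
```

A fully attended region copies the translation exactly, while a zero-attention region reproduces the input pixel-for-pixel, which is what "translates only the objects of interest" amounts to at the pixel level.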