Background-focused contrastive learning for unpaired image-to-image translation

Published: 01 Jan 2024, Last Modified: 15 Feb 2025J. Electronic Imaging 2024EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Contrastive learning for unpaired image-to-image translation (CUT) aims to learn a mapping from source to target domain with an unpaired dataset, which combines contrastive loss to maximize the mutual information between real and generated images. However, the existing CUT-based methods exhibit unsatisfactory visual quality due to the wrong locating of objects and backgrounds, particularly where it incorrectly transforms the background to match the object pattern in layout-changing datasets. To alleviate the issue, we present background-focused contrastive learning for unpaired image-to-image translation (BFCUT) to improve the background’s consistency between real and its generated images. Specifically, we first generate heat maps to explicitly locate the objects and backgrounds for subsequent contrastive loss and global background similarity loss. Then, the representative queries of objects and backgrounds rather than randomly sampling queries are selected for contrastive loss to promote reality of objects and maintenance of backgrounds. Meanwhile, global semantic vectors with less object information are extracted with the help of heat maps, and we further align the vectors of real images and their corresponding generated images to promote the maintenance of the backgrounds in global background similarity loss. Our BFCUT alleviates the wrong translation of backgrounds and generates more realistic images. Extensive experiments on three datasets demonstrate better quantitative results and qualitative visual effects.
Loading