Abstract: Maintaining content consistency is a central challenge in unpaired image-to-image (I2I) translation. Contrastive learning (CL) is an essential framework for this task: it maximizes the mutual information between corresponding patches of the input and generated images. However, previous CL-based methods overlook background information, so the background of the generated images, as well as the regions where object and background intersect, appears blurred. To tackle these problems, we propose cluster-guided contrastive learning for the I2I task. We leverage the cluster information of features to mine hard negative samples and to distinguish the concept from the detail, without additional parameters. We further provide a rigorous proof concerning the mutual information of our proposed method, showing that it preserves the lower bound of previous work and has lower variance. Our proposed method, CGCUT, achieves state-of-the-art performance on most metrics on three benchmark datasets.
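
For readers unfamiliar with the patch-wise contrastive objective that the abstract builds on, the following is a minimal sketch of a generic InfoNCE loss over patch features. It is not the CGCUT method and does not include the cluster-guided negative mining; the function name `patch_info_nce`, the temperature value, and the toy feature vectors are assumptions made for illustration only. Each query (a patch feature from the generated image) is paired with the key at the same index (the corresponding input patch), and all other keys act as negatives. Under the standard InfoNCE argument, `log(N) - loss` serves as a lower bound on the mutual information between corresponding patches, where `N` is the number of keys.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a non-zero vector to unit L2 norm."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def patch_info_nce(queries, keys, temperature=0.07):
    """Mean InfoNCE loss over patches (illustrative, hypothetical helper).

    queries[i] and keys[i] form the positive pair; every other key is
    treated as a negative for queries[i]. Vectors are assumed non-zero.
    """
    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]
    losses = []
    for i, qi in enumerate(q):
        # Cosine similarities scaled by the temperature.
        logits = [dot(qi, kj) / temperature for kj in k]
        # Numerically stable log-sum-exp over all keys.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the positive key as the target class.
        losses.append(log_sum - logits[i])
    return sum(losses) / len(losses)


# Toy features: three patches with two-dimensional embeddings.
input_patches = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = patch_info_nce(input_patches, input_patches)
shuffled = patch_info_nce(input_patches, input_patches[1:] + input_patches[:1])
```

In this toy setting the loss is close to zero when each generated patch matches its own input patch and grows when the correspondence is broken, which is the behavior the contrastive objective relies on to preserve content.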