Abstract: The goal of image-to-image translation (I2I) is to translate images from one domain to another while preserving their content representation.
A popular method for I2I translation involves the use of a reference image to guide the transformation process.
However, most architectures fail to preserve the main characteristics of the input and produce images that are too close to the reference during style transfer.
To avoid this problem, we propose a novel architecture able to perform source-coherent translation across multiple domains.
Our goal is to preserve the details of the input during I2I translation by weighting the style code extracted from the reference images before applying it to the source image. To this end, we mask the reference images in an unsupervised way before extracting the style from them. This better preserves the characteristics of the input during style transfer and, as a result, also increases the diversity of the images generated from the same reference. Additionally, the adaptive normalization layers commonly used to inject style into the model are replaced with an attention mechanism to increase the quality of the generated images.
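As a rough illustration of the masked style extraction and style-code weighting described above, the following is a minimal sketch, not the paper's implementation: the mask predictor, the style encoder, the fixed blending weight `alpha`, and the idea of mixing the reference style with a source style code are all assumptions made for clarity.

```python
import torch
import torch.nn as nn

class MaskedStyleExtractor(nn.Module):
    """Sketch: extract a style code from a reference image after masking it,
    then weight that code before it conditions the generator.
    The mask predictor, encoder, and blending weight `alpha` are hypothetical
    stand-ins for the components described in the abstract."""

    def __init__(self, style_dim=64, alpha=0.5):
        super().__init__()
        self.alpha = alpha  # weight applied to the reference style code (assumption)
        # Tiny CNN that predicts a soft mask over the reference (unsupervised)
        self.mask_net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )
        # Style encoder: pools the masked reference into a style vector
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, style_dim),
        )

    def forward(self, reference, source_style):
        mask = self.mask_net(reference)        # (B, 1, H, W) soft mask
        masked_ref = reference * mask          # suppress regions irrelevant to style
        ref_style = self.encoder(masked_ref)   # style code from the masked reference
        # Weight the reference style before it is applied to the source image
        return self.alpha * ref_style + (1 - self.alpha) * source_style


# Usage sketch with random tensors standing in for real images/codes
extractor = MaskedStyleExtractor()
ref = torch.randn(2, 3, 128, 128)
src_style = torch.randn(2, 64)
style = extractor(ref, src_style)  # (2, 64) weighted style code for the generator
```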
Several experiments are performed on the CelebA-HQ and AFHQ datasets to prove the efficacy of the proposed system. Quantitative results, measured with the LPIPS and FID metrics, demonstrate the superiority of the proposed architecture over the state of the art.