Rethink arbitrary style transfer with transformer and contrastive learning

Published: 01 Jan 2024, Last Modified: 08 Apr 2025 · Comput. Vis. Image Underst. 2024 · CC BY-SA 4.0
Abstract: Highlights

- We propose a novel Style Consistency Instance Normalization (SCIN) to capture long-range and non-local style correlation. It aligns the content feature with the style feature itself, rather than with the mean and variance computed by a fixed VGG.
- Since existing methods often generate low-quality stylized images with artifacts, or stylized images with semantic errors, we introduce a novel Instance-based Contrastive Learning (ICL) to learn stylization-to-stylization relations and remove artifacts.
- We analyze the defects of attention-based arbitrary style transfer caused by the fixed VGG and propose a novel Perception Encoder (PE) that captures style information while avoiding excessive attention to the salient classification features of style images.
- Extensive experiments demonstrate that, compared with state-of-the-art methods, our proposed method learns detailed texture and global style correlation and removes artifacts.
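The first highlight contrasts SCIN with the common baseline in which content features are aligned to the per-channel mean and variance of style features extracted by a fixed VGG (Adaptive Instance Normalization, AdaIN). As background, a minimal NumPy sketch of that baseline operation is shown below; this illustrates the statistic-matching approach the paper argues against, not the proposed SCIN, and the function name and shapes are illustrative assumptions.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Baseline AdaIN-style alignment (illustrative, not the proposed SCIN):
    normalize each channel of the content feature map, then rescale and
    shift it with the style feature's per-channel mean and std.
    Inputs are feature maps of shape (C, H, W)."""
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True)
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    # Whiten the content statistics, then impose the style statistics.
    return s_std * (content - c_mean) / (c_std + eps) + s_mean

rng = np.random.default_rng(0)
content = rng.normal(size=(4, 8, 8))                      # hypothetical content features
style = rng.normal(loc=2.0, scale=3.0, size=(4, 8, 8))    # hypothetical style features
out = adain(content, style)
# The output's per-channel statistics now match the style feature's.
print(np.allclose(out.mean(axis=(1, 2)), style.mean(axis=(1, 2)), atol=1e-3))
```

Because this alignment uses only first- and second-order channel statistics, it cannot express long-range or non-local style correlations, which is the gap SCIN is designed to address.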