Abstract: Due to their limited spatial receptive fields, CNN-based style transfer methods struggle to capture the rich, long-range semantic concepts in artworks. Although the transformer offers a fresh solution by modeling long-range dependencies, it suffers from heavy parameter and computation costs, especially in vision tasks. In this paper, we design a compact transformer architecture, AdaFormer, to address this problem; its model size is about 20% smaller than the state-of-the-art transformer for style transfer. Furthermore, we explore adaptive style transfer by letting the content automatically and adaptively select detailed style elements, which encourages outputs that are both appealing and reasonable. We evaluate AdaFormer comprehensively, and the experimental results show the effectiveness and superiority of our approach over existing artistic stylization methods: diverse, plausible stylized images are obtained with better content preservation, more convincing style sfumato, and lower computational complexity.
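The abstract does not specify how the content "selects" style elements; one plausible reading is cross-attention in which content features act as queries over style features as keys and values. Below is a minimal PyTorch sketch of that reading; the module name `AdaptiveStyleAttention` and all dimensions are hypothetical illustrations, not AdaFormer's actual design.

```python
import torch
import torch.nn as nn

class AdaptiveStyleAttention(nn.Module):
    """Hypothetical sketch of content-driven style selection via
    cross-attention: each content token attends over style tokens and
    aggregates the style elements most relevant to it. This illustrates
    the idea in the abstract, not AdaFormer's published architecture."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, content: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        # content: (B, N_c, dim) flattened content feature patches (queries)
        # style:   (B, N_s, dim) flattened style feature patches (keys/values)
        selected, _ = self.attn(query=content, key=style, value=style)
        # A residual connection preserves content structure while injecting
        # the adaptively selected style information.
        return self.norm(content + selected)

# Toy usage: two 8x8 feature maps flattened to 64 tokens of width 256.
content = torch.randn(1, 64, 256)
style = torch.randn(1, 64, 256)
out = AdaptiveStyleAttention(dim=256)(content, style)  # shape: (1, 64, 256)
```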