Abstract: Despite the recent success of image style transfer with Generative Adversarial Networks (GANs), the task remains challenging because of its requirement for large volumes of style image data. In this work, we present a deep model called CycleTransformer that optimizes the mapping between a content image and a single style image by combining the strengths of transformer encoders and generative adversarial networks, advocating patch-level operations throughout. Our network contains a Multi-level Patch Transformer encoder (MPT) that effectively exploits style features at multiple scales. We combine the patch-based features with global feature maps to avoid overfitting to local style patterns, and feed them to a dynamic filtering decoder that adapts to different styles when generating the final result. Furthermore, we use a cycle-consistent training scheme to balance content preservation against stylization strength. Experiments and a user study confirm that our method substantially outperforms state-of-the-art style transfer methods when the style and content domains each contain only a single image.
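To make the described pipeline concrete, below is a minimal PyTorch sketch of the two ideas the abstract names: multi-level patch encoding fused with a global feature map, and a cycle-consistent reconstruction objective. The module name `MPTEncoder`, the generator names `G_cs`/`G_sc`, and all hyperparameters (patch sizes, embedding width, depth) are illustrative assumptions rather than the paper's released implementation; the dynamic filtering decoder and the adversarial losses are omitted for brevity.

```python
# Sketch only: names and hyperparameters are assumptions, not the
# authors' code. Shows (1) patch tokens at several scales fed through
# transformer encoders and fused with a global feature map, and
# (2) a cycle-consistency loss between two generators.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MPTEncoder(nn.Module):
    """Embeds an image at multiple patch scales, encodes each scale's
    tokens with a transformer, and fuses them with a global feature map."""
    def __init__(self, patch_sizes=(4, 8, 16), dim=256, depth=2, heads=4):
        super().__init__()
        # One strided conv per scale turns the image into patch tokens.
        self.patch_embeds = nn.ModuleList(
            nn.Conv2d(3, dim, kernel_size=p, stride=p) for p in patch_sizes
        )
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.transformers = nn.ModuleList(
            nn.TransformerEncoder(layer, depth) for _ in patch_sizes
        )
        # Global branch guards against overfitting to local style patterns.
        self.global_conv = nn.Conv2d(3, dim, kernel_size=3, padding=1)
        self.fuse = nn.Conv2d(dim * (len(patch_sizes) + 1), dim, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [self.global_conv(x)]                 # global feature map
        for embed, tr in zip(self.patch_embeds, self.transformers):
            tokens = embed(x)                         # (B, C, h/p, w/p)
            b, c, gh, gw = tokens.shape
            seq = tokens.flatten(2).transpose(1, 2)   # (B, N, C) patch tokens
            seq = tr(seq).transpose(1, 2).reshape(b, c, gh, gw)
            # Upsample each scale back to image resolution before fusion.
            feats.append(F.interpolate(seq, size=(h, w), mode="bilinear",
                                       align_corners=False))
        return self.fuse(torch.cat(feats, dim=1))

def cycle_loss(content, G_cs, G_sc):
    """Stylize the content image, map it back, and penalize deviation
    from the original to preserve content (G_cs, G_sc are hypothetical
    content-to-style and style-to-content generators)."""
    stylized = G_cs(content)
    recovered = G_sc(stylized)
    return F.l1_loss(recovered, content)

# Usage: input spatial size should be divisible by every patch size.
enc = MPTEncoder()
out = enc(torch.randn(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 256, 64, 64])
```

Fusing upsampled per-scale features by concatenation followed by a 1x1 convolution is one plausible reading of "combining patch-based features with global feature maps"; the paper may use a different fusion scheme.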