Abstract: Text-driven style transfer methods leveraging diffusion models have shown impressive creativity, yet they still face challenges in maintaining consistent structure and preserving content. Existing methods often directly concatenate the content and style prompts for prompt-level style injection. However, this coarse-grained style injection strategy inevitably leads to structural deviations in the stylized images. This poses a significant obstacle for professional artists and creators seeking precise artistic editing. In this work, we strive to attain a harmonious balance between content preservation and style transformation. We propose Adaptive Style Incorporation (ASI) to achieve fine-grained feature-level style incorporation. It consists of Siamese Cross-Attention (SiCA), which decouples the single-track cross-attention into a dual-track structure to obtain separate content and style features, and the Adaptive Content-Style Blending (AdaBlending) module, which couples the content and style information in a structure-consistent manner. Experimentally, our method exhibits much better performance in both structure preservation and stylization effects.
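The dual-track idea behind SiCA and AdaBlending can be sketched as follows. This is a minimal illustration, not the paper's implementation: the same image-feature queries attend separately to content-prompt and style-prompt embeddings, and a blending weight (here a hypothetical scalar `alpha`; the actual AdaBlending mechanism is not specified in this abstract) mixes the two resulting feature tracks.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, k, v):
    # standard scaled dot-product cross-attention
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

def siamese_cross_attention(q, k_c, v_c, k_s, v_s):
    # dual-track: identical queries attend to content and style
    # prompt embeddings separately, yielding two feature maps
    f_content = cross_attention(q, k_c, v_c)
    f_style = cross_attention(q, k_s, v_s)
    return f_content, f_style

def ada_blending(f_content, f_style, alpha):
    # hypothetical stand-in for AdaBlending: convex combination
    # of the two tracks; alpha in [0, 1] controls style strength
    return (1.0 - alpha) * f_content + alpha * f_style
```

With `alpha = 0` the output reduces exactly to the content-only track, which is one way to see why a feature-level blend can preserve structure better than concatenating prompts before a single attention pass.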