Keywords: versatile style transfer, content-style disentanglement
TL;DR: A general framework for image and text-guided style transfer
Abstract: Recent works in versatile style transfer have achieved impressive results in both content preservation and style fidelity. However, optimizing models solely with content and style losses often fails to match the real image distribution, leading to suboptimal stylization quality. In this paper, we propose a novel self-supervised framework, VST-SD, which disentangles content and style representations to enhance stylization performance. Specifically, we separate content and style from the input and train the model to reconstruct the original image. To facilitate effective disentanglement, we leverage feature statistics: a content encoder is designed with perturbation and compression to remove style-related statistics, while a style encoder employs magnitude preservation to capture style-specific information. A cascade of diffusion models is introduced to integrate content and style into new images. To support multi-modal capabilities in versatile style transfer, we construct a paired text-style dataset and design a pipeline enabling flexible, text-guided stylization. Experimental results across artistic, photorealistic, and text-guided stylization demonstrate the effectiveness and versatility of our approach.
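To make the statistics-based disentanglement concrete, below is a minimal, hypothetical sketch (not the paper's released code) of one common way to split a feature map into a content part stripped of style statistics and a style part that keeps only channel-wise statistics; the function names `remove_style_statistics` and `extract_style_statistics` are illustrative assumptions.

```python
# Illustrative sketch only: content/style split via per-channel feature
# statistics, in the spirit of removing style-related statistics for content
# and preserving magnitude information for style. Not the paper's actual code.
import torch


def remove_style_statistics(feat: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Strip per-channel mean/std from a (N, C, H, W) feature map.

    The instance-normalized result keeps spatial structure (content)
    while discarding first- and second-order channel statistics.
    """
    mean = feat.mean(dim=(2, 3), keepdim=True)
    std = feat.std(dim=(2, 3), keepdim=True) + eps
    return (feat - mean) / std


def extract_style_statistics(feat: torch.Tensor) -> torch.Tensor:
    """Summarize a (N, C, H, W) feature map by its channel-wise mean and std."""
    mean = feat.mean(dim=(2, 3))
    std = feat.std(dim=(2, 3))
    return torch.cat([mean, std], dim=1)  # (N, 2C) statistics-only style code


# Usage: the content branch keeps the normalized map, the style branch keeps
# only the statistics; a generator would then be trained to reconstruct the
# original image from the pair (self-supervised reconstruction).
feat = torch.randn(2, 64, 32, 32)
content_code = remove_style_statistics(feat)   # shape (2, 64, 32, 32)
style_code = extract_style_statistics(feat)    # shape (2, 128)
```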
Primary Area: generative models
Submission Number: 5162