Highlights
• Our method enables arbitrary text-guided style transfer with unpaired training data.
• Our cross-space modulation module is able to bridge the VGG and CLIP feature spaces.
• The CLIP-based style contrastive loss helps in learning better style representations.
• Extensive experiments are conducted to verify the effectiveness of our method.
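The highlights do not spell out the form of the CLIP-based style contrastive loss. As an illustration only (not the paper's definition), a common choice is a symmetric InfoNCE objective over matched pairs of CLIP embeddings, where a stylized result and its style text prompt form a positive pair and the other prompts in the batch serve as negatives. The sketch below assumes PyTorch; the tensor names image_style_emb and text_style_emb are hypothetical.

    import torch
    import torch.nn.functional as F

    def style_contrastive_loss(image_style_emb, text_style_emb, temperature=0.07):
        # image_style_emb: (B, D) CLIP image embeddings of the stylized outputs (hypothetical name)
        # text_style_emb:  (B, D) CLIP text embeddings of the target style prompts (hypothetical name)
        img = F.normalize(image_style_emb, dim=-1)
        txt = F.normalize(text_style_emb, dim=-1)
        logits = img @ txt.t() / temperature          # (B, B) cosine-similarity logits
        targets = torch.arange(img.size(0), device=img.device)
        # Symmetric cross-entropy: matched (image, text) pairs are positives,
        # all other pairs in the batch act as negatives.
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))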