Keywords: Diffusion models, Style transfer
Abstract: Recently, diffusion models have demonstrated superior performance in text-guided image style transfer. However, there is a fundamental trade-off between transforming style and maintaining content in diffusion models. Although a simple remedy would be to use a deterministic sampling scheme such as the denoising diffusion implicit model (DDIM), which guarantees perfect reconstruction, it requires computationally expensive fine-tuning of the diffusion model. To address this, here we present a text-guided sampling scheme that leverages a patch-wise contrastive loss. By exploiting the contrastive loss between the samples and the original images, our diffusion model can generate an image with the same semantic content as the source image. Experimental results demonstrate that our approach outperforms existing methods while maintaining content and requiring no additional training of the diffusion model.
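The abstract describes the method only at a high level; the snippet below is a minimal, hypothetical PyTorch sketch of how a patch-wise contrastive (CUT-style PatchNCE) loss could guide a deterministic DDIM step. It assumes a pretrained text-conditional noise predictor `denoiser(x_t, t, text_emb)` and a feature extractor `feat(x)` returning a list of spatial feature maps; these names, the exact loss form, and the guidance weight `lam` are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: contrastive-loss-guided DDIM sampling (not the authors' code).
import torch
import torch.nn.functional as F

def patch_nce_loss(feats_gen, feats_src, tau=0.07, n_patches=256):
    """CUT-style patch-wise contrastive loss between matching spatial locations
    of generated-sample features and source-image features."""
    loss = 0.0
    for fg, fs in zip(feats_gen, feats_src):
        b, c, h, w = fg.shape
        fg = fg.flatten(2).permute(0, 2, 1)               # (B, HW, C)
        fs = fs.flatten(2).permute(0, 2, 1)
        idx = torch.randperm(h * w, device=fg.device)[:n_patches]
        q = F.normalize(fg[:, idx], dim=-1)               # queries from the sample
        k = F.normalize(fs[:, idx], dim=-1)               # keys from the source image
        logits = q @ k.transpose(1, 2) / tau              # (B, P, P); positives on the diagonal
        labels = torch.arange(q.size(1), device=fg.device).expand(b, -1)
        loss = loss + F.cross_entropy(logits.flatten(0, 1), labels.flatten())
    return loss / len(feats_gen)

def contrastive_guided_ddim_step(x_t, t, t_prev, src_img, denoiser, feat,
                                 alphas_cumprod, text_emb, lam=1.0):
    """One deterministic DDIM step with a gradient nudge from the contrastive loss,
    so the text prompt drives the style while patch semantics follow the source image."""
    a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
    eps = denoiser(x_t, t, text_emb)                       # predicted noise
    x0_pred = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()  # predicted clean image
    with torch.enable_grad():                              # guidance gradient w.r.t. x0
        x0 = x0_pred.detach().requires_grad_(True)
        loss = patch_nce_loss(feat(x0), feat(src_img))
        grad = torch.autograd.grad(loss, x0)[0]
    x0_pred = x0_pred - lam * grad                         # steer x0 toward source content
    return a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * eps
```

In this sketch the guidance is applied to the predicted clean image at each step, so the pretrained diffusion model stays frozen, which is consistent with the abstract's claim of requiring no additional training.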
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Generative models