Abstract: Artistic image creation has recently attracted considerable interest. Style and content are two key components of an artistic image. Existing methods typically learn style from paintings/texts while extracting content from photographs, or alternatively, derive both style and content from texts. In this paper, we revisit the distinctive characteristics of style and content to explore a more suitable approach to artistic image creation. Intuitively, style is usually highly abstract and intricate, which makes it difficult to describe in words; representing style with a painting is therefore often a better choice. Conversely, content is more concrete and tangible, making it more easily describable through text. Based on the above analysis, we propose a novel text-driven painting variation method called PaintDiffusion, which creates new artistic images by rearranging the style elements within a painting to align with the content structures described by a text prompt. PaintDiffusion is built upon the pretrained Stable Diffusion model conditioned on images/texts. To achieve our goal, we first design a condition-seeking strategy to find an optimal condition code that seamlessly integrates the style information from the image condition and the content information from the text condition. Moreover, a style-content calibration strategy is introduced to further refine and enhance details by adjusting the noisy latent code with the gradients of well-designed calibration functions. The obtained condition code, together with the adjusted latent code, collaboratively guides our model to produce the desired artistic images with impressive quality. In addition, PaintDiffusion does not require heavy model training or fine-tuning, significantly reducing the time and resources needed. Extensive experiments verify the effectiveness of our method.
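To make the calibration idea concrete, the sketch below illustrates one generic gradient-guidance step in the spirit described above: the noisy latent is nudged along the gradient of a calibration objective before the scheduler advances the denoising process. It assumes a diffusers-style Stable Diffusion UNet and scheduler; the names `calibrated_step`, `calibration_loss`, and `guidance_scale` are hypothetical placeholders for illustration and are not the paper's actual calibration functions.

```python
# Minimal sketch of adjusting a noisy latent with the gradient of a
# calibration objective (generic guidance; not the paper's exact method).
import torch

def calibrated_step(unet, scheduler, z_t, t, cond_code,
                    calibration_loss, guidance_scale=1.0):
    """One denoising step whose latent is nudged by a calibration gradient.

    cond_code: condition embedding fusing image-style and text-content cues.
    calibration_loss: hypothetical function scoring style/content agreement.
    """
    z_t = z_t.detach().requires_grad_(True)
    # Predict noise with the pretrained UNet under the fused condition code.
    noise_pred = unet(z_t, t, encoder_hidden_states=cond_code).sample
    # Compute the calibration objective and its gradient w.r.t. the latent.
    loss = calibration_loss(z_t, noise_pred)
    grad = torch.autograd.grad(loss, z_t)[0]
    # Adjust the noisy latent along the negative calibration gradient.
    z_t_adj = (z_t - guidance_scale * grad).detach()
    # Advance the sampler from the adjusted latent.
    return scheduler.step(noise_pred, t, z_t_adj).prev_sample
```

Because the adjustment happens at sampling time on a frozen pretrained model, a loop of such steps requires no training or fine-tuning, which is consistent with the efficiency claim in the abstract.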