Abstract: In this paper, we present a novel approach to picture painting that extends the Intelligent Painter framework by integrating a Latent Diffusion Model (LDM) with text embeddings for text-to-image generation. Like the original Intelligent Painter, which focuses on painting people's imagination, our method operates in the latent space using DDPM sampling and incorporates a mean-conditioned masking strategy. This strategy allows the model to effectively exploit both object inputs and textual descriptions to guide the composition process. Experimental results demonstrate that our approach produces pictures with improved harmonization and semantic consistency compared to previous methods. The proposed framework offers a controlled and flexible solution for generating new pictures that combine user-specified spatial cues with rich textual information, without the need for retraining, further extending the reach of people's imagination. The code for this research is available at https://github.com/Hammond65/Advance-Intelligent-Painter
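To make the mechanism concrete, the sketch below shows what one mean-conditioned masking step inside a latent-space DDPM sampling loop might look like, written against a diffusers-style UNet and scheduler. The abstract does not specify the strategy exactly, so the blending rule (conditioning the masked region on the forward-process mean of the user-placed object latents rather than a freshly noised sample) and all names (`mean_conditioned_masked_step`, `known_latent`, `mask`) are illustrative assumptions, not the authors' implementation.

```python
import torch

def mean_conditioned_masked_step(x_t, t, known_latent, mask,
                                 unet, scheduler, text_emb):
    """One hypothetical mean-conditioned masked DDPM step in latent space.

    x_t          : current noisy latent, shape (B, C, H, W)
    t            : integer diffusion timestep
    known_latent : clean latent encoding of the user-placed objects
    mask         : 1 where object latents are provided, 0 elsewhere
    text_emb     : text embeddings conditioning the denoiser
    """
    # Predict noise for the full latent, conditioned on the text embedding.
    eps = unet(x_t, t, encoder_hidden_states=text_emb).sample

    # Standard DDPM posterior step for the unknown (generated) region.
    x_prev_unknown = scheduler.step(eps, t, x_t).prev_sample

    # Assumed "mean-conditioned" rule: condition the known region on the
    # mean of the forward process at the previous timestep,
    # sqrt(alpha_bar_{t-1}) * known_latent, with no stochastic noise added.
    alpha_bar_prev = scheduler.alphas_cumprod[max(t - 1, 0)]
    x_prev_known = alpha_bar_prev.sqrt() * known_latent

    # Blend: keep the object latents where the mask is set.
    return mask * x_prev_known + (1 - mask) * x_prev_unknown
```

Under this reading, using the forward-process mean instead of a freshly sampled noisy latent keeps the pasted objects stable across denoising steps, which is one plausible route to the improved harmonization the abstract reports.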