Abstract: In this paper, we present a novel approach to picture painting that extends the Intelligent Painter framework by integrating a Latent Diffusion Model (LDM) with text embeddings for text-to-image generation. Like the original Intelligent Painter, which focuses on painting people's imagination, our method operates in the latent space using DDPM sampling and incorporates a mean-conditioned masking strategy. This strategy allows the model to effectively exploit both object inputs and textual descriptions to guide the composition process. Experimental results demonstrate that our approach produces pictures with improved harmonization and semantic consistency compared to previous methods. The proposed framework offers a controlled and flexible solution for generating new pictures that combine user-specified spatial cues with rich textual information, without the need for retraining, further extending the reach of people's imagination. The code for this research is available at https://github.com/Hammond65/Advance-Intelligent-Painter
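To make the mechanism concrete, the sketch below shows what one mean-conditioned masking step inside a latent-space DDPM sampling loop might look like, written against a diffusers-style UNet and scheduler. The abstract does not specify the strategy exactly, so the blending rule (conditioning the masked region on the forward-process mean of the user-placed object latents rather than a freshly noised sample) and all names (`mean_conditioned_masked_step`, `known_latent`, `mask`) are illustrative assumptions, not the authors' implementation.

```python
import torch

def mean_conditioned_masked_step(x_t, t, known_latent, mask,
                                 unet, scheduler, text_emb):
    """One hypothetical mean-conditioned masked DDPM step in latent space.

    x_t          : current noisy latent, shape (B, C, H, W)
    t            : integer diffusion timestep
    known_latent : clean latent encoding of the user-placed objects
    mask         : 1 where object latents are provided, 0 elsewhere
    text_emb     : text embeddings conditioning the denoiser
    """
    # Predict noise for the full latent, conditioned on the text embedding.
    eps = unet(x_t, t, encoder_hidden_states=text_emb).sample

    # Standard DDPM posterior step for the unknown (generated) region.
    x_prev_unknown = scheduler.step(eps, t, x_t).prev_sample

    # Assumed "mean-conditioned" rule: condition the known region on the
    # mean of the forward process at the previous timestep,
    # sqrt(alpha_bar_{t-1}) * known_latent, with no stochastic noise added.
    alpha_bar_prev = scheduler.alphas_cumprod[max(t - 1, 0)]
    x_prev_known = alpha_bar_prev.sqrt() * known_latent

    # Blend: keep the object latents where the mask is set.
    return mask * x_prev_known + (1 - mask) * x_prev_unknown
```

Under this reading, using the forward-process mean instead of a freshly sampled noisy latent keeps the pasted objects stable across denoising steps, which is one plausible route to the improved harmonization the abstract reports.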