Personalized Text-to-Image Generation with Attribute Disentanglement and Feature Embedding

01 Mar 2025 (modified: 02 Mar 2025) · XJTU 2025 CSUC Submission · CC BY 4.0
Keywords: Text-to-Image, Diffusion, ProSpect
Abstract: Text-to-image generation has made remarkable progress in recent years, yet personalized and consistent generation remains a significant challenge. In this paper, we propose a framework that combines ProSpect-inspired multi-stage learning and attribute disentanglement with feature extraction and embedding techniques to address this challenge. Our method leverages a diffusion-based architecture to generate high-fidelity images conditioned on both textual prompts and user-provided reference images. The multi-stage approach progressively refines the image, first aligning global structure, then specific attributes, and finally fine-grained details. Attribute disentanglement enables precise control over visual characteristics such as style, color, and structure, while the feature extraction and embedding mechanisms ensure that user-specific concepts are represented accurately. Our approach requires only a single reference image, making it practical and scalable. Extensive experiments demonstrate that our method outperforms existing approaches in image quality, personalization accuracy, and cross-generation consistency. The framework also offers strong editability, allowing users to modify specific attributes without compromising overall quality. This work advances the state of the art in text-to-image generation, providing a robust and flexible solution for personalized and consistent image creation in creative applications.
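To make the multi-stage conditioning idea concrete, the sketch below shows how a denoising loop might select a stage-specific prompt embedding at each timestep, so that early (noisy) steps steer global layout and later steps steer attributes and fine detail. This is a minimal illustrative sketch, not the authors' implementation: the diffusers-style calls (`unet(..., encoder_hidden_states=...)`, `scheduler.step(...)`, `scheduler.timesteps`) and the timestep-to-stage mapping rule are assumptions made for illustration.

```python
import torch

def select_stage_embedding(t, num_train_steps, stage_embeddings):
    """Pick the prompt embedding for the stage that timestep t falls into.

    Following the ProSpect intuition, stage 0 (earliest, noisiest steps)
    conditions global structure; later stages condition attributes and
    fine-grained detail. The linear mapping below is an assumption.
    """
    num_stages = len(stage_embeddings)
    # scheduler timesteps run from high (noisy) to low (clean), so map
    # t in [num_train_steps, 0] onto a stage index in [0, num_stages - 1].
    stage = min(int((1 - t / num_train_steps) * num_stages), num_stages - 1)
    return stage_embeddings[stage]

@torch.no_grad()
def sample(unet, scheduler, stage_embeddings, shape, num_train_steps=1000):
    """Stage-conditioned DDPM-style sampling loop (illustrative only)."""
    x = torch.randn(shape)  # start from pure Gaussian noise
    for t in scheduler.timesteps:
        # Swap in the prompt embedding learned for the current stage.
        cond = select_stage_embedding(int(t), num_train_steps, stage_embeddings)
        eps = unet(x, t, encoder_hidden_states=cond).sample  # predict noise
        x = scheduler.step(eps, t, x).prev_sample            # one denoising step
    return x
```

In this sketch, `stage_embeddings` would hold one learned embedding per stage (e.g., structure, attributes, detail), each obtained from the reference image via the feature extraction and embedding mechanism described in the abstract.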
Submission Number: 30