Image-Alchemy: Advancing Subject Fidelity in Personalized Text-to-Image Generation

Published: 06 Mar 2025, Last Modified: 14 Apr 2025 · ICLR 2025 DeLTa Workshop Poster · CC BY 4.0
Track: long paper (up to 8 pages)
Keywords: Personalized Image Generation, LoRA Fine-tuning, Catastrophic Forgetting, Text-to-Image Synthesis, Latent Diffusion Models, Deep generative models
Abstract: Recent advances in text-to-image diffusion models, particularly Stable Diffusion, have enabled the generation of highly detailed and semantically rich images. However, personalizing these models to represent novel subjects from only a few reference images remains challenging, often resulting in catastrophic forgetting, overfitting, or large computational overhead. We propose a two-stage pipeline that addresses these limitations. First, we apply LoRA-based fine-tuning to the attention weights within the U-Net of the Stable Diffusion XL (SDXL) model. Next, we use the unmodified SDXL model to generate a generic scene, replacing the subject with its class label. Finally, we selectively insert the personalized subject through a segmentation-driven Img2Img pipeline that uses the trained LoRA weights. This framework isolates the subject encoding from the overall composition, preserving SDXL's broader generative capabilities while integrating the new subject with high fidelity. Our method achieves a DINO similarity score of 0.789 on SDXL, outperforming existing personalized text-to-image approaches.
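The two-stage inference flow described in the abstract can be illustrated with a minimal sketch, assuming a Hugging Face `diffusers` workflow; the model IDs, LoRA checkpoint path, subject token, and prompts below are illustrative placeholders, and the segmentation-mask step described in the paper is elided.

```python
# Hedged sketch of the two-stage pipeline: (1) unmodified SDXL composes a
# generic scene using the subject's class label, (2) an Img2Img pass with the
# subject LoRA loaded re-renders the image to insert the personalized subject.
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

device = "cuda"

# Stage 1: generic scene from the unmodified base model, subject replaced by
# its class label ("dog" instead of the personalized token).
base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to(device)
scene = base(prompt="a dog sitting on a beach at sunset").images[0]

# Stage 2: Img2Img refinement with the trained subject LoRA attached.
img2img = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to(device)
img2img.load_lora_weights("path/to/subject_lora")  # hypothetical LoRA checkpoint

result = img2img(
    prompt="a sks dog sitting on a beach at sunset",  # 'sks' = illustrative subject token
    image=scene,
    strength=0.6,  # controls how strongly the pass may repaint the scene
).images[0]
result.save("personalized_scene.png")
```

In the paper's full method, the second stage is restricted to the segmented subject region rather than repainting the whole image, which is what preserves the base model's composition.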
Submission Number: 71