Fine-tuning-free Alignment of Diffusion Models for Text-to-Image Generation

ICLR 2026 Conference Submission 22717 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Diffusion model, Text-to-image generation, Alignment
Abstract: Diffusion models have demonstrated remarkable success in text-to-image generation. While many existing alignment methods primarily focus on fine-tuning pre-trained diffusion models to maximize a given reward function, these approaches require extensive computational resources and may not generalize well across different objectives. In this work, we propose a novel fine-tuning-free alignment framework by leveraging the underlying nature of the alignment problem---sampling from reward-weighted distributions. Moreover, we give an in-depth discussion of adopting current guidance methods for text-to-image alignment. We identify a fundamental challenge: the adversarial nature of the guidance term can introduce undesirable artifacts in the generated images. To address this, we propose a regularization strategy that stabilizes the guidance signal. We evaluate our approach on a text-to-image benchmark and demonstrate comparable performance to state-of-the-art models with one-step generation while achieving at least a 60% reduction in computational cost.
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 22717