Keywords: Reinforcement Learning, Diffusion Models, Image Reasoning Generation
Abstract: Recent years have witnessed the remarkable success of diffusion models in generative tasks. However, existing image generation methods still suffer from a significant "reasoning-oriented generation" dilemma.
Motivated by the success of Chain of Thought (CoT) and Reinforcement Learning (RL) in LLMs,
we propose SRRL, a self-reflective RL algorithm for diffusion models that achieves reasoning-driven generation of logically consistent images by performing reflection and iteration across generation trajectories.
Because intermediate samples in the denoising process carry noise, accurate reward evaluation on them is difficult. To address this challenge, SRRL treats the entire denoising trajectory as a single CoT step within a multi-round reflective denoising process and introduces a condition-guided forward process, which enables reflective iteration between CoT steps.
Through SRRL-based iterative diffusion training, we introduce CoT-style image reasoning into generation tasks that must adhere to physical laws or depict unconventional physical phenomena, for the first time.
Notably, experimental results and case studies demonstrate the superior performance of our SRRL algorithm, even compared with advanced T2I models.
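The sketch below illustrates the reflective rollout structure described in the abstract: each full denoising trajectory is treated as one CoT step, the reward is evaluated only on the clean sample at the end of the trajectory, and a condition-guided forward (re-noising) process links consecutive CoT steps. This is a minimal, hypothetical sketch, not the authors' implementation: the `denoiser` and `reward_fn` callables, the DDIM-style update, and the blending rule in `condition_guided_forward` are all illustrative assumptions.

```python
import torch

# Hypothetical stand-ins (assumptions, not the paper's code):
# - denoiser(x, t, cond): predicts the noise eps for a DDPM-style model
# - reward_fn(x0, cond): scores a fully denoised image; rewards are only
#   computed on clean samples, since noisy intermediates are hard to score

def denoise_trajectory(denoiser, x_T, cond, alphas_cumprod):
    """Run a full reverse trajectory x_T -> x_0; the whole trajectory is
    treated as one CoT step, so no reward is queried along the way."""
    x = x_T
    T = len(alphas_cumprod)
    for t in reversed(range(T)):
        a_bar = alphas_cumprod[t]
        a_bar_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        eps = denoiser(x, t, cond)
        # Deterministic DDIM-style update (one common choice; an assumption)
        x0_pred = (x - (1 - a_bar).sqrt() * eps) / a_bar.sqrt()
        x = a_bar_prev.sqrt() * x0_pred + (1 - a_bar_prev).sqrt() * eps
    return x

def condition_guided_forward(x0, cond_img, t, alphas_cumprod, mix=0.5):
    """Sketch of a condition-guided forward (re-noising) process: blend the
    previous clean sample with the conditioning signal before adding noise,
    so the next CoT step can reflect on the previous one. The blending rule
    is an illustrative assumption, not the paper's exact formulation."""
    a_bar = alphas_cumprod[t]
    guided = mix * x0 + (1 - mix) * cond_img
    noise = torch.randn_like(x0)
    return a_bar.sqrt() * guided + (1 - a_bar).sqrt() * noise

def reflective_rollout(denoiser, reward_fn, cond, cond_img,
                       alphas_cumprod, shape, n_rounds=3, t_restart=400):
    """Multi-round reflective denoising: each round is a full trajectory
    (one CoT step); its output is scored, re-noised with condition guidance,
    and refined again. The collected rewards would drive the RL update."""
    x = torch.randn(shape)
    rewards = []
    for _ in range(n_rounds):
        x0 = denoise_trajectory(denoiser, x, cond, alphas_cumprod)
        rewards.append(reward_fn(x0, cond))  # reward on the clean sample only
        x = condition_guided_forward(x0, cond_img, t_restart, alphas_cumprod)
    return x0, rewards
```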
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 15573