ReasonEdit: Towards Reasoning-Enhanced Image Editing Models

04 Sept 2025 (modified: 13 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Unified model; Understanding; Image Editing
Abstract: Recent advances in image editing models have shown remarkable progress. A common architectural design couples a multimodal large language model (MLLM) encoder with a diffusion decoder, as seen in systems such as Step1X-Edit and Qwen-Image-Edit, where the MLLM encodes both the reference image and the instruction but remains frozen during training. In this work, we demonstrate that unlocking the reasoning capabilities of the MLLM can further push the boundaries of editing models. Specifically, we explore two reasoning mechanisms, thinking and reflection, which enhance instruction understanding and editing accuracy. Building on these mechanisms, our proposed framework performs image editing in a thinking–editing–reflection loop: the thinking mechanism leverages the world knowledge of the MLLM to interpret abstract instructions, while the reflection mechanism reviews editing results and automatically corrects unintended manipulations. Extensive experiments demonstrate that our reasoning approach achieves significant performance gains over the base model, with improvements on ImgEdit (+0.7%), GEdit (+1.4%), and Kris (+2.5%), and also outperforms previous open-source methods on both Kris and ImgEdit.
Primary Area: generative models
Submission Number: 1891