InEdit-Bench: Benchmarking Intermediate Logical Pathways for Intelligent Image Editing Models

Zhiqiang Sheng; Xumeng Han; Zhiwei Zhang; Zenghui Xiong; Yifan Ding; Aoxiang Ping; Xiang Li; Tong Guo; Yao Mao

17 Sept 2025 (modified: 14 Nov 2025) | ICLR 2026 Conference Withdrawn Submission | CC BY 4.0

Keywords: Intelligent Image Editing Models, Benchmark, Image Editing, Visual Reasoning

TL;DR: InEdit-Bench is the first systematic evaluation benchmark for multi-step image editing and intermediate logical reasoning, covering 16 sub-tasks and 6 evaluation dimensions.

Abstract: Multimodal generative models have made significant strides in image editing, demonstrating impressive performance on a variety of static tasks. However, their proficiency typically does not extend to complex scenarios requiring dynamic reasoning, leaving them ill-equipped to model the coherent, intermediate logical pathways that constitute a multi-step evolution from an initial state to a final one. This capacity is crucial for unlocking a deeper level of procedural and causal understanding in visual manipulation. To systematically measure this critical limitation, we introduce InEdit-Bench, the first evaluation benchmark dedicated to reasoning over intermediate pathways in image editing. InEdit-Bench comprises a meticulously hand-annotated dataset spanning 4 fundamental categories: state transition, dynamic process, temporal sequence, and scientific simulation, which collectively cover 16 distinct sub-tasks. We also propose a suite of 6 evaluation metrics to assess the logical coherence and visual naturalness of the generated pathways, as well as model fidelity to specified or novel path constraints. Our comprehensive evaluation of 14 representative image editing models on InEdit-Bench reveals significant and widespread shortcomings in this domain. By providing a standardized and challenging benchmark, we aim for InEdit-Bench to catalyze research and steer development towards more dynamic, reason-aware, and intelligent multimodal generative models.

Primary Area: datasets and benchmarks

Submission Number: 9423
