Runge-Kutta Approximation and Decoupled Attention for Rectified Flow Inversion and Semantic Editing

ICLR 2026 Conference Submission 17346 Authors

19 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: Diffusion Models, Rectified Flow, Inversion, Semantic Editing
TL;DR: We enhance text-guided semantic editing in rectified flow models by proposing a high-order solver and a decoupled attention mechanism, jointly improving the fidelity-editability balance.
Abstract: Rectified flow (RF) models have recently demonstrated superior generative performance compared to DDIM-based diffusion models. However, in real-world applications they suffer from two major challenges: (1) low inversion accuracy, which undermines consistency with the source image, and (2) entangled multimodal attention in diffusion transformers, which hinders precise attention control. To address the first challenge, we propose an efficient high-order inversion method for rectified flow models based on Runge-Kutta solvers for ordinary differential equations. To tackle the second challenge, we introduce Decoupled Diffusion Transformer Attention (DDTA), a novel mechanism that disentangles text and image attention inside multimodal diffusion transformers, enabling more precise semantic control. Extensive experiments on image reconstruction and text-guided editing tasks demonstrate that our method achieves state-of-the-art performance in terms of fidelity and editability.
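
To make the first contribution concrete: rectified flow inversion integrates the learned ODE dx/dt = v_theta(x, t) from the image latent back to noise, and a higher-order step reduces the discretization error that plain Euler inversion accumulates. The abstract does not specify the solver order or timestep schedule, so the sketch below assumes a second-order (midpoint) Runge-Kutta step; `velocity_model` and `rk2_invert` are hypothetical names, not the paper's API.

```python
import torch

@torch.no_grad()
def rk2_invert(velocity_model, x0, num_steps=28):
    """Minimal sketch: invert an image latent x0 toward noise with a
    second-order Runge-Kutta (midpoint) step on the rectified-flow ODE
    dx/dt = v_theta(x, t), integrating t from 0 (image) to 1 (noise).

    `velocity_model(x, t)` is an assumed wrapper returning the model's
    predicted velocity; the paper's exact solver order and schedule
    are not given in the abstract.
    """
    x = x0
    ts = torch.linspace(0.0, 1.0, num_steps + 1)
    for i in range(num_steps):
        t, t_next = ts[i], ts[i + 1]
        dt = t_next - t
        k1 = velocity_model(x, t)                             # slope at step start
        k2 = velocity_model(x + 0.5 * dt * k1, t + 0.5 * dt)  # slope at midpoint
        x = x + dt * k2                                       # midpoint (RK2) update
    return x  # approximate noise latent that should regenerate x0
```

Relative to an Euler inversion (x = x + dt * k1), the midpoint update costs one extra velocity evaluation per step but has higher-order local accuracy, which is the fidelity gain the abstract attributes to the Runge-Kutta approximation.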
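For the second contribution, multimodal DiT blocks (as in SD3/FLUX-style architectures) run one joint attention over the concatenated text and image token sequence, so text-to-image and image-to-image interactions share a single softmax and cannot be steered independently. The DDTA formulation itself is not detailed in the abstract; the following is only an illustrative sketch of the disentanglement idea, with the function name, the uniform 0.5 mixing weight, and the per-modality split all being assumptions.

```python
import torch
import torch.nn.functional as F

def decoupled_attention_sketch(q, k, v, num_text_tokens):
    """Illustrative sketch of decoupling text and image attention in a
    multimodal DiT block. Instead of one joint softmax over the
    concatenated [text; image] sequence, the modality-specific
    interactions are computed separately so each can be inspected or
    reweighted for editing. Not the paper's exact DDTA mechanism.

    q, k, v: (batch, heads, seq, head_dim), text tokens first.
    """
    n = num_text_tokens
    qt, qi = q[:, :, :n], q[:, :, n:]
    kt, ki = k[:, :, :n], k[:, :, n:]
    vt, vi = v[:, :, :n], v[:, :, n:]

    # Image queries attend to text and image keys via two separate
    # softmaxes, exposing a per-modality handle for semantic control.
    img_from_text = F.scaled_dot_product_attention(qi, kt, vt)
    img_from_img = F.scaled_dot_product_attention(qi, ki, vi)
    text_out = F.scaled_dot_product_attention(qt, kt, vt)

    img_out = 0.5 * (img_from_text + img_from_img)  # assumed mixing weight
    return torch.cat([text_out, img_out], dim=2)
```

The point of the split is that the text-conditioned branch (`img_from_text`) can be edited or rescaled without disturbing the image self-attention branch, which is one plausible reading of how disentangled attention improves the fidelity-editability trade-off.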
Primary Area: generative models
Submission Number: 17346