Dragging with Geometry: From Pixels to Geometry-Guided Image Editing

ICLR 2026 Conference Submission 12321 Authors

Published: 26 Jan 2026, Last Modified: 26 Jan 2026 · ICLR 2026 · CC BY 4.0
Keywords: Diffusion Model; Drag-based Image Editing
Abstract: Interactive point-based image editing serves as a controllable editor, enabling precise and flexible manipulation of image content. However, previous methods predominantly operate on the 2D pixel plane, neglecting the underlying 3D geometric structure. As a result, they often produce imprecise and inconsistent edits, particularly in geometry-intensive scenarios such as rotations and perspective transformations. To address these limitations, we propose GeoDrag, a novel geometry-guided drag-based image editing method that tackles three key challenges: 1) incorporating 3D geometric cues into pixel-level editing, 2) mitigating discontinuities caused by geometry-only guidance, and 3) resolving conflicts arising from multi-point dragging. Built upon a unified displacement field that jointly encodes 3D geometry and 2D spatial priors, GeoDrag enables coherent, high-fidelity, and structure-consistent editing in a single forward pass. In addition, a conflict-free partitioning strategy is introduced to isolate editing regions, effectively preventing interference and ensuring consistency. Extensive experiments across various editing scenarios validate the effectiveness of our method, showing superior precision, structural consistency, and reliable multi-point editability. Our code and models will be released publicly. Project page: https://geodrag-site.github.io.
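To make the abstract's two core ideas concrete, here is a minimal, purely illustrative Python sketch of (a) a displacement field that blends a 2D spatial prior with a depth-based (3D geometric) prior, and (b) a simple nearest-handle partition to keep multi-point drags from interfering. All function names, the Gaussian/depth weighting, and the Voronoi-style partition are assumptions made for illustration; they are not the paper's actual formulation.

```python
import numpy as np

def unified_displacement_field(handles, targets, depth, alpha=0.5, sigma=30.0):
    """Illustrative sketch: blend a 2D planar drag displacement with a
    depth-aware (3D-informed) weighting into one dense displacement field.
    `handles`/`targets` are (N, 2) arrays of (row, col) drag points;
    `depth` is an (H, W) depth map. The blending rule below is a stand-in,
    not the GeoDrag formulation."""
    H, W = depth.shape
    ys, xs = np.mgrid[0:H, 0:W]
    field = np.zeros((H, W, 2), dtype=np.float32)
    weight_sum = np.zeros((H, W), dtype=np.float32)

    for (hy, hx), (ty, tx) in zip(handles, targets):
        drag = np.array([ty - hy, tx - hx], dtype=np.float32)  # 2D drag vector
        # 2D spatial prior: Gaussian falloff around the handle point
        spatial = np.exp(-((ys - hy) ** 2 + (xs - hx) ** 2) / (2 * sigma ** 2))
        # 3D geometric prior: pixels at similar depth to the handle move together
        geom = np.exp(-np.abs(depth - depth[hy, hx]))
        w = alpha * geom + (1 - alpha) * spatial
        field += w[..., None] * drag
        weight_sum += w

    return field / np.maximum(weight_sum, 1e-6)[..., None]


def conflict_free_partition(handles, shape):
    """Assign each pixel to its nearest handle so multi-point edits do not
    overwrite each other (a simple Voronoi partition; again, only a
    placeholder for the paper's partitioning strategy)."""
    H, W = shape
    ys, xs = np.mgrid[0:H, 0:W]
    dists = [(ys - hy) ** 2 + (xs - hx) ** 2 for hy, hx in handles]
    return np.argmin(np.stack(dists), axis=0)  # (H, W) partition labels


if __name__ == "__main__":
    depth = np.random.rand(64, 64).astype(np.float32)
    handles = np.array([[20, 20], [40, 50]])
    targets = np.array([[25, 30], [35, 45]])
    field = unified_displacement_field(handles, targets, depth)
    labels = conflict_free_partition(handles, depth.shape)
    print(field.shape, labels.shape)  # (64, 64, 2) (64, 64)
```

The sketch only illustrates how a geometric cue (depth similarity) and a 2D spatial prior could be fused into a single dense field and how disjoint regions could be assigned per handle; the actual method operates within a diffusion-based editing pipeline as described in the abstract.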
Supplementary Material: zip
Primary Area: generative models
Submission Number: 12321