Abstract: Drag-based editing with pretrained diffusion models provides a precise and flexible way to manipulate foreground objects. Traditional methods directly optimize the input features obtained from DDIM inversion, adjusting them iteratively to guide handle points toward target locations. However, these approaches often suffer from limited accuracy due to the weak representational ability of the features used in motion supervision, as well as inefficiencies caused by the large search space required for point tracking. To address these limitations, we present DragLoRA, a novel framework that integrates LoRA (Low-Rank Adaptation) adapters into the drag-based editing pipeline. To enhance the training of the LoRA adapters, we introduce an additional denoising score distillation loss that regularizes the online model by aligning its output with that of the original model. Additionally, we improve the consistency of motion supervision by adapting the input features with the updated LoRA, yielding more stable and accurate input features for subsequent operations. Building on this, we design an adaptive optimization scheme that dynamically toggles between two modes, prioritizing efficiency without compromising precision. Extensive experiments demonstrate that DragLoRA significantly enhances the control precision and computational efficiency of drag-based image editing. The code for DragLoRA is available at: https://github.com/Sylvie-X/DragLoRA.
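The abstract describes a combined objective: a motion supervision loss that pulls handle-point features toward target locations, plus a denoising score distillation term that keeps the LoRA-adapted model's noise prediction close to the frozen base model's. The sketch below is a hypothetical NumPy illustration of that objective, not the authors' implementation; the function names, the nearest-pixel shift, and the weighting `lam` are assumptions for illustration.

```python
import numpy as np

def motion_supervision_loss(feat, handle, target):
    """L1 distance pulling the feature at a point one unit step along the
    handle->target direction toward the (detached) feature at the handle.
    feat: (H, W, C) feature map; handle/target: (row, col) coordinates.
    The nearest-pixel shift here is a simplification (no bilinear sampling)."""
    d = np.array(target, dtype=float) - np.array(handle, dtype=float)
    d = d / (np.linalg.norm(d) + 1e-8)              # unit step direction
    h = np.array(handle)
    shifted = (h + np.round(d)).astype(int)          # nearest-pixel shift
    return np.abs(feat[shifted[0], shifted[1]] - feat[h[0], h[1]]).sum()

def dsd_loss(eps_lora, eps_base):
    """Denoising score distillation regularizer: align the online (LoRA)
    model's noise prediction with the frozen base model's prediction."""
    return ((eps_lora - eps_base) ** 2).mean()

def total_loss(feat, handle, target, eps_lora, eps_base, lam=1.0):
    # Hypothetical combined objective; lam trades off drag fidelity
    # against staying close to the original model's denoising behavior.
    return motion_supervision_loss(feat, handle, target) + lam * dsd_loss(eps_lora, eps_base)
```

In the actual method, gradients from such a loss would update only the low-rank adapter weights while the base UNet stays frozen; the base model's prediction acts as a detached target, analogous to a stop-gradient.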
Lay Summary: We often want to tweak the shape or position of an object in an image by simply dragging it, but current methods struggle to move those “handles” exactly where you want and can be slow to adjust.
To fix this, we introduce a new method called DragLoRA, which adds a lightweight module to an existing image-generation model so it can learn on the fly how to better follow your drag moves. In addition, we incorporate a strategy that dynamically accelerates the entire process, so you get quick feedback without losing precision.
As a result, DragLoRA lets users drag parts of an image more accurately and much faster than before, making it easier for anyone to reshape or reposition objects simply by pulling on intuitive handle points.
Link To Code: https://github.com/Sylvie-X/DragLoRA
Primary Area: Applications->Computer Vision
Keywords: Computer Vision, Generative Model, Diffusion Model, Image Editing
Submission Number: 6748