FINEdits: Precise Image Editing with Inferred Masks and Light Fine-tuning

ICLR 2026 Conference Submission 24865 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: computer vision, generative modeling, diffusion models, image editing
TL;DR: We propose a new image editing method that better preserves unedited content from the original image.
Abstract: Image editing with diffusion models faces a fundamental trade-off between edit fidelity and preservation of unedited regions. Training-free methods often suffer from imperfect inversion that degrades reconstruction quality, while training-based approaches require substantial computational resources and carefully curated datasets. We present FINEdits, a method that addresses these limitations through two key innovations: (1) automatic mask inference from cross-attention maps to explicitly preserve non-edited regions, and (2) lightweight fine-tuning that improves inversion quality without semantic drift. Our masking approach leverages transformer attention to identify editing regions automatically via a parameter-free K-means clustering step, eliminating manual hyperparameter tuning. To counteract the degradation of inversion quality at the early timesteps required for large edits, we introduce a light fine-tuning strategy that balances reconstruction fidelity with semantic preservation. We also introduce EditFFHQ, a new benchmark dataset of 2000 face images with sequential editing instructions, enabling quantitative evaluation of identity preservation and edit quality. Extensive experiments demonstrate that FINEdits achieves superior identity preservation while maintaining competitive edit fidelity and image quality. Our method offers an effective solution for precise image editing that preserves visual consistency without extensive retraining or manual parameter adjustment.
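The abstract does not spell out the mask-inference procedure, but a minimal sketch of the attention-clustering idea might look as follows. It assumes the cross-attention maps for the edited prompt token(s) have already been extracted and averaged over heads and layers into a (timesteps, H, W) array; the function name infer_edit_mask and this input format are hypothetical illustrations, not the authors' API.

```python
import numpy as np
from sklearn.cluster import KMeans

def infer_edit_mask(attn_maps: np.ndarray) -> np.ndarray:
    """Binarize aggregated cross-attention into an edit mask.

    attn_maps: (T, H, W) cross-attention for the edited token(s),
    already averaged over heads/layers (hypothetical input format).
    Returns a boolean (H, W) mask, True on the region to edit.
    """
    # Average over timesteps to get one saliency value per pixel.
    saliency = attn_maps.mean(axis=0)        # (H, W)
    features = saliency.reshape(-1, 1)       # one scalar feature per pixel

    # Two-cluster K-means separates high-attention (edit) pixels from
    # background pixels without a hand-tuned attention threshold.
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
    labels = km.labels_.reshape(saliency.shape)

    # The cluster whose center has the higher mean attention is taken
    # as the edit region.
    edit_cluster = int(km.cluster_centers_.argmax())
    return labels == edit_cluster
```

A mask like this is commonly used during denoising to composite inverted background latents with edited latents, i.e. x_t = m * x_t_edit + (1 - m) * x_t_orig, as in blended-latent-diffusion-style approaches; whether FINEdits blends in exactly this way is not stated in the abstract.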
Primary Area: generative models
Submission Number: 24865