High-Fidelity Object Removal through Boosting Diffusion Processes

Published: 2025, Last Modified: 08 Jan 2026, ISCAS 2025, CC BY-SA 4.0
Abstract: The remarkable image understanding and generation capabilities of diffusion models have made image editing a highly promising area of research. As a significant subtask within the field, object removal aims to remove objects from a specified region and fill the missing pixels with visually coherent and semantically sound content. Despite the great progress made in deep generative models, research in this area still faces two challenges: (i) the high cost of model training, caused by data scarcity and the difficulty of collecting real data at scale; and (ii) the inability of current training-free methods to substantially change the attention-layer behavior that was established during pre-training for text-guided object inpainting. In this paper, we introduce HybridRemover, a two-stage diffusion scheme. We decouple the task into two subtasks: removing the specified objects from the target region, and restoring the image in the target region. By fine-tuning the SD-Inpainting model on a very small amount of data, we turn it into a model that focuses only on completely removing the object, without considering its effects on the surroundings; the overall repair of the image is then handled by a second SD-Inpainting model cascaded after it. Our method achieves state-of-the-art performance on object removal tasks and delivers exceptional results even under strong perspective distortion.
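The two-stage cascade described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `two_stage_remove`, `composite`, `toy_remover`, and `toy_restorer` are hypothetical, the two stages are stand-in callables rather than diffusion models, and pasting generated pixels back only inside the mask is an assumption made for illustration.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel values
Mask = List[List[int]]     # 1 marks the target region, 0 keeps the pixel
Stage = Callable[[Image, Mask], Image]


def composite(original: Image, generated: Image, mask: Mask) -> Image:
    """Keep original pixels outside the mask, generated pixels inside it."""
    return [
        [g if m else o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


def two_stage_remove(image: Image, mask: Mask,
                     remover: Stage, restorer: Stage) -> Image:
    """Run a removal stage, then a restoration stage, on the same region."""
    # Stage 1 (assumed role: the fine-tuned model) erases the object.
    removed = composite(image, remover(image, mask), mask)
    # Stage 2 (assumed role: the cascaded inpainting model) repairs the region.
    return composite(image, restorer(removed, mask), mask)


def toy_remover(image: Image, mask: Mask) -> Image:
    """Stand-in for stage 1: blank out every pixel."""
    return [[0.0 for _ in row] for row in image]


def toy_restorer(image: Image, mask: Mask) -> Image:
    """Stand-in for stage 2: fill with the mean of the unmasked pixels."""
    kept = [p for row, m_row in zip(image, mask)
            for p, m in zip(row, m_row) if not m]
    mean = sum(kept) / len(kept)
    return [[mean for _ in row] for row in image]


image = [[1.0, 1.0], [1.0, 9.0]]  # the 9.0 pixel plays the "object"
mask = [[0, 0], [0, 1]]
result = two_stage_remove(image, mask, toy_remover, toy_restorer)
```

In practice each stand-in would be replaced by a call to an inpainting model; the point of the sketch is only that the second stage receives the output of the first stage together with the same mask.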