ReFOCUS: Recurrent False Object Correction Using guidance Strategies in Object Detection

Quentin Bourbon; Timothée Blondiaux; Joël Tang; Laurent Lam; Seif Edinne LAATIRI

26 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025. License: CC BY 4.0

Keywords: Object Detection, False Positive, Computer Vision, Recurrent Errors, Correction

Abstract: This work addresses recurrent false positive classifications in object detection. We consider two experimental setups that imitate real-world scenarios leading to such errors: i) erroneous annotations, and ii) non-objects that resemble actual objects. We show that the resulting models can be corrected efficiently using a two-step protocol that leverages false positive annotations. For the first step, we present and compare two correction approaches that guide false positives toward true negatives, in either the latent or the logit space. The second step consists of standard continuous fine-tuning on correct annotations. The latent guidance framework relies on a decoder that maps the bounding box of a given false positive to its target true negative embedding. The decoder is trained as part of an autoencoder, in which appropriate true negative samples are generated by a learnable Gaussian mixture model in the latent space. By leveraging the properties of the Wasserstein distance, the mixture model is optimized through standard backpropagation. In both experimental setups, the two correction methods significantly outperform standard continuous fine-tuning on correct annotations and are competitive with models retrained from scratch on correct annotations. In particular, in the second experimental setup, the latent guidance framework consistently outperforms these retrained models, improving detection performance at the cost of additional false positive annotations. The proposed techniques also prove effective in a few-shot learning context.
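To make the idea of guiding false positives toward true negatives in the logit space more concrete, here is a minimal sketch. It is not the authors' method: the abstract does not specify the loss, so the sketch assumes a plain cross-entropy toward a hypothetical background class (index 0), and it descends directly on the logits of one detection rather than on model parameters. All function names and values are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def guidance_loss_and_grad(logits, background_index):
    """Cross-entropy toward the background class and its gradient w.r.t. the logits.

    Assumption: 'guiding toward a true negative' is modelled here as relabelling
    the false positive as background. The paper may use a different objective.
    """
    probs = softmax(logits)
    loss = -math.log(probs[background_index])
    # Gradient of softmax cross-entropy: probabilities minus the one-hot target.
    grad = [p - (1.0 if i == background_index else 0.0) for i, p in enumerate(probs)]
    return loss, grad


def guide_logits(logits, background_index, lr=0.5, steps=20):
    """Toy gradient descent on the logits themselves (a real detector would
    backpropagate this loss into its weights instead)."""
    current = list(logits)
    for _ in range(steps):
        _, grad = guidance_loss_and_grad(current, background_index)
        current = [z - lr * g for z, g in zip(current, grad)]
    return current


# Hypothetical false positive: class 1 is confidently predicted, index 0 is background.
fp_logits = [0.0, 4.0, 1.0]
before = softmax(fp_logits)[0]
after = softmax(guide_logits(fp_logits, background_index=0))[0]
print(f"background probability before: {before:.3f}, after: {after:.3f}")
```

In the two-step protocol described above, a correction of this kind would correspond only to the first step; the second step would then be ordinary fine-tuning on correct annotations.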

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 7231
