Keywords: object detection, noisy label, early learning
TL;DR: We propose a distillation-based training method for object detectors that mitigates both categorization and localization noise by leveraging the early-learning phase to guide robust detector learning across diverse datasets.
Abstract: The performance of learning-based object detection algorithms, which attempt to both classify and locate objects within images, is determined largely by the quality of the annotated dataset used for training. Two types of label noise are prevalent: objects that are incorrectly classified (categorization noise) and inaccurate bounding boxes (localization noise); the two typically occur together in large-scale datasets. In this paper, we propose a distillation-based method for training object detectors that accounts for both categorization and localization noise. The key insight underpinning our method is that the early-learning phenomenon, in which models trained on data with mixed clean and false labels first fit the clean data and only later memorize the false labels, manifests earlier for localization noise than for categorization noise. Our method uses model snapshots from the early-learning phase, before overfitting to noisy labels occurs, as teacher networks. We develop a plug-in module compatible with general object detection architectures and validate its performance against the state of the art on the PASCAL VOC, MS COCO, and VinDr-CXR (medical) detection datasets.
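To make the core idea concrete, below is a minimal PyTorch sketch of early-learning distillation as described in the abstract: snapshot the detector during its early-learning phase and use the frozen snapshot as a teacher for the rest of training. All names (`EarlyTeacherDistiller`, `snapshot_epoch_cls`, `snapshot_epoch_loc`, the assumed detector interface returning per-proposal class logits and box regressions) and hyperparameters are illustrative assumptions, not the paper's actual implementation; the paper's observation that localization noise is memorized earlier is reflected here only by snapshotting the localization teacher earlier.

```python
# Minimal sketch of early-learning distillation (assumed interface:
# detector(images) -> (class_logits, box_regressions) per proposal).
import copy
import torch
import torch.nn.functional as F

class EarlyTeacherDistiller:
    """Snapshot the detector during the early-learning phase and use the
    frozen snapshot as a teacher for the remainder of training."""

    def __init__(self, detector, snapshot_epoch_cls=10, snapshot_epoch_loc=5):
        self.detector = detector
        # Localization noise is memorized earlier, so its teacher is
        # snapshotted earlier than the categorization teacher
        # (hypothetical epochs, chosen for illustration only).
        self.snapshot_epoch_cls = snapshot_epoch_cls
        self.snapshot_epoch_loc = snapshot_epoch_loc
        self.teacher_cls = None
        self.teacher_loc = None

    def maybe_snapshot(self, epoch):
        # Freeze a copy of the current detector at each snapshot epoch.
        if epoch == self.snapshot_epoch_loc and self.teacher_loc is None:
            self.teacher_loc = copy.deepcopy(self.detector).eval()
            for p in self.teacher_loc.parameters():
                p.requires_grad_(False)
        if epoch == self.snapshot_epoch_cls and self.teacher_cls is None:
            self.teacher_cls = copy.deepcopy(self.detector).eval()
            for p in self.teacher_cls.parameters():
                p.requires_grad_(False)

    def distill_loss(self, images, temperature=2.0):
        # Student forward pass.
        s_cls, s_box = self.detector(images)
        loss = images.new_zeros(())
        if self.teacher_cls is not None:
            with torch.no_grad():
                t_cls, _ = self.teacher_cls(images)
            # Soft-label distillation on the classification head.
            loss = loss + F.kl_div(
                F.log_softmax(s_cls / temperature, dim=-1),
                F.softmax(t_cls / temperature, dim=-1),
                reduction="batchmean",
            ) * temperature ** 2
        if self.teacher_loc is not None:
            with torch.no_grad():
                _, t_box = self.teacher_loc(images)
            # Regress toward the early teacher's boxes rather than the
            # possibly mislocalized ground-truth annotations.
            loss = loss + F.smooth_l1_loss(s_box, t_box)
        return loss
```

In a training loop, this distillation term would typically be added to the usual detection loss on the annotated labels, with `maybe_snapshot(epoch)` called at the start of each epoch; how the two terms are weighted is a design choice not specified here.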
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 1703