DEIM: DETR with Improved Matching for Fast Convergence
Abstract: We introduce DEIM, an innovative and efficient training framework designed to accelerate convergence in real-time object detection with Transformer-based architectures
(DETR). To mitigate the sparse supervision inherent in one-to-one (O2O) matching in DETR models, DEIM employs
a Dense O2O matching strategy. This approach increases
the number of positive samples per image by incorporating additional targets, using standard data augmentation
techniques. While Dense O2O matching speeds up convergence, it also introduces numerous low-quality matches that
could degrade performance. To address this, we propose the
Matchability-Aware Loss (MAL), a novel loss function that
optimizes matches across various quality levels, enhancing
the effectiveness of Dense O2O. Extensive experiments on
the COCO dataset validate the efficacy of DEIM. When integrated with RT-DETR and D-FINE, it consistently boosts
performance while reducing training time by 50%. Notably, paired with RT-DETRv2, DEIM achieves 53.2% AP
in a single day of training on an NVIDIA 4090 GPU. Additionally, DEIM-trained real-time models outperform leading real-time object detectors, with DEIM-D-FINE-L and DEIM-D-FINE-X achieving 54.7% and 56.5% AP at 124
and 78 FPS on an NVIDIA T4 GPU, respectively, without the need for additional data. We believe DEIM sets a
new baseline for advancements in real-time object detection.
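
To make the Dense O2O idea concrete, the following is a minimal sketch of how combining several images raises the number of targets, and therefore the number of one-to-one positive matches, in a single training sample. It assumes a 2x2 mosaic-style layout as the augmentation; the abstract only says "standard data augmentation techniques", so the choice of mosaic, the helper name `mosaic_targets`, and the normalized (cx, cy, w, h) box format are illustrative assumptions rather than the authors' implementation.

```python
from typing import List, Tuple

# Assumed box format: (cx, cy, w, h), normalized to [0, 1].
Box = Tuple[float, float, float, float]


def mosaic_targets(quadrants: List[List[Box]]) -> List[Box]:
    """Merge boxes from four images into one 2x2 mosaic (hypothetical helper)."""
    if len(quadrants) != 4:
        raise ValueError("expected boxes for exactly four images")
    # Top-left corner of each quadrant in the mosaic's normalized coordinates.
    offsets = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]
    merged: List[Box] = []
    for boxes, (ox, oy) in zip(quadrants, offsets):
        for cx, cy, w, h in boxes:
            # Each source image is shrunk by half, then shifted into its quadrant.
            merged.append((ox + 0.5 * cx, oy + 0.5 * cy, 0.5 * w, 0.5 * h))
    return merged


# Four source images holding one or two objects each.
images = [
    [(0.5, 0.5, 0.4, 0.4)],
    [(0.3, 0.3, 0.2, 0.2), (0.7, 0.7, 0.2, 0.2)],
    [(0.5, 0.5, 1.0, 1.0)],
    [(0.2, 0.8, 0.1, 0.3)],
]
dense = mosaic_targets(images)
print(len(dense))  # number of targets in the combined sample
```

Under one-to-one matching each target is assigned to at most one query, so the combined sample offers five positives where any single source image offered one or two. The merged boxes are also smaller, which hints at why denser targets can produce harder, lower-quality matches.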