Enhancing DETRs for Small Object Detection via Multi-Scale Refinement and Query-Aided Mining

Published: 05 Sept 2024, Last Modified: 16 Oct 2024, ACML 2024 Conference Track, CC BY 4.0
Keywords: small object detection, Transformer-based models, multi-scale information, auxiliary positive queries
TL;DR: A novel small object detector, MRQM, enhances detection capabilities through Multi-scale Refinement and Query-aided Mining, demonstrating superior performance on SODA-D and VisDrone datasets.
Abstract: Small object detection (SOD) aims to precisely localize and accurately classify objects that have limited spatial extent and few discernible features. Despite significant advances in object detection driven by CNN-based and Transformer-based methods, SOD remains challenging. This is primarily because small objects occupy very few pixels and lack distinctive features, which makes both computational efficiency and effective supervision difficult. In particular, Transformer-based detectors suffer from the high computational cost introduced by a feature pyramid network (FPN) and from sparse supervision of the encoder output caused by insufficient positive queries. Current approaches attempt to mitigate these issues through sparse attention mechanisms and auxiliary one-to-many label assignment strategies. However, these approaches often still process multi-scale information inefficiently and fail to generate enough positive queries for small objects. To address these issues, we propose a novel small object detector, MRQM, which integrates Multi-scale Refinement and Query-aided Mining. The scale-aware encoder refines features across multiple scales from a bi-directional feature pyramid network (BiFPN) through iterative updates. This process not only reduces redundant computation but also enhances the feature representation at each scale. Furthermore, the IoU-aware head combines a dynamic anchor mining strategy with one-to-many label assignment to mine high-quality auxiliary positive queries for small instances and to mitigate the sparse supervision of the encoder. Extensive experiments on the SODA-D and VisDrone datasets demonstrate the effectiveness of MRQM.
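For readers unfamiliar with the one-to-many label assignment mentioned in the abstract, the following is a minimal, generic sketch of the idea: each ground-truth box may be matched to several candidate boxes, selected by IoU, rather than to exactly one. It is not the MRQM implementation. The function names (`iou`, `one_to_many_assign`), the `iou_threshold` and `top_k` parameters, and the selection rule are all assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical convention


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def one_to_many_assign(
    candidates: Sequence[Box],
    gt_boxes: Sequence[Box],
    iou_threshold: float = 0.5,  # assumed value, not taken from the paper
    top_k: int = 3,              # assumed value, not taken from the paper
) -> Dict[int, List[int]]:
    """Map each ground-truth index to up to `top_k` candidate indices.

    A candidate counts as positive for a ground-truth box when its IoU
    with that box is at least `iou_threshold`. Candidates are ranked by
    descending IoU, with ties broken by candidate index.
    """
    assignments: Dict[int, List[int]] = {}
    for g, gt in enumerate(gt_boxes):
        scored = [(iou(c, gt), i) for i, c in enumerate(candidates)]
        kept = [(s, i) for s, i in scored if s >= iou_threshold]
        kept.sort(key=lambda t: (-t[0], t[1]))
        assignments[g] = [i for _, i in kept[:top_k]]
    return assignments
```

In this sketch a 10x10 ground-truth box matched against an identical candidate and one shifted by a single pixel yields two positives, whereas a one-to-one matcher would keep only the first. For very small objects, a fixed IoU threshold tends to admit few candidates, which is one way to picture the shortage of positive queries that the abstract describes.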
Primary Area: Applications (bioinformatics, biomedical informatics, climate science, collaborative filtering, computer vision, healthcare, human activity recognition, information retrieval, natural language processing, social networks, etc.)
Student Author: Yes
Submission Number: 281