Abstract: Benefiting from attention mechanisms, query-based detectors have strong model capacity. They predict classification and regression from shared queries and features in the decoder. Inter-task biases therefore produce gradients that pull in different directions and interfere with each other, limiting model optimization. In this work, we introduce attention decoupling (AD) for query-based detectors to explicitly align multi-task features. Specifically, AD consists of a Dense-to-Sparse Query Generator (DSQG) and a Split Cross-Attention (SCA), which decouple queries and features, respectively, in the decoding phase. We then propose a task consistency loss (TCL), which integrates a novel task alignment metric into the classification loss to further improve task consistency across decoding stages. AD thus effectively mitigates the task misalignment problem of query-based detectors and can inspire subsequent multi-task paradigms. Extensive experiments on the COCO dataset demonstrate that the proposed AD enhances a variety of representative detectors. Remarkably, AD-DINO achieves state-of-the-art performance.
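To make the feature-decoupling idea concrete, the following is a minimal PyTorch-style sketch of one plausible reading of a split cross-attention decoder layer: two parallel cross-attention branches over shared encoder memory yield separate classification and regression features from the same object queries. The module names, shapes, and structure are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): a decoder layer that decouples
# classification and regression features via two task-specific cross-attention
# branches over shared encoder memory. Names and shapes are assumptions.
import torch
import torch.nn as nn


class SplitCrossAttentionLayer(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        # Shared self-attention over the object queries.
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Task-specific cross-attention branches (the "split").
        self.cls_cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.reg_cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, queries: torch.Tensor, memory: torch.Tensor):
        # queries: (B, N, d_model) object queries; memory: (B, HW, d_model) encoder features.
        q = self.norm(queries + self.self_attn(queries, queries, queries)[0])
        # Each task attends to the shared memory with its own attention weights,
        # producing decoupled classification / regression features.
        cls_feat = q + self.cls_cross_attn(q, memory, memory)[0]
        reg_feat = q + self.reg_cross_attn(q, memory, memory)[0]
        return cls_feat, reg_feat


# Usage sketch: the decoupled features would feed separate classification and box heads.
layer = SplitCrossAttentionLayer()
queries = torch.randn(2, 300, 256)   # 300 object queries
memory = torch.randn(2, 1024, 256)   # flattened encoder feature map
cls_feat, reg_feat = layer(queries, memory)
```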
External IDs: dblp:conf/icassp/MaLMTQY24