Abstract: Roadside Collaborative Perception (RCooper) lets roadside agents share information across a wide field of view, providing area-wide sensing coverage in ranges constrained by traffic. However, existing work has focused on homogeneous traffic scenarios and ignored the problem of feature alignment, so the differing sensors carried by roadside infrastructure may cause existing collaborative perception approaches to fail. In this paper, we study collaborative perception in typical RCooper scenarios, where agents may have different sensor types. We propose the Deformable Implicit Feature Alignment module (DIFA), which uses deformable convolution to learn pixel offsets that align the features of heterogeneous agents in a sensor-type-agnostic way. In addition, we propose the Attention Feature Fusion module (AFF), which uses an attention mechanism to capture the spatial relationships among multiple agents at the pixel level. To validate the effectiveness of DIFA, we conduct extensive experiments on DAIR-RCooper, a large-scale real-world RCooper dataset. DIFA outperforms existing methods at IoU thresholds of 0.3, 0.5, and 0.7. In particular, under the strict IoU threshold of 0.7, DIFA surpasses the state-of-the-art (SOTA) method by 2.5%.
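For readers unfamiliar with the evaluation criterion, the following minimal sketch shows how an IoU threshold decides whether a predicted box counts as a match. It is an illustration only and is not taken from the paper: it assumes axis-aligned 2D boxes in `(x_min, y_min, x_max, y_max)` form, whereas 3D detection benchmarks typically use rotated boxes, and the function names `iou` and `matches_at_thresholds` are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max). This is a simplification:
    it ignores box rotation, which real 3D detection metrics account for.
    """
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give zero intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def matches_at_thresholds(pred, gt, thresholds=(0.3, 0.5, 0.7)):
    """Report, per threshold, whether the prediction counts as a match."""
    score = iou(pred, gt)
    return {t: score >= t for t in thresholds}


if __name__ == "__main__":
    ground_truth = (0.0, 0.0, 4.0, 2.0)
    prediction = (1.0, 0.0, 5.0, 2.0)  # shifted by one unit along x
    print(iou(prediction, ground_truth))
    print(matches_at_thresholds(prediction, ground_truth))
```

In this example the shifted prediction passes the 0.3 and 0.5 thresholds but fails at 0.7, which illustrates why the 0.7 setting is described as the strict one: it tolerates far less localization error.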
External IDs: dblp:conf/iconip/GuZYX24