Perception Through Sparsity: Fusing and Enhancing Multi-Agent Sparse Representation with Deformable Cross-Attention
Keywords: Autonomous Driving; Perception; Multi-Agent Perception
Abstract: Multi‑agent perception has gained significant attention for its ability to share information among connected automated vehicles (CAVs) and smart infrastructure, thus mitigating occlusions and extending effective sensing range.
Despite this progress, research on radar-based cooperative perception has been
constrained by limited datasets, where existing benchmarks either provide only
partial radar views or a small number of frames, making it difficult to fully
study radar’s potential in V2X perception. To address this gap, we introduce V2XSet-R, the first large-scale dataset
that provides complete 360° radar coverage from both vehicles
and infrastructure, with 150k radar frames and 170k annotated 3D bounding
boxes. This dataset significantly expands the scale and diversity of radar data,
enabling systematic study of radar-based cooperative perception and fusion. Building on this resource, we propose SparseFusion, a dual-stage fusion
framework tailored to sparse multi-agent perception. Unlike prior position-wise
self-attention designs that compute affinity scores only among voxels at the
same BEV location, SparseFusion aggregates cross-voxel context via a
query-based deformable attention module that adaptively samples informative
regions across space and agents. This design overcomes sparsity-induced
degeneration, enhances feature interaction across agents, and generalizes effectively to camera BEV features. Our results demonstrate that SparseFusion is a precise, efficient, and modality-agnostic fusion method for cooperative perception.
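To make the described mechanism concrete, below is a minimal PyTorch sketch of query-based deformable cross-attention over sparse multi-agent BEV features. It is an illustrative assumption of how such a module could be structured, not the paper's actual implementation; names such as DeformableCrossAgentFusion, num_points, and the offset/weight heads are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableCrossAgentFusion(nn.Module):
    """Sketch: each non-empty ego voxel acts as a query that samples a few
    deformed locations from every agent's BEV map, instead of attending only
    to voxels at the same BEV position. Hypothetical design, for illustration."""

    def __init__(self, dim=64, num_points=4):
        super().__init__()
        self.num_points = num_points
        # Per-query sampling offsets (x, y) and attention weights per point.
        self.offset_head = nn.Linear(dim, num_points * 2)
        self.weight_head = nn.Linear(dim, num_points)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, queries, ref_points, agent_bev):
        """
        queries:    (N, C)       features of N non-empty ego voxels
        ref_points: (N, 2)       reference BEV locations in [-1, 1]
        agent_bev:  (A, C, H, W) BEV features from A agents, warped to ego frame
        """
        A, C, H, W = agent_bev.shape
        N, P = queries.shape[0], self.num_points

        # Predict small normalized offsets and softmax weights per sampling point.
        offsets = self.offset_head(queries).view(N, P, 2) * 0.1
        weights = self.weight_head(queries).softmax(dim=-1)          # (N, P)

        # Deformed sampling locations, shared across agents.
        locs = (ref_points[:, None, :] + offsets).clamp(-1, 1)       # (N, P, 2)
        grid = locs[None].expand(A, N, P, 2)                         # (A, N, P, 2)

        # Bilinearly sample each agent's BEV map at the deformed locations.
        sampled = F.grid_sample(agent_bev, grid, align_corners=False)  # (A, C, N, P)

        # Aggregate across sampling points (learned weights), then across agents.
        sampled = (sampled * weights[None, None]).sum(-1)            # (A, C, N)
        fused = sampled.mean(dim=0).transpose(0, 1)                  # (N, C)
        return self.out_proj(fused) + queries                        # residual update
```

Because the queries are only the non-empty ego voxels and each one samples a handful of points per agent, the cost scales with the number of occupied voxels rather than the full BEV grid, which is the intuition behind applying deformable attention to sparse radar (or camera) BEV features.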
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 6367