Keywords: scene understanding, radar, multi-view radar, radar perception, object detection, 3D bounding box, diffusion model
TL;DR: REXO simplifies multi-view indoor radar object detection for scene understanding by using 3D box diffusion for explicit cross-view feature association, achieving state-of-the-art performance.
Abstract: Privacy-preserving and cost-effective indoor sensing is vital for embodied agents to collaborate safely with people in dynamic scenes.
Multi-view millimeter-wave radar shows great potential for this purpose. However, prevailing methods rely on implicit cross-view association, which often results in ambiguous feature matches and degraded performance in cluttered environments.
To address these limitations, we propose REXO (multi-view Radar object dEtection with 3D bounding boX diffusiOn), which lifts DiffusionDet's 2D box denoising to the full 3D radar space. Noisy 3D boxes are projected onto all radar views to enable explicit association and radar-conditioned denoising. Evaluated on two open indoor radar datasets, our approach outperforms state-of-the-art methods by +11.02 AP on MMVR and +4.22 AP on HIBER.
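The idea of projecting noisy 3D boxes onto each radar view can be illustrated with a minimal sketch. It is not the REXO implementation: it assumes axis-aligned boxes parameterized as (cx, cy, cz, w, h, d), a simple orthographic projection that drops one axis per view, and a DDPM-style forward noising step. The function names and the `VIEWS` mapping are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical view definitions: which two of the (x, y, z) axes each view keeps.
VIEWS = {"horizontal": (0, 2), "vertical": (1, 2)}

def box_corners(box):
    """Return the 8 corners of an axis-aligned 3D box (cx, cy, cz, w, h, d)."""
    center = np.asarray(box[:3], dtype=float)
    size = np.asarray(box[3:], dtype=float)
    signs = np.array([[sx, sy, sz]
                      for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    return center + 0.5 * signs * size

def project_to_view(box, keep_axes):
    """Orthographic projection (an assumption): keep two axes, drop the third.

    Returns the enclosing 2D box as (min_a, min_b, max_a, max_b).
    """
    pts = box_corners(box)[:, list(keep_axes)]
    return np.concatenate([pts.min(axis=0), pts.max(axis=0)])

def add_noise(box, alpha_bar, rng):
    """DDPM-style forward step: sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    eps = rng.standard_normal(6)
    x0 = np.asarray(box, dtype=float)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy = add_noise((1.0, 2.0, 3.0, 2.0, 4.0, 6.0), alpha_bar=0.9, rng=rng)
    # The same noisy 3D box yields one 2D region per view, so features cropped
    # from each view are associated through the shared box by construction.
    for name, axes in VIEWS.items():
        print(name, project_to_view(noisy, axes))
```

Because every view's 2D region comes from the same 3D box, the cross-view association is explicit rather than learned implicitly. A real radar pipeline would replace the axis-dropping projection with the geometry of each radar's own coordinate frame.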
Supplementary Material: pdf
Submission Number: 8