RagNet3D: Learning distinguishable representation for pooled grids in 3D object detection

Published: 01 Jan 2025, Last Modified: 16 May 2025 | Neurocomputing 2025 | CC BY-SA 4.0
Abstract: 3D object detection plays a crucial role in autonomous driving. Previous two-stage 3D detectors have developed grid-based Region-of-Interest (RoI) pooling techniques, such as RoI-Grid Pooling and Voxel Pooling, both of which quantize each RoI into grids. However, these grids are usually ambiguous: when multiple RoIs identify a single object, parts of their grids cover the same space and therefore pertain to several RoIs at once. To address this issue, we propose the RoI-Aware Grids Pooling Network (RagNet3D), which introduces RoI-View context to create distinguishable grid representations. Specifically, we present a RoI-View Prediction module that predicts the RoI-View context under the guidance of the distance between RoIs and objects. Meanwhile, we propose a Couple-View Fusion module that propagates a probability distribution, computed from the RoI-View context, into the RoI-irrelevant grid features for further box refinement. Extensive experiments on KITTI and the Waymo Open Dataset show that our method achieves remarkable improvements over the baselines.
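The ambiguity described above can be made concrete with a small sketch. It assumes RoI-Grid-style pooling that places an n x n x n grid (n = 6 here, an assumed value) at the cell centers of a box rotated by its heading about the z-axis, then counts how many grid points of one RoI also lie inside a second, slightly shifted RoI on the same object. The function names `roi_grid_points` and `inside_box` and the box dimensions are hypothetical illustrations; this is not RagNet3D's RoI-View Prediction or Couple-View Fusion, only the grid-sharing problem those modules are meant to address.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def roi_grid_points(center: Vec3, size: Vec3, heading: float, n: int = 6) -> List[Vec3]:
    """Hypothetical helper: world coordinates of the n*n*n grid-cell centers of a
    3D box (center, size=(dx, dy, dz)) rotated by `heading` about the z-axis."""
    cx, cy, cz = center
    dx, dy, dz = size
    cos_h, sin_h = math.cos(heading), math.sin(heading)
    points = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # Cell-center offsets in the box frame, within [-size/2, size/2].
                lx = ((i + 0.5) / n - 0.5) * dx
                ly = ((j + 0.5) / n - 0.5) * dy
                lz = ((k + 0.5) / n - 0.5) * dz
                # Rotate about z by the heading, then translate to the box center.
                points.append((cx + lx * cos_h - ly * sin_h,
                               cy + lx * sin_h + ly * cos_h,
                               cz + lz))
    return points


def inside_box(p: Vec3, center: Vec3, size: Vec3, heading: float) -> bool:
    """Hypothetical helper: True if point p lies inside the rotated box."""
    x, y, z = p[0] - center[0], p[1] - center[1], p[2] - center[2]
    cos_h, sin_h = math.cos(heading), math.sin(heading)
    # Inverse rotation (by -heading) brings the point into the box frame.
    lx = x * cos_h + y * sin_h
    ly = -x * sin_h + y * cos_h
    return abs(lx) <= size[0] / 2 and abs(ly) <= size[1] / 2 and abs(z) <= size[2] / 2


# Two RoIs proposed for the same car, offset by 0.5 m along x (illustrative values).
car_size = (4.0, 1.8, 1.5)
roi_a = ((0.0, 0.0, 0.0), car_size, 0.0)
roi_b = ((0.5, 0.0, 0.0), car_size, 0.0)

grid_a = roi_grid_points(*roi_a)
shared = sum(inside_box(p, *roi_b) for p in grid_a)
print(f"{shared} of {len(grid_a)} grid points of RoI A also lie inside RoI B")
```

With these values, 180 of RoI A's 216 grid points also fall inside RoI B, so purely geometric grid features at those locations carry no information about which RoI they were pooled for. That is the kind of RoI-specific cue the abstract's RoI-View context is meant to supply.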