Keywords: 3D object detection; Point clouds
Abstract: Currently, LiDAR-based 3D detectors are broadly categorized into two groups, namely, BEV-based detectors and cluster-based detectors.
BEV-based detectors capture contextual information from the Bird's Eye View (BEV) and fill their center voxels via feature diffusion with a stack of convolution layers, which, however, weakens their ability to represent an object by its center point.
On the other hand, cluster-based detectors exploit the voting mechanism and aggregate the foreground points into object-centric clusters for further prediction.
In this paper, we explore how to effectively combine these two complementary representations into a unified framework.
Specifically, we propose a new 3D object detection framework, referred to as CluB, which incorporates an auxiliary cluster-based branch into the BEV-based detector by enriching the object representation at both feature and query levels.
Technically, CluB consists of two steps.
First, we construct a cluster feature diffusion module to establish the association between cluster features and BEV features in a subtle and adaptive fashion.
Based on that, an imitation loss is introduced to distill object-centric knowledge from the cluster features to the BEV features.
Second, we design a cluster query generation module to leverage the voting centers directly from the cluster branch, thus enriching the diversity of object queries.
Meanwhile, a direction loss is employed to encourage a more accurate voting center for each cluster.
Extensive experiments are conducted on the Waymo and nuScenes datasets, and our CluB achieves state-of-the-art performance on both benchmarks.
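
The abstract names an imitation loss and a direction loss but does not give their formulas. As a rough illustration only, the sketch below shows one plausible form of each: a masked mean-squared error that pulls BEV features toward cluster features, and a cosine-based penalty on the direction of each predicted vote offset. The function names, the tensor shapes, the mask, and the choice of MSE and cosine similarity are all assumptions made for this example, not the paper's definitions.

```python
import numpy as np


def imitation_loss(bev_feat, cluster_feat, mask):
    """Hypothetical imitation loss: masked MSE between BEV and cluster features.

    bev_feat, cluster_feat: (H, W, C) arrays on a shared BEV grid (assumed layout).
    mask: (H, W) boolean array, True where a cluster feature is present (assumed).
    """
    # Per-cell squared error, averaged over the channel dimension.
    per_cell = ((bev_feat - cluster_feat) ** 2).mean(axis=-1)
    n_valid = mask.sum()
    if n_valid == 0:
        return 0.0
    # Average only over cells that actually carry a cluster feature.
    return float((per_cell * mask).sum() / n_valid)


def direction_loss(pred_offset, gt_offset, eps=1e-8):
    """Hypothetical direction loss: mean (1 - cosine similarity) of vote offsets.

    pred_offset: (N, 3) predicted offsets from foreground points to their voting center.
    gt_offset:   (N, 3) offsets from the same points to the true object center (assumed target).
    """
    if len(pred_offset) == 0:
        return 0.0
    dot = (pred_offset * gt_offset).sum(axis=1)
    norms = np.linalg.norm(pred_offset, axis=1) * np.linalg.norm(gt_offset, axis=1)
    # eps guards against division by zero for zero-length offsets.
    cos = dot / np.maximum(norms, eps)
    return float((1.0 - cos).mean())
```

Under these assumptions, the direction term is zero when a predicted offset points exactly toward the object center, regardless of its length, and grows as the vote drifts off-axis; the imitation term ignores BEV cells that no cluster reaches.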
Supplementary Material: pdf
Submission Number: 3132