Every View Counts: Cross-View Consistency in 3D Object Detection with Hybrid-Cylindrical-Spherical Voxelization
Abstract: Recent voxel-based 3D object detectors for autonomous vehicles learn point cloud
representations from either the bird's-eye view (BEV) or the range view (RV, a.k.a. perspective view). However, each view has its own strengths and weaknesses. In this paper,
we present a novel framework to unify and leverage the benefits from both BEV
and RV. The widely used cuboid-shaped voxels in the Cartesian coordinate system benefit only
the BEV feature map. Therefore, to enable learning both BEV and RV feature
maps, we introduce Hybrid-Cylindrical-Spherical voxelization. Our findings show
that simply adding detection on another view as auxiliary supervision leads to
poor performance. We propose a pair of cross-view transformers that transform
the feature maps into the other view, and we introduce a cross-view consistency loss on
them. Comprehensive experiments on the challenging nuScenes dataset validate
the effectiveness of our proposed method, which leverages joint optimization and
complementary information on both views. Remarkably, our approach achieves an
mAP of 55.8%, outperforming all published approaches by at least 3% in overall
performance and by up to 16.5% in safety-crucial categories such as cyclist.
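
To make the voxelization idea concrete, the following is a minimal sketch of how a point could be binned in a hybrid cylindrical-spherical grid. The abstract does not give the coordinate definition, so this assumes the grid axes are the cylindrical radius rho, the azimuth phi, and the spherical elevation angle theta. The function names, ranges, and bin counts are hypothetical and chosen only for illustration. Under this assumption, dropping the theta index gives a BEV cell and dropping the rho index gives an RV cell.

```python
import math


def hcs_coords(x, y, z):
    """Map a Cartesian point to assumed hybrid coordinates (rho, phi, theta)."""
    rho = math.hypot(x, y)      # cylindrical radius in the ground plane
    phi = math.atan2(y, x)      # azimuth in [-pi, pi]
    theta = math.atan2(z, rho)  # elevation angle above the ground plane
    return rho, phi, theta


def hcs_voxel_index(x, y, z, rho_max=50.0, theta_min=-0.5, theta_max=0.2,
                    n_rho=100, n_phi=360, n_theta=32):
    """Return the (rho, phi, theta) bin of a point, or None if out of range.

    All ranges and bin counts are illustrative defaults, not values
    taken from the paper.
    """
    rho, phi, theta = hcs_coords(x, y, z)
    if not (0.0 <= rho < rho_max and theta_min <= theta < theta_max):
        return None
    i = int(rho / rho_max * n_rho)
    # Shift azimuth to [0, 2*pi] and wrap phi = pi back to bin 0.
    j = int((phi + math.pi) / (2.0 * math.pi) * n_phi) % n_phi
    k = int((theta - theta_min) / (theta_max - theta_min) * n_theta)
    return i, j, k


def bev_cell(index):
    """BEV cell: collapse the elevation axis, keep (rho, phi)."""
    return index[0], index[1]


def rv_cell(index):
    """RV cell: collapse the radial axis, keep (phi, theta)."""
    return index[1], index[2]
```

Because both views share the azimuth axis in this sketch, a single voxel grid can be projected to either feature map by pooling over one remaining axis, which is the property that cuboid Cartesian voxels lack for the range view.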