ProEqBEV: Product Group Equivariant BEV Network for 3D Object Detection in Road Scenes of Autonomous Driving
Abstract: With the rapid development of autonomous driving systems, 3D object detection based on Bird's Eye View (BEV) representations of road scenes has made great progress over the past few years. A road scene exhibits a part-whole hierarchy between the objects within it and the scene itself: simple parts (e.g., roads, lane lines, vehicles, and pedestrians) can be assembled into progressively more complex shapes that form a BEV representation of the whole road scene. Consequently, a BEV often has multiple levels of freedom of motion, i.e., the rotation and translation of the whole BEV, and the independent movements of objects (e.g., pedestrians and vehicles) inside the BEV. However, most current single-sensor and multi-sensor-fusion BEV object detection methods do not capture such multi-level motion. To address this problem, we propose a multi-sensor-fusion object detection framework that is equivariant with respect to a product of symmetry groups acting at multiple levels. The proposed framework extracts local equivariant features of objects from point clouds, and global equivariant features from both point clouds and images. The network thus learns diverse rotation-equivariant features and mitigates a significant number of detection errors caused by rotations of the BEV and of objects inside it, further enhancing detection performance. Experimental results show that the network architecture significantly improves object detection in terms of both mAP and NDS. In addition, to demonstrate the effectiveness of the proposed local and global equivariant components, we conduct extensive ablation experiments; the results show that each component is indispensable to the detection performance of the overall network architecture.
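To make the notion of group equivariance concrete, the following is a minimal, self-contained sketch of a single-level rotation-equivariant layer over the cyclic group C4, written in PyTorch. It is not the paper's implementation: the class name C4LiftingConv and all parameters are illustrative assumptions. The product group construction described in the abstract can be pictured as composing such symmetry levels, e.g., one group acting globally on the whole BEV and another acting locally on object regions inside it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class C4LiftingConv(nn.Module):
    """Illustrative sketch: lift a BEV feature map onto the cyclic rotation
    group C4 by correlating the input with four 90-degree-rotated copies of
    one shared kernel. Rotating the input then rotates the output maps and
    cyclically permutes the group axis, instead of corrupting the features."""

    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, in_ch, H, W) -> (B, out_ch, |C4| = 4, H, W)
        outs = [
            F.conv2d(x, torch.rot90(self.weight, r, dims=(2, 3)),
                     padding=self.weight.shape[-1] // 2)
            for r in range(4)  # one response per group element
        ]
        return torch.stack(outs, dim=2)

# Sanity check of the equivariance property (hypothetical usage):
conv = C4LiftingConv(3, 8)
x = torch.randn(1, 3, 32, 32)
y = conv(x)
y_rot = conv(torch.rot90(x, 1, dims=(2, 3)))
# A 90-degree input rotation equals rotating y spatially and
# shifting its group axis by one step.
expected = torch.rot90(torch.roll(y, shifts=1, dims=2), 1, dims=(3, 4))
assert torch.allclose(y_rot, expected, atol=1e-4)
```

Under this single-group sketch, BEV-level and object-level rotations would each get their own group axis, and a product-group-equivariant layer would respect both simultaneously; the abstract's framework applies this idea across point-cloud and image features.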