Enhanced Object Detection in Bird's Eye View Using 3D Global Context Inferred From Lidar Point Data

Published: 01 Jan 2019 · Last Modified: 13 Nov 2024 · IV 2019 · CC BY-SA 4.0
Abstract: In this paper, we present a new deep neural network architecture that detects objects in bird's eye view (BEV) from Lidar sensor data in autonomous driving scenarios. The key idea of the proposed method is to improve detection accuracy by exploiting the 3D global context provided by the whole set of Lidar points. The proposed method consists of two parts: 1) the detection core network (DetNet) and 2) the context extraction network (ConNet). First, the DetNet generates the BEV representation by projecting the Lidar points onto the BEV plane and applies a CNN to extract feature maps that are locally activated on the objects. Second, the ConNet directly processes the whole set of Lidar points to produce a 1 × 1 × k feature vector capturing the 3D geometrical structure of the surroundings at a global scale. The context vector produced by the ConNet is concatenated to each pixel of the feature maps obtained by the DetNet, and the combined feature maps are used to regress the oriented bounding box and identify the category of each object. Experiments on the public KITTI dataset show that the context feature offers a significant performance gain over the baseline, and the proposed detector achieves competitive performance compared to state-of-the-art 3D object detectors.
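To make the fusion step concrete, here is a minimal PyTorch sketch of how a global 1 × 1 × k context vector could be concatenated to each pixel of a BEV feature map, as the abstract describes. All names and shapes (ContextFusion, bev_feat, ctx, the grid size, k = 256) are hypothetical illustrations, not the authors' released code.

```python
# Hypothetical sketch of the ConNet-to-DetNet context fusion described in
# the abstract; module and tensor names are illustrative, not from the paper.
import torch
import torch.nn as nn

class ContextFusion(nn.Module):
    """Tile a global context vector over a BEV feature map and concatenate
    it channel-wise to every pixel."""

    def forward(self, bev_feat: torch.Tensor, ctx: torch.Tensor) -> torch.Tensor:
        # bev_feat: (B, C, H, W) local BEV feature maps from the detection core
        # ctx:      (B, K) global context vector from the point-set network
        B, K = ctx.shape
        H, W = bev_feat.shape[2:]
        # Broadcast the context vector so it aligns with every spatial location.
        ctx_map = ctx.view(B, K, 1, 1).expand(B, K, H, W)
        # Each pixel now carries both local and global cues.
        return torch.cat([bev_feat, ctx_map], dim=1)

# Usage: the fused maps would feed the box-regression / classification heads.
fusion = ContextFusion()
bev_feat = torch.randn(2, 128, 200, 176)  # hypothetical BEV grid and channels
ctx = torch.randn(2, 256)                 # hypothetical k = 256 context vector
fused = fusion(bev_feat, ctx)             # shape: (2, 128 + 256, 200, 176)
```

Broadcasting a single vector to all locations is the simplest way to give every pixel the same global summary; the paper may implement this differently.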
