Abstract: Object detection in 3D point clouds typically follows a two-stage pipeline: object proposals are first extracted and then classified and regressed. Existing models such as PointNet, PointNet++, and StarNet learn features from point clouds through a series of point-wise linear transformations. However, for small objects represented by few points, such as pedestrians, capturing a larger context around the object could provide more information for accurate detection, and point-wise features cannot capture this context. Self-attention has been widely used as a computational primitive across several modalities and has been shown to capture both local and global dependencies. In this work, we use a self-attention-based featurizer to model local interactions within proposal neighborhoods and global interactions between proposals. This featurizer outperforms previous point-based featurizers on the large-scale Waymo 3D object detection benchmark for vehicles and pedestrians, achieving significant gains on pedestrian detection (1.8% mAP). Our ablations show that modeling both global and local interactions is important and that the two provide complementary gains.
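To make the local/global split concrete, the following is a minimal pure-Python sketch of single-head scaled dot-product self-attention applied at two levels. It first attends among the points inside each proposal's neighborhood and max-pools them into one proposal feature (local), then attends among all proposal features (global). It is an illustration under stated assumptions, not the authors' featurizer: the function names `self_attention` and `local_then_global`, the single head, the absence of positional encodings and residual connections, and max-pooling as the aggregation step are all hypothetical choices made for readability.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Weights = Tuple[Matrix, Matrix, Matrix]  # (W_q, W_k, W_v), each d_in x d


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain row-by-column matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention over a set of feature vectors."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(qi * ki for qi, ki in zip(qr, kr)) / scale for kr in k] for qr in q]
    weights = [softmax(r) for r in scores]  # each row sums to 1
    return matmul(weights, v)  # each output is a weighted mix of all values


def local_then_global(neighborhoods: List[Matrix], w_local: Weights, w_global: Weights) -> Matrix:
    """Hypothetical two-level featurizer: local attention per proposal, then global attention."""
    proposal_feats = []
    for points in neighborhoods:
        # Local: every point in this proposal's neighborhood attends to every other point.
        attended = self_attention(points, *w_local)
        # Aggregate the attended point features into one vector per proposal (max-pool).
        proposal_feats.append([max(col) for col in zip(*attended)])
    # Global: every proposal attends to every other proposal.
    return self_attention(proposal_feats, *w_global)
```

For example, with 2-D identity projection matrices `I = [[1.0, 0.0], [0.0, 1.0]]`, calling `local_then_global(neighborhoods, (I, I, I), (I, I, I))` on two neighborhoods returns one 2-D feature per proposal. Because attention followed by max-pooling is permutation-invariant, reordering the points within a neighborhood leaves the output unchanged up to floating-point rounding, which suits unordered point sets.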