Abstract: 3D object detection is a challenging task because point clouds are sparse and irregular. Most state-of-the-art detectors recognize objects individually, without considering the rich contextual relationships among objects at different levels. In this paper, we propose an end-to-end, vote-based, multi-level context attention network. Specifically, a Patch-Context-Module is designed to extract multi-level context features among point patches. Because low-level features carry fine-grained location information, a Spatial-Context-Module is adopted to combine low-level spatial features with semantic features. Furthermore, a Fusion Sampling and Aggregation module is proposed to incorporate additional semantic information for each vote point, thereby increasing the ratio of positive points and improving detection performance. Finally, Class-IoU-Guide NMS with an adaptive threshold is applied at inference time to suppress false detections. Experiments on the ScanNetV2 and SUN RGB-D datasets demonstrate that the proposed method outperforms current state-of-the-art approaches.
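The abstract does not say how Class-IoU-Guide NMS computes its adaptive threshold, so the following is only a generic illustration of class-wise non-maximum suppression on axis-aligned 3D boxes, not the authors' method. The per-class threshold table (`thresholds`), the default value of 0.25, and the function names `iou_3d` and `classwise_nms` are all hypothetical stand-ins chosen for this sketch.

```python
from typing import Dict, List, Sequence, Tuple

# Axis-aligned 3D box: (xmin, ymin, zmin, xmax, ymax, zmax)
Box = Tuple[float, float, float, float, float, float]


def volume(b: Box) -> float:
    """Volume of an axis-aligned box."""
    return (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])


def iou_3d(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned 3D boxes."""
    inter = 1.0
    for axis in range(3):
        overlap = min(a[axis + 3], b[axis + 3]) - max(a[axis], b[axis])
        if overlap <= 0.0:
            return 0.0  # boxes are disjoint along this axis
        inter *= overlap
    return inter / (volume(a) + volume(b) - inter)


def classwise_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    labels: Sequence[int],
    thresholds: Dict[int, float],
    default_threshold: float = 0.25,
) -> List[int]:
    """Greedy NMS that only suppresses boxes of the same class.

    ASSUMPTION: `thresholds` maps a class label to an IoU threshold.
    It stands in for the paper's adaptive threshold, whose actual
    rule is not given in the abstract.
    """
    # Visit candidates from highest to lowest confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        thr = thresholds.get(labels[i], default_threshold)
        suppressed = any(
            labels[j] == labels[i] and iou_3d(boxes[i], boxes[j]) > thr
            for j in keep
        )
        if not suppressed:
            keep.append(i)
    return keep
```

In this sketch a box is dropped only when a higher-scoring box of the same class overlaps it by more than that class's threshold, so overlapping detections of different classes both survive.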