Abstract: In 3D point cloud-based object detection, the attention mechanism in Group-Free (Liu et al., 2021) learns direct relationships between proposals and all seed points, providing each proposal with global context in the form of a cross-attention map. However, our analysis and experimental comparison show that this mechanism assigns inappropriately large attention weights to some seed points far from a proposal, which hinders correct object detection. In this work, we alleviate this problem with a mask method. For each initial proposal, our method first computes a spatial distance-based mask that measures the spatial relationship between every seed point and the proposal. We then fuse the mask into the cross-attention layers of the stacked attention modules to obtain a refined cross-attention map. In essence, the mask gives each proposal local context. Once it is fused with the global context supplied by the attention mechanism, the refined cross-attention map can suppress the negative influence of distant seed points on a proposal. We present two alternative strategies for computing the mask: a hard mask and a soft mask. Experimental results demonstrate that the soft mask performs better. For the soft mask, we approximate each initial proposal's 3D box shape with a parametric ellipsoid, which has only two learnable parameters and serves as the basis for computing the mask. Experimental results show that our method outperforms Group-Free by 0.7 mAP@0.25 while increasing inference time by less than 1%. On the public SUN RGB-D dataset, our algorithm achieves 63.7 mAP@0.25 and 45.5 mAP@0.5, the best performance among algorithms that preserve the irregularity of seed points.
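To make the masking idea concrete, here is a minimal sketch for a single proposal. The abstract does not give the exact formulas, so everything here is an assumption. The proposal box is axis-aligned and has no rotation. The ellipsoid is the one inscribed in that box. The two learnable parameters are modeled as a hypothetical slope `alpha` and a hypothetical radius `beta`. The mask is added to the attention logits before the softmax. All function names are illustrative only and are not the authors' code.

```python
import math

def ellipsoid_distance(seed, center, size):
    # Assumed: normalized distance to the ellipsoid inscribed in an axis-aligned box;
    # a value <= 1.0 means the seed point lies inside the ellipsoid.
    return math.sqrt(sum(((s - c) / (d / 2.0)) ** 2
                         for s, c, d in zip(seed, center, size)))

def hard_mask(dists):
    # Hard mask: keep seeds inside the ellipsoid (bias 0), block the rest (-inf).
    mask = [0.0 if d <= 1.0 else -math.inf for d in dists]
    # Fallback (our assumption): if no seed is inside, leave attention unmasked.
    return mask if any(m == 0.0 for m in mask) else [0.0] * len(dists)

def soft_mask(dists, alpha, beta):
    # Soft mask with two (hypothetical) learnable parameters:
    # zero bias within scaled radius beta, linear penalty with slope alpha beyond it.
    return [-alpha * max(d - beta, 0.0) for d in dists]

def masked_attention(logits, mask):
    # Fuse the mask into cross-attention by adding it to the logits, then softmax.
    z = [l + m for l, m in zip(logits, mask)]
    z_max = max(z)
    w = [math.exp(v - z_max) for v in z]
    total = sum(w)
    return [v / total for v in w]

# Usage: one unit-cube proposal at the origin and three seed points.
seeds = [(0.0, 0.0, 0.0), (0.4, 0.0, 0.0), (5.0, 0.0, 0.0)]
dists = [ellipsoid_distance(s, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)) for s in seeds]
logits = [0.0, 0.0, 0.0]  # equal raw attention scores before masking
hard_w = masked_attention(logits, hard_mask(dists))
soft_w = masked_attention(logits, soft_mask(dists, alpha=2.0, beta=1.0))
```

With equal raw scores, both masks shift nearly all attention onto the two nearby seeds. The hard mask zeroes the distant seed exactly. The soft mask only drives its weight close to zero. Because the soft penalty is differentiable in `alpha` and `beta`, those two values can be learned, which is one plausible reason a soft mask might train better than a hard threshold.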