Spatial-Temporal Transformer for 3D Point Cloud Sequences

Yimin Wei, Hao Liu, Tingting Xie, Qiuhong Ke, Yulan Guo

2022 (modified: 28 Oct 2022), WACV 2022

Abstract: Effective learning of spatial-temporal information within a point cloud sequence is highly important for many downstream tasks such as 4D semantic segmentation and 3D action recognition. In this paper, we propose a novel framework named Point Spatial-Temporal Transformer (PST²) to learn spatial-temporal representations from dynamic 3D point cloud sequences. Our PST² consists of two major modules: a Spatio-Temporal Self-Attention (STSA) module and a Resolution Embedding (RE) module. The STSA module captures spatial-temporal context information across adjacent frames, while the RE module aggregates features across neighbors to enhance the resolution of feature maps. We test the effectiveness of PST² on two point cloud sequence tasks, i.e., 4D semantic segmentation and 3D action recognition. Extensive experiments on three benchmarks show that PST² outperforms existing methods on all datasets. The effectiveness of the STSA and RE modules is also validated with ablation experiments.
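
As a rough illustration of the idea behind attention across adjacent frames, the sketch below applies plain scaled dot-product self-attention to point features pooled from several consecutive frames. It is not the authors' STSA module: the function name, the tensor layout (T frames, N points, C channels), the random projection matrices, and the choice to attend over all points of all frames at once are assumptions made only for this example.

```python
import numpy as np

def spatio_temporal_self_attention(feats, w_q, w_k, w_v):
    """Hypothetical sketch: self-attention over points from adjacent frames.

    feats: (T, N, C) array, features of N points in each of T adjacent frames.
    w_q, w_k, w_v: (C, D) projection matrices (random here, learned in practice).
    """
    t, n, c = feats.shape
    # Treat every point in every frame as one token, so attention can
    # relate points within a frame (spatial) and across frames (temporal).
    tokens = feats.reshape(t * n, c)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    # Scaled dot-product scores, with a max-shift for a stable softmax.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ v
    return out.reshape(t, n, -1), weights

# Toy usage: 3 adjacent frames, 4 points per frame, 8 feature channels.
rng = np.random.default_rng(0)
T, N, C, D = 3, 4, 8, 8
feats = rng.standard_normal((T, N, C))
w_q, w_k, w_v = (rng.standard_normal((C, D)) for _ in range(3))
out, weights = spatio_temporal_self_attention(feats, w_q, w_k, w_v)
print(out.shape)  # (3, 4, 8)
```

Each row of `weights` sums to 1 and spans all T*N tokens, so every output feature is a mixture of features from its own frame and from the neighboring frames.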
