Instance-Guided Point Cloud Single Object Tracking With Inception Transformer

Published: 01 Jan 2023, Last Modified: 13 Nov 2024. Venue: IEEE Trans. Instrum. Meas. 2023. License: CC BY-SA 4.0
Abstract: Single object tracking (SOT) in light detection and ranging (LiDAR) point clouds is a challenging problem in computer vision. Compared to object-level point clouds, scene-level point clouds for tracking are more complex and require both long-range semantic awareness and local shape context. However, previous methods filter candidates directly from a limited set of matched features without systematically considering these two factors. Inspired by the transformer's ability to model long-range dependencies and by convolution's ability to capture local high-frequency information, we propose a point-tracking inception transformer (PTIT), which efficiently predicts high-quality 3-D tracking results in a coarse-to-fine manner with the support of spatio-temporal point clouds. PTIT consists of three novel designs. 1) We design instance-guided sampling (IGS) to help identify and preserve the relevant points of the given template and the foreground points of the search area. 2) We propose a point inception transformer (PIT), which consists of a multifrequency attention module and a cross-attention module, where the former captures both long-range dependencies and local detail and the latter matches template and search area features. 3) After generating coarse tracking results from cross-attention, we locate the target by motion transformation in the spatio-temporal point cloud to generate a fine-grained 3-D bounding box (BBox). In addition, we perform feature augmentation on the points and boxes to mitigate the negative effects of the missing texture and incompleteness of LiDAR point clouds. PTIT performs significantly better than previous state-of-the-art methods on the KITTI and nuScenes datasets. Our further analysis confirms the effectiveness of each component and shows the great potential of the inception transformer-centric paradigm when combined with spatio-temporal point clouds. Our code is available at https://github.com/ywu0912/TeamCode.git .
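As a rough illustration of the template-to-search matching that the abstract attributes to the cross-attention module, the sketch below computes plain scaled dot-product cross-attention in NumPy. It is not the PTIT implementation: the function names, the feature shapes, the random inputs, and the absence of learned query/key/value projections are all assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(search_feats, template_feats):
    """Hypothetical helper: each search-area point attends to all template points.

    search_feats:   (N, C) per-point features of the search area (queries)
    template_feats: (M, C) per-point features of the template (keys and values)
    Returns the fused search features (N, C) and the attention weights (N, M).
    Learned projections are omitted here; a real model would include them.
    """
    channels = search_feats.shape[-1]
    # Similarity between every search point and every template point.
    scores = search_feats @ template_feats.T / np.sqrt(channels)
    # Normalize over template points so each row sums to 1.
    weights = softmax(scores, axis=-1)
    # Aggregate template features into each search point.
    fused = weights @ template_feats
    return fused, weights


# Assumed toy sizes: 128 search points, 64 template points, 32 channels.
rng = np.random.default_rng(0)
search = rng.normal(size=(128, 32))
template = rng.normal(size=(64, 32))
fused, weights = cross_attention(search, template)
print(fused.shape, weights.shape)
```

Here the search-area points act as queries and the template points as keys and values, so each search point receives a template-weighted feature; this choice of roles is itself an assumption, since the abstract does not specify it.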