3DGTN: 3-D Dual-Attention GLocal Transformer Network for Point Cloud Classification and Segmentation

Published: 01 Jan 2024 · Last Modified: 04 Aug 2025 · IEEE Trans. Geosci. Remote Sens. 2024 · CC BY-SA 4.0
Abstract: Although Transformers have achieved significant progress in 3-D point cloud processing, it remains challenging for existing 3-D Transformer methods to learn both valuable global and local features efficiently and accurately. This article presents a novel point cloud representation learning network, the 3-D Dual-Attention GLocal (global–local) Transformer Network (3DGTN), for improved feature learning in both classification and segmentation tasks, with the following key contributions. First, a GLocal feature learning (GFL) block with a dual self-attention mechanism [i.e., a novel point–patch self-attention (PPSA) and a channel-wise self-attention (CSA)] is designed to efficiently learn global and local context information. Second, the GFL block is integrated with a multiscale graph-convolution-based local feature aggregation (LFA) block, yielding a GLocal information extraction module that captures critical information efficiently. Third, a series of GLocal modules is used to construct a new hierarchical encoder–decoder structure that learns information at different scales in a hierarchical manner. The proposed framework is evaluated on both classification and segmentation datasets, demonstrating that it outperforms many state-of-the-art methods on both synthetic and LiDAR data. Our code has been released at https://github.com/d62lu/3DGTN.
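The two attention variants in the dual self-attention mechanism can be sketched in NumPy. This is an illustrative simplification, not the paper's implementation: the mean-pooled patch descriptors, single-head projections, and scaling used here are placeholder choices, and the actual PPSA/CSA designs should be taken from the released code.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_wise_self_attention(X, Wq, Wk, Wv):
    """X: (N, C) point features. The attention map is (C, C):
    channels attend to channels, so its size is independent of N."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # each (N, C)
    A = softmax(K.T @ Q / np.sqrt(X.shape[0]), axis=0)  # (C, C)
    return V @ A                               # (N, C)

def point_patch_self_attention(X, patch_ids, P, Wq, Wk, Wv):
    """Points attend to P patch summaries instead of all N points,
    shrinking the attention map from (N, N) to (N, P).
    patch_ids: (N,) assignment of each point to one of P patches."""
    # Placeholder patch descriptors: mean pooling within each patch.
    patches = np.stack([X[patch_ids == p].mean(axis=0) for p in range(P)])
    Q = X @ Wq                                 # (N, C)
    K, V = patches @ Wk, patches @ Wv          # (P, C)
    A = softmax(Q @ K.T / np.sqrt(X.shape[1]), axis=-1)  # (N, P)
    return A @ V                               # (N, C)

# Toy usage: 6 points with 4-channel features, grouped into 2 patches.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
Wq, Wk, Wv = (rng.standard_normal((4, 4)) for _ in range(3))
patch_ids = np.array([0, 0, 0, 1, 1, 1])
out_csa = channel_wise_self_attention(X, Wq, Wk, Wv)      # (6, 4)
out_ppsa = point_patch_self_attention(X, patch_ids, 2, Wq, Wk, Wv)  # (6, 4)
```

The contrast in attention-map shape is the point: CSA's (C, C) map captures global channel dependencies at a cost independent of point count, while PPSA's (N, P) map lets every point see compact patch-level context without the quadratic (N, N) cost of full point-wise attention.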