DQFormer: Toward Unified LiDAR Panoptic Segmentation With Decoupled Queries for Large-Scale Outdoor Scenes
Abstract: LiDAR panoptic segmentation (LPS) performs semantic and instance segmentation for things (foreground objects) and stuff (background elements), and is essential for scene perception and remote sensing. While most existing methods separate these tasks using distinct branches (i.e., semantic and instance), recent approaches have unified LPS through a query-based paradigm. However, the distinct spatial distributions of foreground objects and background elements in large-scale outdoor scenes pose challenges for such unified designs. This article presents DQFormer, a novel framework for unified LPS that employs a decoupled query workflow to adapt to the characteristics of things and stuff in outdoor scenes. It first utilizes a feature encoder to extract multiscale voxel-wise, point-wise, and bird's eye view (BEV) features. Then, a decoupled query generator proposes informative queries by localizing things/stuff positions and fusing multilevel BEV embeddings. A query-oriented mask decoder uses masked cross-attention to decode segmentation masks, which are combined with query semantics to produce panoptic results. Extensive experiments on large-scale outdoor scenes, including the vehicular datasets nuScenes and SemanticKITTI, as well as the aerial point cloud dataset DALES, show that DQFormer outperforms prior state-of-the-art methods by +1.8%, +0.9%, and +3.5% in panoptic quality (PQ), respectively. Code is available at https://github.com/yuyang-cloud/DQFormer.
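To make the query-oriented mask decoding step concrete, the following is a minimal PyTorch sketch of masked cross-attention, where queries attend only to locations inside their current mask estimates before re-predicting masks. It is not the authors' implementation; the module name `MaskedCrossAttnDecoder`, the single-layer structure, and all shapes and thresholds are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of masked cross-attention mask decoding.
import torch
import torch.nn as nn


class MaskedCrossAttnDecoder(nn.Module):
    """One masked cross-attention layer followed by a dot-product mask head."""

    def __init__(self, dim=128, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def decode_masks(self, queries, feats):
        # Per-query mask logits over the N point/voxel locations.
        return torch.einsum("bqc,bnc->bqn", queries, feats)

    def forward(self, queries, feats):
        # Queries may only attend to locations inside their current mask estimate.
        prev_logits = self.decode_masks(queries, feats)             # (B, Q, N)
        attn_mask = prev_logits.sigmoid() < 0.5                     # True = masked out
        attn_mask = attn_mask.repeat_interleave(self.num_heads, 0)  # (B*heads, Q, N)
        attn_mask[attn_mask.all(dim=-1)] = False                    # guard: keep at least one key
        refined, _ = self.attn(queries, feats, feats, attn_mask=attn_mask)
        return self.decode_masks(refined, feats)                    # refined mask logits


# Toy usage: B=1 scene, Q=4 decoupled queries (things + stuff), N=1024 voxels, C=128.
decoder = MaskedCrossAttnDecoder(dim=128)
queries = torch.randn(1, 4, 128)    # thing/stuff queries from the query generator
feats = torch.randn(1, 1024, 128)   # per-voxel features from the encoder
masks = decoder(queries, feats).sigmoid()
print(masks.shape)  # torch.Size([1, 4, 1024])
```

In the full pipeline described above, each query's mask would then be paired with its predicted semantic class to assemble the final panoptic result.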