DQFormer: Toward Unified LiDAR Panoptic Segmentation With Decoupled Queries for Large-Scale Outdoor Scenes
Abstract: LiDAR panoptic segmentation (LPS) performs semantic and instance segmentation for things (foreground objects) and stuff (background elements), and is essential for scene perception and remote sensing. While most existing methods separate these tasks using distinct branches (i.e., semantic and instance), recent approaches have unified LPS through a query-based paradigm. However, the distinct spatial distributions of foreground objects and background elements in large-scale outdoor scenes pose challenges for such unified designs. This article presents DQFormer, a novel framework for unified LPS that employs a decoupled query workflow to adapt to the characteristics of things and stuff in outdoor scenes. It first utilizes a feature encoder to extract multiscale voxel-wise, point-wise, and bird's eye view (BEV) features. Then, a decoupled query generator proposes informative queries by localizing things/stuff positions and fusing multilevel BEV embeddings. A query-oriented mask decoder uses masked cross-attention to decode segmentation masks, which are combined with query semantics to produce panoptic results. Extensive experiments on large-scale outdoor scenes, including the vehicular datasets nuScenes and SemanticKITTI, as well as the aerial point cloud dataset DALES, show that DQFormer outperforms prior state-of-the-art methods by +1.8%, +0.9%, and +3.5% in panoptic quality (PQ), respectively. Code is available at https://github.com/yuyang-cloud/DQFormer.
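To make the query-oriented mask decoding step concrete, the following is a minimal PyTorch sketch of masked cross-attention, where queries attend only to locations inside their current mask estimates before re-predicting masks. It is not the authors' implementation; the module name `MaskedCrossAttnDecoder`, the single-layer structure, and all shapes and thresholds are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of masked cross-attention mask decoding.
import torch
import torch.nn as nn


class MaskedCrossAttnDecoder(nn.Module):
    """One masked cross-attention layer followed by a dot-product mask head."""

    def __init__(self, dim=128, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def decode_masks(self, queries, feats):
        # Per-query mask logits over the N point/voxel locations.
        return torch.einsum("bqc,bnc->bqn", queries, feats)

    def forward(self, queries, feats):
        # Queries may only attend to locations inside their current mask estimate.
        prev_logits = self.decode_masks(queries, feats)             # (B, Q, N)
        attn_mask = prev_logits.sigmoid() < 0.5                     # True = masked out
        attn_mask = attn_mask.repeat_interleave(self.num_heads, 0)  # (B*heads, Q, N)
        attn_mask[attn_mask.all(dim=-1)] = False                    # guard: keep at least one key
        refined, _ = self.attn(queries, feats, feats, attn_mask=attn_mask)
        return self.decode_masks(refined, feats)                    # refined mask logits


# Toy usage: B=1 scene, Q=4 decoupled queries (things + stuff), N=1024 voxels, C=128.
decoder = MaskedCrossAttnDecoder(dim=128)
queries = torch.randn(1, 4, 128)    # thing/stuff queries from the query generator
feats = torch.randn(1, 1024, 128)   # per-voxel features from the encoder
masks = decoder(queries, feats).sigmoid()
print(masks.shape)  # torch.Size([1, 4, 1024])
```

In the full pipeline described above, each query's mask would then be paired with its predicted semantic class to assemble the final panoptic result.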