Abstract: 3D instance segmentation plays a crucial role in understanding complex 3D scenes. However, existing methods often depend on fixed query positions, typically selected from large point clouds via center prediction or farthest point sampling, which may deviate from actual instance centers and thus compromise instance aggregation quality. To address this limitation, we propose EipFormer, a Transformer-based framework that integrates progressive aggregation and dual position embedding strategies. Specifically, leveraging an offset prediction branch, EipFormer combines the original and center-shifted coordinate systems to aggregate instances. The progressive aggregation process comprises coarse, fine, and merge stages. Using a purpose-designed weighted farthest point sampling, the coarse stage updates instance queries to capture global context. These queries are then refined in the fine stage to aggregate local context using aggregation averaging and center matching. Finally, a merge stage merges fragmented instances. Extensive experiments on the STPLS3D, S3DIS, and ScanNet benchmarks validate the effectiveness of EipFormer, showing consistent improvements of 1.5 to 3.3 in AP and 2.1 to 3.7 in AP50 over state-of-the-art approaches.
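As a rough illustration of two ingredients named in the abstract, the sketch below shows how predicted offsets can move points into a center-shifted coordinate system and how a weighted variant of farthest point sampling could pick query positions. The abstract does not define the weighting, so this is only an assumption: each point carries a non-negative weight (for example, a confidence score) that scales its distance to the already selected set. The function names `shift_to_centers` and `weighted_fps` are hypothetical and are not taken from EipFormer.

```python
import math


def shift_to_centers(points, offsets):
    """Add each point's predicted offset to its coordinates.

    The result is the center-shifted coordinate system; the original
    coordinates are left untouched so both systems remain available.
    """
    return [tuple(p + o for p, o in zip(pt, off))
            for pt, off in zip(points, offsets)]


def weighted_fps(points, weights, k):
    """Greedy farthest point sampling with per-point weights.

    ASSUMPTION: the score of a candidate is weight * distance to the
    nearest already selected point. With all weights equal this reduces
    to standard farthest point sampling.
    """
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of points")

    # Start from the point with the largest weight (first one on ties).
    first = max(range(n), key=lambda i: weights[i])
    selected = [first]
    chosen = {first}

    # Distance from every point to its nearest selected point so far.
    min_dist = [math.dist(points[i], points[first]) for i in range(n)]

    while len(selected) < k:
        # Already selected points get a negative score so they are never re-picked.
        nxt = max(
            range(n),
            key=lambda i: -1.0 if i in chosen else weights[i] * min_dist[i],
        )
        selected.append(nxt)
        chosen.add(nxt)
        for i in range(n):
            d = math.dist(points[i], points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d

    return selected
```

Running the sampler on the center-shifted coordinates rather than the raw ones would tend to place queries nearer to predicted instance centers, which is the motivation the abstract gives for combining the two coordinate systems.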
External IDs: dblp:journals/eswa/ZhaoZKY26