Multi-scale Twin-attention for 3D Instance Segmentation

Published: 20 Jul 2024 · Last Modified: 05 Aug 2024 · MM 2024 Poster · CC BY 4.0
Abstract: Transformer-based techniques built on superpoints have recently become prevalent in 3D instance segmentation. However, they often suffer from over-segmentation, which is especially noticeable on large objects, and unreliable superpoint mask predictions further compound the problem. To address these challenges, we propose a novel framework called MSTA3D. It leverages multi-scale feature representations and introduces a twin-attention mechanism to capture them effectively. Furthermore, MSTA3D integrates a box query with a box regularizer, offering a complementary spatial constraint alongside the semantic queries. Experimental evaluations on the ScanNetV2, ScanNet200, and S3DIS datasets demonstrate that our approach surpasses state-of-the-art 3D instance segmentation methods.
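To make the abstract's twin-attention idea concrete, the following is a minimal numpy sketch of one decoder step in which semantic queries and box queries each cross-attend to superpoint features at multiple scales. All names (`twin_attention_step`, the summation-based scale fusion, the sign-thresholded masks) are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, feats):
    # Scaled dot-product cross-attention: queries attend to superpoint features.
    d = queries.shape[-1]
    attn = softmax(queries @ feats.T / np.sqrt(d))  # (num_queries, num_superpoints)
    return attn @ feats                             # refined queries

def twin_attention_step(sem_q, box_q, feats_per_scale):
    # Hypothetical twin attention: the semantic and box query branches each
    # attend to superpoint features at every scale; refined queries are summed
    # across scales (an assumed fusion scheme, chosen for simplicity).
    sem_out = sum(cross_attention(sem_q, f) for f in feats_per_scale)
    box_out = sum(cross_attention(box_q, f) for f in feats_per_scale)
    return sem_out, box_out

rng = np.random.default_rng(0)
d = 32
feats = [rng.normal(size=(n, d)) for n in (200, 50)]  # two feature scales
sem_q = rng.normal(size=(10, d))  # semantic instance queries
box_q = rng.normal(size=(10, d))  # box queries (complementary spatial cue)
sem_out, box_out = twin_attention_step(sem_q, box_q, feats)
# Toy instance masks from semantic-query / superpoint similarity at the finest scale:
masks = (sem_out @ feats[0].T) > 0  # (num_queries, num_superpoints)
```

In the actual method the box branch would additionally regress bounding boxes under a box regularizer to constrain the superpoint masks spatially; that loss is omitted here.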
Primary Subject Area: [Content] Vision and Language
Secondary Subject Area: [Content] Media Interpretation
Relevance To Conference: 3D instance segmentation is the task of identifying and separating individual objects within a 3D scene, including detecting object boundaries and assigning a unique label to each identified object. Its role in computer vision has grown alongside the demand for 3D perception in applications such as augmented/virtual reality, autonomous driving, robotics, and indoor scanning.
Supplementary Material: zip
Submission Number: 1440