Multi-scale Twin-attention for 3D Instance Segmentation

Published: 20 Jul 2024 · Last Modified: 05 Aug 2024 · MM 2024 Poster · CC BY 4.0
Abstract: Transformer-based techniques built on superpoints have recently become prevalent in 3D instance segmentation. However, they often suffer from over-segmentation, which is especially noticeable on large objects, and unreliable superpoint mask predictions further compound the problem. To address these challenges, we propose a novel framework called MSTA3D. It leverages multi-scale feature representations and introduces a twin-attention mechanism to capture them effectively. Furthermore, MSTA3D integrates a box query with a box regularizer, offering a complementary spatial constraint alongside the semantic queries. Experimental evaluations on the ScanNetV2, ScanNet200, and S3DIS datasets demonstrate that our approach surpasses state-of-the-art 3D instance segmentation methods.
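To make the abstract's twin-attention idea concrete, the following is a minimal numpy sketch of one decoder step in which semantic queries and box queries each cross-attend to superpoint features at multiple scales. All names (`twin_attention_step`, the summation-based scale fusion, the sign-thresholded masks) are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, feats):
    # Scaled dot-product cross-attention: queries attend to superpoint features.
    d = queries.shape[-1]
    attn = softmax(queries @ feats.T / np.sqrt(d))  # (num_queries, num_superpoints)
    return attn @ feats                             # refined queries

def twin_attention_step(sem_q, box_q, feats_per_scale):
    # Hypothetical twin attention: the semantic and box query branches each
    # attend to superpoint features at every scale; refined queries are summed
    # across scales (an assumed fusion scheme, chosen for simplicity).
    sem_out = sum(cross_attention(sem_q, f) for f in feats_per_scale)
    box_out = sum(cross_attention(box_q, f) for f in feats_per_scale)
    return sem_out, box_out

rng = np.random.default_rng(0)
d = 32
feats = [rng.normal(size=(n, d)) for n in (200, 50)]  # two feature scales
sem_q = rng.normal(size=(10, d))  # semantic instance queries
box_q = rng.normal(size=(10, d))  # box queries (complementary spatial cue)
sem_out, box_out = twin_attention_step(sem_q, box_q, feats)
# Toy instance masks from semantic-query / superpoint similarity at the finest scale:
masks = (sem_out @ feats[0].T) > 0  # (num_queries, num_superpoints)
```

In the actual method the box branch would additionally regress bounding boxes under a box regularizer to constrain the superpoint masks spatially; that loss is omitted here.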
Primary Subject Area: [Content] Vision and Language
Secondary Subject Area: [Content] Media Interpretation
Relevance To Conference: 3D instance segmentation is the task of identifying and separating individual objects within a 3D scene, including detecting object boundaries and assigning a unique label to each identified object. Its role in computer vision has grown alongside the demand for 3D perception in applications such as augmented/virtual reality, autonomous driving, robotics, and indoor scanning.
Supplementary Material: zip
Submission Number: 1440