MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model

Kang Zeng; Hao Shi; Jiacheng Lin; Siyu Li; Jintao Cheng; Kaiwei Wang; Zhiyong Li; Kailun Yang

Published: 20 Jul 2024, Last Modified: 21 Jul 2024, MM2024 Poster, CC BY 4.0

Abstract: LiDAR-based Moving Object Segmentation (MOS) aims to locate and segment moving objects in the point cloud of the current scan using motion information from previous scans. Despite the promising results achieved by previous MOS methods, several key issues, such as the weak coupling of temporal and spatial information, still need further study. In this paper, we propose a novel LiDAR-based 3D moving object segmentation method built on a motion-aware state space model, termed MambaMOS. First, we develop a novel embedding module, the Time Clue Bootstrapping Embedding (TCBE), to enhance the coupling of temporal and spatial information in point clouds and alleviate the issue of overlooked temporal clues. Second, we introduce the Motion-aware State Space Model (MSSM) to endow the model with the capacity to understand the temporal correlations of the same object across different time steps. Specifically, MSSM emphasizes the motion states of the same object at different time steps through two distinct temporal modeling and correlation steps. We utilize an improved state space model to represent these motion differences, effectively modeling the motion states. Finally, extensive experiments on the SemanticKITTI-MOS and KITTI-Road benchmarks demonstrate that the proposed MambaMOS achieves state-of-the-art performance. The source code of this work will be made publicly available.

Primary Subject Area: [Content] Media Interpretation

Relevance To Conference: This work proposes a novel 3D point cloud Moving Object Segmentation (MOS) framework based on the state space model. It aims to address the weak coupling of spatial and temporal information in previous methods for the MOS task and to advance research on the perception of dynamic scenes in 3D vision. Furthermore, this work establishes a connection between the point cloud segmentation task in 3D vision and the selective copying task in natural language processing, introducing the concept of treating point cloud data as natural language sequences and providing guidance for subsequent multimodal work.

Supplementary Material: zip

Submission Number: 1097
