Keywords: 3D point cloud, multimodal architecture, automatic annotation, LiDar
TL;DR: multimodal model to process 3D point cloud with visual guidance for 3D automatic annotation
Abstract: The laborious nature of manual point cloud labeling drives the growing interest in 3D auto-annotation. The challenge is amplified by the sparse and irregular distribution of point clouds. This leads to the under-performance of current autolabelers, particularly with hard-to-detect samples characterized by truncation, occlusion, or distance.
In response, we propose a multimodal context-aware transformer (MMCAT) that integrates 3D point cloud geometry with image-based semantic insights to improve 3D bounding box annotations through 2D visual guidance. Our approach utilizes visual hints from three perspectives to integrate the 2D and 3D dimensions.
Initially, we develop point and image encoders to align LiDAR and image data, establishing a unified semantic bridge between image visuals and point cloud geometry. Subsequently, our box encoder processes 2D box coordinates to improve accuracy in determining object positions and dimensions within 3D space. Finally, our multimodal encoders enhance feature interactions, improving point cloud interpretation and annotation accuracy, especially for challenging samples.
MMCAT lies in its strategic use of 2D visual prompts to bolster 3D representation and annotation processes. We validate MMCAT's efficacy through extensive experiments on the widely recognized KITTI and Waymo Open datasets, particularly highlighting its superior performance with hard samples.
Supplementary Material: pdf
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6807
Loading