Keywords: Cooperative Perception, Auto 3D Annotation, Autonomous Driving
Abstract: The creation of large-scale 3D datasets for cooperative perception is hindered by the high cost of LiDAR sensors and manual annotation. While vision-only methods offer a low-cost alternative, they suffer from inherent scale ambiguity. This paper introduces CoSA-3D, a novel vision-only, training-free offline framework for automatic 3D annotation in cooperative scenarios. Our method uses easily obtained multi-agent images and builds upon the latest zero-shot Structure-from-Motion (SfM) foundation models. A core contribution is our geometric fiducial alignment module, which leverages inter-agent relative poses to rectify SfM-generated pose inaccuracies and recover the true metric scale. Combined with robust multi-agent fusion, this approach effectively handles asynchronous data and overcomes occlusion. We evaluate CoSA-3D on the Griffin dataset, where its label quality significantly surpasses that of existing methods, particularly in challenging long-range (50-100 m) scenes. The framework's generalization is further validated on a custom-collected real-world cooperative dataset. Ablation studies confirm that our geometric alignment and data fusion mechanisms are fundamental to the framework's high accuracy. CoSA-3D provides a scalable, accurate, and LiDAR-free solution for 3D cooperative annotation.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 4