RoCo: Robust Cooperative Perception By Iterative Object Matching and Pose Adjustment

Published: 20 Jul 2024, Last Modified: 21 Jul 2024
Venue: MM 2024 (Oral)
License: CC BY 4.0
Abstract: Collaborative autonomous driving with multiple vehicles usually requires data fusion across multiple modalities. To ensure effective fusion, the data from each individual modality must maintain reasonably high quality. However, in collaborative perception, the quality of object detection based on a modality is highly sensitive to relative pose errors among the agents, which cause feature misalignment and significantly degrade collaborative performance. To address this issue, we propose RoCo, a novel unsupervised framework that conducts iterative object matching and agent pose adjustment. To the best of our knowledge, our work is the first to model the pose-correction problem in collaborative perception as an object matching task, which reliably associates common objects detected by different agents. On top of this, we propose a graph optimization process that adjusts the agent poses by minimizing the alignment errors of the associated objects; object matching is then redone based on the adjusted poses. This process is repeated until convergence. Experiments on both simulated and real-world datasets demonstrate that RoCo consistently outperforms existing relevant methods in collaborative object detection and exhibits highly desirable robustness when the agents' pose information is severely noisy. Ablation studies are also provided to show the impact of its key parameters and components. The code will be released.
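The abstract describes an alternating loop: associate objects detected by different agents, refit the relative agent pose from the associations, and repeat until convergence. Since the released code and the paper's graph formulation are not given here, the following is only a minimal sketch of that loop under strong simplifying assumptions: 2D object centers, a single neighbor agent, SE(2) poses, greedy nearest-neighbor matching in place of the paper's graph-based matching, and a closed-form Kabsch fit in place of its graph optimization. All function names are illustrative, not the authors' API.

```python
"""Minimal sketch of iterative object matching + pose adjustment (not the
authors' implementation). Assumes 2D object centers and one noisy SE(2)
pose mapping a neighbor agent's frame into the ego frame."""
import numpy as np

def se2_apply(pose, pts):
    """Apply an SE(2) pose (x, y, theta) to an (N, 2) array of points."""
    x, y, th = pose
    R = np.array([[np.cos(th), -np.sin(th)],
                  [np.sin(th),  np.cos(th)]])
    return pts @ R.T + np.array([x, y])

def match_objects(ego_pts, nbr_pts_in_ego, gate=2.0):
    """Greedy nearest-neighbor association of object centers, gated by
    a maximum distance so spurious detections are not matched."""
    pairs, used = [], set()
    for i, p in enumerate(nbr_pts_in_ego):
        d = np.linalg.norm(ego_pts - p, axis=1)
        j = int(np.argmin(d))
        if d[j] < gate and j not in used:
            pairs.append((j, i))
            used.add(j)
    return pairs

def fit_se2(src, dst):
    """Closed-form 2D rigid alignment (Kabsch) mapping src onto dst,
    standing in for the paper's pose-graph optimization step."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_s
    return np.array([t[0], t[1], np.arctan2(R[1, 0], R[0, 0])])

def adjust_pose(ego_pts, nbr_pts, pose, iters=10, tol=1e-6):
    """Alternate matching and pose refitting until the pose converges."""
    for _ in range(iters):
        warped = se2_apply(pose, nbr_pts)          # project with current pose
        pairs = match_objects(ego_pts, warped)     # re-associate objects
        if len(pairs) < 2:                         # need >= 2 pairs for SE(2)
            break
        e = np.array([ego_pts[j] for j, _ in pairs])
        n = np.array([nbr_pts[i] for _, i in pairs])
        new_pose = fit_se2(n, e)                   # refit neighbor -> ego pose
        if np.linalg.norm(new_pose - pose) < tol:
            return new_pose
        pose = new_pose
    return pose
```

The alternation mirrors the abstract's description: better pose estimates yield better matches, which in turn yield a better pose fit; the actual method replaces both simplifications with graph-based matching and graph optimization.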
Primary Subject Area: [Experience] Multimedia Applications
Secondary Subject Area: [Content] Multimodal Fusion, [Experience] Multimedia Applications
Relevance To Conference: We propose a method for matching objects and correcting agent poses in a multi-vehicle collaborative system. It enhances the precision and reliability of each modality's data and better prepares it for a subsequent multimodal fusion step. We believe this work will advance the development of multimodal fusion and promote the practical application of multi-vehicle collaboration.
Supplementary Material: zip
Submission Number: 1009