CoCMT: Towards Communication-Efficient Cross-Modal Transformer For Collaborative Perception

Rujia Wang; Hao Xiang; JINLONG LI; Runsheng Xu; Zhengzhong Tu

27 Sept 2024 (modified: 14 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0

Keywords: deep learning, vehicle-to-vehicle cooperative perception, 3D object detection

TL;DR: We present an object-query-based collaboration framework that enables efficient communication while unifying homogeneous and heterogeneous cooperative perception tasks.

Abstract: Cooperative perception systems in autonomous driving enhance each agent’s perceptual capabilities by sharing visual information with other agents, and they have proven effective against prominent challenges such as occlusion and long-range detection. However, most existing cooperative systems transmit feature maps, such as bird's-eye view (BEV) representations, which include substantial background data and are costly to process due to their high dimensionality. This paradigm introduces a trade-off between improved perception and increased communication overhead. To address this challenge, we present CoCMT, an object-query-based collaboration framework that enables efficient communication while unifying homogeneous and heterogeneous cooperative perception tasks. Within CoCMT, we introduce the Efficient Query Transformer (EQFormer) to effectively fuse multi-agent object queries, and we implement a synergistic deep supervision approach to accelerate convergence during training. Extensive experiments on the OPV2V and V2V4Real datasets demonstrate that CoCMT surpasses current state-of-the-art methods in performance while communicating far less data. Notably, on the real-world V2V4Real dataset, our proposed CoCMT model (Top-50 object queries) requires merely 0.416 Mb of bandwidth during inference. This reduces bandwidth consumption by a factor of 323 compared to SOTA methods while improving AP@70 by 1.1. The code and models will be open-sourced.
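The bandwidth figures in the abstract can be checked with a little arithmetic, and the "Top-50 object queries" idea can be illustrated with a small selection routine. The following is a minimal sketch, not the authors' implementation: the helper names `select_top_k` and `payload_megabits`, the query dimension of 256, and the 32-bit value width are all hypothetical choices made for illustration, and the sketch does not claim to reproduce how the paper measures its 0.416 Mb figure. Only the 0.416 Mb and 323x numbers come from the abstract.

```python
import heapq


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring queries, best first.

    Hypothetical helper: the abstract only states that the Top-50
    object queries are transmitted, not how they are ranked.
    """
    return heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])


def payload_megabits(num_queries, dim, bits_per_value=32):
    """Size of num_queries feature vectors in megabits (1 Mb = 1e6 bits).

    dim and bits_per_value are assumptions, not values from the paper.
    """
    return num_queries * dim * bits_per_value / 1e6


# Figures quoted in the abstract.
COCMT_BANDWIDTH_MB = 0.416
REDUCTION_FACTOR = 323

# Bandwidth implied for the compared SOTA method: 0.416 Mb * 323.
implied_baseline_mb = COCMT_BANDWIDTH_MB * REDUCTION_FACTOR

if __name__ == "__main__":
    confidences = [0.10, 0.90, 0.50, 0.75]
    print("kept query indices:", select_top_k(confidences, k=2))
    print("implied baseline bandwidth (Mb):", round(implied_baseline_mb, 3))
    print("illustrative payload (Mb):", payload_megabits(50, 256))
```

Under these assumptions, the implied bandwidth of the compared method is roughly 134 Mb per transmission, which shows why sending a fixed, small set of object queries instead of dense BEV feature maps changes the communication cost by orders of magnitude.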

Supplementary Material: pdf

Primary Area: applications to robotics, autonomy, planning

Submission Number: 8797