CroMA: Cross-Modality Adaptation for Monocular BEV Perception

Yunze Man; Liangyan Gui; Yu-Xiong Wang

CroMA: Cross-Modality Adaptation for Monocular BEV Perception

Yunze Man, Liangyan Gui, Yu-Xiong Wang

Published: 01 Feb 2023, Last Modified: 13 Feb 2023Submitted to ICLR 2023Readers: Everyone

Keywords: Multi-Modality, Cross-Modality, Domain Adaptation, Autonomous Driving

Abstract: Incorporating multiple sensor modalities, and closing the domain gaps between training and deployment are two challenging yet critical topics for self-driving. Existing adaption works only focus on visual-level domain gaps, overlooking the sensor-type gaps which exist in reality. A model trained with a collection of sensor modalities may need to run on another setting with less types of sensors available. In this work, we propose a Cross-Modality Adaptation (CroMA) framework to facilitate the learning of a more robust monocular BEV perception model, which transfer the point clouds knowledge from LiDAR sensor during training phase to the camera-only testing scenario. The absence of LiDAR during testing negates the usage of it as model input. Hence, our key idea lies in the design of a LiDAR-teacher and Camera-student knowledge distillation model, as well as a multi-level adversarial learning mechanism, which adapt and align the features learned from different sensors and domains. This work results in the first open analysis of cross-domain perception and cross-sensor adaptation model for monocular 3D tasks in the wild. We benchmark our approach on large-scale datasets under various domain shifts and show state-of-the-art results against various baselines.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Submission Guidelines: Yes

Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)

TL;DR: We propose a Cross-Modality Adaptation (CroMA) framework to learn a robust monocular BEV perception model under sensor shift and domain gaps.

Supplementary Material: zip

18 Replies

Loading