Keywords: Multi-Modality, Cross-Modality, Domain Adaptation, Autonomous Driving
TL;DR: We propose a Cross-Modality Adaptation (CroMA) framework to learn a robust monocular BEV perception model under sensor shift and domain gaps.
Abstract: Incorporating multiple sensor modalities and closing the domain gaps between training and deployment are two challenging yet critical topics for self-driving. Existing adaptation works focus only on visual-level domain gaps, overlooking the sensor-type gaps that exist in practice. A model trained with one collection of sensor modalities may need to run in another setting with fewer types of sensors available. In this work, we propose a Cross-Modality Adaptation (CroMA) framework to facilitate the learning of a more robust monocular BEV perception model, which transfers point-cloud knowledge from a LiDAR sensor during the training phase to the camera-only testing scenario. The absence of LiDAR during testing precludes its use as a model input. Hence, our key idea lies in the design of a LiDAR-teacher and camera-student knowledge distillation model, as well as a multi-level adversarial learning mechanism, which together adapt and align the features learned from different sensors and domains. This work results in the first open analysis of cross-domain perception and cross-sensor adaptation models for monocular 3D tasks in the wild. We benchmark our approach on large-scale datasets under various domain shifts and show state-of-the-art results against various baselines.
Supplementary Material: zip
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Please Choose The Closest Area That Your Submission Falls Into: Applications (e.g., speech processing, computer vision, NLP)