Robu-MARC: A Masked Autoencoder-Aided Camera-Radar Network for Robust 3D Perception under Sensor Corruption

ICLR 2026 Conference Submission 17626 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0

Keywords: Camera, Automotive Radar, Sensor Fusion, nuScenes, nuScenes-C, Masked Autoencoder, Sensor Corruption, Autonomous Driving, 3D Object Detection

Abstract: Autonomous vehicles rely on robust perception systems, yet real-world conditions such as poor lighting, adverse weather, and dynamic environments often corrupt camera images, posing significant challenges for reliable sensor fusion and downstream perception. In this paper, we propose Robu-MARC (Robust Masked Autoencoder-Aided Radar Camera), a fusion framework designed to enhance 3D perception in autonomous vehicles under sensor corruption. Robu-MARC integrates a Masked Autoencoder (MAE) with a Vision Transformer backbone to reconstruct degraded camera images and compute the reconstruction error. This error serves a dual purpose: it weights the confidence attention map used in bird's-eye-view (BEV) fusion, and it is incorporated into the loss function to guide training toward corruption-tolerant spatial representations. On the radar side, Robu-MARC introduces a radar-specific cross-attention mechanism and applies Doppler-aware and radar cross-section (RCS)-aware Gaussian expansion strategies independently. By avoiding joint modeling of Doppler velocity and RCS, the model improves target detection and enhances the reliability of multimodal fusion in real-world driving scenarios. We evaluate Robu-MARC on the nuScenes dataset and its corrupted variants, including scenarios with corrupted camera images. Robu-MARC shows promising 3D object detection performance on both clean and corrupted images. This work advances robust multimodal fusion for autonomous driving and highlights the effectiveness of reconstruction-guided attention and of selective radar feature refinement through Doppler- and RCS-aware processing when handling corrupted images.
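
The reconstruction-guided weighting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the exponential mapping from error to confidence, and the `alpha` parameter are all assumptions chosen for illustration, and the abstract does not specify how the error is actually converted into a weight. The sketch only shows the general idea that image regions the MAE reconstructs poorly should receive less attention during fusion.

```python
import math


def patch_mse(original, reconstructed):
    """Mean squared error between two equally sized patches (flat lists of floats)."""
    if len(original) != len(reconstructed):
        raise ValueError("patches must have the same length")
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def confidence_from_error(errors, alpha=5.0):
    """Map per-patch reconstruction errors to confidences in (0, 1].

    exp(-alpha * error) is an assumed mapping (hypothetical, not from the paper):
    zero error gives confidence 1, larger errors decay toward 0.
    """
    return [math.exp(-alpha * e) for e in errors]


def reweight_attention(attention, confidence):
    """Scale attention scores by confidence and renormalize them to sum to 1."""
    scaled = [a * c for a, c in zip(attention, confidence)]
    total = sum(scaled)
    if total == 0.0:
        # Degenerate case: fall back to uniform weights.
        return [1.0 / len(scaled)] * len(scaled)
    return [s / total for s in scaled]


if __name__ == "__main__":
    # Two toy patches: the first is reconstructed perfectly, the second poorly.
    errors = [
        patch_mse([0.2, 0.4], [0.2, 0.4]),
        patch_mse([0.0, 1.0], [1.0, 0.0]),
    ]
    confidence = confidence_from_error(errors, alpha=1.0)
    weights = reweight_attention([0.5, 0.5], confidence)
    print(errors, confidence, weights)
```

In this toy example the poorly reconstructed patch ends up with the smaller share of the attention, which is the behavior the abstract attributes to the confidence attention map.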

Primary Area: other topics in machine learning (i.e., none of the above)

Submission Number: 17626
