MAPLE: Context-aware Multimodal Augmentation for Long-tail 3D Object Detection

ICLR 2026 Conference Submission 12953 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Instance Augmentation, 3D Perception, Autonomous Driving
Abstract: 3D object detection is essential for autonomous driving but remains limited by the long-tail distribution of real-world data. Instance-level augmentation methods, whether based on copy-paste or asset rendering, are typically restricted to the LiDAR modality and offer only modest variation with little awareness of scene context. We introduce MAPLE, a training-free pipeline for multimodal augmentation that generates synchronized RGB-LiDAR pairs. Objects are inserted through context-aware inpainting in the image domain, and the paired pseudo-LiDAR is reconstructed via depth estimation. To ensure cross-modal plausibility, MAPLE incorporates semantic and geometric verification modules that filter out inconsistent generations. We further propose a success-rate evaluation that quantifies error reduction across verification stages, providing a principled measure of pipeline reliability. On the nuScenes benchmark, MAPLE consistently improves both detection and segmentation performance in multimodal and LiDAR-only settings. Code will be released publicly.
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 12953
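The pseudo-LiDAR step the abstract describes, lifting inpainted image regions into 3D via estimated depth, amounts to a pinhole-camera unprojection followed by a change of frame. Below is a minimal NumPy sketch of that step, not the paper's implementation: the function name, the assumption of metric depth, and the `K` / `cam_to_lidar` inputs are our own illustrative choices.

```python
import numpy as np

def depth_to_pseudo_lidar(depth, K, cam_to_lidar):
    """Unproject a per-pixel depth map into a pseudo-LiDAR point cloud.

    depth:        (H, W) array of estimated metric depths (0 = invalid).
    K:            3x3 camera intrinsics.
    cam_to_lidar: 4x4 homogeneous transform from camera to LiDAR frame.
    Returns an (N, 3) array of points in the LiDAR frame.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # pixel coordinates
    z = depth.ravel()
    # Back-project pixels: x = (u - cx) * z / fx, y = (v - cy) * z / fy
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=0)  # (4, H*W)
    pts_lidar = cam_to_lidar @ pts_cam                      # into LiDAR frame
    valid = z > 0                                           # drop empty pixels
    return pts_lidar[:3, valid].T
```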
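The success-rate evaluation is described only at a high level. One plausible reading is that, for a set of generations with known ground-truth validity, it reports the fraction of valid samples among those surviving each verification stage, so the error rate should shrink stage by stage. The sketch below encodes that reading; the sample schema and stage names ("semantic", "geometric") are hypothetical, not the paper's API.

```python
from typing import Dict, List

def stage_success_rates(samples: List[Dict[str, bool]]) -> Dict[str, float]:
    """Success rate of surviving generations after each verification stage.

    Each sample carries a ground-truth `valid` flag plus boolean outcomes
    of the (assumed) semantic and geometric verification modules.
    """
    def rate(pool: List[Dict[str, bool]]) -> float:
        return sum(s["valid"] for s in pool) / max(len(pool), 1)

    survivors = samples
    rates = {"unfiltered": rate(survivors)}
    for stage in ("semantic", "geometric"):        # apply filters in order
        survivors = [s for s in survivors if s[stage]]
        rates[f"after_{stage}"] = rate(survivors)  # error reduction per stage
    return rates
```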