Keywords: Multi-modal Learning, Semantic Segmentation, Knowledge Distillation
Abstract: Simultaneously using multimodal inputs from multiple sensors to train segmentors is intuitively advantageous but practically challenging. A key challenge is unimodal bias, where multimodal segmentors over-rely on certain modalities, causing performance to drop when those modalities are missing, a common situation in real-world applications. To this end, we develop the \textbf{first} framework for learning a robust segmentor that can handle any combination of visual modalities. Specifically, we first introduce a parallel multimodal learning strategy for training a strong teacher. Cross-modal and unimodal distillation is then performed in the multiscale representation space, transferring feature-level knowledge from the multimodal teacher to anymodal segmentors in order to address unimodal bias and avoid over-reliance on specific modalities. Moreover, a prediction-level, modality-agnostic semantic distillation is proposed to transfer semantic knowledge for segmentation. Extensive experiments on both synthetic and real-world multi-sensor benchmarks demonstrate that our method achieves superior performance (+6.37% and +6.15%).
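The abstract mentions two distillation terms: feature-level transfer in the multiscale representation space and prediction-level semantic distillation on the segmentation outputs. Below is a minimal, hedged sketch of what such losses commonly look like; it is not the authors' implementation, and the function names (`feat_distill_loss`, `semantic_distill_loss`) and the temperature `T` are illustrative assumptions.

```python
# Sketch of multiscale feature distillation and prediction-level semantic
# distillation (illustrative only; not the paper's actual code).
import torch
import torch.nn.functional as F

def feat_distill_loss(student_feats, teacher_feats):
    """MSE between student features and detached multimodal-teacher features,
    averaged over the feature scales."""
    loss = 0.0
    for fs, ft in zip(student_feats, teacher_feats):
        loss = loss + F.mse_loss(fs, ft.detach())
    return loss / len(student_feats)

def semantic_distill_loss(student_logits, teacher_logits, T=2.0):
    """Per-pixel KL divergence between temperature-softened class distributions."""
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits.detach() / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)

# Dummy usage: 3 feature scales, 19 classes, batch of 4, 64x64 crops.
if __name__ == "__main__":
    s_feats = [torch.randn(4, 256, 64 // 2**i, 64 // 2**i) for i in range(3)]
    t_feats = [torch.randn_like(f) for f in s_feats]
    s_logits = torch.randn(4, 19, 64, 64)
    t_logits = torch.randn(4, 19, 64, 64)
    total = feat_distill_loss(s_feats, t_feats) + semantic_distill_loss(s_logits, t_logits)
    print(total.item())
```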
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 13116