MaskMentor: Unlocking the Potential of Masked Self-Teaching for Missing Modality RGB-D Semantic Segmentation
Abstract: Existing RGB-D semantic segmentation methods struggle with missing-modality input, where only RGB images or only depth maps are available, which degrades segmentation performance. We tackle this issue with MaskMentor, a new pre-training framework for missing-modality segmentation that advances its counterparts through two novel designs: Masked Modality and Image Modeling (M2IM) and Self-Teaching via Token-Pixel Joint reconstruction (STTP). M2IM simulates missing-modality scenarios by combining modality-level and patch-level random masking. STTP offers an effective self-teaching strategy in which the trained network assumes a dual role, acting as both teacher and student. The student, given missing-modality input, is supervised by the teacher, given complete-modality input, through both token-wise and pixel-wise masked modeling, closing the gap between missing and complete input modalities. By integrating M2IM and STTP, MaskMentor significantly improves the generalization ability of the trained model across diverse input conditions and outperforms state-of-the-art methods on two popular benchmarks by a considerable margin. Extensive ablation studies further verify the effectiveness of these contributions.
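As a rough illustration of the masking idea described in the abstract, the sketch below combines patch-level and modality-level random masking for a two-modality (RGB, depth) token grid. It is not the authors' implementation: the function name `sample_m2im_mask`, the parameters `patch_mask_ratio` and `modality_drop_prob`, their default values, and the order in which the two masking steps are applied are all assumptions made for this example.

```python
import random


def sample_m2im_mask(num_patches, patch_mask_ratio=0.5,
                     modality_drop_prob=0.5, rng=None):
    """Return a 2 x num_patches list of booleans; True means the token is masked.

    Row 0 stands for RGB tokens and row 1 for depth tokens. All
    hyperparameters are illustrative assumptions, not values from the paper.
    """
    rng = rng or random.Random()
    num_masked = round(patch_mask_ratio * num_patches)

    # Patch-level masking: hide a fixed fraction of patches,
    # sampled independently for each modality.
    mask = []
    for _ in range(2):
        masked_idx = set(rng.sample(range(num_patches), num_masked))
        mask.append([i in masked_idx for i in range(num_patches)])

    # Modality-level masking: with some probability, hide one whole
    # modality to simulate a missing-modality input.
    if rng.random() < modality_drop_prob:
        dropped = rng.randrange(2)
        mask[dropped] = [True] * num_patches

    return mask
```

Under these assumptions, a mask with one row fully set to True would correspond to the student's missing-modality input, while the teacher would see both modalities unmasked.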
Primary Subject Area: [Content] Multimodal Fusion
Relevance To Conference: 1. We propose the MaskMentor framework, which unlocks the potential of masked image modeling (MIM) for more accurate missing-modality RGB-D segmentation.
2. We design the M2IM pre-training approach, which combines patch-level and modality-level masking and significantly strengthens the cross-modal modeling capabilities of MIM.
3. We present STTP, a MIM-based self-teaching method that improves predictions from missing-modality input using supervision from complete-modality data, and that integrates fine-grained spatial characteristics with high-level semantic information.
Submission Number: 5678