MICDrop: Masking Image and Depth Features via Complementary Dropout for Domain-Adaptive Semantic Segmentation

Published: 09 Apr 2024, Last Modified: 24 Apr 2024 · SynData4CV · CC BY 4.0
Keywords: Domain Adaptation, Semantic Segmentation, Depth Guidance
TL;DR: We present a novel masking and feature fusion strategy to facilitate information exchange between different input modalities for unsupervised domain-adaptive semantic segmentation.
Abstract: Unsupervised Domain Adaptation (UDA) is the task of bridging the domain gap between a labeled source domain, e.g., synthetic data, and an unlabeled target domain. We observe that current UDA methods show inferior results on fine structures and tend to oversegment objects with ambiguous appearance. To address these shortcomings, we propose to leverage depth predictions, as depth discontinuities often coincide with segmentation boundaries. We show that naively incorporating depth does not fully exploit its potential. To this end, we present MICDrop, which learns a joint feature representation by masking image encoder features while inversely masking depth encoder features. With this simple yet effective complementary masking strategy, we enforce the use of both modalities when learning the joint feature representation. We further propose a feature fusion module to improve both global and local information sharing. MICDrop can be plugged into various recent UDA methods and consistently improves results across standard UDA benchmarks, obtaining new state-of-the-art performance.
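The complementary masking idea described in the abstract can be sketched as follows. This is a minimal illustrative re-implementation, not the authors' code: the function name, the patch-level masking scheme, and the mask ratio are all assumptions made for the example; only the core invariant, that image and depth features are masked with inverse masks so each spatial location is visible to exactly one modality, comes from the abstract.

```python
import numpy as np

def complementary_mask(img_feat, depth_feat, mask_ratio=0.5, patch=4, seed=None):
    """Illustrative sketch of complementary dropout (hypothetical helper).

    Image features are zeroed where the binary mask is 0; depth features are
    zeroed where the mask is 1. A downstream fusion module must therefore draw
    on both modalities to reconstruct a complete joint representation.
    """
    rng = np.random.default_rng(seed)
    c, h, w = img_feat.shape
    # Coarse patch-level binary mask (assumed design choice), upsampled to
    # the feature resolution so whole patches are dropped at once.
    m = (rng.random((h // patch, w // patch)) > mask_ratio).astype(img_feat.dtype)
    m = np.kron(m, np.ones((patch, patch), dtype=img_feat.dtype))
    # Complementary application: mask M on image features, 1 - M on depth.
    return img_feat * m, depth_feat * (1.0 - m)

# Usage: with all-ones features, the two masked outputs sum to one everywhere,
# showing that every location is covered by exactly one modality.
img = np.ones((8, 16, 16), dtype=np.float32)
dep = np.ones((8, 16, 16), dtype=np.float32)
masked_img, masked_dep = complementary_mask(img, dep, seed=0)
assert np.allclose(masked_img + masked_dep, 1.0)
```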
Submission Number: 52