Generalized Source-free Domain-adaptive Segmentation via Reliable Knowledge Propagation

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 · MM 2024 Poster · CC BY 4.0
Abstract: Unanticipated domain shifts can severely degrade model performance, prompting the need for model adaptation techniques (i.e., Source-free Domain Adaptation (SFDA)) that adapt a model to new domains without accessing source data. However, existing SFDA methods often sacrifice source-domain performance to improve adaptation on the target, limiting overall model capability. In this paper, we focus on a more challenging paradigm in semantic segmentation, Generalized SFDA (G-SFDA), which aims to achieve robust performance on both source and target domains. To this end, we propose a novel G-SFDA framework, Reliable Knowledge Propagation (RKP), for semantic segmentation, which leverages a text-to-image diffusion model to propagate reliable semantic knowledge from the segmentation model. The key idea of RKP lies in aggregating the predicted reliable but scattered segments into a complete semantic layout and using it to activate the diffusion model for conditional generation. Diverse images spanning multiple domain factors can then be synthesized to retrain the segmentation model. This enables the segmentation model to learn domain-invariant knowledge across multiple domains, improving its adaptability to the target domain, maintaining its discriminability on the source domain, and even handling unseen domains. Our model-agnostic RKP framework establishes a new state of the art across current SFDA segmentation benchmarks, significantly improving various SFDA methods. The code will be open-sourced.
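To make the pipeline in the abstract concrete, below is a minimal sketch of our reading of RKP: keep only high-confidence pseudo-labels from the segmentation model, aggregate them into a complete semantic layout, and use that layout to condition a text-to-image diffusion model so the same scene can be synthesized under several domain factors. The model checkpoints, confidence threshold, and the naive layout-completion rule are illustrative assumptions, not the authors' implementation.

```python
# Conceptual sketch of reliable-segment aggregation + layout-conditioned
# generation. All names, thresholds, and the fill rule are assumptions.
import torch
import numpy as np
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

def reliable_layout(logits: torch.Tensor, tau: float = 0.9) -> np.ndarray:
    """Keep pixels whose softmax confidence exceeds tau; complete the rest.

    logits: (C, H, W) segmentation output on a target image.
    Returns an (H, W) label map.
    """
    prob = logits.softmax(dim=0)
    conf, label = prob.max(dim=0)
    label[conf < tau] = -1  # only scattered reliable segments remain
    # Naive "aggregation": fill unreliable pixels with the most frequent
    # reliable class -- a stand-in for the paper's layout completion.
    reliable = label[label >= 0]
    fill = reliable.mode().values if reliable.numel() else torch.tensor(0)
    label[label < 0] = fill
    return label.cpu().numpy().astype(np.uint8)

def layout_to_condition(label: np.ndarray, palette: np.ndarray) -> Image.Image:
    """Colorize the label map so it can condition a seg-trained ControlNet."""
    return Image.fromarray(palette[label])

# Hypothetical usage: render one layout under several domain factors.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-seg", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

palette = np.random.randint(0, 255, (256, 3), dtype=np.uint8)  # class colors
logits = torch.randn(19, 512, 512)  # stand-in for seg_model(target_image)
cond = layout_to_condition(reliable_layout(logits), palette)

domain_prompts = ["a street scene at night", "a street scene in heavy fog"]
synthetic = [pipe(p, image=cond, num_inference_steps=30).images[0]
             for p in domain_prompts]
# `synthetic`, paired with the layout as labels, would then be used to
# retrain the segmentation model on multi-domain data.
```

Since the layout is fixed while only the prompt varies, the synthesized pairs share pixel-level labels across domain factors, which is what lets the retrained model pick up domain-invariant knowledge.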
Primary Subject Area: [Experience] Multimedia Applications
Secondary Subject Area: [Content] Media Interpretation
Relevance To Conference: In this paper, we aim to unlock the potential of text-to-image diffusion models for cross-domain semantic segmentation. Off-the-shelf text-to-image diffusion models are difficult to leverage effectively, making it hard to harness the benefits of multimodal pre-training for source-free domain-adaptive segmentation. To address this challenge, we propose a reliable knowledge propagation framework that activates text-to-image diffusion models, enabling them to use text as guidance to synthesize multi-domain target data. Our approach is therefore highly relevant to multimodal methods, as it effectively enhances the generalization performance of existing methods by leveraging text-to-image diffusion models.
Supplementary Material: zip
Submission Number: 1040