Adversarial Experts Model for Black-box Domain Adaptation

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 · MM 2024 Poster · CC BY 4.0
Abstract: Black-box domain adaptation treats the source domain model as a black box. During the transfer process, the only available information about the target domain is the noisy labels output by the black-box model, which poses significant challenges for domain adaptation. Conventional approaches typically tackle the black-box noisy-label problem from two aspects, self-knowledge distillation and pseudo-label denoising, both of which achieve limited performance because the knowledge available to them is limited. To mitigate this issue, we explore the potential of off-the-shelf vision-language (ViL) multimodal models, with their rich semantic information, for black-box domain adaptation by introducing an Adversarial Experts Model (AEM). Specifically, our target domain model consists of one feature extractor and two classifiers, trained in two stages. In the knowledge-transfer stage, over a shared feature extractor, the black-box source model and the ViL model act as two distinct experts that jointly contribute knowledge, each guiding the learning of one classifier. While contributing their respective knowledge, the experts are themselves updated to compensate for their own limitations and biases. In the adversarial alignment stage, to further distill the experts' knowledge into the target domain model, adversarial learning is conducted between the feature extractor and the two classifiers. A new consistency-max loss function is proposed to measure the consistency between the two classifiers and to further improve the certainty of their predictions. Extensive experiments on multiple datasets demonstrate the effectiveness of our approach. Our source code will be released.
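The abstract does not spell out the architecture or the loss in detail, so the following PyTorch sketch is only one hypothetical reading of it: a shared feature extractor with two classifier heads (one per expert), and a `consistency_max_loss` combining an L1 discrepancy between the two classifiers with a term rewarding a high maximum softmax probability. The names `TargetModel` and `consistency_max_loss`, the layer sizes, and the exact form of the loss are all assumptions, not the paper's definitions.

```python
import torch
import torch.nn as nn

class TargetModel(nn.Module):
    """One shared feature extractor with two classifier heads, as the
    abstract describes. Backbone and dimensions here are illustrative."""
    def __init__(self, in_dim=2048, feat_dim=256, num_classes=65):
        super().__init__()
        self.extractor = nn.Sequential(  # stand-in for a CNN/ViT backbone
            nn.Linear(in_dim, feat_dim), nn.BatchNorm1d(feat_dim), nn.ReLU(),
        )
        self.cls_src = nn.Linear(feat_dim, num_classes)  # guided by the black-box expert
        self.cls_vil = nn.Linear(feat_dim, num_classes)  # guided by the ViL expert

    def forward(self, x):
        feats = self.extractor(x)
        return self.cls_src(feats), self.cls_vil(feats)

def consistency_max_loss(logits1, logits2):
    """Hypothetical reading of the consistency-max objective: penalize
    disagreement between the two classifiers and reward confident
    (high max-probability) predictions."""
    p1, p2 = logits1.softmax(dim=1), logits2.softmax(dim=1)
    consistency = (p1 - p2).abs().sum(dim=1).mean()        # classifier discrepancy
    certainty = -((p1 + p2) / 2).max(dim=1).values.mean()  # push max prob up
    return consistency + certainty

# Usage sketch for the adversarial alignment stage (an MCD-style
# alternation is assumed; the paper's actual schedule may differ):
# the classifiers are updated to expose disagreement on target data,
# then the extractor is updated to reduce it via the loss above.
model = TargetModel()
x = torch.randn(8, 2048)  # placeholder target-domain inputs
logits1, logits2 = model(x)
loss = consistency_max_loss(logits1, logits2)
loss.backward()
```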
Primary Subject Area: [Content] Media Interpretation
Secondary Subject Area: [Content] Vision and Language
Relevance To Conference:
1. This work promotes data-privacy protection in the training of multimedia models. It focuses on black-box domain adaptation, achieving knowledge transfer solely through black-box APIs: the source-domain data and model are never accessed, which effectively protects data privacy.
2. It facilitates the application of vision-language multimodal models. It introduces a vision-language model that carries high-level semantic information to aid the training of black-box domain adaptation, and it adaptively adjusts that model using target-domain data.
3. It improves the fusion of multimedia knowledge. The algorithm leverages knowledge from both the black-box model and the vision-language model, then employs adversarial learning to integrate this knowledge into the target-domain model.
Supplementary Material: zip
Submission Number: 2666