SEAL: Separate and Augment with Pseudo-Labeling for Efficient Multimodal Multi-Target Domain Adaptation
Keywords: Multimodal learning, domain adaptation, mutual information maximization
Abstract: This paper investigates the multimodal multi-target domain adaptation problem where independent shifts of all modalities lead to an exponential number of multimodal target domains. We categorize them into F-target domains, where only one modality shifts, and U-target domains, where multiple modalities shift simultaneously. To alleviate the burden of collecting data from all domains, we propose a novel multimodal multi-target domain adaptation approach that requires only labeled samples from the source domain and unlabeled samples from F-target domains, thus achieving linear sample complexity. Specifically, we first disentangle each modality's representation into task-relevant and domain-relevant components via mutual information maximization. Then, we augment source domain samples by recombining these components to emulate labeled samples from F-target and U-target domains. Moreover, we introduce a pseudo-labeling strategy that exploits the unshifted modalities of each F-target domain sample to generate pseudo labels for training. The overall design follows the principle of "SEparate and Augment with pseudo-Labeling" (SEAL) to enable efficient multimodal multi-target domain adaptation. Extensive experiments demonstrate that our method significantly outperforms existing state-of-the-art approaches on widely used benchmark datasets. The code is available in the supplementary material.
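The sketch below is a minimal illustration of the three steps described in the abstract (disentangling each modality into task-relevant and domain-relevant components, recombining components to emulate target-domain samples, and pseudo-labeling via the unshifted modality). All module and function names here (ModalityDisentangler, recombine, pseudo_label) and the confidence-threshold heuristic are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of the SEAL pipeline; names and thresholds are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityDisentangler(nn.Module):
    """Splits one modality's feature into a task-relevant and a domain-relevant part."""
    def __init__(self, feat_dim: int, hid_dim: int):
        super().__init__()
        self.task_enc = nn.Linear(feat_dim, hid_dim)   # task-relevant component
        self.dom_enc = nn.Linear(feat_dim, hid_dim)    # domain-relevant component

    def forward(self, x: torch.Tensor):
        return self.task_enc(x), self.dom_enc(x)

def recombine(task_src: torch.Tensor, dom_tgt: torch.Tensor) -> torch.Tensor:
    """Emulate a labeled target-domain sample by pairing a source sample's
    task-relevant component with a target domain's domain-relevant component."""
    return torch.cat([task_src, dom_tgt], dim=-1)

def pseudo_label(logits_unshifted: torch.Tensor, threshold: float = 0.9):
    """Keep high-confidence predictions from the unshifted modality as pseudo labels."""
    probs = F.softmax(logits_unshifted, dim=-1)
    conf, labels = probs.max(dim=-1)
    mask = conf >= threshold
    return labels[mask], mask
```

In this reading, augmented pairs produced by recombine would be trained with the source labels, while F-target samples would be trained with the labels returned by pseudo_label; how the mutual-information objective shapes the two encoders is left to the paper itself.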
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Multimodality and Language Grounding to Vision, Robotics and Beyond, Generation
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 3146