SEAL: Separate and Augment with Pseudo-Labeling for Efficient Multimodal Multi-Target Domain Adaptation
Keywords: Multimodal learning, domain adaptation, mutual information maximization
Abstract: This paper investigates the multimodal multi-target domain adaptation problem where independent shifts of all modalities lead to an exponential number of multimodal target domains. We categorize them into F-target domains, where only one modality shifts, and U-target domains, where multiple modalities shift simultaneously. To alleviate the burden of collecting data from all domains, we propose a novel multimodal multi-target domain adaptation approach that requires only labeled samples from the source domain and unlabeled samples from F-target domains, thus achieving linear sample complexity. Specifically, we first disentangle each modality's representation into task-relevant and domain-relevant components via mutual information maximization. Then, we augment source domain samples by recombining these components to emulate labeled samples from F-target and U-target domains. Moreover, we introduce a pseudo-labeling strategy that exploits the unshifted modalities of each F-target domain sample to generate pseudo labels for training. The overall design follows the principle of "SEparate and Augment with pseudo-Labeling" (SEAL) to enable efficient multimodal multi-target domain adaptation. Extensive experiments demonstrate that our method significantly outperforms existing state-of-the-art approaches on widely used benchmark datasets. The code is available in the supplementary material.
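The sketch below is a minimal illustration of the three steps described in the abstract (disentangling each modality into task-relevant and domain-relevant components, recombining components to emulate target-domain samples, and pseudo-labeling via the unshifted modality). All module and function names here (ModalityDisentangler, recombine, pseudo_label) and the confidence-threshold heuristic are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of the SEAL pipeline; names and thresholds are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityDisentangler(nn.Module):
    """Splits one modality's feature into a task-relevant and a domain-relevant part."""
    def __init__(self, feat_dim: int, hid_dim: int):
        super().__init__()
        self.task_enc = nn.Linear(feat_dim, hid_dim)   # task-relevant component
        self.dom_enc = nn.Linear(feat_dim, hid_dim)    # domain-relevant component

    def forward(self, x: torch.Tensor):
        return self.task_enc(x), self.dom_enc(x)

def recombine(task_src: torch.Tensor, dom_tgt: torch.Tensor) -> torch.Tensor:
    """Emulate a labeled target-domain sample by pairing a source sample's
    task-relevant component with a target domain's domain-relevant component."""
    return torch.cat([task_src, dom_tgt], dim=-1)

def pseudo_label(logits_unshifted: torch.Tensor, threshold: float = 0.9):
    """Keep high-confidence predictions from the unshifted modality as pseudo labels."""
    probs = F.softmax(logits_unshifted, dim=-1)
    conf, labels = probs.max(dim=-1)
    mask = conf >= threshold
    return labels[mask], mask
```

In this reading, augmented pairs produced by recombine would be trained with the source labels, while F-target samples would be trained with the labels returned by pseudo_label; how the mutual-information objective shapes the two encoders is left to the paper itself.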
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Multimodality and Language Grounding to Vision, Robotics and Beyond, Generation
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 3146