Towards Robust Multimodal Domain Generalization via Modality-Domain Joint Adversarial Training

Hongzhao Li, Hualei Wan, Liangzhi Zhang, Mingyuan Jiu, Shupan Li, Mingliang Xu, Muhammad Haris Khan

Published: 27 Oct 2025, Last Modified: 21 Nov 2025, Crossref, CC BY-SA 4.0

Abstract: Multimodal Domain Generalization (MMDG) aims to enhance the robustness of multimodal models against distribution shifts in unseen target domains. Unlike unimodal domain generalization methods, which primarily focus on mitigating domain bias within individual modalities, MMDG faces unique challenges, notably modality heterogeneity (divergent feature spaces) and stability discrepancy (varying sensitivity to domain shifts). To tackle these challenges, we propose Modality-Domain Joint Adversarial Training, a unified framework built on two key innovations: (1) a tri-discriminator adversarial module that mitigates domain biases in both modality-specific and multimodal representations while suppressing modality-heterogeneous patterns in the representation space; and (2) a stability-aware dynamic weighting mechanism that adaptively balances modality contributions based on cross-domain stability, reducing reliance on unstable modalities. Additionally, we provide the first theoretical error bound for MMDG, which supports the effectiveness of our approach. Our approach achieves state-of-the-art performance on the EPIC-Kitchens and HAC datasets while using 75.2% fewer parameters than previous MMDG methods. The source code is available at https://github.com/lihongzhao99/MMDG-Joint-Adversarial-Training.
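The abstract does not specify how the stability-aware weighting is computed, so the following is only an illustrative sketch of the general idea and not the authors' implementation. It assumes, hypothetically, that a modality's instability is measured as the variance of its validation score across source domains, and that weights come from a softmax over negative instability. The function names `stability_weights` and `fuse`, the `temperature` parameter, and the example scores are all invented for illustration.

```python
import math
from statistics import pvariance


def stability_weights(per_domain_scores, temperature=1.0):
    """Map each modality to a fusion weight that favors cross-domain stability.

    per_domain_scores: dict of modality name -> list of scores, one per source domain.
    Assumption (not from the paper): instability is the population variance of
    those scores, and weights are a softmax over negative instability.
    """
    instability = {m: pvariance(s) for m, s in per_domain_scores.items()}
    logits = {m: -v / temperature for m, v in instability.items()}
    max_logit = max(logits.values())  # subtract the max for numerical stability
    exps = {m: math.exp(l - max_logit) for m, l in logits.items()}
    total = sum(exps.values())
    return {m: e / total for m, e in exps.items()}


def fuse(logits_by_modality, weights):
    """Weighted sum of per-modality class logits (lists of equal length)."""
    n_classes = len(next(iter(logits_by_modality.values())))
    return [
        sum(weights[m] * logits_by_modality[m][i] for m in logits_by_modality)
        for i in range(n_classes)
    ]


# Hypothetical scores: video is steady across domains, audio swings widely.
scores = {"video": [0.60, 0.62, 0.61], "audio": [0.30, 0.70, 0.50]}
weights = stability_weights(scores, temperature=0.01)
fused = fuse({"video": [2.0, 0.5], "audio": [0.1, 1.5]}, weights)
```

In this toy setup the steadier modality (video) receives the larger weight, so the fused prediction leans less on the modality whose quality varies most between domains. A lower `temperature` sharpens that preference; a higher one moves the weights toward uniform.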

External IDs: doi:10.1145/3746027.3754954