Two-Level Test-Time Adaptation in Multimodal Learning

Published: 03 Jul 2024, Last Modified: 12 Jul 2024
Venue: ICML 2024 FM-Wild Workshop Poster
License: CC BY 4.0
Keywords: test-time adaptation, multimodal learning, fine-tuning
Abstract: Test-time adaptation (TTA) aims to adapt the parameters of a pre-trained source model using samples from the target domain, without access to the source data. Although recent studies have shown the high potential of TTA in various computer vision tasks, most TTA methods are limited to uni-modal adaptation, and the reliability bias caused by corruption of a single modality has not been sufficiently studied in multimodal tasks. Some recent methods suppress the cross-modal information discrepancy (i.e., reliability bias) by modulating a modality-sharing module, but they neglect domain adaptation of the modality-specific modules. In this paper, we propose a two-level test-time adaptation method (2LTTA) that addresses both the intra-modal distribution shift and the cross-modal reliability bias in multimodal learning. 2LTTA modulates all normalization layers and self-attention modules in the encoder of the corrupted modality and in the modality-sharing block. In addition, we adopt a two-level objective function in the modality fusion block that accounts for both the intra-modal distribution shift and the cross-modal reliability bias. Shannon entropy with sample reweighting is used to reduce the intra-modal distribution shift caused by data corruption, and a diversity-promoting loss is used to reduce the cross-modal information discrepancy. Our experiments show that 2LTTA outperforms baseline methods on various datasets.
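The two-level objective named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the reweighting rule (weight `exp(margin - H)` for samples whose entropy `H` is below a hypothetical threshold `margin`, zero otherwise), the diversity term (negative entropy of the batch-mean prediction), and the trade-off coefficient `lam`. The function names are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one sample's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of one probability vector."""
    return -sum(p * math.log(p + eps) for p in probs)


def reweighted_entropy_loss(batch_logits, margin):
    """Mean of per-sample entropy scaled by a confidence weight.

    ASSUMPTION: samples with entropy >= margin get weight 0 (treated as
    unreliable); the others get exp(margin - H), so more confident
    samples count more. The paper's actual weighting may differ.
    """
    total = 0.0
    for logits in batch_logits:
        h = entropy(softmax(logits))
        weight = math.exp(margin - h) if h < margin else 0.0
        total += weight * h
    return total / len(batch_logits)


def diversity_loss(batch_logits, eps=1e-12):
    """Negative entropy of the batch-mean prediction.

    ASSUMPTION: minimizing this value pushes the average prediction
    toward uniform, which discourages collapse onto a single class.
    """
    probs = [softmax(logits) for logits in batch_logits]
    num_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(num_classes)]
    return sum(m * math.log(m + eps) for m in mean)


def two_level_loss(batch_logits, margin, lam=1.0):
    """Illustrative combination of the two terms; lam is hypothetical."""
    return (reweighted_entropy_loss(batch_logits, margin)
            + lam * diversity_loss(batch_logits))
```

In an actual adaptation loop the inputs would be the fused predictions for a target-domain batch, the loss would be computed with an automatic-differentiation framework, and only the normalization and self-attention parameters mentioned in the abstract would be updated. Treating the sample weights as constants during backpropagation is a common choice in entropy-based TTA, but whether 2LTTA does so is not stated here.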
Submission Number: 3