TL;DR: A study on test-time adaptation for uni-modal distribution shift, an under-explored scenario where the distribution shift occurs in only one modality.
Abstract: Modern machine learning applications are characterized by the increasing size of deep models and the growing diversity of data modalities. This trend underscores the importance of efficiently adapting pre-trained multi-modal models to the test distribution in real time, i.e., multi-modal test-time adaptation. In practice, the magnitude of the shift differs across modalities, because different data sources respond to the same corrupting factor in different ways. In this work, we investigate the under-explored practical scenario of *uni-modal distribution shift*, where the distribution shift affects only one modality and leaves the others unchanged. Through theoretical and empirical analyses, we demonstrate that such a shift impedes multi-modal fusion and causes negative transfer in existing test-time adaptation techniques. To flexibly combat this unique shift, we propose a selective adaptation scheme that combines multiple modality-specific adapters, which accommodate potential shifts, with a "router" module that determines which modality requires adaptation. Finally, we validate the effectiveness of the proposed method through extensive experimental evaluations.
Code available at https://github.com/chenmc1996/Uni-Modal-Distribution-Shift.
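For a concrete picture of the selective adaptation scheme described in the abstract, below is a minimal PyTorch-style sketch: one lightweight adapter per modality plus a router that scores how much each modality appears shifted, gating each adapter accordingly. All module and parameter names (ModalityAdapter, SelectiveAdapter, the bottleneck width, etc.) are illustrative assumptions, not the authors' implementation; see the linked repository for the actual code.

```python
import torch
import torch.nn as nn

class ModalityAdapter(nn.Module):
    """Hypothetical lightweight bottleneck adapter for one modality's features."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.down = nn.Linear(dim, hidden)
        self.up = nn.Linear(hidden, dim)

    def forward(self, x):
        # Residual correction on top of the frozen encoder's features.
        return x + self.up(torch.relu(self.down(x)))

class SelectiveAdapter(nn.Module):
    """Sketch: frozen per-modality encoders, per-modality adapters, and a router."""
    def __init__(self, encoders, feat_dims):
        super().__init__()
        self.encoders = nn.ModuleList(encoders)   # pre-trained, kept frozen
        self.adapters = nn.ModuleList(ModalityAdapter(d) for d in feat_dims)
        # The router emits one score per modality from the concatenated features.
        self.router = nn.Linear(sum(feat_dims), len(feat_dims))

    def forward(self, inputs):
        feats = [enc(x) for enc, x in zip(self.encoders, inputs)]
        gates = torch.sigmoid(self.router(torch.cat(feats, dim=-1)))  # (B, n_modalities)
        adapted = []
        for i, (f, adapter) in enumerate(zip(feats, self.adapters)):
            g = gates[..., i:i + 1]                       # how shifted this modality seems
            adapted.append((1 - g) * f + g * adapter(f))  # adapt only if the router flags it
        return torch.cat(adapted, dim=-1)                 # fused features for the classifier
```

At test time, one would typically freeze the pre-trained encoders and update only the adapters and router with an unsupervised objective (e.g., prediction-entropy minimization) on incoming test batches, so that a shift in one modality does not trigger unnecessary updates to the unshifted ones.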
Lay Summary: In real-world applications like self-driving cars or medical imaging, machines often face data that look different from the data they were trained on—a problem called "distribution shift." Traditional methods assume all types of data (like images and sounds) change together, but in reality, only one type of data (e.g., just camera images or just audio) might shift. This "uni-modal" shift can confuse models, making them unreliable.
To address this, we developed a selective adaptation method that lets models focus on fixing only the shifted data type. Imagine a self-driving car where the camera sees snowy roads (shifted data), but the LiDAR (another data type) works normally. Our approach uses a “router” to detect which data type needs adjustment and a lightweight “adapter” to fix just that part, leaving the rest of the model unchanged. This prevents the model from overreacting to stable data, like misadjusting the LiDAR when only the camera is affected.
Through experiments on video and audio datasets, we showed that our method outperforms existing techniques in handling such shifts. By isolating and adapting only the affected data type, our approach keeps machines accurate and reliable in dynamic real-world scenarios where different data types may change independently. This is crucial for building robust AI systems that can adapt safely without retraining, especially in high-stakes fields like healthcare or autonomous driving.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/chenmc1996/Uni-Modal-Distribution-Shift
Primary Area: General Machine Learning->Transfer, Multitask and Meta-learning
Keywords: test-time adaptation, multi-modal learning
Submission Number: 6433