Perturbating, Tuning, and Collaborating: Harnessing Vision Foundation Models for Single Domain Generalization on Medical Imaging

Chuang Liu; Yichao Cao; YingYing Zhang; Xiu Su; Haogang Zhu

Published: 01 Jan 2025, Last Modified: 21 Jul 2025; AAAI 2025; CC BY-SA 4.0

Abstract: Single Domain Generalization (SDG) is critical in medical imaging applications. Recently, Vision Foundation Models (VFMs) have spearheaded a trend in AI development due to their robust generalizability and versatility. This work aims to fully explore the generalization capabilities of VFMs alongside the domain-specific expertise of specialized models, thoroughly investigating the boundaries of their respective capabilities, thereby collaboratively addressing SDG challenges within medical imaging. We propose a framework for Collaborative reasoning between Specialized and Universal models for Single Domain Generalization (CollaSU-SDG) in medical imaging. Specifically, we first design a model-aware perturbation injection method from the perspective of single-source domain data, enabling differentiated and adaptive perturbation injection for two different scales of models. Then, a domain expansion adapter is designed for the VFM to adapt to the augmented single-source domain medical data. Lastly, we introduce an adaptive hierarchical transfer and dynamic dense prompting method that facilitates collaborative reasoning between the specialized and universal models, eliminating the need for explicit prompts. Through these designs, CollaSU-SDG fully leverages the strengths of both specialized and universal models, achieving robust out-of-distribution generalization capabilities on single-source domain data. Experimental results demonstrate that CollaSU-SDG significantly advances the state-of-the-art performance across a wide range of medical datasets. All the code will be publicly available.
