Abstract: Adapting pretrained Vision-Language Models such as CLIP for medical image analysis in federated learning (FL) offers cross-modal insights while preserving privacy. However, effective cross-domain federated adaptation requires intensive fine-tuning and knowledge sharing, which is challenging in low-resource medical practice due to the divergence between the natural images used for pretraining and medical imagery. Moreover, the significant statistical heterogeneity (non-IID nature) of medical data exacerbates these challenges. To address these issues, this paper introduces FedTCA, a framework that tames CLIP for non-IID federated medical image classification. FedTCA develops client-specific personalized models by reinforcing and constraining local cross-modal alignment, enabling the models to integrate client-specific and globally common knowledge. This approach not only addresses non-IID challenges but also optimizes the trade-off between performance and efficiency. Extensive experiments on real-world medical image datasets confirm the effectiveness and superiority of FedTCA.
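The abstract describes two building blocks at a high level: a CLIP-style local cross-modal alignment objective on each client and the sharing of common knowledge across clients. The sketch below is a minimal, generic illustration of those two pieces, not the FedTCA method itself; the function names `align_loss` and `fedavg`, the temperature value, and the split between shared and personalized parameters are all illustrative assumptions.

```python
# Illustrative sketch only: a generic CLIP-style image-text alignment loss and a
# FedAvg-style aggregation of shared parameters. Names and details are assumed,
# not taken from the FedTCA paper.
import torch
import torch.nn.functional as F

def align_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss aligning image and text embeddings (CLIP-style)."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature  # cosine-similarity logits
    # Matching image-text pairs sit on the diagonal and act as positives.
    targets = torch.arange(len(img_emb), device=img_emb.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

def fedavg(client_states: list[dict], weights: list[float]) -> dict:
    """Weighted average of the *shared* parameters uploaded by each client.
    Under a personalized-FL scheme, client-specific parameters would stay local
    and only the globally common part would pass through this aggregation."""
    avg = {}
    for key in client_states[0]:
        avg[key] = sum(w * state[key] for w, state in zip(weights, client_states))
    return avg
```

In this generic setup, each client would minimize `align_loss` on its private image-text pairs, then upload only the shared parameters for aggregation via `fedavg`, keeping personalized components local; how FedTCA specifically reinforces and constrains this alignment is detailed in the paper itself.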
External IDs: dblp:conf/miccai/ChenS25