Abstract: The scarcity of data in the medical field poses challenges for collaborative training in medical vision-language pre-training (VLP) across different clients. Specifically, collaborative training in medical VLP faces two significant challenges. First, medical data is privacy-sensitive and therefore cannot be shared directly across clients. Second, the distribution of medical data across institutes is typically heterogeneous, which hinders local model alignment and representation capabilities. To overcome both challenges simultaneously, we propose a framework called personalized model selector with fused multimodal information (PMS-FM). The contribution of PMS-FM is two-fold: 1) PMS-FM uses embeddings to represent information in different formats, allowing for the fusion of multimodal data. 2) PMS-FM adapts to personalized data distributions by training multiple models; a model selector then identifies and selects the best-performing model for each individual client. Extensive experiments on multiple real-world medical datasets demonstrate the superior performance of PMS-FM over existing federated learning methods on various zero-shot classification tasks.
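To make the two ideas in the abstract concrete, below is a minimal, hypothetical sketch of (1) fusing multimodal inputs through a shared embedding space and (2) a per-client model selector that picks the best-performing candidate model on a client's local validation data. The class and function names (MultimodalFusion, select_best_model) are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    """Sketch: project image and text features into one embedding space
    and fuse them, so data in different formats share a representation."""
    def __init__(self, img_dim: int, txt_dim: int, emb_dim: int = 128):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, emb_dim)
        self.txt_proj = nn.Linear(txt_dim, emb_dim)

    def forward(self, img_feat: torch.Tensor, txt_feat: torch.Tensor) -> torch.Tensor:
        # Simple additive fusion of the two modality embeddings;
        # the paper's actual fusion mechanism may differ.
        return self.img_proj(img_feat) + self.txt_proj(txt_feat)

def select_best_model(models, val_loader, loss_fn) -> int:
    """Sketch of a 'model selector': given a pool of candidate models
    (each callable as model(img, txt) -> prediction), return the index
    of the one with the lowest loss on this client's local validation
    data, which never leaves the client."""
    best_idx, best_loss = 0, float("inf")
    for idx, model in enumerate(models):
        model.eval()
        total, count = 0.0, 0
        with torch.no_grad():
            for img, txt, target in val_loader:
                pred = model(img, txt)
                total += loss_fn(pred, target).item() * len(target)
                count += len(target)
        avg_loss = total / max(count, 1)
        if avg_loss < best_loss:
            best_idx, best_loss = idx, avg_loss
    return best_idx
```

Under this reading, each client evaluates the shared pool of trained models locally, so heterogeneous clients can end up with different personalized models without exposing their private data.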
DOI: 10.1609/aaai.v39i7.32807