Abstract: Foundation models (FMs), also known as pre-trained models, have garnered significant interest in the Industrial Internet due to their remarkable performance and robust generalization capabilities on downstream tasks. However, as large foundation models impose growing demands on computing infrastructure and data privacy protection, existing learning frameworks face challenges such as data privacy leakage, poor scalability, and deployment difficulties. To address these issues, this paper proposes a novel collaborative Transformer Block (TB)-wise training framework based on Federated Learning (FL), which consists of three stages: pre-training, graph regularization, and personalized training. To tackle the challenge of statistical heterogeneity in distributed data, we design a Graph Convolutional Network (GCN)-based update operator that captures local training representations. In addition, we conduct an analysis based on feature similarity to enhance the interpretability of our algorithm. We evaluate the framework on popular vision Transformer models; extensive results demonstrate that it can jointly train multiple clients to build a foundation model while improving each client's personalized performance. The proposed method outperforms state-of-the-art frameworks under various data distributions and system heterogeneity settings, highlighting its robustness.
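To make the abstract's GCN-based update operator concrete, the following is a minimal illustrative sketch, not the paper's actual method: it assumes each client contributes a fixed-size representation vector and that a client-similarity adjacency matrix is available for graph regularization. The class name GCNUpdateOperator and all tensor shapes here are hypothetical.

```python
import torch
import torch.nn as nn

class GCNUpdateOperator(nn.Module):
    """Hypothetical sketch: one GCN propagation step that smooths
    per-client representations over a client-similarity graph."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim, bias=False)

    def forward(self, H: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
        # Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}
        A_hat = A + torch.eye(A.size(0))
        d = A_hat.sum(dim=1)
        D_inv_sqrt = torch.diag(d.pow(-0.5))
        A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt
        # One propagation step: each client's update mixes its neighbors'
        # representations, yielding graph-regularized local updates.
        return torch.relu(A_norm @ self.proj(H))

# Toy usage: 4 clients, each with an 8-dim local training representation.
H = torch.randn(4, 8)                  # stacked per-client representations
A = torch.tensor([[0., 1., 0., 0.],    # client-similarity graph (chain)
                  [1., 0., 1., 0.],
                  [0., 1., 0., 1.],
                  [0., 0., 1., 0.]])
op = GCNUpdateOperator(dim=8)
print(op(H, A).shape)                  # torch.Size([4, 8])
```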