Enhancing Personalized Decentralized Federated Learning through Model Decoupling

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Personalized Federated Learning, Partial Personalization, Decentralized Training
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Personalized Federated Learning (PFL) produces many local personalized models rather than a single global model in order to cope with an intractable problem in real federated systems -- data heterogeneity. However, almost all existing works rely on a central server, which incurs a heavy communication burden and risks disruption if the server fails. Only limited efforts have dispensed with a central server, and these still suffer from high local computation cost, catastrophic forgetting, and poor convergence due to full-model aggregation. Therefore, in this paper we propose a PFL framework based on model decoupling, called DFedMDC, which pursues robust communication and better model performance with a convergence guarantee. It personalizes the “right” components of modern deep models by alternately updating the shared and personal parameters, training partially personalized models in a peer-to-peer manner. To further improve the aggregation of the shared parameters, we propose DFedSMDC, which integrates a local Sharpness-Aware Minimization (SAM) optimizer to update the shared parameters: it adds a suitable perturbation along the gradient direction to alleviate shared-model inconsistency across clients. Theoretically, we provide convergence analyses of both algorithms in the general non-convex setting with partial personalization and a SAM optimizer for the shared model, characterizing the adverse impact of the statistical heterogeneity $\delta^2$, the smoothness constants $L_u, L_v, L_{uv}, L_{vu}$ of the loss functions, and the communication topology $(1-\lambda)$ on convergence. Experiments on several real-world datasets with various data-partition settings demonstrate that (i) partially personalized training is better suited to personalized decentralized FL, yielding state-of-the-art (SOTA) accuracy compared with SOTA PFL baselines; and (ii) properly perturbing the shared parameters makes partially personalized FL better suited to decentralized training, with DFedSMDC achieving the most competitive performance.
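To make the procedure described in the abstract concrete, below is a minimal PyTorch sketch of one client's alternating update (personal parameters, then shared parameters, optionally with a SAM-style perturbation as in DFedSMDC) followed by gossip averaging of the shared parameters over the communication topology. This is not the authors' implementation; all names (`Client`, `gossip_round`, `rho`, the feature-extractor/head split) and the choice of which components are shared are illustrative assumptions.

```python
# Hedged sketch of partially personalized decentralized training:
# each client holds shared parameters u_i (aggregated with neighbors)
# and personal parameters v_i (kept local), updated alternately.
import torch


class Client:
    def __init__(self, shared: torch.nn.Module, personal: torch.nn.Module):
        self.shared = shared      # u_i: exchanged with neighbors
        self.personal = personal  # v_i: never leaves the client

    def loss(self, x, y):
        # Assumed split: shared feature extractor, personal head.
        return torch.nn.functional.cross_entropy(self.personal(self.shared(x)), y)

    def local_step(self, x, y, lr=0.01, rho=0.05, use_sam=False):
        # 1) Update personal parameters v_i with u_i frozen.
        grads = torch.autograd.grad(self.loss(x, y), list(self.personal.parameters()))
        with torch.no_grad():
            for p, g in zip(self.personal.parameters(), grads):
                p -= lr * g

        # 2) Update shared parameters u_i with v_i frozen.
        params = list(self.shared.parameters())
        grads = torch.autograd.grad(self.loss(x, y), params)
        if use_sam:
            # SAM-style step (DFedSMDC): perturb u_i along the gradient
            # direction, take the gradient at the perturbed point, undo
            # the perturbation, then descend with that gradient.
            norm = torch.sqrt(sum(g.pow(2).sum() for g in grads)) + 1e-12
            eps = [rho * g / norm for g in grads]
            with torch.no_grad():
                for p, e in zip(params, eps):
                    p += e
            grads = torch.autograd.grad(self.loss(x, y), params)
            with torch.no_grad():
                for p, e in zip(params, eps):
                    p -= e
        with torch.no_grad():
            for p, g in zip(params, grads):
                p -= lr * g


def gossip_round(clients, mixing):
    """Peer-to-peer aggregation of shared parameters only.

    `mixing[i][j]` is the gossip weight client i assigns to client j
    (assumed doubly stochastic, determined by the topology)."""
    snapshots = [[p.detach().clone() for p in c.shared.parameters()]
                 for c in clients]
    with torch.no_grad():
        for i, c in enumerate(clients):
            for k, p in enumerate(c.shared.parameters()):
                p.copy_(sum(mixing[i][j] * snapshots[j][k]
                            for j in range(len(clients))))
```

Note that only the shared parameters pass through `gossip_round`, which is the source of the communication savings over full-model decentralized aggregation that the abstract claims.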
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4664