Abstract: Decentralized Learning (DL) enables privacy-preserving collaboration among organizations or users to enhance the performance of local deep learning models. However, model aggregation becomes challenging when client data is heterogeneous, and identifying compatible collaborators without direct data exchange remains a pressing issue. In this paper, we investigate how effectively various similarity metrics identify peers for model merging in DL, conducting an empirical analysis across multiple datasets with distribution shifts. Our study provides insights into the performance of these metrics and their role in facilitating effective collaboration. By examining their strengths and limitations, we contribute to the development of robust DL methods.
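To make the setting concrete, below is a minimal sketch of how a client might rank sampled peers as candidate collaborators for model merging. The choice of cosine similarity over flattened model updates, and all function names, are our illustrative assumptions; the paper itself compares several similarity metrics.

```python
import numpy as np

def flatten(params):
    """Concatenate a list of per-layer parameter arrays into one vector."""
    return np.concatenate([np.asarray(p).ravel() for p in params])

def cosine_similarity(a, b):
    """Cosine similarity between two flattened parameter vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def select_peers(own_update, peer_updates, top_k=3):
    """Return indices of the top_k sampled peers whose model updates
    are most similar to our own, as candidate merge partners."""
    own = flatten(own_update)
    scores = np.array([cosine_similarity(own, flatten(u)) for u in peer_updates])
    return np.argsort(scores)[::-1][:top_k]

# Example: rank three peers with random two-layer "updates".
rng = np.random.default_rng(0)
own = [rng.normal(size=(4, 4)), rng.normal(size=4)]
peers = [[rng.normal(size=(4, 4)), rng.normal(size=4)] for _ in range(3)]
print(select_peers(own, peers, top_k=2))
```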
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission:
- Included pseudocode for the algorithm to enhance clarity and understanding.
- Detailed the communication costs associated with each metric, emphasizing that the empirical-loss metric is more costly than the others.
- Added a discussion of optimal transport and the Wasserstein distance to the future work section.
- Corrected the ordering of Table 2 to match the subsections of the Results section.
- Added standard deviations to Table 2.
- Ran additional experiments varying the number of sampled clients $m$ in the CIFAR-10 two-cluster setup; the results are presented and discussed in Figure 6 in the Appendix.
- Added a justification of FedSim's aggregation method in Section 3.5 (see the illustrative sketch after the list of added papers below).
- Added the following papers to our discussion of related work:
  - An Improved Federated Clustering Algorithm with Model-Based Clustering, Vardhan et al.
  - FedGroup: Efficient Federated Learning via Decomposed Similarity-Based Clustering, Duan et al.
  - FedSoft: Soft Clustered Federated Learning with Proximal Local Updating, Ruan et al.
  - Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data, Lin et al.
  - Neighborhood Gradient Mean: An Efficient Decentralized Learning Method for Non-IID Data, Aketi et al.
  - Data-Heterogeneity-Aware Mixing for Decentralized Learning, Dandi et al.
  - Computational Optimal Transport: With Applications to Data Science, Peyré et al.
  - Federated Learning with Hierarchical Clustering of Local Updates to Improve Training on Non-IID Data, Briggs et al.
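Regarding the justification of FedSim's aggregation noted in the changes above: the paper's Section 3.5 gives the actual rule, and the sketch below is only one plausible similarity-weighted aggregation for intuition. The softmax weighting, the temperature, and the fixed self/peer split are our assumptions, not FedSim's definition.

```python
import numpy as np

def similarity_weighted_merge(own_params, peer_params, sims,
                              temperature=1.0, self_weight=0.5):
    """Merge peer models into the local model, giving peers with higher
    similarity scores larger softmax weights. The merged model keeps a
    fixed `self_weight` share of the local parameters. Hypothetical
    scheme for illustration only; not FedSim's exact rule."""
    sims = np.asarray(sims, dtype=float)
    w = np.exp(sims / temperature)
    w /= w.sum()
    merged = []
    for i, own_layer in enumerate(own_params):
        peer_avg = sum(wj * np.asarray(peer[i]) for wj, peer in zip(w, peer_params))
        merged.append(self_weight * np.asarray(own_layer) + (1.0 - self_weight) * peer_avg)
    return merged
```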
Assigned Action Editor: ~Sebastian_U_Stich1
Submission Number: 2828