One Arrow, Two Hawks: Sharpness-aware Minimization for Federated Learning via Global Model Trajectory

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We propose a novel method to directly reduce the sharpness of the global model, which achieves high generalization with less communication and computation cost in the FL framework.
Abstract: Federated learning (FL) presents a promising strategy for distributed and privacy-preserving learning, yet struggles with performance issues in the presence of heterogeneous data distributions. Recently, a series of works based on sharpness-aware minimization (SAM) have emerged to improve the generalization of local learning, proving effective at mitigating the effects of data heterogeneity. However, most SAM-based methods do not directly consider the global objective and require two backward passes per iteration, resulting in diminished effectiveness. To overcome these two bottlenecks, we leverage the global model trajectory to directly measure sharpness for the global objective, requiring only a single backward pass. We further propose a novel and general algorithm, FedGMT, to overcome data heterogeneity and the pitfalls of previous SAM-based methods. We analyze the convergence of FedGMT and conduct extensive experiments on visual and text datasets in a variety of scenarios, demonstrating that FedGMT achieves accuracy competitive with state-of-the-art FL methods while minimizing computation and communication overhead. Code is available at https://github.com/harrylee999/FL-SAM.
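For context on the two-backward-pass bottleneck the abstract refers to, below is a minimal PyTorch sketch (not from the paper) of a standard SAM update step; the perturbation radius `rho` and helper names are illustrative. Each iteration needs one backward pass to find the ascent perturbation and a second at the perturbed weights, which is exactly the per-iteration cost a single-pass trajectory-based method avoids.

```python
import torch

def sam_step(model, loss_fn, x, y, optimizer, rho=0.05):
    """One SAM update: note the two forward/backward passes per iteration."""
    # Pass 1: gradient at the current weights.
    loss_fn(model(x), y).backward()
    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
    # Perturb weights along the ascent direction (the sharpness probe).
    eps = []
    with torch.no_grad():
        for p in params:
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps.append(e)
    optimizer.zero_grad()
    # Pass 2: gradient at the perturbed weights.
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)  # restore the original weights
    optimizer.step()  # descend using the perturbed-point gradient
    optimizer.zero_grad()
```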
Lay Summary: In federated learning (FL), data heterogeneity causes local models to diverge, harming global performance. Existing sharpness-aware methods are computationally costly and overlook the smoothness of the global objective. We propose **FedGMT**, which uses a **Global Model Trajectory**, maintained via exponential moving averages (EMA), together with KL divergence to guide local updates toward globally flat minima; an ADMM-based update aligns local and global directions. Experiments show FedGMT cuts computation by 33%, converges at \(O(1/T)\), and outperforms baselines on diverse datasets, even under high heterogeneity. A lighter variant, FedGMT-v2, reduces communication for real-world use. This work enhances FL efficiency and privacy-preserving collaboration on edge devices.
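As a rough illustration of the single-pass idea described above, the sketch below pairs a server-side EMA over past global models (the trajectory) with a KL-regularized client loss. This is a minimal sketch under stated assumptions, not the paper's exact formulation: the decay, temperature `tau`, weight `mu`, and function names are hypothetical placeholders, and the ADMM alignment step is omitted.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(ema_model, global_model, decay=0.9):
    """Server side: fold the latest global model into the EMA trajectory model.
    (Illustrative decay; buffers such as BN statistics are ignored here.)"""
    for pe, pg in zip(ema_model.parameters(), global_model.parameters()):
        pe.mul_(decay).add_(pg, alpha=1 - decay)

def local_step(model, ema_model, x, y, optimizer, mu=1.0, tau=1.0):
    """Client side: a single backward pass; the KL term to the trajectory
    model pulls local updates toward globally flat regions."""
    logits = model(x)
    with torch.no_grad():
        teacher_logits = ema_model(x)  # trajectory model acts as a fixed teacher
    ce = F.cross_entropy(logits, y)
    kl = F.kl_div(F.log_softmax(logits / tau, dim=1),
                  F.softmax(teacher_logits / tau, dim=1),
                  reduction="batchmean") * tau * tau
    loss = ce + mu * kl
    optimizer.zero_grad()
    loss.backward()  # one backward pass per iteration, unlike SAM's two
    optimizer.step()
```

In a full round, the server would presumably broadcast the trajectory model alongside the global model, which is the extra communication the lighter FedGMT-v2 variant is said to reduce.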
Link To Code: https://github.com/harrylee999/FL-SAM
Primary Area: General Machine Learning
Keywords: Federated learning, sharpness-aware minimization
Submission Number: 5901