Achieving Global Flatness in Decentralized Learning with Heterogeneous Data

TMLR Paper 5987 Authors

24 Sept 2025 (modified: 09 Nov 2025) · Under review for TMLR · CC BY 4.0
Abstract: Decentralized training enables peer-to-peer on-device learning without relying on a central server, but it suffers from degraded generalization under heterogeneous data distributions due to local overfitting. One strategy to mitigate this is to seek flatter loss landscapes during local optimization at each client. Under extreme data heterogeneity, however, local objectives may diverge from the global one, so each client attains merely local flatness rather than true global flatness. To address this challenge, we introduce GFlat, a novel decentralized algorithm in which each client estimates and incorporates an approximation of the global update direction while seeking a flatter loss landscape locally. This lightweight strategy allows each client to contribute directly to global flatness without additional communication or centralized coordination. We theoretically analyze the convergence properties of GFlat and validate its performance through extensive experiments across a range of datasets, model architectures, and communication topologies. GFlat consistently improves generalization in non-IID data settings and achieves up to 6.75% higher test accuracy than state-of-the-art decentralized methods.
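
The abstract specifies GFlat only at a high level, so the following is a minimal, purely illustrative sketch, not the paper's actual algorithm: each client takes a SAM-style sharpness-aware step whose ascent direction is a tracked estimate of the global update direction (here obtained via standard gradient tracking over a ring gossip topology), run on toy heterogeneous quadratic losses. The losses, hyperparameters, and the choice of gradient tracking as the global-direction estimator are all assumptions made for this illustration.

import numpy as np

rng = np.random.default_rng(0)
n_clients, dim, rho, lr, T = 8, 10, 0.05, 0.01, 300

# Hypothetical heterogeneous local objectives f_i(w) = 0.5 * ||A_i w - b_i||^2,
# with targets scaled per client to mimic non-IID data.
A = [rng.normal(size=(20, dim)) for _ in range(n_clients)]
b = [rng.normal(size=20) * (i + 1) for i in range(n_clients)]

def grad(i, w):
    return A[i].T @ (A[i] @ w - b[i])

# Symmetric, doubly stochastic mixing matrix for a ring topology.
W = np.zeros((n_clients, n_clients))
for i in range(n_clients):
    W[i, i] = 0.5
    W[i, (i - 1) % n_clients] = 0.25
    W[i, (i + 1) % n_clients] = 0.25

w = [np.zeros(dim) for _ in range(n_clients)]
g = [grad(i, w[i]) for i in range(n_clients)]   # trackers of the global direction
prev_grad = [g_i.copy() for g_i in g]

for _ in range(T):
    new_w = []
    for i in range(n_clients):
        # SAM-style perturbation, but along the tracked *global* direction
        # rather than the client's own (possibly skewed) local gradient.
        eps = rho * g[i] / (np.linalg.norm(g[i]) + 1e-12)
        sharp_grad = grad(i, w[i] + eps)          # gradient at the perturbed point
        mixed = sum(W[i, j] * w[j] for j in range(n_clients))  # gossip averaging
        new_w.append(mixed - lr * sharp_grad)
    new_g = []
    for i in range(n_clients):
        # Gradient tracking update: mix trackers, add the local gradient increment.
        cur = grad(i, new_w[i])
        mixed_g = sum(W[i, j] * g[j] for j in range(n_clients))
        new_g.append(mixed_g + cur - prev_grad[i])
        prev_grad[i] = cur
    w, g = new_w, new_g

avg_loss = np.mean([0.5 * np.linalg.norm(A[i] @ w[i] - b[i]) ** 2 for i in range(n_clients)])
print(f"average global loss after {T} rounds: {avg_loss:.3f}")

In this sketch the only change relative to a plain decentralized SAM baseline is the perturbation direction: using the tracker g_i instead of the local gradient is one way to make the flatness each client seeks reflect the global objective, which is the behavior the abstract attributes to GFlat.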
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Eduard_Gorbunov1
Submission Number: 5987