DSGD-AC: controlled consensus errors improve generalization in decentralized training

Published: 22 Sept 2025 · Last Modified: 01 Dec 2025 · NeurIPS 2025 Workshop · CC BY 4.0
Keywords: decentralized optimization, distributed training, sharpness-aware minimization
TL;DR: DSGD-AC improves generalization in decentralized training by intentionally preserving worker disagreements that act as implicit sharpness regularization.
Abstract: Decentralized SGD reduces communication overhead in distributed training but aggressively enforces consensus among workers by driving model disagreements to zero as learning rates decay. We argue that this vanishing consensus eliminates beneficial structured perturbations that promote better generalization, similar to Sharpness-Aware Minimization (SAM) but without its additional gradient computations. We propose Decentralized SGD with Adaptive Consensus (DSGD-AC), which intentionally preserves non-vanishing consensus errors during late-stage training. Our key insight is that consensus errors are data-dependent and correlate with ascent directions on local datasets, providing implicit sharpness regularization over data distributions. Empirically, DSGD-AC improves generalization over SGD with negligible computational overhead on classic deep learning tasks. By treating consensus as a tunable resource rather than a nuisance to minimize, DSGD-AC offers a simple yet effective approach to improving generalization in decentralized training.
Submission Number: 66
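
The abstract describes preserving non-vanishing consensus errors via a late-stage change to the gossip/averaging step. The exact DSGD-AC update rule is not given here, so the following is only a minimal sketch of the general idea: workers take local SGD steps on heterogeneous objectives, and the consensus (mixing) step is damped by a coefficient gamma late in training so that worker disagreement does not vanish. The damping schedule, the gamma value, and the toy quadratic objective are illustrative assumptions, not the authors' method.

```python
# Minimal sketch (not the authors' code): decentralized SGD on a ring topology where
# the gossip step is deliberately damped late in training so consensus errors persist.
import numpy as np

rng = np.random.default_rng(0)
n_workers, dim, steps = 8, 10, 500

# Each worker holds a slightly different quadratic loss f_i(x) = 0.5 * ||x - c_i||^2,
# standing in for heterogeneous local data.
centers = rng.normal(size=(n_workers, dim))

# Ring topology: symmetric, doubly-stochastic mixing matrix W.
W = np.zeros((n_workers, n_workers))
for i in range(n_workers):
    W[i, i] = 0.5
    W[i, (i - 1) % n_workers] = 0.25
    W[i, (i + 1) % n_workers] = 0.25

x = rng.normal(size=(n_workers, dim))          # one model copy per worker

for t in range(steps):
    lr = 0.5 / (1 + 0.01 * t)                  # decaying learning rate
    grads = x - centers                        # local gradients of the toy quadratics
    x = x - lr * grads                         # local SGD step on each worker

    # Standard DSGD applies full mixing (gamma = 1) every round, which together with
    # the decaying lr drives the consensus error to zero. Here gamma is damped in the
    # second half of training so a controlled, non-vanishing disagreement is preserved.
    gamma = 1.0 if t < steps // 2 else 0.3     # assumed schedule, for illustration only
    x = (1 - gamma) * x + gamma * (W @ x)      # partial gossip/consensus step

consensus_error = np.mean(np.linalg.norm(x - x.mean(axis=0), axis=1))
print(f"final mean consensus error: {consensus_error:.4f}")
```

With gamma = 1 throughout, the final consensus error shrinks toward zero; with the damped schedule it settles at a nonzero level, which is the kind of controlled disagreement the abstract argues acts as implicit sharpness regularization.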