Rethinking Regularization in Federated Learning: An Initialization Perspective

ICLR 2026 Conference Submission17590 Authors

19 Sept 2025 (modified: 08 Oct 2025)
Keywords: federated learning, data heterogeneity, regularization, initialization
TL;DR: We show that using FedDyn solely as an initialization strategy, rather than throughout the entire training process, achieves faster convergence and lower cost than standalone regularization methods.
Abstract: In federated learning, numerous regularization methods have been introduced to alleviate local drift caused by data heterogeneity. While all share the goal of reducing client drift, their effects on client gradients and on the features learned by local models differ. Our comparative analysis shows that, among the tested regularization methods, FedDyn is the most effective: it achieves superior accuracy per communication round while simultaneously reducing inter-client gradient divergence and preserving global model features during local training. Nevertheless, regularization methods, including FedDyn, are only approximations of an ideal scheme that would completely remove local drift and guarantee convergence to the global stationary point. In practice, deviations from this ideal give rise to side effects and, together with the additional computational and communication costs, limit their practicality. Since the performance differences among federated learning algorithms diminish once models are well initialized, it is more efficient to restrict regularization to the pre-training phase, where its benefits outweigh these drawbacks. Our study of pre-training strategies for FedAvg demonstrates that FedDyn provides the most effective initialization, a property tied to its convergence behavior near the global stationary point. Extensive experiments across both cross-silo and cross-device settings confirm that applying FedDyn solely for pre-training yields faster convergence and lower overhead than maintaining regularization throughout the entire training process.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 17590