Abstract: Biased gradient compression with error feedback (EF) reduces communication in federated learning (FL), but under heterogeneous (non-IID) data and local updates, the compression residual can decay slowly. This induces a mismatch between where gradients are evaluated and where the (decompressed) update is effectively applied, often slowing progress in the early rounds. We propose step-ahead partial error feedback (SA-PEF), which introduces a tunable step-ahead coefficient \(\alpha_r\in[0,1]\) and previews only a fraction of the residual while carrying the remainder through standard EF. SA-PEF interpolates smoothly between EF (\(\alpha_r=0\)) and full step-ahead EF (SAEF; \(\alpha_r=1\)). For nonconvex objectives with \(\delta\)-contractive compressors, we develop a second-moment bound and a residual recursion that yield stationarity guarantees under data heterogeneity and partial client participation. With a constant inner stepsize, the bound exhibits the standard \(\mathcal{O}\!\bigl((\eta\,\eta_0 T R)^{-1}\bigr)\) optimization term and an \(R\)-independent variance/heterogeneity floor induced by biased compression. Our analysis highlights a residual contraction factor \(\rho_r\) controlled by the step-ahead coefficient, explaining the observed early-phase acceleration, and suggests choosing \(\alpha_r\) near a theory-predicted optimum to balance SAEF’s rapid warm-up with EF’s long-run stability. Experiments across architectures, datasets, and compressors show that SA-PEF consistently reaches target accuracy in fewer communication rounds than EF.
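The abstract describes SA-PEF only at a high level, so the snippet below is a minimal single-worker Python sketch under our own assumptions: top-k as the \(\delta\)-contractive compressor, one local step per round, and the previewed \(\alpha\)-fraction of the residual added directly to the applied update. The names `topk_compress`, `sa_pef_step`, and `grad_fn` are hypothetical, and this is an illustration of the interpolation claim (\(\alpha=0\) reduces to standard EF, \(\alpha=1\) to full step-ahead), not the paper's exact client/server protocol.

```python
import numpy as np

def topk_compress(v, k):
    # Keep the k largest-magnitude coordinates; a standard
    # delta-contractive compressor with delta = k / len(v).
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def sa_pef_step(x, grad_fn, e, alpha, eta, k):
    # One update of step-ahead partial error feedback (single-worker sketch).
    g = grad_fn(x)
    p = g + e                     # error-compensated gradient
    c = topk_compress(p, k)       # compressed message (what would be communicated)
    r = p - c                     # fresh compression residual
    e_next = (1.0 - alpha) * r    # remainder carried via standard EF memory
    # Preview an alpha-fraction of the residual in the applied update, so the
    # next gradient is evaluated "a step ahead"; alpha = 0 recovers EF,
    # alpha = 1 applies the full residual (SAEF-style).
    x_next = x - eta * (c + alpha * r)
    return x_next, e_next, c

# Toy check on f(x) = 0.5 * ||x||^2 (so grad f(x) = x): the iterate norm
# should shrink across steps for moderate eta and any alpha in [0, 1].
rng = np.random.default_rng(0)
x, e = rng.normal(size=100), np.zeros(100)
for _ in range(300):
    x, e, _ = sa_pef_step(x, lambda z: z, e, alpha=0.5, eta=0.1, k=10)
print(np.linalg.norm(x))
```

In this sketch the per-round \(\alpha_r\) is held constant; a schedule that starts near 1 and decays toward 0 would mimic the suggested balance between SAEF's fast warm-up and EF's long-run stability.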
Submission Type: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: The "Broader Impact" statement was revised.
Assigned Action Editor: ~Konstantin_Mishchenko1
Submission Number: 7442