Parameter-Agnostic Error Feedback Enhanced With Hessian-Corrected Momentum

Published: 22 Sept 2025, Last Modified: 01 Dec 2025, NeurIPS 2025 Workshop, CC BY 4.0
Keywords: Error feedback, error compensation, federated learning, distributed optimization
TL;DR: We propose an optimal method and a matching lower bound for biased-compression methods in non-convex optimization
Abstract: Advanced machine learning models often rely on massive datasets distributed across many nodes. To reduce communication overhead in large-scale stochastic optimization, compression is widely used, though it may introduce noise and harm convergence. Error feedback mitigates this by accumulating and reusing the compression error, while Hessian-vector products provide variance reduction and improve complexity. Building on these ideas, we design a distributed algorithm for finding $\varepsilon$-stationary points of nonconvex $L$-smooth functions that leverages error feedback, normalization, and second-order momentum. Unlike prior methods that require knowledge of problem parameters to tune stepsizes, our algorithm is parameter-agnostic: it uses an $\mathcal{O}(1)$ batch size and a time-varying learning rate independent of $L$ and the functional gap. The method achieves $\mathcal{O}(\varepsilon^{-3})$ communication complexity, and we prove a matching lower bound showing that this rate is optimal. These findings establish the complexity of nonconvex distributed stochastic optimization with higher-order methods.
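The abstract combines three ingredients: error feedback on top of a biased compressor, a momentum estimator corrected with Hessian-vector products, and a normalized, parameter-agnostic step size. Below is a minimal single-file sketch of how such a loop could be wired together. It is not the paper's algorithm: the top-k compressor, the `1/sqrt(t)` schedules, the EF bookkeeping, and the `stoch_grad` / `stoch_hvp` worker interface are all illustrative assumptions; the paper specifies the exact update rules, momentum weights, and schedules.

```python
# Hedged sketch: error feedback + Hessian-corrected momentum + normalized steps.
# All concrete choices (top-k, 1/sqrt(t) schedules, worker API) are assumptions.
import numpy as np

def top_k(v, k):
    """Biased top-k compressor: keep the k largest-magnitude coordinates."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def train(workers, x0, T, k, a0=1.0, g0=1.0):
    """workers: hypothetical objects exposing stoch_grad(x) and stoch_hvp(x, d)."""
    d = x0.size
    x = x0.copy()
    v = [w.stoch_grad(x) for w in workers]          # local momentum estimates
    e = [np.zeros(d) for _ in workers]              # error-feedback accumulators
    g = np.mean(v, axis=0)                          # server-side aggregate estimate
    for t in range(1, T + 1):
        gamma = g0 / np.sqrt(t)                     # time-varying step, no L or gap needed
        a = min(1.0, a0 / np.sqrt(t))               # decaying momentum weight (assumed)
        x_new = x - gamma * g / (np.linalg.norm(g) + 1e-12)   # normalized update
        msgs = []
        for i, w in enumerate(workers):
            grad = w.stoch_grad(x_new)              # O(1) batch: one stochastic gradient
            hvp = w.stoch_hvp(x_new, x_new - x)     # Hessian-vector correction of the drift
            v_new = (1 - a) * (v[i] + hvp) + a * grad
            # Error feedback: compress the momentum change plus accumulated error,
            # and stash the part lost to compression for reuse next round.
            delta = top_k(e[i] + v_new - v[i], k)
            e[i] += v_new - v[i] - delta
            v[i] = v_new
            msgs.append(delta)
        g = g + np.mean(msgs, axis=0)               # server folds in compressed messages
        x = x_new
    return x
```

The sketch only illustrates how the three components interact within one communication round; the paper's analysis is what yields the $\mathcal{O}(\varepsilon^{-3})$ communication complexity and the matching lower bound.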
Submission Number: 52