Keywords: Federated learning, Second-order optimization, Generalization error bounds, Newton's method
TL;DR: This paper proposes a federated Newton method (sharing first-order and second-order information) with generalization error bounds.
Abstract: Most federated learning algorithms, such as FedAvg and FedProx, communicate only first-order information; this can be inefficient under heterogeneous data, and the statistical behavior of such methods remains poorly understood. We propose FedNewton, a second-order federated learning method that shares both gradient and curvature information while retaining a lightweight communication pattern. In a kernel ridge regression setting, we derive non-asymptotic excess-risk bounds for FedNewton and establish minimax-optimal learning rates, explicitly quantifying the roles of local sample size, data heterogeneity, and model heterogeneity. Our theory further shows that, under benign conditions, the federated error of FedNewton decays exponentially in the number of communication rounds. Beyond this RKHS regime, we instantiate FedNewton in a practical _backbone+head_ federated fine-tuning setting and conduct large-scale experiments on standard vision benchmarks, demonstrating that FedNewton achieves strong accuracy and efficiency compared with state-of-the-art first-order and second-order baselines.
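To make the communication pattern described above concrete, here is a minimal sketch (not the authors' code) of one FedNewton-style round, simplified from the paper's kernel ridge regression setting to plain ridge regression: each client shares its local gradient and Hessian, and the server aggregates both before taking a Newton step. All function names and parameters are illustrative.

```python
# Hedged sketch of a second-order federated round: clients send
# (gradient, Hessian) pairs; the server sample-weights and aggregates
# them, then applies one Newton update. Illustrative only.
import numpy as np

def local_stats(X, y, w, lam):
    """Per-client gradient and Hessian of the regularized least-squares loss."""
    n = X.shape[0]
    grad = X.T @ (X @ w - y) / n + lam * w
    hess = X.T @ X / n + lam * np.eye(X.shape[1])
    return grad, hess, n

def fednewton_round(clients, w, lam):
    """One communication round: aggregate local statistics, take a Newton step."""
    stats = [local_stats(X, y, w, lam) for X, y in clients]
    total = sum(n for _, _, n in stats)
    g = sum(n * g_m for g_m, _, n in stats) / total   # sample-weighted gradient
    H = sum(n * H_m for _, H_m, n in stats) / total   # sample-weighted Hessian
    return w - np.linalg.solve(H, g)                  # server-side Newton update

# Toy usage: two heterogeneous clients, a few communication rounds.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 5)), rng.normal(size=50)) for _ in range(2)]
w = np.zeros(5)
for _ in range(3):
    w = fednewton_round(clients, w, lam=0.1)
```

Because the aggregated objective here is quadratic, the first aggregated Newton step already solves it exactly; the exponential-in-rounds error decay claimed in the abstract concerns the more general federated setting analyzed in the paper.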
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 19042