Stability and Generalization Analysis of Decentralized SGD: Sharper Bounds Beyond Lipschitzness and Smoothness
TL;DR: We develop stability and generalization bounds for decentralized SGD.
Abstract: Decentralized SGD (D-SGD) is a popular optimization method for training large-scale machine learning models. In this paper, we study the generalization behavior of D-SGD for both smooth and nonsmooth problems by leveraging algorithmic stability. For convex and smooth problems, we develop stability bounds involving the training errors to show the benefit of optimization for generalization. This improves existing results by removing the Lipschitzness assumption and implying fast rates under a low-noise condition. We also develop the first optimal stability-based generalization bounds for D-SGD applied to nonsmooth problems. We further develop optimization error bounds that imply minimax-optimal excess risk rates. The novelty of our analysis lies in an error decomposition that exploits the co-coercivity of functions, together with the control of a neighboring-consensus error.
Lay Summary: Training large machine learning models often requires data spread across numerous devices. When there is no central server to consolidate this data, Decentralized Stochastic Gradient Descent (D-SGD) offers a solution: each device works on its own data and collaborates only with neighboring devices. This study explores how well models trained with D-SGD perform on new data, a concept known as generalization. Our research shows that D-SGD generalizes well across various problem types, whether the loss functions are smooth (curves without abrupt changes) or nonsmooth (with sharp corners). Furthermore, we introduce novel mathematical tools that provide stronger generalization guarantees under less stringent assumptions. These findings underscore the reliability and effectiveness of decentralized learning, particularly when data is dispersed and inter-device communication is limited.
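For readers unfamiliar with the algorithm described above, the following is a minimal sketch of one D-SGD training loop on a toy least-squares problem. The ring topology, mixing matrix, step size, and synthetic data are illustrative assumptions, not the setup studied in the paper.

```python
# Minimal D-SGD sketch: local SGD steps plus averaging with ring neighbors.
# All quantities here (topology, data, step size) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_samples, dim = 5, 50, 10

# Each agent holds its own local dataset (X_i, y_i).
X = [rng.normal(size=(n_samples, dim)) for _ in range(n_agents)]
w_true = rng.normal(size=dim)
y = [Xi @ w_true + 0.1 * rng.normal(size=n_samples) for Xi in X]

# Doubly stochastic mixing matrix for a ring: average with two neighbors.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = W[i, (i - 1) % n_agents] = W[i, (i + 1) % n_agents] = 1 / 3

models = np.zeros((n_agents, dim))  # one parameter vector per agent
lr = 0.05

for t in range(200):
    # 1) Local step: each agent takes a stochastic gradient step on its own data.
    grads = np.zeros_like(models)
    for i in range(n_agents):
        j = rng.integers(n_samples)            # draw one local sample
        residual = X[i][j] @ models[i] - y[i][j]
        grads[i] = residual * X[i][j]          # gradient of the squared loss
    models = models - lr * grads
    # 2) Communication step: mix parameters with neighbors via W.
    models = W @ models

avg_err = np.mean([np.linalg.norm(m - w_true) for m in models])
print(f"average distance to ground truth: {avg_err:.3f}")
```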
Primary Area: Theory->Learning Theory
Keywords: Algorithmic stability, generalization analysis, decentralized SGD
Submission Number: 2999