A General Framework for Approximate and Delayed Gradient Descent for Decomposable Cost Functions

Published: 01 Jan 2024 · Last Modified: 15 May 2025 · IEEE Control Systems Letters 2024 · CC BY-SA 4.0
Abstract: We propose and analyze a generalized framework for distributed, delayed, and approximate stochastic gradient descent. Our framework considers n local agents that use their local data and computation to collectively assist a central server tasked with optimizing a global cost function composed of local cost functions accessible to the local agents. This framework is very general, subsuming a wide variety of algorithms in federated learning and distributed optimization. In particular, it allows each local agent to approximate and share a stochastic (possibly biased) and delayed estimate of its local function gradient. Focusing on strongly convex functions with a sufficient degree of smoothness, we characterize the mean square error in terms of the varying step size, the approximation error (bias), and the delay in computing the gradients. This characterization, together with a careful design of the step-size process, establishes an optimal convergence rate that aligns with centralized stochastic gradient descent (SGD).
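The setting in the abstract can be illustrated with a minimal simulation sketch. All names, the quadratic local costs, the fixed delay, the additive bias, and the step-size schedule below are illustrative assumptions, not the paper's construction: each agent reports a noisy, biased gradient of its local cost evaluated at an iterate that is a few steps stale, and the server averages these reports and takes a diminishing-step-size SGD step.

```python
import random

def delayed_sgd(c, steps=2000, delay=2, bias=0.01, noise=0.05, seed=0):
    """Hypothetical sketch: distributed SGD with delayed, biased,
    stochastic local gradients on the decomposable strongly convex
    cost f(x) = sum_i 0.5 * (x - c_i)^2 (minimizer: mean of c)."""
    rng = random.Random(seed)
    n = len(c)
    x = 0.0
    history = [x] * (delay + 1)  # buffer of past iterates for stale reads
    for t in range(steps):
        x_old = history[0]  # iterate from `delay` steps ago
        # Each agent i sends a delayed, biased, noisy local gradient.
        grads = [(x_old - ci) + bias + noise * rng.gauss(0, 1) for ci in c]
        eta = 1.0 / (t + 10)  # an assumed diminishing step-size schedule
        x -= eta * sum(grads) / n  # server averages reports and steps
        history = history[1:] + [x]
    return x
```

Because the reported gradients carry a constant bias, the iterates settle near a point shifted from the true minimizer by roughly that bias, which mirrors the abstract's point that the error characterization must account for bias and delay alongside the step size.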