Abstract: We consider distributed learning with constant-stepsize SGD across several devices, each of which sends a final model update to a central server where the local estimates are aggregated. In the setting of overparameterized linear regression, we prove general upper bounds with matching lower bounds and derive learning rates for specific data-generating distributions. We show that the excess risk is of the order of the variance, provided the number of local nodes does not grow too large with the global sample size. We further compare distributed SGD with distributed ridge regression (RR) and bound the excess SGD risk in terms of the excess RR risk for a certain range of sample sizes.
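To make the aggregation scheme in the abstract concrete, here is a minimal sketch, assuming a synthetic Gaussian data-generating distribution with identity covariance: each node runs one pass of constant-stepsize SGD on its local data for a linear model, and the server averages the final local iterates (one-shot averaging). All names, dimensions, and parameters below are illustrative assumptions, not the paper's experimental setup.

```python
# Illustrative sketch of one-shot-averaged, constant-stepsize local SGD
# for linear regression; all constants here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

d, M, n_local = 200, 8, 50                  # dimension, number of nodes, samples per node
step = 0.01                                  # constant stepsize
noise_std = 0.1
w_star = rng.normal(size=d) / np.sqrt(d)     # ground-truth regression vector


def local_sgd(X, y, step, w0):
    """One pass of constant-stepsize SGD over the local samples."""
    w = w0.copy()
    for x_i, y_i in zip(X, y):
        # stochastic gradient of the squared loss 0.5 * (x_i @ w - y_i)**2
        w -= step * (x_i @ w - y_i) * x_i
    return w


# Each node draws its own sample and runs SGD from a common initialization.
w0 = np.zeros(d)
local_estimates = []
for _ in range(M):
    X = rng.normal(size=(n_local, d))
    y = X @ w_star + noise_std * rng.normal(size=n_local)
    local_estimates.append(local_sgd(X, y, step, w0))

# The central server aggregates the local estimates by simple averaging.
w_bar = np.mean(local_estimates, axis=0)

# With identity covariance, the excess risk is the squared parameter error.
print("excess risk of averaged estimate:", np.sum((w_bar - w_star) ** 2))
```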
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Sebastian_U_Stich1
Submission Number: 945