Can stochastic weight averaging improve generalization in private learning?

Published: 16 Apr 2023, Last Modified: 22 Apr 2023, RTML Workshop 2023
Keywords: differential privacy, stochastic weight averaging, generalization
TL;DR: Stochastic weight averaging improves generalization, accuracy, and stability in private learning
Abstract: We investigate stochastic weight averaging (SWA) for private learning in the context of generalization and model performance. Differentially private (DP) optimizers are known to suffer from reduced performance and high variance in comparison to non-private learning. However, the generalization properties of DP optimizers have received little attention, in particular for large-scale machine learning models. SWA is a variant of stochastic gradient descent (SGD) that averages the weights along the SGD trajectory. We consider a DP adaptation of SWA (DP-SWA) which incurs no additional privacy cost and has little computational overhead. For quadratic objective functions, we show that DP-SWA converges to the optimum at the same rate as non-private SGD, which implies that the excess risk converges to zero. For non-convex objective functions, we observe across multiple experiments on standard benchmark datasets that averaging model weights improves generalization and model accuracy and reduces performance variance.
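To make the idea concrete, below is a minimal sketch of how DP-SWA could be realized: a running average of model weights collected along a DP-SGD trajectory. The DP-SGD step shown (per-example gradient clipping plus Gaussian noise) is the standard textbook form, not necessarily the paper's exact implementation, and names such as dp_sgd_step, clip_norm, noise_multiplier, and swa_start are illustrative assumptions. Because the average is computed purely from already-private iterates, it is post-processing and incurs no additional privacy cost.

```python
# Illustrative sketch of DP-SWA (not the paper's code):
# DP-SGD iterates are averaged from a chosen epoch onward.
import copy
import torch
import torch.nn.functional as F

def dp_sgd_step(model, batch_x, batch_y, lr=0.1, clip_norm=1.0, noise_multiplier=1.0):
    """One textbook DP-SGD step: clip each per-example gradient, add Gaussian noise."""
    summed = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in zip(batch_x, batch_y):               # per-example gradients
        model.zero_grad()
        loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        grads = [p.grad for _, p in model.named_parameters()]
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = min(1.0, clip_norm / (float(total_norm) + 1e-12))
        for n, p in model.named_parameters():
            summed[n] += p.grad * scale
    with torch.no_grad():
        for n, p in model.named_parameters():
            noise = torch.randn_like(p) * noise_multiplier * clip_norm
            p -= lr * (summed[n] + noise) / len(batch_x)

def train_dp_swa(model, loader, epochs=10, swa_start=5):
    """Run DP-SGD and keep a running average of the weights from epoch `swa_start` on."""
    swa_model, n_avg = copy.deepcopy(model), 0
    for epoch in range(epochs):
        for batch_x, batch_y in loader:
            dp_sgd_step(model, batch_x, batch_y)
        if epoch >= swa_start:                       # SWA: incremental mean of iterates
            n_avg += 1
            with torch.no_grad():
                for p_swa, p in zip(swa_model.parameters(), model.parameters()):
                    p_swa += (p - p_swa) / n_avg     # no extra privacy cost: post-processing
    return swa_model
```

The averaging step is the only addition on top of the private optimizer, which is why the computational overhead is a single extra copy of the weights and one update per epoch.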