Revisiting Random Weight Perturbation for Efficiently Improving Generalization

Published: 25 Mar 2024, Last Modified: 25 Mar 2024. Accepted by TMLR.
Abstract: Improving the generalization ability of modern deep neural networks (DNNs) is a fundamental challenge in machine learning. Two branches of methods have been proposed to seek flat minima and improve generalization: one led by sharpness-aware minimization (SAM) minimizes the worst-case neighborhood loss through adversarial weight perturbation (AWP), and the other minimizes the expected Bayes objective with random weight perturbation (RWP). While RWP offers advantages in computation and is closely linked to AWP on a mathematical basis, its empirical performance has consistently lagged behind that of AWP. In this paper, we revisit the use of RWP for improving generalization and propose improvements from two perspectives: i) the trade-off between generalization and convergence and ii) the random perturbation generation. Through extensive experimental evaluations, we demonstrate that our enhanced RWP methods achieve greater efficiency in enhancing generalization, particularly in large-scale problems, while also offering comparable or even superior performance to SAM. The code is released at https://github.com/nblt/mARWP.
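The abstract contrasts adversarial weight perturbation (SAM/AWP) with random weight perturbation (RWP) and mentions a mixed loss objective for improving the trade-off between convergence and generalization. The sketch below illustrates the general idea on a toy problem: each step mixes the gradient at the current weights (aiding convergence) with a gradient taken at randomly perturbed weights (encouraging flatness). The function names, the mixing coefficient `lam`, and the noise scale `sigma` are illustrative assumptions, not the paper's exact formulation or hyperparameters.

```python
import numpy as np

def loss_grad(w, X, y):
    # Gradient of mean squared error for a linear model X @ w ≈ y.
    r = X @ w - y
    return 2 * X.T @ r / len(y)

def mixed_rwp_step(w, X, y, lr=0.1, sigma=0.01, lam=0.5, rng=None):
    """One update mixing the clean gradient with a gradient evaluated at a
    randomly perturbed weight -- a hedged sketch of the mixed-objective RWP
    idea; the paper's actual objective, noise model, and schedules may differ."""
    rng = rng or np.random.default_rng(0)
    eps = rng.normal(0.0, sigma, size=w.shape)   # random weight perturbation
    g_clean = loss_grad(w, X, y)                 # gradient at w (convergence)
    g_pert = loss_grad(w + eps, X, y)            # gradient at w + eps (flatness)
    return w - lr * ((1 - lam) * g_clean + lam * g_pert)

# Toy usage: recover y = 2x with a few mixed-perturbation steps.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 1))
y = 2.0 * X[:, 0]
w = np.zeros(1)
for _ in range(200):
    w = mixed_rwp_step(w, X, y, rng=rng)
```

In a full-scale setting the two gradients can be computed in parallel, which is part of RWP's efficiency advantage over SAM's sequential two-step update.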
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
1. We have analyzed the smoothness properties of RWP/m-RWP (Theorems 1 and 3).
2. We have discussed the trade-off between convergence and generalization (Section 5.1) and analyzed how our mixed loss objective improves this trade-off (Lemma 1, Figure 3).
3. We have revised our goal from "improving the convergence" to the more specific "trade-off between convergence and generalization" (Section 5.2).
4. We have added a discussion on the convergence of RWP/ARWP (Section 5.3).
5. We have performed a grid search for $\rho$ in SAM and revised the corresponding results. We have also added a discussion of the contributions of our proposed techniques (Section 6.1, Tables 1 and 2).
Code: https://github.com/nblt/mARWP
Supplementary Material: zip
Assigned Action Editor: ~Jakub_Mikolaj_Tomczak1
Submission Number: 1957