Optimum Shifting to Stabilize Training and Improve Generalization of Deep Neural Networks

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Deep Networks; Optimization; Generalization
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Recent studies have shown that generalization correlates with the sharpness of the loss landscape, and that flat minima suggest better generalization ability than sharp minima. In this paper, we introduce a method called optimum shifting (OS), which moves the parameters of a neural network from a sharper minimum to a flatter one while maintaining the same training loss. Our approach is based on the observation that when the input and output of a neural network are fixed, the matrix multiplications within the network can be treated as systems of under-determined linear equations, allowing the parameters to be adjusted within the solution space. This can be accomplished by solving a constrained optimization problem, which is easy to implement. We prove that the minimum we move to is flatter than the original one. Furthermore, we introduce a practical stochastic optimum shifting (SOS) technique, which utilizes neural collapse theory to reduce computational costs and provide more degrees of freedom for optimum shifting. In our experiments, we evaluate various DNNs (e.g., VGG, ResNet, DenseNet, and ViT) on the CIFAR-10/100 and Tiny-ImageNet datasets to validate the effectiveness of our method.
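To illustrate the core idea described in the abstract, the sketch below shows how, for a single linear layer with fixed input activations X and fixed output Y = W X, one can shift the weights within the solution space of the under-determined system W' X = Y without changing the layer's output. This is not the authors' implementation; in particular, using the minimum-Frobenius-norm solution as a proxy for a flatter minimum is an assumption made here for illustration, and all variable names are hypothetical.

```python
import numpy as np

# Hypothetical illustration of the optimum-shifting idea for one linear layer.
# X: fixed (d_in, n) input activations for a batch; W: current (d_out, d_in)
# weights; Y = W @ X is the layer output that must be preserved so the
# training loss on this batch stays the same.

rng = np.random.default_rng(0)
d_out, d_in, n = 16, 64, 32            # under-determined: d_in > n
W = rng.standard_normal((d_out, d_in))
X = rng.standard_normal((d_in, n))
Y = W @ X                               # fixed layer output

# Among all W' satisfying W' @ X = Y, take the minimum-Frobenius-norm
# solution (used here as a stand-in for the paper's flatness-based
# constrained optimization); pinv(X) is the Moore-Penrose pseudoinverse.
W_shifted = Y @ np.linalg.pinv(X)

# The layer output (and hence the training loss on this batch) is unchanged.
assert np.allclose(W_shifted @ X, Y)
# The shifted weights are no larger in Frobenius norm than the originals.
assert np.linalg.norm(W_shifted) <= np.linalg.norm(W) + 1e-8
```

In this sketch the constraint W' X = Y defines the solution space in which the parameters may move; the paper's OS/SOS procedures choose the point in that space by solving a constrained optimization tied to sharpness, rather than the simple minimum-norm choice shown here.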
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4496