Orthonormalising gradients improves neural network optimisation

TMLR Paper1063 Authors

17 Apr 2023 (modified: 19 Jun 2023) · Withdrawn by Authors
Abstract: The optimisation of neural networks can be improved by orthogonalising the gradients before the optimisation step, encouraging diversity among the learned intermediate representations. We orthonormalise the gradients of a layer's components/filters with respect to each other to separate out the latent features. Unlike approaches that restrict the weights themselves to an orthogonal sub-space, our method of orthogonalisation leaves the weights free to be used flexibly. We tested this method on image classification (ImageNet and CIFAR-10) and on the self-supervised learning method Barlow Twins, obtaining both better accuracy than SGD with fine-tuned hyper-parameters and better accuracy with naïvely chosen hyper-parameters.
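The abstract's core operation, making a layer's per-filter gradients mutually orthonormal before the optimiser step, can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the paper's implementation: the function name and the choice of QR decomposition (equivalent to Gram-Schmidt up to signs) are hypothetical, and the authors' exact procedure may differ.

```python
import numpy as np

def orthonormalise_filter_grads(grad):
    """Hypothetical sketch: orthonormalise a layer's per-filter gradients
    with respect to each other.

    grad: array of shape (num_filters, ...) -- one gradient slice per filter.
    Returns an array of the same shape whose flattened per-filter rows are
    mutually orthonormal, obtained via a reduced QR decomposition
    (requires the flattened filter dimension >= num_filters).
    """
    n = grad.shape[0]
    flat = grad.reshape(n, -1)        # (num_filters, d)
    # QR on the transpose yields Q of shape (d, n) with orthonormal
    # columns, so Q.T has orthonormal rows.
    q, _ = np.linalg.qr(flat.T)
    return q.T.reshape(grad.shape)

# Example: 8 conv filters of shape 3x3x4.
rng = np.random.default_rng(0)
g = rng.standard_normal((8, 3, 3, 4))
og = orthonormalise_filter_grads(g)
gram = og.reshape(8, -1) @ og.reshape(8, -1).T
print(np.allclose(gram, np.eye(8), atol=1e-6))  # rows are orthonormal
```

In an optimiser, the orthonormalised tensor would replace the raw gradient for the update; note that plain QR can flip the sign of individual rows relative to the original gradient directions, which a practical implementation would need to handle.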
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=Uv7QfY19cE
Changes Since Last Submission: Fixed font. Tuned hyper-parameters to beat the performance of the original ResNet20, which is the current SOTA for that architecture.
Assigned Action Editor: ~Joan_Bruna1
Submission Number: 1063