Improving CNN training by Riemannian optimization on the generalized Stiefel manifold combined with a gradient-based manifold search

ICLR 2025 Conference Submission14034 Authors

28 Sept 2024 (modified: 13 Oct 2024)ICLR 2025 Conference SubmissionEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Riemannian optimization, Convolutional neural networks, gradient-based optimization, deep neural networks, generalized Stiefel manifold
TL;DR: We optimize CNNs on the generalized Stiefel manifold with Riemannian optimization and a gradient-based manifold search.
Abstract: Enforcing orthonormality constraints in deep learning has been shown to provide significant benefits. Although hard restrictions can be applied by constraining parameter matrices to the Stiefel manifold, this approach limits the solution space to that specific manifold. We show that a generalized Stiefel constraint $X^TSX=\mathbb{I}$ for Riemannian optimization can lead to even faster convergence than in previous work on CNNs, which enforced orthonormality. The gained flexibility comes from a larger search space. In this paper, we therefore propose a novel approach that retains the advantages of compact restrictions while using a gradient-based formulation to adapt the solution space defined by $S$. This approach results in overall faster convergence rates and improved test performance across CIFAR10, CIFAR100, SVHN, and Tiny ImageNet32 datasets on GPU hardware.
Primary Area: other topics in machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 14034
Loading