In addressing feature redundancy and training instability in CNNs, orthogonality regularization has emerged as a promising approach. A variant termed kernel orthogonality regularization regularizes models by minimizing the residual between the kernel (Gram) matrix of a layer's convolutional filters and the identity matrix.
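For concreteness, one common formulation (the notation below is ours, an assumption rather than a verbatim definition from this work) flattens the N filters of a convolutional layer into the rows of a matrix W and penalizes the Frobenius residual of its Gram matrix against the identity:

\[
\mathcal{L}_{\mathrm{orth}}(W) = \left\lVert W W^{\top} - I_{N} \right\rVert_F^{2}, \qquad W \in \mathbb{R}^{N \times C k_h k_w},
\]

which drives the filters toward unit norm and mutual decorrelation.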
Unlike methods that treat the kernel residual as a single, undifferentiated quantity, our approach introduces a tailored measure that disentangles the diagonal and correlation components of the kernel matrix, mitigating their mutual interference during training. Models trained with this strict kernel orthogonality measure approach orthogonality more closely than existing methods, and we observe test accuracy improvements for shallow architectures. However, as model depth increases, the efficacy of our strict kernel orthogonality approach diminishes.
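A minimal sketch of how such a disentangled penalty could be computed, assuming a PyTorch-style filter bank of shape (N, C, k_h, k_w); the function and coefficient names (disentangled_orth_loss, lambda_diag, lambda_corr) are hypothetical and not taken from the toolkit itself:

```python
import torch

def disentangled_orth_loss(weight, lambda_diag=1.0, lambda_corr=1.0):
    """Sketch of a disentangled kernel-orthogonality penalty.

    The Gram matrix G = W W^T is split into its diagonal part (filter norms)
    and its off-diagonal part (pairwise filter correlations), which are
    penalized with separate, independently tunable weights.
    """
    n = weight.shape[0]
    w = weight.reshape(n, -1)                      # (N, C*kh*kw)
    gram = w @ w.t()                               # (N, N) kernel/Gram matrix
    eye = torch.eye(n, device=w.device, dtype=w.dtype)
    diag_residual = torch.diagonal(gram) - 1.0     # deviation of filter norms from 1
    corr_residual = gram * (1.0 - eye)             # off-diagonal correlations
    diag_term = (diag_residual ** 2).sum()
    corr_term = (corr_residual ** 2).sum()
    return lambda_diag * diag_term + lambda_corr * corr_term
```

Separating the two terms lets the norm constraint and the decorrelation constraint be weighted and monitored independently, which is the mutual interference that the disentangled measure is meant to avoid.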
Given the difficulty of enforcing strict kernel orthogonality in deeper models, and the inherent inability of certain convolutional layers to satisfy the kernel orthogonality definition, we introduce a relaxation theory in which strict orthogonality is a special case. With this relaxed kernel orthogonality regularization, we observe improved performance in deeper architectures, suggesting it as a robust alternative to the strict counterpart.
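The precise form of the relaxation is not spelled out here, but one illustrative way a relaxed penalty can contain the strict one as a special case is a tolerance on the Gram residual; in the sketch below, the margin parameter is hypothetical, and setting margin = 0 recovers the strict penalty:

```python
import torch

def relaxed_orth_loss(weight, margin=0.0):
    """Sketch of a relaxed kernel-orthogonality penalty (illustrative only).

    Entries of the Gram residual G - I are penalized only where their
    magnitude exceeds `margin`; margin=0 reduces this to the strict
    penalty ||W W^T - I||_F^2.
    """
    n = weight.shape[0]
    w = weight.reshape(n, -1)
    gram = w @ w.t()
    residual = gram - torch.eye(n, device=w.device, dtype=w.dtype)
    slack = torch.clamp(residual.abs() - margin, min=0.0)  # tolerate small deviations
    return (slack ** 2).sum()
```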
To validate our approach's efficacy in achieving near-orthogonality and improving model performance, we conduct rigorous experiments with our kernel orthogonality regularization toolkit on ResNet and WideResNet, using the CIFAR-10 and CIFAR-100 datasets. The toolkit yields state-of-the-art performance gains and produces more robust models with more expressive features. These experiments demonstrate the efficacy of our toolkit while highlighting often overlooked challenges in orthogonality regularization.