In addressing feature redundancy and training instability in CNNs, orthogonality regularization has emerged as a promising approach. A variant termed kernel orthogonality regularization regularizes models by minimizing the residual between the kernel (Gram) matrix of a layer's convolutional filters and the identity matrix.
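For concreteness, one common formulation (the notation below is ours, an assumption rather than a verbatim definition from this work) flattens the N filters of a convolutional layer into the rows of a matrix W and penalizes the Frobenius residual of its Gram matrix against the identity:

\[
\mathcal{L}_{\mathrm{orth}}(W) = \left\lVert W W^{\top} - I_{N} \right\rVert_F^{2}, \qquad W \in \mathbb{R}^{N \times C k_h k_w},
\]

which drives the filters toward unit norm and mutual decorrelation.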
Unlike methods that treat the kernel residual as a single, undifferentiated quantity, our approach introduces a tailored measure that disentangles the diagonal and correlation components of the kernel matrix, mitigating their mutual interference during training. Models trained with this strict kernel orthogonality measure approach orthogonality more closely than existing methods, and we observe test accuracy improvements for shallow architectures. However, as model depth increases, the efficacy of our strict kernel orthogonality approach diminishes.
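A minimal sketch of how such a disentangled penalty could be computed, assuming a PyTorch-style filter bank of shape (N, C, k_h, k_w); the function and coefficient names (disentangled_orth_loss, lambda_diag, lambda_corr) are hypothetical and not taken from the toolkit itself:

```python
import torch

def disentangled_orth_loss(weight, lambda_diag=1.0, lambda_corr=1.0):
    """Sketch of a disentangled kernel-orthogonality penalty.

    The Gram matrix G = W W^T is split into its diagonal part (filter norms)
    and its off-diagonal part (pairwise filter correlations), which are
    penalized with separate, independently tunable weights.
    """
    n = weight.shape[0]
    w = weight.reshape(n, -1)                      # (N, C*kh*kw)
    gram = w @ w.t()                               # (N, N) kernel/Gram matrix
    eye = torch.eye(n, device=w.device, dtype=w.dtype)
    diag_residual = torch.diagonal(gram) - 1.0     # deviation of filter norms from 1
    corr_residual = gram * (1.0 - eye)             # off-diagonal correlations
    diag_term = (diag_residual ** 2).sum()
    corr_term = (corr_residual ** 2).sum()
    return lambda_diag * diag_term + lambda_corr * corr_term
```

Separating the two terms lets the norm constraint and the decorrelation constraint be weighted and monitored independently, which is the mutual interference that the disentangled measure is meant to avoid.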
Given the difficulty of enforcing strict kernel orthogonality in deeper models, and the inherent inability of certain convolutional layers to satisfy the kernel orthogonality definition, we introduce a relaxation theory in which strict orthogonality is a special case. With this relaxed kernel orthogonality regularization, we observe improved performance in deeper architectures, suggesting it as a robust alternative to the strict counterpart.
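The precise form of the relaxation is not spelled out here, but one illustrative way a relaxed penalty can contain the strict one as a special case is a tolerance on the Gram residual; in the sketch below, the margin parameter is hypothetical, and setting margin = 0 recovers the strict penalty:

```python
import torch

def relaxed_orth_loss(weight, margin=0.0):
    """Sketch of a relaxed kernel-orthogonality penalty (illustrative only).

    Entries of the Gram residual G - I are penalized only where their
    magnitude exceeds `margin`; margin=0 reduces this to the strict
    penalty ||W W^T - I||_F^2.
    """
    n = weight.shape[0]
    w = weight.reshape(n, -1)
    gram = w @ w.t()
    residual = gram - torch.eye(n, device=w.device, dtype=w.dtype)
    slack = torch.clamp(residual.abs() - margin, min=0.0)  # tolerate small deviations
    return (slack ** 2).sum()
```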
To validate our approach's efficacy in achieving near-orthogonality and improving model performance, we conduct rigorous experiments with our kernel orthogonality regularization toolkit on ResNet and WideResNet, using the CIFAR-10 and CIFAR-100 datasets. The toolkit yields state-of-the-art performance gains and produces more robust models with more expressive features. These experiments demonstrate the efficacy of our toolkit while highlighting often overlooked challenges in orthogonality regularization.