Stacking More Linear Operations with Orthogonal Regularization to Learn Better

Published: 2022, Last Modified: 10 Nov 2025, ICIP 2022, CC BY-SA 4.0
Abstract: Improving the generalization of CNN models is a long-standing problem in the deep learning community. This paper presents a method that strengthens CNN models by stacking linear convolution operations during training while adding no parameters or FLOPs at inference. We show that overparameterization with appropriate regularization can lead to a smooth optimization landscape that improves performance. Concretely, we propose to add one 1 × 1 convolutional layer before and one after the original k × k convolutional layer, without any non-linear activations between them. In addition, Quasi-Orthogonal Regularization is proposed to keep the added 1 × 1 filters close to orthogonal matrices. After training, the two 1 × 1 layers can be fused into the original k × k layer without changing the original network architecture, leaving no extra computation at inference, i.e., the method is parameter/FLOPs-free.
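The fusion step described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. It assumes bias-free layers, stride 1, no padding, and square 1 × 1 matrices, none of which the abstract specifies. The names `conv2d`, `as_1x1`, `fuse`, and `soft_orthogonality_penalty` are hypothetical, and the penalty shown is a generic soft orthogonality term, not necessarily the Quasi-Orthogonal Regularization used in the paper.

```python
import random


def conv2d(x, w):
    """Bias-free, stride-1, unpadded 2D convolution (cross-correlation).

    x: [C_in][H][W], w: [C_out][C_in][k][k] -> [C_out][H-k+1][W-k+1]
    """
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    k = len(w[0][0])
    return [[[sum(w[o][i][u][v] * x[i][r + u][c + v]
                  for i in range(c_in) for u in range(k) for v in range(k))
              for c in range(wd - k + 1)]
             for r in range(h - k + 1)]
            for o in range(len(w))]


def as_1x1(m):
    """Wrap a [C_out][C_in] matrix as a 1 x 1 convolution kernel."""
    return [[[[m[o][i]]] for i in range(len(m[0]))] for o in range(len(m))]


def fuse(post, kernel, pre):
    """Fold the two 1 x 1 layers into the k x k kernel.

    W[o][i][u][v] = sum over a, b of post[o][a] * kernel[a][b][u][v] * pre[b][i]
    """
    k = len(kernel[0][0])
    n_a, n_b = len(kernel), len(kernel[0])
    return [[[[sum(post[o][a] * kernel[a][b][u][v] * pre[b][i]
                   for a in range(n_a) for b in range(n_b))
               for v in range(k)] for u in range(k)]
             for i in range(len(pre[0]))]
            for o in range(len(post))]


def soft_orthogonality_penalty(m):
    """Generic penalty ||M M^T - I||_F^2 (an assumption, not the paper's loss)."""
    total = 0.0
    for r in range(len(m)):
        for s in range(len(m)):
            gram = sum(m[r][j] * m[s][j] for j in range(len(m[0])))
            total += (gram - (1.0 if r == s else 0.0)) ** 2
    return total


def rand_tensor(rng, *shape):
    """Nested lists of uniform random floats with the given shape."""
    if len(shape) == 1:
        return [rng.uniform(-1.0, 1.0) for _ in range(shape[0])]
    return [rand_tensor(rng, *shape[1:]) for _ in range(shape[0])]


def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two nested lists."""
    if isinstance(a, list):
        return max(max_abs_diff(p, q) for p, q in zip(a, b))
    return abs(a - b)


rng = random.Random(0)
c_in, c_out, k = 3, 4, 3
x = rand_tensor(rng, c_in, 6, 6)
pre = rand_tensor(rng, c_in, c_in)            # 1 x 1 layer before the k x k layer
kernel = rand_tensor(rng, c_out, c_in, k, k)  # original k x k layer
post = rand_tensor(rng, c_out, c_out)         # 1 x 1 layer after the k x k layer

# Training-time path: three stacked linear layers, no activation in between.
stacked = conv2d(conv2d(conv2d(x, as_1x1(pre)), kernel), as_1x1(post))
# Inference-time path: one k x k layer with the fused kernel.
fused = conv2d(x, fuse(post, kernel, pre))

gap = max_abs_diff(stacked, fused)
print("max abs difference between stacked and fused outputs:", gap)
```

Because the three layers are linear and nothing non-linear sits between them, their composition is itself a single k × k convolution, which is why the two outputs agree up to floating-point rounding. With biases or padding, the fusion would need extra care at the borders, which this sketch does not cover.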