Keywords: Residual Connection, Orthogonalization, Orthogonal Residual Updates, Representation Learning, Deep Learning
TL;DR: Our Orthogonal Residual Update improves deep networks by adding only novel, stream-orthogonal module outputs, boosting generalization, stability, and efficiency.
Abstract: Residual connections are pivotal for deep neural networks, enabling greater depth by mitigating vanishing gradients. However, in standard residual updates, the module’s output is directly added to the input stream. This can lead to updates that predominantly reinforce or modulate the existing stream direction, potentially underutilizing the module’s capacity for learning entirely novel features. In this work, we introduce _Orthogonal Residual Update_: we decompose the module’s output relative to the input stream and add only the component orthogonal to this stream. This design aims to guide modules to contribute primarily new representational directions, fostering richer feature learning while promoting more efficient training. We demonstrate that our orthogonal update strategy improves generalization accuracy and training stability across diverse architectures (ResNetV2, Vision Transformers) and datasets (CIFARs, TinyImageNet, ImageNet-1k), achieving, for instance, a +3.78 pp Acc@1 gain for ViT-B on ImageNet-1k. Code and models are available at https://github.com/BootsofLagrangian/ortho-residual.
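To make the update rule in the abstract concrete, here is a minimal PyTorch sketch of an orthogonal residual update: the module output is decomposed against the input stream, and only the orthogonal component is added. The `OrthogonalResidual` wrapper class, the `eps` guard, and the choice of projecting per token along the feature dimension are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class OrthogonalResidual(nn.Module):
    """Illustrative wrapper: adds only the component of module(x)
    orthogonal to the input stream x (hypothetical sketch)."""

    def __init__(self, module: nn.Module, eps: float = 1e-6):
        super().__init__()
        self.module = module  # e.g., an attention or MLP block
        self.eps = eps        # guards against division by a near-zero stream norm

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.module(x)
        # Projection of `out` onto the stream direction x,
        # computed per token over the feature (last) dimension.
        dot = (out * x).sum(dim=-1, keepdim=True)
        norm_sq = (x * x).sum(dim=-1, keepdim=True)
        parallel = (dot / (norm_sq + self.eps)) * x
        # Standard residual would be x + out; here only the
        # stream-orthogonal component of `out` is added.
        return x + (out - parallel)

# Usage sketch: wrap any sub-module in place of a plain residual branch.
block = OrthogonalResidual(nn.Linear(64, 64))
y = block(torch.randn(2, 10, 64))  # shape (batch, tokens, features)
```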
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 3042