Orthogonal Over-Parameterized Training

28 Sept 2020 (modified: 22 Oct 2023) · ICLR 2021 Conference Withdrawn Submission · Readers: Everyone
Keywords: Neural Network, Hyperspherical Energy, Inductive Bias, Orthogonality
Abstract: The inductive bias of a neural network is largely determined by its architecture and training algorithm, so how a network is trained matters greatly for generalization. We propose a novel orthogonal over-parameterized training (OPT) framework that can provably minimize the hyperspherical energy, which characterizes the diversity of neurons on a hypersphere. By maintaining the minimum hyperspherical energy during training, OPT can greatly improve empirical generalization. Specifically, OPT fixes the randomly initialized weights of the neurons and learns an orthogonal transformation that is applied to these neurons. We consider multiple ways to learn such an orthogonal transformation, including unrolling orthogonalization algorithms, applying orthogonal parameterizations, and designing orthogonality-preserving gradient descent. For better scalability, we further propose stochastic OPT, which applies the orthogonal transformation stochastically to a subset of the neuron dimensions. Interestingly, OPT reveals that learning a proper coordinate system for neurons is crucial to generalization. We provide insights into why OPT yields better generalization, and extensive experiments validate its superiority.
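
For a concrete picture of the training scheme sketched in the abstract, the following is a minimal PyTorch illustration of one possible instantiation: the neuron weights are frozen at their random initialization and only an orthogonal matrix, here obtained through a Cayley parameterization of a learnable skew-symmetric matrix, is trained. This is a sketch under our own assumptions, not the authors' implementation; the `OPTLinear` class and all variable names are illustrative.

```python
# Minimal, illustrative sketch (not the authors' released code) of the OPT idea
# for one linear layer: the randomly initialized neuron weights W are frozen and
# only an orthogonal matrix R is learned, applied to every neuron as v_i = R w_i.
import torch
import torch.nn as nn


class OPTLinear(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Fixed, randomly initialized neurons (rows of W); never updated.
        W = torch.randn(out_features, in_features) / in_features ** 0.5
        self.register_buffer("W", W)
        # Learnable matrix defining the skew-symmetric A = S - S^T.
        # S = 0 gives R = I, so training starts from the untransformed neurons.
        self.S = nn.Parameter(torch.zeros(in_features, in_features))

    def orthogonal_matrix(self) -> torch.Tensor:
        # Cayley transform: R = (I - A)(I + A)^{-1} is orthogonal for any
        # skew-symmetric A, so the pairwise angles between the neurons R w_i
        # (and hence their hyperspherical energy) stay at their initial values.
        A = self.S - self.S.T
        I = torch.eye(A.shape[0], device=A.device, dtype=A.dtype)
        return (I - A) @ torch.linalg.inv(I + A)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        R = self.orthogonal_matrix()
        return x @ (self.W @ R.T).T  # effective neurons are R w_i


# Only the parameters of the orthogonal transformation receive gradients.
layer = OPTLinear(16, 8)
layer(torch.randn(4, 16)).sum().backward()
assert layer.W.grad is None and layer.S.grad is not None
```

The Cayley parameterization is just one of the routes mentioned in the abstract (alongside unrolled orthogonalization and orthogonality-preserving gradient descent); because R is orthogonal by construction, the pairwise inner products of the fixed neurons, and thus the hyperspherical energy, are preserved throughout training.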
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: We propose a novel training framework that can effectively encourage hyperspherical diversity for neural networks.
Community Implementations: [3 code implementations (CatalyzeX)](https://www.catalyzex.com/paper/arxiv:2004.04690/code)
Reviewed Version (pdf): https://openreview.net/references/pdf?id=RAROu6lUi
