Conic Activation Functions

Published: 10 Oct 2024, Last Modified: 07 Nov 2024
Venue: UniReps
License: CC BY 4.0
Supplementary Material: zip
Track: Proceedings Track
Keywords: Neural Network Architectures, Activation Functions, Equivariance in Neural Networks
TL;DR: Replace pointwise activations with conic ones for improved performance and training, derived from unified symmetries.
Abstract: Most activation functions operate component-wise, which restricts the equivariance of neural networks to permutations. We introduce Conic Linear Units (CoLU) and generalize the symmetry of neural networks to continuous orthogonal groups. By interpreting ReLU as a projection onto its invariant set—the positive orthant—we propose a conic activation function that uses a Lorentz cone instead. Its performance can be further improved by considering multi-head structures, soft scaling, and axis sharing. CoLU with low-dimensional cones outperforms the component-wise ReLU in a wide range of models, including MLPs, ResNets, and UNets, achieving better loss values and faster convergence. It significantly improves the training and performance of diffusion models. CoLU originates from a first-principles approach to various forms of neural networks and fundamentally changes their algebraic structure.
Submission Number: 2
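The abstract frames ReLU as the projection onto its invariant set (the positive orthant) and CoLU as the analogous projection onto a Lorentz cone, applied block-wise in a multi-head fashion. The exact formulation is not given on this page, so the following is a minimal sketch under that interpretation; the function names `lorentz_cone_projection` and `colu`, the block-wise split, and the `cone_dim` parameter are illustrative assumptions rather than the paper's actual API.

```python
import numpy as np

def lorentz_cone_projection(x: np.ndarray) -> np.ndarray:
    """Euclidean projection of x = (t, v) onto the second-order (Lorentz)
    cone {(t, v) : t >= ||v||_2}, using the standard closed-form solution."""
    t, v = x[0], x[1:]
    nv = np.linalg.norm(v)
    if nv <= t:            # already inside the cone: keep the point
        return x
    if nv <= -t:           # inside the polar cone: project to the origin
        return np.zeros_like(x)
    s = (t + nv) / 2.0     # otherwise project onto the cone's boundary
    return np.concatenate(([s], s * v / nv))

def colu(features: np.ndarray, cone_dim: int = 4) -> np.ndarray:
    """Hypothetical multi-head CoLU sketch: split a 1-D feature vector into
    blocks of size `cone_dim` and project each block onto a Lorentz cone,
    analogous to ReLU projecting each coordinate onto the positive half-line."""
    assert features.size % cone_dim == 0
    blocks = features.reshape(-1, cone_dim)
    projected = np.stack([lorentz_cone_projection(b) for b in blocks])
    return projected.reshape(features.shape)

# Usage: swap a component-wise ReLU for a conic activation on low-dimensional cones.
x = np.random.randn(16)
print(colu(x, cone_dim=4))
```

Note that for `cone_dim = 1` the cone degenerates to the positive half-line and the projection reduces to ReLU, which is consistent with the abstract's view of ReLU as a special case.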