Conic Activation Functions

Published: 10 Oct 2024, Last Modified: 07 Nov 2024
Venue: UniReps
License: CC BY 4.0
Supplementary Material: zip
Track: Proceedings Track
Keywords: Neural Network Architectures, Activation Functions, Equivariance in Neural Networks
TL;DR: Replace pointwise activations with conic ones for improved performance and training, derived from unified symmetries.
Abstract: Most activation functions operate component-wise, which restricts the equivariance of neural networks to permutations. We introduce Conic Linear Units (CoLU) and generalize the symmetry of neural networks to continuous orthogonal groups. By interpreting ReLU as a projection onto its invariant set—the positive orthant—we propose a conic activation function that uses a Lorentz cone instead. Its performance can be further improved by considering multi-head structures, soft scaling, and axis sharing. CoLU with low-dimensional cones outperforms the component-wise ReLU in a wide range of models, including MLPs, ResNets, and UNets, achieving better loss values and faster convergence. It significantly improves the training and performance of diffusion models. CoLU originates from a first-principles approach to various forms of neural networks and fundamentally changes their algebraic structure.
Submission Number: 2
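The abstract frames ReLU as the projection onto its invariant set (the positive orthant) and CoLU as the analogous projection onto a Lorentz cone, applied block-wise in a multi-head fashion. The exact formulation is not given on this page, so the following is a minimal sketch under that interpretation; the function names `lorentz_cone_projection` and `colu`, the block-wise split, and the `cone_dim` parameter are illustrative assumptions rather than the paper's actual API.

```python
import numpy as np

def lorentz_cone_projection(x: np.ndarray) -> np.ndarray:
    """Euclidean projection of x = (t, v) onto the second-order (Lorentz)
    cone {(t, v) : t >= ||v||_2}, using the standard closed-form solution."""
    t, v = x[0], x[1:]
    nv = np.linalg.norm(v)
    if nv <= t:            # already inside the cone: keep the point
        return x
    if nv <= -t:           # inside the polar cone: project to the origin
        return np.zeros_like(x)
    s = (t + nv) / 2.0     # otherwise project onto the cone's boundary
    return np.concatenate(([s], s * v / nv))

def colu(features: np.ndarray, cone_dim: int = 4) -> np.ndarray:
    """Hypothetical multi-head CoLU sketch: split a 1-D feature vector into
    blocks of size `cone_dim` and project each block onto a Lorentz cone,
    analogous to ReLU projecting each coordinate onto the positive half-line."""
    assert features.size % cone_dim == 0
    blocks = features.reshape(-1, cone_dim)
    projected = np.stack([lorentz_cone_projection(b) for b in blocks])
    return projected.reshape(features.shape)

# Usage: swap a component-wise ReLU for a conic activation on low-dimensional cones.
x = np.random.randn(16)
print(colu(x, cone_dim=4))
```

Note that for `cone_dim = 1` the cone degenerates to the positive half-line and the projection reduces to ReLU, which is consistent with the abstract's view of ReLU as a special case.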