ReLU Characteristic Activation Analysis

Published: 16 Jun 2024, Last Modified: 17 Jun 2024. HiLD at ICML 2024 Poster. License: CC BY 4.0
Keywords: ReLU activation, deep learning, optimization, parameterization, normalization, neural network, training dynamics
TL;DR: We identify an instability in popular neural network parameterizations and normalizations during training and resolve it with a novel parameterization.
Abstract: We introduce a novel approach for analyzing the training dynamics of ReLU networks by examining the characteristic activation boundaries of individual ReLU neurons. This analysis reveals a critical instability in common neural network parameterizations and normalizations during stochastic optimization, which impedes fast convergence and hurts generalization performance. To address this, we propose Geometric Parameterization (GmP), a novel neural network parameterization technique that effectively separates the radial and angular components of weights in the hyperspherical coordinate system. We show theoretically that GmP resolves the aforementioned instability, and we report empirical results on various models and benchmarks that verify its advantages in optimization stability, convergence speed, and generalization performance.
Student Paper: Yes
Submission Number: 2
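
The abstract describes GmP as separating the radial and angular components of weight vectors in hyperspherical coordinates. The sketch below is an illustrative PyTorch reconstruction of that idea, not the paper's exact formulation: the `GeometricLinear` module, its parameter names, and the specific angle-to-direction map are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GeometricLinear(nn.Module):
    """Linear layer whose weight rows are stored in hyperspherical coordinates:
    a radial scale and (in_features - 1) angles per output unit.
    Illustrative sketch only; not the authors' exact GmP parameterization."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.radius = nn.Parameter(torch.ones(out_features))                     # radial component
        self.angles = nn.Parameter(torch.zeros(out_features, in_features - 1))   # angular component
        self.bias = nn.Parameter(torch.zeros(out_features))

    def _direction(self) -> torch.Tensor:
        # Standard hyperspherical-to-Cartesian map: each row is a unit vector.
        sin = torch.sin(self.angles)
        cos = torch.cos(self.angles)
        sin_cumprod = torch.cumprod(sin, dim=1)
        first = cos[:, :1]                          # cos(theta_1)
        middle = sin_cumprod[:, :-1] * cos[:, 1:]   # prod_{j<k} sin(theta_j) * cos(theta_k)
        last = sin_cumprod[:, -1:]                  # product of all sines
        return torch.cat([first, middle, last], dim=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Recombine the radial and angular components into a Cartesian weight matrix.
        weight = self.radius.unsqueeze(1) * self._direction()
        return F.linear(x, weight, self.bias)


if __name__ == "__main__":
    layer = GeometricLinear(in_features=8, out_features=4)
    out = torch.relu(layer(torch.randn(2, 8)))      # a ReLU unit on top of the layer
    print(out.shape)                                # torch.Size([2, 4])
```

Because the radius and angles are independent parameters, gradient updates to the direction of a weight vector cannot change its length, which is one way to realize the radial/angular separation the abstract attributes to GmP.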