Analyzing and Improving Surrogate Gradient Training in Binary Neural Networks Using Dynamical Systems Theory
Keywords: differentiable linear algebra, surrogate gradients, exploding/vanishing gradients, Lyapunov exponents, Lyapunov spectrum, chaos, RNN, Jacobian
TL;DR: We analyze and mitigate exploding and vanishing gradients in surrogate gradient training by introducing a novel tool inspired by dynamical systems theory (surrogate Lyapunov exponents).
Abstract: Training Binary Recurrent Networks on tasks that span long time horizons is challenging, as the discrete activation function renders the error landscape non-differentiable. Surrogate gradient training replaces the discrete activation function with a differentiable one in the backward pass but still suffers from exploding and vanishing gradients.
We leverage the connection between gradient stability and Lyapunov exponents to address this issue from a dynamical systems perspective, extending our previous work on Lyapunov exponent regularization to non-differentiable systems.
We use differentiable linear algebra to regularize surrogate Lyapunov exponents, a method we call surrogate gradient flossing.
We show that surrogate gradient flossing enhances performance on temporally demanding tasks.
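The abstract does not include code; the following is a minimal sketch of what regularizing surrogate Lyapunov exponents could look like, assuming a straight-through sigmoid surrogate, a QR-based finite-time Lyapunov exponent estimate, and a quadratic penalty on the leading exponents. Function names (`binary_rnn_step`, `flossing_loss`) and the surrogate slope are illustrative choices, not taken from the paper.

```python
# Sketch (not the authors' code): surrogate Lyapunov exponents of a binary RNN
# and a "flossing"-style penalty that pushes them toward zero. Because JAX's
# QR decomposition is differentiable, the penalty can be minimized by gradient descent.
import jax
import jax.numpy as jnp

def heaviside_with_surrogate(x):
    """Binary activation; straight-through sigmoid surrogate in the backward pass."""
    surrogate = jax.nn.sigmoid(4.0 * x)  # slope 4.0 is an arbitrary illustrative choice
    return surrogate + jax.lax.stop_gradient(jnp.heaviside(x, 0.0) - surrogate)

def binary_rnn_step(params, h, x):
    """One recurrent update with binary activations."""
    W, U, b = params
    return heaviside_with_surrogate(W @ h + U @ x + b)

def surrogate_lyapunov_exponents(params, h0, inputs):
    """Finite-time Lyapunov exponents from surrogate Jacobians via repeated QR."""
    n = h0.shape[0]

    def scan_fn(carry, x):
        h, Q, log_r = carry
        # Jacobian of the update w.r.t. the hidden state, using the surrogate derivative.
        J = jax.jacobian(lambda hh: binary_rnn_step(params, hh, x))(h)
        # Re-orthonormalize the propagated tangent basis; diagonal of R tracks growth rates.
        Q, R = jnp.linalg.qr(J @ Q)
        log_r = log_r + jnp.log(jnp.abs(jnp.diag(R)) + 1e-12)
        h = binary_rnn_step(params, h, x)
        return (h, Q, log_r), None

    init = (h0, jnp.eye(n), jnp.zeros(n))
    (_, _, log_r), _ = jax.lax.scan(scan_fn, init, inputs)
    return log_r / inputs.shape[0]  # average log growth rate per step

def flossing_loss(params, h0, inputs, k=4):
    """Quadratic penalty keeping the k leading surrogate Lyapunov exponents near zero."""
    lam = jnp.sort(surrogate_lyapunov_exponents(params, h0, inputs))[::-1]
    return jnp.sum(lam[:k] ** 2)
```

In practice such a penalty would be added to the task loss (or minimized in a separate "flossing" phase before task training) with some weighting coefficient; the paper's exact formulation and schedule may differ from this sketch.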
Submission Number: 52