Analyzing and Improving Surrogate Gradient Training in Binary Neural Networks Using Dynamical Systems Theory

Published: 27 Jun 2024, Last Modified: 20 Aug 2024
Venue: Differentiable Almost Everything
License: CC BY 4.0
Keywords: differentiable linear algebra, surrogate gradients, exploding/vanishing gradients, Lyapunov exponents, Lyapunov spectrum, chaos, RNN, Jacobian
TL;DR: We analyze and mitigate exploding and vanishing gradients in surrogate gradient training by introducing a novel tool inspired by dynamical systems theory (surrogate Lyapunov exponents).
Abstract: Training binary recurrent networks on tasks that span long time horizons is challenging, as the discrete activation function renders the error landscape non-differentiable. Surrogate gradient training replaces the discrete activation function with a differentiable surrogate in the backward pass, but it still suffers from exploding and vanishing gradients. We leverage the connection between gradient stability and Lyapunov exponents to address this issue from a dynamical systems perspective, extending our previous work on Lyapunov exponent regularization to non-differentiable systems. We use differentiable linear algebra to regularize surrogate Lyapunov exponents, a method we call surrogate gradient flossing. We show that surrogate gradient flossing enhances performance on temporally demanding tasks.
Submission Number: 52
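
Below is a minimal PyTorch sketch, not the authors' code, of the two ingredients the abstract describes: a binary activation whose backward pass uses a differentiable surrogate, and surrogate Lyapunov exponents estimated from differentiable QR decompositions of the surrogate Jacobians along a trajectory, together with a regularization penalty on those exponents. Everything specific here is an illustrative assumption: the names (BinaryStep, surrogate_lyapunov_exponents, flossing_loss), the sech^2 surrogate derivative, and the choice of penalizing the exponents toward zero.

```python
import torch


class BinaryStep(torch.autograd.Function):
    """Heaviside step in the forward pass; smooth surrogate derivative in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return (x > 0).to(x.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Assumed surrogate derivative: sech^2(x) = 1 - tanh(x)^2.
        return grad_output * (1.0 - torch.tanh(x) ** 2)


def surrogate_lyapunov_exponents(h0, xs, W, U, b, k=4):
    """Leading k surrogate Lyapunov exponents of a binary RNN h_{t+1} = step(W h_t + U x_t + b).

    Each step's Jacobian is taken w.r.t. the surrogate dynamics (diag(sigma'(pre)) @ W),
    and repeated QR re-orthonormalization accumulates the exponents, so the estimate
    is differentiable end to end.
    """
    n = h0.shape[0]
    Q = torch.eye(n, k, dtype=h0.dtype)
    log_r = torch.zeros(k, dtype=h0.dtype)
    h = h0
    for x in xs:
        pre = W @ h + U @ x + b
        D = torch.diag(1.0 - torch.tanh(pre) ** 2)  # surrogate derivative, diagonal
        Q, R = torch.linalg.qr(D @ W @ Q)           # differentiable QR step
        log_r = log_r + torch.log(torch.diagonal(R).abs() + 1e-12)
        h = BinaryStep.apply(pre)                   # forward state stays binary
    return log_r / len(xs)


def flossing_loss(exponents):
    # Assumed regularizer: push the surrogate Lyapunov exponents toward zero.
    return (exponents ** 2).sum()


# Hypothetical usage: add the penalty to the task loss during training.
n, m, T = 8, 3, 50
W = (torch.randn(n, n) / n ** 0.5).requires_grad_()
U = torch.randn(n, m).requires_grad_()
b = torch.zeros(n, requires_grad=True)
xs = [torch.randn(m) for _ in range(T)]
lam = surrogate_lyapunov_exponents(torch.zeros(n), xs, W, U, b, k=4)
flossing_loss(lam).backward()  # gradients flow through the QR steps into W, U, b
```

In this sketch the penalty would simply be added to the task loss; the QR re-orthonormalization keeps the products of surrogate Jacobians numerically stable while remaining differentiable, which is what makes the surrogate Lyapunov exponents usable as a regularization target.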