Keywords: implicit layers, deep learning acceleration, deep equilibrium model
Abstract: Implicit neural networks, including deep equilibrium models, have achieved superior task performance with better parameter efficiency in various applications. However, this often comes at the expense of higher computation cost during inference. In this work, we identify a phenomenon named $\textbf{heterogeneous convergence}$ that exists in deep equilibrium models and other iterative methods. We observe that state activations converge much faster in certain dimensions, which indicates that the dimensionality of the underlying dynamics of the forward pass is much lower than the defined dimension of the states. We therefore propose to exploit heterogeneous convergence by storing the results of past linear operations (e.g., fully connected and convolutional layers) and propagating a state activation only when its change exceeds a threshold. Computation can thus be skipped for dimensions that have already converged. We verify our findings and achieve an 84\% FLOPs reduction on the implicit neural representation task, and reductions of 73\% on the Sintel dataset and 76\% on the KITTI dataset for the optical flow estimation task, while keeping task accuracy comparable to that of models that perform the full update.
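The thresholded propagation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the update rule z = tanh(Wz + b), the dense matrix W, the function name `sparse_delta_fixed_point`, and the threshold value are all assumptions chosen for illustration. The sketch caches the result of the linear operation and adds a column's contribution only when the corresponding state dimension has changed by more than the threshold.

```python
import math

def sparse_delta_fixed_point(W, b, threshold=1e-4, max_iters=200):
    """Iterate z = tanh(W z + b), skipping dimensions whose change is small.

    Hypothetical illustration: W is a dense n x n list of lists and the
    nonlinearity is tanh; neither choice comes from the source.
    """
    n = len(b)
    z_ref = [0.0] * n   # last value of each dimension that was propagated
    y = [0.0] * n       # cached linear result, always equal to W @ z_ref
    z_new = list(z_ref)
    macs = 0            # multiply-accumulates spent on the linear operation
    iters = 0
    for _ in range(max_iters):
        iters += 1
        z_new = [math.tanh(y[i] + b[i]) for i in range(n)]
        # Only dimensions that moved by more than the threshold are propagated.
        changed = [j for j in range(n) if abs(z_new[j] - z_ref[j]) > threshold]
        if not changed:
            break
        for j in changed:
            delta = z_new[j] - z_ref[j]
            for i in range(n):
                y[i] += W[i][j] * delta   # incremental update of column j
            macs += n
            z_ref[j] = z_new[j]
    return z_new, macs, iters

# Usage with an assumed small contractive system.
W = [[0.5, 0.1, 0.0], [0.0, 0.05, 0.0], [0.1, 0.0, 0.3]]
b = [0.2, -0.1, 0.3]
z, macs, iters = sparse_delta_fixed_point(W, b)
dense_macs = iters * len(b) * len(b)   # cost of recomputing W @ z every step
print(f"sparse MACs: {macs}, dense MACs: {dense_macs}")
```

In this sketch the saving comes entirely from skipped columns, so it grows with the fraction of dimensions that converge early. A larger threshold skips more work at the cost of a less exact fixed point.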
Primary Area: Deep learning architectures
Submission Number: 12336