Sparse Persistent RNNs: Squeezing Large Recurrent Networks On-Chip
Nov 03, 2017 (modified: Nov 03, 2017) | ICLR 2018 Conference Blind Submission | readers: everyone
Abstract: Recurrent Neural Networks (RNNs) are powerful tools for solving sequence-based problems, but their efficacy and execution time depend on the size of the network. Following recent work on simplifying these networks with model pruning and on a novel mapping of work onto GPUs, we design an efficient implementation for sparse RNNs. We investigate several optimizations and tradeoffs: Lamport timestamps, wide memory loads, and a bank-aware weight layout. With these optimizations, we achieve speedups of 5x over the next best algorithm using only 36 of a P100's 56 SMs for a hidden layer of size 2304, a batch size of 4, and a density of 10%. Further, our technique allows models more than 3x the size to fit on a GPU for a speedup of 5x, enabling larger networks to help advance the state of the art. We present a case study on NMT with LSTMs in the appendix.
TL;DR: Combining network pruning and persistent kernels into a practical, fast, and accurate network implementation.
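
The configuration quoted in the abstract (hidden size 2304, 10% density, 36 SMs) can be put in perspective with a rough storage estimate. The sketch below is illustrative only and rests on assumptions that do not come from the abstract: a single hidden-by-hidden recurrent matrix, 2-byte weight values, a 2-byte index per nonzero, and a register file of 256 KiB per SM as the on-chip store. The function and constant names are hypothetical.

```python
def dense_footprint_bytes(hidden, value_bytes=2):
    """Bytes to store a dense hidden x hidden recurrent matrix."""
    return hidden * hidden * value_bytes


def sparse_footprint_bytes(hidden, density, value_bytes=2, index_bytes=2):
    """Bytes to store only the nonzeros, each with one index (assumed format)."""
    nonzeros = round(hidden * hidden * density)
    return nonzeros * (value_bytes + index_bytes)


HIDDEN = 2304    # hidden layer size from the abstract
DENSITY = 0.10   # fraction of weights kept, from the abstract
SMS_USED = 36    # SMs used, from the abstract
REGISTER_FILE_BYTES_PER_SM = 256 * 1024  # assumption: 256 KiB per SM

dense = dense_footprint_bytes(HIDDEN)
sparse = sparse_footprint_bytes(HIDDEN, DENSITY)
budget = SMS_USED * REGISTER_FILE_BYTES_PER_SM

print(f"dense weights:   {dense / 2**20:.2f} MiB")
print(f"sparse weights:  {sparse / 2**20:.2f} MiB")
print(f"register budget: {budget / 2**20:.2f} MiB across {SMS_USED} SMs")
print("dense fits: ", dense <= budget)
print("sparse fits:", sparse <= budget)
```

Under these assumptions the dense matrix exceeds the register capacity of 36 SMs while the pruned matrix uses well under a quarter of it. A real kernel also needs registers for activations, indices in its own format, and temporaries, so this is an upper bound on available space rather than a prediction of what fits.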