Adaptive Pruning of Neural Language Models for Mobile Devices

Raphael Tang; Jimmy Lin

Adaptive Pruning of Neural Language Models for Mobile Devices

Raphael Tang, Jimmy Lin

27 Sept 2018 (modified: 22 Jun 2025)ICLR 2019 Conference Blind SubmissionReaders: Everyone

Abstract: Neural language models (NLMs) exist in an accuracy-efficiency tradeoff space where better perplexity typically comes at the cost of greater computation complexity. In a software keyboard application on mobile devices, this translates into higher power consumption and shorter battery life. This paper represents the first attempt, to our knowledge, in exploring accuracy-efficiency tradeoffs for NLMs. Building on quasi-recurrent neural networks (QRNNs), we apply pruning techniques to provide a "knob" to select different operating points. In addition, we propose a simple technique to recover some perplexity using a negligible amount of memory. Our empirical evaluations consider both perplexity as well as energy consumption on a Raspberry Pi, where we demonstrate which methods provide the best perplexity-power consumption operating point. At one operating point, one of the techniques is able to provide energy savings of 40% over the state of the art with only a 17% relative increase in perplexity.

Keywords: Inference-time pruning, Neural Language Models

Data: [Penn Treebank](https://paperswithcode.com/dataset/penn-treebank), [WikiText-103](https://paperswithcode.com/dataset/wikitext-103), [WikiText-2](https://paperswithcode.com/dataset/wikitext-2)

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/adaptive-pruning-of-neural-language-models/code)

9 Replies

Loading