Autotuning LSTM for Accelerated Execution on Edge

ISCAS 2021 (modified: 15 Jan 2022)
Abstract: Deploying Deep Neural Networks (DNNs) on edge devices is highly desirable for addressing user privacy concerns and minimizing the turnaround time of AI applications. However, executing DNN models on a battery-operated device requires a highly optimized implementation specific to the target hardware. Moreover, as different layers of a DNN exhibit distinct computation and memory characteristics, it is imperative to optimize each layer separately. This contrasts with the widely deployed library-based approach, in which all configurations of a DNN operation share the same implementation. In this paper, we address this issue by auto-tuning the implementation of Long Short-Term Memory (LSTM) operations, which are widely used in sequence-based AI applications. To exhaustively search through the space of optimizations and their parameters, we develop a high-level autotuning framework based on Halide. We use grid search to find the parameters that yield the minimum runtime, and further present a Tree-structured Parzen Estimator (TPE) based search method that finds a near-optimal runtime within a limited number of trials. We observe 2.2×–3.1× speedups in execution time for LSTM layers used in the widely deployed GNMT and DeepSpeech2 models.
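The two search strategies named in the abstract can be illustrated with a short sketch. The snippet below contrasts exhaustive grid search with TPE-based search, using the hyperopt library as one common TPE implementation; the parameter names in GRID and the measure_lstm_runtime benchmark are illustrative assumptions for this sketch, not the paper's actual tuning space or framework.

```python
import itertools
import random

from hyperopt import Trials, fmin, hp, tpe


def measure_lstm_runtime(params):
    """Hypothetical benchmark: compile an LSTM kernel with the given
    schedule parameters (e.g., through Halide) and return its measured
    runtime in ms. A deterministic stand-in keeps the sketch runnable."""
    random.seed(hash(frozenset(params.items())) & 0xFFFF)
    return random.uniform(1.0, 5.0)


# Illustrative schedule parameters (tile sizes, vector width, unroll
# factor); the paper's actual tuning space is not reproduced here.
GRID = {
    "tile_x": [8, 16, 32, 64],
    "tile_y": [8, 16, 32, 64],
    "vector_width": [4, 8, 16],
    "unroll": [1, 2, 4],
}

# Exhaustive grid search: evaluate every configuration (144 trials here).
best_cfg, best_time = None, float("inf")
for values in itertools.product(*GRID.values()):
    cfg = dict(zip(GRID.keys(), values))
    t = measure_lstm_runtime(cfg)
    if t < best_time:
        best_cfg, best_time = cfg, t
print(f"grid search: {best_cfg} -> {best_time:.3f} ms")

# TPE-based search: approach the grid-search optimum in far fewer trials.
space = {name: hp.choice(name, choices) for name, choices in GRID.items()}
trials = Trials()
fmin(fn=measure_lstm_runtime, space=space, algo=tpe.suggest,
     max_evals=40, trials=trials)
print(f"TPE: best {min(trials.losses()):.3f} ms in {len(trials)} trials")
```

TPE fits probabilistic models over the better- and worse-performing trials seen so far and proposes configurations likely to improve on them, which is why it can approach the exhaustive-search optimum within a small, fixed trial budget.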