FPGA Acceleration of Dynamic Neural Networks: Challenges and Advancements

Published: 01 Jan 2024 · Last Modified: 29 Sept 2024 · COINS 2024 · CC BY-SA 4.0
Abstract: Modern machine learning methods continue to produce models with a high memory footprint and computational complexity that are increasingly difficult to deploy in resource-constrained environments. This is driven, in part, by a focus on costly, power-intensive GPUs, which has a feedback effect on the variety of methods and models chosen for development. We advocate for a transition away from general-purpose processing towards a more targeted, power-efficient form of hardware: the Field-Programmable Gate Array (FPGA). These devices allow the user to programmatically tailor the model processing architecture, resulting in increased inference performance and lower power demands. Their resources, however, are limited, which makes it necessary to simplify the target deep machine learning models. Dynamic Deep Neural Networks (DNNs) are a class of models that go beyond the limits of static model compression by tuning computational workload to the difficulty of inputs on a per-sample basis. Despite the model simplification capabilities of Dynamic DNNs and the demonstrated efficiency of FPGAs, little work has been done towards accelerating Dynamic DNNs on FPGAs. In this paper we discuss why this is the case, highlighting the challenges and limitations at both the software and hardware level. We detail the efficiency, performance gains, and practical benefits of state-of-the-art Dynamic DNN implementations when FPGAs are adopted as the acceleration device. Finally, we present our conclusions and recommendations for continued research in this space.
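To illustrate the per-sample workload tuning the abstract refers to, the sketch below shows one common Dynamic DNN mechanism, confidence-based early exiting: a cheap sub-network runs first, and the full network is invoked only when the cheap prediction is not confident enough. The function names (`dynamic_infer`, `cheap_stage`, `full_stage`) and the threshold value are illustrative assumptions, not the paper's specific method.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_infer(x, cheap_stage, full_stage, threshold=0.9):
    """Confidence-based early exit (illustrative sketch).

    Run the cheap sub-network first; if its top-class probability
    clears the threshold, return immediately and skip the expensive
    layers. Hard inputs fall through to the full network.
    """
    probs = softmax(cheap_stage(x))
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return best, "early"  # easy input: remaining layers skipped
    logits = full_stage(x)
    return max(range(len(logits)), key=logits.__getitem__), "full"
```

On hardware like an FPGA, this data-dependent control flow is precisely what complicates acceleration: the amount of work per sample is not known at synthesis time.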