A Resource-Saving Energy-Efficient Reconfigurable Hardware Accelerator for BERT-based Deep Neural Network Language Models using FFT Multiplication
Abstract: Bidirectional Encoder Representations from Transformers (BERT) based language models are a class of deep neural networks built on an attention mechanism. They have emerged as an alternative to traditional recurrent neural networks, offering superior sequence representation, and have achieved state-of-the-art performance on various natural language processing (NLP) tasks. Nevertheless, their intensive computation, energy, and memory requirements pose a major challenge to deployment on resource-constrained platforms and edge devices. To mitigate these limitations, this paper proposes a novel hardware accelerator design dedicated to BERT-based architectures with reconfigurable functionality that improves circuit reusability and reduces hardware resource utilization. To the best of our knowledge, it is the first holistic design and implementation of a reconfigurable hardware accelerator for BERT-based deep neural network language models. The proposed design leverages Fast Fourier Transform-based multiplication on block-circulant matrices to accelerate BERT weight matrix multiplication. It is evaluated for different BERT-based model configurations on popular mainstream benchmarks while achieving state-of-the-art performance, and for distinct batch sizes to study the impact of batch size on energy efficiency. A cross-platform comparative analysis shows that the proposed hardware accelerator achieves $6\times$, $27\times$, $3.18\times$, and $8\times$ improvement compared to CPU, and up to $1.17\times$, $1.77\times$, $5\times$, and $86\times$ improvement compared to GPU in latency, throughput, power consumption, and energy efficiency, respectively. This design is suitable for efficient NLP on resource-constrained platforms where low latency and high throughput are critical.
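To illustrate the core idea behind FFT-based multiplication on block-circulant matrices, the following is a minimal software sketch (not the paper's hardware implementation): each $b \times b$ circulant block is represented by its first column, so a block-vector product reduces to a circular convolution computed with FFTs in $O(b \log b)$ instead of $O(b^2)$. The function names, block layout, and use of NumPy are illustrative assumptions, not details from the paper.

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply a circulant matrix, defined by its first column c, by vector x.
    Uses the identity C @ x = IFFT(FFT(c) * FFT(x)) (circular convolution)."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def block_circulant_matvec(blocks, x, b):
    """Multiply a block-circulant weight matrix by x.
    blocks[i][j] is the first column (length b) of the (i, j) circulant block,
    so the full matrix has p*b rows and q*b columns. Illustrative sketch only."""
    p, q = len(blocks), len(blocks[0])
    y = np.zeros(p * b)
    for i in range(p):
        acc = np.zeros(b)
        for j in range(q):
            # Each block-vector product costs O(b log b) via FFT.
            acc += circulant_matvec(blocks[i][j], x[j * b:(j + 1) * b])
        y[i * b:(i + 1) * b] = acc
    return y
```

In a hardware accelerator, the same decomposition lets each weight block be stored as a single length-$b$ vector (reducing memory footprint by roughly a factor of $b$) and evaluated with shared FFT/IFFT units, which is the resource-saving property the abstract refers to.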