HSBNN: A High-Scalable Bayesian Neural Networks Accelerator Based on Field Programmable Gate Arrays (FPGA)

Published: 01 Jan 2025, Last Modified: 07 Jun 2025 · Cogn. Comput. 2025 · CC BY-SA 4.0
Abstract: Traditional artificial neural networks suffer from inherent overfitting and tend to produce overly confident predictions because they rely on point estimates of their parameters. In contrast, Bayesian theory offers a probabilistic framework that replaces point estimates with probability distributions, effectively addressing overconfidence. The brain is also believed to operate under Bayesian rules: its neural networks weigh the precision of prior knowledge against incoming evidence, shifting weight updates toward the most reliable sources of information [1]. By integrating Bayesian principles with artificial neural networks, bio-inspired Bayesian Neural Networks (BNNs) can generate predictions accompanied by confidence evaluations, enhancing their practical applicability. To further improve the computational efficiency of BNNs and enable scalable deployment on edge devices, we propose a High-Scalable Bayesian Neural Network (HSBNN) accelerator based on field-programmable gate arrays (FPGAs) with multiple optimizations. A resource-saving Gaussian random number generator (RS-GRNG) optimized for FPGAs achieves high efficiency and extends seamlessly to parallel sampling of weight distributions, enabling reliable confidence probability evaluations. Furthermore, parameterizing BNN architectures with configuration files and employing a layer-by-layer computing mode allow different BNNs to be accelerated without reprogramming the FPGA, offering excellent scalability. The entire system, implemented with the OpenCL heterogeneous computing framework, leverages parallel processing units and pipeline channels to achieve high acceleration performance and efficient data transfer. The experimental results demonstrate that the system processes each image in 1.002 milliseconds, exceeding CPU performance by 1000-fold and GPU performance by nearly 500-fold.
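
The abstract does not detail the internals of the RS-GRNG, so the sketch below only illustrates one common resource-saving, FPGA-friendly scheme: approximating a standard normal sample by summing cheap uniform samples (a central-limit-theorem generator), which avoids the log/sqrt/trig operations of Box-Muller. The `xorshift32` helper and all names here are illustrative assumptions, not the paper's implementation.

```cpp
// Illustrative sketch only: a CLT-style Gaussian generator, shown because such
// designs map well to FPGAs; the paper's actual RS-GRNG is not described here.
#include <cstdint>

// Hypothetical 32-bit xorshift as a stand-in for an LFSR/Tausworthe uniform core.
// Seed the state with any nonzero value.
static inline uint32_t xorshift32(uint32_t &s) {
    s ^= s << 13;
    s ^= s >> 17;
    s ^= s << 5;
    return s;
}

// Approximate a standard normal sample by summing 12 uniforms in [0,1):
// the sum has mean 6 and variance 1, so (sum - 6) is approximately N(0,1).
// No transcendental functions are needed, only adds and shifts.
float gaussian_clt(uint32_t &state) {
    float acc = 0.0f;
    for (int i = 0; i < 12; ++i) {
        acc += (xorshift32(state) >> 8) * (1.0f / 16777216.0f);  // uniform [0,1)
    }
    return acc - 6.0f;
}
```

Because each generator instance only needs a small uniform core and an adder tree, many of them can be replicated to sample weight distributions in parallel, which is the property the abstract attributes to the RS-GRNG.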
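To make the "predictions accompanied by confidence evaluations" concrete, the following is a minimal host-side sketch (not the paper's OpenCL kernels) of a Bayesian linear layer: each forward pass draws weights as w = mu + sigma * eps, and averaging repeated passes yields a prediction whose spread serves as the confidence measure. The `BayesianLinear` structure, its memory layout, and `mc_mean` are assumptions introduced only for illustration.

```cpp
// Minimal sketch of Bayesian weight sampling and Monte Carlo prediction.
#include <cstdint>
#include <vector>

struct BayesianLinear {
    int in_dim, out_dim;
    std::vector<float> mu, sigma;  // per-weight Gaussian parameters, assumed row-major (out x in)

    // One stochastic forward pass; `gauss` supplies standard-normal samples
    // (for example, the CLT generator sketched above).
    std::vector<float> forward(const std::vector<float> &x,
                               float (*gauss)(uint32_t &), uint32_t &rng) const {
        std::vector<float> y(out_dim, 0.0f);
        for (int o = 0; o < out_dim; ++o)
            for (int i = 0; i < in_dim; ++i) {
                float w = mu[o * in_dim + i] + sigma[o * in_dim + i] * gauss(rng);
                y[o] += w * x[i];
            }
        return y;
    }
};

// Average T stochastic forward passes; the variation across passes is what
// provides the confidence estimate mentioned in the abstract.
std::vector<float> mc_mean(const BayesianLinear &layer, const std::vector<float> &x,
                           int T, float (*gauss)(uint32_t &), uint32_t &rng) {
    std::vector<float> mean(layer.out_dim, 0.0f);
    for (int t = 0; t < T; ++t) {
        std::vector<float> y = layer.forward(x, gauss, rng);
        for (int o = 0; o < layer.out_dim; ++o) mean[o] += y[o] / T;
    }
    return mean;
}
```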