CHEBYUNIT: HARDWARE-ACCELERATED ENERGY-EFFICIENT FPGA WITH LOW COMPUTATION COMPLEXITY FOR ARTIFICIAL INTELLIGENCE ACCELERATION
Keywords: KAN, Chebyshev-KAN, FPGA, Hardware Acceleration, Deep Learning
Abstract: Multi-Layer Perceptrons (MLPs) achieve high accuracy but require large numbers of parameters, leading to significant memory and power consumption. Kolmogorov-Arnold Networks (KANs) address this by replacing weight matrices with learnable functions, but their B-spline basis functions are costly to implement in hardware. To overcome this limitation, we propose a novel hardware framework for Chebyshev-KANs that leverages the recursive structure and numerical stability of Chebyshev polynomials. Our core component, the ChebyUnit, generates polynomial bases on the fly and reuses coefficients held in on-chip storage to perform lightweight inner-product operations in a streaming fashion. This approach significantly reduces external memory (DDR) traffic and resource utilization while maintaining high throughput. Our Verilog implementation on a Xilinx ZCU102 Field-Programmable Gate Array (FPGA) demonstrates over 90% reductions in look-up table (LUT), flip-flop (FF), and digital signal processing (DSP) utilization compared to a baseline high-level synthesis (HLS) design, while preserving approximation accuracy. These findings confirm the practical efficiency of Chebyshev-KANs, positioning them as a promising solution for interpretable, energy-efficient neural networks in resource-constrained edge AI applications.
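The recursive basis generation and streaming inner product that the abstract attributes to the ChebyUnit can be sketched in software. This is a minimal NumPy illustration, not the paper's hardware design; the function names `chebyshev_basis` and `cheby_kan_edge`, and the use of `tanh` to map inputs into [-1, 1], are assumptions made for the sketch.

```python
import numpy as np

def chebyshev_basis(x, degree):
    """Generate Chebyshev polynomials T_0..T_degree at points x using the
    three-term recurrence T_{n+1}(x) = 2x*T_n(x) - T_{n-1}(x).
    In hardware, this recurrence needs only one multiply-add per new term,
    which is what makes Chebyshev bases cheaper than B-splines."""
    T = [np.ones_like(x), x]
    for _ in range(2, degree + 1):
        T.append(2 * x * T[-1] - T[-2])
    return np.stack(T[: degree + 1], axis=-1)  # shape (..., degree + 1)

def cheby_kan_edge(x, coeffs):
    """One learnable edge function: the inner product of the generated basis
    with stored coefficients, mirroring the ChebyUnit's streaming dot product
    with coefficients reused from on-chip storage. tanh is an illustrative
    normalization keeping x inside the polynomials' natural domain [-1, 1]."""
    basis = chebyshev_basis(np.tanh(x), len(coeffs) - 1)
    return basis @ coeffs
```

Because each basis term depends only on the previous two, a hardware pipeline can emit one term per cycle and accumulate the dot product without ever materializing the full basis in external memory, which is the source of the DDR-traffic reduction claimed above.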
Primary Area: infrastructure, software libraries, hardware, systems, etc.
Submission Number: 15814