CHEBYUNIT: HARDWARE-ACCELERATED ENERGY-EFFICIENT FPGA WITH LOW COMPUTATION COMPLEXITY FOR ARTIFICIAL INTELLIGENCE ACCELERATION
Keywords: KAN, Chebyshev-KAN, FPGA, Hardware Acceleration, Deep Learning
Abstract: Multi-Layer Perceptrons (MLPs) achieve high accuracy but require large numbers of parameters, leading to significant memory and power consumption. Kolmogorov-Arnold Networks (KANs) address this by replacing weight matrices with learnable functions, but their B-spline basis functions are costly to implement in hardware. To overcome this limitation, we propose a novel hardware framework for Chebyshev-KANs that leverages the recursive structure and numerical stability of Chebyshev polynomials. Our core component, the ChebyUnit, generates polynomial bases on the fly and reuses coefficients held in on-chip storage to perform lightweight inner-product operations in a streaming fashion. This approach significantly reduces external memory (DDR) traffic and resource utilization while maintaining high throughput. Our Verilog implementation on a Xilinx ZCU102 Field-Programmable Gate Array (FPGA) demonstrates over 90% reductions in look-up table (LUT), flip-flop (FF), and digital signal processing (DSP) utilization compared to a baseline high-level synthesis (HLS) design, while preserving approximation accuracy. These findings confirm the practical efficiency of Chebyshev-KANs, positioning them as a promising solution for interpretable, energy-efficient neural networks in resource-constrained edge AI applications.
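The recursive basis generation and streaming inner product that the abstract attributes to the ChebyUnit can be sketched in software. This is a minimal NumPy illustration, not the paper's hardware design; the function names `chebyshev_basis` and `cheby_kan_edge`, and the use of `tanh` to map inputs into [-1, 1], are assumptions made for the sketch.

```python
import numpy as np

def chebyshev_basis(x, degree):
    """Generate Chebyshev polynomials T_0..T_degree at points x using the
    three-term recurrence T_{n+1}(x) = 2x*T_n(x) - T_{n-1}(x).
    In hardware, this recurrence needs only one multiply-add per new term,
    which is what makes Chebyshev bases cheaper than B-splines."""
    T = [np.ones_like(x), x]
    for _ in range(2, degree + 1):
        T.append(2 * x * T[-1] - T[-2])
    return np.stack(T[: degree + 1], axis=-1)  # shape (..., degree + 1)

def cheby_kan_edge(x, coeffs):
    """One learnable edge function: the inner product of the generated basis
    with stored coefficients, mirroring the ChebyUnit's streaming dot product
    with coefficients reused from on-chip storage. tanh is an illustrative
    normalization keeping x inside the polynomials' natural domain [-1, 1]."""
    basis = chebyshev_basis(np.tanh(x), len(coeffs) - 1)
    return basis @ coeffs
```

Because each basis term depends only on the previous two, a hardware pipeline can emit one term per cycle and accumulate the dot product without ever materializing the full basis in external memory, which is the source of the DDR-traffic reduction claimed above.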
Primary Area: infrastructure, software libraries, hardware, systems, etc.
Submission Number: 15814