Abstract: FPGAs provide customizable, low-power, real-time acceleration of ML models for embedded systems, making them ideal for edge applications such as robotics and IoT. However, ML models are computationally intensive and rely heavily on multiplication operations, which dominate overall resource and power consumption, especially in deep neural networks. Currently available open-source frameworks, such as hls4ml, FINN, and Tensil AI, facilitate FPGA-based implementation of ML algorithms but exclusively use accurate arithmetic operators, failing to exploit the inherent error resilience of ML models. Meanwhile, a large body of research in approximate computing has produced approximate multipliers that offer substantial reductions in area, power, and latency at the cost of a small loss in accuracy. However, these approximate multipliers are not integrated into widely used hardware generation workflows, and no automated mechanism exists for incorporating them into ML model implementations at both the software and hardware levels. In this work, we extend the hls4ml framework to support approximate multipliers. Our approach enables seamless evaluation of multiple approximate designs, allowing trade-offs between resource usage and inference accuracy to be explored efficiently. Experimental results demonstrate up to 3.94% LUT savings and a 7.33% reduction in on-chip power, with only 1% accuracy degradation compared to accurate designs.
External IDs: dblp:journals/esl/AsgharBSHUSK25