Abstract: Nonlinear function approximation has been studied for decades. With the rise of transformer-based AI models, the need for efficient, low-complexity circuit implementations of functions such as softmax, GELU, and layer normalization has intensified, since their hardware overhead is non-negligible. Existing methods reuse softmax hardware for GELU or reconfigure both, but the preprocessing required for their GELU approximation results in area inefficiency. To address this, we propose a novel successive approximation technique that reduces preprocessing complexity and compensates for errors in the successive steps. Additionally, our reconfigurable design supports softmax, GELU, and square root functions, optimizing hardware area and flexibility. Experimental results show a 2.48x increase in throughput per area for softmax and a 4.96x increase for GELU, with only a 0.09% accuracy loss on BERT-base models compared with related work.
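For reference, the exact functions that such circuits approximate can be written in a few lines; the sketch below (plain Python, not the paper's hardware method) shows the numerically stable softmax and the erf-based definition of GELU that approximation schemes are measured against.

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating
    # so exp() never overflows for large inputs.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def gelu(x):
    # Exact GELU via the Gaussian CDF: GELU(x) = x * Phi(x).
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```

Hardware implementations replace the transcendental calls (`exp`, `erf`, division) with low-complexity approximations; accuracy is then reported as the end-to-end model degradation (here, 0.09% on BERT-base).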
External IDs: dblp:conf/iscas/WuTSHL25