Keywords: Transformer architectures; ReLU-based approximation; Nonlinear operator approximation; Hardware-software co-design
TL;DR: We propose a unified hybrid framework of ReLU-arithmetic architectures that replaces Transformer nonlinear operators, enabling plug-and-play substitution with negligible accuracy loss.
Abstract: The deployment of modern Transformer models on edge devices is critically bottlenecked by computationally intensive non-linear operators like GELU, Softmax, and LayerNorm, which demand diverse and power-hungry specialized hardware units. Existing functional approximation techniques suffer from two critical failures: they are function-specific, leading to hardware bloat, and they rely on unstable heuristics that yield poor accuracy. We introduce HARA (Hybrid Arithmetic-ReLU Networks Approximation), a framework that resolves these issues by systematically replacing all such operators with a single, canonical architecture built from simple arithmetic primitives and a shallow ReLU network. HARA's core algorithmic innovation is an Optimized Parameter Initialization pipeline that employs dynamic programming to systematically derive near-optimal parameters, ensuring high-fidelity approximation and robustness where heuristic methods fail. Crucially, hardware synthesis estimates project that HARA's unified approach reduces the silicon area for non-linear processing by over 60% compared to using separate, specialized functional units. We demonstrate across four modern architectures (BERT, Swin, LLaMA, and Stable Diffusion) that these significant hardware savings are achieved with negligible impact on model performance (e.g., <0.1% accuracy change) and are fully compatible with 8-bit quantization. By systematically co-designing software approximations with a simplified hardware target, HARA provides a practical and extensible paradigm for deploying state-of-the-art Transformer models on resource-constrained devices.
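To make the core idea concrete, the sketch below shows one way a non-linear operator (here GELU) could be replaced by a shallow ReLU network combined with a simple arithmetic primitive (an elementwise multiply). This is a minimal illustrative example only: the class and function names (`ReLUApproxGELU`, `fit_to_gelu`), the gate formulation, and the regression-based fitting are assumptions for illustration, not the paper's HARA architecture or its dynamic-programming-based Optimized Parameter Initialization.

```python
# Illustrative sketch, not the HARA method: approximate GELU with a
# shallow ReLU network plus a single multiply. All names and the simple
# regression-based fitting below are hypothetical.
import torch
import torch.nn as nn


class ReLUApproxGELU(nn.Module):
    """Computes x * g(x), where g is a small piecewise-linear (ReLU) network
    intended to mimic the Gaussian CDF gate inside GELU."""

    def __init__(self, hidden: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shape = x.shape
        gate = self.net(x.reshape(-1, 1)).reshape(shape)
        return x * gate  # arithmetic primitive (multiply) + ReLU network


def fit_to_gelu(model: ReLUApproxGELU, steps: int = 2000) -> None:
    """Fit the gate so x * gate(x) matches GELU on a sampled input range."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    target = nn.GELU()
    for _ in range(steps):
        x = torch.empty(4096).uniform_(-6.0, 6.0)
        loss = torch.mean((model(x) - target(x)) ** 2)
        opt.zero_grad()
        loss.backward()
        opt.step()


if __name__ == "__main__":
    approx = ReLUApproxGELU()
    fit_to_gelu(approx)
    x = torch.linspace(-4, 4, 9)
    print(torch.max(torch.abs(approx(x) - nn.GELU()(x))))  # max abs error
```

Because the replacement module exposes the same tensor-in, tensor-out interface as the operator it stands in for, it can be swapped into an existing model as a drop-in substitute, which is the plug-and-play property the abstract refers to.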
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 18677