Abstract: This paper proposes a range-invariant approximation of non-linear operations for the training computations of Transformer-based large language models. The proposed method decomposes the approximation into a scaling factor and a range-invariant resolution for LUT approximation, covering the diverse data ranges of non-linear operations with drastically reduced LUT entries during task-dependent BERT fine-tuning. We demonstrate that the proposed method robustly approximates all the non-linear operations of BERT without score degradation on the challenging GLUE benchmarks using only a single-entry LUT, enabling 52% area savings in the hardware implementation.
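As a rough illustration of the decomposition into a scaling factor and a range-invariant LUT, the sketch below applies the idea to the inverse square root used in layer normalization, where the data-dependent scale can be factored out exactly. The function names, the power-of-four factoring, and the 16-entry table are illustrative assumptions, not the paper's design.

```python
import numpy as np

def build_rsqrt_lut(num_entries=16):
    """Tabulate 1/sqrt(m) over the fixed, range-invariant interval m in [1, 4)."""
    grid = np.linspace(1.0, 4.0, num_entries, endpoint=False)
    return grid, 1.0 / np.sqrt(grid)

def rsqrt_lut(x, grid, table):
    """Approximate 1/sqrt(x) by splitting x into a power-of-four scale and a
    residual m in [1, 4); only m is looked up, so the LUT never depends on
    the data range."""
    x = np.asarray(x, dtype=np.float64)
    k = np.floor(np.log2(x) / 2.0)            # scaling exponent: x = m * 4**k
    m = x / (4.0 ** k)                        # range-invariant residual in [1, 4)
    idx = np.clip(np.searchsorted(grid, m, side="right") - 1, 0, len(grid) - 1)
    return table[idx] * (2.0 ** (-k))         # fold the scale back: 1/sqrt(4**k) = 2**-k

# Usage: approximate the normalization factor of LayerNorm on a toy variance.
grid, table = build_rsqrt_lut()
print(rsqrt_lut(np.array([0.5, 3.0, 700.0]), grid, table))   # close to 1/sqrt(x)
```

Other non-linearities (e.g. GELU, softmax) do not factor as cleanly, which is where a learned or calibrated scaling step, as proposed in the paper, takes over the role of the exponent extraction above.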