NLI: Non-uniform Linear Interpolation Approximation of Nonlinear Operations for Efficient LLMs Inference

Published: 26 Jan 2026, Last Modified: 11 Apr 2026
Venue: ICLR 2026 Poster
License: CC BY 4.0
Keywords: Dynamic Programming, Non-linear Approximation, Large Language Models, Quantization, Hardware Acceleration, Edge Inference, Calibration-Free
TL;DR: We introduce NLI, a non-uniform linear-interpolation scheme paired with an ultra-light hardware block, which replaces high-precision evaluation of nonlinear activations, preserves LLM accuracy, and reduces compute and area cost for edge deployment.
Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of tasks, but their deployment is often constrained by substantial memory footprints and computational costs. While prior work has made significant progress in compressing and accelerating linear layers, nonlinear layers (such as SiLU, RMSNorm, and Softmax) still depend heavily on high-precision floating-point operations. In this paper, we propose Non-uniform Linear Interpolation (NLI), a calibration-free, dynamic-programming-optimal, and hardware-friendly framework. NLI efficiently approximates a variety of nonlinear functions, enabling seamless integration into LLMs and other deep neural networks with almost no loss in accuracy. NLI recasts cutpoint selection as a dynamic-programming problem, achieving the globally minimal interpolation error in $\mathcal{O}(M \times N^2)$ time via Bellman's optimality principle. Based on the NLI algorithm, we also design and implement a plug-and-play universal nonlinear computation unit. Hardware experiments demonstrate that the NLI Engine achieves a more than 4× improvement in computational efficiency compared with state-of-the-art designs.
Primary Area: infrastructure, software libraries, hardware, systems, etc.
Submission Number: 6629
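
Illustrative sketch (not the authors' implementation): the abstract describes cutpoint selection as a dynamic program solved in $\mathcal{O}(M \times N^2)$ time. The Python below shows one way such a recurrence could look. Everything specific here is an assumption made for illustration: $N$ is taken to be the number of candidate grid points and $M$ the number of linear segments (the abstract does not define them), the error metric is the sum of squared errors at the grid points, each segment is the chord through the function values at its two cutpoints, and the names `segment_cost` and `select_cutpoints` are hypothetical. The cost table is built naively for clarity, which costs more than the recurrence itself; only the triple loop over (segments, end point, start point) has the $\mathcal{O}(M \times N^2)$ shape.

```python
import math


def silu(x):
    """SiLU activation, x * sigmoid(x), used here only as a test function."""
    return x / (1.0 + math.exp(-x))


def segment_cost(xs, ys, i, j):
    """Sum of squared errors of the chord from point i to point j over grid points i..j.

    Assumed error metric for this sketch; the paper may use a different one.
    """
    x0, y0 = xs[i], ys[i]
    slope = (ys[j] - y0) / (xs[j] - x0)
    return sum((y0 + slope * (xs[k] - x0) - ys[k]) ** 2 for k in range(i, j + 1))


def select_cutpoints(xs, ys, num_segments):
    """Pick num_segments + 1 cutpoints from xs minimizing the total chord error.

    Requires 1 <= num_segments <= len(xs) - 1 and strictly increasing xs.
    Returns (cutpoints, total_error).
    """
    n = len(xs)
    if not 1 <= num_segments <= n - 1:
        raise ValueError("num_segments must be between 1 and len(xs) - 1")

    # Naive cost table: cost[i][j] is the error of one segment spanning i..j.
    cost = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            cost[i][j] = segment_cost(xs, ys, i, j)

    inf = float("inf")
    # best[m][j]: minimal error covering xs[0..j] with exactly m segments.
    best = [[inf] * n for _ in range(num_segments + 1)]
    prev = [[-1] * n for _ in range(num_segments + 1)]
    best[0][0] = 0.0

    # Bellman recurrence: the last segment (i, j) extends an optimal prefix ending at i.
    for m in range(1, num_segments + 1):
        for j in range(1, n):
            for i in range(j):
                if best[m - 1][i] == inf:
                    continue
                candidate = best[m - 1][i] + cost[i][j]
                if candidate < best[m][j]:
                    best[m][j] = candidate
                    prev[m][j] = i

    # Backtrack from the last grid point to recover the chosen cutpoint indices.
    indices = [n - 1]
    j = n - 1
    for m in range(num_segments, 0, -1):
        j = prev[m][j]
        indices.append(j)
    indices.reverse()
    return [xs[k] for k in indices], best[num_segments][n - 1]


if __name__ == "__main__":
    grid = [-8.0 + 0.25 * k for k in range(65)]  # 65 candidate points on [-8, 8]
    values = [silu(x) for x in grid]
    cutpoints, error = select_cutpoints(grid, values, 8)
    print(cutpoints, error)
```

Because uniformly spaced cutpoints are one of the candidate solutions the recurrence considers, the selected cutpoints can never have a larger total error (under this assumed metric) than a uniform split with the same number of segments.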