Abstract: Recently, several technology companies have released online inference services for clients based on Transformer-based large language models, which show excellent performance on various tasks. However, the inputs to these services usually contain clients' sensitive information. To address this problem, many works have proposed secure inference for language models such as GPT. In language models, complex mathematical functions like the Gaussian Error Linear Unit (GELU) are used extensively and dominate the cost of secure inference. In this work, we systematically study existing secure GELU protocols and classify previous methods into two categories: polynomial-based protocols and lookup table (LUT)-based protocols. We point out several important characteristics of, and tradeoffs between, these two classes of secure GELU protocols. Based on these observations and analyses, we propose a new secure GELU protocol, called Simple. The main technique in Simple is a small LUT that retrieves approximate polynomials fitting the residual error functions induced by a crude approximation of GELU, which achieves state-of-the-art (SOTA) overhead and accuracy. We conduct extensive experiments and benchmark 6 previous secure GELU protocols. The experimental comparison shows that our Simple protocol achieves 1.1∼8784.3× computation and 1.4∼188.8× communication improvements while reducing errors by 1.2∼80.2×.
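For concreteness, recall that GELU(x) = x·Φ(x), where Φ is the standard normal CDF. As a hedged sketch of the idea summarized above (the choice of the crude approximation p, the interval partition, and the number of LUT entries k below are illustrative assumptions, not the paper's concrete parameters), the decomposition can be written as

\[
\mathrm{GELU}(x) \;=\; \frac{x}{2}\left(1 + \mathrm{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right)
\;\approx\; p(x) \;+\; \sum_{i=1}^{k} \mathbf{1}[x \in I_i]\, q_i(x),
\]

where p(x) is a cheap, crude approximation of GELU (for instance, p(x) = max(x, 0), i.e., ReLU), \{I_i\}_{i=1}^{k} is a small partition of the input range indexed by the LUT, and each q_i is a low-degree polynomial fitted to the residual error \mathrm{GELU}(x) - p(x) on the interval I_i.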