MPC-Minimized Secure LLM Inference

Deevashwer Rathee; Dacheng Li; Ion Stoica; Hao Zhang; Raluca Popa

26 Sept 2024 (modified: 05 Feb 2025) | Submitted to ICLR 2025 | CC BY 4.0

Keywords: secure inference, secure multi-party computation (MPC), transformer, large language model (LLM), open-source foundational model, fine-tuning, LoRA, head-merging

Abstract: Many inference services based on large language models (LLMs) pose a privacy concern, either revealing user prompts to the service or the proprietary weights to the user. Secure inference offers a solution to this problem through secure multi-party computation (MPC); however, it is still impractical for modern LLM workloads due to the large overhead imposed by MPC. To address this overhead, we propose MARILL, a framework that adapts LLM fine-tuning to minimize MPC usage during secure inference. MARILL introduces high-level architectural changes during fine-tuning that significantly reduce the number of expensive operations needed within MPC during inference, by removing some and relocating others outside MPC without compromising security. As a result, MARILL-generated models are more efficient across all secure inference protocols, and our approach complements MPC-friendly approximations for such operations. Compared to standard fine-tuning, MARILL results in $2.2$-$11.3\times$ better runtime and $2.4$-$6.9\times$ better communication during secure inference across various MPC settings, while typically preserving over $90\%$ performance across downstream tasks. Anonymous code is available at https://anonymous.4open.science/r/MPC-auto-B100.

Primary Area: alignment, fairness, safety, privacy, and societal considerations

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 5381
