Keywords: Large language model, private inference
Abstract: With the widespread deployment of public large language models (LLMs) such as ChatGPT, protecting the privacy of user prompts has become an increasingly critical issue. Existing privacy-preserving inference methods often sacrifice utility or incur high computational cost. In this paper, we propose SharedRequest, a model-agnostic, privacy-preserving framework for LLM inference. SharedRequest is independent of the LLM architecture, requiring no model modifications or access to internal parameters. It obscures sensitive information by mixing original prompts with noisy variants and amortizes inference cost over a large batch of queries. The LLM server observes only a shuffled mix of original and noisy queries, with no link to user identities. By clustering semantically equivalent instructions, our mechanism reduces per-prompt token charges with minimal impact on LLM response quality. Empirical results demonstrate that SharedRequest achieves over $20\%$ higher utility than prior differential-privacy-based techniques, and its shared-prompt mechanism reduces query cost by up to $5\times$ compared to non-batched inference.
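The abstract's mix-and-shuffle idea can be illustrated with a minimal sketch. This is an illustrative assumption, not the paper's actual mechanism: the function names are hypothetical, and the word-shuffle "noise" stands in for whatever noisy-variant generator SharedRequest uses. The point is only that the server sees a shuffled batch of real and decoy queries, while the client retains the positions needed to recover its own responses.

```python
import random

def make_noisy_variant(prompt, rng):
    # Hypothetical decoy generator: permute the prompt's words.
    # The real system would produce semantically plausible noisy variants.
    words = prompt.split()
    rng.shuffle(words)
    return " ".join(words)

def shared_request_batch(prompts, noise_per_prompt=2, seed=0):
    """Mix each real prompt with decoys, then shuffle the batch so the
    server cannot tell which queries are genuine or who issued them.
    Returns the shuffled batch and the indices of the real prompts,
    which the client keeps locally to recover its responses."""
    rng = random.Random(seed)
    batch = []
    for p in prompts:
        batch.append((p, True))  # genuine query
        for _ in range(noise_per_prompt):
            batch.append((make_noisy_variant(p, rng), False))  # decoy
    rng.shuffle(batch)
    shuffled = [q for q, _ in batch]
    real_positions = [i for i, (_, is_real) in enumerate(batch) if is_real]
    return shuffled, real_positions
```

A client would send `shuffled` to the LLM server as one batch and use `real_positions` to pick out the answers to its genuine prompts, discarding the decoy responses.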
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: Large language model, private inference
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 5870