Keywords: Large language model, private inference
Abstract: With the widespread deployment of public large language models (LLMs) such as ChatGPT, protecting the privacy of user prompts has become an increasingly critical issue. Existing privacy-preserving inference methods often sacrifice utility or incur high computational cost. In this paper, we propose SharedRequest, a model-agnostic, privacy-preserving framework for LLM inference. SharedRequest is independent of the LLM architecture, requiring no model modifications or access to internal parameters. It obscures sensitive information by mixing original prompts with noisy variants and amortizes inference cost over a large batch of queries. The LLM server observes only a shuffled mix of original and noisy queries, with no link to user identities. By clustering semantically equivalent instructions, our mechanism reduces per-prompt token charges with minimal impact on LLM response quality. Empirical results demonstrate that SharedRequest achieves over $20\%$ higher utility than prior differential-privacy-based techniques, and its shared-prompt mechanism reduces query cost by up to $5\times$ compared to non-batched inference.
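The abstract's mix-and-shuffle idea can be illustrated with a minimal sketch. This is an illustrative assumption, not the paper's actual mechanism: the function names are hypothetical, and the word-shuffle "noise" stands in for whatever noisy-variant generator SharedRequest uses. The point is only that the server sees a shuffled batch of real and decoy queries, while the client retains the positions needed to recover its own responses.

```python
import random

def make_noisy_variant(prompt, rng):
    # Hypothetical decoy generator: permute the prompt's words.
    # The real system would produce semantically plausible noisy variants.
    words = prompt.split()
    rng.shuffle(words)
    return " ".join(words)

def shared_request_batch(prompts, noise_per_prompt=2, seed=0):
    """Mix each real prompt with decoys, then shuffle the batch so the
    server cannot tell which queries are genuine or who issued them.
    Returns the shuffled batch and the indices of the real prompts,
    which the client keeps locally to recover its responses."""
    rng = random.Random(seed)
    batch = []
    for p in prompts:
        batch.append((p, True))  # genuine query
        for _ in range(noise_per_prompt):
            batch.append((make_noisy_variant(p, rng), False))  # decoy
    rng.shuffle(batch)
    shuffled = [q for q, _ in batch]
    real_positions = [i for i, (_, is_real) in enumerate(batch) if is_real]
    return shuffled, real_positions
```

A client would send `shuffled` to the LLM server as one batch and use `real_positions` to pick out the answers to its genuine prompts, discarding the decoy responses.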
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: Large language model, private inference
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 5870