TL;DR: We propose a method for efficient and privacy-preserving transfer of soft prompts tuned on a distilled small model to a larger model using public data.
Abstract: Prompting has become a dominant paradigm for adapting large language models (LLMs).
While discrete (textual) prompts are widely used for their interpretability, soft (parameter) prompts have recently gained traction in APIs because they can encode information from more training samples while minimizing the user's token usage, leaving more space in the context window for task-specific input. However, soft prompts are tightly coupled to the LLM they are tuned on, limiting their generalization to other LLMs. This constraint is particularly problematic for *efficiency* and *privacy*: (1) tuning prompts on each LLM incurs high computational costs, especially as LLMs continue to grow in size, and (2) when the LLM is hosted externally, soft prompt tuning often requires sharing private data with the LLM provider, as is the case, for instance, with the NVIDIA NeMo API.
To address these issues, we propose POST (**P**rivacy **O**f **S**oft prompt **T**ransfer), a framework that enables private tuning of soft prompts on a small model and subsequently transfers these prompts to a larger LLM.
POST uses knowledge distillation to derive a small model directly from the large LLM to improve prompt transferability, tunes the soft prompt locally on this small model, optionally with differential privacy guarantees, and then transfers the prompt to the large LLM using a small public dataset. Our experiments show that POST reduces computational costs, preserves privacy, and effectively transfers high-utility soft prompts.
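For concreteness, below is a minimal, self-contained sketch of the three-step pipeline the abstract describes (distill, tune locally, transfer on public data), written in PyTorch with toy models standing in for real LLMs. Everything in it is an illustrative assumption rather than the paper's implementation: the class `ToyLM`, the functions `distill`, `tune_soft_prompt`, and `transfer_prompt`, and all hyperparameters are hypothetical names chosen for this sketch (see the linked repository for the actual code). The optional differential-privacy step is only indicated in a comment (e.g., replacing the tuning loop with DP-SGD via a library such as Opacus) rather than implemented.

```python
# Hypothetical sketch of the POST pipeline; toy models, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM_LARGE, DIM_SMALL, PROMPT_LEN = 100, 64, 16, 4

class ToyLM(nn.Module):
    """Stand-in for an LLM: embed tokens, mean-pool, predict a next token."""
    def __init__(self, dim):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, dim)
        self.head = nn.Linear(dim, VOCAB)

    def forward(self, ids, soft_prompt=None):
        h = self.embed(ids)                          # (B, T, dim)
        if soft_prompt is not None:                  # prepend learnable vectors
            h = torch.cat([soft_prompt.expand(h.size(0), -1, -1), h], dim=1)
        return self.head(h.mean(dim=1))              # (B, VOCAB)

def distill(teacher, student, public_ids, steps=200, lr=1e-2):
    """Step 1: derive a small model from the large LLM on public data."""
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(steps):
        with torch.no_grad():
            target = F.softmax(teacher(public_ids), dim=-1)
        loss = F.kl_div(F.log_softmax(student(public_ids), dim=-1),
                        target, reduction="batchmean")
        opt.zero_grad(); loss.backward(); opt.step()

def tune_soft_prompt(model, private_ids, labels, steps=200, lr=1e-1):
    """Step 2: tune a soft prompt locally on private data.
    For DP guarantees, this loop would be replaced by DP-SGD (e.g., Opacus)."""
    model.requires_grad_(False)                      # only the prompt is trained
    prompt = nn.Parameter(0.01 * torch.randn(1, PROMPT_LEN, model.embed.embedding_dim))
    opt = torch.optim.Adam([prompt], lr=lr)
    for _ in range(steps):
        loss = F.cross_entropy(model(private_ids, prompt), labels)
        opt.zero_grad(); loss.backward(); opt.step()
    return prompt.detach()

def transfer_prompt(small, small_prompt, large, public_ids, steps=200, lr=1e-1):
    """Step 3: fit a prompt for the large LLM so its predictions on public
    data match those of the prompted small model. No private data is used."""
    large.requires_grad_(False)
    big_prompt = nn.Parameter(0.01 * torch.randn(1, PROMPT_LEN, large.embed.embedding_dim))
    opt = torch.optim.Adam([big_prompt], lr=lr)
    with torch.no_grad():
        target = F.softmax(small(public_ids, small_prompt), dim=-1)
    for _ in range(steps):
        loss = F.kl_div(F.log_softmax(large(public_ids, big_prompt), dim=-1),
                        target, reduction="batchmean")
        opt.zero_grad(); loss.backward(); opt.step()
    return big_prompt.detach()

if __name__ == "__main__":
    torch.manual_seed(0)
    large, small = ToyLM(DIM_LARGE), ToyLM(DIM_SMALL)
    public_ids = torch.randint(0, VOCAB, (32, 8))    # placeholder public data
    private_ids = torch.randint(0, VOCAB, (32, 8))   # placeholder private data
    private_labels = torch.randint(0, VOCAB, (32,))
    distill(large, small, public_ids)
    prompt_small = tune_soft_prompt(small, private_ids, private_labels)
    prompt_large = transfer_prompt(small, prompt_small, large, public_ids)
    print("transferred prompt shape:", tuple(prompt_large.shape))
```

Note how the privacy property falls out of the structure: only the local small model ever sees `private_ids`, while the transfer step matches output distributions on public data alone, so nothing private needs to be shared with the provider of the large model.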
Lay Summary: Large language models (LLMs) like ChatGPT have become powerful tools for various tasks, but customizing them for specific needs often requires sharing sensitive data with the model providers, raising privacy concerns. Additionally, tailoring these massive models can be computationally intensive.
This paper introduces POST, a novel method that allows users to personalize LLMs without exposing their private data or needing significant computational resources. The approach works by first creating a smaller version of the large model. Users then fine-tune this compact model locally on their own data, which therefore never leaves their control. After this local tuning, the adjustments are transferred back to the original large model using publicly available data, eliminating the need to share any private information.
POST paves the way for broader and more responsible use of AI technologies by enabling secure and efficient customization of LLMs.
Link To Code: https://github.com/sprintml/POST
Primary Area: Social Aspects->Safety
Keywords: prompt transfer, soft prompt, privacy, distillation, confidentiality
Submission Number: 2591