Soft prompting might be a bug, not a feature

Luke Bailey; Gustaf Ahdritz; Anat Kleiman; Siddharth Swaroop; Finale Doshi-Velez; Weiwei Pan

Published: 23 Jun 2023, Last Modified: 12 Jul 2023

Venue: DeployableGenerativeAI

Keywords: Soft prompting, LLMs, Interpretability, Prompting

TL;DR: We show that learnt soft prompts differ greatly in the model embedding space from natural tokens and argue this leads to corresponding safety concerns.

Abstract: Prompt tuning, or "soft prompting," replaces text prompts to generative models with learned embeddings (i.e. vectors) and is used as an alternative to parameter-efficient fine-tuning. Prior work suggests analyzing soft prompts by interpreting them as natural language prompts. However, we find that soft prompts occupy regions in the embedding space that are distinct from those containing natural language, meaning that direct comparisons may be misleading. We argue that because soft prompts are currently uninterpretable, they could potentially be a source of vulnerability of LLMs to malicious manipulations during deployment.
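
The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' analysis: the three-dimensional toy vocabulary, the example soft prompt vector, and the helper names (`nearest_token`, `max_pairwise_distance`) are all hypothetical, and Euclidean distance is only one plausible choice of metric. The sketch maps a learned vector to its nearest natural token and then checks how far away that "nearest" token actually is relative to the spread of the token embeddings themselves.

```python
import math

# Hypothetical toy vocabulary: token -> embedding (illustrative values only,
# not taken from any real model).
TOKEN_EMBEDDINGS = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.9, 0.1, 0.0],
    "the": [0.0, 1.0, 0.0],
}


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_token(vector, embeddings):
    """Return (token, distance) for the vocabulary embedding closest to `vector`."""
    token = min(embeddings, key=lambda t: euclidean(vector, embeddings[t]))
    return token, euclidean(vector, embeddings[token])


def max_pairwise_distance(embeddings):
    """Largest distance between any two token embeddings (a crude spread measure)."""
    vecs = list(embeddings.values())
    return max(
        euclidean(u, v) for i, u in enumerate(vecs) for v in vecs[i + 1:]
    )


# A hypothetical learned soft prompt vector that lies far from every token.
soft_prompt_vector = [5.0, 5.0, 5.0]

token, dist = nearest_token(soft_prompt_vector, TOKEN_EMBEDDINGS)
spread = max_pairwise_distance(TOKEN_EMBEDDINGS)

# A nearest token always exists, but if its distance exceeds the spread of the
# vocabulary itself, reading the soft prompt as that token may be misleading.
print(f"nearest token: {token}, distance: {dist:.3f}, vocabulary spread: {spread:.3f}")
```

In this toy setting the nearest-token lookup always returns an answer, which is why the distance to that token needs to be inspected alongside it before treating the soft prompt as a natural language prompt.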

Submission Number: 56
