Keywords: Prompt Sensitivity, Large Language Model, Model Evaluation, Explainable AI
TL;DR: We propose a fine-grained metric to evaluate the prompt sensitivity of LLMs and utilize it to investigate the key factors that influence prompt sensitivity, as well as their underlying mechanisms.
Abstract: The remarkable capabilities of large language models (LLMs) are often undermined by their instability: even subtle, semantically irrelevant changes to a prompt can cause dramatic fluctuations in performance, a phenomenon known as prompt sensitivity. Previous studies typically evaluate prompt sensitivity by comparing an LLM's final outputs across prompt variants. However, such coarse-grained metrics fail to explain the internal causes of prompt sensitivity.
In this paper, we introduce the game-theoretic interaction framework as a fine-grained tool for analyzing the prompt sensitivity of LLMs. Specifically, we disentangle the output score of the LLM into a set of interactions, each representing a nonlinear relationship among a combination of input variables. We discover that subtle changes to prompts can trigger significant instability in these interactions, even when the final outputs of the LLM remain unchanged. Motivated by this finding, we propose an Interaction-based Prompt Sensitivity (IPS) metric that quantifies changes in interactions when subtle changes are introduced to prompts. Applying IPS to 50 open-source LLMs, we uncover four factors that reduce prompt sensitivity: supervised fine-tuning, larger model scales, dense architectures, and few-shot learning. More crucially, we discover a common mechanism shared by all four factors: each tends to reduce the prompt sensitivity of low-order interactions (*i.e.*, interactions involving few input variables).
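The interaction decomposition described in the abstract can be sketched with Harsanyi dividends, the standard game-theoretic way to disentangle an output score v(N) into per-subset interaction effects. The toy value function `v` and the normalization used in `ips` below are illustrative assumptions for this sketch, not the paper's exact definitions.

```python
from itertools import combinations

def harsanyi_interactions(v, n):
    """Harsanyi dividend I(S) = sum over T subset of S of (-1)^{|S|-|T|} v(T).

    These interactions exactly reconstruct the output: sum_S I(S) = v(N).
    """
    players = range(n)
    all_subsets = [frozenset(c) for r in range(n + 1)
                   for c in combinations(players, r)]
    I = {}
    for S in all_subsets:
        total = 0.0
        for r in range(len(S) + 1):
            for T in combinations(sorted(S), r):
                total += (-1) ** (len(S) - len(T)) * v(frozenset(T))
        I[S] = total
    return I

def ips(I_a, I_b):
    # Hypothetical IPS-style score: total absolute change in interactions
    # between two prompt variants, normalized by the original interaction
    # strength (the paper's exact normalization may differ).
    num = sum(abs(I_a[S] - I_b[S]) for S in I_a)
    den = sum(abs(I_a[S]) for S in I_a) + 1e-12
    return num / den

# Toy "model output" on 3 input variables (1.0 if present, else 0.0),
# with a nonlinear triple interaction term.
def v(T):
    x = [1.0 if i in T else 0.0 for i in range(3)]
    return x[0] + 2.0 * x[1] + 0.5 * x[0] * x[1] * x[2]

# A "subtly perturbed prompt": same final ranking of inputs, but the
# high-order interaction term shifts slightly.
def v_perturbed(T):
    x = [1.0 if i in T else 0.0 for i in range(3)]
    return x[0] + 2.0 * x[1] + 0.6 * x[0] * x[1] * x[2]

I1 = harsanyi_interactions(v, 3)
I2 = harsanyi_interactions(v_perturbed, 3)
score = ips(I1, I2)  # nonzero even though both toy outputs agree on ranking
```

Here only the order-3 interaction changes between the two value functions, so `ips` isolates an instability that a comparison of final outputs alone could miss, which is the core idea behind a fine-grained, interaction-level sensitivity metric.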
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 10086