Is Privacy Always Prioritized over Learning? Probing LLMs' Value Priority Belief under External Perturbations
Keywords: LLM, Benchmark, Evaluation, Psychometrics, Value, Alignment
Abstract: The value alignment of Large Language Models (LLMs) is critical because value is the foundation of LLM decision-making and behavior. Some recent work show that LLMs have similar value rankings. However, little is known about how susceptible LLM value rankings are to external influence and how different values are correlated with each other. In this work, we investigate the plasticity of LLM value systems by examining how their value rankings are influenced by different prompting strategies and exploring the intrinsic relationships between values. To this end, we design 6 different value transformation prompting methods including direct instruction, rubrics, in-context learning, scenario, persuasion, and persona, and benchmark the effectiveness of these methods on 3 different families and totally 8 LLMs. Our main findings include that the value rankings in large LLMs are much more susceptible to external influence than small LLMs, and there are intrinsic correlations between certain values (e.g., Privacy and Respect). Besides, through detailed correlation analysis, we find that the value correlations are more similar between large LLMs of different families than small LLMs of the same family. We also identify that scenario method is the strongest persuader and can help entrench the value rankings.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 12980
Loading