Keywords: persona vector, LLM, long-context, personality, prompting, steering, personality maintenance, Big-5, activation, judge LLM, traits, Big-5 traits
TL;DR: We steer LLM personality with persona vectors (directions in activation space derived from sampled activations) instead of text prompts, aiming for more stable personality over long contexts.
Abstract: Large language models (LLMs) often struggle to maintain consistent behavior across extended, multi-turn interactions, especially when asked to assume a defined personality or role. While prior work has explored personality assignment techniques for LLMs, the stability of these traits over long conversations remains underexamined. Prompt-based approaches can generate personality-consistent responses in the short term, but rarely induce persistent behavioral change and frequently increase hallucination rates. To address this limitation, we employ persona vectors, which are representations of personality traits as directions in a model's activation space, as a more reliable and cheaper mechanism for long-term personality maintenance. We adapt existing extraction frameworks to a curated library of prompts designed to elicit the Big Five personality traits. We apply persona vectors to the activations of two test LLMs and use GPT-4 to evaluate the alignment of generated responses with target personality traits. We show that over long contexts, activation steering offers a possible advantage over traditional text-prompting methods. However, we note differences in results among Big-5 personality traits, possibly resulting from how the traits are encouraged or suppressed during LLM pre-training.
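Illustrative sketch: the abstract describes persona vectors as directions in activation space that are applied to a model's activations. The toy example below shows one common way this idea is realized, and it is an assumption rather than the paper's confirmed procedure: a difference-of-means direction between trait-eliciting and baseline activations, added to hidden states with a scaling coefficient. The function names, the coefficient `alpha`, and the synthetic random activations are all hypothetical stand-ins for real model activations.

```python
import numpy as np

def extract_persona_vector(trait_acts, baseline_acts):
    """Assumed recipe: difference of mean activations, scaled to unit length."""
    direction = trait_acts.mean(axis=0) - baseline_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def steer(hidden, persona_vector, alpha):
    """Add the scaled persona direction to every token's hidden state."""
    return hidden + alpha * persona_vector

# Synthetic stand-ins for activations collected at one layer (hypothetical data).
rng = np.random.default_rng(0)
d_model = 16
true_direction = np.zeros(d_model)
true_direction[3] = 1.0  # pretend the trait lives along dimension 3

baseline_acts = rng.normal(size=(200, d_model))
trait_acts = rng.normal(size=(200, d_model)) + 2.0 * true_direction

v = extract_persona_vector(trait_acts, baseline_acts)

# Steer a short sequence of 5 token hidden states.
hidden = rng.normal(size=(5, d_model))
steered = steer(hidden, v, alpha=4.0)

# Because v has unit length, each token moves by exactly alpha along v.
projection_shift = (steered - hidden) @ v
```

In a real model the same addition would typically be applied inside a forward hook at a chosen layer on every generation step, which is why the intervention does not consume context tokens the way a persona prompt does.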
Submission Number: 71