Keywords: persona prompting
Abstract: Persona prompting, which instructs large language models to adopt specific roles (e.g., "you are a mathematician"), has gained widespread adoption, yet its effectiveness remains inconsistent and poorly understood. We present a systematic evaluation of persona prompting across mathematics, psychology, and law using four state-of-the-art language models. Our study compares baseline prompts, domain priming (non-persona cues), three types of personas (generic, historical figures, modern experts), negated personas, and model-generated optimal prompts across Chain-of-Thought (CoT) and direct answering modes. Results show that domain priming consistently improves performance (+2.5% mean improvement with Gemini), while persona prompting is volatile and often harms performance (a 6.1% drop with Gemini and a 3.3% drop with GPT-4.1 on mathematics with CoT reasoning). More concerning, negated personas often match or exceed the performance of positive personas, revealing instability in persona-based approaches. When models generate their own optimal personas and priming strategies, priming approaches consistently outperform persona approaches, yet persona volatility persists even with optimization. Our findings suggest domain priming as a more reliable alternative to persona prompting, challenging the assumption that instructing models to adopt expert roles consistently improves performance on specialized reasoning tasks.
Supplementary Material: zip
Submission Number: 165
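To make the compared conditions concrete, the sketch below shows how the abstract's prompt variants (baseline, domain priming, generic persona, negated persona) and the two answering modes (CoT vs. direct) might be assembled. The exact template wording, the example question, and the helper names are illustrative assumptions, not the paper's actual prompts.

```python
# Minimal sketch of the prompt conditions described in the abstract.
# Template wording and the example question are assumptions for illustration.

QUESTION = "If f(x) = 3x^2 - 2x, what is f'(2)?"  # hypothetical example item

PROMPT_CONDITIONS = {
    # No role or domain cue: the question alone.
    "baseline": "{question}",
    # Domain priming: a non-persona cue naming the domain.
    "domain_priming": "The following is a mathematics question.\n{question}",
    # Generic persona: instructs the model to adopt an expert role.
    "persona_generic": "You are a mathematician.\n{question}",
    # Negated persona: explicitly denies the expert role.
    "persona_negated": "You are not a mathematician.\n{question}",
}

COT_SUFFIX = "\nLet's think step by step."              # Chain-of-Thought mode
DIRECT_SUFFIX = "\nAnswer with only the final result."  # direct answering mode


def build_prompt(condition: str, question: str, use_cot: bool = True) -> str:
    """Assemble one prompt for a given condition and answering mode."""
    body = PROMPT_CONDITIONS[condition].format(question=question)
    return body + (COT_SUFFIX if use_cot else DIRECT_SUFFIX)


if __name__ == "__main__":
    for name in PROMPT_CONDITIONS:
        print(f"--- {name} ---")
        print(build_prompt(name, QUESTION, use_cot=True))
```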