Does Persona Change Reasoning? A Causal Mediation Analysis of System Prompt Interventions

Published: 24 Apr 2026 · Last Modified: 24 Apr 2026 · CauScale 2026 · CC BY 4.0
Keywords: causal mediation analysis, large language models, reasoning robustness, causal inference
TL;DR: Across five reasoning LLMs, causal mediation shows persona prompts matter only for the smallest model; larger models are largely robust, and any effects mainly operate through reasoning quality rather than direct output bias.
Abstract: Assigning personas via system prompts is common practice in LLM deployment, yet it is poorly understood whether these interventions genuinely alter reasoning or merely bias outputs. Extending the causal rating framework of Lakkaraju et al. (2025), we propose a causal mediation model that decomposes persona effects into an indirect path through reasoning quality and a direct output-bias path. We conduct a factorial experiment with five contrasting personas across five reasoning LLMs on 300 GSM8K mathematical reasoning problems. Persona effects are significant only for the smallest model, while larger models are nearly completely robust. When personas do affect accuracy, the effect flows primarily through changes in judged reasoning quality, and reasoning traces are strong statistical predictors of correctness.
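The mediation decomposition described in the abstract can be illustrated with a minimal product-of-coefficients sketch. Everything below is a toy: the data-generating process, variable names (`T` for persona on/off, `M` for judged reasoning quality, `Y` for accuracy), and effect sizes are invented for illustration and are not the paper's actual model, estimator, or data.

```python
# Toy causal mediation decomposition (product-of-coefficients style).
# T: persona prompt on/off; M: judged reasoning quality (mediator);
# Y: accuracy score. All values are synthetic.
import random

random.seed(0)
n = 300  # one synthetic observation per problem, mirroring the 300 GSM8K items

T = [random.randint(0, 1) for _ in range(n)]
M = [0.5 * t + random.gauss(0, 1) for t in T]                 # persona -> reasoning
Y = [0.1 * t + 0.8 * m + random.gauss(0, 1) for t, m in zip(T, M)]

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Path a: effect of persona on reasoning quality (OLS of M on T).
a = cov(T, M) / cov(T, T)

# Paths b (M -> Y) and c' (direct T -> Y): closed-form two-predictor OLS.
d = cov(T, T) * cov(M, M) - cov(T, M) ** 2
b = (cov(M, Y) * cov(T, T) - cov(T, Y) * cov(T, M)) / d
c_prime = (cov(T, Y) * cov(M, M) - cov(M, Y) * cov(T, M)) / d

total = cov(T, Y) / cov(T, T)   # total effect: OLS of Y on T alone
indirect = a * b                # effect routed through reasoning quality
print(f"total={total:.3f} direct={c_prime:.3f} indirect={indirect:.3f}")

# In linear OLS the decomposition is exact: total = direct + indirect.
assert abs(total - (c_prime + indirect)) < 1e-9
```

The final assertion reflects an algebraic identity of OLS in the linear case; the paper's actual analysis may use a different (e.g. nonlinear or counterfactual-based) mediation estimator, for which the decomposition is defined via natural direct and indirect effects rather than this exact sum.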
Submission Number: 30