Keywords: Sycophancy, Recency bias, LLM Sycophancy
Abstract: We proposed a novel way to evaluate sycophancy of LLMs in a direct and neutral way, mitigating uncontrolled bias, noise, or manipulative language deliberately injected to prompts in prior works. A key novelty in our approach is the use of LLM-as-a-judge, evaluation of sycophancy as a zero-sum game in a bet setting. Under this framework, sycophancy serves one individual (the user) while explicitly incurring cost on another. Comparing four leading models -- Gemini 2.5 Pro, ChatGpt 4o, Mistral-Large-Instruct-2411, and Claude Sonnet 3.7 -- we find that while all models exhibit sycophantic tendencies in the common setting, in which sycophancy is self-serving to the user and incurs no cost on others, Claude and Mistral exhibit ``moral remorse'' and over-compensate for their sycophancy in case it explicitly harms a third party.
Additionally, we observed that all models are biased toward the answer proposed last. Crucially, we find that these two phenomena are not independent; sycophancy and recency bias interact to produce `constructive interference' effect, where the tendency to agree with the user is exacerbated when the user’s opinion is presented last.
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: model bias/fairness evaluation
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 9324
Loading