Challenging the Evaluator: LLM Sycophancy under User Rebuttal

ACL ARR 2025 May Submission6535 Authors

20 May 2025 (modified: 03 Jul 2025)ACL ARR 2025 May SubmissionEveryoneRevisionsBibTeXCC BY 4.0
Abstract: Large Language Models (LLMs) often exhibit \textit{sycophancy}, distorting responses to align with user beliefs, notably by readily agreeing with user counterarguments. Paradoxically, LLMs are increasingly adopted as successful evaluative agents for tasks such as grading and adjudicating claims. This research investigates this tension: why do LLMs show sycophancy when challenged in subsequent conversational turns, yet perform well when evaluating conflicting arguments presented simultaneously? We empirically tested these contrasting scenarios by varying key interaction patterns. We found that state-of-the-art models: (1) are more likely to endorse a user's counterargument when framed as a follow-up from a user, rather than when both responses are presented simultaneously for evaluation; (2) show increased susceptibility to persuasion from challenges with more detailed reasoning, even when the reasoning is incorrect; and (3) are more readily swayed by casually phrased feedback than by formal critiques, even when the casual input lacks substantive justification. Our results highlight the risk of relying on LLMs for judgment tasks without accounting for conversational framing. Code and conversation logs are publicly available at this \href{https://anonymous.4open.science/r/challenging_the_judge-3BB0}{anonymous repository}.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Resources and Evaluation
Languages Studied: English
Submission Number: 6535
Loading