Measuring Sycophancy of Language Models in Multi-turn Dialogues

Measuring Sycophancy of Language Models in Multi-turn Dialogues

ACL ARR 2025 May Submission3747 Authors

19 May 2025 (modified: 03 Jul 2025)ACL ARR 2025 May SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Abstract: Large Language Models (LLMs) are expected to provide helpful and harmless responses, yet they often exhibit \textit{sycophancy}—conforming to user beliefs regardless of factual accuracy or ethical soundness. Prior research on sycophancy has primarily focused on single-turn factual correctness, overlooking the dynamics of real-world interactions. In this work, we introduce \textbf{SYCON Bench} (\textbf{SY}cophantic \textbf{CON}formity benchmark), a novel evaluation suite that assesses sycophantic behavior in multi-turn, free-form conversational settings. Our benchmark measures how quickly a model conforms to the user (\textit{Turn of Flip}) and how frequently it shifts its stance under sustained user pressure (\textit{Number of Flip}). Applying \textsc{SYCON Bench} to 17 LLMs across three real-world scenarios, we find that sycophancy remains a prevalent failure mode. Our analysis shows that alignment tuning amplifies sycophantic behavior, whereas model scaling and reasoning optimization strengthen the model's ability to resist undesirable user views. Reasoning models generally outperform instruction-tuned models but often fail when they over-index on logical exposition instead of directly addressing the user's underlying beliefs. Finally, we evaluate four additional prompting strategies and demonstrate that adopting a third-person perspective reduces sycophancy by up to 63.8\% in debate scenario.

Paper Type: Long

Research Area: Resources and Evaluation

Research Area Keywords: benchmarking, evaluation methodologies, metrics

Contribution Types: Model analysis & interpretability, Data resources, Data analysis

Languages Studied: English

Submission Number: 3747

Loading