Abstract: Large Language Models (LLMs) are expected to provide helpful and harmless responses, yet they often exhibit \textit{sycophancy}—conforming to user beliefs regardless of factual accuracy or ethical soundness. Prior research on sycophancy has primarily focused on single-turn factual correctness, overlooking the dynamics of real-world interactions. In this work, we introduce \textbf{SYCON Bench} (\textbf{SY}cophantic \textbf{CON}formity benchmark), a novel evaluation suite that assesses sycophantic behavior in multi-turn, free-form conversational settings. Our benchmark measures how quickly a model conforms to the user (\textit{Turn of Flip}) and how frequently it shifts its stance under sustained user pressure (\textit{Number of Flip}). Applying \textsc{SYCON Bench} to 17 LLMs across three real-world scenarios, we find that sycophancy remains a prevalent failure mode. Our analysis shows that alignment tuning amplifies sycophantic behavior, whereas model scaling and reasoning optimization strengthen the model's ability to resist undesirable user views. Reasoning models generally outperform instruction-tuned models but often fail when they over-index on logical exposition instead of directly addressing the user's underlying beliefs. Finally, we evaluate four additional prompting strategies and demonstrate that adopting a third-person perspective reduces sycophancy by up to 63.8\% in debate scenario.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, evaluation methodologies, metrics
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: English
Submission Number: 3747
Loading