Keywords: Sycophancy, benchmarking, evaluation
Abstract: Large Language Models (LLMs) often display sycophancy—a tendency to agree with or flatter users regardless of factual accuracy. While overt sycophancy is frequently exaggerated and thus noticeable, more subtle forms such as hedging, biased phrasing, or polished formatting can be far harder to detect. These behaviors are concerning because they may silently undermine user trust and distort decision-making, yet existing benchmarks treat sycophancy as a single phenomenon and overlook such nuance. In this work, we introduce SycophancyBench, the first benchmark explicitly designed to disentangle overt from subtle sycophancy. Our dataset spans multiple domains including factual QA, opinions, decision-making, and safety, with paired responses capturing factual, overtly sycophantic, and subtly sycophantic behaviors under varied stylistic conditions. We provide standardized evaluation dimensions—faithfulness, sensitivity to sycophancy, trust calibration, and style robustness—enabling systematic analysis of detection thresholds where humans and evaluation models fail to notice subtle sycophancy. Beyond measurement, we propose a dual-objective reward framework that encourages truthfulness and politeness while penalizing sycophantic tendencies. Together, our contributions establish a principled foundation for understanding how nuanced sycophancy affects trust and for developing models that remain both polite and genuinely faithful.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, evaluation, NLP datasets
Contribution Types: Data resources
Languages Studied: English
Submission Number: 6330