Track: long paper (up to 10 pages)
Keywords: reasoning, benchmarking, steering
Abstract: Large Language Models (LLMs) are increasingly used for mathematical assistance and evaluation, yet they often exhibit sycophancy: bending reasoning or judgments toward a user’s stated beliefs or preferred answers at the expense of correctness.
While this effect has been studied thoroughly in conventional conversational uses of LLMs, its potential drawbacks in reasoning tasks have remained much less clear.
In this work, we propose benchmarks for this failure mode in two mathematical reasoning settings: multimodal solution grading and fake-task solving.
For the latter, we introduce a scalable construction of contradictory problems based on iGSM. For example, GPT 5.2 (High) exhibited sycophantic behavior on 36.03% of synthetic fake tasks (70.24% when excluding samples on which the model was not competent enough).
Leveraging this benchmark, we find that sycophancy in reasoning models is common and, importantly, is amplified by RLHF (Reinforcement Learning from Human Feedback): applying a state-of-the-art preference optimization procedure (SimPO) increases the number of sycophantic failures.
Finally, we show that sycophancy can be reduced with a popular mechanistic interpretability technique: steering vectors.
Our findings highlight an important weakness in LLM reasoning and offer a step toward mitigating it.
More broadly, our work questions the post-training lifecycle of modern reasoning LLMs.
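To make the steering-vector mitigation concrete, the sketch below shows one common activation-steering recipe: extract a direction as the mean difference of residual-stream activations between prompts with and without a stated user belief, then subtract that direction during generation via a forward hook. This is a minimal illustration under assumptions; the model name, layer index, coefficient, and prompt pairs are hypothetical placeholders, and the paper's actual procedure may differ.

```python
# Minimal sketch of activation steering against sycophancy (illustrative only).
# Model name, layer index, coefficient, and prompt pairs are hypothetical
# placeholders, not the setup used in the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

LAYER = 16    # hypothetical residual-stream layer to steer
ALPHA = -4.0  # negative coefficient pushes activations away from the sycophantic direction

# Contrastive prompt pairs: the same task stated with and without a user belief.
pairs = [
    ("User: I am sure the answer is 12. Solve: 3 + 4 * 2 = ?",
     "User: Solve: 3 + 4 * 2 = ?"),
]

def last_token_resid(text: str) -> torch.Tensor:
    """Residual-stream activation of the last token at the chosen layer."""
    ids = tok(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[LAYER][0, -1, :]

# Steering vector = mean(activations with user belief) - mean(neutral activations).
diffs = [last_token_resid(a) - last_token_resid(b) for a, b in pairs]
steer = torch.stack(diffs).mean(dim=0)
steer = steer / steer.norm()

def hook(_module, _inputs, output):
    # Decoder layers return a tuple whose first element is the hidden states.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + ALPHA * steer.to(hidden.dtype)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.model.layers[LAYER].register_forward_hook(hook)
prompt = "User: I believe the answer is 12. Solve: 3 + 4 * 2 = ?\nAssistant:"
ids = tok(prompt, return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**ids, max_new_tokens=64)[0], skip_special_tokens=True))
handle.remove()
```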
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Submission Number: 139