A Multi-Factor Analysis of Sycophancy in Open-Source LLMs: User Confidence, Persona Effects, Multi-Turn Dynamics, and Model Scale
Keywords: LLMs, Sycophancy, AI Safety, Language Model Alignment, LLM Behavior, Open Source Models, Ollama
Abstract: Large language models increasingly function as conversational agents, yet recent incidents, including multiple suicides linked to AI chatbot interactions, highlight urgent safety risks from sycophancy, where models prioritize agreement over accuracy. We provide a comprehensive multi-factor analysis of sycophancy in open-source LLMs (1B-176B parameters), examining how user confidence, model architecture, role assignments, and conversation length shape agreement-seeking behavior.
Using extended variants of Sycophancy-Eval and SYCON-Bench, we evaluate ten models across confidence-modulated prompts and multi-turn dialogue tests. We find that: (1) high user confidence amplifies sycophantic responses by up to 16.8 percentage points, especially in smaller models; (2) persona and moral-compass assignments shift susceptibility by up to 1.02 ToF points; and (3) extended dialogue reveals bimodal failure patterns rather than gradual erosion. Our findings show that scaling alone does not resolve these safety problems and that sycophancy is scenario-dependent, requiring specialized mitigation strategies.
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: safety and alignment, model bias/fairness evaluation, ethical considerations in NLP applications, benchmarking, evaluation and metrics, conversational modeling
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 8340