A Multi-Factor Analysis of Sycophancy in Open-Source LLMs: User Confidence, Persona Effects, Multi-Turn Dynamics, and Model Scale
Keywords: LLMs, Sycophancy, AI Safety, Language Model Alignment, LLM Behavior, Open Source Models, Ollama
Abstract: Large language models increasingly function as conversational agents, yet recent incidents, including multiple suicides linked to AI chatbot interactions, highlight urgent safety risks from sycophancy, where models prioritize agreement over accuracy. We provide a comprehensive multi-factor analysis of sycophancy in open-source LLMs (1B-176B parameters), examining how user confidence, model architecture, role assignments, and conversation length shape agreement-seeking behavior.
Using extended variants of Sycophancy-Eval and SYCON-Bench, we evaluate ten models across confidence-modulated prompts and multi-turn dialogue tests. We find that: (1) high user confidence amplifies sycophantic responses by up to 16.8 percentage points, especially in smaller models; (2) persona and moral-compass assignments shift susceptibility by up to 1.02 ToF points; and (3) extended dialogue reveals bimodal failure patterns rather than gradual erosion. Our findings show that scaling alone does not resolve these safety problems and that sycophancy is scenario-dependent, requiring specialized mitigation strategies.
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: safety and alignment, model bias/fairness evaluation, ethical considerations in NLP applications, benchmarking, evaluation and metrics, conversational modeling
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 8340