Sycophancy Claims about Language Models: The Missing Human-in-the-Loop

Jan Batzner; Volker Stocker; Stefan Schmid; Gjergji Kasneci

Published: 06 Mar 2025, Last Modified: 05 May 2025

Venue: ICLR 2025 Bi-Align Workshop (Poster)

License: CC BY 4.0

Keywords: sycophancy, terminology, definition, personalization, human-in-the-loop, agreeableness

TL;DR: This paper examines the terminology challenge in measuring AI sycophancy, highlighting the need for more precise definitions to distinguish sycophantic behavior from related concepts in language model evaluation.

Abstract: Sycophantic response patterns in Large Language Models (LLMs) have been increasingly claimed in the literature. We review methodological challenges in measuring LLM sycophancy and identify five core operationalizations. Despite sycophancy being inherently human-centric, current research does not evaluate human perception. Our analysis highlights the difficulties in distinguishing sycophantic responses from related concepts in AI alignment and offers actionable recommendations for future research.

Submission Type: Tiny Paper (2 Pages)

Archival Option: This is an archival submission

Presentation Venue Preference: ICLR 2025

Submission Number: 34
