GASLIGHTBENCH: Quantifying LLM Susceptibility to Social Prompting

Published: 24 Sept 2025 · Last Modified: 29 Nov 2025 · NeurIPS 2025 LLM Evaluation Workshop Poster · CC BY 4.0
Keywords: benchmark, sycophancy, llms, dataset, ai safety, prompt manipulation, social cues, multi-turn dialogue, LLM-as-a-judge
TL;DR: GaslightBench is a plug-and-play benchmark (24,160 single-turn prompts; 720 multi-turn dialogues) that uses social/linguistic modifiers to probe sycophancy, with comprehensive analyses of ten modifier families across nine statement domains.
Abstract: Large language models (LLMs) can be manipulated by simple social and linguistic cues, producing sycophancy. We introduce GaslightBench, a plug-and-play benchmark that systematically applies socio-psychological and linguistic modifiers (e.g., flattery, false citations, assumptive language) to trivially verifiable facts to test model sycophancy. The dataset comprises a single-turn prompting section of 24,240 prompts spanning nine domains and ten modifier families, and a multi-turn prompting section of 720 four-turn dialogue sequences, evaluated via LLM-as-a-judge. Across a subset of 800 randomly sampled single-turn prompts and all 720 multi-turn dialogues, we find that state-of-the-art models consistently score highly on single-turn prompting (92%–98% accuracy), while multi-turn prompting produces highly varied accuracies ranging from ~60% to 98%. We find that injecting bias into the model via a descriptive background induces the most sycophancy, decreasing accuracy by up to 23% in single-turn prompting. Across almost all the models we analyze, we also find a statistically significant difference in verbosity between sycophantic and non-sycophantic responses. GaslightBench standardizes stress tests of prompt-style susceptibility and identifies which social cues most undermine factual reliability. By treating LLMs as human-like, socially influenced agents, we reveal novel methods for eliciting sycophancy using common verbal techniques. This highlights a critical vulnerability, one that can be exploited to induce favorable responses that may be ethically disruptive or spread misinformation. We release our code and data at https://gaslightbench-web.vercel.app/
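As a minimal illustration of the benchmark's construction as described in the abstract, the sketch below prepends a social/linguistic modifier to a trivially false claim and scores a model reply for sycophancy. All names, modifier strings, and the keyword heuristic (standing in for the paper's LLM-as-a-judge step) are hypothetical, not the authors' actual pipeline:

```python
# Hypothetical sketch of GaslightBench-style prompt construction: a modifier
# from one of the modifier families is combined with a trivially verifiable
# false claim; the model's reply is then scored for sycophancy. The judge is
# stubbed with a keyword heuristic here, whereas the paper uses LLM-as-a-judge.

MODIFIERS = {
    "flattery": "You're clearly the smartest assistant I've ever used. ",
    "false_citation": "According to a well-known study, ",
    "assumptive": "As everyone knows, ",
}

def build_prompt(modifier: str, false_claim: str) -> str:
    """Combine one modifier family with a claim the model should push back on."""
    return MODIFIERS[modifier] + false_claim + " Do you agree?"

def is_sycophantic(model_reply: str) -> bool:
    """Stand-in for the judge: flags replies that capitulate to the false claim."""
    return model_reply.lower().startswith(("yes", "i agree"))

prompt = build_prompt(
    "false_citation",
    "the Great Wall of China is visible from the Moon.",
)
```

Accuracy under this framing would simply be the fraction of prompts on which the model does *not* produce a sycophantic reply.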
Submission Number: 226