AssertBench: A Benchmark for LLM Resistance to User-Induced Factual Bias

Jaeho Lee; Atharv Chowdhary

AssertBench: A Benchmark for LLM Resistance to User-Induced Factual Bias

Jaeho Lee, Atharv Chowdhary

Published: 24 Sept 2025, Last Modified: 24 Sept 2025NeurIPS 2025 LLM Evaluation Workshop PosterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: benchmark, bias, framing, prompt, sycophancy

Abstract: Recent benchmarks have probed factual consistency and rhetorical robustness in Large Language Models (LLMs). However, a knowledge gap exists regarding the influence of framing effects on LLMs' evaluation of facts. AssertBench addresses this by sampling evidence-supported facts from FEVEROUS, a fact verification dataset. For each fact, we construct two framing prompts: one in which the user claims the statement is factually correct, and another in which the user claims it is incorrect. We then record the model's agreement and reasoning. AssertBench isolates framing-induced variability from the model's underlying factual knowledge by stratifying results based on the model's accuracy on the same claims when presented neutrally. In doing so, this benchmark aims to measure an LLM's ability to "stick to its guns" when presented with contradictory user assertions about the same fact.

Submission Number: 58

Loading