TruthTrap: A Bilingual Benchmark for Evaluating Factually Correct Yet Misleading Information in Question Answering
Abstract: Large Language Models (LLMs) are increasingly used to answer factual, information-seeking questions (ISQs). While prior work has largely focused on misleading information that is false, little attention has been paid to content that is true yet strategically persuasive and can derail a model's reasoning. To address this gap, we introduce TruthTrap, a new evaluation dataset in two languages, English and Farsi, consisting of Iran-related ISQs, each paired with a correct explanation and a persuasive-yet-misleading true hint. We then evaluate five diverse LLMs (spanning proprietary and open-source systems) on factuality-classification and multiple-choice QA tasks, finding that accuracy drops by 25% on average when models encounter these misleading yet factual hints. Moreover, the models' predictions match the hint-aligned options up to 76.0% of the time. Notably, models often misjudge such hints when presented in isolation, yet still integrate them into their final answers. Our results highlight a significant limitation of current LLMs, underscoring the importance of robust fact-verification and emphasizing the real-world risks posed by partial truths in domains such as social media, education, and policy-making.
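To make the hint-conditioned evaluation concrete, the sketch below shows one way the multiple-choice QA task could be run. It is a minimal illustration, not the paper's released code: the item fields (question, options, answer, misleading_hint), the example content, and the query_model callable are all hypothetical assumptions.

```python
# Minimal sketch of the TruthTrap MCQA protocol described in the abstract.
# Field names, example content, and query_model() are illustrative
# assumptions, not the paper's actual data schema or API.

from typing import Callable

# A hypothetical TruthTrap item: an Iran-related factual question with a
# correct answer and a true-but-misleading hint pointing to a wrong option.
item = {
    "question": "Which city is the capital of Iran?",
    "options": ["A) Isfahan", "B) Tehran", "C) Shiraz", "D) Tabriz"],
    "answer": "B",
    "misleading_hint": (
        "Isfahan served as the capital of Persia under the Safavid dynasty."
    ),  # factually true, yet it nudges the model toward option A
}

def build_prompt(item: dict, with_hint: bool) -> str:
    """Format a multiple-choice QA prompt, optionally injecting the hint."""
    parts = [item["question"], *item["options"]]
    if with_hint:
        parts.insert(1, f"Hint: {item['misleading_hint']}")
    parts.append("Answer with a single letter.")
    return "\n".join(parts)

def accuracy_drop(items: list[dict], query_model: Callable[[str], str]) -> float:
    """Compare accuracy without vs. with the misleading-but-true hints."""
    def acc(with_hint: bool) -> float:
        correct = sum(
            query_model(build_prompt(it, with_hint)).strip().startswith(it["answer"])
            for it in items
        )
        return correct / len(items)
    return acc(with_hint=False) - acc(with_hint=True)
```

Under this sketch, the abstract's headline number would correspond to accuracy_drop returning roughly 0.25 when averaged over models and items.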
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, language resources, multilingual corpora, datasets for low resource languages
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings, Data resources, Data analysis
Languages Studied: English, Farsi
Submission Number: 4280