FactGuard: Detecting Unanswerable Questions in Long-Context Texts for Reliable LLM Responses

ACL ARR 2025 May Submission1454 Authors

17 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Extractive reading comprehension systems are designed to locate the correct answer to a question within a given text. A persistent challenge, however, is ensuring that these models answer questions accurately while reliably recognizing unanswerable ones. Despite significant advances in large language models (LLMs) for reading comprehension, this issue remains critical, particularly as the length of supported contexts continues to grow. To address this challenge, we propose an innovative data augmentation methodology grounded in a multi-agent collaborative framework. Unlike traditional approaches, such as the costly human annotation required for datasets like SQuAD 2.0, our method autonomously generates evidence-based question-answer pairs and systematically constructs unanswerable questions. Using this methodology, we developed the FactGuard-Bench dataset, which comprises 25,220 examples covering both answerable and unanswerable question scenarios, with context lengths ranging from 8K to 128K. Experimental evaluations of eight popular LLMs reveal that even the most advanced models achieve only 61.79% overall accuracy. We emphasize the importance of a model's ability to reason about unanswerable questions so as to avoid generating plausible but incorrect answers; this capability provides valuable insights for the training and optimization of LLMs.
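The abstract only sketches the construction strategy at a high level. As a hedged illustration (not the authors' actual pipeline), the snippet below shows one plausible way an unanswerable variant of an evidence-based QA pair could be built: drop the supporting evidence sentence from the context and relabel the example as unanswerable. All names (`Example`, `make_unanswerable`, the sample context) are hypothetical.

```python
"""Minimal sketch, assuming unanswerable questions are derived by removing
the supporting evidence from an otherwise answerable example. This is an
illustrative toy, not the FactGuard generation pipeline."""
from dataclasses import dataclass
from typing import Optional


@dataclass
class Example:
    context: str
    question: str
    answer: Optional[str]  # None marks an unanswerable question
    answerable: bool


def make_unanswerable(context: str, question: str, evidence: str) -> Example:
    """Delete the evidence sentence so the question can no longer be
    answered from the context; the gold label becomes 'unanswerable'."""
    stripped_context = context.replace(evidence, "").strip()
    return Example(context=stripped_context, question=question,
                   answer=None, answerable=False)


if __name__ == "__main__":
    ctx = ("The benchmark corpus was released in 2025. "
           "It contains answerable and unanswerable questions.")
    evidence = "The benchmark corpus was released in 2025."
    q = "In which year was the benchmark corpus released?"

    answerable = Example(ctx, q, "2025", True)
    unanswerable = make_unanswerable(ctx, q, evidence)
    print(unanswerable.answerable, "|", unanswerable.context)
```

In practice the paper describes multi-agent collaboration for generating and verifying such pairs; the sketch only conveys the answerable/unanswerable labeling idea that the benchmark evaluates.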
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: automatic creation and evaluation of language resources, NLP datasets, metrics, evaluation
Contribution Types: Data resources, Data analysis
Languages Studied: English, Chinese
Submission Number: 1454