Keywords: unanswerable question, NLP datasets, metrics
TL;DR: We propose a novel framework for automatically generating the FactGuard-Bench dataset, improving LLMs' accuracy in distinguishing answerable from unanswerable questions in long-context reading comprehension.
Abstract: Large language models (LLMs) have demonstrated significant advances in reading comprehension. However, a persistent challenge lies in ensuring these models maintain high accuracy in answering questions while reliably recognizing unanswerable queries. This issue becomes increasingly critical as the length of supported contexts continues to expand. To address this challenge, we propose a collaborative multi-task workflow, FactGuard, that automatically generates evidence-based question-answer pairs and systematically constructs unanswerable questions. Using this methodology, we built the FactGuard-Bench dataset, which comprises 25,220 examples covering both answerable and unanswerable question scenarios, with context lengths ranging from 4K to 128K. Experimental evaluations of nine popular LLMs reveal that all of them exhibit a significant performance gap between answerable and unanswerable questions, with even the most advanced models achieving only 67.67\% overall accuracy. After training on FactGuard-Bench, the model achieves an overall accuracy of 81.17\%, along with enhanced reasoning capabilities on unanswerable questions. Our code is publicly available at https://anonymous.4open.science/r/FACTGUARD-5BBC
Primary Area: datasets and benchmarks
Submission Number: 17376