SIS-Fact: Towards Systematic, Interpretable and Scalable Factuality Evaluation for LLM

Published: 07 Jul 2025, Last Modified: 07 Jul 2025, KnowFM @ ACL 2025, CC BY 4.0
Keywords: automatic evaluation, efficient models, text-to-text generation, analysis
TL;DR: A systematic, interpretable and scalable factuality evaluation framework for LLMs. The system includes a synthetic data generation pipeline, an open-source dataset and an evaluator model.
Abstract: Despite advances in Large Language Models (LLMs), document-grounded generation still suffers from factual errors. Current evaluations oversimplify error analysis by relying on binary judgements, while costly human-annotated datasets contain unrepresentative error distributions. To address these challenges, we propose SIS-Fact (Systematic, Interpretable and Scalable Factuality Evaluation), a novel framework that integrates systematic error typologies, synthetic data generation pipelines, and high-quality interpretable annotations for comprehensive factuality evaluation. Specifically, we first develop ten diverse methods to synthesize six error types in grounded generation, covering both intrinsic and extrinsic errors. Using these methods, we build the SIS-Fact Dataset, a high-quality document-grounded factuality evaluation dataset characterized by challenging errors and interpretable error analysis. Based on the SIS-Fact Dataset, we introduce the SIS-Fact Evaluator, an advanced factuality evaluation model capable of fine-grained error analysis and correction. Extensive experiments show that the SIS-Fact Evaluator achieves state-of-the-art (SOTA) performance on the SIS-Fact Dataset while generalizing well across multiple existing factuality benchmarks.
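The abstract does not list the ten synthesis methods or the six error types, so the following is only a rough illustration of what one intrinsic-error generator might look like, not SIS-Fact's actual pipeline. It swaps a number in a grounded response for a different number taken from the source document, so the injected error contradicts the document using the document's own content, and it records the erroneous span and its correction as an interpretable annotation. The function `inject_number_swap`, the `SyntheticExample` record and the label string `"intrinsic_number_swap"` are all hypothetical names chosen for this sketch.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches integers and decimals such as "1932" or "3.5" (hypothetical choice of perturbation target).
NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")


@dataclass
class SyntheticExample:
    """Hypothetical record for one synthesized training/evaluation example."""
    document: str
    response: str              # response after perturbation (or unchanged if factual)
    label: str                 # "factual" or a hypothetical error-type name
    error_span: Optional[str]  # erroneous text that was inserted into the response
    correction: Optional[str]  # original text that restores factual consistency


def inject_number_swap(document: str, response: str) -> SyntheticExample:
    """Hypothetical intrinsic-error generator.

    Replaces the first number in the response with a different number that also
    appears in the document. Because the wrong value comes from the source itself,
    the error is intrinsic (it misuses document content rather than adding new facts).
    """
    doc_numbers = NUMBER.findall(document)
    for match in NUMBER.finditer(response):
        original = match.group()
        candidates = [n for n in doc_numbers if n != original]
        if candidates:
            wrong = candidates[0]  # deterministic pick keeps the sketch reproducible
            perturbed = response[:match.start()] + wrong + response[match.end():]
            return SyntheticExample(document, perturbed, "intrinsic_number_swap",
                                    wrong, original)
    # Nothing swappable: keep the response as a factual (negative) example.
    return SyntheticExample(document, response, "factual", None, None)
```

For example, given the document "The bridge opened in 1932 and is 503 metres long." and the response "The bridge, opened in 1932, spans 503 metres.", the sketch returns the response "The bridge, opened in 503, spans 503 metres." with error span "503" and correction "1932". Storing the span and correction alongside the label is one way such data could support fine-grained analysis and correction rather than a binary judgement. An extrinsic-error generator would instead add content that the document does not support; that variant is not shown here.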
Archival Status: Non-archival (not included in proceedings)
Submission Number: 21