Fast and Accurate Factual Inconsistency Detection Over Long Documents

Published: 07 Oct 2023, Last Modified: 01 Dec 2023 (EMNLP 2023 Main)
Submission Type: Regular Long Paper
Submission Track: Natural Language Generation
Keywords: hallucination detection, inconsistency detection, hallucination, automatic evaluation, metric, long document, efficient, fast, accurate, long, natural language generation, task agnostic, nlp, nlg
TL;DR: Introducing SCALE, a task-agnostic hallucination/factual inconsistency detection model for long documents, and ScreenEval, a novel factual inconsistency detection dataset for long dialogue documents.
Abstract: Generative AI models exhibit remarkable potential; however, hallucinations across various tasks present a significant challenge, particularly for longer inputs that current approaches struggle to address effectively. We introduce SCALE (Source Chunking Approach for Large-scale inconsistency Evaluation), a task-agnostic model for detecting factual inconsistencies using a novel chunking strategy. Specifically, SCALE is a Natural Language Inference (NLI) based model that uses large text chunks to condition over long texts. This approach achieves state-of-the-art performance in factual inconsistency detection for diverse tasks and long inputs. Additionally, we leverage the chunking mechanism and employ a novel algorithm to explain SCALE's decisions through relevant source sentence retrieval. Our evaluations reveal that SCALE outperforms existing methods on both standard benchmarks and ScreenEval, a new long-form dialogue dataset that we constructed. Moreover, SCALE surpasses competitive systems in efficiency and model explanation evaluations. We have released our code and data publicly on GitHub.
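As a rough illustration of the chunk-level NLI scoring described in the abstract (not the authors' released implementation), the sketch below greedily packs source sentences into large chunks, scores a generated claim against each chunk with an off-the-shelf NLI cross-encoder, and takes the maximum entailment probability over chunks as the consistency score. The model checkpoint, the greedy packing heuristic, and the max-over-chunks aggregation are all assumptions; the function names chunk_source and consistency_score are hypothetical.

```python
# Minimal sketch of chunk-level NLI consistency scoring, assuming an
# off-the-shelf MNLI cross-encoder and max-over-chunks aggregation.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "microsoft/deberta-large-mnli"  # any NLI cross-encoder could stand in
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def chunk_source(sentences, max_tokens=400):
    """Greedily pack source sentences into large chunks under a token budget."""
    chunks, current, length = [], [], 0
    for sent in sentences:
        n = len(tokenizer.tokenize(sent))
        if current and length + n > max_tokens:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(sent)
        length += n
    if current:
        chunks.append(" ".join(current))
    return chunks

@torch.no_grad()
def consistency_score(source_sentences, claim):
    """Score a claim against a long source: max entailment probability over chunks."""
    best = 0.0
    for chunk in chunk_source(source_sentences):
        inputs = tokenizer(chunk, claim, truncation=True, return_tensors="pt")
        probs = model(**inputs).logits.softmax(dim=-1)[0]
        # Label order for this checkpoint: contradiction, neutral, entailment
        best = max(best, probs[2].item())
    return best  # a low score flags a likely factual inconsistency
```

Because each chunk is scored independently, the per-chunk entailment probabilities also indicate which part of the source best supports the claim, which is the intuition behind using the chunking mechanism for source sentence retrieval.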
Submission Number: 2466