Factcheck-Bench: Fine-Grained Evaluation Benchmark for Automatic Fact-checkers

ACL ARR 2024 June Submission 3888 Authors

16 Jun 2024 (modified: 06 Aug 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: The increased use of large language models (LLMs) across a variety of real-world applications calls for mechanisms to verify the factual accuracy of their outputs. In this work, we present Factcheck-Bench, a holistic end-to-end framework for annotating and evaluating the factuality of LLM-generated responses. The framework encompasses a multi-stage annotation scheme designed to yield detailed labels for fact-checking and correcting not only the final prediction, but also the intermediate steps that a fact-checking system might need to take. Based on this framework, we construct an open-domain factuality benchmark at three levels of granularity: claim, sentence, and document. We further propose a system, Factcheck-GPT, which follows our framework, and we show that it outperforms several popular LLM fact-checkers. We make our annotation tool, annotated data, benchmark, and code available at \url{http://anonymous.for.review}.
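To make the three-level granularity concrete, below is a minimal sketch of how such nested annotations could be represented. All class and field names here are hypothetical illustrations, not the benchmark's actual schema: the paper only states that labels exist at the claim, sentence, and document levels.

```python
# Hypothetical sketch of claim/sentence/document annotation granularity.
# Names and label values are assumptions for illustration only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ClaimAnnotation:
    text: str                  # an atomic claim extracted from a sentence
    label: str                 # e.g. "supported" or "refuted" (assumed label set)
    evidence: List[str] = field(default_factory=list)  # supporting sources

@dataclass
class SentenceAnnotation:
    text: str
    claims: List[ClaimAnnotation] = field(default_factory=list)

    def is_factual(self) -> bool:
        # assumption: a sentence is factual iff all its claims are supported
        return all(c.label == "supported" for c in self.claims)

@dataclass
class DocumentAnnotation:
    response: str              # the LLM-generated response being fact-checked
    sentences: List[SentenceAnnotation] = field(default_factory=list)

    def is_factual(self) -> bool:
        # document-level verdict aggregated from sentence-level verdicts
        return all(s.is_factual() for s in self.sentences)
```

A nested structure like this lets an evaluator score a fact-checker at any of the three levels independently, which matches the benchmark's stated goal of evaluating intermediate steps rather than only the final document-level verdict.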
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Fact-checking system evaluation, fine-grained end-to-end solution
Contribution Types: Data resources, Data analysis
Languages Studied: English
Submission Number: 3888