Fact-checking: Generative LLMs Don't Pay Attention to Details

Anonymous

16 Dec 2023, ACL ARR 2023 December Blind Submission
TL;DR: We evaluate the knowledge and reasoning capabilities of generative LLMs for closed-book fact-checking.
Abstract: Fact-checking is an established knowledge-intensive natural language processing (NLP) task that comprises evidence retrieval and claim verification steps. Meanwhile, generative large language models (LLMs) memorize more facts as model size grows. This memorization ability raises the question of whether extractive retrieval is still necessary when the facts required to check a claim have already been seen during pre-training. Consequently, this paper evaluates the closed-book fact-checking performance of generative LLMs on two Wikipedia-based datasets, FEVER and HOVER. Instead of retrieving evidence from external knowledge bases, we let the model generate rationales and verify the claim itself in a few-shot setup. On the simpler dataset (FEVER), the best of the selected open-source LLMs achieves a verification F1 score of up to 89% (GPT-4: 93%); on the more complex dataset (HOVER), performance reaches 62% (GPT-4: 70%). Compared to claim-only verification, the additional rationale-generation step boosts verification performance by 4.05 percentage points on FEVER and 5.12 on HOVER.
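
The two-step setup the abstract describes (self-generated rationale, then verification, both via few-shot prompting) could look roughly like the Python sketch below. The generate() function, prompt wording, few-shot example, and label name (REFUTED) are illustrative assumptions, not the paper's actual prompts or code.

    # Hypothetical two-step, closed-book verification: the model first recalls a
    # rationale for the claim from its parametric knowledge (no retrieval), then
    # labels the claim against that self-generated rationale.

    FEW_SHOT_EXAMPLES = (
        "Claim: The Eiffel Tower is located in Berlin.\n"
        "Rationale: The Eiffel Tower is a landmark in Paris, France, not in Berlin.\n"
        "Verdict: REFUTED\n"
    )

    def generate(prompt: str) -> str:
        """Placeholder for any generative LLM call (open-source model or GPT-4)."""
        raise NotImplementedError

    def verify_claim(claim: str) -> tuple[str, str]:
        # Step 1: rationale generation from parametric knowledge only.
        rationale = generate(f"{FEW_SHOT_EXAMPLES}\nClaim: {claim}\nRationale:").strip()
        # Step 2: claim verification conditioned on the self-generated rationale.
        verdict = generate(
            f"{FEW_SHOT_EXAMPLES}\nClaim: {claim}\nRationale: {rationale}\nVerdict:"
        ).strip()
        return rationale, verdict

Claim-only verification, the baseline mentioned in the abstract, would skip step 1 and prompt for the verdict directly from the claim.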
Paper Type: long
Research Area: Generation
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data analysis
Languages Studied: English