Claim Check-Worthiness Detection: How Well do LLMs Grasp Annotation Guidelines?

Claim Check-Worthiness Detection: How Well do LLMs Grasp Annotation Guidelines?

ACL ARR 2024 June Submission5425 Authors

16 Jun 2024 (modified: 13 Aug 2024)ACL ARR 2024 June SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Abstract: The rising threat of disinformation underscores the need to fully or partially automate the fact-checking process. Identifying text segments requiring fact-checking is known as claim detection (CD) and claim check-worthiness detection (CW), the latter incorporating complex domain-specific criteria of worthiness and often framed as a ranking task. Zero- and few-shot LLM prompting is an attractive option for both tasks, as it bypasses the need for labeled datasets and allows verbalized claim and worthiness criteria to be directly used for prompting. We evaluate the LLMs' predictive accuracy and accuracy on five CD/CW datasets from diverse domains, each utilizing a different worthiness criterion. We examine two key aspects: (1) how to best distill factuality and worthiness criteria into a prompt, and (2) how much context to provide for each claim. To this end, we experiment with different levels of prompt verbosity and varying amounts of contextual information given to the model. We additionally evaluate the top-performing models with ranking metrics, resembling prioritization done by fact-checkers. Our results show that optimal prompt verbosity varies, meta-data alone adds more performance boost than co-text, and confidence scores can be directly used to produce reliable check-worthiness rankings.

Paper Type: Long

Research Area: Computational Social Science and Cultural Analytics

Research Area Keywords: misinformation detection and analysis, fact checking

Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data analysis

Languages Studied: English

Submission Number: 5425

Loading