Perception, Understanding and Reasoning: A Multimodal Benchmark for Video Fake News Detection

ACL ARR 2026 January Submission5592 Authors

05 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0

Keywords: Multimodal, Fake News Detection

Abstract: The advent of multi-modal large language models (MLLMs) has greatly advanced research on video fake news detection (VFND). Existing benchmarks typically focus on detection accuracy and fail to provide fine-grained assessments of the entire detection process. To address this limitation, we introduce POVFNDB (Process-oriented Video Fake News Detection Benchmark), a benchmark comprising 10 tasks designed to systematically evaluate MLLMs' perception, understanding, and reasoning capabilities in VFND. The benchmark contains 36,240 human-annotated question-answer (QA) pairs in structured or open-ended formats, spanning 15 distinct evaluation dimensions that characterize different aspects of the video fake news detection process. Using POVFNDB, we conduct comprehensive evaluations of both proprietary and open-source MLLMs. Moreover, we fine-tune Qwen2.5VL-7B-Instruct on a reasoning dataset generated by our proposed POVFND-CoT, a chain-of-thought method that uses rationales from evaluation results and rationale validation. The resulting model achieves state-of-the-art performance on VFND.

Paper Type: Long

Research Area: NLP Applications

Research Area Keywords: Multimodal, Fake News Detection

Contribution Types: NLP engineering experiment, Data resources, Data analysis

Languages Studied: English, Chinese

Submission Number: 5592
