Abstract: We propose a novel task, hallucination localization in video captioning, which aims to identify hallucinations in video captions at the span level (i.e., individual words or phrases). This enables a more fine-grained analysis of hallucinations than the existing sentence-level hallucination detection task. We manually annotate 1,167 hallucination instances in VideoLLM-generated captions to build HLVC-Dataset, a specialized dataset for hallucination localization. We further implement a VideoLLM-based baseline method and conduct quantitative and qualitative evaluations to benchmark current performance on hallucination localization.
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Multimodality and Language Grounding to Vision, Robotics and Beyond, Resources and Evaluation
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 2273