Keywords: Caption, VLLMs, Benchmark
Abstract: Generating coherent and factually grounded captions for long-form videos is a critical yet underexplored challenge for multimodal large language models (MLLMs).
Existing benchmarks, which predominantly feature short clips, are insufficient for evaluating a model's ability to capture narrative structure and fine-grained details over extended durations.
To address this gap, we introduce LVCap-Eval, a benchmark for long-form video captioning.
LVCap-Eval comprises 200 videos, 2 to 20 minutes in length and drawn from six diverse domains, and features a dual-dimension evaluation protocol that assesses both scene-level narrative coherence and event-level factual accuracy.
To facilitate model improvement, we also provide a pipeline for generating a training corpus, demonstrating that fine-tuning with as few as 7,000 samples yields substantial gains.
Our evaluation of existing MLLMs on this benchmark reveals a significant performance disparity:
while leading closed-source models (e.g., Gemini-2.5-Pro) perform robustly across various video durations, their open-source counterparts degrade sharply as video length increases.
Finally, our analysis of these model failures highlights potential directions for improving the long-video comprehension of MLLMs.
Primary Area: datasets and benchmarks
Submission Number: 4694