This folder contains the supplementary materials for the paper 【Can LVLMs Describe Videos like Humans? A Five-in-One Video Annotations Benchmark for Better Human-Machine Comparison】.

The contents of each file are described as follows:
【01-demo.mp4】A demo showing the characteristics of the benchmark. The video is paired with annotations from 5 human annotators, the ground truth, and the outputs of 6 LVLMs.
【02-videos-for-examples-in-appendix】A folder containing 6 videos, corresponding to Figures A12-A17 in the Appendix. Please view them alongside the corresponding content in the Appendix of the PDF. We hope these videos supplement the Appendix and help readers better understand the differences in performance between humans and LVLMs on these examples.

        