Submission Type: Short Papers (up to 4 pages)
Keywords: Video Generation, Video Understanding, AIGC Detection
TL;DR: We present the first ASMR-focused benchmark posing new challenges for frontier video generation and understanding.
Abstract: Recent video generation models can produce increasingly realistic videos, making synthetic content harder to distinguish from real footage. Existing benchmarks mainly emphasize semantic alignment or coarse physical plausibility, offering limited sensitivity to subtle realism failures. We present VideoASMR-Bench, a preliminary benchmark based on ASMR videos, a domain that naturally requires fine-grained audio–visual synchronization, material realism, and sensory consistency. The benchmark contains 1,500 real ASMR clips curated from social media and 2,235 synthetic counterparts generated by contemporary video generation models. Using a binary real-versus-fake judgment task, we conduct an initial evaluation of representative video-language models (VLMs) and human annotators. Our early findings show that even strong proprietary VLMs still lag behind humans in detecting AI-generated ASMR videos, while audio cues provide clear gains in authenticity judgment. These results suggest that ASMR is a sensitive and underexplored testbed for evaluating both video realism and multimodal video understanding.
Submission Number: 6