MimeQA: Towards Socially-Intelligent Nonverbal Foundation Models

Published: 18 Sept 2025, Last Modified: 30 Oct 2025, NeurIPS 2025 Datasets and Benchmarks Track (poster), License: CC BY-NC-SA 4.0
Keywords: Computational Social Science, Visual Question Answering, Data Sets or Data Repositories
TL;DR: We present MimeQA, a question-answering dataset on mime videos, to evaluate the nonverbal social reasoning capabilities of video LLMs; we find that models perform well below human accuracy.
Abstract: As AI becomes more closely integrated with people's daily activities, socially intelligent AI that can understand and interact seamlessly with humans in everyday life is increasingly important. However, current work on AI social reasoning relies on language-only or language-dominant approaches to benchmarking and training models, resulting in systems that are improving at verbal communication but struggle with nonverbal social understanding. To address this limitation, we tap into a novel data source rich in nonverbal social interactions -- mime videos. Mime is the art of expression through gesture and movement without spoken words, which presents unique challenges and opportunities for interpreting nonverbal social communication. We contribute a new dataset, MimeQA, built by sourcing ~8 hours of video clips from YouTube and developing a comprehensive video question-answering benchmark of 806 carefully annotated and verified question-answer pairs designed to probe nonverbal social reasoning capabilities. Using MimeQA, we evaluate state-of-the-art video large language models (VideoLLMs) and find that they achieve low accuracy, generally ranging from 20-30%, while humans score 86%. Our analysis reveals that VideoLLMs often fail to ground imagined objects and over-rely on the text prompt while ignoring subtle nonverbal interactions. We hope to inspire future work on AI models that embody true social intelligence, capable of interpreting nonverbal human interactions.
Croissant File: json
Dataset URL: https://huggingface.co/datasets/hzli1202/MimeQA
Code URL: https://github.com/MIT-MI/MimeQA
Primary Area: Datasets & Benchmarks for applications in language modeling and vision language modeling
Submission Number: 1008
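
For quick reference, below is a minimal sketch of loading MimeQA from the Hugging Face Hub repository listed in the Dataset URL above. It assumes only the standard `datasets` library; the split and column names are not hard-coded, since the exact schema should be checked against the dataset card rather than taken from this sketch.

```python
# Minimal sketch: load MimeQA from the Hugging Face Hub and inspect its schema.
# Assumes the `datasets` library (pip install datasets). The repo id comes from
# the Dataset URL above; split and column names are printed rather than assumed,
# because the actual schema is defined by the dataset card.
from datasets import load_dataset

ds = load_dataset("hzli1202/MimeQA")   # returns a DatasetDict keyed by split
print(ds)                              # list available splits and their sizes

split = next(iter(ds.values()))        # take the first available split
print(split.column_names)              # inspect the actual column names
print(split[0])                        # show one annotated question-answer record
```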