Keywords: Emotional and Social Intelligence (ESI), AI Benchmark, Semi-automated Data Generation Framework, ESI-Bench
TL;DR: To address the shortcomings of existing benchmarks for AI emotional and social intelligence, this work proposes a high-efficiency, semi-automated framework for generating high-quality datasets and introduces ESI-Bench, a benchmark for improved evaluation that aims to advance human-AI interaction.
Abstract: Recent work has increasingly focused on the evolution and intelligent modeling of interpersonal interaction and collaboration. Driven by advances in Multimodal Large Language Models (MLLMs), emotional intelligence (EI) and social intelligence (SI) have emerged as core competencies for AI systems: they enable agents to modulate behavior in complex contexts, infer others’ intentions, maintain social relationships, and ultimately support natural human-machine interaction and seamless collaboration. To systematically investigate AI capabilities and pathways for understanding EI and SI, the community has introduced benchmarks such as EQ-Bench, Social-IQ 2.0, and V-Social, advancing research on emotion understanding, social behavior modeling, and social common sense reasoning. However, the datasets generated by existing approaches exhibit limited semantic separability between options, low question-answer relevance (QA relevance), low dataset complexity (as reflected in extremely high model accuracy), low ground-truth correctness, narrow modality coverage, and pronounced inherent biases. Meanwhile, these data construction pipelines suffer from high annotation costs and lengthy data-collection cycles. To address the shortcomings of existing evaluation datasets, we introduce ESI-Bench, a benchmark comprising 1,105 videos and 5,490 meticulously generated QA pairs. It offers accurate cross-modal alignment, high semantic separability, strong QA relevance, reliable ground truths, and substantially reduced inherent bias, enabling clear performance stratification across state-of-the-art (SOTA) models. We also propose a semi-automated, high-efficiency data generation framework. Our framework integrates multiple models (open-source and closed-source) with complementary strengths and couples them with a lightweight manual verification loop, enabling low-cost, large-scale construction of high-quality emotional and social intelligence datasets.
This work provides a scalable paradigm for constructing rigorous emotional and social intelligence (ESI) evaluations and aims to advance research toward more capable human-AI interaction.
Primary Area: datasets and benchmarks
Submission Number: 8840