Keywords: Egocentric Social Reasoning, First-Person Video Understanding, Embodied AI, Causal Inference, Multimodal Large Language Models
TL;DR: An egocentric video reasoning challenge evaluating social intelligence through multi-dimensional causal and intent inference.
Abstract: The Egocentric Language-Vision Interactive Network Knowledge (EgoLink) Challenge redefines the cognitive boundaries of embodied agents in social contexts. While Embodied AI ultimately aims to perceive and interact from an egocentric perspective, current research predominantly emphasizes physical navigation and neglects deep social understanding. EgoLink introduces a large-scale, real-world egocentric benchmark that uses a multi-dimensional Multiple-Choice Question (MCQ) format to evaluate models' reasoning about emotions, causal logic, and behavioral intents in human interactions. The challenge bridges the gap between perception and social cognition, advancing Embodied AI toward socially aware general intelligence.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 10