EgoErrorVQA: Assess Egocentric Comprehension Capabilities Through Procedural Errors For Ego-Agentic AI
Keywords: Egocentric Vision, AI Agent, VQA, Procedural Comprehension.
Abstract: An increasing number of intelligent systems interact with daily human activities, making robust egocentric visual information processing essential. However, existing benchmarks for visual agents and Vision-Language Models (VLMs) primarily focus on third-person perspectives or capture only short-term visual understanding, limiting their ability to model long-horizon, action-centric procedures. To bridge this gap, we propose EgoErrorVQA, the first visual question answering (VQA) task designed for egocentric procedure comprehension with explicit modeling of procedural errors that reflect common execution failures in real-world tasks. EgoErrorVQA evaluates a range of models using both open-ended and multiple-choice questions, revealing persistent weaknesses in handling procedures with stepwise logical dependencies. In addition, we develop a user-friendly evaluator agent based on the Agent2Agent (A2A) protocol, enabling rigorous and standardized evaluation of visual agents through VQA-based interaction. Finally, we introduce EgoError-CoT, a training-free framework that improves reasoning through in-context learning and task-specific chain-of-thought prompting, delivering consistent gains without additional training.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, automatic creation and evaluation of language resources, evaluation methodologies, evaluation, metrics.
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Data resources
Languages Studied: English
Submission Number: 3170