Towards Embodied Agent Intent Explanation in Human-Robot Collaboration: ACT Error Analysis and Solution Conceptualization
Workshop Statement: Robot Learning from Demonstration (LfD) has become a widely used approach for defining robot behaviors, enabling the application of robots in dynamic human environments. However, actions generated by large-scale models often lack transparency, making it difficult for humans to anticipate or verify robot decisions. This opacity raises concerns about safety, trust, and effective collaboration. To address these challenges, we introduce \textbf{Contextual Robot Intent Explanation (CRIE)}, a \textbf{model-agnostic} and \textbf{real-time} framework that leverages \textbf{contextual information, such as changes in the environment resulting from robot actions, rather than only analyzing how actions generated by embodied agents evolve in time and space}, to infer intent and provide explanations to human collaborators. This approach \textbf{reduces ambiguity in robot behavior and enables human-in-the-loop verification before the robot acts}. To evaluate CRIE, we train policies using the \textbf{Action Chunking Transformer (ACT) and diffusion models} on two manipulation tasks, \textbf{object handover and robot-assisted medication dispensing}, using a dataset collected from human demonstrations. We evaluate performance under three conditions: \textbf{(1) CRIE}, where humans collaborate with ACT- and diffusion-based embodied agents while receiving \textbf{intent explanations} from CRIE; \textbf{(2) standard execution}, where humans collaborate with the same agents without CRIE; and \textbf{(3) human baseline}, where we compare performance against the human demonstrations used for training. \textbf{For all conditions, we measure performance using safety compliance, task success rate, and task completion time.} Our results show that CRIE improves interpretability and facilitates safety verification while maintaining task efficiency. This work highlights the potential of intent-explanation frameworks to enhance human-robot collaboration by making learned robot policies more understandable and predictable.
Keywords: human-robot collaboration, human-robot interaction, robot explanation, error analysis
Abstract: Collaborative robots must not only perform competently but also communicate their intent transparently to ensure safety and efficiency in shared task environments. However, state-of-the-art robot policies such as the Action Chunking Transformer (ACT) are opaque, which can make it difficult for human partners to interpret or predict the robot's actions and intent, hindering task coordination.
To confirm this, we conducted a two-condition comparative study in a collaborative medication-dispensing scenario, showing that inaccurate estimation of an ACT robot's intent led to miscoordination, duplicate medicine retrievals, and safety risks such as simultaneous access to shared shelf space. Specifically, we trained an ACT agent on human-human demonstration data and tested it in a human-agent condition. Compared to the human-human baseline, the opaque agent showed a 36\% relative drop in task success (from 97\% to 62\%), a 17-fold increase in safety incidents (from 2\% to 34\%), namely simultaneous access to shared shelf space and incorrect medication deliveries, and a 44\% increase in task completion time (from 18s to 26s). These results point to critical coordination breakdowns caused by the absence of transparent intent communication.
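For clarity, the relative changes reported above can be reproduced directly from the raw figures; this is a worked check using only the numbers stated in this abstract:
\begin{align*}
\text{success drop} &= \frac{97\% - 62\%}{97\%} \approx 36\%, \\
\text{safety-incident increase} &= \frac{34\%}{2\%} = 17\times, \\
\text{completion-time increase} &= \frac{26\,\text{s} - 18\,\text{s}}{18\,\text{s}} \approx 44\%.
\end{align*}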
In this work in progress, we thus conceptualize CRIE (Contextual Robot Intent Explanation), a model-agnostic framework that predicts robot intent and explains it in natural language without modifying the underlying policy. By analyzing multimodal contextual features, such as task phase, spatial configuration, and action trajectories, CRIE aims to provide real-time transparency about the robot's future actions. We expect our results to demonstrate how contextual, policy-agnostic intent explanations help close the gap between high-performing but opaque policies and transparent, human-compatible robot teamwork (a minimal sketch of the envisioned pipeline follows).
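To make the envisioned pipeline concrete, the following is a minimal, hypothetical sketch of the CRIE loop; all names and the rule-based intent heuristic are illustrative assumptions, not the authors' implementation:
\begin{verbatim}
# Hypothetical sketch of the CRIE pipeline (illustrative names throughout).
from dataclasses import dataclass

@dataclass
class Context:
    """Assumed multimodal context: task phase, spatial state, planned actions."""
    task_phase: str       # e.g. "reach", "grasp", "deliver"
    gripper_xyz: tuple    # current end-effector position (m)
    planned_chunk: list   # next action chunk produced by the frozen policy
    nearby_objects: list  # object labels from perception, nearest first

def infer_intent(ctx: Context) -> str:
    """Rule-based stand-in for CRIE's intent predictor. Model-agnostic:
    it reads only the policy's planned actions and the scene context,
    never the policy's weights."""
    dz = ctx.planned_chunk[-1][2] - ctx.gripper_xyz[2]
    if ctx.task_phase == "reach" and ctx.nearby_objects:
        return f"move toward the {ctx.nearby_objects[0]}"
    if dz > 0.05:  # planned chunk ends noticeably higher: likely a lift
        return "lift the held object"
    return "hold position"

def explain(ctx: Context) -> str:
    """Natural-language explanation issued before the chunk executes,
    enabling human-in-the-loop verification."""
    return f"I am about to {infer_intent(ctx)}; confirm before I proceed."

ctx = Context(task_phase="reach",
              gripper_xyz=(0.40, 0.00, 0.20),
              planned_chunk=[[0.45, 0.00, 0.20], [0.50, 0.05, 0.20]],
              nearby_objects=["medication bottle"])
print(explain(ctx))
# -> I am about to move toward the medication bottle; confirm before I proceed.
\end{verbatim}
A full CRIE predictor would presumably replace the hand-written rules with a learned model over these same contextual features; the sketch only illustrates the policy-agnostic structure of the explanation loop.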
Submission Number: 33