HandMeThat: Human-Robot Communication in Physical and Social Environments

Yanming Wan; Jiayuan Mao; Joshua B. Tenenbaum

Published: 17 Sept 2022, Last Modified: 23 May 2023, NeurIPS 2022 Datasets and Benchmarks, Readers: Everyone

Keywords: Pragmatic Reasoning, Goal Inference, Instruction Following

Abstract: We introduce HandMeThat, a benchmark for a holistic evaluation of instruction understanding and following in physical and social environments. While previous datasets primarily focused on language grounding and planning, HandMeThat considers the resolution of ambiguous human instructions based on physical (object states and relations) and social (human actions and goals) information. HandMeThat contains 10,000 episodes of human-robot interactions. In each episode, the robot first observes a trajectory of human actions directed toward the human's internal goal. Next, the robot receives a human instruction and should take actions to accomplish the subgoal set through the instruction. In this paper, we present a textual interface for our benchmark, where the robot interacts with a virtual environment through textual commands. We evaluate several baseline models on HandMeThat and show that both offline and online reinforcement learning algorithms perform poorly on it, suggesting significant room for future work on physical and social human-robot communication and interaction.

Author Statement: Yes

TL;DR: HandMeThat is a benchmark for evaluating instruction understanding and following in physical and social environments.

URL: http://handmethat.csail.mit.edu

Dataset Url: http://handmethat.csail.mit.edu

License: The code for the benchmark is under the MIT license: https://opensource.org/licenses/MIT

Supplementary Material: pdf

Contribution Process Agreement: Yes

In Person Attendance: Yes
