A Self-Supervised Method for Mapping Human Instructions to Robot Policies

Hsin-Wei Yu; Po-Yu Wu; Chih-An Tsao; You-An Shen; Shih-Hsuan Lin; Zhang-Wei Hong; Yi-Hsiang Chang; Chun-Yi Lee

27 Sept 2018 (modified: 05 May 2023) | ICLR 2019 Conference Blind Submission | Readers: Everyone

Abstract: In this paper, we propose a modular approach that separates the instruction-to-action mapping procedure into two stages. The two stages are bridged by an intermediate representation called a goal, which stands for the result after a robot performs a specific task. The first stage maps an input instruction to a goal, while the second stage maps the goal to an appropriate policy selected from a set of robot policies. The policy is selected so as to guide the robot as close to the goal as possible. We implement the two stages as a framework consisting of two distinct modules: an instruction-goal mapping module and a goal-policy mapping module. Given a human instruction in the evaluation phase, the instruction-goal mapping module first translates the instruction into a robot-interpretable goal. Once the goal is derived, the goal-policy mapping module searches through the goal-policy pairs for the policy to which the instruction should be mapped. Our experimental results show that the proposed method learns an effective instruction-to-action mapping procedure in an environment with a given instruction set more efficiently than the baselines. In addition to this data efficiency, the results show that our method can be adapted to a new instruction set and a new robot action space much faster than the baselines. The evidence suggests that our modular approach leads to better adaptability and efficiency.
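The two-stage flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the lookup table standing in for the learned instruction-goal module, the representation of goals as small coordinate tuples, the policy names, and the use of Euclidean distance to pick the closest goal-policy pair are all assumptions made for illustration.

```python
import math
from typing import Dict, Sequence, Tuple

Goal = Tuple[float, ...]  # assumption: a goal is a small vector of floats
Policy = str              # assumption: a policy is identified by name only


def instruction_to_goal(instruction: str, table: Dict[str, Goal]) -> Goal:
    """Stage 1 stand-in: map an instruction to a goal.

    A plain dictionary lookup is used here in place of a learned module.
    """
    return table[instruction]


def goal_to_policy(goal: Goal, pairs: Sequence[Tuple[Goal, Policy]]) -> Policy:
    """Stage 2 stand-in: search the goal-policy pairs for the policy
    whose recorded goal lies nearest to the requested goal."""
    _, policy = min(pairs, key=lambda pair: math.dist(pair[0], goal))
    return policy


def instruction_to_policy(
    instruction: str,
    table: Dict[str, Goal],
    pairs: Sequence[Tuple[Goal, Policy]],
) -> Policy:
    """Chain the two stages: instruction -> goal -> policy."""
    return goal_to_policy(instruction_to_goal(instruction, table), pairs)


# Hypothetical data, invented for this example only.
INSTRUCTION_GOALS: Dict[str, Goal] = {
    "move forward": (1.0, 0.0),
    "move left": (0.0, 1.0),
}
GOAL_POLICY_PAIRS = [
    ((0.9, 0.1), "policy_a"),
    ((0.1, 1.2), "policy_b"),
    ((-1.0, 0.0), "policy_c"),
]

if __name__ == "__main__":
    for text in INSTRUCTION_GOALS:
        print(text, "->", instruction_to_policy(text, INSTRUCTION_GOALS, GOAL_POLICY_PAIRS))
```

Because the two stages only share the goal representation, swapping the instruction table (a new instruction set) or the goal-policy pairs (a new action space) leaves the other stage untouched, which is the kind of modularity the abstract attributes to the approach.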

Data: [MuJoCo](https://paperswithcode.com/dataset/mujoco)
