- TL;DR: In this paper, we present an imitation learning approach that combines language, vision, and motion in order to synthesize natural language-conditioned control policies.
- Abstract: In this work we propose a novel end-to-end imitation learning approach which combines natural language, vision, and motion information to produce an abstract representation of a task, which in turn can be used to synthesize specific motion controllers at run-time. This multimodal approach enables generalization to a wide variety of environmental conditions and allows an end-user to influence a robot policy through verbal communication. We empirically validate our approach with an extensive set of simulations and show that it achieves a high task success rate over a variety of conditions while remaining amenable to probabilistic interpretability.
- Keywords: robot learning, imitation learning, natural language processing