Keywords: goal-conditioned reinforcement learning, language-guided reinforcement learning, rational speech acts, pragmatism
TL;DR: Building on the concepts of pedagogy and pragmatism from developmental psychology, we show how learning from language instructions can benefit from a Bayesian goal inference mechanism that reduces referential ambiguity.
Abstract: Teaching an agent to perform new tasks using natural language can easily be hindered by ambiguities in interpretation. When a teacher instructs a learner about an object by referring to its features, the learner can misunderstand the teacher's intentions, for instance if the instruction ambiguously refers to the object's features, a phenomenon called referential ambiguity. We study how two concepts derived from cognitive science can help resolve such referential ambiguities: pedagogy (selecting the right instructions) and pragmatism (learning the preferences of the other agent through inductive reasoning). We apply these ideas to a teacher/learner setup with two artificial agents on a simulated robotic block-stacking task, and show that both concepts improve the sample efficiency of training the learner.
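To make the Bayesian goal inference idea concrete, the sketch below shows a Rational Speech Acts style computation in which a listener inverts a model of an informative speaker with Bayes' rule, so an ambiguous instruction is attributed to the goal for which no more specific instruction exists. This is only an illustration under assumed inputs (the goal set, the instruction lexicon, the truth table, and the rationality parameter alpha are hypothetical), not the paper's actual setup.

```python
# Minimal RSA-style pragmatic goal inference sketch (illustrative assumptions,
# not the paper's implementation).
import numpy as np

GOALS = ["red block on blue block", "red block on green block"]
INSTRUCTIONS = ["put the red block on another block",   # ambiguous
                "put the red block on the blue block"]  # specific

# Literal semantics: TRUTH[i, g] = 1 if instruction i truthfully describes goal g.
TRUTH = np.array([[1.0, 1.0],   # ambiguous instruction fits both goals
                  [1.0, 0.0]])  # specific instruction fits only the first goal

def literal_listener(truth, prior):
    # P_L0(g | i) proportional to [[i true of g]] * P(g)
    scores = truth * prior
    return scores / scores.sum(axis=1, keepdims=True)

def pragmatic_speaker(truth, prior, alpha=4.0):
    # P_S1(i | g) proportional to P_L0(g | i)^alpha: prefer informative instructions
    util = (literal_listener(truth, prior) + 1e-12) ** alpha
    util = util.T  # shape: goals x instructions
    return util / util.sum(axis=1, keepdims=True)

def pragmatic_listener(truth, prior):
    # P_L1(g | i) proportional to P_S1(i | g) * P(g): invert the speaker with Bayes' rule
    s1 = pragmatic_speaker(truth, prior).T  # shape: instructions x goals
    scores = s1 * prior
    return scores / scores.sum(axis=1, keepdims=True)

prior = np.array([0.5, 0.5])
# The ambiguous instruction now mostly points to the goal that has no better
# description, whereas the literal listener left it at 50/50.
print(pragmatic_listener(TRUTH, prior))
```

Running the sketch, the pragmatic listener assigns most of the probability mass for the ambiguous instruction to "red block on green block", since a rational speaker intending the other goal would have used the more specific instruction; this is the kind of disambiguation the pedagogy/pragmatism mechanisms are meant to provide.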