Keywords: language advice, human in the loop, human interaction, reinforcement learning
Abstract: Natural language advice has the potential to accelerate reinforcement learning, but efficiently utilizing diverse and highly detailed forms of language remains unsolved. Existing methods focus on mapping natural language to individual elements of MDPs, such as reward functions or policies, but limit the scope of language they consider in order to make such mappings possible. We propose to leverage language advice by translating sentences into a grounded formal language capable of expressing information about every element of an MDP and its solution, including policies, plans, reward functions, and transition functions. We also introduce a new model-based reinforcement learning algorithm, RLang-Dyna-Q, capable of leveraging all such advice, and demonstrate in two sets of experiments that grounding language to every element of an MDP leads to significant performance gains. In additional symbol-grounding demonstrations, we show how vision-language models can annotate important structure in the environment in the form of RLang vocabulary files, eliminating the need for human labels.
Primary Area: reinforcement learning
Supplementary Material: zip
Submission Number: 1815