Continual Learning for Instruction Following from Realtime Feedback

Published: 21 Sept 2023, Last Modified: 02 Nov 2023, NeurIPS 2023 (spotlight)
Keywords: continual learning, interaction, instruction following, user feedback, natural language processing, language grounding, situated interaction, collaboration
TL;DR: We design and deploy a learning approach for improving instruction-following agents from feedback provided in real time during collaborative interactions with human users.
Abstract: We propose and deploy an approach to continually train an instruction-following agent from feedback provided by users during collaborative interactions. During interaction, human users instruct an agent using natural language, and provide realtime binary feedback as they observe the agent following their instructions. We design a contextual bandit learning approach, converting user feedback to immediate reward. We evaluate through thousands of human-agent interactions, demonstrating a 15.4% absolute improvement in instruction execution accuracy over time. We also show that our approach is robust to several design variations, and that the feedback signal is roughly equivalent to the learning signal of supervised demonstration data.
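The abstract's core idea, treating each instruction execution as a contextual bandit round in which binary user feedback is mapped to an immediate reward, can be illustrated with a minimal sketch. This is a hypothetical illustration, not the authors' implementation: the `BanditPolicy` class, its linear featurization, and the +1/-1 reward mapping are all assumptions made for clarity, and the update is a standard REINFORCE-style policy-gradient step on the chosen action's log-probability.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

class BanditPolicy:
    """Hypothetical linear policy updated from binary user feedback."""

    def __init__(self, n_features, n_actions, lr=0.1):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr

    def probs(self, context):
        logits = [sum(wi * xi for wi, xi in zip(row, context))
                  for row in self.w]
        return softmax(logits)

    def act(self, context, rng):
        # Sample an action from the current policy distribution.
        p = self.probs(context)
        return rng.choices(range(len(p)), weights=p)[0]

    def update(self, context, action, feedback):
        # Map binary feedback to an immediate reward: +1 approve, -1 disapprove
        # (the specific mapping is an assumption for this sketch).
        reward = 1.0 if feedback else -1.0
        p = self.probs(context)
        # Policy-gradient step: grad of log pi(a|x) w.r.t. row k is
        # (1[k == a] - p_k) * x.
        for k, row in enumerate(self.w):
            coeff = self.lr * reward * ((1.0 if k == action else 0.0) - p[k])
            for i in range(len(row)):
                row[i] += coeff * context[i]
```

In this sketch, repeated rounds in which one action consistently receives positive feedback shift the policy's probability mass toward that action, mirroring how the paper's agent improves from realtime feedback over many interactions.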
Supplementary Material: zip
Submission Number: 8531