Enabling End Users to Program Robots Using Reinforcement Learning

Tewodros W. Ayalew, Jennifer Wang, Michael L. Littman, Blase Ur, Sarah Sebo

Published: 01 Jan 2025, Last Modified: 21 May 2025, HRI 2025, CC BY-SA 4.0

Abstract: Reinforcement learning (RL) is a powerful learning technique in robotics, where people can specify rewards that robots learn how to maximize through a process of trial and error. Despite the numerous advantages of RL for robot programming, to our knowledge no approaches have sought to enable non-technical users to specify RL programs for robots. In this work, we designed two novel RL-based robot programming paradigms for non-technical users: Full MDP Programming (Full-MDP) and Goal-Only MDP Programming (Goal-MDP). To evaluate the efficacy of these two approaches, we ran a between-subjects online user study ($N = 409$) where participants were asked to program a simulated robot to complete example household tasks (e.g., delivering coffee) using one of our RL programming paradigms or one of two commonly used baselines: Sequential Programming (Seq) or Trigger-Action Programming (TAP). While users neither performed well nor reported positive experiences with the Full-MDP interface, user performance and experience with Goal-MDP were similar to the baselines (Seq and TAP), with significantly shorter programs. These results demonstrate that RL-based paradigms like Goal-MDP are a viable alternative to more traditional approaches and provide a starting point for robot programming interfaces that allow end users to leverage the myriad benefits of RL for programming robots.