Improving the Transparency of Robot Policies Using Demonstrations and Reward Communication

Published: 20 Aug 2025, Last Modified: 02 Feb 2026, ACM Transactions on Human-Robot Interaction, Everyone, CC BY 4.0
Abstract: Demonstrations are a powerful way to teach robot decision-making to humans. Although informative demonstrations may be selected a priori using the machine teaching framework, student learning may deviate from the pre-selected curriculum in situ. This article thus explores augmenting a curriculum of pre-selected demonstrations with a closed-loop teaching framework inspired by principles from the education literature, such as the zone of proximal development and the testing effect. We utilize tests accordingly to close the loop and maintain a novel particle filter model of human beliefs throughout the learning process, allowing us to provide demonstrations that are targeted at the human’s current understanding in real time. A user study finds that our proposed closed-loop teaching framework reduces the regret (i.e., the suboptimality) of human test responses by 43% over an open-loop baseline. We also compare our closed-loop teaching framework against another baseline of directly communicating the robot’s reward function in a second user study. We find that our closed-loop teaching outperforms direct reward communication by 64%, but we also observe synergies from the use of both teaching forms. Finally, we observe strong interaction effects between the teaching form and the domains considered in both user studies, seeing increased learning outcomes from well-designed demonstration-based teaching in the more challenging domain.
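To make the closed-loop idea more concrete, the following is a minimal sketch of how a particle filter over a human's beliefs about the robot's reward could be updated from a test response, and how the regret of that response could be scored. The abstract does not specify these details, so everything here is an assumption made for illustration: the linear reward over trajectory features, the softmax (Boltzmann) response likelihood, the `beta` and `noise` parameters, and all function names are hypothetical and are not taken from the article.

```python
import math
import random


def trajectory_reward(weights, features):
    """Linear reward (assumed): dot product of reward weights and feature counts."""
    return sum(w * f for w, f in zip(weights, features))


def regret(true_weights, options, chosen):
    """Suboptimality of the chosen trajectory under the robot's true reward."""
    best = max(trajectory_reward(true_weights, f) for f in options)
    return best - trajectory_reward(true_weights, options[chosen])


def update_beliefs(particles, belief, options, chosen, beta=5.0):
    """Reweight particles by the likelihood of the observed test response.

    Each particle is a candidate reward-weight vector the human might hold.
    The softmax choice model with rationality `beta` is an assumption.
    """
    posterior = []
    for weights, prior in zip(particles, belief):
        scores = [beta * trajectory_reward(weights, f) for f in options]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        likelihood = exps[chosen] / sum(exps)
        posterior.append(prior * likelihood)
    total = sum(posterior)
    return [p / total for p in posterior]


def resample(particles, belief, rng, noise=0.05):
    """Draw particles in proportion to belief, then jitter to keep diversity."""
    drawn = rng.choices(particles, weights=belief, k=len(particles))
    jittered = [[x + rng.gauss(0.0, noise) for x in w] for w in drawn]
    uniform = [1.0 / len(particles)] * len(particles)
    return jittered, uniform


if __name__ == "__main__":
    rng = random.Random(0)
    true_weights = [1.0, -1.0]          # hypothetical robot reward
    options = [[1.0, 0.0], [0.0, 1.0]]  # feature counts of two test trajectories
    particles = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
    belief = [1.0 / len(particles)] * len(particles)

    # The human picks option 0 on a test; shift belief toward consistent particles.
    belief = update_beliefs(particles, belief, options, chosen=0)
    particles, belief = resample(particles, belief, rng)
    print("regret of response:", regret(true_weights, options, chosen=0))
```

In a closed-loop setting along these lines, the updated particle set would inform which demonstration to show next, for instance one that targets the reward hypotheses the human still appears to hold incorrectly. How the article actually selects demonstrations is not stated in the abstract.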