Keywords: preference learning, policy alignment, representation learning
Abstract: While imitation learning-based methods have gained popularity for enabling robots to perform complex tasks, they lack the flexibility to adapt to user preferences and offer limited interpretability. Drawing insights from the feature-based reward learning literature, which emphasizes alignment with human intent, we propose a novel abstraction called a "preference boundary" to represent preferences in a reusable form. We also propose a method for validating alignment with the user's preferences and provide preliminary results evaluating these methods. We conclude with a discussion of insights, next steps, and limitations.
Submission Number: 8