Keywords: preference learning, policy alignment, representation learning
Abstract: While imitation learning-based methods have gained popularity for enabling robots to perform complex tasks, they lack the flexibility to adapt to user preferences and offer limited interpretability. Drawing insights from the feature-based reward learning literature, which emphasizes alignment with human intent, we propose a novel abstraction called a "preference boundary" to represent preferences in a reusable form. We also propose a method for validating alignment with the user's preferences and provide preliminary results evaluating these methods. We conclude with a discussion of insights, next steps, and limitations.
Submission Number: 8