Abstract: A key challenge in reward learning from human input is that desired agent behavior often changes based on context. For example, a robot must adapt to avoid a stove once it becomes hot. We observe that while high-level preferences (e.g., prioritizing safety over efficiency) often remain constant, context alters the saliency (or importance) of reward features. For instance, stove heat changes the relevance of the robot's proximity to the stove, not the underlying preference for safety. Moreover, these contextual effects recur across tasks, motivating the need for transferable representations that encode them. Existing multi-task and meta-learning methods learn representations and task preferences simultaneously, capturing contextual effects only implicitly at best and requiring substantial data to separate them from task-specific preferences. Instead, we propose explicitly modeling and learning context-dependent feature saliency separately from context-invariant preferences. We introduce calibrated features, modular representations that capture contextual effects on feature saliency, and present specialized paired comparison queries that isolate saliency from preference for efficient learning. Simulated experiments show that our method improves sample efficiency, requiring 10x fewer preference queries than baselines to achieve equivalent reward accuracy and performing up to 15% better in low-data regimes (5–10 queries). An in-person user study (N=12) demonstrates that participants can effectively teach their personal contextual preferences with our method, enabling adaptable and personalized reward learning.
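The separation between context-invariant preferences and context-dependent saliency can be illustrated with a minimal sketch. It assumes a linear reward in which each feature is first scaled by a saliency value that depends on context and then weighted by a fixed preference weight, with paired comparisons modeled by a Bradley-Terry likelihood. The feature names, the saliency rule, the weights, and the comparison model are all hypothetical choices made for illustration; they are not the paper's stated formulation.

```python
import math

# Hypothetical features for the stove example (illustrative assumption).
FEATURES = ["dist_to_stove", "path_length"]

# Context-invariant preference weights (assumed values): staying far from the
# stove is good (positive weight), long paths are bad (negative weight).
PREF_WEIGHTS = {"dist_to_stove": 2.0, "path_length": -1.0}


def saliency(feature: str, context: dict) -> float:
    """Hypothetical context-dependent saliency in [0, 1].

    Assumption: distance to the stove only matters when the stove is hot;
    path length is always fully salient.
    """
    if feature == "dist_to_stove":
        return 1.0 if context["stove_hot"] else 0.0
    return 1.0


def reward(phi: dict, context: dict) -> float:
    """Reward = sum_i w_i * saliency_i(context) * phi_i(trajectory)."""
    return sum(PREF_WEIGHTS[f] * saliency(f, context) * phi[f] for f in FEATURES)


def pref_prob(phi_a: dict, phi_b: dict, context: dict, beta: float = 1.0) -> float:
    """Bradley-Terry probability that trajectory A is preferred over B."""
    diff = reward(phi_a, context) - reward(phi_b, context)
    return 1.0 / (1.0 + math.exp(-beta * diff))


# Two hypothetical trajectories, summarized by their feature values.
safe = {"dist_to_stove": 1.0, "path_length": 3.0}  # detour away from the stove
fast = {"dist_to_stove": 0.2, "path_length": 2.0}  # short path past the stove

cold = {"stove_hot": False}
hot = {"stove_hot": True}

# The preference weights never change; only the saliency of distance does.
p_cold = pref_prob(safe, fast, cold)  # below 0.5: efficiency dominates
p_hot = pref_prob(safe, fast, hot)    # above 0.5: distance becomes salient
print(p_cold, p_hot)
```

In this sketch the same `PREF_WEIGHTS` produce opposite choices in the two contexts purely because `saliency` changes, which mirrors the abstract's claim that heat alters the relevance of proximity rather than the preference for safety.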
External IDs: dblp:conf/hri/Forsey-SmerekSB26