Abstract: To enhance adaptability in human-robot interaction, it is essential to model the values of human stakeholders. Traditional approaches typically learn a single reward function from human preferences, which often fails to capture the diversity in human values. To address this, I propose a shift towards distributional methods that learn a set of reward functions representing diverse human preferences. In prior collaborative work, I led the development of Pareto Optimal Preference Learning (POPL), a method that learns such a set of reward functions directly from human preferences. POPL demonstrates the ability to tailor reward functions to individuals at test time and to ensure fairness across groups. I propose that this set of reward functions can then be used to generate policies that are appropriate for new humans and environments, even if they were not explicitly present in the training distribution.
External IDs: dblp:conf/hri/Bahlous-Boldi25