Comparing Comparisons: Informative and Easy Human Feedback with Distinguishability Queries

Published: 01 May 2025, Last Modified: 12 Aug 2025 · ICML 2025 poster · CC BY 4.0
Abstract: Learning human objectives from preference feedback has significantly advanced reinforcement learning (RL) in domains where objectives are hard to formalize. However, traditional methods based on pairwise trajectory comparisons face notable challenges: trajectories with only subtle differences are hard to compare, and the feedback conveys only ordinal information, which precludes direct inference of preference strength. In this paper, we introduce a novel *distinguishability query* that enables humans to express preference strength by comparing two pairs of trajectories. Labelers first indicate which of the two pairs is easier to distinguish, then provide preference feedback only on that easier pair. This query type directly captures preference strength and is expected to reduce the labeler's cognitive load. We further connect the query to cardinal utility and difference relations, and we develop an efficient query selection scheme that achieves a better trade-off between query informativeness and easiness. Experimental results demonstrate the potential of our method for faster, data-efficient learning and improved user-friendliness on RLHF benchmarks, particularly in classical control settings where preference strength is critical for expected utility maximization.
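The abstract describes the query only at a high level; the minimal Python sketch below (not the authors' implementation; all names, the simulated labeler, and the constraint encoding are illustrative assumptions) shows one way a distinguishability query could be structured: the labeler first reports which of two trajectory pairs is easier to tell apart, then gives an ordinal preference on that easier pair, and both answers can be read as constraints on the learned utility.

```python
# Illustrative sketch of a distinguishability query (hypothetical names;
# not the paper's implementation). Trajectories are referred to by id and
# summarized by a scalar utility known only to the simulated labeler.

from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class DistinguishabilityQuery:
    pair_1: Tuple[str, str]  # trajectory ids (tau_a, tau_b)
    pair_2: Tuple[str, str]  # trajectory ids (tau_c, tau_d)


@dataclass
class QueryAnswer:
    easier_pair: int  # 1 or 2: which pair was easier to distinguish
    preferred: str    # preferred trajectory id within the easier pair


def simulated_labeler(query: DistinguishabilityQuery,
                      true_utility: Dict[str, float]) -> QueryAnswer:
    """Noise-free stand-in for a human: 'easier to distinguish' is modeled
    as the pair with the larger absolute utility gap."""
    gap_1 = abs(true_utility[query.pair_1[0]] - true_utility[query.pair_1[1]])
    gap_2 = abs(true_utility[query.pair_2[0]] - true_utility[query.pair_2[1]])
    easier = 1 if gap_1 >= gap_2 else 2
    a, b = query.pair_1 if easier == 1 else query.pair_2
    preferred = a if true_utility[a] >= true_utility[b] else b
    return QueryAnswer(easier_pair=easier, preferred=preferred)


def answer_as_constraints(query: DistinguishabilityQuery, answer: QueryAnswer):
    """Read one answer as two constraints on a learned utility u:
    (i) ordinal:    u(preferred) >= u(other trajectory in the easier pair);
    (ii) difference: |u-gap of easier pair| >= |u-gap of the other pair|."""
    easier = query.pair_1 if answer.easier_pair == 1 else query.pair_2
    other = query.pair_2 if answer.easier_pair == 1 else query.pair_1
    losing = easier[0] if easier[1] == answer.preferred else easier[1]
    return {
        "ordinal": (answer.preferred, losing),  # u(preferred) >= u(losing)
        "difference": (easier, other),          # gap(easier) >= gap(other)
    }


if __name__ == "__main__":
    # Hypothetical ground-truth utilities for four trajectories.
    true_u = {"tau_a": 3.0, "tau_b": 1.0, "tau_c": 2.2, "tau_d": 2.0}
    q = DistinguishabilityQuery(pair_1=("tau_a", "tau_b"),
                                pair_2=("tau_c", "tau_d"))
    ans = simulated_labeler(q, true_u)
    print(ans)                           # easier_pair=1, preferred='tau_a'
    print(answer_as_constraints(q, ans))
```

In this reading, the second ("difference") constraint is what carries the cardinal, preference-strength information the abstract refers to, while the first constraint is the usual ordinal comparison; in the actual method the labeler is a human rather than a utility oracle.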
Lay Summary: When training artificial intelligence (AI) systems such as robots to behave the way people want, researchers often ask humans to pick their favorite of two short examples of AI behavior. But this approach has two major issues: similar behaviors are difficult to compare, and the answer does not show how strongly someone prefers one over the other. This makes it harder for the AI to learn effectively. We introduce a new kind of question, called a distinguishability query. Instead of judging a single pair of behaviors, people are shown two pairs and asked which is easier to judge. They then give feedback only on that easier pair. This small change helps in two big ways: it gives the AI more insight into how strong a preference is, and it reduces the effort required from people. We test this method on tasks where robots learn to move or manipulate objects. Our system not only learns faster than previous methods, but also asks questions that are easier for people to answer. This approach brings us closer to building AI that can quickly and efficiently learn what humans truly value, with less frustration for those providing feedback.
Primary Area: Reinforcement Learning->Deep RL
Keywords: Reinforcement Learning from Human Feedback, Preference-based Reinforcement Learning, Human-in-the-loop Machine Learning
Submission Number: 15643