Abstract: This paper investigates the integration of response time data into human preference learning frameworks for more effective reward model elicitation. While
binary preference data has become fundamental to fine-tuning foundation models and other large-scale generative systems, the valuable temporal
information inherent in user decision-making remains largely unexploited. We
propose novel methodologies to incorporate response time information alongside
binary choice data, leveraging the EZ drift-diffusion model of evidence
accumulation, under which response time is informative of preference strength. We
develop Neyman-orthogonal loss functions that achieve oracle convergence rates
for reward model learning, matching the theoretically optimal rates that would be
attained if the expected response time for each query were known a priori. Our
theoretical analysis demonstrates that for linear reward functions, conventional
preference learning suffers from error rates that scale exponentially with reward
magnitude. In contrast, our response-time-augmented approach reduces this to
polynomial scaling, a significant improvement in sample efficiency.
We extend these guarantees to non-parametric reward function spaces, establishing
convergence properties for more complex, realistic reward models. An extensive
set of experiments validates our theoretical findings in the context of
preference learning over images.
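
To make the mechanism concrete, the following is a minimal sketch (our illustration under assumed symbols, not the paper's code or experiments). Under the symmetric drift-diffusion process underlying the EZ model, with drift equal to the reward gap delta and absorbing barriers at ±a, standard first-passage results link both the choice probability and the expected response time to delta. The toy moment estimator below stands in for, but is not, the Neyman-orthogonal loss developed in the paper.

```python
# Minimal sketch (illustrative assumptions, not the paper's code). For a
# symmetric drift-diffusion model with drift delta = r(x1) - r(x2), absorbing
# barriers at +/- a, and unit diffusion, standard first-passage results give
#   P(choose x1) = 1 / (1 + exp(-2*a*delta))     (a Bradley-Terry-style sigmoid)
#   E[T]         = (a / delta) * tanh(a * delta)
# so E[choice] / E[T] = delta / a: response time carries preference-strength signal.
import numpy as np

def simulate_ddm(delta, a=1.0, dt=1e-3, n=1, rng=None):
    """Euler-Maruyama simulation of choices c in {-1, +1} and decision times T."""
    rng = np.random.default_rng() if rng is None else rng
    choices, times = np.empty(n), np.empty(n)
    for i in range(n):
        x, t = 0.0, 0.0
        while abs(x) < a:                        # diffuse until a barrier is hit
            x += delta * dt + np.sqrt(dt) * rng.standard_normal()
            t += dt
        choices[i], times[i] = np.sign(x), t
    return choices, times

def estimate_choice_only(c, a):
    """Invert the sigmoid from binary choices alone; saturates as p -> 1."""
    p = np.clip((c > 0).mean(), 1e-6, 1 - 1e-6)  # clip keeps the logit finite
    return np.log(p / (1 - p)) / (2 * a)

def estimate_with_rt(c, T, a):
    """Toy moment estimator a * mean(c) / mean(T), justified by
    E[c] / E[T] = delta / a; a stand-in for, not, the paper's orthogonal loss."""
    return a * c.mean() / T.mean()

rng = np.random.default_rng(0)
a, n = 1.0, 500
for delta in (0.5, 2.0, 4.0):                    # small to large reward gaps
    c, T = simulate_ddm(delta, a, n=n, rng=rng)
    print(f"delta={delta:.1f}  choice-only={estimate_choice_only(c, a):+.2f}  "
          f"with-RT={estimate_with_rt(c, T, a):+.2f}")
```

For large reward gaps the empirical choice frequency saturates near one, so the choice-only logit estimate is often driven by the clipping constant, while the estimate that also uses response times remains stable; this mirrors the exponential-versus-polynomial sample-efficiency contrast claimed above.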