Rating-based Reinforcement Learning

Devin White, Mingkang Wu, Ellen Novoseller, Vernon Lawhern, Nicholas R Waytowich, Yongcan Cao

Published: 24 Mar 2024, Last Modified: 05 May 2026, Venue: Proceedings of the AAAI Conference on Artificial Intelligence, Readers: Everyone, License: CC BY 4.0

Abstract: This paper develops a novel rating-based reinforcement learning (RbRL) approach that uses human ratings to provide human guidance in reinforcement learning. Unlike existing preference-based and ranking-based reinforcement learning paradigms, which rely on human relative preferences over sample pairs, RbRL relies on human evaluations of individual trajectories and requires no relative comparison between sample pairs. RbRL builds on a new prediction model for human ratings and a novel multiclass loss function. Finally, we conduct several experimental studies with both synthetic ratings and real human ratings to evaluate the performance of RbRL.
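
To make the idea of a rating prediction model paired with a multiclass loss more concrete, the following is a minimal sketch of one way such a pairing could look. It is an illustration under stated assumptions and is not taken from the abstract: the normalization of a trajectory's predicted return to [0, 1], the class boundaries, the sharpness parameter k, and the function names rating_probs and rating_loss are all hypothetical choices made here.

```python
import math


def rating_probs(norm_return, boundaries, k=10.0):
    """Return a probability for each rating class (illustrative assumption).

    norm_return: a trajectory's predicted return, assumed normalized to [0, 1].
    boundaries:  sorted list of n + 1 values delimiting n rating classes.
    k:           hypothetical sharpness parameter for the class scores.
    """
    # The score is positive when norm_return lies inside [lo, hi]
    # and negative when it lies outside that interval.
    scores = [
        -k * (norm_return - lo) * (norm_return - hi)
        for lo, hi in zip(boundaries[:-1], boundaries[1:])
    ]
    # Numerically stable softmax over the class scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rating_loss(norm_returns, ratings, boundaries, k=10.0):
    """Mean multiclass cross-entropy against human rating labels.

    ratings: integer class indices (0 = lowest rating), one per trajectory.
    """
    loss = 0.0
    for r, y in zip(norm_returns, ratings):
        probs = rating_probs(r, boundaries, k)
        loss -= math.log(probs[y])
    return loss / len(ratings)


if __name__ == "__main__":
    # Three hypothetical rating classes with evenly spaced boundaries.
    bounds = [0.0, 1.0 / 3.0, 2.0 / 3.0, 1.0]
    print(rating_probs(0.1, bounds))
    print(rating_loss([0.1, 0.9], [0, 2], bounds))
```

In this sketch each trajectory is scored on its own, so the loss needs only one rating label per trajectory and no pairwise comparison, which mirrors the distinction the abstract draws from preference-based methods.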