Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation

Abstract: We study human-in-the-loop reinforcement learning (RL) with trajectory preferences, where instead of receiving a numeric reward at each step, the RL agent only receives preferences over trajectory ...