Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation

Published: 01 Jan 2022, Last Modified: 15 May 2023, Venue: ICML 2022, Readers: Everyone
Abstract: We study human-in-the-loop reinforcement learning (RL) with trajectory preferences, where instead of receiving a numeric reward at each step, the RL agent only receives preferences over trajectory ...