Optimistic Policy Optimization with Bandit Feedback

2020 (modified: 12 May 2023), ICML 2020
Abstract: Policy optimization methods are among the most widely used classes of Reinforcement Learning (RL) algorithms. Yet, so far, such methods have been mostly analyzed from an optimization perspective, ...