Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPsDownload PDFOpen Website

Andrea Zanette, Emma Brunskill

2018 (modified: 23 May 2025)ICML 2018Readers: Everyone
Abstract: In order to make good decision under uncertainty an agent must learn from observations. To do so, two of the most common frameworks are Contextual Bandits and Markov Decision Processes (MDPs). In t...
0 Replies

Loading