Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition

2020 (modified: 07 Oct 2024), ICML 2020, Readers: Everyone
Abstract: We consider the task of learning in episodic finite-horizon Markov decision processes with an unknown transition function, bandit feedback, and adversarial losses. We propose an efficient algorithm...