2020 (modified: 07 Oct 2024), ICML 2020, Readers: Everyone
Abstract: We consider the task of learning in episodic finite-horizon Markov decision processes with an unknown transition function, bandit feedback, and adversarial losses. We propose an efficient algorithm...