2009 (modified: 05 Nov 2022), Math. Oper. Res. 2009, Readers: Everyone
Abstract: We consider a Markov decision process (MDP) setting in which the reward function is allowed to change after each time step (possibly in an adversarial manner), yet the dynamics remain fixed. Simila...