Q-learning for history-based reinforcement learning

Mayank Daswani, Peter Sunehag, Marcus Hutter

2013 (modified: 25 Jan 2025), ACML 2013, Readers: Everyone

Abstract: We extend the Q-learning algorithm from the Markov Decision Process setting to problems where observations are non-Markov and do not reveal the full state of the world, i.e., to POMDPs. We do this in...
