The Gambler's Problem and Beyond


Sep 25, 2019 Blind Submission
  • Keywords: the gambler's problem, reinforcement learning, fractal and self-similarity, Bellman equation
  • TL;DR: The optimal value function is fractal and is like a Cantor function.
  • Abstract: We analyze the Gambler's problem, a simple reinforcement learning problem in which the gambler has the chance to double or lose their bet until the target is reached. It is an early example in the reinforcement learning textbook of \cite{sutton2018reinforcement}, where the authors note an interesting pattern in the optimal value function, with high-frequency components and repeating non-smooth points, but do not investigate it further. We provide the exact formula for the optimal value function in both the discrete and the continuous case. Simple as the problem might seem, the value function is pathological: fractal, self-similar, non-smooth on any interval, with zero derivative almost everywhere, and not expressible in terms of elementary functions. Sharing these properties with the Cantor function, it holds a complexity that has been uncharted thus far. Our analysis could yield insights into improving value function approximation, Q-learning, and gradient-based algorithms in real-world applications and implementations.
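The setup in the abstract is the classic Gambler's problem (Example 4.3 in Sutton & Barto): from capital s, the gambler stakes an amount, doubling it with the coin's head probability and losing it otherwise, until reaching the target or going broke. A minimal value-iteration sketch of this setup, assuming an illustrative head probability P_H = 0.4 and target GOAL = 100 (both choices are mine, not from the paper), reproduces the non-smooth optimal value function the abstract discusses:

```python
# Value iteration for the Gambler's problem -- a minimal sketch, not the
# paper's code. P_H and GOAL are illustrative assumptions.
P_H = 0.4      # probability the coin comes up heads
GOAL = 100     # the gambler wins upon reaching this capital
THETA = 1e-10  # convergence threshold for a full sweep

# V[s] approximates the optimal probability of reaching GOAL from capital s.
V = [0.0] * (GOAL + 1)
V[GOAL] = 1.0  # reaching the target is a win; V[0] stays 0 (ruin)

while True:
    delta = 0.0
    for s in range(1, GOAL):
        # Stakes are capped so the resulting capital stays within [0, GOAL].
        best = max(
            P_H * V[s + a] + (1 - P_H) * V[s - a]
            for a in range(1, min(s, GOAL - s) + 1)
        )
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < THETA:
        break
```

For a subfair coin (P_H < 1/2), bold play is optimal, so dyadic capitals take exact values, e.g. V[50] = 0.4 and V[25] = 0.16; plotting the full list V shows the repeating non-smooth points the abstract describes.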