Abstract: We prove a fundamental limitation on the computational efficiency of a large class of Reinforcement Learning (RL) methods. This limitation applies to model-free RL methods as well as to some model-based methods, such as AlphaZero. We provide a formalism that describes this class and present a family of RL problems that is provably intractable for these methods. In contrast, the problems in this family can be solved efficiently by toy methods. We identify several types of algorithms proposed in the literature that avoid our limitation, including algorithms that construct an inverse dynamics model and planning algorithms that leverage an explicit model of the dynamics.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Matteo_Papini1
Submission Number: 3246