A limitation on black-box dynamics approaches to Reinforcement Learning

TMLR Paper3246 Authors

26 Aug 2024 (modified: 26 Jan 2025) · Decision pending for TMLR · CC BY 4.0
Abstract: We prove a fundamental limitation on the computational efficiency of a large class of Reinforcement Learning (RL) methods. This limitation applies to model-free RL methods as well as some model-based methods, such as AlphaZero. We provide a formalism that describes this class and present a family of RL problems that are provably intractable for these methods. By contrast, the problems in the family can be solved efficiently by toy methods. We identify several types of algorithms proposed in the literature that can avoid our limitation, including algorithms that construct an inverse dynamics model and planning algorithms that leverage an explicit model of the dynamics.
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Matteo_Papini1
Submission Number: 3246