Modified policy iteration algorithms are not strongly polynomial for discounted dynamic programming

Eugene A. Feinberg, Jefferson Huang, Bruno Scherrer

Published: 2014, Last Modified: 01 Oct 2024. Oper. Res. Lett. 2014. License: CC BY-SA 4.0

Abstract: This note shows that the number of arithmetic operations required by any member of a broad class of optimistic policy iteration algorithms to solve a deterministic discounted dynamic programming problem with three states and four actions may grow arbitrarily. Therefore any such algorithm is not strongly polynomial. In particular, the modified policy iteration and λ-policy iteration algorithms are not strongly polynomial.
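
For readers unfamiliar with the algorithm named in the abstract, the following is a minimal sketch of modified policy iteration on a deterministic discounted problem. It is not the construction used in the note: the three-state, four-action instance, the discount factor 0.9, the parameter m = 2, and all function and variable names are hypothetical, chosen only to illustrate the alternation of a greedy improvement step with m applications of the policy's one-step operator.

```python
def modified_policy_iteration(actions, gamma, m, v0, tol=1e-9, max_iter=10_000):
    """Sketch of modified policy iteration for a deterministic discounted MDP.

    actions[s] is a list of (next_state, reward) pairs, one per action in s.
    m >= 1 is the number of applications of the greedy policy's operator
    per iteration (m = 1 reduces to value iteration).
    """
    n = len(actions)
    v = list(v0)

    def apply_policy(policy, values):
        # One application of T_pi: reward plus discounted value of successor.
        return [
            actions[s][policy[s]][1] + gamma * values[actions[s][policy[s]][0]]
            for s in range(n)
        ]

    for iteration in range(1, max_iter + 1):
        # Improvement step: choose a greedy action in every state w.r.t. v.
        policy = [
            max(
                range(len(actions[s])),
                key=lambda a: actions[s][a][1] + gamma * v[actions[s][a][0]],
            )
            for s in range(n)
        ]
        # First application of T_pi equals the Bellman operator applied to v.
        tv = apply_policy(policy, v)
        residual = max(abs(tv[s] - v[s]) for s in range(n))
        v = tv
        # Partial evaluation: the remaining m - 1 applications of T_pi.
        for _ in range(m - 1):
            v = apply_policy(policy, v)
        if residual < tol:
            return policy, v, iteration
    raise RuntimeError("no convergence within max_iter iterations")


# Hypothetical instance (NOT the example from the note):
# state 0 has two actions, states 1 and 2 have one self-loop each.
example_actions = [
    [(1, 1.0), (2, 0.0)],  # state 0: go to 1 with reward 1, or to 2 with reward 0
    [(1, 0.0)],            # state 1: stay, reward 0
    [(2, 1.0)],            # state 2: stay, reward 1
]
policy, v, iterations = modified_policy_iteration(
    example_actions, gamma=0.9, m=2, v0=[0.0, 0.0, 0.0]
)
print(policy, v, iterations)
```

In this toy instance the first greedy step picks the myopic action in state 0 and later iterations switch to the action leading to state 2. The sketch only shows the mechanics of the algorithm; it does not reproduce the dependence of the operation count on the problem data that the note establishes.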