Abstract: This note shows that the number of arithmetic operations required by any member of a broad class of optimistic policy iteration algorithms to solve a deterministic discounted dynamic programming problem with three states and four actions may grow arbitrarily large. Therefore, no such algorithm is strongly polynomial. In particular, the modified policy iteration and λ-policy iteration algorithms are not strongly polynomial.