Abstract:

Highlights
• Two recent Q-learning algorithms, AQL and SPAQL, are evaluated on two classical control benchmarks.
• Based on insights from control theory, a new algorithm, SPAQL-TS, is introduced.
• It is shown that both SPAQL and SPAQL-TS outperform TRPO in the Cartpole problem.