Abstract: Reinforcement learning has traditionally focused on learning state-dependent policies to solve optimal control problems in a _closed-loop_ fashion. In this work, we introduce the paradigm of _open-loop reinforcement learning_ where a fixed action sequence is learned instead. We present three new algorithms: one robust model-based method and two sample-efficient model-free methods. Rather than basing our algorithms on Bellman's equation from dynamic programming, our work builds on _Pontryagin's principle_ from the theory of open-loop optimal control. We provide convergence guarantees and evaluate all methods empirically on a pendulum swing-up task, as well as on two high-dimensional MuJoCo tasks, demonstrating remarkable performance compared to existing baselines.
Format: Long format (up to 8 pages + refs, appendix)
Publication Status: No
Submission Number: 36