Koopman Constrained Policy Optimization: A Koopman operator theoretic method for differentiable optimal control in robotics
Keywords: Optimal Control, Policy Optimization, Constrained Policy Optimization, Imitation Learning, Koopman Autoencoder, Koopman Operator Theory, Representation Learning, Model Predictive Control
TL;DR: Koopman Constrained Policy Optimization (KCPO) combines implicitly differentiable model predictive control with a deep Koopman autoencoder to yield a constrained policy optimization algorithm with hard box constraints on controls.
Abstract: We introduce Koopman Constrained Policy Optimization (KCPO), which combines implicitly differentiable model predictive control with a deep Koopman autoencoder for robot learning in unknown and nonlinear dynamical systems. KCPO is a new policy optimization algorithm that trains neural policies end-to-end with hard box constraints on controls. Guaranteed satisfaction of hard constraints helps ensure the performance and safety of robots. We perform imitation learning with KCPO to recover expert policies on the Simple Pendulum, Cartpole Swing-Up, Reacher, and Differential Drive environments. In most of these environments, KCPO outperforms baseline methods in generalizing to out-of-distribution constraints after training.
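To make the idea of planning with hard box constraints in a linear latent space concrete, the following is a minimal sketch and not the KCPO implementation. It assumes a Koopman autoencoder has already encoded the state into a latent vector `z` whose dynamics are linear, `z' = A z + B u`; the matrices `A` and `B`, the quadratic cost, the horizon, and the bounds are hypothetical values chosen for illustration. In place of the implicitly differentiable MPC solver described in the abstract, the sketch uses plain projected gradient descent with finite-difference gradients, where the projection onto the box is a simple clip.

```python
def step(A, B, z, u):
    """One step of the assumed linear latent dynamics z' = A z + B u (scalar control u)."""
    n = len(z)
    return [sum(A[i][j] * z[j] for j in range(n)) + B[i] * u for i in range(n)]


def rollout_cost(A, B, z0, controls, r=0.1):
    """Illustrative quadratic cost: squared latent state plus r times squared control."""
    z, cost = list(z0), 0.0
    for u in controls:
        z = step(A, B, z, u)
        cost += sum(zi * zi for zi in z) + r * u * u
    return cost


def clip(u, lo, hi):
    """Project a scalar control onto the box [lo, hi]."""
    return max(lo, min(hi, u))


def box_constrained_mpc(A, B, z0, horizon, u_min, u_max,
                        iters=200, lr=0.05, eps=1e-5):
    """Projected gradient descent over a control sequence (stand-in solver).

    Every iterate is clipped to [u_min, u_max], so the returned controls
    satisfy the box constraints by construction.
    """
    controls = [clip(0.0, u_min, u_max)] * horizon
    for _ in range(iters):
        grad = []
        for k in range(horizon):
            up = controls[:]
            up[k] += eps
            dn = controls[:]
            dn[k] -= eps
            # Central finite difference of the cost w.r.t. control k.
            diff = rollout_cost(A, B, z0, up) - rollout_cost(A, B, z0, dn)
            grad.append(diff / (2 * eps))
        # Gradient step followed by projection onto the box.
        controls = [clip(u - lr * g, u_min, u_max) for u, g in zip(controls, grad)]
    return controls


# Hypothetical latent model: a discretized double integrator.
A = [[1.0, 0.1], [0.0, 1.0]]
B = [0.005, 0.1]
z0 = [1.0, 0.0]
u_min, u_max = -0.5, 0.5

controls = box_constrained_mpc(A, B, z0, horizon=10, u_min=u_min, u_max=u_max)
baseline_cost = rollout_cost(A, B, z0, [0.0] * 10)
planned_cost = rollout_cost(A, B, z0, controls)
```

Because the projection is applied after every update, the constraint holds at every iteration rather than only at convergence. A learned encoder and decoder, and a solver that can be differentiated through for end-to-end training, would replace the hand-written pieces here.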
Submission Number: 45