Combining model-based design and model-free policy optimization to learn safe, stabilizing controllers
Abstract: This paper introduces a framework for learning a safe, stabilizing controller for
a system with unknown dynamics using model-free policy optimization algorithms. Using a
nominal dynamics model, the user specifies a candidate Control Lyapunov Function (CLF)
around the desired operating point and the desired safe set via a Control Barrier
Function (CBF). Using penalty methods from the optimization literature, we then develop a
family of policy optimization problems that attempt to minimize control effort while satisfying
the pointwise constraints associated with the CLF and CBF. We demonstrate that when
the penalty terms are scaled correctly, the optimization prioritizes the maintenance of safety
over stability, and stability over optimality. We discuss how standard reinforcement learning
algorithms can be applied to the problem, and validate the approach through simulation. We
then illustrate how the approach can be applied to a class of hybrid models commonly used in
the dynamic walking literature, and use it to learn safe, stable walking behavior over a randomly
spaced sequence of stepping stones.
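The penalty scaling described above can be sketched concretely. The following is a minimal, illustrative example, not the paper's implementation: the scalar dynamics, the choices of `V` and `h`, the gains `lam` and `alpha`, and the weights `w_clf` and `w_cbf` are all assumptions made here for demonstration.

```python
# Illustrative sketch of a penalty-based reward combining control effort,
# a CLF decrease condition, and a CBF invariance condition. All dynamics,
# functions, and weights below are assumptions, not the paper's models.

def f(x):
    return x          # assumed nominal drift: x_dot = f(x) + g(x) * u

def g(x):
    return 1.0        # assumed input gain

def V(x):
    return 0.5 * x ** 2   # candidate CLF around the operating point x = 0

def h(x):
    return 1.0 - x ** 2   # CBF: safe set is {x : h(x) >= 0}, i.e. |x| <= 1

def penalized_reward(x, u, lam=1.0, alpha=1.0, w_clf=1e2, w_cbf=1e4):
    """Negative control effort minus scaled penalties for violating the
    pointwise CLF decrease and CBF invariance conditions. Taking
    w_cbf >> w_clf >> 1 prioritizes safety over stability over effort."""
    x_dot = f(x) + g(x) * u
    # CLF condition: dV/dt + lam * V <= 0, with dV/dt = x * x_dot here
    clf_violation = max(0.0, x * x_dot + lam * V(x))
    # CBF condition: dh/dt + alpha * h >= 0, with dh/dt = -2 * x * x_dot here
    cbf_violation = max(0.0, -(-2.0 * x * x_dot + alpha * h(x)))
    return -(u ** 2) - w_clf * clf_violation - w_cbf * cbf_violation
```

A model-free policy optimization algorithm would then maximize the expected sum of such a reward along trajectories; because the CBF penalty dominates the CLF penalty, which in turn dominates the effort term, a policy that trades safety for stability or for lower effort is strictly worse under this objective.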