Abstract: Model-free Deep Reinforcement Learning (DRL) controllers have demonstrated promising results on various challenging non-linear control tasks. While a model-free DRL algorithm can handle unknown dynamics and high-dimensional problems, it lacks safety assurances. Although safety constraints can be encoded as part of a reward function, there remains a large gap between an RL controller trained with such a modified reward and a safe controller. In contrast, instead of implicitly encoding safety constraints with rewards, we explicitly co-learn a Twin Neural Lyapunov Function (TNLF) with the control policy in the DRL training loop and use the learned TNLF to build a runtime monitor. Combined with the path generated by a planner, the monitor chooses appropriate waypoints that guide the learned controller to produce collision-free control trajectories. Our approach inherits the scalability advantages of DRL while enhancing safety guarantees. Our experimental evaluation demonstrates the effectiveness of our approach compared to DRL with augmented rewards and constrained DRL methods over a range of high-dimensional safety-sensitive navigation tasks.
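
To make the monitor's role concrete, the sketch below shows one plausible way a runtime monitor could use a learned TNLF together with a planner-generated path: advance along the planned waypoints as long as the Lyapunov estimates certify them as safe for the learned controller. All names (select_waypoint, safety_threshold, the lower/upper-bound callables) are illustrative assumptions, not the paper's implementation; in particular, the interpretation of the twin networks as pessimistic/optimistic Lyapunov estimates is an assumption.

```python
# Minimal sketch of a TNLF-based runtime monitor (illustrative only).
# Assumption: the learned twin networks are exposed as callables
# v_lower / v_upper mapping a (state, waypoint) pair to a scalar
# Lyapunov estimate; names and the threshold rule are hypothetical.
from typing import Callable, Optional, Sequence, Tuple

State = Tuple[float, ...]        # robot state (e.g. position, velocity)
Waypoint = Tuple[float, float]   # planar waypoint produced by the planner


def select_waypoint(
    state: State,
    planned_path: Sequence[Waypoint],
    v_lower: Callable[[State, Waypoint], float],
    v_upper: Callable[[State, Waypoint], float],
    safety_threshold: float,
) -> Optional[Waypoint]:
    """Pick the farthest waypoint on the planned path that the Lyapunov
    estimates certify as safely reachable by the learned controller."""
    chosen: Optional[Waypoint] = None
    for waypoint in planned_path:
        # Conservative check: the pessimistic (upper) estimate must stay
        # below the safety threshold, and the optimistic (lower) estimate
        # must remain non-negative, as a basic consistency condition.
        if v_upper(state, waypoint) < safety_threshold and v_lower(state, waypoint) >= 0.0:
            chosen = waypoint    # keep advancing while later waypoints certify
        else:
            break                # stop at the first waypoint that cannot be certified
    return chosen
```

In this reading, the monitor would run at every control step; if no waypoint along the path can be certified, the system can fall back to a conservative action (for example, stopping or replanning) rather than following the learned policy blindly.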