Learning Safe Control via On-the-Fly Bandit Exploration

Alexandre Capone; Ryan Kazuo Cosner; Aaron Ames; Sandra Hirche

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0

TL;DR: We guarantee safe control by learning on the fly, without requiring a prior model or a backup controller.

Abstract: Control tasks with safety requirements under high levels of model uncertainty are increasingly common. Machine learning techniques are frequently used to address such tasks, typically by leveraging model error bounds to specify robust constraint-based safety filters. However, if the uncertainty of the learned model is very high, the corresponding filters may be invalid, meaning that no control input satisfies the constraints imposed by the safety filter. While most works address this issue by assuming some form of safe backup controller, ours tackles it by collecting additional data on the fly using a Gaussian process bandit-type algorithm. We combine a control barrier function with a learned model to specify a robust certificate that ensures safety whenever it is feasible. Whenever infeasibility occurs, we leverage the control barrier function to guide exploration, ensuring that the collected data contributes toward closed-loop system safety. By combining a safety filter with exploration in this manner, our method provably achieves safety in a general setting that requires neither a prior model nor a backup controller, provided that the true system lies in a reproducing kernel Hilbert space. To the best of our knowledge, it is the first safe learning-based control method that achieves this.
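The abstract describes the method only at a high level, so the following is a hypothetical, heavily simplified sketch of the two ingredients it names, not the authors' algorithm. It assumes a scalar state x, a barrier function h(x) = 1 - x^2 (safe set |x| <= 1), a zero-mean Gaussian process with an RBF kernel modeling xdot as a function of (x, u), a fixed confidence multiplier beta, and a discretized input set. The robust check it uses is dh(x)*mu - beta*|dh(x)|*sigma + alpha*h(x) >= 0, where mu and sigma are the GP posterior mean and standard deviation of xdot. When no input passes that check, it falls back to a GP-UCB-style rule (pick the input with the highest upper confidence bound on the barrier condition) and records the resulting measurement. All names (`filter_or_explore`, `true_xdot`, `run`, the kernel hyperparameters) are illustrative assumptions.

```python
import numpy as np

KERNEL_VAR = 1.0      # prior variance of the (hypothetical) RBF kernel
LENGTHSCALE = 0.5     # hypothetical lengthscale

def rbf(A, B):
    """Squared-exponential kernel between the rows of A (n x d) and B (m x d)."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return KERNEL_VAR * np.exp(-0.5 * d2 / LENGTHSCALE**2)

def gp_posterior(Z, y, Zq, noise=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP at query points Zq."""
    if len(Z) == 0:  # no data yet: return the prior
        return np.zeros(len(Zq)), np.full(len(Zq), np.sqrt(KERNEL_VAR))
    L = np.linalg.cholesky(rbf(Z, Z) + noise * np.eye(len(Z)))
    Ks = rbf(Zq, Z)
    weights = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks.T)
    var = KERNEL_VAR - np.sum(v**2, axis=0)
    return Ks @ weights, np.sqrt(np.maximum(var, 1e-12))

def h(x):
    return 1.0 - x**2  # hypothetical barrier function: safe set is |x| <= 1

def dh(x):
    return -2.0 * x    # derivative of h with respect to x

def filter_or_explore(x, u_nom, Z, y, u_grid, beta=2.0, alpha=1.0):
    """Return (input, explored). Filter u_nom if the robust CBF check is feasible,
    otherwise pick the input with the highest optimistic (UCB) CBF value."""
    Zq = np.column_stack([np.full_like(u_grid, x), u_grid])
    mean, std = gp_posterior(Z, y, Zq)          # GP model of xdot at (x, u)
    center = dh(x) * mean + alpha * h(x)        # CBF condition under the mean model
    spread = beta * abs(dh(x)) * std            # worst-case model error term
    lower, upper = center - spread, center + spread
    feasible = lower >= 0.0
    if feasible.any():                          # safety filter: closest safe input
        cand = u_grid[feasible]
        return cand[np.argmin(np.abs(cand - u_nom))], False
    return u_grid[np.argmax(upper)], True       # infeasible: optimistic exploration

def true_xdot(x, u):
    return 0.5 * x + u  # hypothetical "unknown" dynamics, only used to generate data

def run(x0=0.5, steps=100, dt=0.02, u_nom=1.0):
    u_grid = np.linspace(-1.0, 1.0, 41)
    Z, y = np.empty((0, 2)), np.empty(0)
    x, xs, n_explore = x0, [x0], 0
    for _ in range(steps):
        u, explored = filter_or_explore(x, u_nom, Z, y, u_grid)
        xdot = true_xdot(x, u)            # noise-free measurement for simplicity
        Z = np.vstack([Z, [x, u]])        # store the (state, input) pair
        y = np.append(y, xdot)            # and the measured xdot
        x = x + dt * xdot                 # explicit Euler step
        xs.append(x)
        n_explore += explored
    return np.array(xs), n_explore
```

For example, `run()` starts with no data, so the first call has to explore; each explored input adds a measurement exactly where the barrier condition was uncertain, which shrinks sigma there and lets later calls pass the robust check. This mirrors, under the stated assumptions, the abstract's idea that the control barrier function guides which data is collected. The sketch makes no safety guarantee of its own: beta, the kernel, and the exploration rule would all need the conditions established in the paper.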

Lay Summary: Deploying robots in unknown environments is especially challenging. Because it is unclear how they will behave, they may act unsafely, for example by causing a collision, damaging the machine internally, or even endangering nearby humans. We mitigate this by proposing a method that acts safely given the information currently available, and collects new data whenever it estimates that this information is insufficient to act safely.

Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.

Primary Area: Applications->Robotics

Keywords: Safety, Gaussian processes, control barrier functions, Bayesian models

Submission Number: 8284
