How Sure to Be Safe? Difficulty, Confidence and Negative Side Effects

Published: 05 Dec 2022, Last Modified: 05 May 2023, MLSW2022
Abstract: A principal concern for AI systems is the occurrence of negative side effects, such as a robot cleaner breaking a vase. This is especially critical when these systems use machine learning models that were trained to maximise performance without any knowledge of, or feedback about, the negative side effects. Within Vase World and SafeLife, two safety benchmarking domains, we analyse side effects during operation and demonstrate that their magnitude is influenced by task difficulty. Using two forms of confidence measure, we demonstrate that wrapping existing RL agents with these confidence measures, together with safety policies that activate when the agent's confidence falls below a specified threshold, extends the Pareto frontier of performance and safety.
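The gating idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which two confidence measures or which safety policies are used, so this sketch assumes a single stand-in confidence measure (the highest action probability of the task policy's softmax distribution) and a caller-supplied fallback action (for example a no-op) as the safety policy. `ConfidenceGatedAgent`, `max_prob_confidence` and `pareto_front` are hypothetical names for illustration, not the authors' code. The `pareto_front` helper shows what "extending the Pareto frontier" means operationally: each threshold setting yields a (performance, side effects) point, and the frontier is the set of points that no other point beats on both axes.

```python
import math
from typing import Callable, List, Sequence, Tuple

Action = int
# Hypothetical type: a task policy maps an observation to per-action logits.
Policy = Callable[[object], Sequence[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_prob_confidence(logits: Sequence[float]) -> float:
    """Assumed confidence measure: probability of the most likely action."""
    return max(softmax(logits))


class ConfidenceGatedAgent:
    """Wraps a task policy and defers to a safety policy when confidence is low."""

    def __init__(self, task_policy: Policy,
                 safety_action: Callable[[object], Action],
                 threshold: float) -> None:
        self.task_policy = task_policy
        self.safety_action = safety_action  # hypothetical fallback, e.g. a no-op
        self.threshold = threshold

    def act(self, obs: object) -> Tuple[Action, bool]:
        """Return (action, used_safety_policy) for one observation."""
        logits = self.task_policy(obs)
        if max_prob_confidence(logits) < self.threshold:
            # Confidence below threshold: hand control to the safety policy.
            return self.safety_action(obs), True
        # Confident enough: take the task policy's greedy action.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return best, False


def pareto_front(points: Sequence[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Keep (performance, side_effects) points not dominated by any other point.

    Higher performance is better; fewer side effects is better.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] <= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return front


if __name__ == "__main__":
    task = lambda obs: [0.1, 2.0, 0.1]  # greedy action is 1, confidence ~0.77
    noop = lambda obs: 0                # assumed safety action
    print(ConfidenceGatedAgent(task, noop, threshold=0.9).act(None))
    print(ConfidenceGatedAgent(task, noop, threshold=0.5).act(None))
    print(pareto_front([(10, 3), (8, 1), (7, 2), (10, 5)]))
```

A threshold of 0 never triggers the fallback and so recovers the unwrapped agent, while raising the threshold hands more decisions to the safety policy. Sweeping it and recording (return, side effects) for each setting gives the candidate points that `pareto_front` filters.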