Ensemble-based Uncertainty Estimation with overlapping alternative Predictions

Dirk Eilers; Felippe Schmoeller Roza; Karsten Roscher

08 Oct 2022 (modified: 05 May 2023), Deep RL Workshop 2022, Readers: Everyone

Keywords: safe reinforcement learning, Safe RL, distributional shift, uncertainty estimation, model ensemble, out of distribution detection, OOD, safety

TL;DR: This paper proposes an approach to ensemble-based epistemic uncertainty estimation in gridworld scenarios with discrete action spaces and overlapping alternative predictions, based on action count variance and the delta to ID.

Abstract: A reinforcement learning model predicts an action in whatever state it is in. Even when unseen states leave no distinct outcome, the model may not indicate this. Uncertainty estimation methods can be used to flag such cases. Although uncertainty estimation is a known approach in machine learning, most available methods cannot deal with the choice overlap that occurs in states where a reinforcement learning agent can take multiple actions with similar performance outcomes. In this work, we investigate uncertainty estimation on simplified scenarios in a gridworld environment. Using ensemble-based uncertainty estimation, we propose an algorithm based on action count variance (ACV) to deal with discrete action spaces and a calculation based on the in-distribution delta (IDD) of the action count variance to handle overlapping alternative predictions. To visualize the expressiveness of the model uncertainty, we create heatmaps for different in-distribution (ID) and out-of-distribution (OOD) scenarios and propose an indicator for uncertainty. We show that the method can indicate potentially unsafe states when the agent faces novel elements in the OOD scenarios, while distinguishing uncertainty resulting from OOD instances from uncertainty caused by overlapping alternative predictions.
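The abstract names ACV and IDD without defining them, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each ensemble member votes for one discrete action, that ACV is the variance of the per-action vote counts, and that IDD is the absolute difference between the ACV observed in a state and a reference ACV recorded for a comparable in-distribution state. The function names `action_count_variance` and `in_distribution_delta` are hypothetical.

```python
import numpy as np


def action_count_variance(votes, n_actions):
    """Variance of per-action vote counts (assumed reading of ACV).

    votes: one chosen action index per ensemble member.
    n_actions: size of the discrete action space.
    """
    counts = np.bincount(np.asarray(votes, dtype=int), minlength=n_actions)
    return float(np.var(counts))


def in_distribution_delta(acv_state, acv_id_reference):
    """Absolute gap to an in-distribution reference ACV (assumed reading of IDD)."""
    return abs(acv_state - acv_id_reference)


# Hypothetical ensemble of 5 members choosing among 4 actions.
unanimous = action_count_variance([0, 0, 0, 0, 0], n_actions=4)
overlap = action_count_variance([0, 0, 0, 1, 1], n_actions=4)    # two equally good actions
scattered = action_count_variance([0, 1, 2, 3, 0], n_actions=4)  # members disagree widely

# If the ID reference for this state already shows overlap, a scattered vote
# still stands out through its delta to that reference.
delta = in_distribution_delta(scattered, overlap)
```

Under these assumptions, a unanimous ensemble gives the largest count variance and a scattered vote the smallest. Comparing against an ID reference rather than a fixed threshold is what would let a state with two equally good actions pass as unremarkable while a widely scattered vote is flagged.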
