Utilizing Explainability Techniques for Reinforcement Learning Model Assurance

Published: 27 Oct 2023, Last Modified: 27 Nov 2023, NeurIPS XAIA 2023
TL;DR: This paper introduces the ARLIN Toolkit, a Python library that provides explainability outputs for reinforcement learning models that can be used to identify potential policy vulnerabilities and critical points.
Abstract: Explainable Reinforcement Learning (XRL) can provide transparency into the decision-making process of a Reinforcement Learning (RL) model and increase user trust and adoption in real-world use cases. By utilizing XRL techniques, researchers can identify potential vulnerabilities within a trained RL model prior to deployment, thereby limiting the potential for mission failure or mistakes by the system. This paper introduces the ARLIN (Assured RL Model Interrogation) Toolkit, a Python library that provides explainability outputs for trained RL models that can be used to identify potential policy vulnerabilities and critical points. Using XRL datasets, ARLIN provides detailed analysis of an RL model's latent space, creates a semi-aggregated Markov decision process (SAMDP) to outline the model's paths through an episode, and produces cluster analytics for each node within the SAMDP to identify potential failure points and vulnerabilities within the model. To illustrate ARLIN's effectiveness, we provide sample API usage along with the corresponding explainability visualizations and detected vulnerability points for a publicly available RL model. The open-source code repository is available at https://github.com/mitre/arlin.
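As a rough illustration of the kind of pipeline the abstract describes, the sketch below rolls out a trained policy, records the policy network's latent activations, and clusters them so that each cluster can serve as a candidate SAMDP node. This is a minimal sketch assuming a stable-baselines3 PPO model on LunarLander-v2; the names and structure here are illustrative only and do not reflect ARLIN's actual API (see the repository for real usage).

```python
# Hypothetical sketch of an XRL-style dataset: roll out a trained policy,
# record latent activations, and cluster them. Not ARLIN's actual API.
import gymnasium as gym
import numpy as np
import torch
from sklearn.cluster import KMeans
from stable_baselines3 import PPO

env = gym.make("LunarLander-v2")
model = PPO("MlpPolicy", env)  # in practice, load a trained model instead

observations, latents = [], []
obs, _ = env.reset(seed=0)
for _ in range(1000):
    observations.append(obs.copy())
    obs_tensor, _ = model.policy.obs_to_tensor(obs)
    with torch.no_grad():
        # Latent features feeding the policy head (SB3 internals).
        features = model.policy.extract_features(obs_tensor)
        latent_pi, _ = model.policy.mlp_extractor(features)
    latents.append(latent_pi.cpu().numpy().squeeze())
    action, _ = model.predict(obs, deterministic=True)
    obs, _, terminated, truncated, _ = env.step(int(action))
    if terminated or truncated:
        obs, _ = env.reset()

# Cluster the latent space; each cluster becomes a candidate SAMDP node.
clusters = KMeans(n_clusters=10, n_init="auto").fit_predict(np.array(latents))
```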
Submission Track: Demo Track
Application Domain: None of the above / Not applicable
Clarify Domain: The library is domain-agnostic and can be applied in any domain that uses deep reinforcement learning models.
Survey Question 1: The ARLIN Toolkit provides detailed, human-interpretable explainability outputs to aid researchers and machine learning engineers in identifying potential vulnerabilities and critical points within their reinforcement learning models prior to deployment in safety-critical scenarios. Through various analysis graphs and visualizations, researchers can visualize how a reinforcement learning model will navigate a given environment, detect paths that can result in mission failure, and identify areas of the environment where the model has low confidence in its predictions.
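One simple way to surface such low-confidence areas is to measure the entropy of the policy's action distribution at each visited state and flag the highest-entropy states. The sketch below assumes a stable-baselines3 model and the `observations` list from the rollout sketch above; it is illustrative only and is not ARLIN's API.

```python
# Hypothetical sketch: flag low-confidence states via the entropy of the
# policy's action distribution (high entropy = low confidence).
import numpy as np
import torch

def action_entropy(model, obs):
    """Entropy of the policy's action distribution at one observation."""
    obs_tensor, _ = model.policy.obs_to_tensor(obs)
    with torch.no_grad():
        dist = model.policy.get_distribution(obs_tensor)
    return float(dist.entropy().item())

# Flag the top 5% highest-entropy states as potential low-confidence points,
# reusing `model` and `observations` from the rollout sketch above.
entropies = np.array([action_entropy(model, o) for o in observations])
threshold = np.quantile(entropies, 0.95)
low_confidence_idx = np.where(entropies >= threshold)[0]
```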
Survey Question 2: Reinforcement learning models are increasingly being used in safety-critical environments for potentially dangerous tasks. Before these models can safely be deployed in high-risk scenarios, their vulnerabilities need to be well understood and accounted for. The ARLIN Toolkit can help identify potential vulnerabilities and mistake-prone situations before a system is deployed into critical scenarios, increasing the safety of deployed reinforcement learning models and reducing the potential for mission failure or dangerous outcomes.
Survey Question 3: This work utilizes a variety of techniques including statistical analysis, neural network latent analysis, and semi-aggregated Markov decision process (SAMDP) generation to provide explainability outputs to users.
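To make the SAMDP generation step concrete: a cluster-level transition graph can be built by counting transitions between the cluster labels of consecutive timesteps and normalizing the counts into empirical probabilities. The sketch below (using networkx) is a hypothetical simplification of SAMDP generation, not ARLIN's implementation; `clusters` is the per-timestep label array from the rollout sketch, and `terminals` is an assumed per-timestep flag marking episode boundaries.

```python
# Hypothetical sketch of SAMDP-style generation: each latent cluster is a
# node, and edges carry empirical transition probabilities between clusters.
import networkx as nx

def build_samdp(clusters, terminals):
    """Cluster-level transition graph with empirical transition probabilities.

    clusters:  per-timestep cluster id across concatenated episodes
    terminals: per-timestep bool, True where an episode ended
    """
    counts = {}
    for t in range(len(clusters) - 1):
        if terminals[t]:
            continue  # do not link states across episode boundaries
        edge = (clusters[t], clusters[t + 1])
        counts[edge] = counts.get(edge, 0) + 1
    graph = nx.DiGraph()
    for (src, dst), n in counts.items():
        out_total = sum(c for (s, _), c in counts.items() if s == src)
        graph.add_edge(src, dst, prob=n / out_total)
    return graph
```

Nodes with many low-probability outgoing edges, or edges leading toward failure states, are natural candidates for the failure points and vulnerabilities the cluster analytics aim to surface.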
Submission Number: 5