Self-Interpretable Reinforcement Learning via Rule Ensembles

Published: 18 May 2025, Last Modified: 16 Oct 2025 · AAMAS · CC BY 4.0
Abstract: Current reinforcement learning (RL) models often function as complex 'black boxes' that obscure their decision-making processes. This lack of transparency limits their applicability in critical real-world settings where clear reasoning behind algorithmic choices is crucial. To tackle this issue, we propose replacing neural-network and tabular representations with a rule ensemble model that improves decision-making clarity and adapts dynamically to environmental interactions. Specifically, our method constructs additive rule ensembles that approximate the Q-value function in reinforcement learning using orthogonal gradient boosting (OGB) combined with a post-processing rule replacement technique, so the model provides inherent explanations through its rules. Our study establishes a theoretical foundation for rule ensembles within the reinforcement learning framework, emphasizing their capacity to improve interpretability and to facilitate analysis of each rule's impact. Experimental results on seven classic environments demonstrate that the proposed rule ensembles match or exceed the performance of representative RL models such as DQN, A2C, and PPO, while also providing self-interpretability and transparency.
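To make the idea of an additive rule ensemble Q-function concrete, below is a minimal illustrative sketch. It assumes each rule is a conjunction of threshold conditions on state features and that Q(s, a) is the sum of the weights of the rules that fire for action a; the class and function names (`Rule`, `RuleEnsembleQ`, etc.) are hypothetical and not taken from the paper, and the rules here are set by hand rather than fitted with the paper's OGB procedure.

```python
import numpy as np

class Rule:
    """A conjunction of threshold conditions on state features,
    e.g. (s[0] <= 0.5) AND (s[2] > 1.0). Fires iff all conditions hold."""
    def __init__(self, conditions):
        # conditions: list of (feature_index, operator, threshold),
        # where operator is "<=" or ">"
        self.conditions = conditions

    def fires(self, state):
        for i, op, thr in self.conditions:
            if op == "<=" and not (state[i] <= thr):
                return False
            if op == ">" and not (state[i] > thr):
                return False
        return True

class RuleEnsembleQ:
    """Additive rule ensemble approximating Q(s, a): one weighted rule list
    per action; Q is the sum of weights of the rules that fire on the state."""
    def __init__(self, n_actions):
        self.rules = {a: [] for a in range(n_actions)}  # action -> [(Rule, weight)]

    def add_rule(self, action, rule, weight):
        # In the paper, rules and weights would be fitted by orthogonal
        # gradient boosting on Q-learning targets; here they are hand-crafted.
        self.rules[action].append((rule, weight))

    def q_value(self, state, action):
        return sum(w for rule, w in self.rules[action] if rule.fires(state))

    def q_values(self, state):
        return np.array([self.q_value(state, a) for a in sorted(self.rules)])

# Usage: a two-action ensemble over a 2-feature state.
q = RuleEnsembleQ(n_actions=2)
q.add_rule(0, Rule([(0, "<=", 0.0)]), weight=1.5)                  # "if x <= 0, favor action 0"
q.add_rule(1, Rule([(0, ">", 0.0), (1, "<=", 2.0)]), weight=0.8)   # "if x > 0 and y <= 2, favor action 1"
state = np.array([0.3, 1.2])
print(q.q_values(state), "-> greedy action:", int(np.argmax(q.q_values(state))))
```

Because each prediction is just a sum over fired rules, the contribution of every rule to a chosen action can be read off directly, which is the source of the self-interpretability the abstract describes.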