Bridging Interpretability and Optimization: Provably Attribution-Weighted Actor–Critic in Reproducing-Kernel Hilbert Spaces
Keywords: Reinforcement learning, explainable reinforcement learning, Shapley values, reproducing-kernel Hilbert spaces
TL;DR: This paper is the first to extend interpretability to RKHS-based Actor–Critic methods in order to assist optimization, and it establishes a global non-asymptotic convergence bound under state perturbations.
Abstract: Actor–critic (AC) methods are a cornerstone of reinforcement learning (RL) but offer limited interpretability. Current explainable RL methods seldom use *state attributions* to assist training. Instead, they treat all state features equally, neglecting the heterogeneous impact of individual state dimensions on the reward. We propose *RKHS-SHAP-based Advanced Actor–Critic (RSA2C)*, an attribution-aware, kernelized, two-timescale AC algorithm comprising an Actor, a Value Critic, and an Advantage Critic. The Actor is instantiated in a vector-valued reproducing-kernel Hilbert space (RKHS) with a Mahalanobis-weighted operator-valued kernel, while the Value Critic and Advantage Critic reside in scalar RKHSs. All three RKHS components use sparsified dictionaries: the Value Critic maintains its own dictionary, while the Actor and Advantage Critic share one. State attributions, computed from the Value Critic via RKHS-SHAP (kernel mean embedding for on-manifold expectations and conditional mean embedding for off-manifold expectations), are converted into Mahalanobis-gated weights that modulate the Actor gradients and the Advantage Critic targets. Theoretically, we derive a global, non-asymptotic convergence bound under *state perturbations*, in which a perturbation-error term characterizes stability and a convergence-error term characterizes efficiency. Empirical results on three standard continuous-control environments show that RSA2C achieves efficiency, stability, and interpretability.
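As a rough illustration of how per-dimension state attributions could reweight a kernel, the sketch below builds a diagonal Mahalanobis-weighted Gaussian kernel in Python. It is not the RSA2C implementation: the function names, the normalization of attribution magnitudes, the diagonal metric, and the scalar Gaussian form are all assumptions made for illustration, and the attribution vector `phi` is supplied by hand rather than computed with RKHS-SHAP.

```python
import numpy as np

def attribution_weights(phi, eps=1e-8):
    """Map attribution scores to nonnegative weights that average to about 1.

    Hypothetical normalization; the abstract does not specify the exact gating.
    """
    mag = np.abs(np.asarray(phi, dtype=float))
    return mag * mag.size / (mag.sum() + eps)

def mahalanobis_gaussian_kernel(x, y, weights, bandwidth=1.0):
    """Gaussian kernel with a diagonal Mahalanobis metric diag(weights)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    sq_dist = np.sum(weights * diff ** 2)  # (x - y)^T diag(w) (x - y)
    return float(np.exp(-sq_dist / (2.0 * bandwidth ** 2)))

# Hand-picked attributions: dimension 0 matters most, dimension 2 not at all.
phi = [0.9, 0.1, 0.0]
w = attribution_weights(phi)

x = np.zeros(3)
k_important = mahalanobis_gaussian_kernel(x, x + np.array([1.0, 0.0, 0.0]), w)
k_ignored = mahalanobis_gaussian_kernel(x, x + np.array([0.0, 0.0, 1.0]), w)
```

Under these assumptions, a unit change along the highly attributed dimension lowers the kernel similarity, while the same change along a zero-attribution dimension leaves it at 1, so states are compared mainly along the dimensions the attributions mark as relevant.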
Primary Area: reinforcement learning
Submission Number: 3528