Keywords: reinforcement learning, interpretable RL, hierarchical RL
Abstract: Deep reinforcement learning has demonstrated strong performance across a range of tasks, but its reliance on opaque neural policies hinders interpretability and alignment. Neurosymbolic approaches attempt to improve transparency by integrating symbolic reasoning, but when applied at the level of fine-grained actions, they often produce policies whose complexity obscures interpretability.
We introduce LENS, an object-centric hierarchical reinforcement learning framework that combines neural low-level skill policies with symbolic high-level meta-policies to achieve both efficiency and interpretability.
Our approach leverages object-centric representations, which structure the environment in a way that enables large language models to generate meaningful skill definitions, reward functions, and meta-policy rules.
We further extend the symbolic reasoning layer with a neurosymbolic formulation of the meta-policies, improving expressiveness and generalization.
All components are trained jointly using an off-policy algorithm that supports efficient and parallel learning of the sub-policies. LENS demonstrates strong performance while maintaining interpretability.
Confirmation: I understand that authors of each paper submitted to EWRL may be asked to review 2-3 other submissions to EWRL.
Serve As Reviewer: ~Raban_Emunds1
Track: Regular Track: unpublished work
Submission Number: 164