Interpretable Reinforcement Learning with Self-Abstraction and Refinement

18 Sept 2025 (modified: 01 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Interpretable Reinforcement Learning, Interactive Reinforcement Learning, Hierarchical Reinforcement Learning, Language Model Prompting
TL;DR: We design an interactive reinforcement learning model for composite tasks, grounded in interpretability and expert refinement provided by automaton synthesis and an LLM.
Abstract: We propose ReLIC, an interactive reinforcement learning method for composite tasks. Traditional RL methods lack interpretability, which makes it difficult to integrate expert knowledge and refine a trained model. ReLIC is composed of a high-level logical model, low-level action policies, and a self-abstraction and refinement module. At the high level, it takes predicates as input, so we can design a synthesis algorithm that renders the high-level model's logical structure as an automaton, demonstrating the model's interpretability. At the low level, deep reinforcement learning handles detailed action control to maintain high performance. Furthermore, based on the structured information provided by the automaton, ReLIC leverages GPT-4o to generate expert predicates and refines the automaton by injecting these predicates and performing joint training, thereby improving ReLIC's performance. ReLIC outperforms state-of-the-art baselines on several benchmarks with continuous state and action spaces. Additionally, ReLIC does not require humans to hard-code logical structures, so it can solve logically uncertain tasks.
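To make the two-level architecture described above concrete, here is a minimal, purely illustrative Python sketch of how such a layered controller could be wired together. This is not the authors' code: the `Automaton` class, the `run_episode` loop, the predicate/policy names, and the `env` interface are all hypothetical assumptions standing in for the paper's actual components.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, Tuple

# Hypothetical types: a predicate maps an environment state to True/False;
# a policy maps a state to a low-level action (e.g., a trained DRL network).
Predicate = Callable[[object], bool]
Policy = Callable[[object], object]


@dataclass
class Automaton:
    """High-level logical model: transitions on predicate truth assignments."""
    start: str
    accept: frozenset
    # (automaton state, set of predicates currently true) -> (next state, subtask)
    delta: Dict[Tuple[str, FrozenSet[str]], Tuple[str, str]] = field(default_factory=dict)

    def step(self, q: str, true_preds: FrozenSet[str]) -> Tuple[str, str]:
        # Stay in place with a default subtask if no transition is defined.
        return self.delta.get((q, true_preds), (q, "default"))


def run_episode(env, automaton: Automaton,
                predicates: Dict[str, Predicate],
                policies: Dict[str, Policy],
                max_steps: int = 1000) -> bool:
    """One rollout: the automaton selects the subtask from predicate values,
    and the matching low-level policy selects the concrete action."""
    state, q = env.reset(), automaton.start
    for _ in range(max_steps):
        true_preds = frozenset(name for name, p in predicates.items() if p(state))
        q, subtask = automaton.step(q, true_preds)
        if q in automaton.accept:
            return True  # composite task completed
        state = env.step(policies[subtask](state))
    return False
```

In this reading, the refinement module would add new entries to `predicates` (e.g., LLM-generated expert predicates) and extend `delta`, after which the low-level policies are retrained jointly.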
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 12667