HiLoRL: A Hierarchical Logical Model for Learning Composite Tasks

Chuan Hu; Jingyu Cao; Jinpeng Zhang; Yunze Wu; Yi Wu; Zhilei Xu; Jianzhu Ma; Yuan Zhou

23 Sept 2023 (modified: 11 Feb 2024), Submitted to ICLR 2024

Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: Hierarchical Reinforcement Learning, Adaptive Logic Planner, Interpretability, Expert Knowledge Instruction

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

TL;DR: We design a hierarchical reinforcement learning model for composite tasks that also provides interpretability and a mechanism for selectively incorporating domain knowledge.

Abstract: We propose HiLoRL, a hierarchical model for learning policies for composite tasks. Recent studies mostly rely on human-specified logical specifications, which are laborious to write and yield models that perform poorly on tasks that are not entirely human-predictable. HiLoRL is composed of a high-level logical planner and low-level action policies. It first learns a rough rule at its upper level with the help of the low-level policies, and then uses joint training with surrogate rewards to refine both the rough rule and the low-level policies. Furthermore, HiLoRL can incorporate specialized predicates derived from expert knowledge, which improves its training speed and performance. We also design a synthesis algorithm that renders the high-level planner's logical structure as an automaton, demonstrating the model's interpretability. HiLoRL outperforms state-of-the-art baselines on several benchmarks with continuous state and action spaces. Additionally, HiLoRL does not require humans to hard-code logical structures, so it can solve logically uncertain tasks.
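
The two-level structure described in the abstract can be illustrated with a toy control loop. This is only a minimal sketch and not the authors' implementation: the one-dimensional key-and-door task, the names `RULES`, `LOW_LEVEL`, `plan`, `step`, and `run_episode`, and the hand-written predicates are all hypothetical. In particular, the rules below are fixed by hand, whereas HiLoRL is described as learning and refining its high-level rule.

```python
from typing import Callable, Dict, List, Tuple

# Toy state (hypothetical): (position on a line, whether the key was picked up).
State = Tuple[float, bool]

KEY_POS, DOOR_POS = -2.0, 3.0  # illustrative positions, not from the paper


def move_toward(target: float) -> Callable[[State], float]:
    """Build a low-level policy that takes a bounded step toward `target`."""
    def policy(state: State) -> float:
        delta = target - state[0]
        return max(-0.5, min(0.5, delta))  # clip the action to [-0.5, 0.5]
    return policy


# Low-level action policies, indexed by name.
LOW_LEVEL: Dict[str, Callable[[State], float]] = {
    "fetch_key": move_toward(KEY_POS),
    "go_to_door": move_toward(DOOR_POS),
}

# High-level logical rule: ordered (predicate, policy name) pairs.
# Hand-written here as an assumption; the first predicate that holds fires.
RULES: List[Tuple[Callable[[State], bool], str]] = [
    (lambda s: not s[1], "fetch_key"),
    (lambda s: s[1], "go_to_door"),
]


def plan(state: State) -> str:
    """High-level planner: return the name of the first rule whose predicate holds."""
    for predicate, name in RULES:
        if predicate(state):
            return name
    raise ValueError("no rule matches the current state")


def step(state: State, action: float) -> State:
    """Toy environment dynamics: move, and pick up the key when standing on it."""
    pos = state[0] + action
    has_key = state[1] or abs(pos - KEY_POS) < 1e-6
    return (pos, has_key)


def run_episode(max_steps: int = 50) -> Tuple[State, List[str]]:
    """Alternate between high-level selection and low-level action until done."""
    state: State = (0.0, False)
    trace: List[str] = []
    for _ in range(max_steps):
        if state[1] and abs(state[0] - DOOR_POS) < 1e-6:
            break  # composite task solved: key collected, door reached
        name = plan(state)
        trace.append(name)
        state = step(state, LOW_LEVEL[name](state))
    return state, trace
```

Calling `run_episode()` returns the final state together with the sequence of sub-policy names the planner selected, which is the kind of trace an automaton view of the high-level logic would summarize.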

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 7638
