Curriculum-Based Termination Critic for Scalable Program Decomposition in Hierarchical Reinforcement Learning

ICLR 2026 Conference Submission 25563 Authors

20 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Hierarchical Reinforcement Learning
Abstract: We introduce a Curriculum-Based Termination Critic (CBTC) for hierarchical reinforcement learning (HRL) that addresses program decomposition for scalable programming in complex task environments. Traditional termination critics rely on static heuristics that cope poorly with tasks of varying complexity, which prevents the agent from learning appropriate hierarchical abstractions. CBTC instead provides a dynamic, curriculum-driven framework that selects task difficulty on the fly and adjusts it incrementally according to the agent's learning progress, so that programs are decomposed into manageable subtasks more efficiently. Our approach combines three components: a difficulty-progression module that autonomously adjusts task complexity, a reward-based termination critic that stabilizes subtask-completion decisions, and a hybrid option-critic controller that orchestrates switching between decomposition methods. The termination critic uses a transformer-based architecture that operates on program states and the curriculum descriptor, while the high-level policy uses graph neural networks to reason over abstract syntax trees. Experiments show that CBTC outperforms traditional HRL techniques in both success rate and time efficiency, especially when the programs to be synthesized involve many stages. The proposed approach is fully differentiable, is compatible with existing HRL architectures, and offers a principled way to scale program decomposition in real-world applications.
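The abstract does not specify how the difficulty-progression module measures learning progress. As a purely illustrative sketch, one simple rule is to track a sliding window of episode outcomes and move the difficulty level up or down when the windowed success rate crosses a threshold. The class name `DifficultyScheduler`, the window size, and both thresholds are assumptions made for this example and are not taken from the paper.

```python
from collections import deque


class DifficultyScheduler:
    """Hypothetical difficulty-progression rule; not the authors' module."""

    def __init__(self, levels=5, window=4, promote_at=0.75, demote_at=0.25):
        self.levels = levels          # number of discrete difficulty levels (assumed)
        self.level = 0                # start at the easiest level
        self.promote_at = promote_at  # success rate needed to move up (assumed)
        self.demote_at = demote_at    # success rate at which to move down (assumed)
        self.outcomes = deque(maxlen=window)

    def update(self, success):
        """Record one episode outcome and return the (possibly new) level."""
        self.outcomes.append(1.0 if success else 0.0)
        # Wait until the window is full before judging progress.
        if len(self.outcomes) < self.outcomes.maxlen:
            return self.level
        rate = sum(self.outcomes) / len(self.outcomes)
        if rate >= self.promote_at and self.level < self.levels - 1:
            self.level += 1
            self.outcomes.clear()     # restart measurement at the new level
        elif rate <= self.demote_at and self.level > 0:
            self.level -= 1
            self.outcomes.clear()
        return self.level
```

In a training loop, `update` would be called once per episode and the returned level used to sample the next task. A learned or continuous progress signal could replace the threshold rule without changing this interface.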
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 25563