Hierarchical Gradient-Informed Reinforcement Learning for Scalable and Partially Observable Dynamic Resource Allocation

ICLR 2026 Conference Submission16778 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, License: CC BY 4.0

Keywords: Dynamic resource allocation, Hierarchical reinforcement learning, Partial observability, Scalability

Abstract: Dynamic resource allocation problems (DRAPs) are prevalent in critical domains such as transportation and energy management. They can be naturally modeled as dynamic systems, but they pose challenges in scalability and partial observability. We propose Hierarchical Gradient-Informed Reinforcement Learning (HGRL), a framework that integrates hierarchical multi-agent reinforcement learning with a Global Demand Inference Network (GDI-Net). HGRL decomposes DRAPs into multi-scale subproblems, enabling scalable decision-making across large environments. GDI-Net addresses partial observability by inferring multi-scale global demand and directional gradients from local agent observations, which enhances policy awareness and guides exploration. Experiments on synthetic and real-world datasets show that HGRL significantly outperforms strong baselines, achieving up to 55.1% improvement in demand coverage and 35.5% improvement in transportation efficiency on the real-world dataset. Code is available at: https://anonymous.4open.science/r/HGRL4DRA-B40FS/.

Primary Area: applications to robotics, autonomy, planning

Submission Number: 16778
