Hierarchical Reinforcement Learning for Real-Time Policy Optimization in Complex Logistics Networks

Published: 30 Jul 2025, Last Modified: 30 Jul 2025, AI4SupplyChain 2025 Poster, CC BY 4.0
Keywords: Vehicle Routing Problem, Deep Reinforcement Learning, Hierarchical Policy, Logistics Optimization, Stochastic Environments, Real-Time Decision Making
TL;DR: This paper introduces HARL, a novel hierarchical reinforcement learning framework that generates real-time, adaptive routing policies for complex logistics networks with heterogeneous vehicles, diverse demands, and unpredictable travel times.
Abstract: Efficiently managing logistics operations, particularly Vehicle Routing Problems (VRPs), is critical in modern supply chains. These operations are often characterized by complex challenges including heterogeneous vehicle fleets, diverse demand types, and stochastic environmental factors like travel times, all requiring real-time adaptive decision-making. Existing approaches often struggle to simultaneously address these multifaceted issues. This paper introduces HARL (Hierarchical Emergency Logistics Planning with Reinforcement Learning), a novel framework designed for real-time policy optimization in such complex logistics scenarios. HARL features an attention-based policy optimizer with a unique hierarchical decoder architecture and dilated temporal convolutions to manage intricate action spaces and temporal dependencies. Trained using the REINFORCE algorithm, the model dynamically adapts to changing conditions. We demonstrate HARL’s effectiveness through experiments on synthetic VRP instances and a real-world case study derived from disaster response logistics. Results show that HARL significantly outperforms state-of-the-art reinforcement learning baselines and traditional heuristics in both solution quality and computational efficiency, offering a robust and generalizable approach for complex VRP research and AI-driven supply chain optimization.
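The abstract mentions dilated temporal convolutions for capturing temporal dependencies. A minimal sketch of the underlying mechanism, in plain NumPy: a causal 1-D convolution whose taps are spaced `dilation` steps apart, so stacking layers with doubling dilation grows the receptive field exponentially. The function name and kernel layout below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def dilated_causal_conv1d(x, w, dilation):
    """Causal dilated 1-D convolution (illustrative sketch, not the
    paper's code): output[t] depends only on x[t], x[t-d], x[t-2d], ...
    where d is the dilation factor, so no future information leaks in."""
    k = len(w)
    # left-pad so the output has the same length as the input
    pad = dilation * (k - 1)
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    y = np.zeros(len(x))
    for t in range(len(x)):
        # taps spaced `dilation` steps apart, ending at time t
        taps = xp[t : t + pad + 1 : dilation]
        y[t] = taps @ w
    return y

x = np.arange(8, dtype=float)
# kernel [0, 1] weights only the current time step, so output == input
y = dilated_causal_conv1d(x, np.array([0.0, 1.0]), dilation=2)
```

With a stack of such layers at dilations 1, 2, 4, 8, ..., a decoder can condition each routing decision on a long history of past states at modest computational cost, which is one plausible reason for the architecture choice described in the abstract.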
Submission Number: 9