End-to-end Deep Reinforcement Learning for Stochastic Multi-objective Optimization in C-VRPTW

TMLR Paper6721 Authors

30 Nov 2025 (modified: 11 Dec 2025), Under review for TMLR, CC BY 4.0
Abstract: In this work, we consider learning-based applications in routing to solve a Vehicle Routing variant characterized by stochasticity and multiple objectives. Such problems are representative of practical settings where decision-makers have to deal with uncertainty in the operational environment as well as multiple conflicting objectives due to different stakeholders. We specifically consider travel time uncertainty. We also consider two objectives, total travel time and route makespan, that jointly target operational efficiency and labor regulations on shift length, although additional or different objectives could be incorporated. Learning-based methods offer substantial computational advantages, as they can repeatedly solve problems with limited intervention from the decision-maker. We specifically focus on end-to-end deep learning models that leverage the attention mechanism and multiple solution trajectories. These models have seen several successful applications in routing problems. However, since travel times are not a direct input to these models due to the large dimensions of the travel time matrix, accounting for uncertainty is a challenge, especially in the presence of multiple objectives. To address this, we propose a model that simultaneously handles stochasticity and multi-objectivity, and we provide a refined training mechanism for this model that uses scenario clustering to reduce training time. Our results show that our model is capable of constructing a Pareto front of good quality within acceptable run times compared to three baselines. We also provide two ablation studies to assess our model’s suitability in different settings.
Submission Type: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=RbuzrGD62Z
Changes Since Last Submission: Margins were adjusted in accordance with the template. This was a mistake on my side (the corresponding author) that initially went unnoticed.
Assigned Action Editor: ~Matteo_Papini1
Submission Number: 6721