Reinforcement learning for solving the pricing problem in column generation for routing

Abdo Abouelrous, Laurens Bliek, Adriana F. Gabor, Yaoxin Wu, Yingqian Zhang

Published: 01 Dec 2025, Last Modified: 17 Feb 2026, Operations Research Perspectives, CC BY-SA 4.0

Abstract: In this paper, we address Column Generation (CG) for routing problems using Reinforcement Learning (RL). Specifically, we use an RL model based on an attention-mechanism architecture to find the columns with the most negative reduced cost in the Pricing Problem (PP). Unlike previous Machine Learning (ML) applications for CG, our model is an end-to-end mechanism that solves the PP independently, without the help of any heuristic. We consider a variant of the Vehicle Routing Problem (VRP) as a case study for our method. Through a series of experiments comparing our approach with a Dynamic Programming (DP)-based heuristic for solving the PP, we demonstrate that the proposed method obtains solutions to the linear relaxation within a reasonable objective gap and significantly faster than the DP-based heuristic.
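To make "most negative reduced cost" concrete, the following is a minimal sketch of the quantity the PP minimizes. It assumes a generic set-partitioning master problem in which each customer has one dual value and a column is a depot-to-depot route. The abstract does not specify the VRP variant or its constraints, so the names `route_cost`, `reduced_cost` and `price_by_enumeration`, the toy data, and the brute-force search are all illustrative assumptions. The enumeration merely stands in for the RL model or DP heuristic and is not the authors' method.

```python
from itertools import permutations

def route_cost(route, dist, depot=0):
    """Travel cost of depot -> customers in order -> depot."""
    path = [depot, *route, depot]
    return sum(dist[a][b] for a, b in zip(path, path[1:]))

def reduced_cost(route, dist, duals, depot=0):
    """Route cost minus the duals of the customers it visits
    (assumed set-partitioning master, one dual per customer)."""
    return route_cost(route, dist, depot) - sum(duals[i] for i in route)

def price_by_enumeration(customers, dist, duals, max_len=2):
    """Toy pricing step: enumerate short routes and return the one with
    the most negative reduced cost, or (None, 0.0) if none is negative.
    Brute force is only a placeholder for a real pricing solver."""
    best_route, best_rc = None, 0.0
    for k in range(1, max_len + 1):
        for route in permutations(customers, k):
            rc = reduced_cost(route, dist, duals)
            if rc < best_rc - 1e-9:
                best_route, best_rc = route, rc
    return best_route, best_rc

# Hypothetical instance: node 0 is the depot, nodes 1 and 2 are customers.
dist = [[0, 2, 3],
        [2, 0, 1],
        [3, 1, 0]]
duals = {1: 3.0, 2: 5.0}  # made-up dual values from a restricted master LP

best_route, best_rc = price_by_enumeration([1, 2], dist, duals)
print(best_route, best_rc)
```

In this toy instance each single-customer route has positive reduced cost, while the route visiting both customers costs 6 and collects duals worth 8, so it would enter the master problem. CG stops once no route with negative reduced cost remains.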

External IDs: doi:10.1016/j.orp.2025.100364