Keywords: inference, reinforcement learning, diffusion model, combinatorial optimization
Abstract: An open challenge in neural combinatorial optimization (CO), whether based on reinforcement learning (RL) or diffusion models (DMs), is the speed–quality trade-off: sequential RL decoders generalize well but tend to settle for suboptimal tours, while DMs generate high-quality full solutions at the cost of long training and slow iterative sampling. We present **SpSCO**, a new framework inspired by speculative sampling (SpS) for large language model (LLM) inference. As in SpS for LLMs, a lightweight draft model (the sequential RL decoder in SpSCO) collaborates with a high-capacity target model (a DM in SpSCO) to achieve fast, robust, and high-quality inference: the target model is invoked only when there is a "cognitive divergence" between the draft and target models or internal uncertainty in the draft model. This SpS strategy allows SpSCO to attain high solution quality while reducing the computational overhead of DMs. Notably, SpSCO is model-agnostic and plug-and-play across various RL and DM backbones. It is also robust: even with under-trained, suboptimal RL and diffusion backbones, SpSCO achieves state-of-the-art performance on diverse CO instances across scales while delivering faster inference on large-scale instances.
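To make the abstract's control loop concrete, below is a minimal, hypothetical sketch of a speculative-sampling-style decoder for tour construction. All names (`draft_policy`, `target_marginal`, the thresholds, and the toy distributions) are illustrative assumptions, not the paper's actual API: the draft is a stand-in for a cheap sequential RL decoder, the target for an expensive DM queried only on uncertainty or divergence.

```python
# Hypothetical sketch of the SpS-style loop described in the abstract;
# all function names, thresholds, and distributions are placeholders.
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    """Shannon entropy of a discrete distribution; measures draft uncertainty."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p))

def draft_policy(tour, candidates):
    """Stand-in for a lightweight sequential RL decoder: a cheap per-step
    distribution over candidate next nodes."""
    logits = rng.normal(size=len(candidates))
    return np.exp(logits) / np.exp(logits).sum()

def target_marginal(tour, candidates):
    """Stand-in for the expensive diffusion model's per-step marginal,
    queried only on demand."""
    logits = rng.normal(size=len(candidates))
    return np.exp(logits) / np.exp(logits).sum()

def speculative_decode(n_nodes, entropy_thresh=1.2,
                       divergence_thresh=0.5, verify_every=4):
    """Build a tour node by node. Accept the draft's proposal unless the draft
    is uncertain (high entropy) or a periodic check finds divergence from the
    target; only then invoke the target model."""
    tour, remaining = [0], list(range(1, n_nodes))
    target_calls = 0
    for step in range(n_nodes - 1):
        p_draft = draft_policy(tour, remaining)
        choice = int(np.argmax(p_draft))
        uncertain = entropy(p_draft) > entropy_thresh
        if uncertain or step % verify_every == 0:
            # Fall back to the high-capacity target model for this step.
            p_target = target_marginal(tour, remaining)
            target_calls += 1
            # "Cognitive divergence": override the draft if its pick is
            # implausible under the target's marginal.
            if p_target[choice] < divergence_thresh * p_target.max():
                choice = int(np.argmax(p_target))
        tour.append(remaining.pop(choice))
    return tour, target_calls

tour, calls = speculative_decode(20)
print(f"tour: {tour}\ntarget model invoked on {calls} of 19 steps")
```

The design intent mirrored here is that most steps cost only a draft forward pass, so the target's per-call expense is amortized; the two trigger conditions are the sketch's reading of the abstract's "cognitive divergence" and draft-uncertainty criteria.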
Supplementary Material: zip
Primary Area: optimization
Submission Number: 15912