Can I Have Your Order? Monte-Carlo Tree Search for Slot Filling Ordering in Diffusion Language Models
Abstract: While plan-and-infill decoding in Masked Diffusion Models (MDMs) shows promise for mathematical and code reasoning, performance is highly sensitive to the order in which slots are infilled, and different orders often yield substantially different outputs. We introduce DiffuSearch, a framework that formulates slot selection as a sequential decision-making problem and optimises infilling orders through Monte Carlo Tree Search (MCTS). DiffuSearch uses look-ahead simulations to evaluate partial completions before committing to them, systematically exploring the combinatorial space of generation orders. Experiments show an average improvement of 3.2% over autoregressive baselines and 8.0% over baseline plan-and-infill, with notable gains of 19.5% on MBPP and 4.9% on MATH500. Our analysis reveals that while DiffuSearch predominantly follows sequential ordering, incorporating non-sequential generation is essential for maximising performance. We also observe that larger exploration constants, rather than more simulations, are needed to overcome model confidence biases and discover effective orderings. These findings establish MCTS-based planning as an effective approach for enhancing generation quality in MDMs.
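To make the idea of searching over infilling orders concrete, here is a minimal, hypothetical sketch of standard UCT-based MCTS over slot orderings. Everything in it is an illustrative assumption rather than DiffuSearch's implementation: the dependency table `DEPS`, the scorer `toy_reward` (a stand-in for whatever evaluates a completed sequence, such as unit tests on a code task), and the function `mcts_order`. Unlike a real MDM decoder, it searches over whole orders with random rollouts instead of committing one slot at a time with model-generated content, and it returns the best complete order seen during search. The parameter `c` plays the role of the exploration constant mentioned above.

```python
import math
import random
from typing import Dict, FrozenSet, Optional, Tuple

# Hypothetical toy task: slot i counts as "correct" only if every slot it
# depends on was filled earlier. Slot 2 needs slot 3 (e.g. an answer skeleton),
# so strict left-to-right order is not optimal here.
DEPS: Dict[int, FrozenSet[int]] = {
    0: frozenset(),
    1: frozenset({0}),
    2: frozenset({0, 3}),
    3: frozenset({0}),
}


def toy_reward(order: Tuple[int, ...]) -> float:
    """Fraction of slots whose dependencies were filled before them (stand-in verifier)."""
    filled: set = set()
    correct = 0
    for slot in order:
        if DEPS[slot] <= filled:
            correct += 1
        filled.add(slot)
    return correct / len(order)


class Node:
    def __init__(self, prefix: Tuple[int, ...], num_slots: int, parent: Optional["Node"] = None):
        self.prefix = prefix  # slots filled so far, in fill order
        self.parent = parent
        self.children: Dict[int, "Node"] = {}
        self.untried = [s for s in range(num_slots) if s not in prefix]
        self.visits = 0
        self.value = 0.0  # sum of rollout rewards through this node

    def uct_child(self, c: float) -> "Node":
        # UCT: mean reward plus an exploration bonus scaled by the constant c.
        log_n = math.log(self.visits)
        return max(
            self.children.values(),
            key=lambda ch: ch.value / ch.visits + c * math.sqrt(log_n / ch.visits),
        )


def mcts_order(num_slots: int, iterations: int, c: float, seed: int = 0) -> Tuple[Tuple[int, ...], float]:
    rng = random.Random(seed)
    root = Node((), num_slots)
    # Start from the left-to-right order as the incumbent best.
    best_order: Tuple[int, ...] = tuple(range(num_slots))
    best_reward = toy_reward(best_order)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded and has children.
        while not node.untried and node.children:
            node = node.uct_child(c)
        # 2. Expansion: choose one untried slot as the next one to fill.
        if node.untried:
            slot = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.prefix + (slot,), num_slots, parent=node)
            node.children[slot] = child
            node = child
        # 3. Simulation: complete the remaining order at random (look-ahead rollout).
        rest = [s for s in range(num_slots) if s not in node.prefix]
        rng.shuffle(rest)
        order = node.prefix + tuple(rest)
        reward = toy_reward(order)
        if reward > best_reward:
            best_order, best_reward = order, reward
        # 4. Backpropagation: update statistics from the node up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return best_order, best_reward


order, reward = mcts_order(num_slots=4, iterations=2000, c=5.0)
print(order, reward)
```

With this toy `DEPS`, the strict left-to-right order `(0, 1, 2, 3)` scores 0.75 because slot 2 is filled before slot 3, while any order that fills slot 0 first and slot 3 before slot 2 scores 1.0. A larger `c` spreads visits across more sibling orderings instead of repeatedly exploiting the currently best-scoring branch.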
Lay Summary: Most current AI models generate text the way we read: one word after another, from left to right. A newer kind of AI model, known as a diffusion model, works more like filling in a form or solving a crossword: it can complete different blank sections in any order, which makes it faster and sometimes better. However, diffusion models have a catch: the order in which they fill those blanks significantly affects the quality of the answer. If the blanks are filled in a poor order, small early mistakes can snowball into a confused and incorrect result. The latest diffusion models simply fill whichever blank they feel most confident about, and this choice often goes wrong.
To mitigate this issue, we built a method that treats "which blank to fill next" as a planning problem, inspired by the look-ahead search strategy behind game-playing systems such as AlphaGo. Before committing to a choice, it first imagines how each option would play out and picks the one that leads to the best overall answer. This produces better results on maths and programming tasks, including an improvement of nearly 20% on one coding benchmark. Interestingly, the best solutions still mostly run left to right, but the ability to occasionally break from that order is exactly what makes the difference, similar to how people approach mathematical and programming tasks.
Primary Area: Deep Learning->Large Language Models
Keywords: LLM Reasoning, Diffusion Models, Monte Carlo Tree Search, Large Language Models
Submission Number: 30186