Inference-Aligned SFT for Diffusion LLMs via Group-based Trajectory Sampling
Keywords: diffusion language models, discrete diffusion, supervised fine-tuning
TL;DR: We reduce the training–inference mismatch in diffusion LLMs by aligning supervised fine-tuning with Group-based Trajectory Sampling, yielding consistent accuracy gains without rollout-based trajectory collection.
Abstract: Diffusion large language models (dLLMs) are trained to denoise randomly masked sequences, yet in practice they are commonly decoded by progressively unmasking tokens in order of model confidence. Consequently, the masking patterns used in supervised fine-tuning (SFT) often diverge from those encountered at inference time, resulting in suboptimal training signals. We propose Group-based Trajectory Sampling, which constructs inference-aligned training trajectories directly from ground-truth targets. We use an initial model to iteratively categorize ground-truth tokens into ordered groups based on how much context the model needs to confidently predict each one. By training on trajectories sampled in this group order, the model learns masking patterns closer to what it would actually produce during inference. Across Sudoku, Countdown, and Trip Planning, our approach consistently outperforms standard SFT, yielding accuracy gains across diverse settings. These findings demonstrate that aligning training trajectories with inference-time unmasking enables more reliable SFT of dLLMs.
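The grouping step described in the abstract can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the function names (`group_tokens`, `toy_confidence`), the `threshold` parameter, and the toy confidence function standing in for the initial dLLM are all assumptions for demonstration.

```python
# Hypothetical sketch of the grouping procedure: starting from a fully masked
# sequence, each round reveals the positions the model can already predict
# confidently given what is revealed so far; those positions form the next
# ordered group. A real implementation would query the initial dLLM instead
# of the toy confidence function below.
from typing import Callable, List

def group_tokens(
    target: List[str],
    confidence: Callable[[List[str], int], float],
    threshold: float = 0.5,  # assumed cutoff for "confident"
) -> List[List[int]]:
    """Iteratively partition target token positions into ordered groups."""
    revealed = [False] * len(target)
    groups: List[List[int]] = []
    while not all(revealed):
        # Context the model sees: revealed tokens plus mask placeholders.
        context = [tok if rev else "<mask>" for tok, rev in zip(target, revealed)]
        group = [i for i, rev in enumerate(revealed)
                 if not rev and confidence(context, i) >= threshold]
        if not group:  # fallback: reveal the single most confident position
            group = [max((i for i in range(len(target)) if not revealed[i]),
                         key=lambda i: confidence(context, i))]
        for i in group:
            revealed[i] = True
        groups.append(group)
    return groups

# Toy stand-in for model confidence: a position is "easy" once its left
# neighbor is revealed (so groups unmask strictly left to right here).
def toy_confidence(context: List[str], i: int) -> float:
    return 1.0 if i == 0 or context[i - 1] != "<mask>" else 0.0

print(group_tokens(list("abcd"), toy_confidence))  # → [[0], [1], [2], [3]]
```

A training trajectory would then be sampled by unmasking the groups in this order, so the masking patterns seen during SFT mirror the confidence-ordered unmasking used at decoding time.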
Submission Number: 70