Keywords: diffusion language models, discrete diffusion, supervised fine-tuning
TL;DR: We reduce the training–inference mismatch in diffusion LLMs by aligning supervised fine-tuning with Context-sensitivity Aware Trajectory Sampling (CATS), yielding consistent accuracy gains without rollout-based trajectory collection.
Abstract: Diffusion large language models (dLLMs) are trained to denoise randomly masked sequences, yet in practice they are commonly decoded by progressively unmasking tokens in order of model confidence. Consequently, the masking patterns used in supervised fine-tuning (SFT) often diverge from those encountered during inference, resulting in suboptimal training signals. We propose Context-sensitivity Aware Trajectory Sampling (CATS), which constructs inference-aligned training trajectories directly from ground-truth targets. We use an initial model to iteratively categorize ground-truth tokens into groups based on how much context the model needs to confidently predict each one. By training on trajectories sampled in this order, the model learns masking patterns closer to those it would actually produce during inference. Across Sudoku, Countdown, and Trip Planning, our approach consistently outperforms standard SFT. These findings demonstrate that aligning training trajectories with inference-time unmasking enables more reliable SFT of dLLMs.
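The grouping step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the scalar `confidence` callback, the `threshold` parameter, and the `<M>` mask token are all assumptions introduced for illustration. The idea is to iteratively reveal the ground-truth tokens a model can already predict confidently from the current context, recording the order of revelation, and then replay that order as a denoising trajectory for SFT.

```python
def cats_order(tokens, confidence, threshold=0.9):
    """Assign ground-truth tokens to rounds (hypothetical sketch, not the
    paper's code): tokens the model can predict confidently from the
    currently revealed context are grouped into earlier rounds."""
    revealed = set()
    rounds = []
    while len(revealed) < len(tokens):
        # Score each still-masked position given the revealed context.
        scores = {i: confidence(tokens, revealed, i)
                  for i in range(len(tokens)) if i not in revealed}
        group = [i for i, s in scores.items() if s >= threshold]
        if not group:
            # Always make progress: reveal the single most confident token.
            group = [max(scores, key=scores.get)]
        rounds.append(sorted(group))
        revealed.update(group)
    return rounds

def to_trajectory(tokens, rounds, mask="<M>"):
    """Replay the round assignments as a denoising trajectory: start fully
    masked, then unmask each group in order, mirroring confidence-ordered
    inference-time decoding."""
    seq = [mask] * len(tokens)
    traj = [list(seq)]
    for group in rounds:
        for i in group:
            seq[i] = tokens[i]
        traj.append(list(seq))
    return traj
```

In actual use, `confidence(tokens, revealed, i)` would be the initial dLLM's probability of the ground-truth token at position `i` given the partially revealed sequence; the intermediate states in `traj` then serve as the masked inputs for SFT.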
Submission Number: 70