Single-Step Initialization for Exploratory Parallel Rollouts in Diffusion LLMs

Dongjae Jeon; Bumjun Kim; Mingyu Kim; Albert No

Published: 30 May 2026, Last Modified: 01 Jun 2026, Venue: SPIGM @ ICML (Poster), License: CC BY 4.0

Keywords: Diffusion language models, parallel decoding, rollout generation, policy optimization, GRPO, exploration, rollout diversity, early branching, initialization.

TL;DR: Training-free parallel decoding accelerates dLLM rollouts, but confidence-based selection delays branching. A single random initialization step restores exploration and improves diversity, reasoning potential, and RL performance.

Abstract: We propose training-free parallel decoding for rollout generation in diffusion large language model (dLLM) policy optimization, reducing rollout cost without auxiliary models or policy modification. We find, however, that confidence-based decoding suffers from delayed branching, and that parallel decoding largely inherits this behavior. Rollouts agree on both the unmasked tokens and their positions for much of generation, and the resulting lack of exploration weakens the group-relative learning signal. We address this with a minimal initialization step in which each rollout independently unmasks one uniformly random position, after which the original sampler resumes unchanged. The intervention is drop-in compatible with any sampling strategy. Combined with Fast-dLLM on LLaDA-8B-Instruct, it improves rollout diversity and yields stronger downstream RL performance on GSM8K and MATH-500.
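
The initialization step described in the abstract could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: `MASK_ID`, `single_step_init`, and `logits_fn` are hypothetical names, and the choice to sample the revealed token from the model's predicted distribution at the chosen position is an assumption, since the abstract specifies only that the position is drawn uniformly at random.

```python
import numpy as np

MASK_ID = -1  # hypothetical mask token id, chosen for illustration only


def single_step_init(tokens, logits_fn, rng, mask_id=MASK_ID):
    """Unmask one uniformly random masked position in each rollout.

    tokens:    (G, L) integer array holding G rollouts of length L.
    logits_fn: hypothetical callable mapping tokens to (G, L, V) logits.
    rng:       a numpy.random.Generator.
    Returns a new array; the input is left untouched.
    """
    tokens = tokens.copy()
    logits = logits_fn(tokens)
    for g in range(tokens.shape[0]):
        masked = np.flatnonzero(tokens[g] == mask_id)
        if masked.size == 0:
            continue  # nothing left to unmask in this rollout
        # Position is drawn uniformly, independently for each rollout.
        pos = rng.choice(masked)
        # Assumption: the token is sampled from the model's softmax
        # distribution at that position.
        z = logits[g, pos] - logits[g, pos].max()
        probs = np.exp(z) / np.exp(z).sum()
        tokens[g, pos] = rng.choice(probs.size, p=probs)
    return tokens
```

After this call, each rollout would be handed to the unchanged sampler (for example a confidence-based parallel decoder), which now starts from a different partially unmasked state in each rollout.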

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 262
