Planner and Executor: Collaboration between Discrete Diffusion And Auto-regressive Models in Reasoning

17 Sept 2025 (modified: 24 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Diffusion Language Models (DLLMs), Large Language Models (LLMs), Hybrid Architectures
TL;DR: Collaboration between Discrete Diffusion And Auto-regressive Models in Reasoning
Abstract: Current reasoning models achieve high accuracy but require long token sequences, making them costly to run. Discrete diffusion language models (DDLMs) offer parallel, flexible generation within a fixed token budget. This motivates a hybrid design in which a DDLM serves as the planner and an autoregressive model (ARM) as the executor, combining efficiency with accuracy. We conduct a systematic study of such planner--executor pairings across text-space and latent-space collaboration. Results show that DDLM$\to$ARM collaboration is most effective, especially when the interaction occurs in latent space. A learned projector maps DDLM latents into the ARM's embedding space, bypassing some of diffusion's limitations and enabling substantial gains on challenging reasoning tasks. For instance, when DDLM$\to$ARM communication shifts from text space to latent space, accuracy improves from 27.0\% to 54.0\% on DART-5 and from 0.0\% to 14.0\% on AIME24. With only 64 planner tokens and $\sim$5 executor tokens, the latent-space pipeline surpasses Qwen3.1-7B on DART-5 and AIME while using only 1.9--2.2\% of the tokens, and performs slightly below DeepSeek-R1 while operating with a similarly negligible token budget and only a 3B-sized executor. These findings highlight that diffusion's global revision and autoregression's finalization are complementary, and that latent exchange enables budget-aware reasoning without sacrificing robustness.
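The core mechanism described above, a learned projector that maps the DDLM planner's latents into the ARM executor's embedding space, can be sketched as a simple linear map over the planner's 64 latent vectors. This is an illustrative sketch only: the hidden sizes, the linear form of the projector, and the function name are assumptions, not details from the submission.

```python
import numpy as np

def project_planner_latents(planner_latents, W, b):
    """Map DDLM planner latents into the ARM executor's embedding space.

    Illustrative linear projector (hypothetical architecture):
      planner_latents: (num_planner_tokens, d_plan) array of DDLM latents
      W:               (d_plan, d_arm) learned projection weights
      b:               (d_arm,) learned bias
    Returns an array of shape (num_planner_tokens, d_arm) that the ARM
    could consume as a soft-prompt prefix in its embedding space.
    """
    return planner_latents @ W + b

# Toy shapes: 64 planner tokens (as in the abstract); hidden sizes are made up.
rng = np.random.default_rng(0)
latents = rng.standard_normal((64, 512))          # DDLM planner output (assumed d_plan=512)
W = rng.standard_normal((512, 2048)) * 0.02       # assumed d_arm=2048
b = np.zeros(2048)

arm_prefix = project_planner_latents(latents, W, b)
print(arm_prefix.shape)  # (64, 2048)
```

In practice the projector's weights would be trained end-to-end so that the ARM conditions on the projected plan instead of on generated text, which is what distinguishes the latent-space pipeline from the text-space baseline.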
Primary Area: foundation or frontier models, including LLMs
Submission Number: 9915