DREAM-R: Multimodal Speculative Reasoning with RL-Based Refined Drafting, Precise Verification, and Fully Parallel Execution
Keywords: Efficient AI, LLM Reasoning, Reinforcement Learning
Abstract: Speculative reasoning has recently been proposed as a means to accelerate reasoning-intensive generation in large multimodal models, but its effectiveness is often constrained by misalignment between speculative drafts and target-verified reasoning. In this work, we introduce $\textit{DREAM-R}$, a framework that substantially improves the performance of speculative reasoning. At its core, DREAM-R employs $\textit{Speculative Alignment Policy Optimization}$ (SAPO), a reinforcement-learning objective that trains draft models to generate reasoning steps that are both concise and faithful to target trajectories. We further propose $\textit{Contrastive Probability Normalization}$ (CPN), a ratio-based criterion that accepts speculative steps only when positive evidence clearly dominates, providing stable and interpretable acceptance decisions and preventing error propagation. Building on these components, we develop a $\textit{Fully Parallel Speculative Reasoning}$ (FPSR) framework that parallelizes draft generation, target-side reasoning, and verification across multi-step reasoning, enabling early stopping and clean fallback. Experiments on reasoning-heavy benchmarks demonstrate up to $2.48\times$ speedup while preserving target-model accuracy, yielding substantial efficiency gains without compromising reasoning quality.
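The ratio-based acceptance described for CPN can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the function name, the log-ratio formulation, and the margin `tau` are hypothetical, not the paper's actual definition of CPN.

```python
import math

# Hypothetical sketch of a ratio-based acceptance test in the spirit of CPN.
# A speculative step is accepted only when positive (target-supporting)
# evidence clearly dominates negative evidence, i.e. the log-probability
# ratio exceeds a margin tau. All names and the threshold are illustrative.

def accept_step(logp_positive: float,
                logp_negative: float,
                tau: float = math.log(2.0)) -> bool:
    """Accept the step iff p_positive / p_negative > exp(tau)."""
    return (logp_positive - logp_negative) > tau

# Positive evidence 4x more likely than negative: accepted.
print(accept_step(math.log(0.8), math.log(0.2)))  # True
# Evidence balanced: rejected, falling back to the target model.
print(accept_step(math.log(0.5), math.log(0.5)))  # False
```

A margin-based log-ratio test like this is stable (monotone in the ratio) and interpretable (the threshold directly bounds how strongly positive evidence must dominate), matching the properties the abstract attributes to CPN.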
Submission Number: 52