DIVA: Discrete Diffusion Vision-Language-Action Models for Parallelized Action Generation

ICLR 2026 Conference Submission 797 Authors

02 Sept 2025 (modified: 23 Dec 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Embodied Artificial Intelligence
Abstract: Vision-Language-Action (VLA) models have shown promising results in robot control, yet prevailing auto-regressive frameworks suffer from inherent limitations such as error accumulation and temporal rigidity in action generation. To address this, we introduce the DIscrete diffusion Vision-language-Action model (DIVA), a discrete diffusion-based VLA framework that reformulates action generation as an iterative denoising process over discrete latent representations. The innovation of DIVA lies in a unified discrete diffusion architecture that systematically integrates three core designs. First, a learnable discrete action tokenization process bridges continuous actions with the structured multimodal token space. Second, a latent-driven policy learning strategy aligns the representation spaces of the vision-language backbone and the policy head through joint optimization. Third, a selective group unmasking strategy during discrete diffusion decoding preserves spatiotemporal coherence. Extensive evaluations demonstrate that DIVA achieves state-of-the-art performance in both simulated and real-world environments, validating its advantages in generating coherent, precise, and generalizable robot behaviors. Our work establishes a robust and scalable paradigm for future embodied decision-making systems.
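For intuition, the sketch below illustrates the general kind of decoding loop the abstract describes: action tokens start fully masked and are unmasked group by group over a few denoising steps, rather than autoregressively one token at a time. This is a minimal, assumed illustration only; the `policy_head` interface, `MASK_ID`, group sizes, and scheduling are hypothetical and do not reflect the authors' actual implementation.

```python
import torch

MASK_ID = 0  # hypothetical id reserved for the [MASK] action token


def decode_actions(policy_head, context, num_tokens=32, group_size=8, steps=4):
    """Iterative discrete-diffusion decoding sketch: begin from an all-masked
    action-token sequence and unmask one contiguous group of tokens per step,
    so earlier groups are committed before later ones (a generic group-wise
    unmasking scheme, not DIVA's exact schedule)."""
    tokens = torch.full((1, num_tokens), MASK_ID, dtype=torch.long)
    for step in range(steps):
        logits = policy_head(tokens, context)      # assumed shape: (1, num_tokens, vocab)
        pred = logits.argmax(-1)                   # most likely token per position
        # restrict this step's unmasking to one group, preserving the
        # temporal ordering of the predicted action chunk
        group = slice(step * group_size, (step + 1) * group_size)
        still_masked = tokens[:, group] == MASK_ID
        tokens[:, group] = torch.where(still_masked, pred[:, group], tokens[:, group])
    return tokens
```

In such a scheme, the number of decoding steps is fixed by the group schedule (here, `steps * group_size` covers the whole chunk), so the full action chunk is produced in a handful of parallel passes instead of one token per forward pass.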
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 797