PADRE: Pseudo-Likelihood based Alignment of Diffusion Language Models

Published: 09 Jul 2025, Last Modified: 16 Jul 2025 · AI4Math@ICML25 Poster · CC BY-NC-SA 4.0
Keywords: RLHF, diffusion language model
Abstract: Policy-gradient reinforcement learning (PGRL) is widely used to improve language model reasoning. However, these methods do not transfer well to diffusion-based language models (dLLMs): most attempts to apply PGRL to dLLMs either scale poorly or rely on unprincipled approximations. Our proposed framework (PADRE) uses a novel pseudo-likelihood-based objective for aligning dLLMs. The objective shares the same optima as PGRL-based optimization but does not require evaluating likelihoods under the dLLM. Experiments on mathematical reasoning benchmarks show that PADRE matches or surpasses the performance of GRPO and related baselines. Our approach provides a stable and practical alternative for RL-based fine-tuning of reasoning-focused dLLMs.
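The abstract does not spell out the PADRE objective itself, so the following is only a minimal sketch of the general pseudo-likelihood idea it builds on: score a sequence by the sum of per-position conditionals log p(x_i | x_{-i}) obtained by masking one position at a time, and weight that score by a scalar reward. The `model` interface, `mask_id`, and the reward-weighting scheme below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def pseudo_log_likelihood(model, tokens: torch.Tensor, mask_id: int) -> torch.Tensor:
    """Estimate sum_i log p(x_i | x_{-i}) by masking one position at a time.

    tokens: (seq_len,) tensor of token ids.
    model:  hypothetical masked predictor returning logits of shape
            (seq_len, vocab_size) for a (1, seq_len) input.
    """
    seq_len = tokens.shape[0]
    total = tokens.new_zeros((), dtype=torch.float32)
    for i in range(seq_len):
        masked = tokens.clone()
        masked[i] = mask_id                      # hide position i
        logits = model(masked.unsqueeze(0))[0]   # (seq_len, vocab_size)
        log_probs = F.log_softmax(logits[i], dim=-1)
        total = total + log_probs[tokens[i]]     # log p(x_i | rest)
    return total

def reward_weighted_pl_loss(model, sequences, rewards, mask_id: int) -> torch.Tensor:
    """Illustrative reward-weighted pseudo-likelihood surrogate:
    maximize E[ r(x) * log PL_theta(x) ] over sampled completions."""
    losses = []
    for tokens, reward in zip(sequences, rewards):
        pll = pseudo_log_likelihood(model, tokens, mask_id)
        losses.append(-reward * pll)             # negate: optimizer minimizes
    return torch.stack(losses).mean()
```

Note that this sketch replaces the intractable sequence likelihood of a dLLM with per-position masked conditionals, which is the reason such an objective can be evaluated without full likelihoods; whether PADRE weights, normalizes, or baselines the rewards differently is not stated in this abstract.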
Submission Number: 178