Keywords: discrete diffusion, fine-tuning, reinforcement learning, reward optimization, reward alignment, adaptive decoding
TL;DR: We introduce A2D2, a unified framework that jointly fine-tunes an any-length masked diffusion model policy and its adaptive inference schedule under reward guidance.
Abstract: Masked discrete diffusion models (MDMs) offer a simple and stable likelihood-based framework for sequence generation and have recently been extended to any-length settings via token insertion. However, principled reward-guided fine-tuning for any-length discrete diffusion remains largely unexplored. We introduce Finetuning **A**ny-Length **D**iscrete **D**iffusion for **A**daptive Decoding (**A2D2**), a unified framework for reward-guided fine-tuning of any-length MDMs. A2D2 formulates generation as a controlled continuous-time Markov chain and jointly optimizes insertion and unmasking policies to learn a reward-tilted path measure without requiring target samples. We derive the Radon–Nikodym derivative for the joint insertion–unmasking process and introduce the Adaptive Joint Decoding (AJD) loss, which provably minimizes trajectory-induced error while preserving the target distribution. Empirically, A2D2 improves reward optimization, generation accuracy, and flexibility over prior fixed-length and inference-time guidance methods.
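For context, the "reward-tilted path measure" the abstract optimizes toward is, in standard reward-guided diffusion fine-tuning, a pretrained distribution reweighted by an exponentiated reward; a common form (notation ours, not taken from the submission) is:

```latex
% Standard reward-tilted target for RL fine-tuning of a generative model:
% p_pre is the pretrained MDM, r a reward function, alpha a temperature,
% and Z the normalizing constant. (Assumed notation, for illustration only.)
p^{*}(x) \;=\; \frac{1}{Z}\, p_{\mathrm{pre}}(x)\, \exp\!\big(r(x)/\alpha\big),
\qquad
Z \;=\; \sum_{x} p_{\mathrm{pre}}(x)\, \exp\!\big(r(x)/\alpha\big).
```

Under this reading, the Radon–Nikodym derivative the abstract derives compares the path measure of the controlled insertion–unmasking process against the pretrained one, so the tilt can be learned without samples from $p^{*}$.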
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 40