WeFT: Weighted Entropy-driven Fine-Tuning for dLLMs

19 Sept 2025 (modified: 12 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: diffusion language models, entropy-based methods
TL;DR: We propose WeFT, an entropy-driven weighted fine-tuning method for diffusion language models that enhances reasoning ability compared to SFT while maintaining efficiency.
Abstract: Diffusion models have recently shown strong potential in language modeling, offering faster generation than traditional autoregressive approaches. However, applying supervised fine-tuning (SFT) to diffusion models remains challenging, as they lack precise probability estimates at each denoising step. While the diffusion mechanism enables the model to reason over entire sequences, it also makes the generation process less predictable and often inconsistent. This highlights the importance of controlling the key tokens that guide the direction of generation. To address this issue, we propose WeFT, a weighted SFT method for diffusion language models in which tokens are assigned different weights based on their entropy. Derived from diffusion theory, WeFT delivers substantial gains: trained on s1K, s1K-1.1, and 3k samples from open-r1, it achieves relative improvements of 39%, 64%, and 83% over standard SFT on four widely used reasoning benchmarks (Sudoku, Countdown, GSM8K, and MATH-500). The code is provided in the supplementary material.
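The abstract's core idea, weighting each token's SFT loss by the entropy of its predicted distribution, can be sketched as follows. The paper's exact weighting scheme is derived from diffusion theory and is not given in the abstract, so this is only an illustrative sketch under the assumption that higher-entropy (harder, more direction-setting) tokens receive larger weights; the function names `token_entropy` and `weighted_sft_loss` are hypothetical, not from the paper.

```python
import math

def token_entropy(probs):
    """Shannon entropy (nats) of one token's predicted distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def weighted_sft_loss(token_probs, target_ids):
    """Illustrative entropy-weighted cross-entropy (assumed form, not the
    paper's exact objective): each target token's negative log-likelihood
    is scaled by the entropy of the model's predicted distribution at that
    position, so uncertain, direction-setting tokens dominate the loss.

    token_probs: list of per-position probability distributions
    target_ids:  list of ground-truth token indices, one per position
    """
    weights = [token_entropy(p) for p in token_probs]
    total_w = sum(weights) or 1.0  # avoid division by zero
    nll = [-w * math.log(p[t]) for w, p, t in zip(weights, token_probs, target_ids)]
    return sum(nll) / total_w
```

For example, a position where the model is uncertain (a near-uniform distribution) contributes far more to this loss than a position the model already predicts confidently, which is the behavior the abstract attributes to WeFT.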
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 14817