An Optimal Transport View of Activation Steering In Masked Diffusion Models

Published: 01 Mar 2026, Last Modified: 05 Apr 2026, TTU at ICLR 2026 (Main), CC BY 4.0
Abstract: Diffusion Large Language Models (dLLMs) offer a non-autoregressive alternative to left-to-right decoding, but inference-time control in dLLMs remains underdeveloped relative to autoregressive LLMs. Prior activation-steering methods for masked diffusion models (MDMs), a prominent dLLM formulation, focus primarily on concept negation and rely on heuristics that do not explicitly optimize a transport objective. We introduce an Optimal Transport (OT) view of activation steering for MDMs: given contrastive prompt distributions, we learn a lightweight affine map that transports pooled activation distributions from a source behavior to a target behavior. This perspective unifies common steering rules (activation addition, mean-shift, directional ablation) as special cases of an affine transport map, and motivates the use of an OT estimator that matches first- and second-order moments. Across three state-of-the-art dLLMs (LLaDA-Instruct, LLaDA 1.5, Dream-Instruct), affine OT steering improves instruction-following accuracy (e.g., +6.5 to +11.9 absolute points) with no inference-time overhead.
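The affine-map view in the abstract can be written out as a short sketch. The abstract does not give the paper's notation or its exact estimator, so everything here is an assumption made for illustration: the symbols $h$ (pooled activation), $A$, $b$, $v$, $\alpha$, the source and target means and covariances $\mu_s, \Sigma_s, \mu_t, \Sigma_t$, and the choice of the standard closed-form OT map between Gaussians as the moment-matching estimator.

```latex
% Affine steering map on a pooled activation h (notation assumed, not from the paper)
T(h) = A h + b

% Common steering rules as special cases (v a steering direction, \alpha a strength)
\text{activation addition:}   \quad A = I,\; b = \alpha v
\text{mean-shift:}            \quad A = I,\; b = \mu_t - \mu_s
\text{directional ablation:}  \quad A = I - v v^{\top},\; b = 0 \quad (\lVert v \rVert = 1)

% Closed-form OT map between Gaussians N(\mu_s, \Sigma_s) and N(\mu_t, \Sigma_t)
T(h) = \mu_t + A_{\mathrm{OT}} (h - \mu_s), \qquad
A_{\mathrm{OT}} = \Sigma_s^{-1/2}
  \left( \Sigma_s^{1/2} \, \Sigma_t \, \Sigma_s^{1/2} \right)^{1/2}
  \Sigma_s^{-1/2}

% Moment matching: the pushforward has the target mean and covariance
\mathbb{E}[T(h)] = \mu_t, \qquad
A_{\mathrm{OT}} \, \Sigma_s \, A_{\mathrm{OT}}^{\top} = \Sigma_t
```

Under these assumptions, mean-shift matches only the first moment, while the Gaussian OT map also matches the second. Because the map is a single affine transformation of the activation, applying it at inference is one matrix-vector product and an addition per steered activation.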
Submission Number: 36