Keywords: Masked Diffusion Language Models, Autoregressive Language Models, Mechanistic Interpretability, Circuit Analysis, Post-training, Logit Lens, Neuron Explanation
Abstract: Post-training pretrained autoregressive models (ARMs) into masked diffusion models (MDMs) has emerged as a cost-effective strategy to overcome the limitations of sequential generation. However, the internal algorithmic transformations induced by this paradigm shift remain unexplored, leaving it unclear whether post-trained MDMs acquire genuine bidirectional reasoning capabilities or merely repackage autoregressive heuristics. In this work, we address this question by conducting a comparative circuit analysis of ARMs and their MDM counterparts. Our analysis reveals a systematic "mechanism shift" that depends on the structural nature of the task. Structurally, we observe a clear divergence: MDMs largely retain autoregressive circuitry on tasks dominated by local causal dependencies, but abandon the pathways inherited from initialization on global planning tasks, rewiring toward increased early-layer processing. Semantically, we identify a transition from sharp, localized specialization in ARMs to distributed integration in MDMs. From these findings, we conclude that diffusion post-training does not merely adapt model parameters but fundamentally reorganizes internal computation to support non-sequential global planning.
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: feature attribution
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 7120