ExLM: Rethinking the Impact of $\texttt{[MASK]}$ Tokens in Masked Language Models

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY-NC-ND 4.0
Abstract: Masked Language Models (MLMs) have achieved remarkable success in many self-supervised representation learning tasks. MLMs are trained by randomly masking portions of the input sequence with $\texttt{[MASK]}$ tokens and learning to reconstruct the original content from the remaining context. This paper explores the impact of $\texttt{[MASK]}$ tokens on MLMs. Our analysis shows that $\texttt{[MASK]}$ tokens can introduce the ***corrupted semantics*** problem, wherein the corrupted context may convey multiple, ambiguous meanings; this problem is a key factor limiting the performance of MLMs on downstream tasks. Based on these findings, we propose a novel enhanced-context MLM, ExLM. Our approach expands each $\texttt{[MASK]}$ token in the input context into multiple states and models the dependencies between these expanded states. This enhancement increases context capacity and enables the model to capture richer semantic information, effectively mitigating the corrupted semantics problem during pre-training. Experimental results demonstrate that ExLM achieves significant performance improvements in both text modeling and SMILES modeling tasks. Further analysis confirms that ExLM enriches semantic representations through context enhancement and effectively reduces the semantic multimodality commonly observed in MLMs.
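
Below is a minimal sketch, not the authors' implementation, of the context-expansion idea described in the abstract: each $\texttt{[MASK]}$ position is replaced by $k$ consecutive expanded states, together with a map back to the original position. The names (`expand_masks`, `MASK_ID`) and the choice of $k$ are illustrative assumptions only.

```python
# Illustrative sketch of [MASK] expansion (assumed names and values, not the
# official ExLM code): each [MASK] is expanded into k consecutive states so the
# encoder has more capacity to represent the ambiguous semantics of the
# corrupted context.

from typing import List, Tuple

MASK_ID = 103  # e.g., BERT's [MASK] id; an assumption for illustration


def expand_masks(input_ids: List[int], k: int = 3) -> Tuple[List[int], List[int]]:
    """Replace every [MASK] token with k consecutive [MASK] tokens.

    Returns the expanded sequence and, for each expanded position, the index
    of the original token it corresponds to.
    """
    expanded_ids: List[int] = []
    origin_index: List[int] = []
    for i, tok in enumerate(input_ids):
        if tok == MASK_ID:
            expanded_ids.extend([MASK_ID] * k)   # k expanded states per mask
            origin_index.extend([i] * k)         # all map back to position i
        else:
            expanded_ids.append(tok)
            origin_index.append(i)
    return expanded_ids, origin_index


if __name__ == "__main__":
    ids = [101, 7592, MASK_ID, 2088, 102]        # "[CLS] hello [MASK] world [SEP]"
    exp, origin = expand_masks(ids, k=2)
    print(exp)     # [101, 7592, 103, 103, 2088, 102]
    print(origin)  # [0, 1, 2, 2, 3, 4]
```

In the paper, the dependencies between these expanded states are additionally modeled during pre-training; the sketch only covers the input-expansion step.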
Lay Summary: When teaching AI models to understand language, researchers often hide words with special [MASK] tokens. However, this can confuse the model by creating unclear or unrealistic sentence meanings. Our work shows that this confusion harms learning more than previously thought. We propose a new method, ExLM, that expands and connects these [MASK] tokens to give the model more context, leading to better understanding and stronger performance across tasks.
Primary Area: Deep Learning->Foundation Models
Keywords: Masked Language Models, Pre-trained Models, Language Models, Text Modeling, SMILES Modeling
Submission Number: 248