MSFT: Mitigating Spurious Correlations in Text Classification via Feature Induction in Embedding Layers and Tensor Stretching

16 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: NLP; Plugin; Spurious Correlations
Abstract: Text classification stands as one of the core tasks in natural language processing (NLP). However, it has long been plagued by spurious correlations, where models learn associations between non-informative or spurious input features and class labels, thereby constraining generalization and classification performance. To tackle this challenge, we formalize the notion of a "semantic centroid" and, leveraging this construct, propose a novel plugin termed MSFT (Mitigating Spurious Correlations in Text Classification via Feature Induction in Embedding Layers and Tensor Stretching). The MSFT plugin first computes a semantic centroid—encapsulating the global semantic information of the entire dataset—by aggregating all embedding tensors within the dataset. During model training, it alleviates spurious correlations through tensor distance stretching in the embedding space, specifically targeting the subset of data that drives the formation of such spurious associations. Designed for seamless integration with existing classification architectures, MSFT offers strong generality and scalability. We conduct experiments on two representative categories of language models: BERT-style masked language models (MLMs) and autoregressive large language models (LLMs; e.g., GPT-2). Extensive experiments across multiple datasets demonstrate that our plugin effectively mitigates the detrimental impacts of spurious correlations while consistently improving classification performance. Notably, it matches or surpasses state-of-the-art (SOTA) results in spurious correlation mitigation for text classification, regardless of the underlying model architecture.
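The abstract describes two mechanisms: aggregating all embedding tensors into a single semantic centroid, and stretching embedding-space distances for the subset of examples driving spurious associations. The paper's exact formulation is not given here, so the sketch below is only a plausible reading under stated assumptions: the centroid is taken as the mean of per-example embeddings, the "stretching" is modeled as a hinge penalty pushing flagged examples away from the centroid, and the function names, the `spurious_mask` input, and the `margin` parameter are all hypothetical.

```python
import torch

def semantic_centroid(embeddings: torch.Tensor) -> torch.Tensor:
    """Aggregate all embedding tensors into one global centroid.

    embeddings: (N, d) tensor, one row per dataset example.
    Assumption: 'aggregating' means a simple mean over the dataset.
    """
    return embeddings.mean(dim=0)

def stretch_loss(embeddings: torch.Tensor,
                 centroid: torch.Tensor,
                 spurious_mask: torch.Tensor,
                 margin: float = 1.0) -> torch.Tensor:
    """Hypothetical tensor-stretching term added to the training loss.

    Penalizes flagged (spurious-driving) examples whose embeddings lie
    closer than `margin` to the centroid, stretching them outward.
    spurious_mask: (N,) boolean tensor marking the targeted subset.
    """
    dists = torch.norm(embeddings - centroid, dim=1)  # (N,) distances
    # Hinge: only examples inside the margin contribute to the loss.
    return torch.clamp(margin - dists[spurious_mask], min=0).mean()
```

As a plugin, such a term would be added to the base classifier's cross-entropy loss during training; how MSFT actually identifies the spurious-driving subset is not specified in the abstract.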
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 7446