MPS: A Multi-Perspective Benchmark For Assessing Spurious Correlations in Text Classification

16 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0
Keywords: NLP; Spurious Correlations; Benchmark
Abstract: Text classification is especially susceptible to diverse spurious correlations, such as those arising from word-frequency and concept-level patterns. Nevertheless, there is no comprehensive, standardized benchmark for evaluating model robustness against these spurious correlations. To address this gap, we present MPS (Multi-Perspective Benchmark For Assessing Spurious Correlations in Text Classification). To construct this benchmark, we collect eight widely used text classification datasets and introduce five categories of spurious correlations for each, producing 40 dataset variants (8 × 5) for comprehensively evaluating spurious correlations in diverse settings. We then extensively evaluate various text classification models and state-of-the-art anti-spurious-correlation methods on this benchmark, uncovering their vulnerabilities to diverse spurious correlations. Finally, a comparative analysis on this benchmark assesses the performance of these anti-spurious-correlation methods against that of humans in diverse settings.
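The abstract does not say how each spurious-correlation category is injected into a dataset. As an illustration only, the Python sketch below enumerates one variant per (dataset, category) pair and shows one common way to build a word-level shortcut: a rare token is prepended mostly to examples of one class, so its presence predicts the label without carrying any task signal. The dataset and category names, the token `zx`, and the helpers `inject_token_shortcut` and `shortcut_precision` are hypothetical; this is a sketch under those assumptions, not the MPS construction procedure.

```python
import random
from typing import List, Tuple

# A labeled example: (text, label). Hypothetical format for illustration.
Example = Tuple[str, int]

# Hypothetical placeholders: the abstract does not name the 8 datasets
# or the 5 spurious-correlation categories.
DATASETS = [f"dataset_{i}" for i in range(8)]
CATEGORIES = [f"category_{j}" for j in range(5)]

# One benchmark variant per (dataset, category) pair: 8 x 5 = 40.
variants = [(d, c) for d in DATASETS for c in CATEGORIES]


def inject_token_shortcut(
    data: List[Example],
    token: str,
    target_label: int,
    strength: float,
    seed: int = 0,
) -> List[Example]:
    """Prepend `token` with probability `strength` to examples labeled
    `target_label`, and with probability 1 - strength to all others.

    For strength > 0.5 the token becomes predictive of `target_label`
    while carrying no real task signal, i.e. a spurious feature.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    rng = random.Random(seed)  # seeded so each variant is reproducible
    out: List[Example] = []
    for text, label in data:
        p = strength if label == target_label else 1.0 - strength
        if rng.random() < p:
            text = f"{token} {text}"
        out.append((text, label))
    return out


def shortcut_precision(data: List[Example], token: str, target_label: int) -> float:
    """Fraction of examples starting with `token` whose label is `target_label`."""
    hits = [label for text, label in data if text.split(" ", 1)[0] == token]
    return sum(1 for y in hits if y == target_label) / len(hits) if hits else 0.0


if __name__ == "__main__":
    train = [("a gripping plot", 1), ("dull and slow", 0),
             ("loved it", 1), ("a waste of time", 0)]
    biased = inject_token_shortcut(train, "zx", target_label=1, strength=1.0)
    print(len(variants))
    print(biased)
    print(shortcut_precision(biased, "zx", 1))
```

Training on a variant built with a high `strength` and evaluating on data where the token is absent or attached to the opposite class is one way to expose a model's reliance on the shortcut; `strength` then acts as a knob for how strong the spurious correlation is.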
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 7442