Causally Estimating the Sensitivity of Neural NLP Models to Spurious Features

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference Withdrawn Submission
Keywords: spurious feature, causality, robustness, data augmentation
Abstract: Recent work finds that modern natural language processing (NLP) models rely on spurious features for prediction. Mitigating such reliance is therefore important. Despite this need, there is no quantitative measure for evaluating or comparing the effects of different forms of spurious features in NLP. We address this gap in the literature by quantifying model sensitivity to spurious features with a causal estimand, dubbed CENT, which draws on the concept of average treatment effect from the causality literature. By conducting simulations with four prominent NLP models (TextRNN, BERT, RoBERTa, and XLNet), we rank the models by their sensitivity to artificial injections of eight spurious features. We further hypothesize and validate that a model that is more sensitive to a spurious feature is less robust to perturbations with that feature at inference time; conversely, data augmentation with that feature improves robustness to similar perturbations. We find statistically significant inverse correlations between sensitivity and robustness, providing empirical support for our hypothesis. Our findings contribute to the interpretation of NLP models and to understanding their robustness.
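The exact definition of CENT is given only in the full paper; as a minimal sketch of what an average-treatment-effect-style estimand could look like, assume a model $f$, a data distribution $\mathcal{D}$, and a treatment operator $T$ that injects the spurious feature into an input (all symbols here are illustrative and not taken from the abstract):

$$\mathrm{CENT}(f, T) \;=\; \mathbb{E}_{x \sim \mathcal{D}}\big[\, f(T(x)) - f(x) \,\big]$$

Here $f(x)$ could denote, for instance, the probability the model assigns to the gold label for input $x$, so that a larger magnitude of the estimand would indicate higher sensitivity to the injected feature.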
One-sentence Summary: We quantify model sensitivity to spurious features with a causal estimand and find statistically significant inverse correlations between sensitivity and robustness.