An Empirical Study on Robustness of Language Models via Decoupling Representation and Classifier

ACL ARR 2024 June Submission1081 Authors

14 Jun 2024 (modified: 06 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Recent studies indicate that shortcut learning occurs in language models, and a number of mitigation methods have therefore been proposed, such as more advanced PLMs and debiasing methods. However, few studies have explored how different factors affect the robustness of language models. To bridge this gap, we study different PLMs and analyze the effect of representations and classifiers on robustness using probing techniques on NLU tasks. First, we find that the low robustness of language models is not due to the inseparability of their representations on challenging datasets. Second, we find that a potential reason why the robustness of language models is difficult to improve is the high similarity between in-distribution and out-of-distribution representations with opposite semantics. Third, we find that debiasing methods are likely to distort representations and, in some cases, merely improve performance through better classifiers. Finally, we propose a probing tool that measures how representations and classifiers each affect the robustness of language models, using a decoupled training strategy with debiasing methods. Extensive experiments on real-world datasets support the effectiveness of the proposed methods.
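The decoupled training strategy mentioned in the abstract can be illustrated by a minimal sketch: freeze a pre-trained encoder so its representations are fixed, then train only a linear classifier (probe) on top of them. The model name, [CLS] pooling, label count, and training-loop details below are assumptions for illustration, not the authors' exact setup.

```python
# Minimal sketch of probing frozen representations with a decoupled classifier.
# Assumes PyTorch and HuggingFace transformers; hyperparameters are illustrative.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

encoder_name = "bert-base-uncased"  # assumed PLM; the paper compares several
tokenizer = AutoTokenizer.from_pretrained(encoder_name)
encoder = AutoModel.from_pretrained(encoder_name)
encoder.eval()  # the representation is frozen; only the classifier is trained


@torch.no_grad()
def embed(sentences):
    """Return [CLS] representations with the encoder held fixed."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    return encoder(**batch).last_hidden_state[:, 0]  # (batch, hidden)


# Linear probe: comparing its accuracy on in-distribution vs. challenging
# (out-of-distribution) data isolates how much robustness is attributable to
# the representation itself rather than to the classifier.
probe = nn.Linear(encoder.config.hidden_size, 3)  # e.g. 3 NLI labels
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()


def train_step(sentences, labels):
    logits = probe(embed(sentences))
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```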
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: data shortcuts/artifacts
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 1081