More "Clever" than "Hans": Probing and Adversarial Training in Translationese Classification

ACL ARR 2024 June Submission 3031 Authors

15 Jun 2024 (modified: 03 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Modern classifiers, especially neural networks, excel at exploiting subtle signals that compete with many other signals in the data. When such noisy setups lead to accuracies of 90%+, as is currently the case for high-performance neural translationese classifiers, this raises concerns about potential spurious correlations between the data and the target labels -- a phenomenon often referred to as "Clever Hans". Recent research has indeed found evidence that high-performance multilingual BERT translationese classifiers exploit spurious topic information in the form of location names, rather than just translationese signals. In this paper, we address two difficult open problems associated with confounding signals in translationese classification. First, we use probing to provide direct evidence that these classifiers learn and use spurious topic correlations, some of them potentially unknown. Second, we introduce adversarial training as a strategy to mitigate spurious topic correlations, including those unknown a priori. We show the effectiveness of our approach on translationese classification using three multilingual models, two language pairs, and four translationese datasets, as well as on a non-translationese task: occupation classification.
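To make the adversarial-training idea concrete, here is a minimal PyTorch sketch of a gradient-reversal setup for suppressing a confounding signal such as topic. This is an assumed illustration, not the authors' implementation; the `encoder`, the label counts `n_main`/`n_adv`, and the topic labels are hypothetical.

```python
# Minimal sketch (assumed, not the paper's code) of adversarial training
# with a gradient reversal layer to suppress a confound (e.g., topic)
# while training the main (translationese) classifier.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negates (and scales) gradients on backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversed gradient flows into the encoder; no gradient for lambd.
        return -ctx.lambd * grad_output, None


class AdversarialClassifier(nn.Module):
    def __init__(self, encoder, hidden_dim, n_main, n_adv, lambd=1.0):
        super().__init__()
        self.encoder = encoder  # e.g., a multilingual BERT encoder (hypothetical)
        self.main_head = nn.Linear(hidden_dim, n_main)  # translationese vs. original
        self.adv_head = nn.Linear(hidden_dim, n_adv)    # confound head (e.g., topic)
        self.lambd = lambd

    def forward(self, x):
        h = self.encoder(x)  # assumed to return pooled sentence features
        main_logits = self.main_head(h)
        # The adversarial head sees h unchanged, but its gradient is reversed
        # before reaching the encoder.
        adv_logits = self.adv_head(GradReverse.apply(h, self.lambd))
        return main_logits, adv_logits


# Training minimizes both cross-entropy losses jointly:
#   loss = ce(main_logits, y_main) + ce(adv_logits, y_confound)
# The reversed gradient pushes the encoder to *remove* information the
# adversarial head could exploit.
```

The design choice here is a minimax game: the adversarial head tries to predict the confound, while the reversed gradient trains the encoder to make that prediction impossible, leaving the main head to rely on signals other than the confound.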
Paper Type: Long
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: multilingualism, language change, multilingual benchmarks, multilingual evaluation
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: German, Spanish, English, French
Submission Number: 3031