More "Clever" than "Hans": Probing and Adversarial Training in Translationese ClassificationDownload PDF

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: Modern classifiers, especially neural networks, excel at leveraging faint and subtle signals that compete with many other signals in the data. When such potentially noisy setups lead to high accuracy rates (e.g., 90%+), this raises concerns about the authenticity of the results and about potential spurious correlations -- a phenomenon often referred to as "Clever Hans". We explore this phenomenon in the context of translationese classification, where previous work has found indirect and episodic evidence that a high-performance BERT classifier learns to use spurious topic information rather than just translationese signals. In this paper, we first use probing to provide direct evidence that high-performance translationese classifiers pick up unknown, potentially spurious topic correlations. We then introduce adversarial training as a strategy to mitigate any such potentially spurious topic correlations, whereas previous work was only able to mitigate specific, known (episodic) Clever Hans effects. We demonstrate the effectiveness of our approach on translationese classification tasks for two translation pairs.
Paper Type: long
Research Area: Multilinguality and Language Diversity
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: German, Spanish, English
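
The mitigation strategy named in the abstract, adversarial training against topic signal, is commonly implemented with a gradient reversal layer on top of a shared encoder. Below is a minimal PyTorch sketch of that setup, not the authors' actual code: the encoder name, head sizes, `num_topics`, and the `lambd` scaling factor are illustrative assumptions.

```python
# A minimal sketch (assumptions, not the authors' released code) of
# adversarial training against topic signal: a shared BERT encoder, a main
# translationese head, and a topic head behind a gradient reversal layer.

import torch
import torch.nn as nn
from transformers import AutoModel


class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negates (and scales) gradients on backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversed gradient flows into the encoder; no gradient for lambd.
        return -ctx.lambd * grad_output, None


class AdversarialTranslationeseClassifier(nn.Module):
    def __init__(self, model_name="bert-base-multilingual-cased",
                 num_topics=10, lambd=1.0):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.lambd = lambd
        # Main task: original vs. translated text (binary).
        self.translationese_head = nn.Linear(hidden, 2)
        # Adversary: predicts topic from the shared representation; the
        # reversal pushes the encoder to discard topic information.
        self.topic_head = nn.Linear(hidden, num_topics)

    def forward(self, input_ids, attention_mask):
        # [CLS] token representation as the shared sentence encoding.
        h = self.encoder(input_ids=input_ids,
                         attention_mask=attention_mask).last_hidden_state[:, 0]
        trans_logits = self.translationese_head(h)
        topic_logits = self.topic_head(GradReverse.apply(h, self.lambd))
        return trans_logits, topic_logits
```

In training, both heads would be optimized with cross-entropy on their respective labels; because of the reversal, the encoder receives negated gradients from the topic loss, discouraging topic-predictive features while preserving the translationese signal. The probing step described in the abstract corresponds to training a linear topic head of this kind on a frozen encoder, without reversal, and reading its accuracy as evidence of topic leakage.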