More "Clever" than "Hans": Probing and Adversarial Training in Translationese ClassificationDownload PDF

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: Modern classifiers, especially neural networks, excel at leveraging faint and subtle signals that compete with many other signals in the data. When such potentially noisy setups lead to high accuracy rates (e.g., 90%+), this raises concerns about the authenticity of the results and about potential spurious correlations -- a phenomenon often referred to as "Clever Hans". We explore this phenomenon in the context of translationese classification, where previous work has found indirect and episodic evidence that a high-performance BERT classifier learns to use spurious topic information rather than just translationese signals. In this paper, we first use probing to provide direct evidence that high-performance translationese classifiers pick up unknown, potentially spurious topic correlations. We then introduce adversarial training as a strategy to mitigate any such potentially spurious topic correlations, whereas previous work was only able to mitigate specific, known (episodic) Clever Hans effects. We demonstrate the effectiveness of our approach on translationese classification tasks for two translation pairs.
Paper Type: long
Research Area: Multilinguality and Language Diversity
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: German, Spanish, English
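
The mitigation strategy named in the abstract, adversarial training against topic signal, is commonly implemented with a gradient reversal layer on top of a shared encoder. Below is a minimal PyTorch sketch of that setup, not the authors' actual code: the encoder name, head sizes, `num_topics`, and the `lambd` scaling factor are illustrative assumptions.

```python
# A minimal sketch (assumptions, not the authors' released code) of
# adversarial training against topic signal: a shared BERT encoder, a main
# translationese head, and a topic head behind a gradient reversal layer.

import torch
import torch.nn as nn
from transformers import AutoModel


class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negates (and scales) gradients on backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversed gradient flows into the encoder; no gradient for lambd.
        return -ctx.lambd * grad_output, None


class AdversarialTranslationeseClassifier(nn.Module):
    def __init__(self, model_name="bert-base-multilingual-cased",
                 num_topics=10, lambd=1.0):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.lambd = lambd
        # Main task: original vs. translated text (binary).
        self.translationese_head = nn.Linear(hidden, 2)
        # Adversary: predicts topic from the shared representation; the
        # reversal pushes the encoder to discard topic information.
        self.topic_head = nn.Linear(hidden, num_topics)

    def forward(self, input_ids, attention_mask):
        # [CLS] token representation as the shared sentence encoding.
        h = self.encoder(input_ids=input_ids,
                         attention_mask=attention_mask).last_hidden_state[:, 0]
        trans_logits = self.translationese_head(h)
        topic_logits = self.topic_head(GradReverse.apply(h, self.lambd))
        return trans_logits, topic_logits
```

In training, both heads would be optimized with cross-entropy on their respective labels; because of the reversal, the encoder receives negated gradients from the topic loss, discouraging topic-predictive features while preserving the translationese signal. The probing step described in the abstract corresponds to training a linear topic head of this kind on a frozen encoder, without reversal, and reading its accuracy as evidence of topic leakage.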