Abstract: Modern classifiers, especially neural networks, excel at exploiting faint and subtle signals that compete with many other signals in the data. When such potentially noisy setups yield high accuracy (e.g., 90%+), this raises concerns about the validity of the results and about potential spurious correlations, a phenomenon often referred to as "Clever Hans". We explore this phenomenon in the context of translationese classification, where previous work has found indirect and episodic evidence that a high-performing BERT classifier learns to exploit spurious topic information rather than just translationese signals. In this paper, we first use probing to provide direct evidence that high-performance translationese classifiers pick up unknown, potentially spurious topic correlations. We then introduce adversarial training as a strategy to mitigate any such potentially spurious topic correlations, whereas previous work was only able to mitigate specific, known (episodic) Clever Hans effects. We demonstrate the effectiveness of our approach on translationese classification for two language pairs.
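The abstract does not specify which adversarial setup is used, but a common way to remove a nuisance signal (here, topic) from a classifier's representations is gradient reversal in the style of domain-adversarial training. The sketch below is a minimal, hypothetical illustration of that idea, assuming a Hugging Face BERT-style encoder; the head sizes, loss weighting, and the source of topic labels (e.g., clusters from an unsupervised topic model) are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: adversarial removal of topic signal from a translationese
# classifier via a gradient reversal layer (GRL). Assumptions are marked.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; reverses (and scales) gradients backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversed, scaled gradient for x; no gradient for lambd.
        return -ctx.lambd * grad_output, None


class AdversarialTranslationeseClassifier(nn.Module):
    def __init__(self, encoder, hidden=768, n_topics=10, lambd=1.0):
        super().__init__()
        self.encoder = encoder  # e.g., a BERT encoder (assumption)
        self.lambd = lambd
        self.trans_head = nn.Linear(hidden, 2)         # translationese vs. original
        self.topic_head = nn.Linear(hidden, n_topics)  # adversary: predicts topic

    def forward(self, input_ids, attention_mask):
        # [CLS] representation as the sentence encoding (assumption).
        h = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state[:, 0]
        trans_logits = self.trans_head(h)
        # The topic head sees gradient-reversed features: it is trained to
        # predict topic, while the reversed gradient pushes the encoder to
        # strip topic information from its representations.
        topic_logits = self.topic_head(GradReverse.apply(h, self.lambd))
        return trans_logits, topic_logits
```

Under this setup, training would minimize a joint loss such as `ce(trans_logits, y_trans) + ce(topic_logits, y_topic)`; because of the reversal, the encoder is optimized to make topic prediction hard while staying useful for translationese classification. The probing step described in the abstract can be read as the diagnostic counterpart: a lightweight classifier trained on frozen encoder representations to test whether topic is still recoverable.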
Paper Type: long
Research Area: Multilinguality and Language Diversity
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: German, Spanish, English