Abstract: The importance of input features in the decision-making of neural network models is a well-explored area of machine learning research. Numerous approaches have been developed to estimate and explain the behavior of these models.
Among the models that rely on neural networks, the sequence-to-sequence (seq2seq) architecture is particularly complex. Although general techniques can be applied to these models, the evaluation of explainability methods in this context remains underexplored.
In this paper, we propose a novel approach, based on forward simulatability, to automatically evaluate explainability methods for transformer-based seq2seq models. The idea is to inject the knowledge learned by a large model into a smaller one and measure the change in the smaller model's results. We experiment with eight explainability methods, using inseq to extract attribution scores of input tokens with respect to the output sequence. We then inject this information into the attention mechanism of an encoder-decoder transformer model for machine translation. Our results demonstrate that this framework can serve both as an automatic evaluation method for explainability techniques and as a knowledge distillation process that improves performance.
According to our experiments, the attention attribution and value zeroing methods consistently improved performance across three machine translation tasks and different composition operators.
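As a rough illustration of the attribution-extraction step, the sketch below shows how inseq can be used to obtain source-to-target attribution scores from a translation model with the attention method. The checkpoint name is a hypothetical choice for illustration, and the subsequent step of injecting the scores into a smaller model's attention is not shown.

```python
# Minimal sketch (not the authors' code): extract attention-based attribution
# scores with inseq for an encoder-decoder machine translation model.
import inseq

# Hypothetical example checkpoint; any seq2seq translation model would do.
model = inseq.load_model("Helsinki-NLP/opus-mt-en-de", "attention")

# Attribute a source sentence: yields per-token scores linking source tokens
# to each generated target token.
out = model.attribute("The weather is nice today.")
out.show()  # visualize the source-to-target attribution matrix
```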
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: Explainable AI, Machine Translation
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English, German, French, Arabic
Submission Number: 8333