Do Pre-Trained Language Models Truly Focus on the Content They Are Expected to?

ACL ARR 2024 June Submission 2233 Authors

15 Jun 2024 (modified: 03 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Pre-trained language models (PLMs) have revolutionized natural language processing, showing extraordinary capabilities in text comprehension and processing. Despite this widespread success, it remains unclear which parts of an input text a PLM actually attends to, i.e., where the model's interest lies. Existing estimation methods either rely on stringent assumptions or ignore the intricate dependency relations inherent in natural language, yielding inaccurate results. To address this limitation, this paper introduces a novel perturbation-based approach for estimating a PLM's interest, built on two key designs: a co-perturbation strategy and an adaptive optimization algorithm. The strategy injects noise across all input words simultaneously, thereby confronting the inherent combinatorial explosion challenge. The adaptive algorithm then estimates each word's interest degree by disentangling the output changes caused by the co-perturbation. Extensive experiments on various PLMs and datasets verify the effectiveness of the proposed approach.
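The abstract does not spell out the algorithm, so the following is only a minimal sketch of the co-perturbation idea, not the authors' method: random masks are drawn over all input words at once, the resulting output changes are recorded, and a least-squares fit disentangles per-word contributions. The `model_score` callable, the `[MASK]` replacement token, the `mask_prob` rate, and the linear disentanglement step are all assumptions introduced here for illustration.

```python
import numpy as np

def co_perturbation_attribution(words, model_score, n_samples=500,
                                mask_prob=0.3, seed=0):
    """Sketch of perturbation-based interest estimation.

    `model_score` is a hypothetical callable mapping a list of words
    to a scalar model output (e.g., probability of the predicted class).
    """
    rng = np.random.default_rng(seed)
    n = len(words)
    base = model_score(words)

    # Co-perturbation: draw random binary masks over ALL words at once,
    # rather than perturbing one word (or one subset) at a time.
    masks = rng.random((n_samples, n)) < mask_prob  # True = word perturbed
    deltas = np.empty(n_samples)
    for i, mask in enumerate(masks):
        perturbed = ["[MASK]" if m else w for w, m in zip(words, mask)]
        deltas[i] = base - model_score(perturbed)

    # Disentangle the joint effects: attribute each observed output
    # change linearly to the positions that were perturbed in that sample.
    X = masks.astype(float)
    scores, *_ = np.linalg.lstsq(X, deltas, rcond=None)
    return scores  # one interest score per input word
```

Sampling joint masks keeps the number of model calls at `n_samples` regardless of sentence length, whereas enumerating word subsets would grow exponentially; this is one plausible reading of how co-perturbation avoids the combinatorial explosion the abstract mentions.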
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Feature Attribution
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 2233