Extensions to Interpretability Methods for Fact-Intensive Applications

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: It would be advantageous if we could interpret the predictions of language models (LMs) in fact-intensive situations. Recent work has proposed several such interpretability approaches, but all are limited to idealized test situations that do not align with model behaviour in practice. We show that an interpretability method can be extended to non-ideal situations and apply it to study factual consistency. We find that consistent predictions generally correspond to the same underlying fact recall processes, and we identify a limitation of interpretability methods in applied scenarios: current methods cannot interpret cases in which an LM abstains from performing fact recall, which we find is usually the case for inconsistent predictions.
Paper Type: short
Research Area: Interpretability and Analysis of Models for NLP
Contribution Types: Model analysis & interpretability
Languages Studied: English