Residualized Similarity Prediction for Maintaining Interpretability in Authorship Verification

ACL ARR 2024 June Submission 4111 Authors

16 Jun 2024 (modified: 05 Jul 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: Responsible use of authorship verification systems requires not only high accuracy but also interpretable solutions. Neural methods achieve high accuracy, but their representations lack direct interpretability, whereas methods built on interpretable linguistic features generally perform worse. In this paper, we introduce residualized similarity prediction (RSP), a novel method that supplements systems using interpretable features with a neural network to improve their performance while maintaining interpretability. The key idea is to use the neural network to predict a residual similarity, i.e., the error in the similarity predicted by the interpretable system. Our evaluation on three datasets shows that RSP improves authorship verification predictions over a fully interpretable system, multiple neural models, and weighted ensembles of the two (RSP yields gains in 17 of the 24 combinations), all while maintaining interpretability as measured by a new interpretability confidence metric.
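The residual idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation; several things here are assumptions made only for illustration: the interpretable system scores a document pair by cosine similarity over feature vectors, the gold similarity is 1 for same-author pairs and 0 otherwise, and a small ridge regression stands in for the neural network that predicts the residual. All function names are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Row-wise cosine similarity between two (n, d) arrays."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / np.maximum(den, 1e-12)


def residual_targets(interp_sim, labels):
    """Residual = gold similarity minus the interpretable system's similarity.

    Assumption: labels are 1.0 (same author) or 0.0 (different authors).
    """
    return labels - interp_sim


def _with_bias(features):
    """Append a constant column so the linear model has an intercept."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_residual_model(pair_features, residuals, l2=1e-3):
    """Fit a ridge regression as a stand-in for the neural residual predictor."""
    X = _with_bias(pair_features)
    A = X.T @ X + l2 * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ residuals)


def predict_residual(weights, pair_features):
    """Predict the residual similarity for each document pair."""
    return _with_bias(pair_features) @ weights


def rsp_similarity(interp_sim, predicted_residual):
    """Final score: interpretable similarity plus the predicted correction."""
    return interp_sim + predicted_residual


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for interpretable features of each document in a pair.
    doc_a = rng.normal(size=(100, 8))
    doc_b = rng.normal(size=(100, 8))
    labels = rng.integers(0, 2, size=100).astype(float)
    # Synthetic stand-in for pair-level features from a neural encoder.
    pair_features = rng.normal(size=(100, 16))

    interp_sim = cosine_similarity(doc_a, doc_b)
    weights = fit_residual_model(pair_features, residual_targets(interp_sim, labels))
    final = rsp_similarity(interp_sim, predict_residual(weights, pair_features))
    print("interpretable-only MSE:", np.mean((interp_sim - labels) ** 2))
    print("with residual MSE:     ", np.mean((final - labels) ** 2))
```

In this sketch the interpretable score remains an explicit additive term of the final prediction, so the size of the predicted residual shows how far each decision departs from the interpretable system. How the paper's interpretability confidence metric is defined is not stated in the abstract, and the sketch does not attempt to reproduce it.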
Paper Type: Short
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Authorship Verification, Explanation Faithfulness, Feature Attribution
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 4111