Residualized Similarity Prediction for Faithfully Explainable Authorship Verification

ACL ARR 2025 February Submission5171 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Responsible use of authorship verification (AV) systems requires not only high accuracy but also interpretable solutions. More importantly, for systems used to make decisions with real-world consequences, a model's predictions must be faithful. Neural methods achieve high accuracy, but their representations lack direct interpretability. Furthermore, LLM predictions cannot be explained faithfully: even when an explanation is given for a prediction, it does not represent the reasoning process behind the model's prediction. In this paper, we introduce residualized similarity prediction, a novel method of supplementing systems built on interpretable features with a neural network to improve their performance while maintaining interpretability. The key idea is to use the neural network to predict a residual similarity, i.e., the error in the similarity predicted by the interpretable system. Our evaluation across four datasets shows not only that we can match the performance of state-of-the-art authorship verification models, but also how and to what degree the final prediction is faithful and interpretable.
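To make the key idea concrete, below is a minimal sketch of how a residual correction could be combined with an interpretable similarity score. The module name ResidualNet, the function residualized_similarity, the pair embedding, and all dimensions are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class ResidualNet(nn.Module):
    """Hypothetical network that predicts the residual similarity:
    the error in the score produced by the interpretable system."""

    def __init__(self, embed_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, pair_embedding: torch.Tensor) -> torch.Tensor:
        # One scalar residual per document pair.
        return self.mlp(pair_embedding).squeeze(-1)


def residualized_similarity(interp_sim: torch.Tensor,
                            residual: torch.Tensor) -> torch.Tensor:
    # Final score = interpretable similarity + neural residual.
    # The interpretable score carries the bulk of the prediction;
    # the residual quantifies how far the final score departs from it.
    return interp_sim + residual


# Training sketch: fit the network to the gap between the gold label
# and the interpretable score, so the residual models only the error.
# interp_sim, gold = ...  # interpretable scores and gold same-author labels
# loss = nn.functional.mse_loss(
#     residualized_similarity(interp_sim, net(pair_embedding)), gold)
```

Under this reading, the magnitude of the residual directly indicates how much of the final prediction is attributable to the interpretable features versus the neural correction.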
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Authorship Verification, Explanation Faithfulness, Feature Attribution
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English, Russian
Submission Number: 5171