A transfer learning approach to cross-domain authorship attribution

Georgios Barlas, Efstathios Stamatatos

2021 (modified: 29 Oct 2021), Evol. Syst. 2021

Abstract: Authorship attribution attempts to identify the authors behind texts and has important applications, mainly in digital forensics, cyber-security, digital humanities, and social media analytics. A challenging yet realistic scenario is cross-domain attribution, where texts of known authorship (the training set) differ from texts of disputed authorship (the test set) in, for example, topic or genre. In this paper, we propose the use of transfer learning based on pre-trained neural network language models and a multi-headed classifier. We report a series of experiments that compare the effectiveness of our approach with state-of-the-art methods under cross-topic, cross-genre, and cross-fandom conditions. We also demonstrate the crucial effect of the normalization corpus (an unlabeled corpus used to adjust the output of our classifier) in cross-domain attribution, as well as the usefulness of the shallower layers of pre-trained models.
